Conference paper. Year: 2024

Towards creating longer genetic sequences with GANs: Generation in principal component space

Abstract

Synthetic data generation via generative modeling has recently become a prominent research field in genomics, with applications ranging from functional sequence design to high-quality, privacy-preserving artificial in silico genomes. Following a body of work on Artificial Genomes (AGs) created via various generative models trained on raw genomic input, we propose a conceptually different approach to address the issues of scalability and complexity of genomic data generation in very high dimensions. Our method combines dimensionality reduction, achieved by Principal Component Analysis (PCA), with a Generative Adversarial Network (GAN) that learns in this reduced space. We compare the quality of AGs generated by our approach with AGs generated by the established models and report improvements in capturing population structure and linkage disequilibrium.
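The pipeline described in the abstract (project genomes into principal component space, generate there, map back to sequence space) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy random haplotype matrix, the sizes, the 0.5 threshold used to recover alleles, and above all the placeholder sampler, which draws independent Gaussians per component where the authors train a GAN.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a haplotype matrix (assumption): rows are individuals,
# columns are SNPs coded as 0/1 alleles.
n_samples, n_snps, n_components = 200, 500, 20
freqs = rng.uniform(0.05, 0.95, size=n_snps)
X = (rng.random((n_samples, n_snps)) < freqs).astype(np.float64)

# PCA via SVD of the mean-centered data.
mean = X.mean(axis=0)
U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
components = Vt[:n_components]        # shape (n_components, n_snps)
Z = (X - mean) @ components.T         # real genomes in PC space

# Placeholder generator (assumption): independent Gaussians fitted to each
# principal component. In the approach described above, a GAN trained on Z
# would produce these low-dimensional samples instead.
n_fake = 50
z_fake = rng.normal(Z.mean(axis=0), Z.std(axis=0), size=(n_fake, n_components))

# Map generated points back to SNP space and threshold to 0/1 alleles
# (the 0.5 cutoff is an illustrative choice).
x_fake = z_fake @ components + mean
artificial_genomes = (x_fake > 0.5).astype(np.int8)

print(Z.shape, artificial_genomes.shape)
```

The point of the design is that the generative model only has to learn a distribution over `n_components` values per genome rather than over every SNP, which is what makes longer sequences tractable.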
Main file
Szatkownik_PCA_GAN_for_genomics_camera_ready-2.pdf (4.27 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-04419057, version 1 (26-01-2024)

Identifiers

  • HAL Id: hal-04419057, version 1

Cite

Antoine Szatkownik, Cyril Furtlehner, Guillaume Charpiat, Burak Yelmen, Flora Jay. Towards creating longer genetic sequences with GANs: Generation in principal component space. MLCB 2023 - 18th Conference on Machine Learning in Computational Biology, Nov 2023, Seattle, United States. ⟨hal-04419057⟩
