Int J Mol Sci. 2022 Mar 28;23(7):3701.
doi: 10.3390/ijms23073701.

Generative Adversarial Networks for Creating Synthetic Nucleic Acid Sequences of Cat Genome

Debapriya Hazra et al. Int J Mol Sci.

Abstract

Nucleic acids are the basic units of deoxyribonucleic acid (DNA) sequencing. Every organism has a distinct DNA sequence composed of specific nucleotides, and sequencing reveals the genetic information carried by a particular DNA segment. Nucleic acid sequencing also reveals evolutionary changes among organisms and has revolutionized disease diagnosis in animals. This paper proposes a generative adversarial network (GAN) model to create synthetic nucleic acid sequences of the cat genome tuned to exhibit specific desired properties. We obtained the raw sequence data from Illumina next-generation sequencing. Various data preprocessing steps were performed using the Cutadapt and DADA2 tools. The processed data were fed to a GAN model designed following the architecture of the Wasserstein GAN with gradient penalty (WGAN-GP). We introduced a predictor and an evaluator into the proposed GAN model to tune the synthetic sequences toward certain realistic properties. The predictor was built for extracting samples with a promoter sequence, and the evaluator was built for filtering samples that scored high for motif-matching. The filtered samples were then passed to the discriminator. We evaluated the model on multiple metrics and demonstrated outputs for latent interpolation, latent complementation, and motif-matching. Evaluation results showed that the proposed GAN model achieved 93.7% correlation with the original data and produced significant outcomes compared with existing models for sequence generation.
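The motif-matching filter mentioned in the abstract can be illustrated with a small sketch. This is not the authors' evaluator, which the abstract does not specify in detail; it swaps in a plain position weight matrix (PWM) scan as a stand-in. The example binding sites, the threshold, the function names, and the reading of "filtering" as "keeping high-scoring samples" are all assumptions made for illustration, and sequences are assumed to contain only A, C, G, and T.

```python
import math

BASES = "ACGT"

def log_odds_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds position weight matrix from equal-length example sites."""
    width = len(sites[0])
    pwm = []
    for i in range(width):
        # Count each base at position i, smoothed with a pseudocount.
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / background) for b in BASES})
    return pwm

def best_motif_score(seq, pwm):
    """Return the highest PWM score over every window of the sequence."""
    width = len(pwm)
    if len(seq) < width:
        return float("-inf")
    return max(
        sum(pwm[i][seq[start + i]] for i in range(width))
        for start in range(len(seq) - width + 1)
    )

def filter_by_motif(seqs, pwm, threshold):
    """Keep only sequences whose best window scores at or above the threshold."""
    return [s for s in seqs if best_motif_score(s, pwm) >= threshold]

# Hypothetical example sites (a Pribnow-box-like pattern), not data from the paper.
example_sites = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]
pwm = log_odds_pwm(example_sites)

# Hypothetical generated samples and an arbitrary threshold of 5.0 bits.
samples = ["GGCTATAATGC", "GGGGGGGGGGG"]
kept = filter_by_motif(samples, pwm, threshold=5.0)
```

In a pipeline like the one described, the list returned by `filter_by_motif` would correspond to the samples forwarded to the discriminator; a learned evaluator would replace the PWM scan.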

Keywords: WGAN-GP; cat genome; generative adversarial networks; motif matching; nucleic acid sequences; promoter classification; promoter prediction; synthetic genome.

Conflict of interest statement

The authors declare no conflict of interest regarding the design of this study, analyses and writing of this manuscript.

Figures

Figure 1. Overview of the proposed model.
Figure 2. (a) Architecture of the generator and (b) architecture of the discriminator.
Figure 3. (a) Proposed model for the evaluator and (b) proposed model for the predictor.
Figure 4. AUC-ROC curve for sigma-24 prediction.
Figure 5. AUC-ROC curve for sigma-32 prediction.
Figure 6. AUC-ROC curve for sigma-54 prediction.
Figure 7. Linear interpolation between two randomly selected points C1 and C2. (a) Generated sequences, (b) linear interpolation.
Figure 8. Experimentation result for motif-matching.
Figure 9. Experimentation result for latent space complementation. (a) Complementary nucleotide for G, (b) complementary nucleotide for C, (c) complementary nucleotide for T, (d) complementary nucleotide for A.
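
The linear interpolation between two latent points C1 and C2 referred to in Figure 7 can be sketched as follows. This is a minimal illustration, not the authors' code: the latent dimension, the number of steps, the random seed, and the function names are hypothetical, and the trained generator that would decode each intermediate point into a sequence is not included.

```python
import random

def lerp(c1, c2, t):
    """Linearly interpolate between latent vectors c1 and c2 at fraction t in [0, 1]."""
    return [(1.0 - t) * a + t * b for a, b in zip(c1, c2)]

def interpolation_path(c1, c2, steps):
    """Return `steps` evenly spaced latent points from c1 to c2, endpoints included."""
    if steps < 2:
        raise ValueError("steps must be at least 2 to include both endpoints")
    return [lerp(c1, c2, k / (steps - 1)) for k in range(steps)]

# Hypothetical setup: two random points in an 8-dimensional Gaussian latent space.
rng = random.Random(0)
latent_dim = 8
c1 = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
c2 = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]

path = interpolation_path(c1, c2, steps=5)
```

Each vector in `path` would be passed through the generator, so that the decoded sequences can be inspected for a gradual change from the sequence at C1 to the sequence at C2.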
