Nucleic Acids Res. 2022 Feb 22;50(3):e16.
doi: 10.1093/nar/gkab1099.

From genotype to phenotype in Arabidopsis thaliana: in-silico genome interpretation predicts 288 phenotypes from sequencing data

Daniele Raimondi et al. Nucleic Acids Res.

Abstract

The unprecedented availability of data from high-throughput sequencing has, in many cases, shifted the bottleneck from data availability to data interpretation. This has delayed the promised breakthroughs: in human genetics, advances in genetics and precision medicine; in plant sciences, phenotype prediction to improve plant adaptation to climate change and resistance to bioaggressors. In this paper, we propose a novel Genome Interpretation paradigm that aims to model the genotype-to-phenotype relationship directly, and we focus on A. thaliana because it is the best-studied model organism in plant genetics. Our model, called Galiana, is the first end-to-end Neural Network (NN) approach following the genomes-in/phenotypes-out paradigm, and it is trained to predict 288 real-valued Arabidopsis thaliana phenotypes from whole-genome sequencing data. We show that 75 of these phenotypes, mostly related to flowering traits, are predicted with a Pearson correlation ≥0.4. We show that our end-to-end NN approach achieves better performance and larger phenotype coverage than models predicting single phenotypes from the known associated genes derived from GWAS. Galiana is also fully interpretable, thanks to gradient-based Saliency Maps. We used this interpretation approach to identify 36 novel genes that are likely to be associated with flowering traits, finding evidence for 6 of them in the existing literature.
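The per-phenotype evaluation mentioned in the abstract (counting phenotypes predicted with a Pearson correlation ≥0.4) can be illustrated with a minimal sketch. The function names, the array layout (samples in rows, phenotypes in columns) and the absence of missing measurements are assumptions made for this example; they are not taken from the paper.

```python
import numpy as np

def pearson_per_phenotype(y_true, y_pred):
    """Column-wise Pearson correlation between measured and predicted values.

    Both arrays are assumed to have shape (n_samples, n_phenotypes) and to
    contain no missing values (an assumption of this sketch).
    """
    yt = y_true - y_true.mean(axis=0)
    yp = y_pred - y_pred.mean(axis=0)
    num = (yt * yp).sum(axis=0)
    den = np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))
    return num / den

def count_well_predicted(y_true, y_pred, threshold=0.4):
    """Number of phenotypes whose Pearson correlation reaches the threshold."""
    r = pearson_per_phenotype(y_true, y_pred)
    return int((r >= threshold).sum())

# Toy data: phenotype 0 is predicted up to a linear rescaling,
# phenotype 1 is predicted in reversed order.
y_true = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]])
y_pred = np.array([[3.0, 4.0], [5.0, 3.0], [7.0, 2.0], [9.0, 1.0]])
n_good = count_well_predicted(y_true, y_pred)
```

In this toy case only the first phenotype passes the threshold. Real phenotype tables usually have missing entries, which would need masking before the correlation is computed.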

Figures

Figure 1.
The architecture of Galiana. The (17, N = 27 655) tensor representation of each genome is used as input. Each 17-dimensional vector representing gene i is processed by the G module. Applying the G module to every gene in turn produces the (1, 27 655) representation of the mutational load on each gene. This representation is the input to the fully connected module, a feed-forward (FF) NN with two layers of 50 neurons each. Finally, the last layer implements the multi-task regression over the 288 real-valued phenotypes.
Figure 2.
Scatter plots showing the prediction results for 16 of the 75 significantly predicted phenotypes.
Figure 3.
Plots visualizing the predicted correlations for four selected phenotypes, considering only A. thaliana (AT) samples located in Sweden, Italy, Spain and Russia (columns, from left to right). This shows that Galiana also predicts phenotype dynamics within a country, and not only among AT samples from different countries.
Figure 4.
Visual comparison between our multi-phenotypic predictions (Galiana) and the single-phenotype models (GWAS_NNi) based on the known associated genes retrieved from (40). Galiana outperforms the GWAS-based NN on 77% of the 35 phenotypes to which GWAS_NN was applicable.
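The data flow described in the Figure 1 caption can be sketched as a forward pass with the stated shapes: a (17, 27 655) genome tensor, a G module that maps each gene's 17-dimensional vector to one value, two fully connected layers of 50 neurons, and a 288-output regression layer. This is only an illustrative sketch, not the authors' implementation. The internal width of the G module (`G_HIDDEN`), the ReLU activations, the sharing of G-module weights across genes and the random initialization are all assumptions made for this example.

```python
import numpy as np

N_FEATURES = 17      # per-gene feature vector length (from the caption)
N_GENES = 27_655     # number of genes (from the caption)
N_HIDDEN = 50        # neurons per fully connected layer (from the caption)
N_PHENOTYPES = 288   # regression outputs (from the caption)
G_HIDDEN = 8         # assumption: hidden width inside the G module

rng = np.random.default_rng(0)

def init_params(n_genes=N_GENES):
    """Random (untrained) weights, only to make the shapes concrete."""
    def dense(n_in, n_out):
        w = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out))
        return w, np.zeros(n_out)
    return {
        "g1": dense(N_FEATURES, G_HIDDEN),
        "g2": dense(G_HIDDEN, 1),
        "fc1": dense(n_genes, N_HIDDEN),
        "fc2": dense(N_HIDDEN, N_HIDDEN),
        "out": dense(N_HIDDEN, N_PHENOTYPES),
    }

def relu(x):
    return np.maximum(x, 0.0)

def g_module(genome, params):
    """Map a (17, n_genes) genome tensor to a (1, n_genes) per-gene score.

    The same small network is applied to every gene (assumed weight sharing).
    """
    w1, b1 = params["g1"]
    w2, b2 = params["g2"]
    hidden = relu(genome.T @ w1 + b1)      # (n_genes, G_HIDDEN)
    return (hidden @ w2 + b2).T            # (1, n_genes)

def forward(genome, params):
    """Full forward pass: G module, two 50-neuron layers, 288 outputs."""
    x = g_module(genome, params)           # (1, n_genes)
    for name in ("fc1", "fc2"):
        w, b = params[name]
        x = relu(x @ w + b)                # (1, 50)
    w, b = params["out"]
    return (x @ w + b)[0]                  # (288,)

params = init_params()
genome = rng.normal(size=(N_FEATURES, N_GENES))   # placeholder input
phenotypes = forward(genome, params)
```

Because the G module reduces every gene to a single value before the fully connected layers, the first dense layer needs 27 655 × 50 weights rather than 17 × 27 655 × 50, which is one plausible motivation for this kind of design.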

References

    1. Raimondi D., Simm J., Arany A., Fariselli P., Cleynen I., Moreau Y. An interpretable low-complexity machine learning framework for robust exome-based in-silico diagnosis of Crohn’s disease patients. NAR Genomics Bioinformatics. 2020; 2:lqaa011.
    2. Daneshjou R., Wang Y., Bromberg Y., Bovo S., Martelli P.L., Babbi G., Lena P.D., Casadio R., Edwards M., Gifford D. et al. Working toward precision medicine: Predicting phenotypes from exomes in the Critical Assessment of Genome Interpretation (CAGI) challenges. Hum. Mutat. 2017; 38:1182–1192.
    3. Fröhlich H., Balling R., Beerenwinkel N., Kohlbacher O., Kumar S., Lengauer T., Maathuis M.H., Moreau Y., Murphy S.A., Przytycka T.M. et al. From hype to reality: data science enabling personalized medicine. BMC Med. 2018; 16:1–15.
    4. Manolio T.A., Collins F.S., Cox N.J., Goldstein D.B., Hindorff L.A., Hunter D.J., McCarthy M.I., Ramos E.M., Cardon L.R., Chakravarti A. et al. Finding the missing heritability of complex diseases. Nature. 2009; 461:747–753.
    5. Moreau Y., Tranchevent L.-C. Computational tools for prioritizing candidate genes: boosting disease gene discovery. Nat. Rev. Genet. 2012; 13:523–536.
