Sci Rep. 2021 Dec 7;11(1):23522.
doi: 10.1038/s41598-021-03041-0.

A computational pipeline for functional gene discovery



Aolani Colon et al. Sci Rep. 2021.

Abstract

Many computational pipelines exist for the detection of differentially expressed genes, but pipelines for functional gene detection are rare. We developed a new computational pipeline for identifying functional genes from transcriptome profiling data. Key features of the pipeline include batch effect correction, clustering optimization by gap statistics, gene ontology analysis of clustered genes, and literature analysis for functional gene discovery. By applying this pipeline to RNA-seq datasets from two mouse retinal development studies, we identified 7 candidate genes involved in the formation of the photoreceptor outer segment. The expression of the top three candidate genes (Pde8b, Laptm4b, and Nr1h4) in the outer segment of the developing mouse retina was experimentally validated by immunohistochemical analysis. This computational pipeline can accurately predict novel functional genes for a specific biological process, e.g., development of the outer segment and synapses of the photoreceptor cells in the mouse retina. The pipeline can also be useful for discovering functional genes for other biological processes and in other organs and tissues.
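The abstract lists clustering optimization by gap statistics as a key step, but does not name the clustering algorithm or the reference distribution. The sketch below assumes k-means on per-gene expression vectors and the gap statistic of Tibshirani, Walther and Hastie (2001): Gap(k) is the mean log within-cluster sum of squares W_k over uniform reference datasets drawn from the data's bounding box, minus log W_k of the observed data. The chosen k is the smallest one with Gap(k) >= Gap(k+1) - s_(k+1). All function names and defaults (`gap_statistic`, `n_refs`, `n_init`) are illustrative assumptions, written in plain Python so the example is self-contained. An established implementation, such as R's `cluster::clusGap`, would be the more practical choice for real data.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_wss(points, k, rng, n_iter=50):
    """Lloyd's k-means with k-means++ seeding; returns W_k, the within-cluster
    sum of squared distances to the final cluster centers."""
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        # k-means++: sample the next center with probability proportional to D^2
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        r = rng.random() * sum(d2)
        acc = 0.0
        for p, d in zip(points, d2):
            acc += d
            if acc > r:
                centers.append(list(p))
                break
        else:
            # all D^2 are zero (or rounding left r unreached): pick any point
            centers.append(list(rng.choice(points)))
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep an emptied center in place
        if new_centers == centers:
            break
        centers = new_centers
    labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))


def log_wk(points, k, rng, n_init=3):
    """log W_k, keeping the best of several k-means restarts."""
    best = min(kmeans_wss(points, k, rng) for _ in range(n_init))
    return math.log(max(best, 1e-12))  # guard against log(0)


def gap_statistic(points, k_max, n_refs=10, seed=0):
    """Returns (chosen k, [Gap(1), ..., Gap(k_max)]) using uniform references
    over the bounding box of the data."""
    rng = random.Random(seed)
    lows = [min(col) for col in zip(*points)]
    highs = [max(col) for col in zip(*points)]
    gaps, s = [], []
    for k in range(1, k_max + 1):
        observed = log_wk(points, k, rng)
        ref = []
        for _ in range(n_refs):
            sample = [[rng.uniform(lo, hi) for lo, hi in zip(lows, highs)]
                      for _ in points]
            ref.append(log_wk(sample, k, rng))
        mean_ref = sum(ref) / n_refs
        sd_ref = math.sqrt(sum((x - mean_ref) ** 2 for x in ref) / n_refs)
        gaps.append(mean_ref - observed)
        s.append(sd_ref * math.sqrt(1 + 1 / n_refs))
    # smallest k with Gap(k) >= Gap(k+1) - s_(k+1); gaps/s are 0-indexed
    for k in range(1, k_max):
        if gaps[k - 1] >= gaps[k] - s[k]:
            return k, gaps
    return k_max, gaps
```

In this setting, `points` would be a list of per-gene expression vectors, for example Z-scores across developmental stages. The `k_max` value passed in is an analyst's choice and is not taken from the paper.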


Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Outline of computational pipeline for functional gene discovery. Schematic diagram illustrates the steps in the computational pipeline for functional gene discovery in retinal outer segment development as an example: 1. Sequence alignment to generate raw count matrix, 2. Correction for batch effect/variance to generate corrected count matrix, 3. Detection of differentially expressed genes (DEGs), 4. Normalization, 5. Determination of the optimal number of gene clusters/expression patterns, 6. Gene ontology analysis, 7. Discovery of novel functional genes via literature search, and 8. Experimental verification of computationally predicted novel functional genes.
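Steps 2 and 4 of the outline (batch correction and normalization) can be illustrated on a small scale. The caption does not say which batch-correction method the pipeline uses, so the sketch below substitutes a deliberately simple location-only adjustment on log2 counts-per-million. For each gene, it shifts each batch's mean onto the gene's overall mean. This is not ComBat or ComBat-seq, and if batch is confounded with developmental stage it would also remove real biological signal. The final function computes the per-gene Z-score used for the expression plots in Figure 2. The DEG step (3) is omitted here, and all function names are hypothetical.

```python
import math
import statistics


def log_cpm(counts, pseudocount=1.0):
    """log2 counts-per-million; counts maps gene -> raw counts per sample."""
    n = len(next(iter(counts.values())))
    lib_sizes = [sum(c[j] for c in counts.values()) for j in range(n)]
    return {g: [math.log2(c[j] / lib_sizes[j] * 1e6 + pseudocount) for j in range(n)]
            for g, c in counts.items()}


def center_batches(expr, batches):
    """Location-only batch adjustment: move each batch's mean for a gene onto
    that gene's overall mean (a simplified stand-in, not ComBat/ComBat-seq)."""
    out = {}
    for g, vals in expr.items():
        grand = statistics.fmean(vals)
        shift = {}
        for b in set(batches):
            in_batch = [v for v, bb in zip(vals, batches) if bb == b]
            shift[b] = grand - statistics.fmean(in_batch)
        out[g] = [v + shift[b] for v, b in zip(vals, batches)]
    return out


def zscore_rows(expr):
    """Per-gene Z-score across samples (sample SD); constant genes map to 0."""
    out = {}
    for g, vals in expr.items():
        mu = statistics.fmean(vals)
        sd = statistics.stdev(vals)
        out[g] = [(v - mu) / sd if sd > 0 else 0.0 for v in vals]
    return out
```

A toy call would be `zscore_rows(center_batches(log_cpm(counts), ["A", "A", "B", "B"]))`, where `counts` maps gene names to four raw counts each. Real count matrices would normally go through dedicated packages rather than dictionaries.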
Figure 2
Gene expression profiles and gene enrichment associated with OS development. Plots of gene expression level as Z-score over developmental stages, with gene enrichment, for clusters #3 (A) and #8 (B). Each line represents the expression profile of one gene. Gray lines represent known photoreceptor OS genes; colored lines represent predicted novel functional OS genes. Gene ontology (GO) term enrichment for clusters #3 and #8 is plotted on the right of each panel (A, B). Circle size represents the gene count, and color represents significance ranked by adjusted p-value. For each cluster, the top 10 GO cellular component (CC) terms, ranked by adjusted p-value, are listed. (C) Representative photomicrographs of mouse retina sections at postnatal day 14 (P14) immunostained with antibodies against Pde8b, Laptm4b, Nr1h4, and the known rod photoreceptor marker Rhodopsin. OS outer segment, ONL outer nuclear layer, OPL outer plexiform layer, INL inner nuclear layer, IPL inner plexiform layer, GCL ganglion cell layer. Scale bar, 20 µm.
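The gene counts and adjusted p-values plotted for each GO term are typical outputs of an over-representation test. The caption does not name the test used. A common choice is a one-sided hypergeometric test per term followed by Benjamini-Hochberg adjustment, and the sketch below shows that approach in plain Python. The background gene set and the term-to-gene mapping (`annotations`) are hypothetical inputs; in practice they would come from a GO annotation source. This sketch tests only terms that overlap the cluster.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric(population M, n annotated, N drawn)."""
    upper = min(n, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / comb(M, N)


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def go_enrichment(cluster_genes, background, annotations):
    """Rows of (term, gene count, p-value, adjusted p-value), sorted by the
    adjusted p-value; annotations maps a GO term to a set of genes."""
    cluster = set(cluster_genes) & background
    M, N = len(background), len(cluster)
    terms, counts, pvals = [], [], []
    for term, genes in annotations.items():
        annotated = genes & background
        k = len(annotated & cluster)
        if k == 0:
            continue  # only terms hit by the cluster are tested
        terms.append(term)
        counts.append(k)
        pvals.append(hypergeom_sf(k, M, len(annotated), N))
    padj = bh_adjust(pvals)
    return sorted(zip(terms, counts, pvals, padj), key=lambda row: row[3])
```

Ranking by the fourth field mirrors the caption's "ranked by adjusted p-value". Keeping the first ten rows per cluster would then give a top-10 term list.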
