Front Genet. 2022 Oct 3:13:997460.

doi: 10.3389/fgene.2022.997460. eCollection 2022.

Prediction of transcript isoforms in 19 chicken tissues by Oxford Nanopore long-read sequencing

Dailu Guan¹, Michelle M Halstead¹, Alma D Islas-Trejo¹, Daniel E Goszczynski¹, Hans H Cheng², Pablo J Ross¹, Huaijun Zhou¹

Affiliations

¹ Department of Animal Science, University of California Davis, Davis, CA, United States.
² USDA, ARS, USNPRC, Avian Disease and Oncology Laboratory, East Lansing, MI, United States.

PMID: 36246588
PMCID: PMC9561881
DOI: 10.3389/fgene.2022.997460

Dailu Guan et al. Front Genet. 2022.

Abstract

To identify and annotate transcript isoforms in the chicken genome, we generated Nanopore long-read sequencing data from 68 samples that encompassed 19 diverse tissues collected from experimental adult male and female White Leghorn chickens. More than 23.8 million reads with mean read length of 790 bases and average quality of 18.2 were generated. The annotation and subsequent filtering resulted in the identification of 55,382 transcripts at 40,547 loci with mean length of 1,700 bases. We predicted 30,967 coding transcripts at 19,461 loci, and 16,495 lncRNA transcripts at 15,512 loci. Compared to existing reference annotations, we found ∼52% of annotated transcripts could be partially or fully matched while ∼47% were novel. Seventy percent of novel transcripts were potentially transcribed from lncRNA loci. Based on our annotation, we quantified transcript expression across tissues and found two brain tissues (i.e., cerebellum and cortex) expressed the highest number of transcripts and loci. Furthermore, ∼22% of the transcripts displayed tissue specificity with the reproductive tissues (i.e., testis and ovary) exhibiting the most tissue-specific transcripts. Despite our wide sampling, ∼20% of Ensembl reference loci were not detected. This suggests that deeper sequencing and additional samples that include different breeds, cell types, developmental stages, and physiological conditions, are needed to fully annotate the chicken genome. The application of Nanopore sequencing in this study demonstrates the usefulness of long-read data in discovering additional novel loci (e.g., lncRNA loci) and resolving complex transcripts (e.g., the longest transcript for the TTN locus).

Keywords: annotation; chicken; long-read sequencing; nanopore; transcript isoform; transcriptome.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

**FIGURE 1**
Data summary of 68 chicken Nanopore long-read transcriptome datasets. **(A)** Bivariate plot (De Coster et al., 2018) depicting read length (x-axis) and quality (y-axis) of Nanopore long-read transcriptome reads. **(B)** Hierarchical clustering of the 68 chicken Nanopore long-read transcriptome samples used in this study. The dendrogram is built from gene expression quantified as transcripts per million (TPM ≥ 0.1). The distance between samples is 1 − r, where r is the Pearson correlation coefficient. The red arrow indicates sample Cecum_CA, which did not cluster with the other cecal samples. **(C)** Correlation between the number of sequencing reads (x-axis) and the number of expressed genes (y-axis, TPM > 0.1). The Pearson correlation is 0.71 (p = 1.30 × 10⁻¹¹).
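The distance used for the dendrogram in panel (B), 1 − r with r the Pearson correlation, can be illustrated with a minimal sketch. The sample names, the toy TPM values, and the rule of keeping a gene when it reaches the TPM cutoff in at least one sample are assumptions made for illustration; the caption does not say how the filter was applied, and this is not the authors' pipeline.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


def correlation_distance(tpm_by_sample, min_tpm=0.1):
    """Return {(sample_a, sample_b): 1 - r} for every pair of samples.

    tpm_by_sample maps a sample name to a list of gene TPM values, with genes
    in the same order for every sample. A gene is kept when it reaches min_tpm
    in at least one sample (an assumed filtering rule).
    """
    samples = sorted(tpm_by_sample)
    n_genes = len(tpm_by_sample[samples[0]])
    keep = [
        g for g in range(n_genes)
        if any(tpm_by_sample[s][g] >= min_tpm for s in samples)
    ]
    distances = {}
    for i, first in enumerate(samples):
        for second in samples[i + 1:]:
            x = [tpm_by_sample[first][g] for g in keep]
            y = [tpm_by_sample[second][g] for g in keep]
            distances[(first, second)] = 1.0 - pearson_r(x, y)
    return distances


# Hypothetical toy data: three samples, four genes.
toy_tpm = {
    "brain_1": [10.0, 0.0, 5.0, 0.05],
    "brain_2": [20.0, 0.0, 10.0, 0.0],
    "liver_1": [0.0, 8.0, 1.0, 0.0],
}
toy_distances = correlation_distance(toy_tpm)
```

In this toy example the two brain samples have proportional profiles, so their distance is close to 0, while the brain and liver samples are anti-correlated and end up at a distance greater than 1. A matrix of such distances is what a hierarchical clustering routine would take as input.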

**FIGURE 2**
Transcript assembly using Nanopore long-read transcriptome data. **(A)** Comparison of predicted transcripts against the Ensembl (V102, vsEMBL) and NCBI (V105, vsNCBI) annotations. The transcripts were classified with the GffCompare software (Pertea and Pertea, 2020). Panels **(B,C)** depict the distributions of predicted transcript length and exon number, respectively. **(D)** A screenshot showing the longest predicted transcript, which is located on chromosome 7 (15,343,033–15,384,347). BLAST analysis indicated that the transcript matched the *TTN* gene locus, which encodes the titin protein.

**FIGURE 3**
Characterization of assembled transcripts. **(A)** Number of loci in the NCBI (V105), Ensembl (V102), and our annotations. **(B)** Pie chart depicting GffCompare types relative to the Ensembl annotation (V102). **(C)** Number of transcripts as a function of protein-coding, lncRNA, and other non-coding loci. **(D)** Transcript expression, measured as transcripts per million (TPM), as a function of the transcript types classified by the GffCompare tool. Exact match: GffCompare code ‘=’, which means the intron chain of our annotated transcript exactly matches a reference annotation. Novel isoform: GffCompare codes ‘c’, ‘k’, ‘j’, ‘m’, ‘n’, or ‘o’, which mean the predicted transcript does not match a reference transcript but does match a reference gene. Novel loci: GffCompare codes ‘i’, ‘u’, ‘y’, or ‘x’, which mean the predicted transcript matches neither a reference transcript nor a reference locus. The type ‘y’ comprises only 134 transcripts, a proportion too small to be visible in the pie chart. Student’s t-tests were carried out between pairs of transcript groups, and p values were adjusted using the false discovery rate (FDR) method (Benjamini and Hochberg, 1995).
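The grouping of GffCompare class codes described in this caption can be written as a small lookup. The sketch below follows the three groups exactly as the caption lists them; the function names and the "Other" bucket for codes the caption does not mention are assumptions added for illustration.

```python
from collections import Counter

# Groups as given in the Figure 3 caption.
EXACT_MATCH_CODES = {"="}
NOVEL_ISOFORM_CODES = {"c", "k", "j", "m", "n", "o"}
NOVEL_LOCI_CODES = {"i", "u", "y", "x"}


def classify_class_code(code):
    """Map a single GffCompare class code to the caption's category name."""
    if code in EXACT_MATCH_CODES:
        return "Exact match"
    if code in NOVEL_ISOFORM_CODES:
        return "Novel isoform"
    if code in NOVEL_LOCI_CODES:
        return "Novel loci"
    # Codes not listed in the caption land here (assumed fallback).
    return "Other"


def summarize_class_codes(codes):
    """Count how many transcripts fall into each category."""
    return Counter(classify_class_code(code) for code in codes)


# Hypothetical class codes for six transcripts.
example_counts = summarize_class_codes(["=", "j", "j", "u", "x", "="])
```

For the six hypothetical codes above, the counter holds two exact matches, two novel isoforms, and two novel loci, which is the kind of tally a pie chart like panel (B) would be drawn from.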

**FIGURE 4**
Analysis of tissue specificity across tissues. **(A)** Tissue specificity index (TSI) as a function of the transcript types classified by GffCompare. Code ‘=’ means the intron chain of our annotated transcript exactly matches a reference annotation (Exact match); codes ‘c’, ‘k’, ‘j’, ‘m’, ‘n’, or ‘o’ mean the predicted transcript does not match a reference transcript but does match a reference gene (Novel isoform); codes ‘i’, ‘u’, ‘y’, or ‘x’ mean the predicted transcript matches neither a reference transcript nor a reference locus (Novel loci). **(B)** Transcript expression, measured as transcripts per million (TPM), as a function of TSI. Transcripts were grouped according to their expression. **(C)** Number of tissue-specific transcripts in each tissue. **(D)** A screenshot showing a novel transcript predicted only by our data, which is located on chromosome 4 (52,482,563–52,492,561). **(E)** TPM expression of the predicted lncRNA transcript shown in panel **(D)**. The transcript is highly expressed in testis samples but not in any other tissue. FEELnc predicted it to be a sense intergenic lncRNA.
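The caption does not give the formula for the tissue specificity index. As a rough illustration only, the sketch below assumes one commonly used definition, the expression in the highest-expressing tissue divided by the summed expression over all tissues; the definition used in the paper may differ, and the tissue names and TPM values are hypothetical.

```python
def tissue_specificity_index(tpm_by_tissue):
    """Return max(TPM) / sum(TPM) across tissues, or None if never expressed.

    This is an assumed definition of TSI. It ranges from 1/n for uniform
    expression over n tissues up to 1.0 for expression in a single tissue.
    """
    values = list(tpm_by_tissue.values())
    if not values:
        raise ValueError("at least one tissue is required")
    if any(v < 0 for v in values):
        raise ValueError("TPM values must be non-negative")
    total = sum(values)
    if total == 0:
        return None
    return max(values) / total


def most_expressed_tissue(tpm_by_tissue):
    """Return the name of the tissue with the highest TPM."""
    return max(tpm_by_tissue, key=tpm_by_tissue.get)


# Hypothetical per-tissue mean TPM for one transcript.
testis_like = {"testis": 9.0, "ovary": 1.0, "liver": 0.0}
tsi = tissue_specificity_index(testis_like)  # 9 / 10 = 0.9
```

Under this assumed definition, a transcript such as the testis-expressed lncRNA in panels (D, E) would score close to 1, while a transcript expressed evenly across 19 tissues would score about 1/19.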

**FIGURE 5**
Functional enrichment of tissue-specific transcripts and differential alternative splicing analysis. **(A)** Heatmap depicting the negative log₁₀ FDR (false discovery rate) values for the top 10 Gene Ontology (GO) Biological Process terms. Several example GO terms and their FDR values are shown on the right. **(B)** Number of unique transcripts detected as a function of the number of tissues added. Transcripts are categorized into three types (see Methods). **(C)** Sashimi plots of the *CYB561A3* gene, which showed differential alternative splicing (DAS) between heart (red) and testis (blue).
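The negative log₁₀ FDR values plotted in panel (A) can be illustrated with a short sketch. It assumes that the FDR values come from the Benjamini-Hochberg procedure, which the Figure 3 caption names for the t-tests; whether the enrichment tool used for the GO analysis applied the same adjustment is not stated here. The p values are hypothetical.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(pvalues)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def neg_log10(value):
    """Return -log10(value); a value of 0 maps to infinity."""
    if value < 0:
        raise ValueError("value must be non-negative")
    if value == 0:
        return math.inf
    return -math.log10(value)


# Hypothetical raw p values for four GO terms.
raw_p = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(raw_p)
heatmap_values = [neg_log10(q) for q in fdr]
```

Larger heatmap values correspond to smaller FDR values, so the most strongly enriched terms appear as the most intense cells.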

Cited by

Genetic regulation of gene expression across multiple tissues in chickens.
Guan D, Bai Z, Zhu X, Zhong C, Hou Y, Zhu D; ChickenGTEx Consortium; Li H, Lan F, Diao S, Yao Y, Zhao B, Li X, Pan Z, Gao Y, Wang Y, Zou D, Wang R, Xu T, Sun C, Yin H, Teng J, Xu Z, Lin Q, Shi S, Shao D, Degalez F, Lagarrigue S, Wang Y, Wang M, Peng M, Rocha D, Charles M, Smith J, Watson K, Buitenhuis AJ, Sahana G, Lund MS, Warren W, Frantz L, Larson G, Lamont SJ, Si W, Zhao X, Li B, Zhang H, Luo C, Shu D, Qu H, Luo W, Li Z, Nie Q, Zhang X, Xiang R, Liu S, Zhang Z, Zhang Z, Liu GE, Cheng H, Yang N, Hu X, Zhou H, Fang L. Nat Genet. 2025 May;57(5):1298-1308. doi: 10.1038/s41588-025-02155-9. Epub 2025 Apr 8. PMID: 40200121
Prediction of transcript isoforms and identification of tissue-specific genes in cucumber.
Wang W, Shen C, Wen X, Li A, Gao Q, Xu Z, Wei Y, Li Y, Guan D, Liu B. BMC Genomics. 2025 Jan 10;26(1):25. doi: 10.1186/s12864-025-11212-w. PMID: 39794760
Full-length transcriptome sequencing of pepper fruit during development and construction of a transcript variation database.
Liu Z, Yang B, Zhang T, Sun H, Mao L, Yang S, Dai X, Suo H, Zhang Z, Chen W, Chen H, Xu W, Dossa K, Zou X, Ou L. Hortic Res. 2024 Jul 24;11(9):uhae198. doi: 10.1093/hr/uhae198. eCollection 2024 Sep. PMID: 39257544
Enriched atlas of lncRNA and protein-coding genes for the GRCg7b chicken assembly and its functional annotation across 47 tissues.
Degalez F, Charles M, Foissac S, Zhou H, Guan D, Fang L, Klopp C, Allain C, Lagoutte L, Lecerf F, Acloque H, Giuffra E, Pitel F, Lagarrigue S. Sci Rep. 2024 Mar 19;14(1):6588. doi: 10.1038/s41598-024-56705-y. PMID: 38504112
The Abundant and Unique Transcripts and Alternative Splicing of the Artificially Autododecaploid London Plane (Platanus × acerifolia).
Yan X, Chen X, Li Y, Li Y, Wang F, Zhang J, Ning G, Bao M. Int J Mol Sci. 2023 Sep 23;24(19):14486. doi: 10.3390/ijms241914486. PMID: 37833935
