Int J Mol Sci. 2023 Mar 3;24(5):4937.
doi: 10.3390/ijms24054937.

Full-Length Transcriptome of the Great Himalayan Leaf-Nosed Bats (Hipposideros armiger) Optimized Genome Annotation and Revealed the Expression of Novel Genes

Mingyue Bao et al. Int J Mol Sci.

Abstract

The great Himalayan leaf-nosed bat (Hipposideros armiger) is one of the most representative echolocating bat species and an ideal model for studying the bat echolocation system. An incomplete reference genome and the limited availability of full-length cDNAs have hindered the identification of alternatively spliced transcripts, which has slowed basic research on bat echolocation and evolution. In this study, we analyzed five organs from H. armiger using PacBio single-molecule real-time (SMRT) sequencing, the first such analysis for this species. Sequencing generated 120 GB of subreads, including 1,472,058 full-length non-chimeric (FLNC) sequences. Transcriptome structural analysis detected 34,611 alternative splicing (AS) events and 66,010 alternative polyadenylation (APA) sites. A total of 110,611 isoforms were identified, of which 52% were new isoforms of known genes and 5% came from novel gene loci, along with 2112 novel genes not previously annotated in the current H. armiger reference genome. Furthermore, several key novel genes, including Pol, RAS, NFKB1, and CAMK4, were associated with nervous system, signal transduction, and immune system processes; these genes may be involved in regulating the auditory perception and immune functions that support echolocation in bats. In conclusion, the full-length transcriptome optimized and supplemented the existing H. armiger genome annotation in multiple ways and provides newly discovered or previously unrecognized protein-coding genes and isoforms that can serve as a reference resource.

Keywords: Hipposideros armiger; PacBio; SMRT sequencing; full-length transcriptome; genome annotation.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Sequel full-length transcriptome cDNA library construction, data processing, and bioinformatic analysis.
Figure 2
Length distributions of PacBio SMRT sequencing reads. (A) Number and length distribution of 1,630,747 CCS sequences. (B) Number and length distribution of 1,472,058 FLNC sequences. (C) Number and length distribution of 94,981 consensus sequences. The blue area shows the number of reads at a given length (left axis); the black line shows the number of reads longer than a given length (right axis).
Figure 3
Pie charts of gene expression. (A) Classification counts of the consensus sequences. (B) GMAP alignment of corrected reads to the reference genome. (C) Classification of the transcript isoforms identified. (D) Summary statistics of gene expression.
Figure 4
Functional annotation of all isoforms. (A) Venn diagram of annotations in the KEGG, KOG, NR, and SwissProt databases. (B) Distribution of homologous species in the NR database for all isoforms. (C) Distribution of GO terms for all annotated isoforms in biological process, cellular component, and molecular function. (D) KOG enrichment of all isoforms. (E) KEGG pathway enrichment of all isoforms.
Figure 5
Participation of isoforms in various pathways. (A) The proportion of isoforms annotated in the immune system pathway. (B) The proportion of isoforms annotated in the signal transduction pathway. (C) The proportion of isoforms annotated in the nervous system pathway.
Figure 6
Basic annotation information of novel genes. (A) Functional annotation of novel genes in all databases. (B) Distribution of GO terms for all annotated novel genes in biological process, cellular component, and molecular function. (C) KEGG pathways of novel genes. (D) Network diagram of immune system pathways and genes. (E) Network diagram of nervous system pathways and genes. (F) Network diagram of signal transduction pathways and genes.
Figure 7
Prediction of the coding sequences and SSR types. (A) The length of 3′ UTR. (B) The length of 5′ UTR. (C) Length distribution of CDSs. (D) Distribution map of SSR types.
Figure 8
AS and APA analysis of the H. armiger full-length transcriptome. (A) Seven major AS events. (B) Number and categories of the AS events identified. (C) Distribution of the number of poly(A) sites per gene. (D) Two examples of isoforms generated by the same gene (XM_019629832.1 (TIGD2) and XM_019637557.1 (ACTB), respectively). Transcripts from the H. armiger reference genome and from PacBio sequencing are marked in dark blue and purple, respectively. The gray horizontal line and the text above it show the chromosome ID and the start and end positions of the terminal gene region.
Figure 9
Identification of lncRNAs and TFs based on SMRT sequencing in H. armiger. (A) Venn diagram of lncRNAs predicted by the CNCI, CPC, and SwissProt tools. (B) Proportions of different types of lncRNAs. (C) Length distribution of identified lncRNAs and mRNAs in H. armiger. (D) Number and families of TFs predicted from the SMRT sequencing data.
Figure 10
Optimization of H. armiger gene structure. (A) Frequency distribution of UTR length variation. (B,C) Isoforms spanning two or more reference genes. (D) Example of an isoform in the intron class. (E) Example of an isoform in the exonic overlap class.

