BMC Genomics. 2017 Jan 19;18(1):101.
doi: 10.1186/s12864-016-3432-5.

Proteomics informed by transcriptomics for characterising active transposable elements and genome annotation in Aedes aegypti

Kevin Maringer et al. BMC Genomics.

Abstract

Background: Aedes aegypti is a vector for the (re-)emerging human pathogens dengue, chikungunya, yellow fever and Zika viruses. Almost half of the Ae. aegypti genome consists of transposable elements (TEs). Transposons have been linked to diverse cellular processes, including the establishment of viral persistence in insects, an essential step in the transmission of vector-borne viruses. However, until now it has not been possible to study the overall proteome derived from an organism's mobile genetic elements, partly because of the highly divergent nature of TEs. Furthermore, as for many non-model organisms, incomplete genome annotation has hampered proteomic studies on Ae. aegypti.

Results: We analysed the Ae. aegypti proteome using our new proteomics informed by transcriptomics (PIT) technique, which bypasses the need for genome annotation by identifying proteins through matched transcriptomic (rather than genomic) data. Our data vastly increase the number of experimentally confirmed Ae. aegypti proteins. The PIT analysis also identified hotspots of incomplete genome annotation, and showed that poor sequence and assembly quality do not explain all annotation gaps. Finally, in a proof-of-principle study, we developed criteria for the characterisation of proteomically active TEs. Protein expression did not correlate with a TE's genomic abundance at different levels of classification. Most notably, long terminal repeat (LTR) retrotransposons were markedly enriched compared to other elements. PIT was superior to 'conventional' proteomic approaches in both our transposon and genome annotation analyses.

Conclusions: We present the first proteomic characterisation of an organism's repertoire of mobile genetic elements, which will open new avenues of research into the function of transposon proteins in health and disease. Furthermore, our study provides proof of concept that PIT can be used to evaluate a genome's annotation and to guide annotation efforts, with the potential to improve the efficiency of annotation projects in non-model organisms. PIT therefore represents a valuable new tool for studying the biology of the important vector species Ae. aegypti, including its role in transmitting emerging viruses of global public health concern.

Keywords: Aedes aegypti; Genome annotation; Non-model organism; PIT; Proteomics informed by transcriptomics; Transposon.

Figures

Fig. 1
PIT identifies additional proteins in Ae. aegypti cells compared to ‘conventional’ proteomics. a Overview of the PIT pipeline. In ‘conventional’ proteomics (i), proteins detected by high-throughput LC-MS/MS in Ae. aegypti cell extracts are identified by comparison to mass spectra computationally predicted from protein or transcript annotations on the Ae. aegypti reference genome (annotated transcripts are translated in silico before mass spectra prediction). PIT (ii) identifies additional proteins by using RNA-seq to identify transcripts in RNA samples matched to the protein isolates. Transcripts are assembled de novo using Trinity, translated in silico, and used to predict mass spectra for peptide identification. Proteins are thus identified from a single experimental sample without the need for an annotated reference genome, and transcript abundance can be inferred from the RNA-seq data. b Total unique proteins (i) and proteins with at least two recorded peptides (ii) identified in Aag2 cells using the Ae. aegypti reference genome protein or transcript annotations, or using PIT. Percentages indicate the proportion of proteins identified only by PIT. c BLAST analysis of the PIT-identified proteome. Hits were mapped against the Ae. aegypti [taxid 7159], Culex quinquefasciatus [taxid 7176] (Culex) or Drosophila melanogaster [taxid 7227] (Drosophila) RefSeq databases. A subset of hits did not match annotated genes from these dipteran insects (non-insect). (i) Total PIT proteome; (ii) translated ORFs from Trinity transcripts matched with at least two peptides.
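The in silico translation step in panel a(ii) can be illustrated with a short sketch. It assumes the standard genetic code (NCBI translation table 1) and a deliberately simple rule: for each transcript, keep the longest stretch that starts at ATG and ends at a stop codon, searched over all six reading frames. The helper names (`translate`, `longest_orf`, `reverse_complement`) are hypothetical, and this is not the ORF-calling procedure used in the study; dedicated ORF predictors typically add further criteria such as a minimum ORF length.

```python
# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate whole codons; codons containing other characters become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def longest_orf(transcript: str, min_len: int = 1) -> str:
    """Longest Met-initiated, stop-terminated peptide over six frames.

    Hypothetical simplification: ORFs that run off the end of the
    transcript without a stop codon are ignored.
    """
    seq = transcript.upper().replace("U", "T")
    best = ""
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            protein = translate(strand[frame:])
            # Every segment before a '*' ended at a stop codon; the final
            # segment (after the last '*') has no stop and is skipped.
            for segment in protein.split("*")[:-1]:
                start = segment.find("M")
                if start != -1 and len(segment) - start > len(best):
                    best = segment[start:]
    return best if len(best) >= min_len else ""
```

In a PIT-style workflow, the peptides returned for each assembled transcript would be collected into a FASTA search database that takes the place of genome-derived protein annotations.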
Fig. 2
New annotation for the Ae. aegypti genome. a Vectorbase annotation of the PIT proteome that maps to Ae. aegypti (hypoth. denotes ‘hypothetical’ protein). b Previously published transcriptomic and proteomic evidence for the expression of PIT-identified proteins that are either annotated or listed as ‘hypothetical’ or ‘conserved hypothetical’ (hypothetical) in Vectorbase (source data specified in Additional file 6). c Result of Vectorbase BLAST alignment against the Ae. aegypti genome of all 215 PIT transcripts initially marked as homologous to proteins from Cx. quinquefasciatus or D. melanogaster, but not Ae. aegypti (‘non-Aedes’ insect hits). d Mean transcript length of non-Aedes insect PIT hits that do (mapped) or do not (no match) map to the Ae. aegypti genome. Error bars represent standard error of the mean; * P < 0.001. e Examples of non-Aedes insect PIT transcripts that map precisely to Ae. aegypti genes already annotated in Vectorbase (i), or that represent extensions or transcript variants of such genes (ii) (Trinity IDs 4627 and 1476, respectively). f Examples of non-Aedes insect PIT transcripts providing new genomic annotation as potentially novel ORFs (i to iii), or as extensions or transcript variants of known ORFs (ii and iii) (Trinity IDs 3521, 1935 and 1124, respectively). (e and f) Contigs are not shown in their entirety; illustrations are modified Vectorbase BLAST alignments of representative PIT transcripts (images stylistically edited for clarity); <> indicates annotated transcript orientation; filled boxes represent regions of alignment to the genome (exons).
Fig. 3
Interrogation of the Ae. aegypti genome annotation using PIT. a Sequencing gaps surrounding 145 previously non-annotated proteins identified by PIT (‘new annotation’ in Fig. 2c) compared to a matched sample of annotated Ae. aegypti genes (Additional file 7). b Number of supercontigs within the Ae. aegypti genome assembly, or within the subset containing new annotation from PIT, that have been mapped to chromosomal locations [48]. c Chromosomal locations of the supercontigs to which PIT hits align, across the three Ae. aegypti chromosomes (map modelled on [48]). The normalised ratio of PIT-containing supercontigs to total mapped supercontigs per chromosome is also given. Full mapping data are given in Additional file 3.
Fig. 4
Identification of proteins derived from mobile genetic elements using PIT. a Proportion of non-insect PIT hits (red) and PIT hits matching known Ae. aegypti genes (grey) that display >30% amino acid sequence similarity to known mosquito TEs (E-value < 10⁻⁵) at increasing thresholds for % sequence coverage. The arrow indicates the optimal threshold for TE identification, used for all remaining analyses unless otherwise specified. b As for (a), except that the proportion of PIT hits with >45% sequence coverage is plotted at different sequence similarity thresholds. c (i) Breakdown of the non-insect PIT proteome into virus-derived proteins, proteins with homology to known mosquito TEs (Additional file 4), and other non-classified proteins (other). Proteins with >95% amino acid homology and >95% sequence coverage were considered ‘exact matches’ to known TEs (Table 1). (ii) Identification of TEs in the Aedes PIT proteome. d As for (c), except that only hits associated with two or more peptides were included in the analysis. e (i) Identification of TEs in the non-insect PIT hits using TEfam, RepBase, or both databases as a reference. (ii) Identification of TEs in the Aedes PIT proteome using a combined TEfam and RepBase reference database.
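The TE-calling thresholds in this legend can be expressed as a small filter. The sketch assumes each PIT protein has already been aligned to reference TE proteins (for example with BLAST) and that each alignment is summarised in a hypothetical `TEHit` record. The cut-offs (E-value < 10⁻⁵, >30% similarity, >45% coverage, and >95%/>95% for exact matches) come from the Fig. 4 and Fig. 5 legends; defining coverage relative to the PIT protein and keeping only the lowest-E-value hit per protein are assumptions, not the study's documented procedure.

```python
from dataclasses import dataclass

# Thresholds as stated in the figure legends (Figs. 4 and 5).
MAX_EVALUE = 1e-5        # E-value < 10^-5
MIN_SIMILARITY = 30.0    # > 30% amino acid similarity (default threshold)
MIN_COVERAGE = 45.0      # > 45% sequence coverage (optimal threshold, panel a)
EXACT_THRESHOLD = 95.0   # > 95% similarity and > 95% coverage: 'exact match'


@dataclass
class TEHit:
    """Hypothetical record for one alignment of a PIT protein to a known TE."""
    protein_id: str
    te_name: str
    similarity_pct: float  # % amino acid similarity over the alignment
    coverage_pct: float    # % of the PIT protein covered (assumed definition)
    evalue: float


def classify(hit: TEHit, min_similarity: float = MIN_SIMILARITY) -> str:
    """Label a hit as 'exact match', 'TE' or 'not TE'."""
    if hit.evalue >= MAX_EVALUE:
        return "not TE"
    if hit.similarity_pct > EXACT_THRESHOLD and hit.coverage_pct > EXACT_THRESHOLD:
        return "exact match"
    if hit.similarity_pct > min_similarity and hit.coverage_pct > MIN_COVERAGE:
        return "TE"
    return "not TE"


def best_hits(hits: list[TEHit]) -> dict[str, TEHit]:
    """Keep the lowest-E-value hit per PIT protein (a simplifying assumption)."""
    best: dict[str, TEHit] = {}
    for hit in hits:
        current = best.get(hit.protein_id)
        if current is None or hit.evalue < current.evalue:
            best[hit.protein_id] = hit
    return best
```

Passing `min_similarity=40.0` or `50.0` reproduces the stricter identity thresholds compared in Fig. 5c, while the coverage cut-off stays fixed at 45%.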
Fig. 5
LTR retrotransposons are disproportionately active at the protein level compared to other TEs. a Schematic illustrating representative TEs previously identified in mosquitoes (not to scale). Filled boxes indicate protein-coding regions; open boxes indicate non-translated regions, and grey shading indicates conserved non-coding domains. Note that not all SINEs are tRNA-like. Bracketed ORFs are not always found in that TE subclass. Typical terminal nucleotides are indicated for non-LTR retrotransposons, SINEs and helitrons. Env-like, envelope-like protein (incomplete); gag, group antigen; LTR, long terminal repeat retrotransposon; MITE, miniature inverted repeat transposable element; non-LTR, non-LTR retrotransposon; PLE, Penelope-like element; pol, polymerase; SINE, short interspersed element; TIR, terminal inverted repeat. b Absolute abundance of proteins derived from known mosquito protein-coding TEs in the total PIT proteome (i) and among PIT hits associated with two or more peptides (ii). c Relative enrichment of proteins from detected TEs using >30% (i), >40% (ii) and >50% (iii) amino acid identity as the threshold for TE discovery (>45% sequence coverage throughout). Enrichment is shown relative to the proportion of the Ae. aegypti reference genome made up of sequences derived from each TE subclass (genome (%)) [8], relative to the total copy number of elements from each TE subclass in the Ae. aegypti reference genome (genome (copy #)) [8], and relative to the total number of entries for each TE subclass in the TEfam database used to identify TEs in our dataset. All enrichment values are expressed relative to that of LTR retrotransposons. d As for (c), except that only PIT hits identified through more than two peptides are shown.
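The enrichment in panels c and d compares each subclass's share of detected TE proteins with its share of a genomic reference measure. Writing n_i for the number of proteins detected from subclass i, N for their total, and g_i and G for the reference measure (genome %, copy number or TEfam entries) and its total, the enrichment is E_i = (n_i / N) / (g_i / G). Expressing it relative to LTR retrotransposons cancels both totals, leaving E_i / E_LTR = (n_i / g_i) / (n_LTR / g_LTR). The sketch assumes this definition, which the legend does not spell out; the function name `relative_enrichment` is hypothetical.

```python
def relative_enrichment(
    protein_counts: dict[str, int],
    reference: dict[str, float],
    baseline: str = "LTR",
) -> dict[str, float]:
    """Enrichment of TE-derived proteins per subclass, relative to `baseline`.

    `reference` is any genomic measure per subclass (e.g. % of genome,
    copy number, or number of TEfam entries). Subclasses with a zero or
    negative reference value are skipped.
    """
    # Proteins detected per unit of the reference measure, per subclass.
    # The totals N and G are omitted because they cancel after normalisation.
    ratios = {
        te_class: protein_counts.get(te_class, 0) / ref_value
        for te_class, ref_value in reference.items()
        if ref_value > 0
    }
    if ratios.get(baseline, 0) == 0:
        raise ValueError(f"baseline {baseline!r} is missing or has no detected proteins")
    base = ratios[baseline]
    return {te_class: ratio / base for te_class, ratio in ratios.items()}
```

By construction the baseline subclass scores 1.0; a subclass with half as many detected proteins per unit of reference as LTR retrotransposons scores 0.5.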
Fig. 6
Protein expression from mobile genetic elements does not correlate with their genomic abundance. a–c Breakdown of elements from different TE clades/superfamilies detected in the total PIT proteome: (i) absolute number (clockwise in decreasing order of abundance); (ii) relative enrichment compared to copy number in the Ae. aegypti genome [8]; (iii) relative enrichment compared to Ae. aegypti genome coverage (%) [8]. Yellow, overrepresented TEs; blue, underrepresented TEs. Genome copy number was not known for Mutator (c).
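Panels (ii) and (iii) colour each clade as over- or underrepresented. One plausible reading, assumed here rather than taken from the legend, is that a clade is overrepresented when its share of detected TE proteins exceeds its share of the genomic reference (copy number or coverage), i.e. when the observed/expected ratio is above 1. The sketch also orders clades by decreasing protein count, as in panel (i), and reports clades without a reference value (as for Mutator copy number) as unknown. The function `representation` is hypothetical.

```python
from typing import Optional


def representation(
    protein_counts: dict[str, int],
    reference: dict[str, Optional[float]],
) -> list[tuple[str, int, Optional[float], str]]:
    """Rank clades by detected proteins and flag over/underrepresentation.

    Returns (clade, count, observed/expected ratio, label) tuples ordered by
    decreasing protein count. Clades without a positive reference value are
    reported as 'unknown' and excluded from both totals.
    """
    known = {c: v for c, v in reference.items() if v is not None and v > 0}
    total_ref = sum(known.values())
    total_proteins = sum(n for c, n in protein_counts.items() if c in known)
    if total_ref == 0 or total_proteins == 0:
        raise ValueError("no clades with both detected proteins and reference values")

    rows = []
    # Decreasing abundance; ties broken alphabetically for a stable order.
    for clade, count in sorted(protein_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if clade not in known:
            rows.append((clade, count, None, "unknown"))
            continue
        ratio = (count / total_proteins) / (known[clade] / total_ref)
        label = "over" if ratio > 1 else ("under" if ratio < 1 else "as expected")
        rows.append((clade, count, ratio, label))
    return rows
```

Excluding clades with unknown reference values from both totals keeps the observed and expected shares computed over the same set of clades.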
Fig. 7
Protein expression across ORFs for TEs that encode multiple ORFs. a LTR retrotransposons; b non-LTR retrotransposons.

References

1. Armengaud J, Trapp J, Pible O, Geffard O, Chaumot A, Hartmann EM. Non-model organisms, a species endangered by proteogenomics. J Proteomics. 2014;105:5–18. doi: 10.1016/j.jprot.2014.01.007.
2. Evans VC, Barker G, Heesom KJ, Fan J, Bessant C, Matthews DA. De novo derivation of proteomes from transcriptomes for transcript and protein identification. Nat Methods. 2012;9:1207–11. doi: 10.1038/nmeth.2227.
3. Wynne JW, Shiell BJ, Marsh GA, Boyd V, Harper JA, Heesom K, et al. Proteomics informed by transcriptomics reveals Hendra virus sensitizes bat cells to TRAIL-mediated apoptosis. Genome Biol. 2014;15:532.
4. Villar M, Popara M, Ayllón N, de Mera IG F, Mateos-Hernández L, Galindo RC, et al. A systems biology approach to the characterization of stress response in Dermacentor reticulatus tick unfed larvae. PLoS One. 2014;9:e89564. doi: 10.1371/journal.pone.0089564.
5. Mudenda L, Pierlé SA, Turse JE, Scoles GA, Purvine SO, Nicora CD, et al. Proteomics informed by transcriptomics identifies novel secreted proteins in Dermacentor andersoni saliva. Int J Parasitol. 2014;44:1029–37. doi: 10.1016/j.ijpara.2014.07.003.
