BMC Genomics. 2017 Jan 19;18(1):101.

doi: 10.1186/s12864-016-3432-5.

Proteomics informed by transcriptomics for characterising active transposable elements and genome annotation in Aedes aegypti

Kevin Maringer¹,²,³, Amjad Yousuf⁴,⁵, Kate J Heesom⁶, Jun Fan⁷, David Lee⁴, Ana Fernandez-Sesma⁸, Conrad Bessant⁷, David A Matthews⁴, Andrew D Davidson⁹

Affiliations

¹ School of Cellular and Molecular Medicine, University of Bristol, Bristol, BS8 1TD, UK. K.Maringer@surrey.ac.uk.
² Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, 10029, NY, USA. K.Maringer@surrey.ac.uk.
³ Present address: Department of Microbial Sciences, University of Surrey, Guildford, GU2 7XH, UK. K.Maringer@surrey.ac.uk.
⁴ School of Cellular and Molecular Medicine, University of Bristol, Bristol, BS8 1TD, UK.
⁵ College of Applied Medical Sciences, Taibah University, Medina, Kingdom of Saudi Arabia.
⁶ School of Biochemistry, University of Bristol, Bristol, BS8 1TD, UK.
⁷ School of Biological and Chemical Sciences, Queen Mary University of London, London, E1 4NS, UK.
⁸ Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, 10029, NY, USA.
⁹ School of Cellular and Molecular Medicine, University of Bristol, Bristol, BS8 1TD, UK. Andrew.Davidson@bristol.ac.uk.

PMID: 28103802
PMCID: PMC5248466
DOI: 10.1186/s12864-016-3432-5

Kevin Maringer et al. BMC Genomics. 2017.

Abstract

Background: Aedes aegypti is a vector for the (re-)emerging human pathogens dengue, chikungunya, yellow fever and Zika viruses. Almost half of the Ae. aegypti genome consists of transposable elements (TEs). Transposons have been linked to diverse cellular processes, including the establishment of viral persistence in insects, an essential step in the transmission of vector-borne viruses. However, until now it has not been possible to study the overall proteome derived from an organism's mobile genetic elements, partly because TEs are highly divergent. Furthermore, as for many non-model organisms, incomplete genome annotation has hampered proteomic studies on Ae. aegypti.

Results: We analysed the Ae. aegypti proteome using our new proteomics informed by transcriptomics (PIT) technique, which bypasses the need for genome annotation by identifying proteins through matched transcriptomic (rather than genomic) data. Our data vastly increase the number of experimentally confirmed Ae. aegypti proteins. The PIT analysis also identified hotspots of incomplete genome annotation, and showed that poor sequence and assembly quality do not explain all annotation gaps. Finally, in a proof-of-principle study, we developed criteria for the characterisation of proteomically active TEs. Protein expression did not correlate with a TE's genomic abundance at different levels of classification. Most notably, long terminal repeat (LTR) retrotransposons were markedly enriched compared to other elements. PIT was superior to 'conventional' proteomic approaches in both our transposon and genome annotation analyses.

Conclusions: We present the first proteomic characterisation of an organism's repertoire of mobile genetic elements, which will open new avenues of research into the function of transposon proteins in health and disease. Furthermore, our study provides a proof-of-concept that PIT can be used to evaluate a genome's annotation and to guide annotation efforts, with the potential to improve the efficiency of annotation projects in non-model organisms. PIT therefore represents a valuable new tool to study the biology of the important vector species Ae. aegypti, including its role in transmitting emerging viruses of global public health concern.

Keywords: Aedes aegypti; Genome annotation; Non-model organism; PIT; Proteomics informed by transcriptomics; Transposon.

Figures

**Fig. 1**
PIT identifies additional proteins in *Ae. aegypti* cells compared to ‘conventional’ proteomics. a Overview of the PIT pipeline. In ‘conventional’ proteomics (i), proteins detected by high-throughput LC-MS/MS from *Ae. aegypti* cell extracts are identified by comparison to mass spectra computationally predicted from protein or transcript annotations on the *Ae. aegypti* reference genome. (Annotated transcripts are translated *in silico* prior to mass spectra prediction.) PIT identifies additional proteins by using RNA-seq to identify transcripts in RNA samples matched to protein isolates (ii). Transcripts are assembled *de novo* using Trinity software, translated *in silico*, and used for mass spectra prediction for peptide identification. From a single experimental sample, proteins are identified without the need for an annotated reference genome, and transcript abundance can be inferred from RNA-seq data. b Total unique proteins (i) and proteins with at least two recorded peptides (ii) identified in Aag2 cells based on the *Ae. aegypti* reference genome protein or transcript annotations, or using PIT. Percentages indicate the proportion of proteins identified only by PIT. c BLAST analysis of the PIT-identified proteome. Hits were mapped against the *Ae. aegypti* [taxid 7159], *Culex quinquefasciatus* [taxid 7176] (*Culex*) or *Drosophila melanogaster* [taxid 7227] (*Drosophila*) RefSeq databases. A subset of hits did not match annotated genes from these dipteran insects (non-insect). (i) Total PIT proteome, (ii) Translated ORFs from Trinity transcripts matched with at least two peptides
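The caption says PIT translates *de novo* assembled Trinity transcripts *in silico* before predicting mass spectra. The sketch below illustrates only that translation step, and it rests on several assumptions that do not come from the paper: a six-frame translation with the standard genetic code, ORFs that start at Met, and a hypothetical `MIN_ORF_LEN` cut-off. It is not the ORF caller used in the PIT pipeline, and the downstream steps (peptide prediction and spectrum matching) are not shown.

```python
# Sketch of six-frame in silico translation of a de novo assembled transcript.
# Assumptions (not from the paper): standard genetic code, Met-initiated ORFs,
# and MIN_ORF_LEN as a hypothetical length cut-off.

BASES = "TCAG"
# NCBI translation table 1 (standard code); codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
MIN_ORF_LEN = 100  # hypothetical minimum ORF length (amino acids)


def reverse_complement(seq: str) -> str:
    """Reverse complement; any non-ACGT base becomes N."""
    return "".join(COMPLEMENT.get(base, "N") for base in reversed(seq))


def translate(seq: str) -> str:
    """Translate codon by codon; stops become '*', ambiguous codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def candidate_orfs(transcript: str, min_len: int = MIN_ORF_LEN) -> list[str]:
    """Return Met-initiated ORFs of at least min_len residues from all six frames,
    longest first. A final segment without a stop codon is kept, which tolerates
    transcripts that were only partially assembled."""
    seq = transcript.upper().replace("U", "T")
    orfs = []
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            for segment in translate(strand[frame:]).split("*"):
                start = segment.find("M")
                if start != -1 and len(segment) - start >= min_len:
                    orfs.append(segment[start:])
    return sorted(orfs, key=len, reverse=True)
```

For example, `candidate_orfs("ATGAAATTTGGGTAA", min_len=3)` returns `["MKFG"]`: only the forward frame 0 contains a Met before the stop codon. Keeping open-ended segments is a design choice for fragmentary *de novo* assemblies; a stricter caller could require a stop codon.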

**Fig. 2**
New annotation for the *Ae. aegypti* genome. a Vectorbase annotation of the PIT proteome that maps to *Ae. aegypti* (hypoth. denotes ‘hypothetical’ protein). b Previously published transcriptomic and proteomic evidence for the expression of our PIT-identified proteins that are either annotated or listed as ‘hypothetical’ or ‘conserved hypothetical’ (hypothetical) in Vectorbase (source data specified in Additional file 6). c Result of Vectorbase BLAST alignment against the *Ae. aegypti* genome of all 215 PIT transcripts initially marked as having homology to proteins from *Cx. quinquefasciatus* or *D. melanogaster*, but not *Ae. aegypti* (‘non-*Aedes*’ insect hits). d Mean transcript length of non-*Aedes* insect PIT hits that do (mapped) or do not (no match) map to the *Ae. aegypti* genome. Error bars represent standard error of the mean; * P < 0.001. e Examples of non-*Aedes* insect PIT transcripts mapping precisely (i), or as extensions or transcript variants (ii) of *Ae. aegypti* genes already annotated in Vectorbase (Trinity IDs 4627 and 1476 respectively). f Examples of non-*Aedes* insect PIT transcripts providing new genomic annotation as potentially novel ORFs (i to iii), or extensions or transcript variants of known ORFs (ii and iii) (Trinity IDs 3521, 1935 and 1124 respectively). (e and f) Contigs not shown in entirety; illustrations are modified Vectorbase BLAST alignments of representative PIT transcripts (images stylistically edited for clarity); <> indicates annotated transcript orientation; *filled boxes* represent regions of alignment to genome (exons)
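Panel d reports mean transcript length with error bars showing the standard error of the mean (SEM). As a minimal sketch of that summary statistic, the snippet below computes the mean and SEM (sample standard deviation divided by √n); the lengths are hypothetical, and the significance test behind P < 0.001 is not named in the caption, so none is implemented here.

```python
import math
import statistics


def mean_and_sem(lengths: list[float]) -> tuple[float, float]:
    """Return the mean and the standard error of the mean (sample SD / sqrt(n))."""
    if len(lengths) < 2:
        raise ValueError("at least two values are needed to estimate the SEM")
    mean = statistics.fmean(lengths)
    sem = statistics.stdev(lengths) / math.sqrt(len(lengths))
    return mean, sem


# Hypothetical transcript lengths (nt); not data from the study.
mapped = [1200, 950, 1800, 1430, 1010]
print(mean_and_sem(mapped))
```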

**Fig. 3**
Interrogation of the *Ae. aegypti* genome annotation using PIT. a Sequencing gaps surrounding 145 previously non-annotated proteins identified by PIT (‘new annotation’ in Fig. 2c) compared to a matched sample of annotated *Ae. aegypti* genes (Additional file 7). b Number of supercontigs within the *Ae. aegypti* genome assembly, or the subset containing new annotation from PIT, that have been mapped to chromosomal locations [48]. c Supercontigs to which our PIT hits align mapped to the three *Ae. aegypti* chromosomes (map modelled on [48]). The normalised ratio of PIT-containing supercontigs to total mapped supercontigs per chromosome is also specified. Full mapping data given in Additional file 3

**Fig. 4**
Identification of proteins derived from mobile genetic elements using PIT. a Proportion of non-insect PIT hits (*red*) and PIT hits matching known *Ae. aegypti* genes (*grey*) that display >30% amino acid sequence similarity to known mosquito TEs (E-value <10⁻⁵) at increasing thresholds for % sequence coverage. *Arrow* indicates optimal threshold for TE identification, used for all remaining analyses unless otherwise specified. b As for (a), except that the proportion of PIT hits with >45% sequence coverage is plotted at different sequence similarity thresholds. c (i) Breakdown of the non-insect PIT proteome into virus-derived proteins, proteins with homology to known mosquito TEs (Additional file 4), and other non-classified proteins (other). Proteins with >95% amino acid homology and >95% sequence coverage were considered ‘exact matches’ to known TEs (Table 1). (ii) Identification of TEs in the *Aedes* PIT proteome. d As for (c), except only hits associated with two or more peptides were included in the analysis. e (i) Identification of TEs in the non-insect PIT hits using either TEfam, RepBase, or both databases as a reference. (ii) Identification of TEs in the *Aedes* PIT proteome using a combined TEfam and RepBase reference database
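The caption combines three cut-offs to call a PIT protein TE-derived: >30% amino acid similarity, >45% sequence coverage (the chosen threshold in panel b), and an E-value cut-off that we read as 10⁻⁵, plus a stricter >95%/>95% rule for ‘exact matches’. The sketch below encodes those rules under stated assumptions: that the ‘homology’ used for exact matches is the same similarity score, that the E-value filter also applies to exact matches, and that `TEHit`, `classify_te_hit` and `te_fraction_by_coverage` are hypothetical names rather than anything from the authors' code.

```python
from dataclasses import dataclass


@dataclass
class TEHit:
    """Best alignment of one PIT protein against a mosquito TE reference (hypothetical record)."""
    similarity: float  # % amino acid similarity to the TE (assumed same metric for exact matches)
    coverage: float    # % of the PIT protein covered by the alignment
    evalue: float


def classify_te_hit(hit: TEHit, min_similarity: float = 30.0,
                    min_coverage: float = 45.0, max_evalue: float = 1e-5) -> str:
    """Label a hit 'exact match', 'TE' or 'not TE' using the caption's cut-offs."""
    if hit.evalue >= max_evalue:
        return "not TE"
    if hit.similarity > 95.0 and hit.coverage > 95.0:
        return "exact match"
    if hit.similarity > min_similarity and hit.coverage > min_coverage:
        return "TE"
    return "not TE"


def te_fraction_by_coverage(hits: list[TEHit], thresholds: list[float]) -> dict[float, float]:
    """Fraction of hits called TE-derived at each coverage threshold (cf. panel a)."""
    return {
        t: sum(classify_te_hit(h, min_coverage=t) != "not TE" for h in hits) / len(hits)
        for t in thresholds
    }
```

Sweeping `thresholds` reproduces the shape of panel a in principle: as the coverage cut-off rises, fewer hits pass, and the curve can be compared between non-insect hits and hits matching known *Ae. aegypti* genes.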

**Fig. 5**
LTR retrotransposons are disproportionately active at the protein level compared to other TEs. a Schematic illustrating representative TEs previously identified in mosquitoes (not to scale). *Filled boxes* indicate protein-coding regions; *open boxes* and *grey shading* indicate non-translated regions and conserved non-coding domains, respectively. Note that not all SINEs are tRNA-like. Bracketed ORFs are not always found in that TE subclass. Typical terminal amino acids are indicated for non-LTR retrotransposons, SINEs and helitrons. Env-like, envelope-like protein (incomplete); gag, group antigen; LTR, long-terminal repeat retrotransposons; MITE, miniature inverted repeat transposable element; non-LTR, non-LTR retrotransposon; PLE, Penelope-like element; pol, polymerase; SINE, short interspersed element; TIR, terminal inverted repeat. b Absolute abundance of proteins derived from known mosquito protein-coding TEs in the total PIT proteome (i) and for all PIT hits associated with two or more peptides (ii). c Relative enrichment of proteins from detected TEs using >30% (i), >40% (ii) and >50% (iii) amino acid identity as a threshold for TE discovery (>45% sequence coverage throughout). Enrichment is shown relative to the proportion of the *Ae. aegypti* reference genome that is comprised of sequences derived from each respective TE subclass (genome (%)) [8], relative to the total copy number of elements from each TE subclass within the *Ae. aegypti* reference genome (genome (copy #)) [8], and relative to the total number of entries for each TE subclass in the TEfam database used to identify TEs in our dataset. Enrichment is shown relative to LTR retrotransposons. d As for (c), except that only PIT hits identified through more than two peptides are shown
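Panel c expresses enrichment as each TE subclass's share of detected TE proteins against a genomic baseline (genome %, copy number or TEfam entries), rescaled so that LTR retrotransposons equal 1. The sketch below is one plausible reading of that calculation, not the authors' published formula; the subclass labels and all numbers are hypothetical.

```python
def relative_enrichment(protein_counts: dict[str, int],
                        genome_baseline: dict[str, float],
                        reference: str = "LTR") -> dict[str, float]:
    """Share of TE-derived PIT proteins per subclass divided by that subclass's
    genomic baseline, then rescaled so the reference subclass equals 1.
    The reference subclass must have at least one detected protein."""
    total = sum(protein_counts.values())
    raw = {
        subclass: (count / total) / genome_baseline[subclass]
        for subclass, count in protein_counts.items()
    }
    ref_value = raw[reference]
    return {subclass: value / ref_value for subclass, value in raw.items()}


# Hypothetical numbers for illustration only; not the study's data.
counts = {"LTR": 20, "non-LTR": 10, "DNA": 5}
genome_pct = {"LTR": 10.0, "non-LTR": 15.0, "DNA": 5.0}
print(relative_enrichment(counts, genome_pct))
```

Because the result is normalised to the reference subclass, the units of the baseline cancel out: the same function accepts genome percentages, copy numbers or database entry counts, which matches the three baselines listed in the caption.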

**Fig. 6**
Protein expression from mobile genetic elements does not correlate with their genomic abundance. **a-c** Breakdown of elements from different TE clades/superfamilies detected in the total PIT proteome. (i) absolute number (clockwise in decreasing order of abundance), (ii) relative enrichment compared to copy number in the *Ae. aegypti* genome [8], (iii) relative enrichment compared to *Ae. aegypti* genome coverage (%) [8]. *Yellow*, overrepresented TEs; *blue*, underrepresented TEs. Genome copy number was not known for Mutator (c)

**Fig. 7**
Protein expression across ORFs for TEs that encode multiple ORFs. a LTR retrotransposons and b non-LTR retrotransposons

Comment in

Proteomics technique opens new frontiers in mobilome research.
Davidson AD, Matthews DA, Maringer K. Mob Genet Elements. 2017 Aug 1;7(4):1-9. doi: 10.1080/2159256X.2017.1362494. PMID: 28932623.

References

1. Armengaud J, Trapp J, Pible O, Geffard O, Chaumot A, Hartmann EM. Non-model organisms, a species endangered by proteogenomics. J Proteomics. 2014;105:5–18. doi: 10.1016/j.jprot.2014.01.007.
2. Evans VC, Barker G, Heesom KJ, Fan J, Bessant C, Matthews DA. De novo derivation of proteomes from transcriptomes for transcript and protein identification. Nat Methods. 2012;9:1207–1211. doi: 10.1038/nmeth.2227.
3. Wynne JW, Shiell BJ, Marsh GA, Boyd V, Harper JA, Heesom K, et al. Proteomics informed by transcriptomics reveals Hendra virus sensitizes bat cells to TRAIL-mediated apoptosis. Genome Biol. 2014;15:532.
4. Villar M, Popara M, Ayllón N, de Mera IG F, Mateos-Hernández L, Galindo RC, et al. A systems biology approach to the characterization of stress response in Dermacentor reticulatus tick unfed larvae. PLoS One. 2014;9:e89564. doi: 10.1371/journal.pone.0089564.
5. Mudenda L, Pierlé SA, Turse JE, Scoles GA, Purvine SO, Nicora CD, et al. Proteomics informed by transcriptomics identifies novel secreted proteins in Dermacentor andersoni saliva. Int J Parasitol. 2014;44:1029–37. doi: 10.1016/j.ijpara.2014.07.003.
