PLoS One. 2010 Dec 2;5(12):e14202.
doi: 10.1371/journal.pone.0014202.

De novo transcriptome sequencing in Anopheles funestus using Illumina RNA-seq technology


Jacob E Crawford et al. PLoS One. 2010.

Abstract

Background: Anopheles funestus is one of the primary vectors of human malaria, which causes a million deaths each year in sub-Saharan Africa. Few scientific resources are available to facilitate studies of this mosquito species and relatively little is known about its basic biology and evolution, making development and implementation of novel disease control efforts more difficult. The An. funestus genome has not been sequenced, so in order to facilitate genome-scale experimental biology, we have sequenced the adult female transcriptome of An. funestus from a newly founded colony in Burkina Faso, West Africa, using the Illumina GAIIx next generation sequencing platform.

Methodology/principal findings: We assembled short Illumina reads de novo using a novel approach involving iterative de novo assemblies and "target-based" contig clustering. We then selected a conservative set of 15,527 contigs through comparisons to four Dipteran transcriptomes as well as multiple functional and conserved protein domain databases. Comparison to the Anopheles gambiae immune system identified 339 contigs as putative immune genes, covering a large portion of the immune system that can form the basis for subsequent studies of this important malaria vector. We identified 5,434 1:1 orthologues between An. funestus and An. gambiae and found that, among these 1:1 orthologues, the protein sequences of those with putative immune function were significantly more divergent than those of the transcriptome as a whole. Short read alignments to the contig set revealed almost 367,000 genetic polymorphisms segregating in the An. funestus colony and demonstrated the utility of the assembled transcriptome for RNA-seq based measurements of gene expression.
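The abstract does not describe the assembler's internals. The workflow in Figure 1 is built on Velvet, which assembles short reads through de Bruijn graphs (Zerbino and Birney, in the references). The toy sketch below shows only that core idea. It assumes error-free reads, a single fixed k-mer size, and no reverse-complement handling. It is not Velvet's implementation, and it leaves out the iterative and target-based clustering steps described above.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Return a graph mapping each (k-1)-mer to the (k-1)-mers that follow it."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer becomes an edge from its prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def unitigs(graph):
    """Spell out maximal non-branching paths as contigs (isolated cycles are skipped)."""
    indegree = defaultdict(int)
    for targets in graph.values():
        for node in targets:
            indegree[node] += 1
    nodes = set(graph) | set(indegree)

    def one_in_one_out(node):
        return indegree[node] == 1 and len(graph.get(node, ())) == 1

    contigs = []
    for start in nodes:
        if one_in_one_out(start):
            continue  # interior node: reached when walking from a branch point or tip
        for nxt in graph.get(start, ()):
            path = start + nxt[-1]
            # Extend while the path can neither branch nor merge.
            while one_in_one_out(nxt):
                nxt = next(iter(graph[nxt]))
                path += nxt[-1]
            contigs.append(path)
    return sorted(contigs)


reads = ["ATGGCGT", "GGCGTGCA"]  # hypothetical overlapping reads
print(unitigs(build_de_bruijn(reads, k=4)))  # ['ATGGCGTGCA']
```

Two reads that overlap by five bases collapse into a single contig. Reads that disagree at one position, such as `["ACGT", "ACGA"]` with `k=3`, create a branch, and the contig is split at the branch point. This is one reason short-read transcriptome assemblies tend to be fragmented before any clustering step.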

Conclusions/significance: We developed a pipeline that makes de novo transcriptome sequencing possible in virtually any organism at a very reasonable cost ($6,300 in sequencing costs in our case). We anticipate that our approach could be used to develop genomic resources in a diversity of systems for which full genome sequence is currently unavailable. Our An. funestus contig set and analytical results provide a valuable resource for future studies in this non-model, but epidemiologically critical, vector insect.


Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. De novo transcriptome assembly and analysis workflow.
Illumina reads were assembled in a series of ‘exploratory’ Velvet assemblies, the contig output of which was used in a ‘summary’ assembly. Following iterative assembly with Velvet, contigs were clustered and joined when possible, first using conspecific ESTs, then using the transcriptome of a closely related species. A final contig set was generated by selecting contigs based on bioinformatic support criteria. Illumina reads were then mapped to the final contig set, and the resulting alignments were used for expression profiling and polymorphism discovery. ᵃSND refers to short nucleotide discrepancies, including both single nucleotide polymorphisms and indels. ᵇRPKM, or reads per kilobase per million mapped reads, was calculated for each contig and used to represent expression level.
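The caption defines RPKM only in words. The sketch below uses the usual definition: reads on a contig, divided by contig length in kilobases, divided by total mapped reads in millions. It assumes each read is counted once on a single contig. In `rpkm_table`, "total mapped reads" is approximated as the sum of per-contig counts. The caption does not say how multi-mapping reads were handled, and the contig names below are hypothetical.

```python
def rpkm(reads_on_contig: int, contig_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of contig per million mapped reads.

    RPKM = reads_on_contig / (contig_length_bp / 1e3) / (total_mapped_reads / 1e6)
         = reads_on_contig * 1e9 / (contig_length_bp * total_mapped_reads)
    """
    if contig_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    return reads_on_contig * 1e9 / (contig_length_bp * total_mapped_reads)


def rpkm_table(counts: dict, lengths: dict) -> dict:
    """RPKM for every contig, assuming the counts cover all mapped reads exactly once."""
    total = sum(counts.values())
    return {cid: rpkm(counts[cid], lengths[cid], total) for cid in counts}


# 500 reads on a 2,000 bp contig out of 10 million mapped reads
print(rpkm(500, 2000, 10_000_000))  # 25.0
```

The length term lets long and short contigs be compared. The library-size term lets samples sequenced to different depths be compared.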
Figure 2. Size distribution of contigs at three points of the assembly.
Note that the y-axis is broken between 10,000 and 40,000. White bars indicate the size distribution of contigs generated by the iterative Velvet assembly. Grey bars indicate the size distribution of contigs after ‘target-based’ clustering to both An. funestus ESTs and An. gambiae peptides. Black bars indicate the size distribution of the final contig set after quality filtering and bioinformatic analysis. The final contig set contains 15,527 contigs with an N50 of 1,753 bp.
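The caption reports an N50 of 1,753 bp for the final contig set. N50 is the contig length L such that contigs of length L or longer together contain at least half of all assembled bases. A minimal sketch of the computation follows; the lengths in the example are hypothetical.

```python
def n50(lengths):
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of all assembled bases."""
    if not lengths:
        raise ValueError("no contigs")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids halving the total
            return length


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

Unlike the mean, N50 is weighted toward the contigs that hold most of the sequence. A long tail of very short fragments therefore lowers it only modestly.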
Figure 3. Homology with Dipteran transcriptomes decreases with increasing phylogenetic distance.
The number of An. funestus contigs with significant BLAST hits in pairwise comparisons to An. gambiae, Ae. aegypti, C. quinquefasciatus and D. melanogaster is plotted. Note that the y-axis only spans 9,000 to 15,000. The solid line indicates the total number of contigs with a significant BLAST hit in each comparison. The dashed line indicates the number of contigs with a significant BLAST hit in all comparisons as phylogenetic distance increases. The phylogenetic tree at the bottom of the panel depicts the evolutionary relationships between the Dipteran insects used in pairwise BLAST comparisons, with estimated divergence times (in millions of years) at each node (adapted from [53]).
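Both lines in Figure 3 can be derived from per-species sets of contig IDs with significant BLAST hits. The solid line is the size of each set. The dashed line is the size of the running intersection as more distant species are added. The sketch below assumes the hit sets have already been extracted from BLAST output. The contig IDs are hypothetical, and the species order follows the caption.

```python
def hit_curves(hits_by_species, order):
    """Return (species, contigs with a hit, contigs with hits in every species so far)."""
    rows = []
    shared = None
    for species in order:
        hits = set(hits_by_species[species])
        # Intersect with all closer species to get the dashed-line count.
        shared = hits if shared is None else shared & hits
        rows.append((species, len(hits), len(shared)))
    return rows


hits = {
    "An. gambiae": {"c1", "c2", "c3", "c4"},
    "Ae. aegypti": {"c1", "c2", "c3"},
    "C. quinquefasciatus": {"c1", "c2", "c4"},
    "D. melanogaster": {"c1"},
}
order = ["An. gambiae", "Ae. aegypti", "C. quinquefasciatus", "D. melanogaster"]
for row in hit_curves(hits, order):
    print(row)
```

The running intersection can only shrink or stay flat. The per-species count can rise again, as the C. quinquefasciatus row does here, which is why the two lines in the figure need not run parallel.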
Figure 4. Variation in transcript divergence among immune gene functional classes.
Protein sequence divergence was estimated as the proportion of aligned amino acids that differ between 1∶1 An. funestus:An. gambiae orthologues. As a class, immune gene orthologous pairs (dotted line indicates mean divergence between immune gene orthologues) are significantly more divergent than the transcriptome as a whole (solid line indicates mean divergence across the entire transcriptome; p-value = 4.8×10⁻⁵, Mann-Whitney U-test). The functional classes within the immune genes do not differ significantly from each other (p-values > 0.05, pairwise Mann-Whitney U-tests).
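The caption defines protein divergence as the proportion of aligned amino acids that differ between orthologues, and compares groups with a Mann-Whitney U-test. The sketch below implements both steps without external dependencies, under two assumptions. First, gap columns are excluded from the count; the caption does not say how gaps were treated. Second, the test uses the normal approximation with a tie correction. Library implementations such as `scipy.stats.mannwhitneyu` may use exact distributions or a continuity correction, so their p-values can differ slightly. All sequences and divergence values below are hypothetical.

```python
import math


def p_distance(aln_a: str, aln_b: str) -> float:
    """Proportion of aligned residue pairs that differ, skipping gap ('-') columns."""
    if len(aln_a) != len(aln_b):
        raise ValueError("sequences must come from the same alignment")
    pairs = [(a, b) for a, b in zip(aln_a, aln_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no aligned residue pairs")
    return sum(a != b for a, b in pairs) / len(pairs)


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the tie-corrected normal approximation.

    Returns (U for sample x, approximate two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    pooled = sorted(list(x) + list(y))
    n = n1 + n2
    # Assign average (mid) ranks to tied values and accumulate the tie term.
    rank_of, tie_term, i = {}, 0, 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + j + 2) / 2  # mean of 1-based ranks i+1..j+1
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    u1 = sum(rank_of[v] for v in x) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        raise ValueError("all values are tied; the test is undefined")
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p


print(p_distance("MKV-L", "MRVAL"))  # 0.25

immune = [0.21, 0.25, 0.30, 0.18]
background = [0.10, 0.12, 0.09, 0.15, 0.11]
u, p = mann_whitney_u(immune, background)
```

Here every hypothetical immune value exceeds every background value. U therefore reaches its maximum, n1·n2 = 20, and the approximate two-sided p-value is about 0.014.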
Figure 5. Protein divergence is unevenly distributed among GO-Slim categories.
The heatplot shows the proportion of 1∶1 orthologous pairs exhibiting Low, Intermediate and High protein divergence in each GO-Slim functional category. Protein divergence was estimated as the proportion of aligned amino acids that differed between the two orthologues, and each orthologous pair was categorized as Low, Intermediate or High (Materials and Methods). Only categories in which the proportions in the bins differed from the expectations based on all orthologous pairs, with a p-value below the Bonferroni-adjusted α of 5.26×10⁻⁴, are presented. The average expected proportions based on all orthologous pairs are presented at the top of the heatplot.
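The caption gives the Bonferroni-adjusted α of 5.26×10⁻⁴ but not the number of tests behind it. A Bonferroni threshold is α/m, so this value is consistent with, for example, α = 0.05 over 95 tests or α = 0.01 over 19 tests. Which combination was used is not stated here. The sketch below covers the correction, that back-calculation, and the filtering step. The category names and p-values are hypothetical.

```python
def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def implied_n_tests(alpha: float, adjusted_alpha: float) -> int:
    """Number of tests that would produce a given Bonferroni threshold."""
    return round(alpha / adjusted_alpha)


def significant(pvalues: dict, threshold: float) -> list:
    """Category names whose p-value falls below the adjusted threshold."""
    return sorted(name for name, p in pvalues.items() if p < threshold)


print(implied_n_tests(0.05, 5.26e-4))  # 95
print(implied_n_tests(0.01, 5.26e-4))  # 19

pvals = {"category_A": 1e-5, "category_B": 0.003}  # hypothetical categories
print(significant(pvals, bonferroni_alpha(0.05, 95)))  # ['category_A']
```

Bonferroni is conservative. With many correlated GO-Slim categories, a category can show a real shift and still miss the cut.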

References

    1. WHO/UNICEF World Malaria Report. Geneva: World Health Organization; 2009.
    1. Anopheles Genomes Cluster Committee. Genome analysis of vectorial capacity in major Anopheles vectors of malaria parasites. 2008. VectorBase.org.
    1. Hudson ME. Sequencing breakthroughs for genomic ecology and evolutionary biology. Mol Ecol Resour. 2008;8:3–17.
    1. Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009;10:57–63.
    1. Zerbino DR, Birney E. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 2008;18:821–829.
