PLoS One. 2017 Mar 23;12(3):e0173300.
doi: 10.1371/journal.pone.0173300. eCollection 2017.

Improving transcriptome de novo assembly by using a reference genome of a related species: Translational genomics from oil palm to coconut


Alix Armero et al. PLoS One.

Abstract

The palms are a family of tropical origin and one of the main constituents of the ecosystems of these regions around the world. The two main species of palm represent different challenges: coconut (Cocos nucifera L.) is a source of multiple goods and services in tropical communities, while oil palm (Elaeis guineensis Jacq.) is the main protagonist of the oil market. In this study, we present a workflow that exploits the comparative genomics between a target species (coconut) and a reference species (oil palm) to improve the transcriptomic data, providing a proteome useful to answer functional or evolutionary questions. This workflow reduces redundancy and fragmentation, two inherent problems of transcriptomic data, while preserving the functional representation of the target species. Our approach was validated in Arabidopsis thaliana using Arabidopsis lyrata and Capsella rubella as reference species. This analysis showed the high sensitivity and specificity of our strategy, which were relatively independent of the reference proteome. The workflow increased the length of protein products in A. thaliana by 13%, often allowing recovery of 100% of the protein sequence length. In addition, redundancy was reduced by a factor greater than 3. In coconut, the approach generated 29,366 proteins, 1,246 of which derive from new contigs obtained with the BRANCH software. The coconut proteome presented a functional profile similar to that observed in rice and a large number of metabolic pathways related to secondary metabolism. The new sequences found with the BRANCH software were enriched in functions related to biotic stress. Our strategy can be used as a complementary step to de novo transcriptome assembly to get a representative proteome of a target species. The results of the current analysis are available on the website PalmComparomics (http://palm-comparomics.southgreen.fr/).
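The sensitivity and specificity reported for the A. thaliana validation can be illustrated with the standard textbook definitions. The sketch below is only an illustration under stated assumptions: the abstract does not say how true and false positives were counted, and the function names and polypeptide identifiers (`p1` to `p6`) are hypothetical.

```python
from typing import Set, Tuple


def confusion_counts(
    kept: Set[str], true_members: Set[str], all_candidates: Set[str]
) -> Tuple[int, int, int, int]:
    """Count TP, FP, FN, TN for one filtering run.

    kept           -- candidates retained by the filter (hypothetical input)
    true_members   -- candidates that truly belong to the expected group
    all_candidates -- every candidate that entered the filter
    """
    rejected = all_candidates - kept
    tp = len(kept & true_members)       # correctly retained
    fp = len(kept - true_members)       # wrongly retained
    fn = len(rejected & true_members)   # wrongly discarded
    tn = len(rejected - true_members)   # correctly discarded
    return tp, fp, fn, tn


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true members that the filter retained."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of non-members that the filter discarded."""
    return tn / (tn + fp)


# Toy data, not taken from the study.
all_candidates = {"p1", "p2", "p3", "p4", "p5", "p6"}
true_members = {"p1", "p2", "p3", "p4"}
kept = {"p1", "p2", "p3", "p5"}

tp, fp, fn, tn = confusion_counts(kept, true_members, all_candidates)
print(sensitivity(tp, fn), specificity(tn, fp))
```

A stricter filter typically raises specificity at the cost of sensitivity, which is why both are usually evaluated together across a range of filtering parameters.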

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Fig 1. Methodology and results obtained on coconut.
(A) Principal steps of the methodology to recover a target proteome, using transcriptomic data and a reference proteome from a related, better-studied organism. (B) Number of sequences generated on coconut by our methodology starting from four coconut transcriptomes. Sequences in red represent the number of new sequences found by BRANCH. Sequences in orange represent de novo transcriptomes assembled by Trinity. Protein products (PP) generated at step 5 (in dark red) combine information from both categories of sequences.
Fig 2. Analysis of sensitivity and specificity in Arabidopsis thaliana.
Curves of sensitivity (triangles) and specificity (stars) in Arabidopsis thaliana for different filtering parameters (step 4) using Arabidopsis lyrata (in red) and Capsella rubella (in blue) as reference species. The filtering parameters consist of three digits. The first represents the identity, the second is the coverage for target polypeptides covering more than 40% of the protein of the expected group (reference), and the last is the coverage for target polypeptides covering 40% or less of the protein of the expected group.
Fig 3. Increase in the length of A. thaliana PPs.
Length distribution of polypeptides in step 4 (in orange) and step 5 (in green) and distribution of the Arabidopsis thaliana TAIR10 proteome (in blue). Protein lengths are indicated as discrete length ranges, from 0–200 to >2000 aa.
Fig 4. Scaffolding of coconut polypeptides homologous to oil palm ECERIFERUM 26-like protein.
(A) Snapshot of the oil palm protein, coconut polypeptides and coconut PP in the ‘PalmComparomics’ JBrowse. Polypeptides within red squares were used for scaffolding of the coconut PP. (B) Functional domain identified in coconut sequences and the oil palm protein. The sequence O64470.1 of Arabidopsis thaliana is representative of the condensation superfamily. (C) Alignment between the oil palm protein, coconut polypeptides and coconut PP around the overlap region.
