Gigascience. 2019 Oct 1;8(10):giz126.
doi: 10.1093/gigascience/giz126.

Access to RNA-sequencing data from 1,173 plant species: The 1000 Plant transcriptomes initiative (1KP)


Eric J Carpenter et al. Gigascience. 2019.

Abstract

Background: The 1000 Plant transcriptomes initiative (1KP) explored genetic diversity by sequencing RNA from 1,342 samples representing 1,173 species of green plants (Viridiplantae).

Findings: This data release accompanies the initiative's final/capstone publication on a set of 3 analyses inferring species trees, whole genome duplications, and gene family expansions. These and previous analyses are based on de novo transcriptome assemblies and related gene predictions. Here, we assess their data and assembly qualities and explain how we detected potential contaminations.

Conclusions: These data will be useful to plant and/or evolutionary scientists with interests in particular gene families, either across the green plant tree of life or in more focused lineages.

Keywords: RNA; assemblies; contamination; genes; plants; transcriptome completeness.

Figures

Figure 1:
A, Overview of the total sequence percentage verified to be of contaminant origin (red), or inferred to be possible contaminants in other sequence libraries (grey) in all 1KP libraries, and libraries inferred to be contaminated through the 18S phylogenetic placement. B, 21 libraries in which >6% of the total sequences are potential contaminants. C, Heat map of inferred contaminant interactions between pairs of species; contaminated species are shown on the vertical axis and contaminating species on the horizontal axis.
Figure 2:
Fraction of the gene sets found (complete + fragments) vs the number of scaffolds (≥300 bp) in the assemblies. For each sample, the fractions of the eukaryota and embryophyta sets found in the assemblies are calculated with BUSCO and the fraction of the CEGMA 248 set with the CRBB tool. All 3 sets are more completely recovered at higher scaffold counts, but the BUSCO embryophyta set is less complete in our samples.
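The two quantities plotted in Figure 2 can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names `count_scaffolds` and `fraction_found` are hypothetical, the input is assumed to be plain FASTA text, and the complete, fragmented, and total counts are assumed to have been read from a BUSCO (or similar) summary beforehand.

```python
from typing import Iterable


def count_scaffolds(fasta_lines: Iterable[str], min_len: int = 300) -> int:
    """Count FASTA records whose sequence length is at least min_len bp."""
    count = 0
    length = None  # None until the first header line has been seen
    for raw in fasta_lines:
        line = raw.strip()
        if line.startswith(">"):
            # A new record starts: close out the previous one, if any.
            if length is not None and length >= min_len:
                count += 1
            length = 0
        elif line and length is not None:
            length += len(line)
    # Close out the final record.
    if length is not None and length >= min_len:
        count += 1
    return count


def fraction_found(complete: int, fragmented: int, total: int) -> float:
    """Fraction of a reference gene set found as complete or fragmented hits."""
    if total <= 0:
        raise ValueError("total must be positive")
    return (complete + fragmented) / total


# Illustrative values only, not data from the paper.
example_fasta = [
    ">scaffold_a", "A" * 300,            # 300 bp: counted
    ">scaffold_b", "A" * 299,            # 299 bp: below the threshold
    ">scaffold_c", "A" * 200, "C" * 150,  # 350 bp over two lines: counted
]
n_scaffolds = count_scaffolds(example_fasta)
found = fraction_found(complete=60, fragmented=20, total=100)
```

Each sample in the figure would then correspond to one (`n_scaffolds`, `found`) pair per reference gene set.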
