PLoS One. 2011;6(12):e28436.
doi: 10.1371/journal.pone.0028436. Epub 2011 Dec 12.

Plantagora: modeling whole genome sequencing and assembly of plant genomes

Roger Barthelson et al. PLoS One. 2011.

Abstract

Background: Genomics studies are being revolutionized by the next generation sequencing technologies, which have made whole genome sequencing much more accessible to the average researcher. Whole genome sequencing with the new technologies is a developing art that, despite the large volumes of data that can be produced, may still fail to provide a clear and thorough map of a genome. The Plantagora project was conceived to address specifically the gap between having the technical tools for genome sequencing and knowing precisely the best way to use them.

Methodology/principal findings: For Plantagora, a platform was created for generating simulated reads from several plant genomes of different sizes. The resulting read files mimicked either 454 or Illumina reads, with varying paired-end spacing. Thousands of read datasets were created, most derived from our primary model genome, rice chromosome one. All reads were assembled with different software assemblers, including Newbler, ABySS, and SOAPdenovo, and the resulting assemblies were evaluated by an extensive battery of metrics chosen for these studies. The metrics included both statistics of the assembly sequences and fidelity-related measures derived by aligning the assemblies to the original genome source for the reads. The results were presented on a website, which includes a data graphing tool, all created to help the user rapidly compare the feasibility and effectiveness of different sequencing and assembly strategies before testing an approach in the lab. Some of our own conclusions regarding the different strategies were also recorded on the website.
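To make the read-simulation step concrete, the following is a minimal sketch of drawing paired-end reads from a reference sequence at a target coverage. It is not the Plantagora platform's code: the function names (`pairs_needed`, `simulate_pairs`, `reverse_complement`) are hypothetical, "insert size" is assumed here to mean the full fragment length spanned by both mates, and no sequencing errors are modeled.

```python
import math
import random

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def pairs_needed(genome_length, read_length, coverage):
    """Read pairs required so that total read bases ~= coverage * genome length."""
    return math.ceil(coverage * genome_length / (2 * read_length))


def simulate_pairs(genome, read_length, insert_size, coverage, seed=0):
    """Sample error-free read pairs uniformly along the genome.

    Assumption: insert_size is the outer fragment length, so mate 1 is the
    first read_length bases of the fragment and mate 2 is the reverse
    complement of its last read_length bases.
    """
    if insert_size > len(genome) or read_length > insert_size:
        raise ValueError("need read_length <= insert_size <= genome length")
    rng = random.Random(seed)
    pairs = []
    for _ in range(pairs_needed(len(genome), read_length, coverage)):
        start = rng.randrange(len(genome) - insert_size + 1)
        fragment = genome[start:start + insert_size]
        mate1 = fragment[:read_length]
        mate2 = reverse_complement(fragment[-read_length:])
        pairs.append((mate1, mate2))
    return pairs
```

Under these assumptions, a mixed dataset such as 16× coverage at 2000 bp spacing plus 24× at 8000 bp spacing would be built by calling `simulate_pairs` once per library and concatenating the results.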

Conclusions/significance: Plantagora provides a substantial body of information for comparing different approaches to sequencing a plant genome, along with conclusions regarding some of the specific approaches. Plantagora also provides a platform of metrics and tools for further study of the sequencing and assembly process.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Computation metric values are presented for 4 different assemblies, each from a different sequencing platform/assembler combination, but created from similar datasets.
All datasets had a total coverage of 40× for rice chromosome one. Key: blue – 500 bp 454 reads, 16× coverage 2000 bp insert spacing with 24× coverage 8000 bp insert spacing, assembled with Newbler; red – 16× coverage 500 bp 454 fragment reads with 24× coverage 75 bp Illumina reads with 8000 bp insert spacing, assembled with ABySS; green – 16× coverage 75 bp Illumina reads with 2000 bp insert spacing, 24× coverage 75 bp Illumina reads with 8000 bp insert spacing, assembled with ABySS; purple – 16× coverage 75 bp Illumina reads with 2000 bp insert spacing, 24× coverage 75 bp Illumina reads with 8000 bp insert spacing, assembled with SOAPdenovo. Metric values were recorded during the assembly process.
Figure 2. Computation metric values are presented for 4 different assemblies (the same assemblies described in Figure 1).
Key: blue – 500 bp 454 reads, 16× coverage 2000 bp insert spacing with 24× coverage 8000 bp insert spacing, assembled with Newbler; red – 16× coverage 500 bp 454 fragment reads with 24× coverage 75 bp Illumina reads with 8000 bp insert spacing, assembled with ABySS; green – 16× coverage 75 bp Illumina reads with 2000 bp insert spacing, 24× coverage 75 bp Illumina reads with 8000 bp insert spacing, assembled with ABySS; purple – 16× coverage 75 bp Illumina reads with 2000 bp insert spacing, 24× coverage 75 bp Illumina reads with 8000 bp insert spacing, assembled with SOAPdenovo.
Figure 3. Fidelity metrics were derived by comparing the assemblies against the original genome sequence by alignment.
Mean values are presented for representation, indel rate, and mismatch rate for each of the platform/assembler combinations used for the rice chromosome one studies. Key: green – mean representation; blue – mean indel rate; red – mean mismatch rate.
Figure 4. A sample page from the Plantagora website graphing tool is presented.
The graphs shown are of the scaffold N50 values vs. total coverage of rice chromosome one for ABySS assemblies of 75 bp Illumina reads, with 2000 bp insert size for dataset A, a 3/2 ratio of dataset A reads to dataset B, and 8000, 20000, and 40000 bp insert sizes for dataset B.
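For readers unfamiliar with the scaffold N50 statistic plotted in Figure 4, the sketch below computes it under the common definition: the length L such that scaffolds of length at least L together contain at least half of the assembled bases. The abstract does not state the exact formula used by Plantagora, so this definition and the function name `n50` are assumptions made for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Lengths are sorted from longest to shortest and accumulated until the
    running total reaches half of the summed length; the length at which
    that happens is the N50. An empty input yields 0.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0
```

For example, scaffolds of 80, 70, 50, 40, 30, and 20 kb total 290 kb; the two longest already cover 150 kb, which exceeds half the total, so the N50 is 70 kb.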
