BMC Res Notes. 2016 Feb 11;9:81.

doi: 10.1186/s13104-016-1903-z.

De novo construction of a "Gene-space" for diploid plant genome rich in repetitive sequences by an iterative Process of Extraction and Assembly of NGS reads (iPEA protocol) with limited computing resources

Christelle Aluome¹, Grégoire Aubert², Susete Alves Carvalho³, Marie-Christine Le Paslier⁴, Judith Burstin⁵, Dominique Brunel⁶

Affiliations

¹ INRA Institut National de la Recherche Agronomique, US1279 Etude du Polymorphisme des génomes Végétaux, CEA-IG/CNG Centre National de Génotypage, 2 rue Gaston Crémieux, 91057, Evry, France. christelle.aluome@u-bordeaux.fr.
² INRA Institut National de la Recherche Agronomique, UMR1347 Agroécologie, 17 rue Sully, 21065, Dijon Cedex, France. gregoire.aubert@dijon.inra.fr.
³ INRA Institut National de la Recherche Agronomique, UMR1347 Agroécologie, 17 rue Sully, 21065, Dijon Cedex, France. susete.ac@gmail.com.
⁴ INRA Institut National de la Recherche Agronomique, US1279 Etude du Polymorphisme des génomes Végétaux, CEA-IG/CNG Centre National de Génotypage, 2 rue Gaston Crémieux, 91057, Evry, France. le.paslier@cng.fr.
⁵ INRA Institut National de la Recherche Agronomique, UMR1347 Agroécologie, 17 rue Sully, 21065, Dijon Cedex, France. judith.burstin@dijon.inra.fr.
⁶ INRA Institut National de la Recherche Agronomique, US1279 Etude du Polymorphisme des génomes Végétaux, CEA-IG/CNG Centre National de Génotypage, 2 rue Gaston Crémieux, 91057, Evry, France. dominique.brunel@versailles.inra.fr.

PMID: 26864345
PMCID: PMC4750290
DOI: 10.1186/s13104-016-1903-z

Christelle Aluome et al. BMC Res Notes. 2016.

Abstract

Background: The continuing increase in the size and quality of raw "short read" data significantly improves the quality of the assemblies produced by various bioinformatics tools. However, building a reference genome sequence for most plant species remains a significant challenge because of the large number of repeated sequences, which are problematic for a whole-genome-quality de novo assembly. Furthermore, most SNP identification approaches in plant genetics and breeding consider only the "Gene-space" regions, which include the promoter, exon and intron sequences.

Results: We developed the iPEA protocol to produce a de novo Gene-space assembly by iteratively reconstructing the non-coding sequences flanking the Unigene cDNA sequences through the addition of next-generation DNA-seq data. The approach was developed on the large diploid genome of pea (Pisum sativum L.), which is rich in repetitive sequences. The final Gene-space assembly included 35,400 contigs (97 Mb), covering 88% of the 40,227 contigs (53.1 Mb) of the PsCam_low-copy Unigene set. Its accuracy was validated by the results obtained with the GenoPea 13.2 K SNP Array.
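As a quick arithmetic check on the figures quoted in the Results, the sketch below recomputes the coverage and the mean contig lengths. It assumes that the 88% figure is a ratio of contig counts and that 1 Mb means 10^6 bases; neither is stated explicitly in the abstract.

```python
# Figures quoted in the Results paragraph of the abstract.
unigene_contigs = 40_227       # PsCam_low-copy Unigene set
unigene_mb = 53.1
gene_space_contigs = 35_400    # final Gene-space assembly
gene_space_mb = 97.0

# Assumption: coverage is the ratio of contig counts.
coverage = gene_space_contigs / unigene_contigs

# Assumption: 1 Mb = 1e6 bases.
mean_unigene_bp = unigene_mb * 1e6 / unigene_contigs
mean_gene_space_bp = gene_space_mb * 1e6 / gene_space_contigs

print(f"coverage: {coverage:.1%}")
print(f"mean Unigene contig length: {mean_unigene_bp:.0f} bp")
print(f"mean Gene-space contig length: {mean_gene_space_bp:.0f} bp")
```

Under these assumptions the count ratio rounds to 88%, and the mean Gene-space contig is roughly twice as long as the mean Unigene contig, which is consistent with flanking non-coding sequence having been added.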

Conclusion: The iPEA protocol allows the reconstruction of a Gene-space from RNA-Seq and DNA-seq data with limited computing resources.
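The iterative extract-then-assemble idea can be illustrated with a toy sketch. This is not the authors' pipeline: the abstract does not name the mapping or assembly tools, so k-mer sharing stands in for read extraction and a greedy exact-overlap extension stands in for assembly. The function names, the toy sequence and the parameter `k` are all hypothetical.

```python
def kmers(seq, k):
    """Return the set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def extract_reads(reference, reads, k):
    """Stand-in for read mapping: keep reads sharing a k-mer with the reference."""
    ref_kmers = kmers(reference, k)
    return [r for r in reads if kmers(r, k) & ref_kmers]

def extend(contig, reads, k):
    """Stand-in for assembly: greedily extend both ends by exact overlaps >= k."""
    for read in reads:
        longest = min(len(contig), len(read)) - 1
        for olap in range(longest, k - 1, -1):    # try to extend to the right
            if contig.endswith(read[:olap]):
                contig += read[olap:]
                break
        for olap in range(longest, k - 1, -1):    # try to extend to the left
            if contig.startswith(read[-olap:]):
                contig = read[:-olap] + contig
                break
    return contig

def iterate(seed, reads, k=4, max_iter=10):
    """Repeat extraction and extension until the contig stops growing."""
    contig, history = seed, [len(seed)]
    for _ in range(max_iter):
        new_contig = extend(contig, extract_reads(contig, reads, k), k)
        history.append(len(new_contig))
        if new_contig == contig:                  # plateau reached
            break
        contig = new_contig
    return contig, history

# Toy data: a 38-base sequence with no repeated 3-mers (hypothetical),
# 10-base reads tiled every 4 bases, and a 10-base seed from the middle.
genome = "AAACAGATACCATCACGACTAGCAAGGAATGAGTATTA"
reads = [genome[i:i + 10] for i in range(0, len(genome) - 9, 4)]
seed = genome[14:24]

contig, history = iterate(seed, reads)
print(contig == genome, history)
```

Each pass recruits only reads that overlap the current contig, so the reference grows outward from the seed and the next pass can reach reads that were previously out of range. The toy sequence has no repeats, which is exactly the situation a real repeat-rich genome does not offer, so the sketch shows the control flow only.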

Figures

**Fig. 1**
The evolution of the reconstruction of a genic sequence, represented by a succession of exons (*orange*) and introns (*green*). The paired-end reads are colored *orange* if they correspond to an exonic sequence, *green* if they correspond to an intronic sequence, and both *green* and *orange* if they span an intron/exon junction. The figure shows that, over the iterations, the introns are added to the exons present in the first reference sequence (Unigene), providing additional information for the next iteration

**Fig. 2**
The method diagram. For the first iteration, HiSeq 2000 short reads are submitted as “input short read”, while longer reads from MiSeq are submitted as “input long reads”. For each further iteration Ij, the contigs produced by iteration Ij-1 are used as a new “input long read”, in order to preserve the assembly already produced

**Fig. 3**
The algorithm of the processing

**Fig. 4**
During the successive iterations, (a) the number of de novo genomic contigs decreases and (b) the mean de novo contig length increases, until a plateau is reached at the 6th iteration

**Fig. 5**
The results of a local BLAST between the 40,227 Unigene contigs allow the estimation of the reconstruction rate of 35,400 de novo contigs at the end of the assembly (iteration I6)
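A reconstruction rate of the kind described for Fig. 5 can be estimated from BLAST tabular output. The sketch below assumes the standard 12-column `-outfmt 6` layout (qseqid, sseqid, pident, length, ...) with Unigene contigs as queries. The identity and length thresholds and the sequence identifiers are hypothetical; the abstract does not state the criteria the authors used.

```python
import csv
import io

def reconstruction_rate(blast_tsv, n_queries, min_identity=95.0, min_length=100):
    """Fraction of query contigs with at least one hit passing both thresholds.

    blast_tsv: iterable of lines in BLAST tabular format (-outfmt 6).
    n_queries: total number of query contigs, including those without hits.
    """
    covered = set()
    for row in csv.reader(blast_tsv, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        qseqid, pident, length = row[0], float(row[2]), int(row[3])
        if pident >= min_identity and length >= min_length:
            covered.add(qseqid)
    return len(covered) / n_queries

# Hypothetical hits for 4 query contigs; unigene_4 has no hit at all.
sample = "\n".join([
    "unigene_1\tcontig_7\t99.2\t850\t6\t1\t1\t850\t120\t969\t0.0\t1510",
    "unigene_1\tcontig_9\t97.0\t300\t9\t0\t851\t1150\t1\t300\t1e-150\t500",
    "unigene_2\tcontig_3\t90.0\t400\t40\t0\t1\t400\t10\t409\t1e-120\t420",
    "unigene_3\tcontig_5\t98.5\t600\t9\t0\t1\t600\t50\t649\t0.0\t1050",
])

rate = reconstruction_rate(io.StringIO(sample), n_queries=4)
print(f"reconstruction rate: {rate:.0%}")
```

Counting distinct query identifiers rather than hit lines avoids inflating the rate when one Unigene contig matches several de novo contigs.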

Cited by

Quick and efficient approach to develop genomic resources in orphan species: Application in Lavandula angustifolia.
Fopa Fomeju B, Brunel D, Bérard A, Rivoal JB, Gallois P, Le Paslier MC, Bouverat-Bernier JP. PLoS One. 2020 Dec 11;15(12):e0243853. doi: 10.1371/journal.pone.0243853. eCollection 2020. PMID: 33306734. Free PMC article.
Virtual Genome Walking across the 32 Gb Ambystoma mexicanum genome; assembling gene models and intronic sequence.
Evans T, Johnson AD, Loose M. Sci Rep. 2018 Jan 12;8(1):618. doi: 10.1038/s41598-017-19128-6. PMID: 29330416. Free PMC article.
HopBase: a unified resource for Humulus genomics.
Hill ST, Sudarsanam R, Henning J, Hendrix D. Database (Oxford). 2017 Jan 1;2017(1):bax009. doi: 10.1093/database/bax009. PMID: 28415075. Free PMC article.
