De novo assembly and genotyping of variants using colored de Bruijn graphs
- PMID: 22231483
- PMCID: PMC3272472
- DOI: 10.1038/ng.1028
Abstract
Detecting genetic variants that are highly divergent from a reference sequence remains a major challenge in genome sequencing. We introduce de novo assembly algorithms using colored de Bruijn graphs for detecting and genotyping simple and complex genetic variants in an individual or population. We provide an efficient software implementation, Cortex, the first de novo assembler capable of assembling multiple eukaryotic genomes simultaneously. Four applications of Cortex are presented. First, we detect and validate both simple and complex structural variations in a high-coverage human genome. Second, we identify more than 3 Mb of sequence absent from the human reference genome, in pooled low-coverage population sequence data from the 1000 Genomes Project. Third, we show how population information from ten chimpanzees enables accurate variant calls without a reference sequence. Last, we estimate classical human leukocyte antigen (HLA) genotypes at HLA-B, the most variable gene in the human genome.
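The idea of a colored de Bruijn graph can be illustrated with a minimal sketch. This is not Cortex's implementation: the function names (`build_colored_graph`, `private_kmers`), the toy sequences, and the choice of k = 4 are all hypothetical. It also ignores reverse complements, sequencing errors, and coverage. Each k-mer is a node, and each node records the set of samples ("colors") in which it occurs. K-mers private to one color mark places where that sample diverges from the others.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every k-mer of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_colored_graph(samples, k):
    """Map each k-mer (node) to the set of sample names (colors) containing it.

    Edges are implicit: two nodes are adjacent when the last k-1 bases of one
    equal the first k-1 bases of the other.
    """
    colors = defaultdict(set)
    for name, sequences in samples.items():
        for seq in sequences:
            for kmer in kmers(seq, k):
                colors[kmer].add(name)
    return dict(colors)


def private_kmers(graph, color):
    """Return the k-mers carried by `color` and by no other sample."""
    return sorted(kmer for kmer, cs in graph.items() if cs == {color})


# Toy input (hypothetical): two sequences differing by one base (G vs T).
samples = {
    "reference": ["ACGTACGGTCA"],
    "sample": ["ACGTACTGTCA"],
}
graph = build_colored_graph(samples, k=4)

print(private_kmers(graph, "reference"))
print(private_kmers(graph, "sample"))
```

In this toy case the two colors share the k-mers flanking the difference and each carries its own private path across it, which is the "bubble" structure that variant discovery on such graphs looks for.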
Similar articles
- Succinct colored de Bruijn graphs. Bioinformatics. 2017 Oct 15;33(20):3181-3187. doi: 10.1093/bioinformatics/btx067. PMID: 28200001.
- Population-scale detection of non-reference sequence variants using colored de Bruijn graphs. Bioinformatics. 2022 Jan 12;38(3):604-611. doi: 10.1093/bioinformatics/btab749. PMID: 34726732.
- Integrating long-range connectivity information into de Bruijn graphs. Bioinformatics. 2018 Aug 1;34(15):2556-2565. doi: 10.1093/bioinformatics/bty157. PMID: 29554215.
- The present and future of de novo whole-genome assembly. Brief Bioinform. 2018 Jan 1;19(1):23-40. doi: 10.1093/bib/bbw096. PMID: 27742661. Review.
- State of the art de novo assembly of human genomes from massively parallel sequencing data. Hum Genomics. 2010 Apr;4(4):271-7. doi: 10.1186/1479-7364-4-4-271. PMID: 20511140. Review.
Cited by
- Telescoper: de novo assembly of highly repetitive regions. Bioinformatics. 2012 Sep 15;28(18):i311-i317. doi: 10.1093/bioinformatics/bts399. PMID: 22962446.
- Reference-based compression of short-read sequences using path encoding. Bioinformatics. 2015 Jun 15;31(12):1920-8. doi: 10.1093/bioinformatics/btv071. Epub 2015 Feb 2. PMID: 25649622.
- The Need for a Human Pangenome Reference Sequence. Annu Rev Genomics Hum Genet. 2021 Aug 31;22:81-102. doi: 10.1146/annurev-genom-120120-081921. Epub 2021 Apr 30. PMID: 33929893. Review.
- Impact of post-alignment processing in variant discovery from whole exome data. BMC Bioinformatics. 2016 Oct 3;17(1):403. doi: 10.1186/s12859-016-1279-z. PMID: 27716037.
- Absence of genetic selection in a pathogenic Escherichia coli strain exposed to the manure-amended soil environment. PLoS One. 2018 Dec 7;13(12):e0208346. doi: 10.1371/journal.pone.0208346. eCollection 2018. PMID: 30532241.