Long-read sequence assembly of the gorilla genome
- PMID: 27034376
- PMCID: PMC4920363
- DOI: 10.1126/science.aae0344
Abstract
Accurate sequence and assembly of genomes is a critical first step for studies of genetic variation. We generated a high-quality assembly of the gorilla genome using single-molecule, real-time sequence technology and a string graph de novo assembly algorithm. The new assembly improves contiguity by two to three orders of magnitude with respect to previously released assemblies, recovering 87% of missing reference exons and incomplete gene models. Although regions of large, high-identity segmental duplications remain largely unresolved, this comprehensive assembly provides new biological insight into genetic diversity, structural variation, gene loss, and representation of repeat structures within the gorilla genome. The approach provides a path forward for the routine assembly of mammalian genomes at a level approaching that of the current quality of the human genome.
Copyright © 2016, American Association for the Advancement of Science.
Grants and funding
- U41 HG007635/HG/NHGRI NIH HHS/United States
- HG003079/HG/NHGRI NIH HHS/United States
- HG007234/HG/NHGRI NIH HHS/United States
- U41 HG007234/HG/NHGRI NIH HHS/United States
- HG007635/HG/NHGRI NIH HHS/United States
- HG007990/HG/NHGRI NIH HHS/United States
- R01 HG006283/HG/NHGRI NIH HHS/United States
- U54 HG003079/HG/NHGRI NIH HHS/United States
- U54 HG007990/HG/NHGRI NIH HHS/United States
- R01 HG002385/HG/NHGRI NIH HHS/United States
- U24 HG007234/HG/NHGRI NIH HHS/United States
- HG002385/HG/NHGRI NIH HHS/United States
- R01 GM104390/GM/NIGMS NIH HHS/United States
- HHMI/Howard Hughes Medical Institute/United States