PLoS Biol. 2011 Jul;9(7):e1001091.
doi: 10.1371/journal.pbio.1001091. Epub 2011 Jul 5.

Modernizing reference genome assemblies

Deanna M Church et al. PLoS Biol. 2011 Jul.
No abstract available

Conflict of interest statement

I have read the journal's policy and have the following conflicts: Paul Flicek is married to the deputy editor of PLoS Medicine, Melissa Norton. Evan Eichler is on the board of Pacific Biosciences.

Figures

Figure 1. Assembly representation for GRCh37.p3.
The top panel shows an ideogram representation of the human genome. The primary assembly unit contains sequences for the non-redundant haploid assembly; this includes the scaffolds that make up the chromosome sequence as well as unplaced and unlocalized scaffolds that are thought to represent novel sequence (not shown in this picture). Alternate loci and patches are placed in separate assembly units to facilitate annotation. Note that the seven alternate scaffolds in the MHC region are each placed in a different assembly unit, as they are all different representations of the same sequence. Other alternate loci can be added to these assembly units at the next major release if they don’t overlap the existing alternates. All patches are placed in the PATCHES assembly unit, and minor releases are cumulative, such that the latest minor release will contain all patches. The red triangles, yellow circles, and blue circles mark regions that contain additional sequences that are not given actual chromosome coordinates, but rather are given a chromosome context via alignment to the primary assembly. The red triangles represent alternate loci: these are sequences that provide an additional tiling path to the one given in the chromosome representation and are essential for representing structurally complex loci. The circles represent patch sequences: these are minor updates made to the assembly outside of the major build cycle. Yellow circles represent “fix” patches: regions of the chromosome assembly that will change with the next major assembly update. Blue circles represent “novel” patches: these are sequences that represent new alternate loci in the next major assembly update. Unlocalized and unplaced sequences are not represented in this figure. Sequences within the assembly are placed within containers known as assembly units.
Note: a region can point to more than one type of extra chromosomal sequence; for example, a region could point to an alternate locus and to a fix or novel patch.
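The assembly-unit rules described in the caption can be illustrated with a minimal Python sketch. This is not GRC or NCBI code. The class and function names, the `ALT_UNIT_n` naming scheme, and the coordinates used in the usage note are all hypothetical; only the two rules come from the caption: alternates that overlap must sit in different assembly units, and patch releases are cumulative.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical region type: (chromosome, start, end) on the primary assembly.
Region = Tuple[str, int, int]


def overlaps(a: Region, b: Region) -> bool:
    """True if two regions share a chromosome and at least one position."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


@dataclass
class AlternateUnit:
    """An assembly unit holding alternate loci that do not overlap each other."""
    name: str
    regions: List[Region] = field(default_factory=list)

    def try_add(self, region: Region) -> bool:
        # Reject the alternate if it overlaps one already in this unit.
        if any(overlaps(region, r) for r in self.regions):
            return False
        self.regions.append(region)
        return True


def place_alternate(units: List[AlternateUnit], region: Region) -> AlternateUnit:
    """Put an alternate in the first unit where it fits, else open a new unit."""
    for unit in units:
        if unit.try_add(region):
            return unit
    unit = AlternateUnit(name=f"ALT_UNIT_{len(units) + 1}")  # hypothetical name
    unit.try_add(region)
    units.append(unit)
    return unit


def cumulative_patches(releases: List[List[str]]) -> List[str]:
    """Patches in the latest minor release: every patch seen so far, in order."""
    seen: List[str] = []
    for patches in releases:
        for patch in patches:
            if patch not in seen:
                seen.append(patch)
    return seen
```

Under these assumptions, calling `place_alternate(units, ("chr6", 100, 200))` seven times on an empty list opens seven units, mirroring the seven MHC alternates, while a later non-overlapping alternate on another chromosome is added to the first existing unit.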
Figure 2. Distribution of issues addressed and an example region.
(Top Panel) Issues for GRCh37, GRCh37.p1, and GRCh37.p2, broken down by type. Issue types are: Clone Problem: The issue is contained within a single clone. This may be a single nucleotide difference or a clone mis-assembly. Path Problem: There is evidence that the tiling path within a given region is incorrect and we will need to update the path. GRC Housekeeping: Changes used to help regularize the tiling path. Missing Sequence: Sequence that we can’t yet place on the assembly. Mapping studies are ongoing to help place these sequences. Variation: There is evidence to suggest that complex variation is complicating a region and an alternate allele may need to be produced. Gap: The issue concerns filling a gap. Unknown: Issue is still under investigation for classification. (Bottom Panel) Details for issue HG-2, a Path Problem. The representation in NCBI36 was a mixed haplotype. The tiling paths for NCBI36 and GRCh37 are shown. Blue clones are anchor clones that are in NCBI36, the GRCh37 chr4 path, and the GRCh37 alternate locus path. Red clones represent the UGT2B17 insertion path and dark gray clones represent the UGT2B17 deletion path. The light gray clone was not used in NCBI36, but was used in GRCh37 to complete the alternate locus.
