Assembly: a resource for assembled genomes at NCBI

Affiliations

¹ National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA. kitts@ncbi.nlm.nih.gov
² National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.

PMID: 26578580
PMCID: PMC4702866
DOI: 10.1093/nar/gkv1226

Paul A Kitts et al. Nucleic Acids Res. 2016 Jan 4;44(D1):D73-80. doi: 10.1093/nar/gkv1226. Epub 2015 Nov 17.

Abstract

The NCBI Assembly database (www.ncbi.nlm.nih.gov/assembly/) provides stable accessioning and data tracking for genome assembly data. The model underlying the database can accommodate a range of assembly structures, including sets of unordered contig or scaffold sequences, bacterial genomes consisting of a single complete chromosome, or complex structures such as a human genome with modeled allelic variation. The database provides an assembly accession and version to unambiguously identify the set of sequences that make up a particular version of an assembly, and tracks changes to updated genome assemblies. The Assembly database reports metadata such as assembly names, simple statistical reports of the assembly (number of contigs and scaffolds, contiguity metrics such as contig N50, total sequence length and total gap length) as well as the assembly update history. The Assembly database also tracks the relationship between an assembly submitted to the International Nucleotide Sequence Database Consortium (INSDC) and the assembly represented in the NCBI RefSeq project. Users can find assemblies of interest by querying the Assembly Resource directly or by browsing available assemblies for a particular organism. Links in the Assembly Resource allow users to easily download sequence and annotations for current versions of genome assemblies from the NCBI genomes FTP site.
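The simple statistics named in the abstract (contig and scaffold counts, contig N50, total sequence length, total gap length) can be illustrated with a minimal Python sketch. This is not NCBI's implementation. It assumes that every run of `N` characters in a scaffold is a gap that separates two contigs and that the total length includes gaps; the function names `n50` and `assembly_stats` are hypothetical.

```python
import re
from typing import Dict, List


def n50(lengths: List[int]) -> int:
    """Return the length L such that sequences of length >= L
    together cover at least half of the summed length."""
    if not lengths:
        raise ValueError("n50 requires at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


def assembly_stats(scaffolds: List[str]) -> Dict[str, int]:
    """Compute simple statistics from scaffold sequences.

    Assumption (not from the paper): any run of N/n is a gap,
    and contigs are the ungapped stretches between gaps.
    """
    gap_run = re.compile(r"[Nn]+")
    contig_lengths: List[int] = []
    total_gap_length = 0
    for seq in scaffolds:
        total_gap_length += sum(len(run) for run in gap_run.findall(seq))
        contig_lengths.extend(len(c) for c in gap_run.split(seq) if c)
    scaffold_lengths = [len(s) for s in scaffolds]
    return {
        "scaffold_count": len(scaffolds),
        "contig_count": len(contig_lengths),
        "total_length": sum(scaffold_lengths),  # includes gaps
        "total_gap_length": total_gap_length,
        "contig_n50": n50(contig_lengths),
        "scaffold_n50": n50(scaffold_lengths),
    }
```

For example, `assembly_stats(["ACGTNNNNACG", "GGGG"])` treats the first scaffold as two contigs joined by a four-base gap, so the toy assembly has two scaffolds and three contigs.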

Published by Oxford University Press on behalf of Nucleic Acids Research 2015. This work is written by (a) US Government employee(s) and is in the public domain in the US.

Figures

**Figure 1.**
Same name but different sequence content: the Zv7 UCSC and NCBI zebrafish assemblies. Panel A: part of chr21 in the Zv7 zebrafish assembly as displayed in the UCSC genome browser (http://genome.ucsc.edu). Panel B: the same span of chr21 of the Zv7 assembly as displayed in the NCBI Sequence Viewer. The UCSC Zv7 assembly has many Ensembl gene predictions in this region of chr21, whereas the same region in the RefSeq version of Zv7 chr21 at NCBI shows the rb1 and dub genes on the right but no other gene models. The discrepancy arises because NCBI found that one component in this region matched sequences from mouse chromosome X and replaced this foreign component with a gap when it made the RefSeq version of chr21. Zv7 has since been replaced by newer versions of the zebrafish assembly that do not have the mouse contamination.

**Figure 2.**
The NCBI genome assembly model. The diagram depicts the assembly organization for a eukaryote with two nuclear chromosomes and a mitochondrial genome. The full assembly comprises a primary assembly-unit containing nuclear sequences, a non-nuclear assembly-unit containing mitochondrial sequences and an alternate locus group assembly-unit containing scaffolds that have been aligned to chromosome 2 of the primary assembly.
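The organization described in this caption can be sketched as a small Python data structure. This is an illustration only, not the database schema: the class names, field names, unit kinds and sequence names (`chr1`, `alt_scaffold_1` and so on) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Sequence:
    name: str
    # Name of the primary-assembly sequence this one is aligned to,
    # used here only for alternate locus scaffolds (hypothetical field).
    aligned_to: Optional[str] = None


@dataclass
class AssemblyUnit:
    name: str
    kind: str  # "primary", "non-nuclear" or "alt-loci" (labels assumed)
    sequences: List[Sequence] = field(default_factory=list)


@dataclass
class Assembly:
    name: str
    units: List[AssemblyUnit] = field(default_factory=list)

    def units_of_kind(self, kind: str) -> List[AssemblyUnit]:
        """Return every assembly-unit with the given kind label."""
        return [u for u in self.units if u.kind == kind]


def build_figure2_example() -> Assembly:
    """Build the eukaryote from the caption: two nuclear chromosomes,
    a mitochondrial genome, and alternate scaffolds aligned to chr2."""
    primary = AssemblyUnit(
        "Primary Assembly", "primary",
        [Sequence("chr1"), Sequence("chr2")],
    )
    non_nuclear = AssemblyUnit(
        "non-nuclear", "non-nuclear", [Sequence("MT")],
    )
    alt_loci = AssemblyUnit(
        "ALT_LOCI_1", "alt-loci",
        [
            Sequence("alt_scaffold_1", aligned_to="chr2"),
            Sequence("alt_scaffold_2", aligned_to="chr2"),
        ],
    )
    return Assembly("example", [primary, non_nuclear, alt_loci])
```

Keeping the alternate scaffolds in their own unit, with a pointer back to the chromosome they align to, mirrors the separation between the primary assembly and the alternate locus group in the caption.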

**Figure 3.**
An example of the Assembly details page. The figure shows the upper portion of the cat (*Felis catus*) genome assembly GCF_000181335.2 page, including the metadata section and global statistics table. This figure does not show the lower portion of the page that contains tables displaying the assembly contents and detailed statistics.
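An identifier such as GCF_000181335.2 combines an assembly accession with a version number. The following Python sketch splits such an identifier into its parts. It assumes the pattern of a `GCA` (INSDC/GenBank) or `GCF` (RefSeq) prefix, an underscore, nine digits, a dot and a version; the helper names `AssemblyAccession`, `parse_assembly_accession` and `is_update_of` are hypothetical.

```python
import re
from typing import NamedTuple

# Assumed identifier shape: GCA_ or GCF_, nine digits, ".", version.
_ACCESSION_RE = re.compile(r"^(GC[AF])_(\d{9})\.(\d+)$")


class AssemblyAccession(NamedTuple):
    prefix: str   # "GCA" (INSDC/GenBank) or "GCF" (RefSeq)
    number: str   # kept as a string to preserve leading zeros
    version: int

    @property
    def source(self) -> str:
        return "RefSeq" if self.prefix == "GCF" else "GenBank"


def parse_assembly_accession(text: str) -> AssemblyAccession:
    """Split 'GCF_000181335.2' into prefix, number and version."""
    match = _ACCESSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not an assembly accession.version: {text!r}")
    prefix, number, version = match.groups()
    return AssemblyAccession(prefix, number, int(version))


def is_update_of(newer: AssemblyAccession, older: AssemblyAccession) -> bool:
    """True if both share an accession and 'newer' has a higher version."""
    return (
        newer.prefix == older.prefix
        and newer.number == older.number
        and newer.version > older.version
    )
```

Parsing the cat assembly identifier from the caption gives the prefix `GCF`, the number `000181335` and version 2, so the record belongs to the RefSeq side of the GenBank/RefSeq pairing.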
