BMC Bioinformatics. 2020 Sep 3;21(1):378.
doi: 10.1186/s12859-020-03694-0.

"METAGENOTE: a simplified web platform for metadata annotation of genomic samples and streamlined submission to NCBI's sequence read archive"

Mariam Quiñones et al. BMC Bioinformatics.

Abstract

Background: Improvements in genomics methods, coupled with readily accessible high-throughput sequencing, have contributed to our understanding of microbial species, metagenomes, infectious diseases and more. To maximize the impact of these genomics studies, it is important that data from biological samples become publicly available with standardized metadata. The availability of data in public archives offers the hope that greater insights can be obtained through integration with multi-omics data, reproduction of published studies, or meta-analyses of large, diverse datasets. These datasets should include a description of the host, organism, environmental source of the specimen, spatial-temporal information and other relevant metadata. Unfortunately, these attributes are often missing and, when present, show inconsistencies in the use of metadata standards and ontologies.

Results: METAGENOTE (https://metagenote.niaid.nih.gov) is a web portal that facilitates the annotation of samples from genomic studies and streamlines the submission of sequencing files and metadata to the Sequence Read Archive (SRA) (Leinonen R, et al, Nucleic Acids Res, 39:D19-21, 2011) for public access. The platform offers a wide selection of packages for different types of biological and experimental studies, with a special emphasis on the standardization of metadata reporting. These packages follow the guidelines of the MIxS standards developed by the Genomic Standards Consortium (GSC) and adopted by the three partners of the International Nucleotide Sequence Database Collaboration (INSDC) (Cochrane G, et al, Nucleic Acids Res, 44:D48-50, 2016): the National Center for Biotechnology Information (NCBI), the European Bioinformatics Institute (EBI) and the DNA Data Bank of Japan (DDBJ). METAGENOTE then compiles, validates and manages the submission through an easy-to-use web interface, minimizing submission errors and eliminating the need to submit sequencing files via a separate file transfer mechanism.

Conclusions: METAGENOTE is a public resource that focuses on simplifying the annotation and submission of data together with the corresponding metadata. Users of METAGENOTE will benefit from the easy-to-use annotation interface but, most importantly, will be encouraged to publish metadata following standards and ontologies that make public data available for reuse.

Keywords: Genomic samples; Metadata; Ontologies; Sequence read archive; Web platform.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
METAGENOTE Application Diagram. METAGENOTE allows annotation and transfer of files to SRA. METAGENOTE prepares an XML file in the format required by the SRA’s API to automatically create all records for BioProject, BioSamples and Runs. Sequence files are uploaded to METAGENOTE via a drag and drop action into the browser. METAGENOTE finally sends raw files and the XML file to NCBI’s server.
Fig. 2
METAGENOTE simplifies sample annotation. Metadata annotation can be done by typing directly into the sample group annotation table (2a), selecting frequently used words available in the drop-down menu, or importing words using the ontology search functionality within the right-click menu option (2b). In addition, the right pane provides a description search box (2c) to help the user find information about a required field, or to open an anatomy selection tool (2d).
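
The Fig. 1 caption describes METAGENOTE preparing a single XML file from which BioProject, BioSample and Run records are created. The following is only a minimal sketch of that idea, not METAGENOTE's code and not NCBI's actual submission schema: the function `build_submission_xml`, every element and attribute name, and the sample values are hypothetical placeholders chosen for illustration.

```python
import xml.etree.ElementTree as ET


def build_submission_xml(project_title, samples):
    """Assemble one XML document describing a project, its samples and runs.

    NOTE: element and attribute names are illustrative assumptions;
    they do not reproduce the schema that SRA actually requires.
    """
    root = ET.Element("Submission")

    # One project record for the whole submission.
    project = ET.SubElement(root, "BioProject")
    ET.SubElement(project, "Title").text = project_title

    for sample in samples:
        # One sample record per annotated sample.
        sample_el = ET.SubElement(root, "BioSample", name=sample["name"])

        # Metadata attributes, sorted so the output is deterministic.
        for key, value in sorted(sample["attributes"].items()):
            attr = ET.SubElement(sample_el, "Attribute", name=key)
            attr.text = value

        # One run record per uploaded sequence file.
        for filename in sample["files"]:
            ET.SubElement(sample_el, "Run", file=filename)

    return ET.tostring(root, encoding="unicode")


# Made-up example input, for illustration only.
example = build_submission_xml(
    "Gut microbiome pilot",
    [
        {
            "name": "sample_01",
            "attributes": {
                "collection_date": "2019-05-01",
                "host": "Homo sapiens",
            },
            "files": ["sample_01_R1.fastq.gz", "sample_01_R2.fastq.gz"],
        }
    ],
)
print(example)
```

Keeping the project, sample and run records in one document mirrors the caption's point that a single file drives the creation of all three record types; a real submission would need to follow the schema published by NCBI.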

References

    1. Leinonen R, Sugawara H, Shumway M; International Nucleotide Sequence Database Collaboration. The sequence read archive. Nucleic Acids Res. 2011;39(Database issue):D19–D21. doi: 10.1093/nar/gkq1019.
    2. SRA Database Growth [https://www.ncbi.nlm.nih.gov/sra/docs/sragrowth].
    3. Genomic Standards Consortium (GSC) [https://gensc.org/].
    4. Yilmaz P, Kottmann R, Field D, Knight R, Cole JR, Amaral-Zettler L, Gilbert JA, Karsch-Mizrachi I, Johnston A, Cochrane G, et al. Minimum information about a marker gene sequence (MIMARKS) and minimum information about any (x) sequence (MIxS) specifications. Nat Biotechnol. 2011;29(5):415–420. doi: 10.1038/nbt.1823.
    5. Malone J, Holloway E, Adamusiak T, Kapushesky M, Zheng J, Kolesnikov N, Zhukova A, Brazma A, Parkinson H. Modeling sample variables with an experimental factor ontology. Bioinformatics. 2010;26(8):1112–1118. doi: 10.1093/bioinformatics/btq099.