Nucleic Acids Res. 2022 Jan 7;50(D1):D1500-D1507.
doi: 10.1093/nar/gkab1046.

BioSamples database: FAIRer samples metadata to accelerate research data management

Mélanie Courtot et al. Nucleic Acids Res. 2022.

Abstract

The BioSamples database at EMBL-EBI is the central institutional repository for sample metadata storage and connection to EMBL-EBI archives and other resources. The technical improvements to our infrastructure described in our last update have enabled us to scale and accommodate an increasing number of communities, resulting in a higher number of submissions and more heterogeneous data. The BioSamples database now has a valuable set of features and processes to improve data quality, in particular by enriching metadata content and following FAIR principles. In this manuscript, we describe how BioSamples in 2021 handles requirements from our community of users through exemplar use cases: how increased findability of samples and improved data management practices support the goals of the ReSOLUTE project, how the plant community benefits from being able to link genotypic to phenotypic information, and how, cumulatively, those improvements contribute to more complex multi-omics data integration supporting COVID-19 research. Finally, we present the underlying technical features used as pillars throughout those use cases and how they are reused for expanded engagement with communities such as FAIRplus and the Global Alliance for Genomics and Health.

Availability: The BioSamples database is freely available at http://www.ebi.ac.uk/biosamples. Content is distributed under the EMBL-EBI Terms of Use available at https://www.ebi.ac.uk/about/terms-of-use. The BioSamples code is available at https://github.com/EBIBioSamples/biosamples-v4 and distributed under the Apache 2.0 license.

Figures

Figure 1.
The BioSamples layer cake of FAIRification. Technical pillars underpin three levels of data management supporting different communities of users described in subsequent sections.
Figure 2.
Sample relationships and inter-archival relationships. A donor patient sample (top middle) is hosted in EGA under controlled access for privacy and confidentiality. A tissue sample (top left) is generated from that donor, and its information is hosted in EGA as well. Both samples have BioSamples IDs assigned upon submission. Metadata attributes that can be made public are then imported by BioSamples, where that metadata can be linked to the corresponding viral sample (top left) in BioSamples, whose sequencing data is hosted by ENA and linked to the sample.

