PLoS One. 2020 Nov 11;15(11):e0240059.
doi: 10.1371/journal.pone.0240059. eCollection 2020.

High density genotype storage for plant breeding in the Chado schema of Breedbase

Nicolas Morales et al. PLoS One.

Abstract

Modern breeding programs routinely use genome-wide information for selecting individuals to advance. The large volumes of genotypic information required present a challenge for data storage and query efficiency. Major use cases require genotyping data to be linked with trait phenotyping data. In contrast to phenotyping data, which are often stored in relational database schemas, next-generation genotyping data are traditionally stored in non-relational storage systems due to their extremely large scope. This study presents a novel data model implemented in Breedbase (https://breedbase.org/) for uniting relational phenotyping data and non-relational genotyping data within the open-source PostgreSQL database engine. Breedbase is an open-source web database designed to manage all of a breeder's informatics needs: management of field experiments, phenotypic and genotypic data collection and storage, and statistical analyses. The genotyping data are stored in a PostgreSQL data type known as binary JavaScript Object Notation (JSONb), where the JSON structures closely follow the Variant Call Format (VCF) data model. The Breedbase genotyping data model can handle different ploidy levels, structural variants, and any genotype encoded in VCF. JSONb is both compressed and indexed, resulting in a space- and time-efficient system. Furthermore, file caching maximizes data retrieval performance. Integration of all breeding data within the Chado database schema retains referential integrity that may be lost when genotyping and phenotyping data are stored in separate systems. Benchmarking demonstrates that the system is fast enough for computation of a genomic relationship matrix (GRM) and genome-wide association study (GWAS) for datasets involving 1,325 diploid Zea mays, 314 triploid Musa acuminata, and 924 diploid Manihot esculenta samples genotyped with 955,690, 142,119, and 287,952 genotyping-by-sequencing (GBS) markers, respectively.
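
To make the idea of a VCF-like JSON genotype record concrete, the following is a minimal Python sketch. It is not the Breedbase schema: the marker names, the nesting by marker, and the helper `alt_dosage` are hypothetical, and only the `GT` field and its allele-separator syntax are taken from the VCF convention. It shows how a single JSON document could hold calls of different ploidy and how an alternate-allele dosage might be derived from each call.

```python
import json

# Hypothetical per-sample genotype document. Keys mirror the VCF FORMAT
# field GT (genotype call); the marker names and layout are assumptions.
sample_genotype = {
    "S1_1000": {"GT": "0/1"},    # diploid heterozygote
    "S1_2500": {"GT": "1/1/0"},  # triploid call with two alternate alleles
    "S1_4000": {"GT": "./."},    # missing call
}

def alt_dosage(gt):
    """Count non-reference alleles in a VCF GT string; None if any allele is missing."""
    alleles = gt.replace("|", "/").split("/")  # accept phased and unphased calls
    if any(a == "." for a in alleles):
        return None
    return sum(1 for a in alleles if a != "0")

# Round-trip through JSON text, as the document would be serialized before
# being stored in a JSON-typed column.
decoded = json.loads(json.dumps(sample_genotype))
dosages = {marker: alt_dosage(call["GT"]) for marker, call in decoded.items()}
print(dosages)
```

Because the dosage is counted from the allele list rather than assumed to be 0, 1, or 2, the same helper works for diploid and triploid calls alike.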

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Core Chado database schema relied upon by Breedbase.
The modifications for non-relationally storing genotyping data within the Chado relational schema are highlighted in red [17].
Fig 2. The search wizard is the primary means of querying Breedbase and provides a means for downloading phenotypic and genotypic records in several formats.
The search consists of four query categories, (1) to (4), that filter across every kind of data object in the database. In this example, traits were first selected (1) and ‘grain moisture’, ‘grain yield’, and ‘plant height’ were chosen. Then, accessions were selected (2) and, from the 1,404 accessions that met the selected trait criteria, 8 accessions were chosen. Then, trials were selected (3) and, of the 5 field trials that met the selected trait and accession criteria, 4 trials were chosen. Then, locations were selected (4) and the two locations that met the previous criteria were chosen. A genotyping protocol can be selected as a filter in (1) to (4); however, a default genotyping protocol is used when one is not explicitly selected. Clicking on “Related Genotype Data” opens a dialog to filter genotype data for the selected accessions by chromosome, start position, and end position prior to downloading in VCF or Dosage Matrix formats (5). Additionally, a marker set can be selected to filter downloaded genotypes further. If the parents in the pedigrees of the selected accessions were genotyped, genotypes can be computed from them by clicking the “Compute from Parents” checkbox for (5), (6), or (7). The genomic relationship matrix (GRM) can be downloaded (6) for the selected accessions after filtering for minor allele frequency (MAF) and missing data. Three formats for downloading the GRM are available: a tab-separated matrix format (.tsv), a three-column format (.tsv), and a heatmap figure (.pdf). A GWAS can be computed by selecting accessions and traits in (1) to (4), and results can be downloaded (7) as Manhattan and QQ plot figures (.pdf) or as a tabular file of the p-values (.tsv). Clicking “Related Trial Phenotypes” opens a dialog to filter phenotypes by minimum and maximum values prior to downloading phenotypic data in CSV or Excel formats (8).
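
The GRM step described in the caption, filtering markers by MAF and then computing relationships among accessions, can be sketched in plain Python as follows. The source does not state which GRM estimator is used, so this example assumes the widely used VanRaden (2008) formulation for diploid 0/1/2 dosages; the function names, the toy dosage matrix, and the 0.05 threshold are all illustrative assumptions, and missing-data handling is omitted.

```python
def maf_filter(dosages, min_maf=0.05):
    """Keep marker columns whose minor allele frequency is at least min_maf.

    dosages: list of per-sample rows of diploid alternate-allele counts (0/1/2).
    """
    n = len(dosages)
    keep = []
    for j in range(len(dosages[0])):
        p = sum(row[j] for row in dosages) / (2 * n)  # alternate allele frequency
        if min(p, 1 - p) >= min_maf:
            keep.append(j)
    return [[row[j] for j in keep] for row in dosages]

def vanraden_grm(dosages):
    """Assumed estimator: G = Z Z' / (2 * sum_j p_j (1 - p_j)), with Z = M - 2p."""
    n, m = len(dosages), len(dosages[0])
    p = [sum(row[j] for row in dosages) / (2 * n) for j in range(m)]
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    z = [[row[j] - 2 * p[j] for j in range(m)] for row in dosages]  # centered dosages
    return [[sum(z[a][j] * z[b][j] for j in range(m)) / scale for b in range(n)]
            for a in range(n)]

# Toy data: 3 samples x 4 markers; the third marker is monomorphic.
dosage_matrix = [
    [0, 1, 2, 0],
    [1, 1, 2, 0],
    [2, 0, 2, 1],
]
filtered = maf_filter(dosage_matrix, min_maf=0.05)
grm = vanraden_grm(filtered)
for row in grm:
    print("\t".join(f"{value:.3f}" for value in row))
```

The printed rows correspond to the tab-separated matrix layout mentioned in the caption; a three-column layout would instead list each pair of samples with its relationship value.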

References

    1. Thomson M. J. 2014. “High-Throughput SNP Genotyping to Accelerate Crop Improvement.” Plant Breeding and Biotechnology. https://www.e-sciencecentral.org/articles/SC000009999. 10.1186/1472-6750-14-50
    2. Chen Jiafa, Zavala Cristian, Ortega Noemi, Petroli Cesar, Franco Jorge, Burgueño Juan, et al. 2016. “The Development of Quality Control Genotyping Approaches: A Case Study Using Elite Maize Lines.” PloS One 11 (6): e0157236. 10.1371/journal.pone.0157236
    3. Rasheed Awais, Hao Yuanfeng, Xia Xianchun, Khan Awais, Xu Yunbi, Varshney Rajeev K., et al. 2017. “Crop Breeding Chips and Genotyping Platforms: Progress, Challenges, and Perspectives.” Molecular Plant 10 (8): 1047–64. 10.1016/j.molp.2017.06.008
    4. Meuwissen Theo, Hayes Ben and Goddard Mike. 2016. “Genomic Selection: A Paradigm Shift in Animal Breeding.” Animal Frontiers. 10.2527/af.2016-0002
    5. Newell Mark A. and Jannink Jean-Luc. 2014. “Genomic Selection in Plant Breeding.” Methods in Molecular Biology. 10.1007/978-1-4939-0446-4_10

Publication types