Genome Biol. 2022 Mar 15;23(1):80.
doi: 10.1186/s13059-022-02650-w.

Target-oriented prioritization: targeted selection strategy by integrating organismal and molecular traits through predictive analytics in breeding

Wenyu Yang et al. Genome Biol. 2022.

Abstract

Genomic prediction in crop breeding is hindered by modeling on a limited number of phenotypic traits. We propose an integrative multi-trait breeding strategy based on a machine learning algorithm, target-oriented prioritization (TOP). Using a large hybrid maize population, we demonstrate that the accuracy of identifying the candidate that is phenotypically closest to an ideotype, or target variety, reaches up to 91%. The strength of TOP is enhanced when omics-level traits are included. We show that TOP enables the selection of inbreds or hybrids that outperform existing commercial varieties. It improves multiple traits and accurately identifies improved candidates for new varieties, which will greatly influence breeding.

Keywords: Crop breeding; Genomic prediction; Machine learning; Multiple traits; Omics.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Multiple selection schemes in crop breeding. A The schematic workflow of the TOP algorithm. By learning the optimal trait weights using the maximum likelihood algorithm, genomic predictions of multiple traits are integrated to select the best individual candidates from diverse breeding pools, maximizing the global similarity to an ideotype or target. B Flowchart illustrating the process of model building, multi-omics data test, and field performance test in the present study
Fig. 2
Genomic prediction of agronomic traits in a maize NCII population. A Prediction accuracy for different training sets, and a comparison between specifically selected and randomly selected training sets. Among the 5820 F1 hybrids derived from 194 maternal and 30 paternal lines, the F1s on 1 to 5 diagonal strips were chosen as the training set, and the remaining F1s served as the testing set (see Supplementary Fig. 1 for details). Prediction accuracy is evaluated as the Pearson correlation coefficient (r) between the predicted and measured phenotypes in the testing set. The red dot indicates the prediction accuracy using three diagonal strips as the training set (569 individuals). The violin plot shows the prediction accuracy obtained with 569 randomly selected individuals as the training set, repeated 100 times. A significant difference between three-diagonal F1s and random F1s used as training sets is marked by an asterisk for each trait (P<0.01, Student’s t test). B Trait performance achieved by selecting the earliest flowering F1 hybrids. From the testing set of 5251 F1 hybrids, the 100 earliest flowering individuals were selected based on genomic prediction of days to tassel. The individuals selected for early flowering (red) were compared with the remaining individuals (blue) for days to tassel and ear weight, based on Student’s t test (P<0.01).
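The two computations named in the Fig. 2 caption, prediction accuracy as a Pearson correlation and selection of the earliest flowering candidates from predicted days to tassel, can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' pipeline: the function names `pearson_r` and `select_earliest` and all numeric values are hypothetical.

```python
from math import sqrt

def pearson_r(predicted, measured):
    """Pearson correlation coefficient (r) between predicted and measured values."""
    n = len(predicted)
    if n != len(measured) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_p == 0 or var_m == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_p * var_m)

def select_earliest(predicted_days, n_select):
    """Indices of the n_select candidates with the smallest predicted days to tassel."""
    ranked = sorted(range(len(predicted_days)), key=lambda i: predicted_days[i])
    return ranked[:n_select]

# Toy testing set (made-up values, not data from the study).
measured_days = [62.0, 65.0, 70.0, 58.0, 66.0]
predicted_days = [63.0, 64.0, 69.0, 60.0, 67.0]

accuracy = pearson_r(predicted_days, measured_days)
earliest = select_earliest(predicted_days, n_select=2)
print(f"prediction accuracy r = {accuracy:.3f}")
print("earliest flowering candidates (indices):", earliest)
```

In the study the selection step picks 100 individuals out of 5251; the toy example picks 2 out of 5 only to keep the output readable.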
Fig. 3
The performance of multi-trait selection methods. A The learning workflow of the TOP algorithm. Trait weights are learned iteratively: the process first learns weights that track trait prediction accuracy (assessed by the correlation between the simulated weights in each round and prediction accuracy), then attempts to learn the balance among traits. The process is considered converged once the identification rate stabilizes. B The performance of multi-trait selection methods. Three methods are tested: (i) independent culling levels (red line); (ii) index selection under three scenarios of economic-value weights, called Index1, Index2, and Index3 (blue dashed line, bold blue dashed line, and blue line); and (iii) TOP (orange line).
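For readers unfamiliar with the two baseline methods in panel B, the following sketch contrasts independent culling levels with index selection in their textbook forms. It assumes standardized trait values where higher is better; the thresholds, weights, hybrid names, and trait values are all hypothetical and are not the Index1 to Index3 settings used in the study.

```python
def independent_culling(candidates, thresholds):
    """Keep candidates that meet the minimum threshold on every listed trait."""
    return [
        name
        for name, traits in candidates.items()
        if all(traits[trait] >= cutoff for trait, cutoff in thresholds.items())
    ]

def index_selection(candidates, weights, n_select):
    """Rank candidates by a weighted sum of trait values and keep the top n_select."""
    scores = {
        name: sum(weights[trait] * traits[trait] for trait in weights)
        for name, traits in candidates.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n_select]

# Toy standardized trait values (made up for illustration).
candidates = {
    "H1": {"yield": 1.2, "earliness": -0.5},
    "H2": {"yield": 0.4, "earliness": 0.6},
    "H3": {"yield": -0.3, "earliness": 1.5},
}

culled = independent_culling(candidates, {"yield": 0.0, "earliness": 0.0})
indexed = index_selection(candidates, {"yield": 0.7, "earliness": 0.3}, n_select=2)
print("independent culling keeps:", culled)
print("index selection keeps:", indexed)
```

The toy data show how the two methods can disagree: culling rejects H1 for falling below the earliness threshold, while the index ranks H1 first because its high yield compensates.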
Fig. 4
Improvement of TOP accuracy driven by robust omics data. A–C Identification rate of TOP increases when more omics traits are included in the model. For the Maize368 dataset, 17 agronomic traits (Agro), 88 transcriptomic traits (Exp), and 24 metabolic traits (Met) were sequentially added to the TOP model. For the Maize282 dataset, 21 agronomic traits, 144 transcriptomic traits from developing tissues (Exp1), and 182 transcriptomic traits from adult tissues (Exp2) were sequentially added to the TOP model. For the Rice210 dataset, 4 agronomic traits (Agro), 46 transcriptomic traits (Exp), and 38 metabolic traits (Met) were included. All omics data with single-trait prediction accuracy less than 0.25 were excluded from the analyses. D–F Identification rate improvement due to filtering low-quality data. Before model training, traits with prediction accuracy (r) greater than 0.5 were considered; after training, traits with poor weights (w<0) were excluded from the model.
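The two-stage filtering described for panels D–F could look roughly like the sketch below: keep traits whose single-trait prediction accuracy exceeds a cutoff before training, then drop traits whose learned weight is negative. The helper names, trait names, and values are hypothetical, and the weights here are simply typed in rather than learned.

```python
def filter_by_accuracy(accuracy, min_r=0.5):
    """Before training: keep traits whose prediction accuracy r is greater than min_r."""
    return [trait for trait, r in accuracy.items() if r > min_r]

def drop_negative_weights(weights):
    """After training: exclude traits whose learned weight w is below zero."""
    return {trait: w for trait, w in weights.items() if w >= 0}

# Made-up single-trait prediction accuracies.
accuracy = {"trait_a": 0.62, "trait_b": 0.50, "trait_c": 0.71, "trait_d": 0.31}
kept_before_training = filter_by_accuracy(accuracy, min_r=0.5)

# Made-up weights standing in for the output of a training step.
learned_weights = {"trait_a": 0.8, "trait_c": -0.1}
kept_after_training = drop_negative_weights(learned_weights)

print("kept before training:", kept_before_training)
print("kept after training:", kept_after_training)
```

Note that `trait_b` is dropped because the caption says "greater than 0.5", which this sketch reads as a strict inequality.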
Fig. 5
Selecting individuals with either earlier or later flowering than Zhengdan958. A The distribution of flowering time of the 100 individuals most similar to the target, with the 5% earlier (red) or later (blue) flowering individuals relative to Zhengdan958 (the black vertical line). The proportion of individuals selected with earlier and later flowering compared to Zhengdan958 is indicated by the value before the slash in red and blue, respectively, while the proportion of randomly selected individuals is after the slash in both cases. B The global similarity between selected individuals and Zhengdan958. The global similarity is measured by the mean squared error (MSE) for all traits excluding days to tassel between each selected individual and Zhengdan958; lower MSE values indicate higher global similarity. Three selection scenarios, early-version Zhengdan958 (red), late-version Zhengdan958 (blue), and Zhengdan958 itself (yellow), are presented for comparison with the randomly selected individuals, based on Student’s t test
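The global similarity in panel B is described as the mean squared error over all traits except days to tassel, with lower values meaning higher similarity. A minimal sketch of that measure and of ranking candidates by it is given below. It assumes trait values are on comparable scales (for example, standardized beforehand), which the caption does not state; the function names, trait names, and values are hypothetical.

```python
def mse_to_target(candidate, target, exclude=()):
    """Mean squared error between candidate and target over the non-excluded traits."""
    traits = [trait for trait in target if trait not in exclude]
    if not traits:
        raise ValueError("no traits left to compare after exclusion")
    return sum((candidate[t] - target[t]) ** 2 for t in traits) / len(traits)

def most_similar(candidates, target, n_select, exclude=()):
    """Names of the n_select candidates with the lowest MSE to the target."""
    ranked = sorted(
        candidates,
        key=lambda name: mse_to_target(candidates[name], target, exclude),
    )
    return ranked[:n_select]

# Made-up trait values for a target variety and three candidates.
target = {"days_to_tassel": 65.0, "plant_height": 250.0, "ear_weight": 180.0}
candidates = {
    "A": {"days_to_tassel": 60.0, "plant_height": 252.0, "ear_weight": 179.0},
    "B": {"days_to_tassel": 65.0, "plant_height": 240.0, "ear_weight": 185.0},
    "C": {"days_to_tassel": 70.0, "plant_height": 250.0, "ear_weight": 181.0},
}

skip = ("days_to_tassel",)
closest = most_similar(candidates, target, n_select=2, exclude=skip)
print("MSE of A:", mse_to_target(candidates["A"], target, exclude=skip))
print("two most similar candidates:", closest)
```

Excluding days to tassel lets a candidate differ from the target in flowering time, as candidates A and C do here, while still being scored as similar on everything else.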
Fig. 6
Selecting individuals with earlier flowering and shorter plant stature than Zhengdan958. A Scatter plot of flowering time and plant height of selected individuals. The red dots indicate the earlier flowering, shorter (early and short) version of Zhengdan958, and the blue dots indicate the late and tall version. The black vertical and horizontal lines indicate the flowering time (days to tassel) and plant height (cm) of Zhengdan958. The proportion of individuals selected by TOP as early and short compared to Zhengdan958 is indicated by the percentage before the slash in red, and as late and tall in blue; the proportion of randomly selected individuals is given after the slash in both cases. B The global similarity between selected individuals and Zhengdan958. The similarity measurement excluded days to tassel and plant height. Three selection scenarios, early and short, late and tall, and the original version of Zhengdan958, were compared with randomly selected individuals based on Student’s t test.
