Genes (Basel). 2022 Jun 23;13(7):1129.

doi: 10.3390/genes13071129.

The Relative Power of Structural Genomic Variation versus SNPs in Explaining the Quantitative Trait Growth in the Marine Teleost Chrysophrys auratus

Mike Ruigrok¹,², Bing Xue², Andrew Catanach¹, Mengjie Zhang², Linley Jesson¹, Marcus Davy¹, Maren Wellenreuther¹,³

Affiliations

¹ The New Zealand Institute for Plant & Food Research Ltd., Nelson 7010, New Zealand.
² Wellington Faculty of Engineering, Victoria University of Wellington, Wellington 6012, New Zealand.
³ School of Biological Sciences, University of Auckland, Auckland 1010, New Zealand.

PMID: 35885912
PMCID: PMC9320665
DOI: 10.3390/genes13071129

Mike Ruigrok et al. Genes (Basel). 2022.

Abstract

Background: Genetic diversity provides the basic substrate for evolution. Genetic variation consists of changes ranging from single base pairs (single-nucleotide polymorphisms, or SNPs) to larger-scale structural variants, such as inversions, deletions, and duplications. SNPs have long been used as the general currency for investigations into how genetic diversity fuels evolution. However, structural variants can affect more base pairs in the genome than SNPs and can be responsible for adaptive phenotypes due to their impact on linkage and recombination. In this study, we investigate the first steps needed to explore the genetic basis of an economically important growth trait in the marine teleost finfish Chrysophrys auratus using both SNP and structural variant data. Specifically, we use feature selection methods in machine learning to explore the relative predictive power of both types of genetic variants in explaining growth and discuss the feature selection results of the evaluated methods.

Methods: SNP and structural variant callers were used to generate catalogues of variant data from 32 individual fish at ages 1 and 3 years. Three feature selection algorithms (ReliefF, Chi-square, and a mutual-information-based method) were used to reduce the dataset by selecting the most informative features. Following this selection process, the subset of variants was used as features to classify fish into small, medium, or large size categories using KNN, naïve Bayes, random forest, and logistic regression. The top-scoring features in each feature selection method were subsequently mapped to annotated genomic regions in the zebrafish genome, and a permutation test was conducted to see if the number of mapped regions was greater than when random sampling was applied.
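The select-then-classify workflow described in the Methods can be sketched roughly as follows. This is a minimal illustration on synthetic genotype codes, not the authors' implementation: the helper names (`chi2_score`, `select_top_k`, `knn_predict`, `loo_accuracy`), the 0/1/2 genotype coding, the Manhattan distance, and the leave-one-out evaluation are all assumptions made for the example. Features are re-selected inside every fold so that the held-out fish never influences which variants are kept.

```python
import random
from collections import Counter


def chi2_score(column, labels):
    """Pearson chi-square statistic of one categorical feature against class labels."""
    n = len(labels)
    value_counts = Counter(column)
    class_counts = Counter(labels)
    observed = Counter(zip(column, labels))
    score = 0.0
    for value, n_value in value_counts.items():
        for cls, n_class in class_counts.items():
            expected = n_value * n_class / n
            score += (observed[(value, cls)] - expected) ** 2 / expected
    return score


def select_top_k(X, y, k):
    """Indices of the k features with the largest chi-square score."""
    n_features = len(X[0])
    scores = [chi2_score([row[j] for row in X], y) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]


def knn_predict(train_X, train_y, query, n_neighbors=3):
    """Majority vote among the nearest training samples (Manhattan distance)."""
    dists = sorted(
        (sum(abs(a - b) for a, b in zip(row, query)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:n_neighbors])
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, k):
    """Leave-one-out accuracy with feature selection repeated inside each fold."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        keep = select_top_k(train_X, train_y, k)
        reduced = [[row[j] for j in keep] for row in train_X]
        query = [X[i][j] for j in keep]
        correct += knn_predict(reduced, train_y, query) == y[i]
    return correct / len(X)


# Synthetic stand-in for the real data: 32 samples, 50 variants coded 0/1/2.
# The first two variants track the size class exactly; the other 48 are noise.
rng = random.Random(0)
classes = ["small", "medium", "large"]
y = [classes[i % 3] for i in range(32)]
X = []
for label in y:
    signal = classes.index(label)
    X.append([signal, signal] + [rng.randint(0, 2) for _ in range(48)])

accuracy = loo_accuracy(X, y, k=2)
print(accuracy)
```

The toy signal here is deliberately clean, so the resulting accuracy says nothing about what is achievable on real variant data. Selecting features on the full dataset before cross-validation would tend to inflate accuracy, which matters when there are only 32 individuals.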

Results: Without feature selection, the prediction accuracies ranged from 0 to 0.5 for both structural variants and SNPs. Following feature selection, the prediction accuracy increased only slightly to between 0 and 0.65 for structural variants and between 0 and 0.75 for SNPs. The highest prediction accuracy for the logistic regression was achieved for age 3 fish using SNPs, although generally predictions for age 1 and 3 fish were very similar (ranging from 0 to 0.65 for both SNPs and structural variants). The Chi-square feature selection of SNP data was the only method that had a significantly higher number of matches to annotated genomic regions of zebrafish than would be explained by chance alone.

Conclusions: Predicting a complex polygenic trait such as growth using data collected from a low number of individuals remains challenging. While we demonstrate that both SNPs and structural variants provide important information to help understand the genetic basis of phenotypic traits such as fish growth, the full complexities that exist within a genome cannot be easily captured by classical machine learning techniques. When using high-dimensional data, feature selection shows some increase in the prediction accuracy of classification models and provides the potential to identify unknown genomic correlates with growth. Our results show that both SNPs and structural variants significantly impact growth, and we therefore recommend that researchers interested in the genotype-phenotype map should strive to go beyond SNPs and incorporate structural variants in their studies as well. We discuss how our machine learning models can be further expanded to serve as a test bed to inform evolutionary studies and the applied management of species.

Keywords: Chrysophrys auratus; feature selection; growth; prediction; single-nucleotide polymorphisms; structural variants.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

**Figure 1**
Overview of the processing of the genomic data.

**Figure 2**
Density plot of the distribution of growth of 1- and 3-year-old snappers. The bins small, medium, and large are coloured red, green, and blue for year 1, respectively, and for year 3 they are orange, purple, and black, respectively.

**Figure 3**
Circos plots of (a) density (proportion of 100 kb genomic windows covered by each variant class) and (b) locations and sizes (in base pairs) of structural variants called by Parliament2 within the 32 F2 samples and merged by SURVIVOR. In addition, the density plot shows the top 25 growth areas from the ReliefF feature selection of SNPs and structural variants. In legends, dels = deletions, invs = inversions, ins = insertions, and dups = duplications.

**Figure 4**
Prediction accuracy (%) for the different feature selection sets of classification algorithms using a Chi-square feature selection method.

**Figure 5**
Prediction accuracy (%) for the different feature selection sets of classification algorithms using a mutual information feature selection method.

**Figure 6**
Prediction accuracy (%) for the different feature selection sets of classification algorithms using a ReliefF feature selection method.

**Figure 7**
Permutation test of top 200 features compared to sampling 200 features randomly across the genomic windows. Solid lines are the number of positive hits to a genomic annotation of zebrafish (black is ReliefF feature selection, blue is mutual information feature selection, and orange is Chi-square feature selection). Dotted lines are the 99% confidence intervals for the 1000 sets of 200 randomly sampled features.

**Figure 8**
Permutation test of top 200 features compared to sampling 200 features randomly across the genomic windows. Solid lines are the number of positive hits to a genomic annotation of zebrafish that contain the words “Growth” or “Development” in the cellular function column (black is ReliefF feature selection, blue is mutual information feature selection, and orange is Chi-square feature selection). Dotted lines are the 99% confidence intervals for the 1000 sets of 200 randomly sampled features.
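The permutation test summarised in Figures 7 and 8 can be approximated with the sketch below. It is only an illustration under stated assumptions: the function name `permutation_test`, the integer window identifiers, the 10% annotation rate, and the empirical one-sided p-value are hypothetical choices for the example and are not taken from the paper. The counts of 200 selected features, 1000 random draws, and a 99% interval follow the captions.

```python
import random


def permutation_test(selected, all_windows, annotated, n_draws=1000, seed=0):
    """Compare annotation hits among selected windows with random draws of equal size."""
    rng = random.Random(seed)
    observed = sum(w in annotated for w in selected)

    # Null distribution: hits obtained when the same number of windows is drawn at random.
    null = sorted(
        sum(w in annotated for w in rng.sample(all_windows, len(selected)))
        for _ in range(n_draws)
    )

    # Empirical 99% interval of the null (0.5th and 99.5th percentiles).
    lower = null[n_draws * 5 // 1000]
    upper = null[n_draws * 995 // 1000 - 1]

    # One-sided empirical p-value with the usual +1 correction.
    p_value = (1 + sum(h >= observed for h in null)) / (n_draws + 1)
    return observed, (lower, upper), p_value


# Synthetic example: 5000 genomic windows, every tenth one carries an annotation.
all_windows = list(range(5000))
annotated = set(range(0, 5000, 10))

# Hypothetical "top 200" set in which half of the windows are annotated.
selected = list(range(0, 1000, 5))

observed, interval, p_value = permutation_test(selected, all_windows, annotated)
print(observed, interval, p_value)
```

A selected set whose hit count falls above the upper bound of the interval would correspond to a solid line lying outside the dotted lines in the figures.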

Cited by

Advancing genetic improvement in the omics era: status and priorities for United States aquaculture.
Andersen LK, Thompson NF, Abernathy JW, Ahmed RO, Ali A, Al-Tobasei R, Beck BH, Calla B, Delomas TA, Dunham RA, Elsik CG, Fuller SA, García JC, Gavery MR, Hollenbeck CM, Johnson KM, Kunselman E, Legacki EL, Liu S, Liu Z, Martin B, Matt JL, May SA, Older CE, Overturf K, Palti Y, Peatman EJ, Peterson BC, Phelps MP, Plough LV, Polinski MP, Proestou DA, Purcell CM, Quiniou SMA, Raymo G, Rexroad CE, Riley KL, Roberts SB, Roy LA, Salem M, Simpson K, Waldbieser GC, Wang H, Waters CD, Reading BJ; Aquaculture Genomics, Genetics and Breeding Workshop. BMC Genomics. 2025 Feb 17;26(1):155. doi: 10.1186/s12864-025-11247-z. PMID: 39962419. Free PMC article. Review.

Non-synonymous variation and protein structure of candidate genes associated with selection in farm and wild populations of turbot (Scophthalmus maximus).
Andersen Ø, Rubiolo JA, Pirolli D, Aramburu O, Pampín M, Righino B, Robledo D, Bouza C, De Rosa MC, Martínez P. Sci Rep. 2023 Feb 21;13(1):3019. doi: 10.1038/s41598-023-29826-z. PMID: 36810752. Free PMC article.
