Mol Ecol Resour. 2025 Jul;25(5):e13844.
doi: 10.1111/1755-0998.13844. Epub 2023 Aug 1.

Easy-to-use R functions to separate reduced-representation genomic datasets into sex-linked and autosomal loci, and conduct sex assignment

Diana A Robledo-Ruiz et al. Mol Ecol Resour. 2025 Jul.

Abstract

Identifying sex-linked markers in genomic datasets is important because their presence in supposedly neutral autosomal datasets can result in incorrect estimates of genetic diversity, population structure and parentage. However, detecting sex-linked loci can be challenging, and available scripts neglect some categories of sex-linked variation. Here, we present new R functions to (1) identify and separate sex-linked loci in ZW and XY sex determination systems and (2) infer the genetic sex of individuals based on these loci. We tested these functions on genomic data for two bird and one mammal species and compared the biological inferences made before and after removing sex-linked loci using our function. We found that our function identified autosomal loci with ≥98.8% accuracy and sex-linked loci with an average accuracy of 87.8%. We showed that standard filters, such as low read depth and call rate, failed to remove up to 54.7% of sex-linked loci. This led to (i) overestimation of population FIS by up to 24%, and the number of private alleles by up to 8%; (ii) wrongly inferring significant sex differences in heterozygosity; (iii) obscuring genetic population structure and (iv) inferring ~11% fewer correct parentages. We discuss how failure to remove sex-linked markers can lead to incorrect biological inferences (e.g. sex-biased dispersal and cryptic population structure) and misleading management recommendations. For reduced-representation datasets with at least 15 known-sex individuals of each sex, our functions offer convenient resources to remove sex-linked loci and to sex the remaining individuals (freely available at https://github.com/drobledoruiz/conservation_genomics).

Keywords: COLONY; bioinformatic filtering; molecular sexing; multilocus contigs; sex chromosomes; sex‐linked loci.

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

FIGURE 1
Schematic of the distribution patterns of three types of sex‐linked loci in the ZW sex‐determination system: W‐linked loci are found only in the W chromosome (yellow); Z‐linked loci are found only in the Z chromosome (orange); gametologous loci are present in both chromosomes (green). The same principles apply to the XY sex‐determination system, but males are heterogametic (XY) and females homogametic (XX).
FIGURE 2
Graphical representation of the expected call rate and proportion of heterozygous individuals for autosomal and sex‐linked loci. (a) Autosomal loci (grey) are expected to present roughly the same call rate for males and females. W‐linked loci (yellow) are expected to be called in females but absent in males because males lack a W chromosome. We refer to other loci whose call rate is biased by sex as ‘sex‐biased’ (blue, drawn here for male‐bias in call rate). (b) Autosomal loci (grey) are expected to present roughly the same proportion of heterozygous males and females. For Z‐linked loci (orange), females are expected to be homozygous because they have only one Z chromosome. For gametologous loci (green), males are expected to be homozygous because they have two Z chromosomes, each with the same Z‐associated allele.
FIGURE 3
Plots produced by function filter.sex.linked after being used to identify and remove sex‐linked loci from eastern yellow robin (EYR) genetic data. Top panels: plots of female call rate against male call rate in which each point represents a locus, before (a) and after (b) removing 2639 sex‐linked loci with differential call rates between the sexes. Bottom panels: plots of the proportion of heterozygous females against the proportion of heterozygous males with each point representing a locus, before (c) and after (d) removing 1168 sex‐linked loci with differential heterozygosity between the sexes.
FIGURE 4
The proportion of true sex‐linked loci that function filter.sex.linked was able to identify with a variable number of known‐sex individuals for EYR (a), YTH (b) and LBP (c) datasets. The sex ratio of known‐sex individuals was 1:1, except for ‘all’ which included the whole set of known‐sex individuals (EYR: 352 females and 429 males, YTH: 289 females and 347 males, LBP: 164 females and 212 males). The proportion of individuals that were assigned a definite sex (‘M’ or ‘F’) by function infer.sex using the sex‐linked loci identified with a variable number of known‐sex individuals for EYR (d), YTH (e), and LBP (f) datasets. In black is the accuracy of definite sex assignments.
FIGURE 5
Progression of four types of sex‐linked loci after different SNP filtering steps (‘Standard’ filtering regime) were applied to eastern yellow robin (EYR), yellow‐tufted honeyeater (YTH), and Leadbeater's possum (LBP) datasets. Arrows to the right indicate the percentage of sex‐linked loci (out of the initial 100%) that were removed. Down arrows indicate the percentage of sex‐linked loci (out of the initial 100%) that remain in the dataset.
FIGURE 6
Percentage change of six measures of population genetic diversity after removing sex‐linked loci (AR, allelic richness; FIS, Wright's FIS; He, expected heterozygosity; Ho, observed heterozygosity; P, polymorphism; PA, private alleles). Estimates are given per population of eastern yellow robin (EYR), yellow‐tufted honeyeater (YTH), and Leadbeater's possum (LBP).
FIGURE 7
Principal component analyses (PCA) of the genomic dataset of eastern yellow robin, EYR, before (top panels) and after (bottom panels) removing sex‐linked loci. In (a) and (c), individuals are coloured according to their population. In (b) and (d), individuals are coloured by sex.
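
The call-rate and heterozygosity expectations described in the Figure 2 caption can be sketched in code. What follows is a minimal Python illustration for the ZW system, not the filter.sex.linked R function itself: the names LocusStats and classify_locus_zw and both threshold values are hypothetical, chosen only for illustration, and the decision rule used by the published function is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class LocusStats:
    """Per-locus summaries from known-sex individuals (hypothetical container)."""
    call_rate_f: float  # fraction of females with a genotype call
    call_rate_m: float  # fraction of males with a genotype call
    het_f: float        # fraction of called females that are heterozygous
    het_m: float        # fraction of called males that are heterozygous


def classify_locus_zw(s, near_zero=0.05, min_diff=0.10):
    """Label one locus in a ZW system (females ZW, males ZZ).

    near_zero and min_diff are illustrative assumptions, not values
    taken from the paper.
    """
    # Step 1: compare call rates between the sexes.
    call_diff = s.call_rate_f - s.call_rate_m
    if s.call_rate_m <= near_zero and call_diff > min_diff:
        # Called in females but absent in males, which lack a W chromosome.
        return "W-linked"
    if abs(call_diff) > min_diff:
        # Any other locus whose call rate differs by sex.
        return "sex-biased"

    # Step 2: compare the proportion of heterozygotes between the sexes.
    het_diff = s.het_f - s.het_m
    if s.het_f <= near_zero and -het_diff > min_diff:
        # Females carry a single Z, so they cannot be heterozygous.
        return "Z-linked"
    if s.het_m <= near_zero and het_diff > min_diff:
        # Males carry two copies of the same Z-associated allele.
        return "gametologous"
    return "autosomal"


# Made-up example loci, one per expected pattern.
examples = {
    "locus_a": LocusStats(0.95, 0.00, 0.00, 0.00),
    "locus_b": LocusStats(0.50, 0.90, 0.30, 0.30),
    "locus_c": LocusStats(0.90, 0.92, 0.00, 0.40),
    "locus_d": LocusStats(0.90, 0.90, 0.80, 0.00),
    "locus_e": LocusStats(0.90, 0.90, 0.30, 0.32),
}
labels = {name: classify_locus_zw(stats) for name, stats in examples.items()}
print(labels)
```

For an XY system the same logic would apply with the roles of the sexes swapped, since males are then the heterogametic sex. Fixed cut-offs like these are crude; with real data, sampling noise in small known-sex groups would call for a statistical comparison rather than hard thresholds.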
