PLoS One. 2017 Sep 8;12(9):e0184087.

doi: 10.1371/journal.pone.0184087. eCollection 2017.

SEXCMD: Development and validation of sex marker sequences for whole-exome/genome and RNA sequencing

Seongmun Jeong¹, Jiwoong Kim², Won Park¹, Hongmin Jeon¹, Namshin Kim¹

Affiliations

¹ Personalized Genomic Medicine Research Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon, Korea.
² Quantitative Biomedical Research Center, Department of Clinical Sciences, University of Texas Southwestern Medical Center, Dallas, TX, United States of America.

PMID: 28886064
PMCID: PMC5590872
DOI: 10.1371/journal.pone.0184087

Seongmun Jeong et al. PLoS One. 2017.

Abstract

Over the last decade, a large number of nucleotide sequences have been generated by next-generation sequencing technologies and deposited in public databases. However, most of these datasets do not specify the sex of the individuals sampled because researchers typically ignore or hide this information. Male and female genomes in many species have distinctive sex chromosomes (XX/XY or ZW/ZZ), and expression levels of many sex-related genes differ between the sexes. Herein, we describe how to develop sex marker sequences from syntenic regions of sex chromosomes and use them to quickly identify the sex of individuals being analyzed. Array-based technologies routinely use either known sex markers or the B-allele frequency of X or Z chromosomes to deduce the sex of an individual. The same strategy has been used with whole-exome/genome sequence data; however, all reads must be aligned to a reference genome to determine the B-allele frequency of the X or Z chromosomes. SEXCMD is a pipeline that extracts sex marker sequences from reference sex chromosomes and, after training on a dataset of known sex through a simple machine learning approach, rapidly identifies the sex of individuals from whole-exome/genome and RNA sequencing data. The pipeline counts the total number of hits to sex-specific marker sequences and identifies the sex of the individuals sampled based on the fact that XX/ZZ samples do not have Y or W chromosome hits. We have successfully validated our pipeline with mammalian (Homo sapiens; XY) and avian (Gallus gallus; ZW) genomes. Typical calculation time when applying SEXCMD to human whole-exome or RNA sequencing datasets is a few minutes, and analyzing human whole-genome datasets takes about 10 minutes. Another important application of SEXCMD is as a quality control measure to avoid mixing samples before bioinformatics analysis. SEXCMD comprises simple Python and R scripts and is freely available at https://github.com/lovemun/SEXCMD.
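
The counting rule described in the abstract can be illustrated with a short Python sketch. This is not the SEXCMD implementation: the function name `call_sex`, the 5% fraction cutoff, and the minimum of 20 total hits are assumptions made for the example, whereas the pipeline itself learns its decision rule from a training dataset of known sex.

```python
# Hypothetical sketch: call sex from marker hit counts.
# Assumption: hits are already counted per marker class; the cutoffs
# below are illustrative, not values taken from SEXCMD.

# Sex of the homogametic (XX or ZZ) and heterogametic (XY or ZW) genotype.
HOMOGAMETIC = {"XY": "female", "ZW": "male"}
HETEROGAMETIC = {"XY": "male", "ZW": "female"}


def call_sex(shared_hits: int, limited_hits: int, system: str = "XY",
             min_fraction: float = 0.05, min_total: int = 20) -> str:
    """Return 'male', 'female', or 'undetermined'.

    shared_hits  -- reads hitting X (or Z) marker sequences
    limited_hits -- reads hitting Y (or W) marker sequences
    system       -- 'XY' (e.g. human) or 'ZW' (e.g. chicken)
    """
    if system not in HOMOGAMETIC:
        raise ValueError(f"unknown sex-determination system: {system!r}")
    total = shared_hits + limited_hits
    if total < min_total:
        # Too few marker hits to make a call.
        return "undetermined"
    # XX/ZZ samples should show (almost) no Y/W hits.
    fraction = limited_hits / total
    if fraction >= min_fraction:
        return HETEROGAMETIC[system]
    return HOMOGAMETIC[system]
```

For example, `call_sex(500, 0, "XY")` yields a female call because no Y marker hits are present, while `call_sex(200, 180, "ZW")` yields a female call because W marker hits are abundant.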

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

**Fig 1. Procedure for extracting sex-specific marker sequences.**
Two sex chromosomes were aligned with each other using LASTZ, and syntenic regions with polymorphisms were extracted. Final sex-specific marker sequences were selected after removal of similar sequences (90% identity by BLAST).
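
The selection step in Fig 1 can be approximated with a toy Python example. The caption does not say what the 90% identity cutoff is compared against, so this sketch assumes it applies to the two aligned copies of a candidate region; the function names, the fixed window size, and the use of a plain character comparison in place of LASTZ and BLAST are all assumptions made for illustration.

```python
# Hypothetical sketch: pick candidate marker windows from a pairwise alignment.
# Assumption: aln_x and aln_y are equal-length aligned strings ('-' marks a gap).
from typing import List, Tuple


def percent_identity(a: str, b: str) -> float:
    """Percentage of aligned columns where both sequences carry the same base."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


def candidate_windows(aln_x: str, aln_y: str, window: int = 10,
                      max_identity: float = 90.0) -> List[Tuple[int, str, str]]:
    """Keep non-overlapping, gap-free windows whose identity is below max_identity."""
    kept = []
    for start in range(0, len(aln_x) - window + 1, window):
        wx = aln_x[start:start + window]
        wy = aln_y[start:start + window]
        if "-" in wx or "-" in wy:
            continue  # skip windows that contain alignment gaps
        if percent_identity(wx, wy) < max_identity:
            kept.append((start, wx, wy))
    return kept
```

Windows that are identical, or that differ at too few positions, are dropped because reads from the two chromosomes could not be told apart reliably on those sequences.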

**Fig 2. Average read counts for each marker by input number of million sequence reads (log10) for human (hg38) datasets.**
Red arrows indicate minimum read counts: 5 million (5×10⁶) reads for whole-exome sequencing and RNA sequencing and 100 million (1×10⁸) reads for whole-genome sequencing. The red horizontal line denotes the minimum average read counts of sex-specific marker sequences.
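
The minimum read counts marked in Fig 2 could be applied as a simple pre-check before sex calling. In this Python sketch the thresholds come from the caption, while the function name `has_enough_reads` and the assay labels are hypothetical.

```python
# Hypothetical sketch: check a dataset against the Fig 2 minimum read counts.
MIN_READS = {
    "WES": 5_000_000,    # whole-exome sequencing
    "RNA": 5_000_000,    # RNA sequencing
    "WGS": 100_000_000,  # whole-genome sequencing
}


def has_enough_reads(n_reads: int, assay: str) -> bool:
    """Return True if n_reads meets the minimum for the given assay type."""
    if assay not in MIN_READS:
        raise ValueError(f"unknown assay type: {assay!r}")
    return n_reads >= MIN_READS[assay]
```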

Cited by

Considerations and challenges for sex-aware drug repurposing.
Fisher JL, Jones EF, Flanary VL, Williams AS, Ramsey EJ, Lasseigne BN. Biol Sex Differ. 2022 Mar 25;13(1):13. doi: 10.1186/s13293-022-00420-8. PMID: 35337371. Free PMC article. Review.
Novel human sex-typing strategies based on the autism candidate gene NLGN4X and its male-specific gametologue NLGN4Y.
Maxeiner S, Sester M, Krasteva-Christ G. Biol Sex Differ. 2019 Dec 18;10(1):62. doi: 10.1186/s13293-019-0279-x. PMID: 31852540. Free PMC article.
The genomic prehistory of the Indigenous peoples of Uruguay.
Lindo J, De La Rosa R, Santos ALCD, Sans M, DeGiorgio M, Figueiro G. PNAS Nexus. 2022 Apr 21;1(2):pgac047. doi: 10.1093/pnasnexus/pgac047. eCollection 2022 May. PMID: 36713318. Free PMC article.
Bioinformatics services for analyzing massive genomic datasets.
Ko G, Kim PG, Cho Y, Jeong S, Kim JY, Kim KH, Lee HY, Han J, Yu N, Ham S, Jang I, Kang B, Shin S, Kim L, Lee SW, Nam D, Kim JF, Kim N, Kim SY, Lee S, Roh TY, Lee B. Genomics Inform. 2020 Mar;18(1):e8. doi: 10.5808/GI.2020.18.1.e8. Epub 2020 Mar 31. PMID: 32224841. Free PMC article.
Alternatives to amelogenin markers for sex determination in humans and their forensic relevance.
Dash HR, Rawat N, Das S. Mol Biol Rep. 2020 Mar;47(3):2347-2360. doi: 10.1007/s11033-020-05268-y. Epub 2020 Jan 25. PMID: 31983014. Review.
