BMC Bioinformatics. 2015 Mar 11;16(1):78.
doi: 10.1186/s12859-015-0509-0.

Fully automated pipeline for detection of sex linked genes using RNA-Seq data

Monika Michalovova et al. BMC Bioinformatics. 2015.

Abstract

Background: Sex chromosomes represent a genomic region that, to some extent, differs between the genders of a single species. Reliable high-throughput methods for detecting sex chromosome-specific markers are needed, especially in species where genome information is limited. Next generation sequencing (NGS) opens the door to identifying unique sequences or searching for nucleotide polymorphisms between datasets. Combining classical genetic segregation analysis with RNA-Seq data can provide an ideal tool to map and identify sex chromosome-specific expressed markers. To address this challenge, we established a genetic cross of the dioecious plant Rumex acetosa and generated RNA-Seq data from both the parental generation and the male and female offspring.

Results: We present a pipeline for detection of sex-linked genes based on nucleotide polymorphism analysis. In our approach, nucleotide polymorphisms are tracked through a cross of preferably distant populations. For this reason, only 4 datasets are needed - reads from high-throughput sequencing platforms for the parental generation (mother and father) and the F1 generation (male and female progeny). Our pipeline uses custom scripts together with external assembly, mapping and variant calling software. Because the computation is resource-intensive, high-capacity servers are required. Therefore, to keep the pipeline easily accessible and reproducible, we implemented it in Galaxy - an open, web-based platform for data-intensive biomedical research. Our tools are available in the Galaxy Tool Shed, from which they can be installed to any local Galaxy instance. As output, the user gets a FASTA file with candidate transcriptionally active sex-linked genes, sorted by their relevance. A BAM file with the identified genes and the read alignments is also provided. Polymorphisms that follow the segregation pattern can thus be easily visualized, which significantly aids primer design and the subsequent steps of wet-lab verification.

Conclusions: Our pipeline is a simple and freely accessible software tool for identification of sex chromosome-linked genes in species without an existing reference genome. Based on a combination of genetic crosses and RNA-Seq data, we have designed a high-throughput, cost-effective approach for a broad community of scientists focused on sex chromosome structure and evolution.

Figures

Figure 1
Segregation patterns. All possible combinations of sex-linked alleles in the parental generation are listed. Alleles on the X chromosome exhibit criss-cross inheritance, while alleles on the Y chromosome can only be transmitted from father to son (holandric inheritance). Segregation patterns are used as the starting point for the design of filtering rules.
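The inheritance rules in this caption can be enumerated with a few lines of code. The sketch below is only an illustration of criss-cross and holandric inheritance; the function name `offspring_genotypes` and the allele labels are hypothetical and do not come from the pipeline.

```python
def offspring_genotypes(mother_x, father_x, father_y="Y"):
    """List the sex-chromosome genotypes possible in daughters and sons.

    mother_x: tuple with the mother's two X alleles, e.g. ("X1", "X1")
    father_x: the father's single X allele
    father_y: the father's Y allele (label is an assumption for illustration)
    """
    # Daughters receive one maternal X plus the father's only X.
    daughters = sorted({tuple(sorted((m, father_x))) for m in mother_x})
    # Sons receive one maternal X plus the father's Y (holandric inheritance).
    sons = sorted({(m, father_y) for m in mother_x})
    return daughters, sons


daughters, sons = offspring_genotypes(("X1", "X1"), "X2")
print(daughters)  # [('X1', 'X2')]
print(sons)       # [('X1', 'Y')]
```

With a homozygous mother and a father carrying a different X allele, every daughter is heterozygous and every son carries only the maternal X allele, which is the situation a filtering rule can test for.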
Figure 2
Simplified workflow. The mother's reads are assembled with the Trinity assembler, and the newly created contigs are then used as a reference to which all reads are mapped. This yields 4 BAM files storing the alignments. After applying filters, X-linked contigs are identified. For detection of Y-linked genes, reference contigs are assembled from male reads (male parent and male progeny together).
Figure 3
Schematic representation of identification of X-linked genes exhibiting the A pattern. The mother is homozygous (X1X1); the father carries a different variant (X2). Approximately half of the daughter's reads carry the father's variant (SNP), while sons inherit the mother's variant. Variants found only in males, which suggest a Y origin, are ignored.
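The A pattern described in this caption can be expressed as a per-site check on allele fractions. This is a minimal sketch under stated assumptions, not the pipeline's actual filter: the function name `matches_pattern_a`, the use of variant-read fractions as input, and all threshold values are hypothetical.

```python
def matches_pattern_a(mother, father, daughter, son,
                      tol=0.05, min_father=0.3, het_low=0.3, het_high=0.7):
    """Return True if one site fits X-linked pattern A.

    Each of the first four arguments is the fraction (0.0 to 1.0) of that
    sample's reads carrying the father's variant at the site.
    All thresholds are illustrative assumptions, not values from the paper.
    """
    mother_hom = mother <= tol                      # mother X1X1: no variant reads
    # Father carries the variant on his X; reads from a Y copy, if any map
    # here, may dilute the fraction, so only a lower bound is required.
    father_has_variant = father >= min_father
    daughter_het = het_low <= daughter <= het_high  # daughter X1X2: about half
    son_maternal = son <= tol                       # son carries the mother's variant only
    return mother_hom and father_has_variant and daughter_het and son_maternal


print(matches_pattern_a(mother=0.0, father=1.0, daughter=0.5, son=0.0))  # True
print(matches_pattern_a(mother=0.0, father=1.0, daughter=0.5, son=0.5))  # False
```

A site where the son also shows the father's variant fails the check, since that would be inconsistent with criss-cross inheritance of an X-linked allele.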
Figure 4
Schematic representation of identification of Y-linked genes. Reference contigs are assembled from male reads only. Female reads therefore do not map to contigs of Y-chromosome origin, while male reads do. The short fragments shown in red represent a limited tolerance for mapping/assembly errors.
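The Y-linked rule in this caption amounts to a coverage comparison with a small tolerance for spurious female alignments. The sketch below assumes per-contig counts of covered bases are already available; the function name `is_y_candidate` and both thresholds are hypothetical choices for illustration, not parameters of the pipeline.

```python
def is_y_candidate(male_covered, female_covered, contig_length,
                   min_male_frac=0.9, max_female_frac=0.05):
    """Return True if a male-assembled contig looks Y-linked by coverage.

    male_covered / female_covered: number of contig bases covered by at
    least one male / female read. Thresholds are illustrative assumptions.
    """
    if contig_length <= 0:
        raise ValueError("contig_length must be positive")
    male_frac = male_covered / contig_length
    female_frac = female_covered / contig_length
    # Male reads should cover the contig; female reads may cover only a small
    # fraction, tolerated as mapping or assembly error.
    return male_frac >= min_male_frac and female_frac <= max_female_frac


print(is_y_candidate(male_covered=950, female_covered=20, contig_length=1000))   # True
print(is_y_candidate(male_covered=950, female_covered=600, contig_length=1000))  # False
```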
Figure 5
Experimental laboratory verification of sex-linked contigs. A) PCR products of candidate X-linked contigs were sequenced and clustered. Father and daughter variants cluster together, which confirms X-linkage of the selected gene. Note that the SNP marked in blue is shared only among the sequences of the father and daughters. Another SNP, marked in yellow, represents a sequencing error. B) PCR products of candidate Y-linked contigs. For every contig/gene, genomic DNA of 7 individuals (4 males, 3 females) was used as a template: father (Reckovice, CZ), brother of father (Reckovice, CZ), two sisters of father (Reckovice, CZ), mother (Almería, Spain), two brothers of mother (Almería, Spain). A product is present only in male individuals.
