PLoS Genet. 2019 Sep 23;15(9):e1008314.
doi: 10.1371/journal.pgen.1008314. eCollection 2019 Sep.

Chromosomal rearrangements as a source of new gene formation in Drosophila yakuba

Nicholas B Stewart et al. PLoS Genet. 2019.

Abstract

The origins of new genes are among the most fundamental questions in evolutionary biology. Our understanding of the ways that new genetic material appears and how that genetic material shapes population variation remains incomplete. De novo genes and duplicate genes are a key source of new genetic material on which selection acts. To better understand the origins of these new gene sequences, we explored the ways that structural variation might alter expression patterns and form novel transcripts. We provide evidence that chromosomal rearrangements are a source of novel genetic variation that facilitates the formation of de novo exons in Drosophila. We identify 51 cases of de novo exon formation created by chromosomal rearrangements in 14 strains of D. yakuba. These new genes inherit transcription start signals and open reading frames when the 5' ends of existing genes are combined with previously untranscribed regions. Such new genes would appear with novel peptide sequences, without the necessity for secondary transitions from non-coding RNA to protein. This mechanism of new peptide formation contrasts with the canonical theory of de novo gene progression, which requires non-coding intermediaries that must acquire new mutations prior to loss via pseudogenization. Hence, these mutations offer a means of de novo gene creation and protein sequence formation in a single mutational step, answering a long-standing open question concerning new gene formation. We further identify gene expression changes to 134 existing genes, indicating that these mutations can alter gene regulation. Population variability for chromosomal rearrangements is considerable, with 2368 rearrangements observed across 14 inbred lines. More rearrangements were identified on the X chromosome than on any of the autosomes, suggesting the X is more susceptible to chromosome alterations. Together, these results suggest that chromosomal rearrangements are a source of variation in populations that is likely to be important to explain genetic and therefore phenotypic diversity.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Example of paired end reads mapped abnormally to the reference genome.
A) The CY17C chromosome was sequenced with paired-end reads with a 325 bp insert size. CY17C has an insertion of sequence from 2L into chromosome arm 3R. B) Each end was then aligned to the reference genome. Left reads that mapped around 3R:902100 had paired right reads mapped to regions near 2L:12261000 (red arrows). Additionally, right reads that mapped around 3R:902600 paired with left reads mapping around 2L:12261150 (blue arrows). This indicates that the region between 2L:12261000–12261150 has been inserted into 3R:902000–902600 in line CY17C. A rearrangement must be supported by at least 4 abnormally mapping read pairs to be considered.
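The support threshold in this caption can be illustrated with a minimal sketch. It assumes the abnormally mapping pairs have already been extracted as (chromosome, position, mate chromosome, mate position) tuples, and it groups them with a simple fixed-width binning by insert size. The binning rule, the function name `call_rearrangements`, and the toy coordinates are assumptions for illustration, not the authors' pipeline; only the 325 bp insert size and the minimum of 4 supporting pairs come from the caption.

```python
from collections import defaultdict

INSERT_SIZE = 325   # insert size stated in the caption
MIN_SUPPORT = 4     # minimum abnormally mapping pairs stated in the caption


def call_rearrangements(pairs, window=INSERT_SIZE, min_support=MIN_SUPPORT):
    """Group discordant pairs into candidate rearrangements.

    pairs: iterable of (chrom1, pos1, chrom2, pos2) tuples.
    Binning by `window` is a simplifying assumption; pairs that straddle a
    bin edge are split across bins, which a real caller would handle.
    """
    clusters = defaultdict(list)
    for chrom1, pos1, chrom2, pos2 in pairs:
        key = (chrom1, pos1 // window, chrom2, pos2 // window)
        clusters[key].append((pos1, pos2))

    calls = []
    for (chrom1, _, chrom2, _), members in clusters.items():
        if len(members) >= min_support:
            left = [p for p, _ in members]
            right = [q for _, q in members]
            calls.append((chrom1, min(left), max(left),
                          chrom2, min(right), max(right), len(members)))
    return calls


# Hypothetical toy input: four pairs linking 3R to 2L, three pairs within 2R.
toy_pairs = [("3R", 902100 + 10 * i, "2L", 12261000 + 10 * i) for i in range(4)]
toy_pairs += [("2R", 7003000 + i, "2R", 9896000 + i) for i in range(3)]
calls = call_rearrangements(toy_pairs)
```

With this toy input only the 3R/2L cluster reaches the threshold; the 2R cluster has three supporting pairs and is dropped.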
Fig 2. Example of paired end sequence reads mapping from RNASeq data.
A) CY17C has an insertion of sequence from 2L into chromosome arm 3R. This insertion placed a previously untranscribed region within a previously transcribed gene. Paired end reads were generated from cDNA. B) The paired end reads of this RNA transcript will map to separate chromosomes on the reference sequence, and split read mapping may be seen at the breakpoints. A total of three misaligned RNASeq pairs and/or split reads are needed for a rearrangement to be considered the formation of a new gene.
Fig 3. New gene formation through genome rearrangement on chromosome 3R and 2L.
By observing RNA sequence depth, we can infer relative expression and identify newly transcribed regions in lines that have rearrangement calls. Relative RNA coverage depth was calculated from Tophat RNASeq alignments by dividing the read depth at each base by the total number of reads mapped. Two regions have 2 genomic rearrangement calls and Tophat fusion calls supporting the formation of a de novo gene. A) Diagram showing the predicted sequence movement based on the Trinity transcript blast. An insertion of the sequence from 2L:12260976 between 3R:902154 and 3R:902563 has moved a segment of previously untranscribed DNA to a region with active transcription on 3R. The RNA transcript assembled by Trinity confirms the observed coverage pattern in RNASeq data. The transcript starts near 3R:902000, the middle section maps between 2L:12260976–12261178, and the final section then maps near 3R:902500. B) and C) The grey coverage lines are RNA sequence coverage from 3 reference RNASeq replicates, which do not have this rearrangement. D) and E) RNA sequence coverage of line CY17C, which has the rearrangement present. F) and G) CY17C has two genomic rearrangement calls between 2L:12260976–12261229 matching with 3R:901825–902154 (red arrows) and 3R:902563–902607 (blue arrows). Grey boxes represent the Trinity transcript aligned to the reference genome.
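The normalization described in this caption (read depth at each base divided by the total number of mapped reads) can be written as a short sketch. The function name `relative_coverage` and the toy depth values are hypothetical; the caption gives only the formula, not the code used.

```python
def relative_coverage(depth_per_base, total_mapped_reads):
    """Return per-base depth divided by the total number of mapped reads.

    depth_per_base: list of raw read depths, one per reference position.
    total_mapped_reads: total reads mapped for the same sample.
    """
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return [depth / total_mapped_reads for depth in depth_per_base]


# Hypothetical toy values: three positions in a sample with 1000 mapped reads.
toy_relative = relative_coverage([0, 10, 20], 1000)
```

Dividing by the sample's total mapped reads puts lines sequenced to different depths on a common scale, which is what allows the CY17C panels to be compared against the reference replicates.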
Fig 4
A) Diagram showing the predicted sequence movement based on the Trinity transcript blast. A rearrangement joining the sequence from 2L:5435633–5436212 902154 and 2L:5435462–5436047 has moved a segment of previously untranscribed DNA to a region with active transcription on 2L. B) and C) The grey coverage lines are RNA sequence coverage from 3 reference RNASeq replicates, which do not have this rearrangement. D) and E) CY28A4 has the rearrangement and increased transcription in region 2L:5435462–5436047. F) and G) CY28A4 has rearrangement calls between 2L:13246986–13247746 matching with 2L:5435633–5436212 (red arrows). Square boxes represent the Trinity transcript aligned to the reference genome. The black boxes represent exons of a preexisting gene (1.g484.t1).
Fig 5
Distribution of new genes per strain identified in testes, male carcass, ovaries, and female carcass based on 14 inbred lines in males (A) and females (B). A total of 51 new genes were identified across all 14 strains in all tissues. We do not see a difference in the number of new genes expressed between male gametic and somatic tissue (ANOVA, F(1,13) = 0.04, P>0.8). While there is a significant difference between ovaries and female tissue (ANOVA, F(1,13) = 4.379, P<0.05), the values are low for each line (including being 0 for multiple samples). This suggests that the male comparison is more indicative of the ratio between somatic and gametic tissue.
Fig 6. Site frequency spectrum of rearrangements found in the 14 lines.
Most of the rearrangements are singletons. However, there is a slight increase in the number of rearrangements found in at least 11 of the 14 lines.
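A site frequency spectrum of this kind counts, for each rearrangement, how many lines carry it, and then tallies how many rearrangements fall at each count. The sketch below assumes each line's calls are available as a set of rearrangement identifiers; the function name, the line names, and the identifiers are hypothetical placeholders rather than data from the study.

```python
from collections import Counter


def site_frequency_spectrum(calls_by_line):
    """Return a list where entry k-1 is the number of rearrangements
    observed in exactly k lines.

    calls_by_line: dict mapping line name -> set of rearrangement IDs.
    """
    lines_carrying = Counter()
    for calls in calls_by_line.values():
        lines_carrying.update(set(calls))  # count each line at most once

    spectrum = [0] * len(calls_by_line)
    for n_lines in lines_carrying.values():
        spectrum[n_lines - 1] += 1
    return spectrum


# Hypothetical toy input with three lines instead of the study's 14.
toy_calls = {
    "lineA": {"r1", "r2"},
    "lineB": {"r2"},
    "lineC": {"r2", "r3"},
}
toy_sfs = site_frequency_spectrum(toy_calls)
```

In the toy input, `r1` and `r3` are singletons and `r2` is present in all three lines, so the first and last entries of the spectrum are nonzero.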
Fig 7. Number of rearrangement breakpoints per base pair on each chromosome arm for inbred lines of D. yakuba.
The total number of rearrangement sites on each chromosome varied (ANOVA, F(4,52) = 43.42, P<10^−15). This is mostly due to the fact that the X chromosome has significantly more rearrangement breakpoints than the autosomes (Tukey HSD for each comparison involving the X, P<10^−6). Chromosome 3R had significantly fewer rearrangements than the X, 2L and 2R (Tukey HSD, P < 0.05).
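The per-base-pair normalization in the figure title amounts to dividing each arm's breakpoint count by that arm's length, so that longer arms are not favored simply for being longer. A minimal sketch follows; the function name, the counts, and the arm lengths are hypothetical placeholders, not values from the study or from the D. yakuba assembly.

```python
def breakpoint_density(breakpoint_counts, arm_lengths_bp):
    """Return breakpoints per base pair for each chromosome arm.

    breakpoint_counts: dict arm -> number of rearrangement breakpoints.
    arm_lengths_bp: dict arm -> arm length in base pairs.
    """
    return {arm: count / arm_lengths_bp[arm]
            for arm, count in breakpoint_counts.items()}


# Hypothetical counts and lengths for one line, for illustration only.
toy_density = breakpoint_density(
    {"X": 50, "3R": 10},
    {"X": 1_000_000, "3R": 2_000_000},
)
```

Computing one density per arm per line gives the kind of per-line values that a between-arm comparison such as the ANOVA in this caption could be run on.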
Fig 8. Many rearrangements lie in the same region making it hard to fully elucidate the nature of a particular rearrangement.
For instance, line CY21B3 has 5 rearrangement calls (represented by the connecting lines of the two large sections of chromosome 2R) associated with two regions, 2R:7002500–7005500 and 2R:9895000–9902000. Four separate small regions within 2R:7002500–7005500, separated by at least 1 sequencing insert size (325 bp), have reads that pair with 3 separate small regions within 2R:9895000–9902000. All the lines show at least one of these rearrangements, but generally each line has 2–3 separate rearrangement calls between regions 2R:7002500–7005500 and 2R:9895000–9902000. The 9.9 Mb breakpoint lies close to the known inversion breakpoint on 2R where recombination is suppressed.

