Genome Res. 2015 Mar;25(3):445-58.
doi: 10.1101/gr.185579.114. Epub 2015 Jan 14.

The Release 6 reference sequence of the Drosophila melanogaster genome



Roger A Hoskins et al. Genome Res. 2015 Mar.

Abstract

Drosophila melanogaster plays an important role in molecular, genetic, and genomic studies of heredity, development, metabolism, behavior, and human disease. The initial reference genome sequence reported more than a decade ago had a profound impact on progress in Drosophila research, and improving the accuracy and completeness of this sequence continues to be important to further progress. We previously described improvement of the 117-Mb sequence in the euchromatic portion of the genome and 21 Mb in the heterochromatic portion, using a whole-genome shotgun assembly, BAC physical mapping, and clone-based finishing. Here, we report an improved reference sequence of the single-copy and middle-repetitive regions of the genome, produced using cytogenetic mapping to mitotic and polytene chromosomes, clone-based finishing and BAC fingerprint verification, ordering of scaffolds by alignment to cDNA sequences, incorporation of other map and sequence data, and validation by whole-genome optical restriction mapping. These data substantially improve the accuracy and completeness of the reference sequence and the order and orientation of sequence scaffolds into chromosome arm assemblies. Representation of the Y chromosome and other heterochromatic regions is particularly improved. The new 143.9-Mb reference sequence, designated Release 6, effectively exhausts clone-based technologies for mapping and sequencing. Highly repeat-rich regions, including large satellite blocks and functional elements such as the ribosomal RNA genes and the centromeres, are largely inaccessible to current sequencing and assembly methods and remain poorly represented. Further significant improvements will require sequencing technologies that do not depend on molecular cloning and that produce very long reads.


Figures

Figure 1.
FISH mapping of BACs on mitotic and polytene chromosomes. (A–C) BAC fluorescent hybridization signals (red, yellow) on mitotic chromosomes stained with DAPI (blue) are shown in pseudocolored images. Arrows indicate numbered divisions in the cytogenetic map of the pericentric and centric heterochromatin of mitotic chromosomes. Scale bar (A) indicates 3 µm. (A) BACR18B19 represents the previously unmapped Release 5 scaffold AABU01001089 and maps to division h26 on the X chromosome. (B) CH221-23H11 (red) represents the proximal end of the Release 5 arm X sequence, and CH223-33C13 (yellow) represents the distal end of the Release 5 XHet scaffold CP000208. Their signals overlap in h26p-h27; “p” indicates a proximal location within cytogenetic division h26. (C) CH223-01L14 represents the previously unmapped Release 5 scaffolds AABU01002700, AABU01002715, and AABU01001895 and produces a strong signal in h11-h14 on the Y chromosome. The long arm (YL) and short arm (YS) of the Y chromosome are indicated. (D–G) BAC and gene fluorescent hybridization signals (red, green) on polytene chromosomes of the wm4, SuUR Su(var)3-906 strain stained with DAPI (blue). Scale bar (D) indicates 3 µm. (D) BACR11F21 (red) and BACR30J23 (green) represent opposite ends of the Release 5 scaffold CP000188 and map to the proximal part of region 41A in the pericentric heterochromatin of chromosome arm 2R. BACR30J23 localizes distal to BACR11F21, orienting the scaffold. (E) BACR06D11 (red) represents the previously unmapped Release 5 scaffold CP000194 and localizes in the 20BC region of the X chromosome, proximal to DIP1 (green) in 20A. (F) BACR18B19 (green) represents the previously unmapped Release 5 scaffold AABU01001089 and also localizes in the 20BC region, distal to CG14621 (red) in 20C. (G) BACR06D11 and BACR18B19 signals overlap, but the strongest BACR06D11 signal is distal to the strongest BACR18B19 signal, suggesting their relative order. Sequence finishing shows that these BACs overlap each other by 40 kb.
Figure 2.
The Release 6 chromosome arm sequences. Schematic representation of the Release 6 chromosome arm sequences (horizontal white bars). The boundary between euchromatin and pericentric heterochromatin on each arm (Riddle et al. 2011) is indicated (red vertical lines). Clone gaps between sequence scaffolds (vertical black lines) are labeled with their cytogenetic locations on the polytene chromosome map. (Below) Color-coding indicates the sequence release at which each BAC-based sequence was finalized: Release 3 (gray), Release 4 (green), Release 5 (purple), Release 6 (black). Sequences finished in WGS3-based scaffolds are indicated (blue).
Figure 3.
Myosin 81F links five Release 6 sequence scaffolds in pericentric heterochromatin. (A) The cDNA MIP31562 defines a new gene Myosin 81F that spans >2.5 Mb in the pericentric heterochromatin of chromosome arm 3R. The cDNA sequence was used in assembling five Release 5 BAC contigs, three unmapped (white boxes) and two on 3RHet (shaded boxes), into a series of five ordered and oriented Release 6 sequence scaffolds (Supplemental Fig. S14, 3RHet.9–13). Four unsized clone gaps between the scaffolds are indicated by double diagonal lines and map within introns of the gene. MIP31562 is 6604 bp in length, and its genomic alignment defines 22 exons. (B) The long ORF of MIP31562 encodes a 2082-aa protein with a myosin motor domain, three IQ motifs, and two sets of MyTH4 (Myosin Tail Homology 4), Ubiquitin (UBQ), FERM (4.1 protein, Ezrin, Radixin, and Moesin), and PH-like (Pleckstrin Homology-like) domains. The UBQ and FERM domains together are known as the multidomain Band 4.1 (B41) (Sellers 2000).
Figure 4.
Genes in the Release 6 assembly of the Y chromosome. The locations of genes on the mitotic cytogenetic map of the long (YL) and short (YS) arms of the Y chromosome are indicated. The chromosome is divided into 25 heterochromatic regions, h1 through h25, and the location of the centromere (C) is indicated. The locations of highly repetitive sequence blocks at the Su(Ste) locus, the 14HT satellite, the 18HT satellite, and the rDNA locus are indicated. Genes newly represented in the Release 6 assembly are Pp1-Y1, polycystine-related-Y (PRY), Aldehyde reductase Y (ARY), WD40 Y (WDY), flagrante delicto Y (FDY), Mst35Y, Mst77FY1-18, and Coiled-Coils Y (CCY), and those partially represented in Release 5 and completely represented in Release 6 are male fertility factor kl-5 (kl-5), male fertility factor kl-3 (kl-3), male fertility factor kl-2 (kl-2), Ppr-Y, Pp1-Y2, and Occludin-Related Y (ORY). The FDY gene has been tentatively placed in region h16. Cytogenetic map locations of the genes with citations are indicated in the text.
Figure 5.
Measurement of three euchromatic clone gaps by whole-genome optical restriction mapping. Alignments of the Release 6 genomic sequence (Rel6) to whole-genome optical restriction map contigs (Contig) at clone gaps between sequence scaffolds (arrows) are diagrammed. Aligned NheI restriction fragments (shaded boxes), unaligned fragments (white boxes), and alignment points (lines connecting NheI restriction sites in the sequence and the map) are indicated. The euchromatic clone gaps at (A) 2R: 57B, (B) 3L: 64C, and (C) 4: 102F are spanned by whole-genome map contigs, providing estimates of the gap sizes (Supplemental Table S3).
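To make the gap-sizing idea in Figure 5 concrete, here is a minimal, simplified sketch. It is not the alignment or sizing procedure used in the paper. It assumes an idealized, error-free case: the sequence is digested in silico at NheI sites (GCTAGC, cut as G^CTAGC), and the optical map supplies one distance, `map_distance`, between the map site aligned to the last NheI cut in the left scaffold and the map site aligned to the first NheI cut in the right scaffold. The function names and this single-distance input are hypothetical. The gap is estimated as that map distance minus the sequence lying between each aligned cut and the gap.

```python
import re

# NheI recognizes GCTAGC and cuts between G and C (G^CTAGC).
NHEI_SITE = "GCTAGC"
NHEI_CUT_OFFSET = 1


def nhei_cut_positions(seq: str) -> list[int]:
    """Return 0-based NheI cut coordinates in seq.

    The site is palindromic, so scanning one strand is enough. A lookahead
    finds overlapping sites (e.g. GCTAGCTAGC contains two).
    """
    seq = seq.upper()
    return [m.start() + NHEI_CUT_OFFSET for m in re.finditer(f"(?={NHEI_SITE})", seq)]


def estimate_gap(left_scaffold: str, right_scaffold: str, map_distance: int) -> int:
    """Hypothetical gap estimate under an idealized, error-free model.

    map_distance: optical-map distance (bp) between the map site aligned to
    the last NheI cut in left_scaffold and the map site aligned to the first
    NheI cut in right_scaffold.
    """
    left_cuts = nhei_cut_positions(left_scaffold)
    right_cuts = nhei_cut_positions(right_scaffold)
    if not left_cuts or not right_cuts:
        raise ValueError("each scaffold needs at least one NheI site")
    left_tail = len(left_scaffold) - left_cuts[-1]  # bases after the last cut
    right_head = right_cuts[0]                      # bases before the first cut
    return map_distance - left_tail - right_head
```

For example, a 20-bp left scaffold whose only cut is at position 5 leaves 15 bp before the gap. An 11-bp right scaffold whose first cut is at position 4 contributes 4 bp after the gap. With a map distance of 100 bp between the aligned sites, the estimate is 100 - 15 - 4 = 81 bp. Real optical maps carry fragment-sizing error, and sites can be missing from the map or the sequence, so actual estimates come from statistical alignment rather than this direct subtraction.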

