Genetics. 2017 Jun;206(2):537-556.
doi: 10.1534/genetics.116.198838.

Genomes of the Mouse Collaborative Cross

Anuj Srivastava et al. Genetics. 2017 Jun.

Abstract

The Collaborative Cross (CC) is a multiparent panel of recombinant inbred (RI) mouse strains derived from eight founder laboratory strains. RI panels are popular because of their long-term genetic stability, which enhances reproducibility and integration of data collected across time and conditions. Characterization of their genomes can be a community effort, reducing the burden on individual users. Here we present the genomes of the CC strains using two complementary approaches as a resource to improve power and interpretation of genetic experiments. Our study also provides a cautionary tale regarding the limitations imposed by such basic biological processes as mutation and selection. A distinct advantage of inbred panels is that genotyping only needs to be performed on the panel, not on each individual mouse. The initial CC genome data were haplotype reconstructions based on dense genotyping of the most recent common ancestors (MRCAs) of each strain followed by imputation from the genome sequence of the corresponding founder inbred strain. The MRCA resource captured segregating regions in strains that were not fully inbred, but it had limited resolution in the transition regions between founder haplotypes, and there was uncertainty about founder assignment in regions of limited diversity. Here we report the whole genome sequence of 69 CC strains generated by paired-end short reads at 30× coverage of a single male per strain. Sequencing leads to a substantial improvement in the fine structure and completeness of the genomes of the CC. Both MRCAs and sequenced samples show a significant reduction in the genome-wide haplotype frequencies from two wild-derived strains, CAST/EiJ and PWK/PhJ. In addition, analysis of the evolution of the patterns of heterozygosity indicates that selection against three wild-derived founder strains played a significant role in shaping the genomes of the CC. 
The sequencing resource provides the first description of tens of thousands of new genetic variants introduced by mutation and drift in the CC genomes. We estimate that new SNP mutations are accumulating in each CC strain at a rate of 2.4 ± 0.4 per gigabase per generation. The fixation of new mutations by genetic drift has introduced thousands of new variants into the CC strains. The majority of these mutations are novel compared to currently sequenced laboratory stocks and wild mice, and some are predicted to alter gene function. Approximately one-third of the CC inbred strains have acquired large deletions (>10 kb) many of which overlap known coding genes and functional elements. The sequence of these mice is a critical resource to CC users, increases threefold the number of mouse inbred strain genomes available publicly, and provides insight into the effect of mutation and drift on common resources.

Keywords: MPP; drift; genetic variants; multiparental populations; selection; whole genome sequence.

Figures

Figure 1
The CC genomes. In all figures we use the following colors and letter codes to represent the eight founder strains of the CC: A/J, yellow (A); C57BL/6J, gray (B); 129S1/SvImJ, pink (C); NOD/ShiLtJ, dark blue (D); NZO/HlLtJ, light blue (E); CAST/EiJ, green (F); PWK/PhJ, red (G); and WSB/EiJ, purple (H). (A) Haplotype mosaic for the sequenced representative of the CC001/Unc strain. (B) Number of haplotype blocks identified in the MRCA and sequenced samples. (C) Distribution of haplotype block size in MRCAs and sequenced samples in log scale. (D) Founder contribution to the genomes of CC strains with all eight founders. Autosomes are shown in the left panel and chromosome X in the right. Within a panel and founder strain the left boxes represent MRCAs and the right, the sequenced sample. (E) Founder contribution to the genomes of CC strains with missing founders. Autosomes are shown in the left panel and chromosome X in the right. Within a panel and founder strain the left boxes represent MRCAs and the right, the sequenced sample.
Figure 2
Sequencing improves haplotype assignment in recombination intervals. (A) Haplotype reconstruction for chromosome 5 from MRCAs of CC044/GeniUnc. The focal recombination event is indicated by a gray box. (B) Zoomed-in view of recombination interval, showing the flanking informative markers from the MegaMUGA genotyping array. Haplotype assignment in the MRCAs is uncertain over 11.7 kb. (C) Alleles in the sequenced CC044/GeniUnc male shared with PWK/PhJ (top track) or C57BL/6J (bottom track); inferred recombination interval is indicated by a gray box. (D) Genotypes at informative SNPs between PWK/PhJ and C57BL/6J reduce the recombination interval to 298 bp, between rs32922813 and rs32922811.
Figure 3
Biased contribution of the CC founders to the residual heterozygosity present in the MRCAs and sequenced samples. The x-axis shows log ratio of observed to expected proportion of the genome in each of 28 possible heterozygous states (y-axis) across 56 CC strains with all eight founder haplotypes present. Heterozygous states are divided into two classes: those involving classical inbred strains only (top) or those involving at least one wild-derived strain (bottom). Black dotted line gives expected value of the statistic (zero), and gray dashed lines show median value in each panel.
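The 28 heterozygous states in this figure are the unordered pairs of the eight founder haplotypes. The short Python sketch below lists them using the letter codes from the Figure 1 legend and splits them into the two classes the caption describes. It is only an illustration, not the authors' analysis code: the function names are hypothetical, and the base-10 logarithm is an assumption because the caption does not state the base of the log ratio.

```python
from itertools import combinations
from math import log10

# Founder letter codes as given in the Figure 1 legend.
FOUNDERS = {
    "A": "A/J",
    "B": "C57BL/6J",
    "C": "129S1/SvImJ",
    "D": "NOD/ShiLtJ",
    "E": "NZO/HlLtJ",
    "F": "CAST/EiJ",
    "G": "PWK/PhJ",
    "H": "WSB/EiJ",
}

# The three wild-derived founders; the other five are classical inbred strains.
WILD_DERIVED = {"F", "G", "H"}


def heterozygous_states():
    """Return all unordered founder pairs, e.g. 'AB', 'AC', ..., 'GH'."""
    return ["".join(pair) for pair in combinations(sorted(FOUNDERS), 2)]


def split_by_class(states):
    """Split states into classical-only pairs and pairs with a wild-derived founder."""
    classical_only = [s for s in states if not set(s) & WILD_DERIVED]
    with_wild = [s for s in states if set(s) & WILD_DERIVED]
    return classical_only, with_wild


def log_ratio(observed, expected):
    """Log ratio of observed to expected proportion (base 10 is an assumption)."""
    return log10(observed / expected)


states = heterozygous_states()
classical_only, with_wild = split_by_class(states)
```

Under this enumeration, 10 of the 28 states involve classical inbred strains only and 18 involve at least one wild-derived strain. A log ratio of zero means the observed proportion equals the expected one, which is the value marked by the black dotted line in the figure.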
Figure 4
Haplotype frequencies on chromosomes 2, 12, and X in MRCAs and sequenced samples. The analysis is restricted to the 56 CC strains with all eight founder strains present.
Figure 5
Frequency of private variants in 69 CC strains. (A) The log10 frequency per gigabase of SNPs and indels by chromosome (text) and haplotype (color) reveals that wild-derived haplotypes have higher apparent rates of private variation. (B) The strain-specific frequency of SNPs on nonwild autosomal haplotypes was estimated by Poisson regression. The frequency per gigabase of private SNPs increases with the breeding generation of the sequenced animal. The slope of the regression line (2.4 SNPs per gigabase per generation) provides an estimate of the rate of accumulation of new SNPs in the CC strains. Strains are identified by the last two digits of the strain name, e.g., CC002/Unc is indicated as “02” in the figure.
Figure 6
Examples of large private deletions. (A) Deletion on a C57BL/6J haplotype on chromosome 17: 57 Mb in CC026/GeniUnc is not shared with CC040/Unc, which shares the underlying C57BL/6J haplotype. Top panel shows normalized coverage in whole-genome sequencing (in 1-kb bins) for CC040/Unc; lower panel shows normalized coverage in CC026/GeniUnc. The deletion spans exons (red) from four genes including complement factor gene C3. Assembled sequence spanning the deletion shows microhomology over 9 bp at the breakpoint. (B) Deletion on a 129S1/SvImJ haplotype on chromosome 3: 133 Mb in CC055/TauUnc is not shared with CC018/Unc, which shares the underlying 129S1/SvImJ haplotype. Organization follows that found in A. The deletion spans the middle exons (red) of Npnt, which encodes the integrin-binding protein nephronectin.
Figure 7
Mapping of unplaced sequences in the CC. (A) QTL scan demonstrating successful localization of GL456378, a contig not localized in the current mouse reference genome (mm10/GRCm38.p5), to distal chromosome 4. (B) Estimated copy number of GL456378 in founder strains. (C) Genomic distribution of 19 sequences localized using the CC. Gold, sequences previously assigned to a chromosome but not a specific position; black, sequences whose position was previously unknown. Dot indicates marker with maximum LOD score and line segment indicates 95% credible interval.
