Genome Res. 2009 Mar;19(3):470-80.
doi: 10.1101/gr.081851.108. Epub 2009 Feb 9.

Detecting heterozygosity in shotgun genome assemblies: Lessons from obligately outcrossing nematodes

Antoine Barrière et al. Genome Res. 2009 Mar.

Abstract

The majority of nematodes are gonochoristic (dioecious) with distinct male and female sexes, but the best-studied species, Caenorhabditis elegans, is a self-fertile hermaphrodite. The sequencing of the genomes of C. elegans and a second hermaphrodite, C. briggsae, was facilitated in part by the low amount of natural heterozygosity, which typifies selfing species. Ongoing genome projects for gonochoristic Caenorhabditis species seek to approximate this condition by intense inbreeding prior to sequencing. Here we show that despite this inbreeding, the heterozygous fraction of the whole genome shotgun assemblies of three gonochoristic Caenorhabditis species, C. brenneri, C. remanei, and C. japonica, is considerable. We first demonstrate experimentally that independently assembled sequence variants in C. remanei and C. brenneri are allelic. We then present gene-based approaches for recognizing heterozygous regions of WGS assemblies. We also develop a simple method for quantifying heterozygosity that can be applied to assemblies lacking gene annotations. Consistently, we find that approximately 10% and 30% of the C. remanei and C. brenneri genomes, respectively, are represented by two alleles in the assemblies. Heterozygosity is restricted to autosomes and its retention is accompanied by substantial inbreeding depression, suggesting that it is caused by multiple recessive deleterious alleles and not merely by chance. Both the overall amount and the chromosomal distribution of heterozygous DNA are highly variable between assemblies of close relatives produced by identical methodologies, and allele frequencies have continued to change after strains were sequenced. Our results highlight the impact of mating systems on genome sequencing projects.

Figures

Figure 1.
Genomic distribution of inferred heterozygous (thick lines) and homozygous (thin solid lines) regions in the sequenced C. brenneri genome. Chromosomal locations are assigned based on positions of C. elegans orthologs. Genes represented by two alleles are shown above chromosomes; homozygous loci are shown below chromosomes. Regions of currently undetermined status are represented by dashed lines.
Figure 2.
Gene-based, genome-wide survey for heterozygosity in the preliminary C. remanei assembly Cr01. 10,322 single-copy C. elegans genes were used to query the assembly. The fraction of total queries that identified two distinct yet highly similar gene predictions within a 100-kb sliding window (with 50-kb steps) along the C. elegans chromosome is plotted at the bottom of each panel. Left scales (red) refer only to these values. The upper portion of each panel depicts the WGS read depth for queries that have an apparent singleton C. remanei homolog (gray diamonds), and the mean depth for doublet homologs (black diamonds). Right scales (black) refer only to these values. The small proportion of queries identifying more than two variants are not shown in the depth analysis. Regions in which doublet homologs occur in clusters and have consistently low mean read depth are inferred to be heterozygous. By this criterion, regions of the C. remanei genome that are syntenic with C. elegans LGI at 5 Mb, LGV at 9 and 18 Mb, and nearly all of LGIV are heterozygous. The mean WGS read depth for the singleton homologs in each query chromosome is plotted with a dashed line. The singleton read depth for chromosome X (8.23×) lies between the genome-wide 9.2× and the 6.9× expected for equal sex ratio, likely due to the substantially smaller size and genome copy number of males relative to females.
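The 6.9× figure quoted above follows from simple copy-number arithmetic, sketched here in Python. The sketch assumes that the genome-wide 9.2× can be treated as the autosomal depth and that females carry two X copies while males carry one; the function names and the `female_dna_fraction` parameter are illustrative and do not come from the paper.

```python
def expected_x_depth(autosomal_depth, female_dna_fraction=0.5):
    """Expected read depth on X, given the share of sequenced DNA that came from females.

    Assumption: autosomes are present in two copies in both sexes, X in two
    copies in females and one copy in males.
    """
    if not 0.0 <= female_dna_fraction <= 1.0:
        raise ValueError("female_dna_fraction must lie between 0 and 1")
    # Average number of X copies per diploid genome equivalent in the DNA pool.
    mean_x_copies = 2.0 * female_dna_fraction + 1.0 * (1.0 - female_dna_fraction)
    return autosomal_depth * mean_x_copies / 2.0


def implied_female_dna_fraction(x_depth, autosomal_depth):
    """Invert expected_x_depth: which female DNA share would give the observed X depth?"""
    return 2.0 * x_depth / autosomal_depth - 1.0


if __name__ == "__main__":
    # Equal contribution from both sexes: 9.2 * 0.75
    print(round(expected_x_depth(9.2), 2))
    # Female share that would be consistent with the observed 8.23x on X
    print(round(implied_female_dna_fraction(8.23, 9.2), 2))
```

Under these assumptions, the observed 8.23× would correspond to roughly four fifths of the sequenced DNA coming from females, which is in line with the caption's explanation that males contribute less DNA than females. This is a back-of-the-envelope reading, not a value reported in the paper.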
Figure 3.
Estimated copy number distributions for five genome assemblies. For each species, a sliding query window of 1000 bp with 500 bp steps was used to identify nonself matches in the assembly. The percentages reported are relative to the size of the total assembly, not to an inferred actual genome size. For the hermaphroditic C. elegans (sequenced by a minimum clone tiling path method) and C. briggsae (sequenced by WGS), all sequences with copy number of two or more likely represent true copy number variation in the genome. For the three gonochoristic species, however, each bin potentially represents a mix between truly paralogous DNA and retained alleles. As the apparent single-copy sequence is at least 55% in all assemblies, the majority of the unrecognized alleles are expected to lie in the two-copy category. Single-copy DNA is not shown.
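The windowing scheme described above (1000-bp queries with 500-bp steps) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the alignment step that finds nonself matches is left out, and `nonself_hits_per_window` is a hypothetical input holding, for each window, the number of matches found elsewhere in the assembly. Percentages are computed per window, which only approximates a percentage of assembly size.

```python
from collections import Counter


def sliding_windows(seq_len, size=1000, step=500):
    """Yield half-open (start, end) windows; a trailing partial window is dropped."""
    for start in range(0, seq_len - size + 1, step):
        yield start, start + size


def copy_number_percentages(nonself_hits_per_window):
    """Bin windows by estimated copy number (nonself matches + 1) and return percentages."""
    total = len(nonself_hits_per_window)
    if total == 0:
        return {}
    counts = Counter(hits + 1 for hits in nonself_hits_per_window)
    return {copies: 100.0 * n / total for copies, n in sorted(counts.items())}


if __name__ == "__main__":
    print(list(sliding_windows(2500)))
    # Hypothetical hit counts for four windows: three unique, one with a single nonself match.
    print(copy_number_percentages([0, 0, 0, 1]))
```

In a gonochoristic assembly, the two-copy bin from such a tally would mix true paralogs with retained alleles, which is why the caption treats it as an upper bound on unrecognized heterozygosity.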
Figure 4.
Persistent heterozygosity is associated with inbreeding depression. (A) Comparison of C. brenneri strains LKC28 (founder) with PB2801 (inbred for sequencing). The number of viable adults produced by PB2801 is significantly lower than those produced by LKC28 (p < 0.04; Kolmogorov–Smirnov test). (B) Comparison of C. remanei strains EM464 (founder) with PB4641 (inbred for sequencing). PB4641 has significantly lower fitness (p < 0.001, Kolmogorov–Smirnov test). Thick vertical lines indicate the median, boxes represent upper and lower quartiles, and whiskers the entirety of the distributions.
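The two-sample Kolmogorov–Smirnov test named above compares the empirical cumulative distributions of two samples. The sketch below computes only the test statistic (the largest gap between the two distributions) in plain Python; it does not compute a p-value, and the brood sizes in the usage example are made-up numbers, not data from the paper.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: maximum absolute difference between empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    largest_gap = 0.0
    # The empirical CDFs only change at observed values, so checking those is enough.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


if __name__ == "__main__":
    # Hypothetical counts of viable adults per cross for a founder and an inbred strain.
    founder = [120, 135, 150, 160, 175]
    inbred = [60, 80, 95, 130, 140]
    print(ks_statistic(founder, inbred))
```

For a significance level, a library routine such as `scipy.stats.ks_2samp` would normally be used in place of this hand-rolled statistic.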
Figure 5.
Changes in allele frequencies in (A) fog-1 and (B) sur-2 loci in C. brenneri. Gray represents allele “A” and black represents allele “B.” Frequencies in LKC28 and “PB2801 present” were obtained by PCR genotyping of individual animals from these strains—44 and 76 for fog-1 and 57 and 50 for sur-2. Allele frequencies in the strain PB2801 at the time of sequencing were inferred from the average number of sequence reads through each of the two alleles.
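The two kinds of frequency estimate mentioned above can be sketched as follows. This assumes diploid individuals genotyped as AA, AB, or BB, and read depths that scale with allele abundance in the sequenced DNA; the function names and all numbers in the usage example are hypothetical and are not taken from the paper.

```python
def allele_freq_from_genotypes(n_aa, n_ab, n_bb):
    """Frequency of allele A from counts of genotyped diploid individuals."""
    n_individuals = n_aa + n_ab + n_bb
    if n_individuals == 0:
        raise ValueError("no individuals genotyped")
    # Each AA animal carries two A alleles, each AB animal carries one.
    return (2 * n_aa + n_ab) / (2 * n_individuals)


def allele_freq_from_reads(mean_depth_a, mean_depth_b):
    """Frequency of allele A inferred from mean read depth through each allele."""
    total_depth = mean_depth_a + mean_depth_b
    if total_depth <= 0:
        raise ValueError("no reads cover either allele")
    return mean_depth_a / total_depth


if __name__ == "__main__":
    # Hypothetical genotype counts for 44 animals.
    print(allele_freq_from_genotypes(10, 20, 14))
    # Hypothetical mean depths through the two assembled alleles.
    print(allele_freq_from_reads(6.0, 2.0))
```

Comparing the read-based estimate (strain at the time of sequencing) with the genotype-based estimate (strain at present) is one way to express the shift in allele frequency that the figure reports.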
