Proc Natl Acad Sci U S A. 2004 May 11;101(19):7329-34.
doi: 10.1073/pnas.0401648101. Epub 2004 May 3.

The diploid genome sequence of Candida albicans

Ted Jones et al. Proc Natl Acad Sci U S A.

Abstract

We present the diploid genome sequence of the fungal pathogen Candida albicans. Because C. albicans has no known haploid or homozygous form, sequencing was performed as a whole-genome shotgun of the heterozygous diploid genome in strain SC5314, a clinical isolate that is the parent of strains widely used for molecular analysis. We developed computational methods to assemble a diploid genome sequence in good agreement with available physical mapping data. We provide a whole-genome description of heterozygosity in the organism. Comparative genomic analyses provide important clues about the evolution of the species and its mechanisms of pathogenesis.

Figures

Fig. 1.
Assembly strategy. Effects of separate assembly of diverged homologs by a single-copy assembler such as phrap. (a) Hypothetical configuration of genomic sequence. Two diverged homologous regions are shown in pink and brown, flanked by nearly homozygous sequence shown in blue. Reads containing pink sequence look different from brown reads and must not assemble into the same contig. In the blue regions, reads from either homolog look alike and are assembled together. (b and c) The two possible ways in which these conditions can be met by the assembler. In both cases, two contigs are produced, one containing pink reads and the other containing brown reads. In b, the two blue flanking regions assemble into different contigs. The first contig contains a small amount of blue sequence on the right because of reads that are mostly pink but extend into the blue region. The second similarly contains a small amount of blue sequence (on the left). In c, both blue flanking regions are assembled into the contig containing the pink homolog. The second contig consists only of the brown homolog plus a small amount of blue sequence, as described for b. In both cases, the phrap contig numbers x, y, z, and w are arbitrary, and the separated homologs must be located by sequence alignment. In b, it is predicted that the alignment will extend to the right end of contig x and the left end of contig y. In c, the alignment will include both ends of contig w, running the entire length of the contig. We call such alignments terminal.
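The two terminal-alignment configurations described for Fig. 1 b and c can be illustrated with a small sketch. This is not the authors' software: the `Alignment` record, the function names, the 0-based half-open coordinates, the `slack` tolerance, and the assumption that both contigs are in the same orientation are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Aligned interval on one contig (hypothetical record, 0-based half-open)."""
    start: int
    end: int
    contig_length: int


def touches_left(a: Alignment, slack: int) -> bool:
    # The aligned interval begins within `slack` bases of the contig's left end.
    return a.start <= slack


def touches_right(a: Alignment, slack: int) -> bool:
    # The aligned interval stops within `slack` bases of the contig's right end.
    return a.end >= a.contig_length - slack


def classify_terminal(a: Alignment, b: Alignment, slack: int = 50) -> str:
    """Classify a pairwise alignment between two contigs.

    'contained'   : the alignment spans one whole contig (as in Fig. 1c)
    'end_overlap' : it reaches the right end of one contig and the left
                    end of the other (as in Fig. 1b)
    'internal'    : neither pattern, so not a terminal alignment
    Assumes both contigs are in the same orientation.
    """
    for c in (a, b):
        if touches_left(c, slack) and touches_right(c, slack):
            return "contained"
    if touches_right(a, slack) and touches_left(b, slack):
        return "end_overlap"
    if touches_left(a, slack) and touches_right(b, slack):
        return "end_overlap"
    return "internal"
```

For example, `classify_terminal(Alignment(900, 1000, 1000), Alignment(0, 100, 800), slack=10)` falls into the `end_overlap` case, because the aligned interval ends at the right edge of the first contig and starts at the left edge of the second.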
Fig. 2.
Diploid assembly of a pair of homologous supercontigs. Shown is a pair of homologous supercontigs (10065 and 20065) built from phrap contigs 1563, 2303*, 998, 2231, and 1981, where * denotes sequence complementation. There are terminal alignments indicating separately assembled homologous regions occurring between 1563:2303*, 998:2303*, 2303*:2231, and 2231:1981. For simplicity, both diverged homologs are shown in pink. In nearly homozygous regions of the phrap assembly, where a single sequence represents both homologs, sequence is copied in the direction shown by the arrows to fill in the dotted regions in the opposite homologous supercontig, reversing the process described in Fig. 1. In the heterozygous regions, low-quality bases at the ends of phrap contigs corresponding to the small blue regions in Fig. 1 b and c are also replaced with sequence from the other homolog. Not shown is a small region of internal trimming in contig 998 (see Supporting Text).
Fig. 3.
Polymorphism distribution on chromosome 7. Shown are eight supercontigs accounting for 93% of the sequence of chromosome 7, ordered and oriented by physical map data. The orientation of supercontigs 10110 and 10253 is uncertain. The position of the polymorphism is shown on the x axis. The polymorphism position is assigned much like the base position except that insertion polymorphisms are given a position, and multibase deletions are collapsed to a single position. Bar heights show polymorphism frequency per kilobase in 5,000-position windows across the concatenated supercontigs.
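The windowed frequency plotted in Fig. 3 can be sketched as follows. This is a minimal illustration, assuming polymorphism positions have already been assigned as the caption describes and are given as 0-based integers; the function name and input format are hypothetical, and windows without any polymorphism are simply omitted from the result.

```python
from collections import Counter


def polymorphisms_per_kb(positions, window=5000):
    """Return {window_start: polymorphisms per kilobase} for fixed-size windows.

    positions : iterable of 0-based polymorphism positions (assumed input)
    window    : window size in positions (5,000 in the caption)
    """
    # Map each position to the start coordinate of the window containing it.
    counts = Counter((p // window) * window for p in positions)
    # Convert raw counts to a per-kilobase rate.
    kb_per_window = window / 1000
    return {start: n / kb_per_window for start, n in sorted(counts.items())}
```

With this sketch, three polymorphisms in the first 5,000 positions give a frequency of 0.6 per kilobase for the window starting at 0.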
Fig. 4.
Size distribution of indel polymorphisms up to 15 bp in coding and noncoding sequence. The coding fraction is determined from the reduced ORF set. Indel frequency in coding sequence decreases with increasing length, but multiples of three are present at higher frequency than other lengths. Over-representation of multiples of three nearly disappears in noncoding sequence.
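A tally like the one summarized in Fig. 4 can be sketched in a few lines. The sketch assumes indel lengths are available as plain integers for one sequence class (coding or noncoding); the function names and input format are hypothetical and do not come from the paper.

```python
from collections import Counter


def indel_length_histogram(lengths, max_len=15):
    """Count indels of each length from 1 to max_len bp (longer ones are ignored)."""
    hist = Counter(n for n in lengths if 1 <= n <= max_len)
    # Index 0 holds the count for length 1, index 1 for length 2, and so on.
    return [hist.get(n, 0) for n in range(1, max_len + 1)]


def fraction_multiple_of_three(lengths, max_len=15):
    """Fraction of indels (up to max_len bp) whose length is a multiple of three."""
    kept = [n for n in lengths if 1 <= n <= max_len]
    if not kept:
        return 0.0
    return sum(1 for n in kept if n % 3 == 0) / len(kept)
```

Comparing `fraction_multiple_of_three` between coding and noncoding indel sets is one simple way to express the over-representation the caption describes, since indels whose length is a multiple of three preserve the reading frame.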
Fig. 5.
Genome comparisons with other species. Matches of the 6,419 ORFs to human, Saccharomyces, and Schizosaccharomyces proteins. Shown are blastp hits (E value of 10^-8 or better) of C. albicans ORFs against S. cerevisiae (S.c), S. pombe (S.p), and human (H) protein sequences. The protein comparison sets are described in Supporting Text.
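The grouping of ORFs by which comparison sets they match can be sketched as below. This is an illustration only: the `(orf_id, species, e_value)` tuple format, the function name, and the ORF identifiers in the usage note are hypothetical, and the sketch assumes the blastp results have already been parsed into that form.

```python
def classify_orfs(hits, threshold=1e-8):
    """Group ORFs by the set of species in which they have a qualifying hit.

    hits      : iterable of (orf_id, species, e_value) tuples (assumed format)
    threshold : keep hits with an E value at or below this cutoff
    Returns {orf_id: frozenset of species labels}; ORFs with no qualifying
    hit are absent from the result.
    """
    matched = {}
    for orf_id, species, e_value in hits:
        if e_value <= threshold:
            matched.setdefault(orf_id, set()).add(species)
    return {orf_id: frozenset(found) for orf_id, found in matched.items()}
```

For instance, with made-up hits `[("orfA", "S.c", 1e-30), ("orfA", "H", 1e-12), ("orfB", "S.p", 1e-3)]`, `orfA` is assigned to the set containing `S.c` and `H`, while `orfB` is dropped because its only hit is weaker than the cutoff.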
