Bioinformatics. 2009 Feb 1;25(3):309-14.
doi: 10.1093/bioinformatics/btn632. Epub 2009 Jan 7.

PanCGH: a genotype-calling algorithm for pangenome CGH data

Jumamurat R Bayjanov et al. Bioinformatics. 2009.

Abstract

Motivation: Pangenome arrays contain DNA oligomers targeting several sequenced reference genomes from the same species. In microbiology, these can be employed to investigate the often high genetic variability within a species by comparative genome hybridization (CGH). The biological interpretation of pangenome CGH data depends on the ability to compare strains at a functional level, particularly by comparing the presence or absence of orthologous genes. Due to the high genetic variability, available genotype-calling algorithms cannot be applied to pangenome CGH data.

Results: We have developed the algorithm PanCGH, which incorporates orthology information about genes to predict the presence or absence of orthologous genes in a query organism using CGH arrays that target the genomes of sequenced representatives of a group of microorganisms. PanCGH was tested and applied in the analysis of genetic diversity among 39 Lactococcus lactis strains from three different subspecies (lactis, cremoris and hordniae) and isolated from two different niches (dairy and plant). Clustering of these strains using the presence/absence data of gene orthologs revealed a clear separation between different subspecies and reflected the niche of the strains.

Figures

Fig. 1.
Schematic representation of the PanCGH algorithm for a CGH experiment. The left panel shows the fluorescence of a query strain to a set of probes (p1 to pn) targeting different reference orthologs (homologous genes from reference strains A, B and C) of an ortholog group gi. Some probes target several reference orthologs, as shown by the overlap between the probe sets targeting the reference orthologs from strains A and B. In the right panel, a schematic representation of the calculation of the presence score is shown. For each reference ortholog, the mode (indicated with a star) is calculated from the distribution of (log) signals of the corresponding probes. The presence score is the highest of these mode values. In this case, the presence score is above the threshold and equals the mode of the signals targeting the reference ortholog from strain B.
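
The presence-score calculation described in this caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the caption does not say how the mode is estimated, so the sketch assumes a simple Gaussian kernel density evaluated at the sample points. The function names, the bandwidth and the threshold values are all hypothetical.

```python
import math


def kde_mode(log_signals, bandwidth=0.3):
    """Estimate the mode as the sample point with the highest kernel density.

    Assumption: a Gaussian kernel with a fixed, hypothetical bandwidth.
    """
    if not log_signals:
        raise ValueError("need at least one probe signal")

    def density(x):
        # Unnormalised Gaussian kernel density at x.
        return sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in log_signals)

    return max(log_signals, key=density)


def presence_score(signals_by_reference, bandwidth=0.3):
    """Return (score, reference) for one ortholog group.

    signals_by_reference maps each reference ortholog (e.g. strain "A")
    to the log signals of the probes that target it. The score is the
    highest of the per-reference modes.
    """
    modes = {ref: kde_mode(sig, bandwidth) for ref, sig in signals_by_reference.items()}
    best = max(modes, key=modes.get)
    return modes[best], best


def is_present(signals_by_reference, threshold, bandwidth=0.3):
    """Call the ortholog group present when the score exceeds the threshold."""
    score, _ = presence_score(signals_by_reference, bandwidth)
    return score > threshold


# Made-up log signals for one ortholog group with three reference orthologs.
group = {
    "A": [5.1, 5.3, 5.2, 7.9],
    "B": [9.0, 9.2, 9.1, 6.0],
    "C": [4.0, 4.2, 4.1],
}
score, reference = presence_score(group)
print(score, reference, is_present(group, threshold=8.0))
```

With these invented signals, the reference ortholog from strain B yields the highest mode, mirroring the situation drawn in the figure. Taking the mode rather than the mean keeps a single outlying probe (such as 6.0 in set B) from pulling the score down.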
Fig. 2.
Hierarchical clustering of L. lactis strains based on presence/absence predictions of representatives of 4571 ortholog groups of L. lactis. The pairwise binary distance was used as a distance metric and clustering was performed using the average linkage agglomeration method (Hastie et al. 2001). The cluster of strains at the top represents the subspecies cremoris genotype, while the large cluster at the bottom, excluding strains P7266 and P7304, contains strains of subspecies lactis genotype and one strain (LMG8520) of subspecies hordniae phenotype. In these two clusters 1341 groups from the total of 4571 ortholog groups are present in all strains. Though strains P7266 and P7304 have subspecies lactis phenotype, they are far apart from other subspecies lactis strains (see explanation in text). Branches with a solid rectangle are dairy isolates and other strains are isolated from plants.
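
The clustering procedure named in this caption can be sketched as follows. The sketch assumes that "binary distance" means the proportion of ortholog groups present in exactly one of two strains among the groups present in at least one of them (the definition used by the "binary" method of R's dist function); the caption itself does not define it. The function names and the strain profiles are hypothetical, and the naive loop is meant for illustration rather than for large data sets.

```python
from itertools import combinations


def binary_distance(a, b):
    """Share of groups present in only one strain, among groups present in either."""
    either = sum(1 for x, y in zip(a, b) if x or y)
    if either == 0:
        return 0.0  # choice made here: two empty profiles count as identical
    differ = sum(1 for x, y in zip(a, b) if bool(x) != bool(y))
    return differ / either


def average_linkage(profiles):
    """Naive agglomerative clustering with average linkage.

    profiles maps strain name -> list of 0/1 presence calls.
    Returns the merges in order as (cluster_a, cluster_b, distance).
    """
    dist = {
        frozenset((p, q)): binary_distance(profiles[p], profiles[q])
        for p, q in combinations(profiles, 2)
    }

    def linkage(c1, c2):
        # Mean pairwise distance between members of the two clusters.
        total = sum(dist[frozenset((p, q))] for p in c1 for q in c2)
        return total / (len(c1) * len(c2))

    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((c1, c2, linkage(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Made-up presence/absence calls for four strains over four ortholog groups.
profiles = {
    "s1": [1, 1, 0, 0],
    "s2": [1, 1, 1, 0],
    "s3": [0, 0, 1, 1],
    "s4": [0, 1, 1, 1],
}
for left, right, d in average_linkage(profiles):
    print(left, right, round(d, 3))
```

In this toy input, s1 joins s2 and s3 joins s4 before the two pairs merge, which is the kind of two-cluster structure the figure reports for the cremoris and lactis genotypes.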

References

    1. Cleveland WS, et al. Local regression models. In: Chambers JM, Hastie TJ, editors. Chapter 8 of Statistical Models in S. Cole: Wadsworth & Brooks; 1992. pp. 312–316.
    2. Earl AM, et al. Bacillus subtilis genome diversity. J. Bacteriol. 2007;189:1163–1170.
    3. Fields Development Team. Fields: Tools for Spatial Data. National Center for Atmospheric Research, Boulder, CO: 2006. Available at http://www.image.ucar.edu/Software/Fields/ (last accessed August, 2008).
    4. Fitch WM. Distinguishing homologous from analogous proteins. Syst. Zool. 1970;19:99–113.
    5. Francke C, et al. A generic approach to identify transcription factor-specific operator motifs; inferences for LacI-family mediated regulation in Lactobacillus plantarum WCFS1. BMC Genomics. 2008;9:145.