BMC Bioinformatics. 2012 Jun 25;13:146.
doi: 10.1186/1471-2105-13-146.

Identifying mutation regions for closely related individuals without a known pedigree

Wenjuan Cui et al. BMC Bioinformatics. 2012.

Abstract

Background: Linkage analysis is the first step in the search for a disease gene. Linkage studies have facilitated the identification of several hundred human genes that can harbor mutations leading to a disease phenotype. In this paper, we study an important case in which the sampled individuals are closely related but the pedigree is not given. This situation often arises when the individuals share a common ancestor 6 or more generations back. To our knowledge, no existing algorithm gives good results for this case.

Results: To solve this problem, we first developed some heuristic algorithms for haplotype inference without any given pedigree. We propose a model using the parsimony principle that can be viewed as an extension of the model first proposed by Dan Gusfield. Our heuristic algorithm uses Clark's inference rule to infer haplotype segments.
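As an illustration of the inference step mentioned above, the following is a minimal sketch of the classical form of Clark's rule applied to whole genotypes. It is not the authors' program, which infers haplotype segments. The encoding (0 and 1 for homozygous sites, 2 for a heterozygous site) and the names `is_compatible`, `complement` and `clark_phase` are assumptions made for this example.

```python
from typing import Dict, List, Set, Tuple

Genotype = Tuple[int, ...]   # per site: 0 or 1 (homozygous), 2 (heterozygous); assumed encoding
Haplotype = Tuple[int, ...]  # per site: 0 or 1


def is_compatible(hap: Haplotype, geno: Genotype) -> bool:
    """A haplotype can explain a genotype if it agrees at every homozygous site."""
    return all(g == 2 or g == h for h, g in zip(hap, geno))


def complement(hap: Haplotype, geno: Genotype) -> Haplotype:
    """Return the mate haplotype that, together with hap, yields geno."""
    return tuple(1 - h if g == 2 else h for h, g in zip(hap, geno))


def clark_phase(genotypes: List[Genotype]):
    """Greedy Clark-style phasing; returns (phased pairs by index, unresolved indices)."""
    known: Set[Haplotype] = set()
    phased: Dict[int, Tuple[Haplotype, Haplotype]] = {}
    pending: List[int] = []

    # Seed: genotypes with at most one heterozygous site are unambiguous.
    for idx, g in enumerate(genotypes):
        if sum(1 for s in g if s == 2) <= 1:
            h1 = tuple(0 if s == 2 else s for s in g)
            h2 = tuple(1 if s == 2 else s for s in g)
            known.update((h1, h2))
            phased[idx] = (h1, h2)
        else:
            pending.append(idx)

    # Inference rule: if a known haplotype explains an ambiguous genotype,
    # infer its mate and add the mate to the known set.
    progress = True
    while pending and progress:
        progress = False
        for idx in list(pending):
            g = genotypes[idx]
            for h in sorted(known):  # sorted copy keeps the result deterministic
                if is_compatible(h, g):
                    mate = complement(h, g)
                    known.add(mate)
                    phased[idx] = (h, mate)
                    pending.remove(idx)
                    progress = True
                    break
    return phased, pending


example = [(0, 1, 0), (2, 2, 0), (2, 2, 2)]
phased, unresolved = clark_phase(example)
```

In this toy input the first genotype is fully homozygous and seeds the known set; the other two are then resolved against it. The result of the greedy rule can depend on the order in which genotypes and haplotypes are tried, which is one reason parsimony-based formulations are studied.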

Conclusions: We ran our program on both simulated data and a set of real data from the phase II HapMap database. Experiments show that our program performs well. Recall ranges from 90% to 99% across the tested cases, which implies that the program reports more than 90% of the true mutation regions. Precision varies from 29% to 90%. When the precision is 29%, the reported regions are roughly three times the size of the true mutation region, which is still very useful for narrowing down the range of the disease gene location. For all tested cases, where there are about 110,000 SNPs on a chromosome, our program completes the computation within 20 seconds.
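To make the relationship between precision and region size concrete, here is a small sketch that computes precision and recall from sets of SNP indices. The size-based definitions used here (overlap divided by reported size, and overlap divided by true size) are an assumption consistent with the wording of the abstract, not a definition taken from the paper; the function name and the toy regions are hypothetical.

```python
from typing import Set, Tuple


def precision_recall(reported: Set[int], true: Set[int]) -> Tuple[float, float]:
    """Size-based precision and recall over SNP indices (assumed definition)."""
    overlap = len(reported & true)
    precision = overlap / len(reported) if reported else 0.0
    recall = overlap / len(true) if true else 0.0
    return precision, recall


# Toy example: the reported region fully covers the true region
# and is three times as large.
true_region = set(range(100, 200))      # 100 SNP sites
reported_region = set(range(50, 350))   # 300 SNP sites
p, r = precision_recall(reported_region, true_region)
```

Under these assumed definitions, a reported region three times the size of a fully covered true region gives a precision of one third, in line with the "29% precision, about three times the size" relationship described in the abstract.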

Figures

Figure 1
Pedigree 1: a pedigree with 2 diseased individuals in the input. The pedigree has 5 generations. Filled squares and circles represent the diseased individuals. In the latest generation, 2 of the 10 individuals (numbered 44 and 48) are diseased.
Figure 2
Pedigree 2: a pedigree with 3 diseased individuals in the input. The pedigree has 5 generations. Filled squares and circles represent the diseased individuals. In the latest generation, 3 of the 10 individuals (numbered 43, 44 and 48) are diseased.
Figure 3
Pedigree 3: a pedigree with 4 diseased individuals in the input. The pedigree has 5 generations. Filled squares and circles represent the diseased individuals. In the latest generation, 4 of the 12 individuals (numbered 43, 45, 49 and 50) are diseased.
Figure 4
Pedigree 4: a pedigree with 5 diseased individuals in the input. The pedigree has 5 generations. Filled squares and circles represent the diseased individuals. In the latest generation, 5 of the 15 individuals (numbered 44, 46, 51, 54 and 55) are diseased.
Figure 5
The different sets of input individuals based on Pedigree 1. Only the latest two generations are selected in this experiment. Squares and circles with a slash are individuals whose genotype is unknown. From top to bottom, the number of families in the input decreases.
Figure 6
The pedigrees of families CEPH 1341 and CEPH 1375. Filled symbols mark the individuals sharing the segment; we treat them as the diseased individuals.
Figure 7
The region from 107M to 110M on chromosome 9. This region contains 6,519 SNP sites in total, as shown by the blue line. The region with the highest score reported by our program contains 4,891 SNP sites, as shown by the red line.
