Am J Hum Genet. 2014 Feb 6;94(2):257-67.
doi: 10.1016/j.ajhg.2014.01.005.

A statistical framework to guide sequencing choices in pedigrees

Charles Y K Cheung et al. Am J Hum Genet. 2014.

Abstract

The use of large pedigrees is an effective design for identifying rare functional variants affecting heritable traits. Cost-effective studies using sequence data can be achieved via pedigree-based genotype imputation, in which some subjects are sequenced and missing genotypes are inferred for the remaining subjects. Because sequencing is costly, it is important to carefully prioritize subjects for sequencing. Here, we introduce a statistical framework that enables systematic comparison among subject-selection choices for sequencing. We introduce a metric, "local coverage," which uses inferred inheritance vectors to measure genotype-imputation ability specifically in a region of interest, such as one with prior evidence of linkage. In the absence of linkage information, we can instead use a "genome-wide coverage" metric computed from the pedigree structure. These metrics enable the development of a method that identifies efficient selection choices for sequencing. As implemented in GIGI-Pick, this method also flexibly allows initial manual selection of subjects and optimizes selections within the constraint that only some subjects might be available for sequencing. In the present study, we used simulations to compare GIGI-Pick with PRIMUS, ExomePicks, and common ad hoc methods of selecting subjects. In genotype imputation of both common and rare alleles, GIGI-Pick substantially outperformed all other methods considered and had the added advantage of incorporating prior linkage information. We also used a real pedigree to demonstrate the utility of our approach in identifying causal mutations. Our work enables prioritization of subjects for sequencing to facilitate dissection of the genetic basis of heritable traits.
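The joint-prioritized selection described in the Figure 2 caption behaves like a beam search: after each step only the γ best subject sets ("templates") are kept, and each is extended by one more subject. The following is a minimal sketch under stated assumptions, not GIGI-Pick's implementation. It takes a caller-supplied `coverage` function, because GIGI-Pick's local and genome-wide coverage calculations (from inheritance vectors or pedigree structure) are not reproduced here. `joint_prioritized_selection`, `toy_coverage`, and `TOY_SEGMENTS` are hypothetical names. The `candidates` and `preselected` arguments are our own modelling of the availability constraint and manual initial selection mentioned in the abstract.

```python
from typing import Callable, FrozenSet, Iterable, List


def joint_prioritized_selection(
    candidates: Iterable[str],
    coverage: Callable[[FrozenSet[str]], float],
    n_select: int,
    gamma: int = 2,
    preselected: Iterable[str] = (),
) -> FrozenSet[str]:
    """Beam search over subject sets, keeping the `gamma` best sets (templates) per step.

    candidates  -- subjects available for sequencing (assumed availability constraint)
    coverage    -- scores a set of subjects; hypothetical stand-in for GIGI-Pick's coverage
    preselected -- subjects fixed in advance (assumed model of manual initial selection)
    """
    fixed = frozenset(preselected)
    pool = sorted(set(candidates) - fixed)
    if not len(fixed) <= n_select <= len(fixed) + len(pool):
        raise ValueError("n_select is incompatible with the preselected/candidate subjects")
    if gamma < 1:
        raise ValueError("gamma must be at least 1")

    templates: List[FrozenSet[str]] = [fixed]
    for _ in range(n_select - len(fixed)):
        # Extend each template by one subject not already in it; collecting into a
        # set drops duplicates such as (e, c) versus (c, e).
        extended = {t | {s} for t in templates for s in pool if s not in t}
        # Rank by coverage (highest first); ties are broken by sorted member names
        # so that results are deterministic.
        ranked = sorted(extended, key=lambda st: (-coverage(st), sorted(st)))
        templates = ranked[:gamma]
    return templates[0]


# Hypothetical toy data: each subject "covers" some numbered segments, and coverage
# is the fraction of the 6 segments covered by at least one selected subject.
TOY_SEGMENTS = {
    "a": {1},
    "b": {6},
    "c": {1, 2, 5},
    "d": {3, 4, 6},
    "e": {1, 2, 3, 4},
}


def toy_coverage(subjects: FrozenSet[str]) -> float:
    covered = set().union(*(TOY_SEGMENTS[s] for s in subjects))
    return len(covered) / 6


if __name__ == "__main__":
    everyone = sorted(TOY_SEGMENTS)
    print(sorted(joint_prioritized_selection(everyone, toy_coverage, 2, gamma=1)))  # ['b', 'e']
    print(sorted(joint_prioritized_selection(everyone, toy_coverage, 2, gamma=2)))  # ['c', 'd']
```

In this toy example, γ = 1 is a purely greedy search that locks in subject e first and can reach at most five of the six segments. Keeping two templates retains subject c, and the pair (c, d) covers all six. This illustrates why retaining more than one template per step can help.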

Figures

Figure 1
Sequencing Choices Affect the Percentage of Alleles Called. Founder chromosomes, and copies of those same founder chromosomes in descendants, are labeled with numbers, and alleles of genotypes are labeled with letters. Observed alleles are in bold black, and imputed alleles are in nonbold blue. Vertical lines represent alleles that can be phased unambiguously to founder genome labels (FGLs). Subjects who were selected for sequencing are indicated by shading. Three subject-selection choices are presented: (A) parent and child are selected, and the child is homozygous for the marker; (B) founder spouses are selected, and both are heterozygous for the marker; and (C) parent and child are selected, and both are heterozygous for the marker.
Figure 2
Joint-Prioritized Subject-Selection Method. In this example, the number of templates to keep (γ) is 2. In the first selection, the method computes coverage for each subject (a–h). Subject e has the highest coverage and subject c the second-highest, so they are kept as templates. In the second selection, the method considers adding one more subject to each template, i.e., (e, a), (e, b), (e, c), (e, d), (e, f), (e, g), (e, h), (c, a), (c, b), (c, d), (c, f), (c, g), and (c, h). Set (e, g) gives the highest coverage and set (e, h) the second-highest, so they are kept as templates for the third selection. This scheme repeats until the desired number of subjects has been selected. After the third selection, sets (e, g, d) and (e, h, d) give the highest and second-highest coverages, respectively; if a total of three subjects is desired, set (e, g, d) becomes the final selection.
Figure 3
Real Pedigree Used for Subject Selection. Affected subjects are shaded, and subjects available for sequencing are underlined. Only subjects who had some genotype data, or who had descendants with genotype data, were included. Some subjects were omitted from this figure to protect confidentiality.
Figure 4
Sensitivity of Calling Rare Alleles as a Function of the Number of Subjects Selected. Programs (solid lines) are as follows: (A) GIGI (local), (B) GIGI (GW, genome-wide), (C) PRIMUS, and (D) ExomePicks. Ad hoc schemes (dashed lines) are as follows: (E) bottom only and (F) bottom and parents. Refer to Figures S3 and S4 for the actual subjects selected. The “bottom and parents” and “bottom-only” designs had the same sensitivity for the first six selected subjects because both schemes selected the same subjects until the seventh choice.
Figure 5
Sensitivity Computed for Different Selection Methods against the Distribution from 200 Samples of Random Subject Selection for Seven Subjects Selected. The histogram shows the distribution of sensitivity values across the 200 random subject selections. Each selection method is compared against this distribution, and the position of its line indicates the sensitivity that method achieved. Programs are depicted by solid lines, and ad hoc schemes are represented by dashed lines.
Figure 6
Correlation between Imputation Performance and Estimated Coverage for Seven Subjects Selected. Per-data-set accuracy (A) and sensitivity (B) versus local coverage computed for data set 1, and average accuracy (C) and sensitivity (D) versus genome-wide coverage computed across ten data sets.
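Figures 5 and 6 evaluate selection methods in two ways: against a reference distribution built from 200 random selections of seven subjects, and by correlating imputation performance with estimated coverage. The following is a minimal sketch of those two comparisons. It assumes that sensitivity means the fraction of true rare alleles that are called, with alleles keyed by hypothetical strings such as "subject:chrom:pos". The `score` callable is a hypothetical stand-in for the full sequence, impute, and evaluate pipeline, which the captions do not describe. All function names are illustrative.

```python
import math
import random
from typing import Callable, FrozenSet, List, Sequence, Set


def sensitivity(true_rare: Set[str], called: Set[str]) -> float:
    """Fraction of true rare alleles that were called (assumed definition)."""
    if not true_rare:
        raise ValueError("no true rare alleles to evaluate")
    return len(true_rare & called) / len(true_rare)


def random_selection_scores(
    subjects: Sequence[str],
    k: int,
    score: Callable[[FrozenSet[str]], float],
    n_samples: int = 200,
    seed: int = 0,
) -> List[float]:
    """Score n_samples random k-subject selections to build a reference distribution."""
    rng = random.Random(seed)
    return [score(frozenset(rng.sample(list(subjects), k))) for _ in range(n_samples)]


def fraction_below(value: float, reference: Sequence[float]) -> float:
    """Share of reference (random-selection) scores strictly below value."""
    return sum(r < value for r in reference) / len(reference)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient, e.g. between coverage and sensitivity."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    truth = {"s1:chr1:100", "s2:chr1:100", "s3:chr2:500", "s4:chr2:500"}
    print(sensitivity(truth, {"s1:chr1:100", "s3:chr2:500", "s5:chr3:9"}))  # 0.5
```

A method whose `fraction_below` value is close to 1 outperformed nearly all of the random selections. Seeding the random generator keeps the reference distribution reproducible across runs.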
