PLoS One. 2013 Aug 5;8(8):e70151.
doi: 10.1371/journal.pone.0070151. Print 2013.

Filtering for compound heterozygous sequence variants in non-consanguineous pedigrees

Tom Kamphans et al. PLoS One. 2013.

Abstract

The identification of disease-causing mutations in next-generation sequencing (NGS) data requires efficient filtering techniques. In patients with rare recessive diseases, compound heterozygosity of pathogenic mutations is the most likely inheritance model if the parents are non-consanguineous. We developed a web-based compound heterozygous filter that is suited for data from NGS projects and that is easy to use for non-bioinformaticians. We analyzed the power of compound heterozygous mutation filtering by deriving background distributions for healthy individuals from different ethnicities and studied the effectiveness in trios as well as in more complex pedigree structures. While usually more than 30 genes harbor potential compound heterozygotes in single exomes, this number can be markedly reduced with every additional member of the pedigree that is included in the analysis. In a real data set with exomes of four family members, two sisters affected by Mabry syndrome and their healthy parents, the disease-causing gene PIGO, which harbors the pathogenic compound heterozygous variants, could be readily identified. Compound heterozygous filtering is an efficient means to reduce the number of candidate mutations in studies aiming at identifying recessive disease genes in non-consanguineous families. A web server that makes this filtering strategy available is provided at www.gene-talk.de.

Conflict of interest statement

Competing Interests: Tom Kamphans, who is affiliated to SmartAlgos, is a self-employed software developer and consultant (SmartAlgos). This does not alter the authors' adherence to all the PLOS ONE policies on sharing data and materials.

Figures

Figure 1. Compound Heterozygote Filtering Rules.
If both parents of the index patient are unaffected, a heterozygous disease-causing mutation cannot be present in a heterozygous state in both parents, unless a recombination occurred between this variant and the second compound heterozygous mutation.
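The rule in this caption can be sketched for a simple trio as follows. This is a minimal illustration, not the GeneTalk implementation: the record layout (`gene`, `id`, `child`, `mother`, `father`), the VCF-style genotype strings, and the function name are assumptions made for the example. It also assumes that a variant homozygous in an unaffected parent can be discarded, and it ignores de novo variants and the recombination exception mentioned above.

```python
from collections import defaultdict
from itertools import combinations

# Genotype encoding is an assumption for this sketch (VCF-style strings).
HET, HOM_REF, HOM_ALT = "0/1", "0/0", "1/1"


def compound_het_candidates(variants):
    """Return {gene: [(variant_id, variant_id), ...]} for a trio with unaffected parents.

    Each variant is a dict with the hypothetical keys
    'gene', 'id', 'child', 'mother', 'father'.
    """
    by_gene = defaultdict(list)
    for v in variants:
        if v["child"] != HET:
            continue  # the index patient must be heterozygous
        parents = (v["mother"], v["father"])
        if HOM_ALT in parents:
            continue  # assumption: an unaffected parent is not homozygous for a causal variant
        if parents == (HET, HET):
            continue  # rule from the caption: not heterozygous in both unaffected parents
        by_gene[v["gene"]].append(v)

    candidates = {}
    for gene, gene_variants in by_gene.items():
        pairs = []
        for a, b in combinations(gene_variants, 2):
            # One variant must come from the mother and the other from the father.
            maternal_paternal = a["mother"] == HET and b["father"] == HET
            paternal_maternal = a["father"] == HET and b["mother"] == HET
            if maternal_paternal or paternal_maternal:
                pairs.append((a["id"], b["id"]))
        if pairs:
            candidates[gene] = pairs
    return candidates


# Toy genotypes with placeholder gene names; not data from the study.
toy_trio = [
    {"gene": "GENE_A", "id": "a1", "child": HET, "mother": HET, "father": HOM_REF},
    {"gene": "GENE_A", "id": "a2", "child": HET, "mother": HOM_REF, "father": HET},
    {"gene": "GENE_B", "id": "b1", "child": HET, "mother": HET, "father": HET},
    {"gene": "GENE_B", "id": "b2", "child": HET, "mother": HOM_REF, "father": HET},
    {"gene": "GENE_C", "id": "c1", "child": HET, "mother": HET, "father": HOM_REF},
    {"gene": "GENE_C", "id": "c2", "child": HET, "mother": HET, "father": HOM_REF},
]
print(compound_het_candidates(toy_trio))
```

In the toy input only GENE_A survives: GENE_B loses a variant that both parents carry, and both GENE_C variants come from the mother, so they lie on the same parental side and cannot form a compound heterozygote.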
Figure 2. Exomes of 85 European individuals (CEU) as well as 88 African individuals (YRI) were filtered for rare compound heterozygous candidate variants.
A) On average, around 230 variants pass the filter in CEU exomes and 309 in YRI exomes. B) The potential compound heterozygotes are distributed over 31 genes in CEU individuals and 67 genes in YRI individuals. C) Altogether, 1998 genes harbored potential compound heterozygous variants in the tested individuals, and compound heterozygotes in 1066 genes occurred only in single cases.
Figure 3. Filtering results for compound heterozygotes in a case study.
Filtering with the settings genotype frequency <0.01, effect on protein level (functional filter: missense, nonsense, stop loss, splice site, insertions or deletions), and compound heterozygous inheritance yields six variants in three genes. MUC16 and NBPF10 both belong to large gene families known for their high variability and for detection artifacts due to pseudogenes. The heterozygous variants in PIGO remain the likeliest candidates. The Show icon at the right end of the line links to an expert-curated annotation database, which indicates that the mutation in PIGO causes hyperphosphatasia with mental retardation syndrome and has been published in . The gene view for PIGO lists all variant annotations for this gene and links to further knowledge bases. The length of the coding sequence of the longest transcript (max. CDS) and the mean number of rare heterozygous variant calls per exome (MRHC) are important parameters for the assessment of candidate genes.
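The frequency and functional settings named in this caption can be expressed as a simple per-variant check that runs before the inheritance filter. The sketch below is illustrative only: the field names `genotype_freq` and `effect`, the effect labels, and the function name are hypothetical and do not reflect the GeneTalk interface.

```python
# Effect labels are placeholders chosen for this sketch; real annotation
# tools use their own vocabularies.
FUNCTIONAL_EFFECTS = {
    "missense",
    "nonsense",
    "stop_loss",
    "splice_site",
    "insertion",
    "deletion",
}


def passes_prefilter(variant, max_genotype_freq=0.01):
    """Keep rare variants that are predicted to affect the protein.

    `variant` is a dict with the hypothetical keys 'genotype_freq' and 'effect'.
    """
    is_rare = variant["genotype_freq"] < max_genotype_freq
    is_functional = variant["effect"] in FUNCTIONAL_EFFECTS
    return is_rare and is_functional


# Toy records, not data from the study.
toy_variants = [
    {"id": "v1", "genotype_freq": 0.002, "effect": "missense"},
    {"id": "v2", "genotype_freq": 0.150, "effect": "missense"},
    {"id": "v3", "genotype_freq": 0.001, "effect": "synonymous"},
]
kept = [v["id"] for v in toy_variants if passes_prefilter(v)]
print(kept)
```

Only variants that pass this check would then be handed to the compound heterozygous step, which keeps the number of candidate pairs per gene small.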
Figure 4. The length of the coding sequence and the mean number of rare alleles per gene.
In an average healthy individual from the 5000 exomes project, there is more than one rare heterozygous variant in MUC16 that has an allele frequency below 0.01 in the reference population. In contrast, the coding sequence of PIGO is much shorter, and rare heterozygous variants occur in fewer than 8 out of 1000 exomes.
Figure 5. Illustration of mapping artifacts resulting in false positive variant detection.
The illustrated sample carries a mutation in the maternal copy of a pseudogene of NBPF10. If the pseudogene is not included in the reference sequence, the reads originating from this pseudogene are mismapped, which may result in a false variant call. False genotype calls are indicated by proportions of reads supporting the alternate allele that deviate strongly from 0.5 or 1.
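The read-proportion criterion in this caption can be turned into a quick screening check. The sketch below is an assumption-laden illustration: the function names and the tolerance of 0.2 are chosen for the example and are not thresholds from the paper.

```python
def allele_balance(ref_reads, alt_reads):
    """Fraction of reads at a site that support the alternate allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    return alt_reads / total


def is_suspicious_call(ref_reads, alt_reads, tolerance=0.2):
    """Flag calls whose allele balance is far from both 0.5 and 1.

    A true heterozygous call is expected near 0.5 and a true homozygous
    alternate call near 1. The tolerance is an arbitrary value for this sketch.
    """
    balance = allele_balance(ref_reads, alt_reads)
    distance = min(abs(balance - 0.5), abs(balance - 1.0))
    return distance > tolerance


print(is_suspicious_call(50, 50))  # balanced heterozygous call
print(is_suspicious_call(85, 15))  # few alternate reads, as mismapped pseudogene reads might produce
```

A call flagged this way is only a candidate artifact; read depth and mapping quality would still need to be inspected before discarding it.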

