Am J Hum Genet. 1993 Nov;53(5):1137-45.

Molecular and statistical approaches to the detection and correction of errors in genotype databases

L M Brzustowicz et al. Am J Hum Genet. 1993 Nov.

Erratum in

  • Am J Hum Genet 1994 Jun;54(6):1132

Abstract

Errors in genotyping data have been shown to have a significant effect on the estimation of recombination fractions in high-resolution genetic maps. Previous estimates of errors in existing databases have been limited to the analysis of relatively few markers and have suggested rates in the range 0.5%-1.5%. The present study capitalizes on the fact that within the Centre d'Etude du Polymorphisme Humain (CEPH) collection of reference families, 21 individuals are members of more than one family, with separate DNA samples provided by CEPH for each appearance of these individuals. By comparing the genotypes of these individuals in each of the families in which they occur, an estimated error rate of 1.4% was calculated for all loci in the version 4.0 CEPH database. Removing those individuals who were clearly identified by CEPH as appearing in more than one family resulted in a 3.0% error rate for the remaining samples, suggesting that some error checking of the identified repeated individuals may occur prior to data submission. An error rate of 3.0% for version 4.0 data was also obtained for four chromosome 5 markers that were retyped through the entire CEPH collection. The effects of these errors on a multipoint map were significant, with a total sex-averaged length of 36.09 cM with the errors, and 19.47 cM with the errors corrected. Several statistical approaches to detect and allow for errors during linkage analysis are presented. One method, which identified families containing possible errors on the basis of the impact on the maximum lod score, showed particular promise, especially when combined with the limited retyping of the identified families. The impact of the demonstrated error rate in an established genotype database on high-resolution mapping is significant, raising the question of the overall value of incorporating such existing data into new genetic maps.
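The duplicate-sample comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function names, the locus labels, and the toy genotypes are hypothetical, and the conversion from a discordance rate to a per-genotype error rate assumes that errors in the two samples are independent and that every error produces a visible mismatch.

```python
from math import sqrt
from typing import Dict, Optional, Tuple

Genotype = Tuple[str, str]  # unordered pair of allele labels


def normalize(g: Genotype) -> Tuple[str, ...]:
    """Sort alleles so that ('2', '1') and ('1', '2') compare equal."""
    return tuple(sorted(g))


def compare_duplicates(
    sample_a: Dict[str, Optional[Genotype]],
    sample_b: Dict[str, Optional[Genotype]],
) -> Tuple[int, int]:
    """Return (discordant, compared) over loci typed in both samples.

    Each dict maps a locus name to the genotype recorded for the same
    person in one family; None marks an untyped locus, which is skipped.
    """
    discordant = 0
    compared = 0
    for locus, ga in sample_a.items():
        gb = sample_b.get(locus)
        if ga is None or gb is None:
            continue
        compared += 1
        if normalize(ga) != normalize(gb):
            discordant += 1
    return discordant, compared


def per_genotype_error(discordance: float) -> float:
    """Solve 1 - (1 - e)**2 = discordance for e.

    Assumption (not from the paper): the two typings err independently
    at the same rate e, and any error yields a mismatch.
    """
    return 1.0 - sqrt(1.0 - discordance)


# Hypothetical toy data: one person typed in two families.
family_1 = {"locusA": ("1", "2"), "locusB": ("3", "3"),
            "locusC": ("2", "4"), "locusD": None}
family_2 = {"locusA": ("2", "1"), "locusB": ("3", "4"),
            "locusC": ("2", "4"), "locusD": ("1", "1")}

discordant, compared = compare_duplicates(family_1, family_2)
rate = discordant / compared
print(f"{discordant}/{compared} discordant; "
      f"implied per-genotype error {per_genotype_error(rate):.3f}")
```

For small rates the implied per-genotype error is close to half the discordance rate, since a mismatch can arise from an error in either of the two typings.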
