Genet Mol Biol. 2017 Apr-Jun;40(2):530-539.
doi: 10.1590/1678-4685-GMB-2016-0180. Epub 2017 May 8.

Evaluation of MC1R high-throughput nucleotide sequencing data generated by the 1000 Genomes Project

Leonardo Arduino Marano et al. Genet Mol Biol. 2017 Apr-Jun.

Abstract

The advent of next-generation sequencing allows several genomic regions and individuals to be processed simultaneously, increasing the availability and accuracy of whole-genome data. However, these approaches can introduce errors and biases through alignment, genotype calling, and imputation methods. Despite these flaws, data obtained by next-generation sequencing can be valuable for population and evolutionary studies of specific genes, such as those related to how pigmentation evolved among populations, one of the main topics in human evolutionary biology. The melanocortin-1 receptor gene (MC1R) is one of the most studied genes involved in pigmentation variation. Because MC1R has been suggested to affect melanogenesis and to increase the risk of developing melanoma, it is one of the best models for understanding how natural selection acts on pigmentation. Here we employed a locally developed pipeline to obtain genotype and haplotype data for MC1R from the raw sequencing data provided by the 1000 Genomes FTP site. We also compared these genotype data with the Phase 3 VCF to evaluate their quality and to discover any polymorphic sites that may have been overlooked. In conclusion, either the VCF file or one of the pipelines described here could be used to obtain reliable and accurate genotype calls from the 1000 Genomes Phase 3 data.

Figures

Figure 1. Percentage of mismatches observed for data generated by our pipelines (UnifiedGenotyper in gray and HaplotypeCaller in black) as compared to data obtained directly from the VCF, for the 2,504 individuals analyzed by the 1000 Genomes Project.
Figure 2. Percentage of mismatches observed for data generated by our pipelines (UnifiedGenotyper in gray and HaplotypeCaller in black) as compared to data obtained directly from the VCF, for the 178 (UnifiedGenotyper) or 150 (HaplotypeCaller) loci analyzed by the 1000 Genomes Project.
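
The comparison summarized in the abstract and in Figures 1 and 2 (pipeline genotype calls versus the Phase 3 VCF, reported as mismatch percentages per individual and per locus) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the dictionary layout, and the sample and locus identifiers (`S1`, `site1`, and so on) are hypothetical, and the genotypes are invented toy values. It assumes genotypes are written as allele pairs separated by `/` or `|`, and that phase is ignored when deciding whether two calls agree.

```python
from collections import defaultdict


def normalize(genotype):
    """Treat a genotype as an unordered allele pair, so 'C/T' equals 'T|C'."""
    alleles = genotype.replace("|", "/").split("/")
    return tuple(sorted(alleles))


def mismatch_percentages(pipeline_calls, vcf_calls):
    """Return (per_sample, per_locus) percentages of discordant genotypes.

    Both inputs map (sample_id, locus_id) -> genotype string such as 'C/T'.
    Only (sample, locus) pairs present in both call sets are compared.
    """
    totals = {"sample": defaultdict(int), "locus": defaultdict(int)}
    mismatches = {"sample": defaultdict(int), "locus": defaultdict(int)}

    for key in pipeline_calls.keys() & vcf_calls.keys():
        sample, locus = key
        differs = normalize(pipeline_calls[key]) != normalize(vcf_calls[key])
        for kind, name in (("sample", sample), ("locus", locus)):
            totals[kind][name] += 1
            mismatches[kind][name] += differs  # True counts as 1

    def pct(kind):
        return {
            name: 100.0 * mismatches[kind][name] / totals[kind][name]
            for name in totals[kind]
        }

    return pct("sample"), pct("locus")


# Toy data (hypothetical identifiers and genotypes, for illustration only).
pipeline = {
    ("S1", "site1"): "C/T",
    ("S1", "site2"): "G/G",
    ("S2", "site1"): "C/C",
    ("S2", "site2"): "G/A",
}
vcf = {
    ("S1", "site1"): "T|C",
    ("S1", "site2"): "G|G",
    ("S2", "site1"): "C|T",
    ("S2", "site2"): "A|G",
}

per_sample, per_locus = mismatch_percentages(pipeline, vcf)
print(per_sample)
print(per_locus)
```

Grouping the same discordance counts by sample or by locus gives the two views that Figures 1 and 2 describe. In practice the genotypes would be parsed from the VCF files themselves rather than typed in by hand.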
