Nat Commun. 2015 Aug 21;6:8018.
doi: 10.1038/ncomms9018.

Rare variant discovery by deep whole-genome sequencing of 1,070 Japanese individuals


Masao Nagasaki et al. Nat Commun. 2015.

Abstract

The Tohoku Medical Megabank Organization reports the whole-genome sequences of 1,070 healthy Japanese individuals and the construction of a Japanese population reference panel (1KJPN). Through this high-coverage sequencing (32.4× on average), we identify 21.2 million single-nucleotide variants (SNVs), including 12 million novel ones, at an estimated false discovery rate of <1.0%. This detailed analysis detected signatures of purifying selection on regulatory elements as well as coding regions. We also catalogue structural variants, including 3.4 million insertions and deletions and 25,923 genic copy-number variants. The 1KJPN was effective for imputing genotypes of the Japanese population genome-wide. These data demonstrate the value of high-coverage sequencing for constructing population-specific variant panels (this one covers 99.0% of SNVs with minor allele frequency ≥0.1%) and for identifying causal rare variants of complex human disease phenotypes in genetic association studies.


Figures

Figure 1. SNVs in 1KJPN.
(a) Statistics on read depth in 1KJPN. The vertical bars indicate the minimum and maximum sequence read depth for each individual after filtering. Individuals are sorted by average read depth (the black line). (b) Power to detect SNVs (blue) for the confidence SNVs, and mean r² values before (yellow) and after (orange) filtering, evaluated against SNP array data for the same samples at non-reference allele counts ranging from 1 to 50. The r² between genotypes from the SNVs in 1KJPN and the SNP array data is the squared Pearson correlation. (c) The numbers of novel and known SNVs in each MAF bin. Novel SNVs begin to dominate at lower MAFs. (d) The rate of variant discovery by minimum MAF in the 1KJPN population. The rates of variant discovery under our sequencing strategy are plotted against minimum MAF in the 1KJPN population for different sample sizes. The distribution of population MAF was estimated on the basis of the demographic model shown in Supplementary Fig. 3.
Figure 2. The impact of very-rare variants on genomic regions and functional categories.
(a) The site frequency spectra (SFSs) of intergenic regions for SNVs of 1KJPN (blue) and 1KGP (red). (b) The numbers of SNVs observed in 1KJPN and 1KGP, shown for four functional categories. (c–f) The fraction of very-rare variants observed in 1KJPN, with 95% binomial confidence intervals, according to (c) genomic region, (d) probable consequences for coding regions, (e) noncoding regions and (f) scaled C scores. Because the number of genotyped individuals in the confidence SNVs differs among sites owing to the individual depth filter, we applied a hypergeometric projection, which subsamples each variant down to a sample size of 963 (90% of 1,070 samples), to obtain the SFSs of the confidence SNVs for a and c–f.
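The hypergeometric projection mentioned in the Figure 2 caption can be illustrated with a minimal sketch. This is not the authors' code: the function names `project_site` and `project_sfs` are hypothetical, and the sketch assumes each site is summarized as a pair (k, n), where k is the number of non-reference copies observed among n genotyped units, and that sites with fewer than m genotyped units are simply dropped. Whether units are individuals or chromosomes is left to the caller.

```python
from math import comb


def project_site(k, n, m):
    """Probabilities of seeing j = 0..m non-reference copies after
    subsampling m of n units without replacement, given k copies in n."""
    if not (0 <= k <= n and 0 < m <= n):
        raise ValueError("require 0 <= k <= n and 0 < m <= n")
    total = comb(n, m)
    # Hypergeometric probability: C(k, j) * C(n - k, m - j) / C(n, m).
    # math.comb returns 0 when the lower index exceeds the upper one.
    return [comb(k, j) * comb(n - k, m - j) / total for j in range(m + 1)]


def project_sfs(sites, m):
    """Expected site frequency spectrum at a common sample size m.

    `sites` is an iterable of (k, n) pairs. Sites with n < m cannot be
    projected down and are skipped (an assumption of this sketch).
    """
    sfs = [0.0] * (m + 1)
    for k, n in sites:
        if n < m:
            continue
        for j, p in enumerate(project_site(k, n, m)):
            sfs[j] += p
    return sfs


if __name__ == "__main__":
    # Toy input: three sites with different numbers of genotyped units.
    print(project_sfs([(1, 4), (2, 2), (1, 1)], m=2))
```

Each site contributes a probability distribution over allele counts instead of a single count, so sites genotyped in different numbers of individuals become comparable at the common sample size.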
Figure 3. Properties of genomic variation discovered in 1KJPN.
(a) The size-frequency spectrum of SNVs, deletions and insertions discovered by high-coverage sequencing in 1KJPN. Novelty rates are shown by the red line. Peaks corresponding to long interspersed elements (LINE), Alu and microsatellite repeats (MSR) are shown. (b) Size-frequency spectrum of CNVs estimated from high-coverage sequencing data in the genic regions in 1KJPN. (c) Histograms and scatterplot of diploid copy numbers of AMY1 genes (blue) and Region X (red) in 1KJPN. A diagram depicting the positions of AMY1A, Region X, AMY1B and AMY1C on chromosome 1 of GRCh37 is shown at the top right. (d) Allele frequencies for HLA-A in 1,070 individuals in 1KJPN estimated by high-coverage sequencing (blue), and in 1,018 Japanese individuals typed by PCR-SSOP (red).
Figure 4. Imputation with the Japanese reference panel.
(a) Comparison of imputation performance (r²) for four reference panels: 1,070 individuals in 1KJPN (1KJPN), 1,092 cosmopolitan samples in 1KGP (1KGP ALL), 1KJPN plus 1KGP ALL (1KJPN+1KGP ALL) and 89 Japanese individuals in 1KGP (1KGP JPT). The x axis represents the MAF of each panel. The y axis represents the averaged r² at SNV sites that exist in both the cosmopolitan samples of 1KGP and 1KJPN. (b) A Manhattan plot of P values from a GWAS of moyamoya disease (MMD). The SNV sites from the original data set and the imputed markers are plotted as dots in magenta and grey, respectively. Blue and red lines show the significance thresholds for the original and imputed results, respectively. Only one significant signal was identified, on chromosome 17. (c) A plot of P values from the GWAS of MMD with the original (non-imputed; upper panel) and imputed (lower panel) data sets around the SNP exhibiting the significant signal in b. In the imputed result, the SNP with the highest association is a nonsynonymous variant of RNF213, which was reported as one of the MMD-causing variants in the original study. In contrast, in the non-imputed result the SNP with the highest association is located in the coding region of ENDOV.
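The r² used in Figures 1b and 4a is described as a squared Pearson correlation between two sets of genotypes. A minimal sketch of that calculation follows. It is an illustration only: the function name `squared_pearson` is hypothetical, and coding genotypes as non-reference allele counts (0, 1 or 2, or fractional dosages for imputed calls) is an assumption, not something stated in the captions.

```python
def squared_pearson(x, y):
    """Squared Pearson correlation between two equal-length sequences.

    x and y are assumed to hold per-individual genotypes at one site,
    coded as non-reference allele counts or dosages (an assumption).
    Returns NaN when either sequence has zero variance.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant vector
    return (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    sequenced = [0, 1, 2, 1, 0, 0]  # toy genotypes from sequencing
    array = [0, 1, 2, 1, 0, 1]      # toy genotypes from a SNP array
    print(squared_pearson(sequenced, array))
```

Averaging this per-site value over sites in a MAF or allele-count bin would give a binned summary like the ones the figures plot. The binning itself is not shown here.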
