Cell Discov. 2023 Jul 21;9(1):75.
doi: 10.1038/s41421-023-00582-8.

The STROMICS genome study: deep whole-genome sequencing and analysis of 10K Chinese patients with ischemic stroke reveal complex genetic and phenotypic interplay

Si Cheng et al. Cell Discov. 2023.

Abstract

Ischemic stroke is a leading cause of global mortality and long-term disability. However, there is a paucity of whole-genome sequencing studies on ischemic stroke, resulting in limited knowledge of the interplay between genomic and phenotypic variation among affected patients. Here, we outline the STROMICS design and present the first whole-genome analysis of ischemic stroke, based on deep sequencing and analysis of 10,241 stroke patients from China. We identified 135.59 million variants, more than 42% of which were novel. Notable disparities in allele frequency were observed between Chinese and other populations for 89 variants associated with stroke risk and 10 variants linked to response to stroke medications. We investigated the population structure of the participants, generating a map of genetic selection consisting of 31 adaptive signals. The adaptation of the MTHFR rs1801133-G allele, which is linked to genetically elevated VB9 (folate) in southern Chinese patients, suggests a gene-specific folate supplementation strategy. Through genome-wide association analysis of 18 stroke-related traits, we discovered 10 novel genetic-phenotypic associations and extensive cross-trait pleiotropy at 6 lipid-trait loci of therapeutic relevance. Additionally, we found that carriers of the loss-of-function and cysteine-altering variants in NOTCH3, the causal gene for the autosomal dominant stroke disorder CADASIL, displayed a broad neuroimaging spectrum. These findings deepen our understanding of the relationship between population-level and individual genetic makeup and clinical phenotype among stroke patients, and provide a foundation for future efforts to use human genetic knowledge to investigate mechanisms underlying ischemic stroke outcomes, discover novel therapeutic targets, and advance precision medicine.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Summary of the major components of the STROMICS resource and WGS content.
Individuals were recruited from the CNSR-III and underwent a series of standard diagnostic procedures according to the WHO criteria; acute ischemic stroke was confirmed by MRI or brain CT. An electronic data capture (EDC) system was developed and used for data collection. Clinical phenotypes were extracted from the EDC system, medical records during hospitalization, biomarker measurements from biological samples, and the death registry of the Chinese Center for Disease Control and Prevention (CDC). Individuals were followed up at 3 months, 6 months, and annually from 1 to 5 years. Blood and urine samples were collected at face-to-face visits (baseline and follow-ups) and stored at Beijing Tiantan Hospital. Omics screening was performed on the blood samples of the individuals. The number of high-quality genetic variants identified from the WGS is shown. Figure created with BioRender.com.
Fig. 2. Allele frequency spectrum and functional annotation of the 135.59 million genetic variants among 10,241 individuals from STROMICS.
a The geographical distribution of STROMICS samples in China. b The number and allele frequency spectrum of STROMICS variants (SNVs and indels). Novel and known variants are defined by dbSNP (Materials and methods). AC, allele count. c Length and number distribution of STROMICS variants. The purple line shows the proportion of novel variants. d The total number of variants observed in each functional class of the genome. e Relationship between alternative allele count and the number of variants among different functional categories. The functional categorization of the genetic variants (All, LoF, splicing, Moderate, Low, ncRNA) is shown in Supplementary Table S8. f Venn diagram showing the concordance of genetic variants among STROMICS, gnomAD, ChinaMAP, the NyuWa Genome resource, and WBBC.
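Panel b bins variants by allele count (AC). As a rough illustration of how an AC translates into an allele frequency in a cohort of this size, the sketch below assumes an autosomal diploid site with a called genotype for every one of the 10,241 individuals, so the allele number is 2 × 10,241. The function name is hypothetical and not taken from the paper.

```python
N_INDIVIDUALS = 10_241  # cohort size reported for STROMICS


def allele_frequency(allele_count: int, n_individuals: int = N_INDIVIDUALS) -> float:
    """Return AF = AC / AN for an autosomal diploid site.

    Assumption: no missing genotypes, so AN = 2 * n_individuals.
    """
    allele_number = 2 * n_individuals
    if not 0 <= allele_count <= allele_number:
        raise ValueError("allele_count must lie between 0 and 2 * n_individuals")
    return allele_count / allele_number


# Print the frequency implied by a few illustrative allele counts.
for ac in (1, 2, 10, 100):
    print(f"AC={ac:>3}  AF={allele_frequency(ac):.2e}")
```

Under these assumptions a singleton (AC = 1) corresponds to a frequency of 1/20,482, which is why most of the rare end of the spectrum is only observable in cohorts of this scale.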
Fig. 3. Population structure and adaptation.
a, b PCA of all the individuals in STROMICS (n = 10,241) colored by seven geographical regions (a) and by 31 provincial divisions (b). Each point represents one participant and is placed according to their eigenvectors. c, d Distribution of the three ancestry components in STROMICS participants (n = 10,241) by geographical region (c) and by provincial division (d), as inferred using ADMIXTURE with K = 3. Each color reflects one of the three ancestral components. The proportion of ancestral components for each individual is indicated by a stacked bar. Individuals are organized by province along the x-axis. e Genomic signatures under selection along PC1 (upper panel) and PC2 (lower panel). The nearest gene of the lead SNV for each selection signal is indicated. f Geographical distribution of the A allele frequency of the SNV rs1801133 (chr1:11796321) in the MTHFR and CLCN6 loci under genetic selection in the STROMICS population. Provinces with a sample size of < 5 are filled in gray. g, h Linear regression of homocysteine and VB9 (folate) on the three genotypes of rs1801133, respectively, with gender, age, history of stroke, and the number of days between stroke onset and blood sampling as covariates.
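The regression in panels g and h can be sketched as an ordinary least-squares fit of a trait on genotype plus the four listed covariates. This is a minimal illustration, not the authors' pipeline: the data are synthetic, the additive 0/1/2 genotype coding is an assumption (the caption only says "three genotypes"), and the effect sizes and helper name `ols` are made up for the example.

```python
import random


def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(p)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(p)
    ]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


random.seed(0)
TRUE_GENOTYPE_EFFECT = 1.5  # synthetic per-allele effect, not a study result

X, y = [], []
for _ in range(2000):
    genotype = random.choice([0, 1, 2])   # assumed additive coding of rs1801133
    sex = random.randint(0, 1)
    age = random.gauss(62, 11)
    prior_stroke = random.randint(0, 1)
    days_to_sampling = random.randint(0, 7)
    X.append([1.0, genotype, sex, age, prior_stroke, days_to_sampling])
    y.append(10 + TRUE_GENOTYPE_EFFECT * genotype + 0.03 * age + random.gauss(0, 1))

coefficients = ols(X, y)
estimated_genotype_effect = coefficients[1]
print(f"estimated per-allele effect: {estimated_genotype_effect:.2f}")
```

The coefficient on the genotype column is the covariate-adjusted per-allele change in the trait, which is the quantity such a regression would report for homocysteine or folate.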
Fig. 4. Circular presentation of single variant and gene-based association of stroke-related biochemical, behavioral, and imaging traits.
Chromosomes are indicated by numbered panels 1–22. The −log10 P values for single-variant and gene-based associations with the traits are shown by chromosomal position in the blue and yellow panels, respectively. The significance thresholds were P < 2.78 × 10⁻⁹ for the single-variant association test and P < 1.50 × 10⁻⁷ for the gene-based association test, correcting for the number of traits tested (n = 18) and the number of genes tested (n = 17,464). In the outermost blue panel, genetic loci, defined as the 1 Mbp window centered on the lead SNV, are labeled with the gene nearest to the independent lead SNV in the single-variant association analysis. Genetic loci that had not been reported to be associated with the same trait in the GWAS Catalog (v1.0.2-e105_r2021-12-21) are marked by a star (*) and shown in bold and orange. In the innermost yellow panel, gene loci that passed the gene-based association test using variants with MAF < 0.005 (Supplementary Table S20) are shown. Additionally, the CETP and PCSK9 gene loci, which also passed the gene-based association test using rare functional variants alone (Supplementary Table S21), are marked by a number sign (#) and shown in bold. Color keys in the middle represent the categories of the traits.
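The thresholds quoted in this caption can be checked against a plain Bonferroni calculation. The sketch below is an assumption about how such thresholds are typically derived, not a statement of the authors' method: it divides the conventional genome-wide level of 5 × 10⁻⁸ by the 18 traits, and divides 0.05 by the product of genes and traits.

```python
N_TRAITS = 18
N_GENES = 17_464

# Assumption: conventional genome-wide significance level for single variants.
GENOME_WIDE_ALPHA = 5e-8
# Assumption: family-wise error rate of 0.05 for the gene-based test.
FAMILY_WISE_ALPHA = 0.05

single_variant_threshold = GENOME_WIDE_ALPHA / N_TRAITS
gene_based_threshold = FAMILY_WISE_ALPHA / (N_GENES * N_TRAITS)

print(f"single-variant: {single_variant_threshold:.2e}")  # 2.78e-09
print(f"gene-based:     {gene_based_threshold:.2e}")      # 1.59e-07
```

The single-variant value reproduces the reported 2.78 × 10⁻⁹. The gene-based value comes out near 1.59 × 10⁻⁷, close to but not identical to the reported 1.50 × 10⁻⁷, so the exact denominator used in the study may differ from the one assumed here.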
Fig. 5. CADASIL-susceptible variants in STROMICS and neuroimaging of the carriers.
a Distribution and number of carriers of CADASIL-susceptible variants in the NOTCH3 protein. The variants are represented by their effect on the amino acid sequence of the protein according to Human Genome Variation Society (HGVS) nomenclature. The upper panel shows the number of carriers for each variant. The variants are colored by their functional categorization. The lower panel shows the location of the variants in the NOTCH3 protein domains. b Violin plots of DWMH and PVH scores among carriers of Cys-altering SNVs in EGFr 1–6, EGFr 7–34, and non-EGFr regions. c Frequency of hyperintensities involving the temporal lobe and external capsule, lacunes, microbleeds, and brain atrophy among carriers of Cys-altering SNVs in EGFr 1–6, EGFr 7–34, and non-EGFr regions.
