Nature. 2025 Dec;648(8092):117-127.
doi: 10.1038/s41586-025-09680-x. Epub 2025 Oct 15.

The Taiwan Precision Medicine Initiative provides a cohort for large-scale studies

Hsin-Chou Yang #  1   2 Pui-Yan Kwok #  3   4 Ling-Hui Li #  5 Yi-Min Liu #  5 Yuh-Jyh Jong #  6   7   8 Kang-Yun Lee #  9   10 Da-Wei Wang  11 Ming-Fang Tsai  5 Jenn-Hwai Yang  12 Chien-Hsiun Chen  5 Erh-Chan Yeh  5 Chun-Yu Wei  5   13 Cathy S-J Fann  5 Yen-Tsung Huang  12   14   15 Chia-Wei Chen  12 Yi-Ju Lee  12 Shih-Kai Chu  12   16 Chih-Hsing Ho  17 Cheng-Shin Yang  5 Yungling Leo Lee  5 Hung-Hsin Chen  5 Ming-Chih Hou  18 Jeng-Fong Chiou  19   20 Shun-Fa Yang  21   22 Chih-Hung Wang  23   24 Chih-Yang Huang  25   26 Kuan-Ming Chiu  27   28 Ming Chen  29 Fu-Tien Chiang  30   31 Sing-Lian Lee  32 Shiou-Sheng Chen  33   34   35 Wei-Jen Yao  36 Chih-Cheng Chien  37   38 Shih-Yao Lin  39   40 Fu-Pang Chang  39 Hsiang-Ling Ho  39   41 Yi-Chen Yeh  39   40 Wei-Cheng Tseng  42   43   44 Ming-Hwai Lin  45 Hsiao-Ting Chang  43   45 Ling-Ming Tseng  40   46   47 Wen-Yih Liang  39 Paul Chih-Hsueh Chen  39 Jen-Fan Hang  39   40   48 Shih-Chieh Lin  39 Yu-Jiun Chan  39   49   50 Ying-Ju Kuo  39 Lei-Chi Wang  39   40 Chin-Chen Pan  39 Yu-Cheng Hsieh  48   51   52 Yi-Ming Chen  51   52 Tzu-Hung Hsiao  51 Ching-Heng Lin  51 Yen-Ju Chen  51 I-Chieh Chen  51 Chien-Lin Mao  51 Shu-Jung Chang  51 Yen-Lin Chang  53 Yi-Ju Liao  53 Chih-Hung Lai  54 Wei-Ju Lee  52   55 Hsin Tung  52   55 Ting-Ting Yen  56 Hsin-Chien Yen  57 Chun-Ming Shih  9   58   59 Teh-Ying Chou  60   61 Tsan-Hon Liou  62   63 Chen-Yuan Chiang  64   65 Yih-Giun Cherng  66   67 Chih-Hwa Chen  68   69   70 Chao-Hua Chiu  9   71 Sung-Hui Tseng  70   72 Emily Pei-Ying Lin  71   73   74 Ying-Ju Chen  5 Hui-Ping Chuang  5 Tsai-Chuan Chen  5 Wei-Ting Huang  5 Joey Sin  5 I-Ling Liu  5 Yi-Chen Chen  5 Kuo-Kuang Chao  5 Yu-Min Wu  5 Pin-Pin Yu  5 Lung-Pao Chang  5 Kuei-Yao Yen  5 Li-Ching Chang  5 Yi-Jing Sheen  75   76   77 Yuan-Tsong Chen  5 Kamhon Kan  78 Hsiang-Lin Tsai  79 Yao-Kuang Wang  80   81 Ming-Feng Hou  82 Yuan-Han Yang  83 Chao-Hung Kuo  81   84 Wen-Jeng Wu  85 Jee-Fu Huang  86   87   88 Inn-Wen Chong  8   
89 Jong-Rung Tsai  90   91 Cheng-Yu Lin  92 Ming-Chin Yu  93 Tsong-Hai Lee  94   95 Meng-Han Tsai  96   97 Yu-Che Ou  98 Pin-Yuan Chen  96   99 Tsung-Hui Hu  100   101 Yu-Chiau Shyu  102   103 Chih-Kuang Cheng  48   104 Yu-Jen Fang  105   106 Song-Chou Hsieh  107 Chien-Hung Chen  105   108   109 Chieh-Chang Chen  110   111 Ko-Jen Li  107 Chin-Hsien Lin  112 Hsien-Yi Chiu  113 Chen-Chi Wu  113 Chun-Yen Chen  114 Shi-Jye Chu  115 Feng-Cheng Liu  115 Fu-Chi Yang  116 Hsin-An Chang  114 Wei-Liang Chen  117 Sung-Sen Yang  118 Yueh-Feng Sung  116 Tso-Fu Wang  119   120 Shinn-Zong Lin  121   122 Yen-Wen Wu  40   123   124 Chien-Sheng Wu  125   126 Ju-Ying Jiang  127 Gwo-Chin Ma  29 Ting-Yu Chang  29 Juey-Jen Hwang  30   31 Kuo-Jang Kao  128 Chen-Fang Hung  128 Ting-Fang Chiu  129   130   131 Po-Yueh Chen  132   133 Kochung Tsui  37   134   135 Ming-Shiang Wu  105   108 See-Tong Pang  96   136 Shih-Ann Chen  77   137   138 Wei-Ming Chen  139 Chun-Houh Chen  12 Wayne Huey-Herng Sheu  75   140   141 Jer-Yuarn Wu  142

Hsin-Chou Yang et al. Nature. 2025 Dec.

Abstract

Han Chinese people comprise nearly 20% of the global population but remain under-represented in genetic studies [1,2], so there is an urgent need for large-scale cohorts to advance precision medicine. Here we present the Taiwan Precision Medicine Initiative (TPMI), established by Academia Sinica in collaboration with 16 major medical centres around Taiwan, which has recruited 565,390 participants who consent to provide DNA samples for genetic profiling and grant access to their electronic medical records (EMRs) for research. EMR access is both retrospective and prospective, allowing longitudinal studies. Genetic profiling is done with population-optimized arrays of single-nucleotide polymorphisms for people of Han Chinese ancestry, which enable genome-wide association [3,4], phenome-wide association [5,6] and polygenic risk score [7,8] studies to be performed to evaluate common disease risk and pharmacogenetic response. Participants also agreed to be re-contacted for future research and receive personalized genetic risk profiles with health management recommendations. The TPMI has established the TPMI Data Access Platform, a central database and analysis platform that both safeguards the security of the data and facilitates academic research. As a large cohort of individuals with non-European ancestry that merges genetic profiles with EMR data and enables longitudinal follow-up, TPMI provides a unique resource that could be used to validate genetic risk prediction models, perform clinical trials of risk-based health management and inform health policies. Ultimately, the TPMI cohort will contribute to global genetic research and serve as a model for population-based precision medicine.

Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Fig. 1
Fig. 1. Map of medical centres, their satellite hospitals and sample sizes.
Locations of 16 partner medical centres and 33 affiliated hospitals, along with the numbers of DNA samples, genotyped samples, individuals with EMRs received, individuals with EMRs stored in the TPMI Data Lake and individuals with both genotype and EMR data. Source Data.
Fig. 2
Fig. 2. Cohort characteristics.
a, Sex-specific age distribution. b, Top 20 most prevalent ICD-10 codes: E78 (disorders of lipoprotein metabolism and other lipidaemias), I10 (EHT), E11 (type 2 diabetes mellitus), K21 (gastro-oesophageal reflux disease), J30 (vasomotor and allergic rhinitis), G47 (sleep disorders), K05 (gingivitis and periodontal diseases), N39 (other urinary disorders), M47 (spondylosis), K59 (other functional intestinal disorders), M79 (other and unspecified soft tissue disorders), R10 (abdominal and pelvic pain), H10 (conjunctivitis), I25 (chronic ischaemic heart disease), N40 (enlarged prostate), L30 (other and unspecified dermatitis), I11 (hypertensive heart disease), H04 (lacrimal system disorders), N18 (chronic kidney disease) and R07 (pain in throat or chest). c, Age of onset for the top 20 diseases. Onset ages in male individuals (blue) and female individuals (pink) are presented as box plots, ordered by median. Box plots represent minima, first quartile, median, third quartile and maxima. Values and sample sizes are in the Source Data. d, Top 20 most prevalent laboratory tests: creatinine_B (blood creatinine), WBC (white blood cell count), SGPT (serum glutamic pyruvic transaminase or alanine aminotransferase; S-GPT/ALT), HB (haemoglobin), platelet (platelet count), HCT (haematocrit), RBC (red blood cell count), EGFR (estimated glomerular filtration rate), SGOT (serum glutamic–oxaloacetic transaminase or aspartate aminotransferase; S-GOT/AST), TG (triglyceride), cholesterol_T (Total Cholesterol), BUN (blood urea nitrogen), glucose_AC (fasting glucose), LDL_C (low-density lipoprotein cholesterol), HDL_C (high-density lipoprotein cholesterol), uric acid_B (blood uric acid), HbA1c (haemoglobin A1c), bilirubin_T (bilirubin, total value), albumin and TSH (thyroid-stimulating hormone, measured by enzyme immunoassay or luminescence immunoassay). 
Left, sex-specific distribution of record counts per individual (winsorized at the 95th percentile); middle, proportion of individuals with test data; right, distribution of average follow-up years. Box plots represent minima, first quartile, median, third quartile and maxima. Values and sample sizes are in the Source Data. e, The top pie chart shows the proportions of related and unrelated samples. The bottom pie chart shows relationship categories: duplicate (DUP) or monozygotic twin (MZ), parent–offspring (PO), full sibling (FS), second degree (2nd) and third degree (3rd). Source Data.
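
The winsorization mentioned for panel d can be illustrated with a minimal sketch. It assumes that only the upper tail is capped and that the 95th percentile is taken by the nearest-rank rule; the caption specifies neither, and the function name `winsorize_upper` and the sample counts are hypothetical.

```python
import math


def winsorize_upper(values: list[float], percentile: int = 95) -> list[float]:
    """Cap every value above the given percentile at that percentile's value."""
    if not values:
        return []
    ordered = sorted(values)
    # Nearest-rank percentile (an assumption): the smallest value with at
    # least `percentile` percent of the data at or below it.
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    cap = ordered[rank - 1]
    return [min(v, cap) for v in values]


# Hypothetical record counts per individual; the last one is an extreme outlier.
counts = list(range(1, 20)) + [250]
print(winsorize_upper(counts))
```

Capping rather than dropping the extreme counts keeps every individual in the plot while preventing a few very long records from stretching the axis.
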
Fig. 3
Fig. 3. Population structure.
a, PCA analysis. The TPMI cohort was compared with TWB, SGDP and 1KGP samples. The top-left inset compares TPMI participants born before 1950 with those born after 1950; the top-right inset compares the TPMI and the 1KGP. The main figure shows TPMI, TWB and two Taiwan Indigenous tribes (SGDP). Admixture fraction plots show ancestry fractions from ten ancestral populations (K = 10), with principal component (PC) 1 on the bottom axis and PC2 on the right. b, Coancestry and fine-scale structure. The coancestry heat map shows individuals (rows, columns) clustered by shared haplotypes, with colour intensity indicating haplotype copying. Darker blue or red indicates higher coancestry; yellow or light orange indicates lower. Diagonal blocks mark within-group sharing: K1–K6 show strong within-group haplotype sharing; K1–K2 (Han Chinese-enriched) exhibit strong coancestry with each other but less with K3–K6 (Indigenous-enriched), reflecting genetic differentiation; K3–K6 form distinct blocks, with some asymmetric sharing suggesting admixture or shared ancestry. The dendrogram shows clustering consistent with subgroup distinctions. c, Admixture graph depicting relationships and gene flow among K1–K6. Solid arrows represent drift edges (genetic drift from ancestral populations); dotted arrows represent admixture, with percentages indicating fractions. Edge numbers denote drift lengths (f2 units). K1 derives around 90% of ancestry from a lineage that also contributes to K2, plus 10% admixture from a lineage related to K6, indicating close K1–K2 affinity with minor Indigenous input. K4 shows around 49% of ancestry from a K5-related lineage (shared with K3) and 51% from a branch that also contributes to K2, reflecting Han–Indigenous admixture. K6 is mostly unadmixed with a long drift branch (f2 = 70), consistent with a highly diverged Indigenous lineage. 
K5 seems to be ancestral to other Indigenous groups (K3, K4 and possibly indirectly K6), with considerable early divergence (drift = 36 on both edges).
Fig. 4
Fig. 4. Comparison of T2D GWAS results in the TPMI with those from four biobanks.
a, Comparison of GWAS results from the TPMI and the PheWebs of the Biobank Japan (BBJ), China Kadoorie Biobank (CKB), the Korean Genome and Epidemiology Study (KoGES) and the UK Biobank (UKB). A Firth logistic regression was applied for the T2D GWAS in the TPMI. All statistical tests were two-sided. Multiple-testing adjustment was applied using a genome-wide significance threshold of P < 1 × 10^−8. Novel T2D-associated SNPs identified in our GWAS but absent in biobanks at the genome-wide significance level are shown. b, Pairwise comparison with each biobank. Different statistical methods were applied across cohorts. For BBJ, CKB and KoGES, a generalized linear mixed model was implemented using SAIGE; for UKB, linear regression was used. All tests were two-sided. Bonferroni correction was applied for multiple-testing adjustment across loci. The four graphs (from top to bottom) show T2D-associated SNPs identified in the TPMI but not in BBJ, CKB, KoGES and UKB, respectively. Source Data.
Fig. 5
Fig. 5. PRS analysis for T2D.
a, AUC of PRS (red curve) and of PRS, age, sex and BMI (blue curve). b, Dose–response effect of PRS levels on the odds ratio (OR) of T2D. Dose–response effect of PRS (red line) and of a combination of PRS, age, sex and BMI (blue curve) with n = 205,779 independent samples. Error bars represent the 95% confidence interval, calculated as exp(β ± Z_0.025 × s.e.), based on maximum likelihood estimation from a logistic regression model with different decile intervals of PRS values included as covariates. The estimated coefficient (β) is provided in column B (‘Estimate’) and the standard error (s.e.) is provided in column C (‘Std. Error’) in the Source Data for b. Source Data.
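
The confidence-interval expression in the Fig. 5b caption can be worked through with a short sketch. It exponentiates a logistic-regression coefficient and its Wald bounds to obtain an odds ratio with a 95% interval. The function name `odds_ratio_ci` and the example coefficient and standard error are hypothetical, not values from the Source Data.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(beta: float, se: float, level: float = 0.95) -> tuple[float, float, float]:
    """Return (odds ratio, lower bound, upper bound) for a logistic-regression coefficient."""
    # Upper-tail critical value of the standard normal; about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical coefficient and standard error for one PRS decile.
or_, lo, hi = odds_ratio_ci(beta=0.50, se=0.10)
print(f"OR = {or_:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

Because the interval is symmetric on the log-odds scale, the bounds are asymmetric around the odds ratio once exponentiated, which is why error bars on an OR axis are longer above the point estimate than below it.
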

References

    1. Klein, R. J. et al. Complement factor H polymorphism in age-related macular degeneration. Science 308, 385–389 (2005).
    1. Uffelmann, E. et al. Genome-wide association studies. Nat. Rev. Methods Primers 1, 59 (2021).
    1. Denny, J. C. et al. PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations. Bioinformatics 26, 1205–1210 (2010).
    1. Denny, J. C. et al. Systematic comparison of phenome-wide association study of electronic medical record data and genome-wide association study data. Nat. Biotechnol. 31, 1102–1110 (2013).
    1. Duncan, L. et al. Analysis of polygenic risk score usage and performance in diverse human populations. Nat. Commun. 10, 3328 (2019).
