Cancer Discov. 2022 Sep 2;12(9):2044-2057.
doi: 10.1158/2159-8290.CD-21-1547.

AACR Project GENIE: 100,000 Cases and Beyond

Trevor J Pugh et al. Cancer Discov.

Abstract

The American Association for Cancer Research (AACR) Project Genomics Evidence Neoplasia Information Exchange (GENIE) is an international pan-cancer registry with the goal of informing cancer research and clinical care worldwide. Founded in late 2015, the milestone GENIE 9.1-public release contains data from >110,000 tumors from >100,000 people treated at 19 cancer centers from the United States, Canada, the United Kingdom, France, the Netherlands, and Spain. Here, we demonstrate the use of these real-world data, harmonized through a centralized data resource, to accurately predict enrollment on genome-guided trials, discover driver alterations in rare tumors, and identify cancer types without actionable mutations that could benefit from comprehensive genomic analysis. The extensible data infrastructure and governance framework support additional deep patient phenotyping through biopharmaceutical collaborations and expansion to include new data types such as cell-free DNA sequencing. AACR Project GENIE continues to serve as a global precision medicine knowledge base of increasing impact to inform clinical decision-making and bring together cancer researchers internationally.

Significance: AACR Project GENIE has now accrued data from >110,000 tumors, placing it among the largest repositories of publicly available, clinically annotated genomic data in the world. GENIE has emerged as a powerful resource to evaluate genome-guided clinical trial design, uncover drivers of cancer subtypes, and inform real-world use of genomic data. This article is highlighted in the In This Issue feature, p. 2007.

Figures

Figure 1.
AACR Project GENIE 9.1-public release summary. A, Linear growth in the number of samples in each public release of registry data (green bars); releases 1.0.1 through 4.1-public contained data from the eight founding institutions. The 5.0-public release was the first to contain data from new participating institutions, while the 6.0-public release was the first to contain data from all new participating institutions. Some site data were subsequently removed for quality reasons, resulting in the 6.2-public release (yellow star); the 9.1-public release (black arrow) is the version on which this article is based. The total number of mutations per release (blue bars), copy-number alterations (gray bars), and fusions (purple bars, structural variants) are also shown. A spike in the number of mutations in the 5.0-public release was subsequently corrected in the 6.2-public release after adjustment of centralized data filtering. The number of institutions providing fusion data (purple bars) has increased from three beginning with the 5.0-public release to six in the 9.1-public release (Supplementary Fig. S1); the large spike observed in the 7.0-public release and moving forward reflects the clearing of a backlog at a major contributing institution. B, The overview of the 9.1-public release in cBioPortal shows the top 11 cancer types and detailed cancer types (panels 1 and 2, respectively); the source of the sequenced sample (3); the age distribution of the patients whose samples were sequenced (4); the sex and race distribution (5 and 6, respectively); as well as the most frequent copy-number alterations, fusions, and mutated genes (7, 8, and 9, respectively). Panel 8 lists the genes most frequently subject to a gene fusion, with the specific partner genes for individual fusions explorable through the Patient or Query views in cBioPortal. The full cohort can be explored at https://genie.cbioportal.org.
Figure 2.
Workflow for variant filtration from site upload to analysis ready calls. The flowchart depicts the processing workflow of the GENIE data from the sites to the final data release. Sites prepare, filter, and upload data according to a prespecified format to the Synapse platform. Automated processes perform quality assurance checks and harmonize data across sites by mapping clinical data and genomic variants to standardized terminologies. Harmonized files representing patient, sample, mutation, and other information are then processed through sample and variant filters to remove out-of-scope data or potential artifacts. After filtering, final quality control (QC) checks are performed, and the public data releases are made available to users on Synapse and cBioPortal. BED, browser extensible data; MAF, mutation annotation format; PHI, protected health information; VCF, variant call format.
Figure 3.
General GENIE pipeline (Journey Map). GENIE data go through four distinct processes to ensure that high-quality data reach the end users; responsibilities are shared by consortium member functional teams. During preprocessing (blue lane), data are formatted, filtered, and checked at the center prior to upload; Sage validates data received and issues are communicated to the providers; and if necessary, AACR communicates critical messages to centers contributing data. Sage processes (green lane) the collected data monthly, including reannotating variants using Genome Nexus and consistent formatting for release. Processed data are released (yellow lane) to the consortium for review. Upon release (red lane), all stakeholders participate in cross-functional team communication about potential quality issues and fixes prior to lock and public release (not shown). QC, quality control.
Figure 4.
NCI-MATCH + GENIE. A, Results per substudy of NCI-MATCH showing the number of patients matching based on the GENIE 9.1 release and the proportion of matches for each of the top 10 most frequently matched cancer types based on top-level OncoTree codes. The number of patients who matched to each substudy in the first GENIE article is also provided for comparison. amp, amplification; CNS, central nervous system; CUP, cancer of unknown primary; del, deletion; mut, mutation. B, The overall percentage of patients who match the eligibility for each substudy for the latest GENIE cohort compared with reported results for NCI-MATCH. A linear regression, shown in blue, shows a high correlation with an r-squared of 0.62. C, Overall frequency of the eight most common cancer types among patients in the GENIE cohort compared with the frequency among 5,540 patients screened through NCI-MATCH. 95% confidence intervals are shown.
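The r-squared reported for panel B can be illustrated with a minimal sketch of a simple least-squares fit. The function name `r_squared` and the example values are hypothetical and are not taken from the article; the article's actual per-substudy match percentages are not reproduced here.

```python
def r_squared(x, y):
    """Coefficient of determination for a simple least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Made-up illustrative values (percent of patients matching per substudy)
registry_pct = [0.0, 1.0, 2.0, 3.0]
trial_pct = [0.0, 1.0, 0.0, 1.0]
print(round(r_squared(registry_pct, trial_pct), 3))
```

In this reading, each point would be one substudy, with the registry-derived match percentage on one axis and the reported trial percentage on the other.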
Figure 5.
Actionability—sensitizing and resistance alterations. Tumor types are shown by decreasing overall frequency of actionable therapeutic sensitizing alterations on the top, whereas the frequency of alterations associated with therapeutic resistance is shown below. Actionable sensitizing alterations were defined by the OncoKB knowledge base, whereas resistance alterations include actionable alterations from OncoKB and alterations with emerging evidence curated from the COSMIC database and the scientific literature. For resistance alterations, additional information showing genes and percentage of samples mutated are included below each bar. This analysis includes the top 30 tumor types in GENIE by sample count.
Figure 6.
The somatic mutational landscape of rare tumor subtypes. A, Strategy for the identification of rare tumor subtypes, using cancers annotated under the top-level “Pancreas” OncoTree node as an example. Terminal OncoTree nodes with fewer than 50 associated sequenced samples (colored red) were included in the rare tumor analysis. IPMN, intraductal papillary mucinous neoplasm; MCN, mucinous cystic neoplasm; PAAC, acinar cell carcinoma of the pancreas; PAAD, pancreatic adenocarcinoma; PAASC, adenosquamous carcinoma of the pancreas; PACT, cystic tumor of the pancreas; PANET, pancreatic neuroendocrine tumor; PB, pancreatoblastoma; PSC, serous cystadenoma of the pancreas; SPN, solid pseudopapillary neoplasm of the pancreas; UCP, undifferentiated carcinoma of the pancreas. B, Heat map showing the distribution of the proportion of nonsilent mutations across rare tumor sites. For brevity, only tumor subtypes with more than 40 samples sequenced are included and driver genes with a mutational prevalence less than 10% across all analyzed tumor subtypes have been omitted. PNS, peripheral nervous system. C, Mutational plots showing high frequency of mutations in uncommonly mutated driver genes: DICER1 (tumor suppressor gene) and CTNNB1 (oncogene).
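The selection rule in panel A (terminal OncoTree nodes with fewer than 50 sequenced samples) can be sketched as a simple count-and-filter step. This is an illustrative sketch only: the function name `rare_subtypes`, the input shapes, and the sample counts below are assumptions, and requiring at least one sample per node is also an assumption rather than something the caption states.

```python
from collections import Counter

RARE_THRESHOLD = 50  # "fewer than 50 associated sequenced samples" per the caption


def rare_subtypes(sample_codes, terminal_codes, threshold=RARE_THRESHOLD):
    """Return terminal OncoTree codes with at least one but fewer than `threshold` samples.

    sample_codes: iterable with one OncoTree code per sequenced sample (hypothetical input)
    terminal_codes: set of codes that are terminal (leaf) nodes (hypothetical input)
    """
    counts = Counter(code for code in sample_codes if code in terminal_codes)
    # Assumption: nodes with zero samples are skipped rather than counted as rare
    return sorted(code for code in terminal_codes if 0 < counts[code] < threshold)


# Made-up counts under the "Pancreas" node, for illustration only
samples = ["PAAD"] * 120 + ["PB"] * 3 + ["SPN"] * 49 + ["PANET"] * 50
terminals = {"PAAD", "PB", "SPN", "PANET", "UCP"}
print(rare_subtypes(samples, terminals))
```

With these made-up counts, a node with exactly 50 samples falls outside the rare set because the caption's threshold is strict.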

Comment in

  • doi: 10.1158/2159-8290.CD-12-9-ITI

References

    1. AACR Project GENIE Consortium. AACR project GENIE: powering precision medicine through an international consortium. Cancer Discov 2017;7:818–31.
    2. Smyth LM, Zhou Q, Nguyen B, Yu C, Lepisto EM, Arnedos M, et al. Characteristics and outcome of AKT1 E17K-mutant breast cancer defined through AACR project GENIE, a clinicogenomic registry. Cancer Discov 2020;10:526–35.
    3. LeNoue-Newton ML, Chen SC, Stricker T, Hyman DM, Blauvelt N, Bedard PL, et al. Natural history and characteristics of ERBB2-mutated hormone receptor–positive metastatic breast cancer: a multi-institutional retrospective case–control study from AACR project GENIE. Clin Cancer Res 2022;28:2118–30.
    4. Mahal BA, Alshalalfa M, Kensler KH, Chowdhury-Paulino I, Kantoff P, Mucci LA, et al. Racial differences in genomic profiling of prostate cancer. N Engl J Med 2020;383:1083–5.
    5. Holowatyj AN, Eng C, Wen W, Idrees K, Guo X. Spectrum of somatic cancer gene variations among adults with appendiceal cancer by age at disease onset. JAMA Netw Open 2020;3:e2028644.
