Proc Natl Acad Sci U S A. 2012 Jun 12;109(24):9551-6.
doi: 10.1073/pnas.1200019109. Epub 2012 May 24.

Data-driven unbiased curation of the TP53 tumor suppressor gene mutation database and validation by ultradeep sequencing of human tumors

Karolina Edlund et al. Proc Natl Acad Sci U S A. 2012.

Abstract

Cancer mutation databases are expected to play central roles in personalized medicine by providing targets for drug development and biomarkers to tailor treatments to each patient. The accuracy of reported mutations is a critical but commonly overlooked issue, and as a result mutation databases include a sizable number of spurious mutations, either sequencing errors or passenger mutations. Here we report an analysis of the latest version of the TP53 mutation database, which includes 34,453 mutations. By applying several data-driven methods to multiple independent quality criteria, we obtained a quality score for each report contributing to the database. This score can now be used to filter for high-confidence mutations and reports within the database. Ultradeep next-generation sequencing of the entire TP53 gene in various types of cancer validated our curation approach. In summary, 9.7% of all collected studies, mostly those describing numerous tumors with multiple infrequent TP53 mutations, should be excluded when analyzing TP53 mutations. Thus, by combining statistical and experimental analyses, we provide a curated TP53 mutation database and a framework for mutation database analysis.
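The abstract notes that the per-report quality score can be used to filter the database. A minimal Python sketch of that final filtering step follows, assuming each mutation record carries the ID of the report it came from and that a separate scoring step has already produced a set of flagged reports. The `Mutation` class, its fields, `filter_by_report_quality`, and the toy records are hypothetical illustrations, not the structure of the UMD TP53 database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mutation:
    """One mutation record; field names are illustrative assumptions."""
    report_id: str  # publication that reported the mutation
    codon: int      # TP53 codon position
    change: str     # protein change label, e.g. "R175H"


def filter_by_report_quality(mutations, flagged_reports):
    """Drop every mutation whose source report was flagged as low quality.

    Returns the retained mutations and the fraction of reports excluded.
    `flagged_reports` stands in for the output of a report-level quality
    scoring step (hypothetical input)."""
    flagged = set(flagged_reports)
    kept = [m for m in mutations if m.report_id not in flagged]
    all_reports = {m.report_id for m in mutations}
    excluded = all_reports & flagged
    fraction = len(excluded) / len(all_reports) if all_reports else 0.0
    return kept, fraction


# Toy records for illustration only (not data from the study).
mutations = [
    Mutation("pubA", 175, "R175H"),
    Mutation("pubA", 248, "R248Q"),
    Mutation("pubB", 132, "K132N"),
    Mutation("pubB", 301, "P301S"),
    Mutation("pubC", 273, "R273H"),
]
kept, fraction = filter_by_report_quality(mutations, {"pubB"})
print(len(kept), round(fraction, 3))  # 3 0.333
```

Filtering at the report level rather than the mutation level mirrors the abstract's conclusion that whole studies, not individual entries, are excluded.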


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
TP53 mutation heterogeneity. (A) Box-and-whisker analysis of mutant TP53 activity according to origin. The y axis corresponds to the transcriptional activity of TP53 mutants as reported by Kato et al. and included in the UMD TP53 database (17, 18). Box-and-whisker plots show the upper and lower quartiles, i.e., the interquartile range (box), the median value (horizontal line inside the box), and the full-range distribution (whisker line) of TP53 activity. All: entire database; tumors: tumors only; cell lines: cell lines only; germline: germ-line mutations only. Among germ-line mutations, R337H, which is very frequently found in patients with adrenocortical carcinoma in Brazil, was added to the database only once because it has been shown to be a founder mutation. The Mann–Whitney U test was used to evaluate statistical significance. N.S., not significant. (B) Distribution of the EVT criterion. The number of TP53 mutations per tumor is very heterogeneous. The EVT criterion ranges from 1 to 5.7, with 660 publications having a value of 1 and 26 publications having a value greater than 2. This heterogeneity is not cancer-specific and can be observed in all types of neoplasia. (C) Activity of mutant TP53 in tumors with only one mutation (SM), two mutations (DM), or more than two mutations (MM). The Mann–Whitney U test was used to evaluate statistical significance. NS, not significant; **P < 0.001; ***P < 0.0001. A log scale was used for the y axis.
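Panels A and C compare activity distributions between groups with the Mann–Whitney U test. The Python sketch below groups activity values by the number of mutations per tumor (SM, DM, MM) and computes a two-sided U test with the normal approximation, using only the standard library. It assumes one activity value per mutation and a dictionary keyed by tumor ID; `group_by_mutation_count` and `mann_whitney_u` are hypothetical helpers, and the caption does not state which variant of the test (exact or approximate, with or without tie correction) was used. For real analyses, `scipy.stats.mannwhitneyu` provides exact and tie-corrected versions.

```python
import math
from statistics import NormalDist


def group_by_mutation_count(tumors):
    """Pool mutant activity values into SM (1 mutation), DM (2) and MM (>2).

    `tumors` maps a tumor ID to the activity values of the TP53 mutants found
    in that tumor, one value per mutation (assumed input format)."""
    groups = {"SM": [], "DM": [], "MM": []}
    for activities in tumors.values():
        n = len(activities)
        if n == 0:
            continue  # tumors without a mutation are ignored
        key = "SM" if n == 1 else "DM" if n == 2 else "MM"
        groups[key].extend(activities)
    return groups


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation.

    Tied values get their average rank; no tie or continuity correction is
    applied to the variance, so p-values are approximate for small samples."""
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    # Tag each value with its sample (0 = x, 1 = y) and rank the pooled data.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie block
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    r_x = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    n_x, n_y = len(x), len(y)
    u = r_x - n_x * (n_x + 1) / 2          # U statistic for sample x
    mu = n_x * n_y / 2                     # mean of U under H0
    sigma = math.sqrt(n_x * n_y * (n_x + n_y + 1) / 12)
    z = (u - mu) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, p
```

For example, `mann_whitney_u(groups["SM"], groups["MM"])` would test whether mutants from single-mutation tumors differ in activity from those in tumors carrying more than two mutations.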
Fig. 2.
Quality criteria profiles for outlier studies. Scaled data for all parameters from the studies tagged as outliers (129) were used for hierarchical clustering analysis. Green indicates positive scaled values and red indicates negative scaled values. Four clusters were identified, all including publications with a large number of infrequent TP53 mutants. Cluster A showed high values for most criteria and low values for FREQ, identifying the outliers with the highest SD in the PCA. Cluster B was predominantly composed of publications with a high frequency of tumors carrying two mutations (T2). Cluster C was driven by publications with an unusually large number of tumors with synonymous mutations (WT), tumors with two mutations (T2), and mutants with high TP53 activity (ACT). Cluster D included tumors with unusual hot-spot mutations (REC). Interestingly, two publications with a high FREQ criterion but low values for the other criteria were also identified (red asterisks at the bottom of the figure). Examination of these two publications showed that they included only mutants at hot-spot codons 175, 248, and 273; no methodological bias was observed in these reports.
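The caption describes scaling each quality criterion and clustering the outlier studies hierarchically. The Python sketch below shows one common way to do this with the standard library, under stated assumptions: z-score scaling per criterion, Euclidean distance, and average linkage. The caption does not specify the scaling, distance, or linkage actually used, and `scale_columns` and `average_linkage` are hypothetical helpers. A library routine such as `scipy.cluster.hierarchy.linkage` would be the usual choice in practice.

```python
import math
from statistics import mean, pstdev


def scale_columns(rows):
    """Z-score each criterion (column): subtract the column mean and divide
    by its population SD. In a heat map like Fig. 2, positive values would be
    drawn green and negative values red. Constant columns map to 0."""
    cols = list(zip(*rows))
    stats = [(mean(c), pstdev(c)) for c in cols]
    return [[(v - m) / s if s > 0 else 0.0 for v, (m, s) in zip(row, stats)]
            for row in rows]


def euclidean(a, b):
    """Euclidean distance between two scaled criterion profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(points, n_clusters):
    """Naive agglomerative clustering with average linkage.

    Starts with one cluster per study and repeatedly merges the pair of
    clusters with the smallest mean pairwise distance until `n_clusters`
    remain. Returns a list of clusters, each a list of row indices."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = mean(euclidean(points[i], points[j])
                         for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is still valid
    return clusters
```

Cutting the tree at four clusters, as in Fig. 2, would correspond to `average_linkage(scale_columns(rows), 4)` with one row of criterion values per outlier study. The naive loop is cubic in the number of studies, which is acceptable for about a hundred rows.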
Fig. 3.
Ranking TP53 reports in colorectal and breast cancer. For each publication describing TP53 mutations in colorectal or breast cancer, the mean (dots) and 99% confidence interval (bars) of TP53 activity are shown. Data for all studies on colorectal (A) or breast cancer (B) are shown on the far right of the graph. The y axis corresponds to TP53 transactivation activity, with a value of −1.23 for the negative control and a value of 2.03 for 100% of wild-type activity (see SI Materials and Methods and Fig. S2). A publication code is indicated on the x axis. Studies are presented from left to right in decreasing order according to the PCA data. A green box indicates outlier studies obtained by PCA, whereas studies displayed in red are outliers detected exclusively by the ACT criterion. PCA was performed either with all criteria including ACT (+ACT) or without ACT (−ACT). A change of status was observed for only one study (publication code 2018) in breast cancer. The mean activity of TP53 mutants described in this publication is the highest for breast cancer, indicating that the ACT parameter was a strong component in the analysis (distance from the median decreased from 2.2 to 1.8 SD). No changes were observed for colorectal cancer.
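Each study in Fig. 3 is summarized by a mean activity with a 99% confidence interval, ranked by a PCA-derived value, and characterized by its distance from the median in SD units. The Python sketch below computes these quantities under stated assumptions: the interval uses a normal critical value (about 2.576 at 99%), whereas the caption does not say whether a t-based interval was used (a t critical value would give wider intervals for studies with few mutations); the per-study score passed to `sd_from_median` and `rank_studies` is assumed to be a single PCA summary value. All function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, median, stdev


def mean_ci(values, level=0.99):
    """Mean and two-sided confidence interval of one publication's TP53
    activity values, using a normal approximation for the critical value."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    m = mean(values)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 2.576 for 99%
    half_width = z * stdev(values) / math.sqrt(len(values))
    return m, m - half_width, m + half_width


def sd_from_median(scores):
    """Distance of each study's score from the median score, in SD units
    (the kind of quantity quoted as 2.2 and 1.8 SD in the caption)."""
    values = list(scores.values())
    med, sd = median(values), stdev(values)
    return {study: (s - med) / sd for study, s in scores.items()}


def rank_studies(scores):
    """Order study codes by decreasing score, as on the x axis of Fig. 3."""
    return sorted(scores, key=scores.get, reverse=True)
```

Re-running `sd_from_median` on scores from a PCA with and without the ACT criterion is one way to see how a study's status could change, as reported for a single breast cancer study.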

References

    1. Bamford S, et al. The COSMIC (Catalogue of Somatic Mutations in Cancer) database and website. Br J Cancer. 2004;91:355–358.
    2. Forbes SA, et al. COSMIC: Mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer. Nucleic Acids Res. 2011;39(Database issue):D945–D950.
    3. Meyerson M, Gabriel S, Getz G. Advances in understanding cancer genomes through second-generation sequencing. Nat Rev Genet. 2010;11:685–696.
    4. Chanock SJ, Thomas G. The devil is in the DNA. Nat Genet. 2007;39:283–284.
    5. Soussi T. Advances in carcinogenesis: A historical perspective from observational studies to tumor genome sequencing and TP53 mutation spectrum analysis. Biochim Biophys Acta. 2011;1816:199–208.
