eLife. 2019 Jan 29;8:e41676.
doi: 10.7554/eLife.41676

Incidences of problematic cell lines are lower in papers that use RRIDs to identify cell lines

Zeljana Babic et al. eLife. 2019.

Abstract

The use of misidentified and contaminated cell lines continues to be a problem in biomedical research. Research Resource Identifiers (RRIDs) should reduce the prevalence of misidentified and contaminated cell lines in the literature by alerting researchers to cell lines that are on the list of problematic cell lines, which is maintained by the International Cell Line Authentication Committee (ICLAC) and the Cellosaurus database. To test this assertion, we text-mined the methods sections of about two million papers in PubMed Central, identifying 305,161 unique cell-line names in 150,459 articles. We estimate that 8.6% of these cell lines were on the list of problematic cell lines, whereas only 3.3% of the cell lines in the 634 papers that included RRIDs were on the problematic list. This suggests that the use of RRIDs is associated with a lower reported use of problematic cell lines.

Keywords: authentication; cell line; computational biology; reproducibility; rigor; software; systems biology; text mining.

Conflict of interest statement

ZB, TG: No competing interests declared. AC runs the cell bank in Australia and heads the ICLAC consortium. MM, AB head the RRID project and founded SciCrunch, a company that supports the RRID project. AB develops the Cellosaurus database. IO works as a consultant for SciCrunch.

Figures

Figure 1. Identification of misidentified cell lines.
The number of cell lines used in PubMed Central articles available for text mining is shown as a function of year. Cell-line names were matched using two criteria, strict and loose. The strict criterion requires an exact match between the name used by the researchers, as detected by SciScore, and an entry in the ICLAC register of misidentified cell lines. The loose criterion appends a wild-card character (*) to each name found by SciScore and matches the result against the names and synonyms on the ICLAC list. The graph is divided into two sections, before and after 2012; this year was chosen as the break point because the authentication standard was published and ICLAC was formed that year (Masters, 2012).
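The two matching criteria can be illustrated with a minimal Python sketch. This is not the SciScore implementation: the function names and the toy register are assumptions made for illustration (only COLO 720E is taken from this page), and the loose criterion is read here as a prefix match, that is, the detected name followed by `*`.

```python
def strict_match(name, register):
    """Strict criterion: the detected name is exactly an entry in the register."""
    return name in register


def loose_match(name, register):
    """Loose criterion, read as 'name*': some register entry starts with the name."""
    return any(entry.startswith(name) for entry in register)


# Toy register (assumption): a real run would use the ICLAC names and synonyms.
toy_register = {"COLO 720E", "HYPOTHETICAL-LINE-1"}

print(strict_match("COLO 720E", toy_register))  # exact entry in the register
print(strict_match("COLO 720", toy_register))   # no exact entry
print(loose_match("COLO 720", toy_register))    # "COLO 720*" covers "COLO 720E"
```

By construction the loose criterion accepts everything the strict one accepts, so it gives an upper estimate and the strict criterion a lower one.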
Figure 2. The prevalence of open access papers containing one or more cell lines found on the problematic list.
Journals are sorted from left to right by the number of cell lines detected by SciScore (only the top 25 journals are shown for presentation purposes; data for all journals are given in Figure 2—source data 1). Each bar represents the percentage of cell lines (red) that are on the problematic cell-line list, or of papers (orange) that contain such a cell line. Presence on the misidentified list is scored with an edit distance metric that skips all special characters, such as spaces and dashes, and treats any two strings that contain the same letters and numbers as having an edit distance of 0 (e.g., EF 1 = EF-1). Journals that published papers under a license not allowing text mining are not represented here.
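The "edit distance of 0" rule described above can be sketched as a normalization step. The code below is an illustrative assumption, not the authors' code: the function names are hypothetical, the comparison is case-sensitive, and only ASCII letters and digits are kept.

```python
import re


def normalize(name):
    """Drop every character that is not an ASCII letter or digit (spaces, dashes, ...)."""
    return re.sub(r"[^A-Za-z0-9]", "", name)


def same_cell_line_name(a, b):
    """True when two names reduce to the same letters and digits (distance 0)."""
    return normalize(a) == normalize(b)


print(same_cell_line_name("EF 1", "EF-1"))  # both reduce to "EF1"
```

Whether case differences should also be ignored is not stated in the caption; here they are treated as significant.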
Figure 3. The integrity of SciScore for finding papers with cell lines.
A manual review of 1,003 papers from the journal Scientific Reports showed a 95% agreement between the curator and the SciScore algorithm. Both the curator and SciScore detected a cell line in 138 articles, and no cell line in 822. Of 1,003 papers, 50 represent a disagreement (false positives and false negatives).
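The agreement figure can be re-derived from the counts in the caption. The sketch below is plain arithmetic with illustrative variable names. It computes the rate two ways, from the stated agreements (138 + 822 = 960) and from the stated disagreements (1,003 minus 50 = 953); the two routes differ slightly, and both are close to the reported 95%.

```python
total_papers = 1003
both_detected = 138        # curator and SciScore both found a cell line
neither_detected = 822     # curator and SciScore both found no cell line
stated_disagreements = 50  # false positives plus false negatives, as stated

# Route 1: agreements counted directly.
agreement_from_counts = (both_detected + neither_detected) / total_papers

# Route 2: everything that is not a stated disagreement.
agreement_from_disagreements = (total_papers - stated_disagreements) / total_papers

print(f"{agreement_from_counts:.1%}")
print(f"{agreement_from_disagreements:.1%}")
```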
Figure 4. The warning message shown on the RRID portal and in the Cellosaurus database for a misidentified cell line, COLO 720E.
This warning does not originate at either Cellosaurus (the naming resource for problematic cell lines) or the RRID site; it simply reflects the information available at ICLAC.org. ICLAC members examine publications and test data to reach a conclusion, and then disseminate it on their website via a spreadsheet that anyone can download. The Cellosaurus database picks up these data, working closely with ICLAC, and updates its entries. The data are then passed to the RRID portal, where they are displayed for researchers searching for cell lines, among other resources. Cellosaurus and the RRID portal strive to make all new data available as quickly as possible.
Figure 5. An example of a public annotation using the hypothes.is platform.
All data posted in the public channel, such as RRID resolution data, are ported daily to the CrossRef Event database for developers, providing additional ways of making these data FAIR (that is, findable, accessible, interoperable and reusable). Information about this cell line, including papers that use it and the original reference, is accessible to readers with one click. For journals like eLife, which typeset RRIDs with live links, hypothes.is is not necessary to access the information about cell lines. The example is based on the paper by Liao et al., 2017, viewed using the hypothes.is platform.
Figure 6. Percentages of papers with cell lines found on the problematic list.
The "auto.detect.cell" lines data come from the edit distance metric, same as Figure 2; n=305,161; the RRID cell lines are based on 1,502 cell lines. The "auto.detect" papers percentage is based on n=150,459 unique papers, where the problematic cell-line list is detected based on the edit distance metric. The RRID papers percentage is based on n=634 papers.
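As a rough cross-check, the denominators in this caption can be combined with the percentages reported in the abstract (8.6% and 3.3%) to estimate how many cell lines each percentage implies. This is a back-of-the-envelope sketch: the helper name is hypothetical, and because the percentages are rounded, the resulting counts are approximations rather than reported values.

```python
def implied_count(percent, total):
    """Approximate number of items that a rounded percentage of `total` implies."""
    return round(total * percent / 100)


auto_detected_cell_lines = 305_161  # n for the text-mined cell lines
rrid_cell_lines = 1_502             # n for the cell lines identified by RRID

# Percentages taken from the abstract; counts below are estimates only.
approx_problematic_auto = implied_count(8.6, auto_detected_cell_lines)
approx_problematic_rrid = implied_count(3.3, rrid_cell_lines)

print(approx_problematic_auto, approx_problematic_rrid)
```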

References

    1. American Type Culture Collection Standards Development Organization Workgroup ASN-0002. Cell line misidentification: the beginning of the end. Nature Reviews Cancer. 2010;10:441–448. doi: 10.1038/nrc2852.
    2. ATCC. Authentication of human cell lines: standardization of STR profiling. 2011. https://webstore.ansi.org/RecordDetail.aspx?sku=ANSI%2FATCC+ASN-0002-2011 [Accessed December 21, 2018]
    3. Bairoch A. The Cellosaurus, a cell-line knowledge resource. Journal of Biomolecular Techniques. 2018;29:25–38. doi: 10.7171/jbt.18-2902-002.
    4. Bandrowski A, Brush M, Grethe JS, Haendel MA, Kennedy DN, Hill S, Hof PR, Martone ME, Pols M, Tan SC, Washington N, Zudilova-Seinstra E, Vasilevsky N, RINL Resource Identification Initiative. The Resource Identification Initiative: a cultural shift in publishing. Brain and Behavior. 2016;6:e00417. doi: 10.1002/brb3.417.
    5. Bandrowski AE, Martone ME. RRIDs: a simple step toward improving reproducibility through rigor and transparency of experimental methods. Neuron. 2016;90:434–436. doi: 10.1016/j.neuron.2016.04.030.
