Tracking replicability as a method of post-publication open evaluation

Joshua K Hartshorne et al.

Front Comput Neurosci. 2012 Mar 5;6:8. doi: 10.3389/fncom.2012.00008. eCollection 2012.
Abstract

Recent reports have suggested that many published results are unreliable. To increase the reliability and accuracy of published papers, multiple changes have been proposed, such as changes in statistical methods. We support such reforms. However, we believe that the incentive structure of scientific publishing must change for such reforms to be successful. Under the current system, the quality of individual scientists is judged on the basis of their number of publications and citations, with journals similarly judged via numbers of citations. Neither of these measures takes into account the replicability of the published findings, as false or controversial results are often particularly widely cited. We propose tracking replications as a means of post-publication evaluation, both to help researchers identify reliable findings and to incentivize the publication of reliable results. Tracking replications requires a database linking published studies that replicate one another. As any such database is limited by the number of replication attempts published, we propose establishing an open-access journal dedicated to publishing replication attempts. Data quality of both the database and the affiliated journal would be ensured through a combination of crowd-sourcing and peer review. As reports in the database are aggregated, ultimately it will be possible to calculate replicability scores, which may be used alongside citation counts to evaluate the quality of work published in individual journals. In this paper, we lay out a detailed description of how this system could be implemented, including mechanisms for compiling the information, ensuring data quality, and incentivizing the research community to participate.

Keywords: open evaluation; post-publication evaluation; replicability; replication.
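The database the abstract describes would link each published finding to its reported replication attempts and aggregate those links into a replicability score that can sit alongside citation counts. The paper does not specify an implementation; the Python sketch below is purely illustrative, and the record names, fields, and scoring rule (share of reported attempts judged successful) are assumptions, not the authors' design.

    # Illustrative data model for a replication-tracking database.
    # All class names, fields, and the scoring rule are hypothetical.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class RepLink:
        """Links an original finding to one published replication attempt."""
        original_doi: str
        replication_doi: str
        replication_type: str   # e.g., "success", "partial", or "failure"
        evidence_strength: int  # e.g., 1 (weak) to 5 (strong)

    @dataclass
    class PaperRecord:
        """A paper together with the replication attempts reported against it."""
        doi: str
        citations: int = 0
        replinks: List[RepLink] = field(default_factory=list)

        def replicability_score(self) -> float:
            """One possible summary: share of reported attempts judged successful."""
            if not self.replinks:
                return float("nan")
            successes = sum(1 for r in self.replinks if r.replication_type == "success")
            return successes / len(self.replinks)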


Figures

Figure 1. Replication Tracker: search window. Much like any other paper index, Replication Tracker would allow the user to search for papers by author, keyword, and other typical search terms.

Figure 2. Replication Tracker: example search results. Results of a search query list relevant papers, along with the number of citations and information about each paper's replicability. This information consists of the number of attempted replications reported to the system, a summary statistic of whether the finding successfully replicates or fails to replicate (the "Replicability Score"), and a summary statistic of the strength of the evidence. These numbers are derived from RepLinks, data which are crowd-sourced from users and moderators (Figure 3).

Figure 3. Replication Tracker: search results expansion, showing RepLinks for a target paper. Each RepLink represents an attempted replication. Again, the degree of success of the replication ("replication type") and the strength of the evidence are noted. These are determined by aggregating determinations made by individual users (Figure 4).

Figure 4. Replication Tracker: expansion of a RepLink, showing ratings by individual readers, which are summarized in Figure 3. Users can also add comments explaining their determinations, or flag posts as irrelevant, prompting review by moderators.
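Figures 2-4 describe a two-level aggregation: individual reader ratings are combined into a per-RepLink determination, which in turn feeds the paper-level summary shown in the search results. The aggregation rules in the sketch below (simple majority vote over replication type, averaging of evidence strength) are assumptions chosen for illustration, not the authors' specification; moderator review of flagged posts would sit outside this purely numeric step.

    # Hypothetical aggregation of reader ratings (Figure 4) into the
    # per-RepLink determination shown in Figure 3.
    from collections import Counter
    from statistics import mean

    def summarize_replink(ratings):
        """ratings: list of (replication_type, evidence_strength) pairs from readers,
        e.g., [("success", 4), ("success", 3), ("failure", 2)]."""
        if not ratings:
            return None
        types = [t for t, _ in ratings]
        strengths = [s for _, s in ratings]
        consensus_type = Counter(types).most_common(1)[0][0]  # simple majority
        return {"replication_type": consensus_type,
                "evidence_strength": mean(strengths),
                "n_ratings": len(ratings)}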
