2012 May 2:2012:bas024.
doi: 10.1093/database/bas024. Print 2012.

Directly e-mailing authors of newly published papers encourages community curation

Stephanie M Bunt et al. Database (Oxford).

Abstract

Much of the data within Model Organism Databases (MODs) comes from manual curation of the primary research literature. Given limited funding and an increasing density of published material, a significant challenge facing all MODs is how to efficiently and effectively prioritize the most relevant research papers for detailed curation. Here, we report recent improvements to the triaging process used by FlyBase. We describe an automated method to directly e-mail corresponding authors of new papers, requesting that they list the genes studied and indicate ('flag') the types of data described in the paper using an online tool. Based on the author-assigned flags, papers are then prioritized for detailed curation and channelled to appropriate curator teams for full data extraction. The overall response rate has been 44% and the flagging of data types by authors is sufficiently accurate for effective prioritization of papers. In summary, we have established a sustainable community curation program, with the result that FlyBase curators now spend less time triaging and can devote more effort to the specialized task of detailed data extraction.

Database URL: http://flybase.org/
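
The flag-based prioritization and channelling described in the abstract might look roughly like the following minimal sketch. The flag names, the team names, the `FLAG_TO_TEAM` mapping, and the rule of ranking papers by the number of flags are all assumptions made for illustration; the abstract does not state how FlyBase actually scores papers or assigns them to teams.

```python
from collections import defaultdict

# Hypothetical flag names and curator teams, for illustration only.
FLAG_TO_TEAM = {
    "new_allele": "genetics",
    "phenotype": "genetics",
    "expression": "expression",
    "physical_interaction": "interactions",
}


def prioritize(papers):
    """Rank flagged papers, most flags first (assumed proxy for 'data-rich').

    papers maps a paper id to the set of data type flags its author selected.
    Papers with no flags are left out of the ranking.
    """
    flagged = {pid: flags for pid, flags in papers.items() if flags}
    # Ties are broken by paper id so the ordering is deterministic.
    return sorted(flagged, key=lambda pid: (-len(flagged[pid]), pid))


def route(papers):
    """Build one priority-ordered queue of paper ids per curator team."""
    queues = defaultdict(list)
    for pid in prioritize(papers):
        teams = {FLAG_TO_TEAM[f] for f in papers[pid] if f in FLAG_TO_TEAM}
        for team in sorted(teams):
            queues[team].append(pid)
    return dict(queues)


example_papers = {
    "p1": {"phenotype"},
    "p2": {"new_allele", "expression", "phenotype"},
    "p3": set(),
}
ranked = prioritize(example_papers)
team_queues = route(example_papers)
```

In this sketch a paper can appear in more than one team queue, since a single paper may carry flags that concern several teams.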

Figures

Figure 1.
Literature curation pipeline before (a) and after (b) integrating community curation. A weekly search of the PubMed database identifies recent Drosophila-related publications. Newly identified papers then undergo skim curation, which assigns data type flags and captures a limited subset of curated information (genes studied and antibodies generated). The data type flags are used to identify data-rich papers, which are prioritized for full curation. The skim curation step previously carried out by FlyBase curators (a) has been replaced by community curation (b) through three adaptations of the pipeline. First, we now download the PDF file of each new publication (currently possible for 89% of new papers). Second, we developed the EmailAuthor software suite, which automatically e-mails the corresponding author of each new paper. Finally, authors who have been e-mailed use the FTYP tool to skim curate their paper.
Figure 2.
The Fast-Track Your Paper tool. The first page of the FTYP tool, listing the six steps that guide the user through the complete community curation process.
Figure 3.
Workflow of the EmailAuthor software suite. For each publication, the software first checks its type and curation status using information stored in the FlyBase database. If it is a research paper that has not yet been triaged and a PDF file corresponding to the paper is available, the software attempts to extract the corresponding author’s e-mail address from the PDF file. If this is successful (97% of cases), an e-mail is sent to the extracted e-mail address. At each decision point, the information is stored in a tracking database.
Figure 4.
Author response to direct e-mailing. Overall response to (a) weekly e-mailing (corresponding author e-mailed <2 weeks after the entry for the published paper appeared in PubMed) and (b) single e-mailing to authors of untriaged papers carried out in December 2010 (in this case a PubMed entry for the published paper had existed for 2–13 months prior to e-mailing the corresponding author). The number of papers in each category is shown. (c) Speed of author response: number of days between author being sent e-mail and completing the author submission.
Figure 5.
Accuracy of author-submitted data type flags. (a) Accuracy at the level of the whole paper. The number of papers in each category is shown. (b) Accuracy on a flag-by-flag basis. (i) Frequency of occurrence and accuracy of selection of each data type flag. (ii) Error rates for selection of each data type flag.
Figure 6.
Community curation is most productive when authors are directed to a particular publication. A general e-mail was sent to the Drosophila research community on 13 October 2010 (arrow), alerting them that we would be starting the weekly direct e-mailing the following week. This produced a small increase in successful author submissions but a larger increase in unproductive redirects from the FTYP tool, where authors attempted to curate a paper that had already been skimmed or fully curated and were redirected to a page thanking them for their effort.
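
The decision sequence described in the Figure 3 caption (check publication type and curation status, check for a PDF, extract an e-mail address, record every outcome) can be sketched as follows. This is not the EmailAuthor code: the `Publication` fields, the `process` function, the outcome strings, and the regular expression are assumptions for illustration. In particular, taking the first address found in the text is a simplification, no e-mail is actually sent, and a plain list stands in for the tracking database.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Simplified address pattern (assumption); real PDFs need more careful parsing.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


@dataclass
class Publication:
    pub_id: str
    pub_type: str            # e.g. "research paper" or "review"
    triaged: bool            # True once skim or full curation has been done
    pdf_text: Optional[str]  # text of the PDF, or None if no PDF is available


def extract_email(pdf_text: str) -> Optional[str]:
    """Return the first e-mail address in the text, or None if there is none."""
    match = EMAIL_RE.search(pdf_text)
    return match.group(0) if match else None


def process(pub: Publication, tracking: List[Tuple[str, str]]) -> str:
    """Walk the decision points and log the outcome for this publication."""
    if pub.pub_type != "research paper":
        outcome = "skipped: not a research paper"
    elif pub.triaged:
        outcome = "skipped: already triaged"
    elif pub.pdf_text is None:
        outcome = "skipped: no PDF available"
    else:
        address = extract_email(pub.pdf_text)
        if address is None:
            outcome = "failed: no e-mail address found"
        else:
            # A real system would send the message here.
            outcome = f"e-mail queued for {address}"
    tracking.append((pub.pub_id, outcome))  # stand-in for the tracking database
    return outcome


tracking_log: List[Tuple[str, str]] = []
result = process(
    Publication("FBrf_demo", "research paper", False,
                "Correspondence: jane.doe@example.org"),
    tracking_log,
)
```

Logging the outcome at the single exit point keeps one tracking record per publication regardless of which branch was taken.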
