Review
Biomolecules. 2022 Sep 10;12(9):1278.
doi: 10.3390/biom12091278.

Integrating Text Mining into the Curation of Disease Maps

Malte Voskamp et al.

Abstract

An adequate form of visualization is required to gain an overview of, and ultimately understand, the complex and diverse biological mechanisms of diseases. Recently, disease maps have been introduced for this purpose. A disease map is defined as a systems biological map or model that combines metabolic, signaling, and physiological pathways to create a comprehensive overview of known disease mechanisms. With the increase in publications describing biological interactions, the effort involved in creating and curating comprehensive disease maps is growing accordingly. Therefore, new computational approaches are needed to reduce the time that manual curation takes. Text mining algorithms can be used to analyse the natural language of scientific publications. These types of algorithms can take human-readable text passages and convert them into a more ordered, machine-usable data structure. To support the creation of disease maps by text mining, we developed an interactive, user-friendly disease map viewer. The disease map viewer displays text mining results in a systems biology map, where the user can review them and either validate or reject identified interactions. Ultimately, the viewer brings together the time-saving advantages of text mining with the accuracy of manual data curation.

Keywords: disease maps; systems biology; text mining.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Flowchart of the processes included in the tool. Input knowledge and data are shown in green on the right, the software modules are shown in yellow, and the output files are shown in blue on the right. Two CSV files, one containing the list of interactions and one containing the subcellular localisation of the entities, serve as input for the CytoscapeJSON parser implemented in Python. The resulting JSON file serves as input for the disease map viewer, where the interactions are validated by expert knowledge. The validated interactions can then be exported in a cellular layout in a JSON file or as a list of interactions in a CSV file.
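
The parsing step described in this caption can be illustrated with a minimal Python sketch. It is not the authors' parser: the column names (source, target, type, entity, compartment), the "status" field, and the sample rows are hypothetical placeholders, and the output only follows the general nodes/edges layout of Cytoscape.js elements JSON.

```python
import csv
import io
import json


def build_cytoscape_json(interactions_csv, localisation_csv):
    """Combine an interaction list and a localisation table into one graph dict.

    Both arguments are file-like objects holding CSV text. All column names
    used here are assumptions made for illustration only.
    """
    # Map each entity to its subcellular compartment.
    compartments = {
        row["entity"]: row["compartment"]
        for row in csv.DictReader(localisation_csv)
    }

    nodes = {}
    edges = []
    for index, row in enumerate(csv.DictReader(interactions_csv)):
        # Create each node once; entities without a known location are flagged.
        for name in (row["source"], row["target"]):
            nodes.setdefault(
                name,
                {"data": {"id": name,
                          "compartment": compartments.get(name, "unknown")}},
            )
        edges.append(
            {"data": {"id": f"e{index}",
                      "source": row["source"],
                      "target": row["target"],
                      "interaction": row["type"],
                      "status": "unreviewed"}}
        )
    return {"elements": {"nodes": list(nodes.values()), "edges": edges}}


# Made-up sample input standing in for the two CSV files.
INTERACTIONS = "source,target,type\nGeneA,GeneB,activation\nGeneB,GeneC,inhibition\n"
LOCALISATION = "entity,compartment\nGeneA,extracellular\nGeneB,nucleus\n"

graph = build_cytoscape_json(io.StringIO(INTERACTIONS), io.StringIO(LOCALISATION))
print(json.dumps(graph, indent=2))
```

Keeping the compartment on each node is one way to let a viewer place entities in a cellular layout; how the actual tool encodes localisation is not stated in the caption.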
Figure 2
Interface of the disease map viewer. The large window in the middle shows the text mining data as a coarse disease map in a cellular layout. The left sidebar shows the legend and filter options, and the right sidebar shows the review function, where the supporting sentences from the parsed publications are displayed and the user can validate or reject an interaction. The buttons on the bottom left show the timeline option, where the interaction data can be filtered by date of publication.
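
The review and timeline functions described in this caption can be sketched in Python as follows. This is an assumption-laden illustration rather than the viewer's code: the Interaction class, its fields, the three status values, and the choice to filter on "published on or before a cutoff date" are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Interaction:
    """One text-mined interaction with its supporting sentence (hypothetical model)."""
    source: str
    target: str
    sentence: str
    published: date
    status: str = "unreviewed"


def filter_by_date(interactions, cutoff):
    """Timeline filter: keep interactions published on or before the cutoff."""
    return [item for item in interactions if item.published <= cutoff]


def review(interaction, accept):
    """Record the curator's decision on a single interaction."""
    interaction.status = "validated" if accept else "rejected"
    return interaction


def export_validated(interactions):
    """Return (source, target) pairs for validated interactions, e.g. for CSV export."""
    return [(item.source, item.target)
            for item in interactions if item.status == "validated"]


# Made-up records standing in for text mining results.
records = [
    Interaction("GeneA", "GeneB", "GeneA activates GeneB.", date(2015, 3, 1)),
    Interaction("GeneB", "GeneC", "GeneB may inhibit GeneC.", date(2021, 6, 15)),
]

visible = filter_by_date(records, date(2018, 1, 1))
review(records[0], accept=True)
review(records[1], accept=False)
print(export_validated(records))
```

Storing the decision on the record, instead of deleting rejected items, keeps the supporting sentence available if a curator later revisits the interaction.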
