Building a semantically annotated corpus for chronic disease complications using two document types
- PMID: 33735207
- PMCID: PMC7971867
- DOI: 10.1371/journal.pone.0247319
Abstract
Narrative text in electronic health records (EHRs) contains a wealth of information about patient health conditions. In addition, people use Twitter to share their experiences of personal health issues, such as medical complaints, symptoms, treatments, lifestyle, and other factors. Both genres of text include different types of health-related information concerning disease complications and risk factors. Detailed knowledge of how to control disease risk factors has a great impact on modifying these risks and subsequently preventing disease complications. Text-mining tools provide efficient solutions for extracting and integrating vital information related to disease complications hidden in large volumes of narrative text. However, the development of text-mining tools depends on the availability of an annotated corpus. In response, we have developed the PrevComp corpus, which is annotated with information relevant to the identification of disease complications, underlying risk factors, and prevention measures, in the context of the interaction between hypertension and diabetes. The corpus is novel both in its very specific biomedical topic and in its integration of information from EHRs and tweets collected from Twitter. The annotation scheme was designed under the guidance of a domain expert, and two further domain experts performed the annotation, resulting in high-quality annotation, with agreement rate F-scores as high as 0.60 and 0.75 for EHRs and tweets, respectively.
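To illustrate how an agreement-rate F-score between two annotators can be computed, the following is a minimal sketch, not the authors' implementation. It assumes exact matching on span boundaries and label, and treats one annotator as the reference and the other as the response. The abstract does not state the matching criteria actually used, and the `Span` type, the `agreement_f_score` function, and the label names are hypothetical.

```python
from typing import NamedTuple, Set


class Span(NamedTuple):
    """One annotated span: character offsets plus a label (hypothetical schema)."""
    start: int
    end: int
    label: str


def agreement_f_score(annotator_a: Set[Span], annotator_b: Set[Span]) -> float:
    """Pairwise agreement as an F-score, treating annotator A as the reference.

    Assumption: two spans agree only if start, end, and label are identical.
    """
    if not annotator_a and not annotator_b:
        return 1.0  # neither annotator marked anything: trivial agreement
    matched = len(annotator_a & annotator_b)
    if matched == 0:
        return 0.0
    precision = matched / len(annotator_b)  # share of B's spans also found by A
    recall = matched / len(annotator_a)     # share of A's spans also found by B
    return 2 * precision * recall / (precision + recall)


# Illustrative annotations only; offsets and labels are invented for the example.
a = {Span(0, 12, "RiskFactor"), Span(20, 28, "Complication"), Span(35, 44, "Prevention")}
b = {Span(0, 12, "RiskFactor"), Span(20, 28, "Complication"), Span(50, 58, "Prevention")}
score = agreement_f_score(a, b)
print(f"agreement F-score: {score:.2f}")
```

Under exact matching the measure is symmetric, so swapping the two annotators gives the same value. A relaxed (overlap-based) matching criterion would generally yield higher scores.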
Conflict of interest statement
No competing interests.