PLoS One. 2021 Mar 18;16(3):e0247319.
doi: 10.1371/journal.pone.0247319. eCollection 2021.

Building a semantically annotated corpus for chronic disease complications using two document types

Noha Alnazzawi

Abstract

The narrative text in electronic health records (EHRs) contains a wealth of information about patient health conditions. In addition, people use Twitter to describe their experiences of personal health issues, such as medical complaints, symptoms, treatments, lifestyle, and other factors. Both genres of text include different types of health-related information concerning disease complications and risk factors. Detailed knowledge of how to control disease risk factors has a great impact on modifying these risks and subsequently preventing disease complications. Text-mining tools provide efficient ways to extract and integrate vital information about disease complications that is hidden in large volumes of narrative text. However, the development of text-mining tools depends on the availability of an annotated corpus. In response, we have developed the PrevComp corpus, which is annotated with information relevant to the identification of disease complications, underlying risk factors, and prevention measures, in the context of the interaction between hypertension and diabetes. The corpus is novel both in its very specific biomedical topic and in its integration of information from EHRs and tweets collected from Twitter. The annotation scheme was designed with guidance from a domain expert, and two further domain experts performed the annotation, resulting in high-quality annotation, with agreement F-scores as high as 0.60 and 0.75 for EHRs and tweets, respectively.
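The abstract reports agreement between the two annotators as F-scores but does not say how they were computed. As a minimal sketch of one common approach, the Python below treats one annotator's spans as the reference and the other's as the response, and counts only exact matches on offsets and entity type. The function name, the span representation, and the entity labels (`Complication`, `RiskFactor`, `Prevention`) are hypothetical and chosen for illustration; they are not taken from the PrevComp annotation scheme.

```python
from typing import Set, Tuple

# Hypothetical span representation: (start offset, end offset, entity type).
Span = Tuple[int, int, str]


def agreement_f_score(annotator_a: Set[Span], annotator_b: Set[Span]) -> float:
    """F-score agreement with annotator A as reference and B as response.

    Assumes exact matching: a span counts as agreed only if its offsets
    and entity type are identical in both annotation sets.
    """
    if not annotator_a and not annotator_b:
        return 1.0  # nothing annotated by either side: trivially in agreement
    matches = len(annotator_a & annotator_b)
    if matches == 0:
        return 0.0
    precision = matches / len(annotator_b)
    recall = matches / len(annotator_a)
    return 2 * precision * recall / (precision + recall)


# Illustrative, made-up annotations of the same document by two annotators.
a = {(0, 12, "Complication"), (20, 32, "RiskFactor"), (40, 48, "Prevention")}
b = {(0, 12, "Complication"), (20, 32, "RiskFactor"), (50, 58, "Prevention")}
score = agreement_f_score(a, b)
print(f"agreement F-score: {score:.2f}")
```

Under exact matching the score is symmetric, so it does not matter which annotator is taken as the reference. A relaxed variant that accepts overlapping spans would generally give higher figures.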

Conflict of interest statement

No competing interests.

Figures

Fig 1. Distribution of the entity types in the PrevComp corpus.
Fig 2. Annotation schema.
