[Preprint]. 2023 Sep 27:rs.3.rs-3362256.
doi: 10.21203/rs.3.rs-3362256/v1.

Mapping Vaccine Names in Clinical Trials to Vaccine Ontology using Cascaded Fine-Tuned Domain-Specific Language Models


Jianfu Li et al. Res Sq.


Abstract

Background: Vaccines have revolutionized public health by providing protection against infectious diseases. They stimulate the immune system and generate memory cells that defend against the targeted diseases. Clinical trials evaluate vaccine performance, including dosage, administration routes, and potential side effects. ClinicalTrials.gov is a valuable repository of clinical trial information, but its vaccine data lack standardization, which creates challenges for automatic concept mapping, vaccine-related knowledge development, evidence-based decision-making, and vaccine surveillance.

Results: In this study, we developed a cascaded framework that draws on multiple domain knowledge sources, including clinical trials, the Unified Medical Language System (UMLS), and the Vaccine Ontology (VO), to improve the performance of domain-specific language models for automated mapping of vaccine names in clinical trials to VO. VO is a community-based ontology developed to promote vaccine data standardization, integration, and computer-assisted reasoning. Our methodology involved extracting and annotating data from various sources. We then further pre-trained the PubMedBERT model, yielding CTPubMedBERT. Next, we enhanced CTPubMedBERT by incorporating SAPBERT, which was pretrained using the UMLS, resulting in CTPubMedBERT + SAPBERT. We refined this model further by fine-tuning on the Vaccine Ontology corpus and vaccine data from clinical trials, yielding the CTPubMedBERT + SAPBERT + VO model. Finally, we combined a collection of pre-trained models with a weighted rule-based ensemble approach to normalize the vaccine corpus and improve accuracy. Ranking in concept normalization means ordering candidate concepts so that the most suitable match for a given mention appears first. We ranked the top 10 candidate concepts for each mention, and our experimental results show that the proposed cascaded framework consistently outperformed existing effective baselines on vaccine mapping, achieving 71.8% top-1 accuracy and 90.0% top-10 accuracy.

Conclusion: This study provides detailed insight into a cascaded framework of fine-tuned domain-specific language models for mapping vaccine names in clinical trials to VO. By effectively leveraging domain-specific information and applying weighted rule-based ensembles of different pre-trained BERT models, our framework can significantly enhance this mapping.

Keywords: Clinical Trials; Domain-specific Language Models; Normalization; Vaccine Ontology.
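
The ranking and top-k evaluation described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the weighted rule-based ensemble combines models, so the weighted sum of per-model similarity scores used here is an assumption, not the authors' method. The model names, weights, scores, and concept identifiers (`VO_A`, `VO_B`, `VO_C`) are hypothetical placeholders, not real Vaccine Ontology IDs or reported data.

```python
from collections import defaultdict


def ensemble_rank(model_scores, weights, k=10):
    """Rank candidate concepts for one mention.

    model_scores: {model_name: {concept_id: similarity_score}}
    weights:      {model_name: weight}
    Assumption: scores are combined by a weighted sum; the real
    ensemble rules are not given in the abstract.
    """
    combined = defaultdict(float)
    for model, scores in model_scores.items():
        w = weights.get(model, 0.0)
        for concept_id, score in scores.items():
            combined[concept_id] += w * score
    # Sort by descending combined score; break ties by concept ID.
    ranked = sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
    return [concept_id for concept_id, _ in ranked[:k]]


def top_k_accuracy(ranked_lists, gold, k):
    """Fraction of mentions whose gold concept is among the top k candidates."""
    hits = sum(1 for ranked, g in zip(ranked_lists, gold) if g in ranked[:k])
    return hits / len(gold)


# Hypothetical scores from two models for two vaccine mentions.
weights = {"model_a": 0.5, "model_b": 0.5}
mentions = [
    {"model_a": {"VO_A": 0.9, "VO_B": 0.6}, "model_b": {"VO_B": 0.8, "VO_C": 0.5}},
    {"model_a": {"VO_C": 0.7, "VO_A": 0.2}, "model_b": {"VO_A": 0.6, "VO_C": 0.4}},
]
gold = ["VO_A", "VO_C"]  # hypothetical gold-standard concepts

ranked_lists = [ensemble_rank(m, weights, k=10) for m in mentions]
top1 = top_k_accuracy(ranked_lists, gold, k=1)
top2 = top_k_accuracy(ranked_lists, gold, k=2)
print(ranked_lists, top1, top2)
```

In this toy example the gold concept for the first mention is ranked second, so it counts as a miss at k=1 but a hit at k=2. This is the same kind of gap that separates the reported top-1 and top-10 accuracies.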


Figures

Figure 1: Overview of the cascaded framework.
Figure 2: PRISMA flowchart for data extraction and screening with processed results.
Figure 3: Error types of top-ranked concepts in concept normalization.


