Randomized Controlled Trial
J Biomed Inform. 2024 Apr;152:104628. doi: 10.1016/j.jbi.2024.104628. Epub 2024 Mar 26.

Automatic categorization of self-acknowledged limitations in randomized controlled trial publications


Mengfei Lan et al. J Biomed Inform. 2024 Apr.

Abstract

Objective: Acknowledging study limitations in a scientific publication is a crucial element in scientific transparency and progress. However, limitation reporting is often inadequate. Natural language processing (NLP) methods could support automated reporting checks, improving research transparency. In this study, our objective was to develop a dataset and NLP methods to detect and categorize self-acknowledged limitations (e.g., sample size, blinding) reported in randomized controlled trial (RCT) publications.

Methods: We created a data model of limitation types in RCT studies and annotated a corpus of 200 full-text RCT publications using this data model. We fine-tuned BERT-based sentence classification models to recognize limitation sentences and their types. To address the small size of the annotated corpus, we experimented with data augmentation approaches, including Easy Data Augmentation (EDA) and Prompt-Based Data Augmentation (PromDA). We applied the best-performing model to a set of about 12K RCT publications to characterize self-acknowledged limitations at a larger scale.
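EDA augments a small training set with token-level perturbations of existing sentences. A minimal stdlib-only sketch of two of its four operations (random swap and random deletion; the full method also uses WordNet-based synonym replacement and random insertion, and this is an illustration, not the authors' implementation) might look like:

```python
import random

def random_swap(tokens, n=1):
    """Swap two randomly chosen token positions, n times."""
    tokens = list(tokens)
    for _ in range(n):
        if len(tokens) < 2:
            break
        i, j = random.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens

def random_deletion(tokens, p=0.1):
    """Drop each token with probability p, keeping at least one."""
    kept = [t for t in tokens if random.random() > p]
    return kept if kept else [random.choice(list(tokens))]

def eda_augment(sentence, n_aug=4, p_del=0.1):
    """Return n_aug perturbed copies of a training sentence."""
    tokens = sentence.split()
    variants = []
    for _ in range(n_aug):
        if random.random() < 0.5:
            new_tokens = random_swap(tokens)
        else:
            new_tokens = random_deletion(tokens, p_del)
        variants.append(" ".join(new_tokens))
    return variants
```

Each augmented sentence inherits the label of its source, which is why EDA suits label-preserving tasks such as limitation sentence classification.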

Results: Our data model consists of 15 categories and 24 sub-categories (e.g., Population and its sub-category DiagnosticCriteria). We annotated 1090 instances of limitation types in 952 sentences (4.8 limitation sentences and 5.5 limitation types per article). A fine-tuned PubMedBERT model for limitation sentence classification improved upon our earlier model by about 1.5 absolute percentage points in F1 score (0.821 vs. 0.8) with statistical significance (p<.001). Our best-performing limitation type classification model, PubMedBERT fine-tuning with PromDA (Output View), achieved an F1 score of 0.7, improving upon the vanilla PubMedBERT model by 2.7 percentage points, with statistical significance (p<.001).
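The scores above are F1 values. As a reference point, a minimal sketch of per-class (binary) F1, the harmonic mean of precision and recall, is shown below; how the paper averages over limitation types is not specified in the abstract, so any macro- or micro-averaging on top of this is an assumption:

```python
def binary_f1(y_true, y_pred):
    """F1 for one class: harmonic mean of precision and recall."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For multi-label type classification, this per-class score would typically be computed for each SAL type and then averaged.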

Conclusion: The model could support automated screening tools that journals can use to draw authors' attention to reporting issues. Automatic extraction of limitations from RCT publications could benefit peer review and evidence synthesis, and could support advanced methods for searching and aggregating evidence from the clinical trial literature.

Keywords: Large language models; Natural language processing; Randomized controlled trials; Reporting quality; Self-acknowledged limitations; Text classification.


Conflict of interest statement

Declaration of competing interest The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Halil Kilicoglu reports financial support was provided by National Library of Medicine.

Figures

Fig. 1. Overview of our soft-prompt-based data augmentation (PromDA).

Fig. 2. The sentence-level distribution of SAL types on the manually annotated dataset. Note that in some cases, the total number of fine-grained labels in a top-level category exceeds the total number for the top-level category, because the same sentence could be labeled with a top-level category as well as a fine-grained label belonging to the same top-level category (e.g., 10.6% + 7.5% > 17% for the UnderpoweredStudy category). The document-level distribution of SAL types on this dataset is provided in Appendix F.

Fig. 3. Document-level distribution of SAL types on the large-scale RCT dataset. The x-axis shows the number of articles that contain a specific SAL type. The sentence-level distribution of SAL types on this dataset is provided in Appendix G.

Fig. E.4. A sample generated by output-view.

Fig. F.5. Document-level distribution of SAL types on the manually annotated dataset. The x-axis shows the number of documents that contain a specific SAL type.

Fig. G.6. Sentence-level distribution of SAL types on the large-scale RCT dataset. The x-axis shows the total number of sentences with the SAL types in the dataset.

