Review
Int J Med Inform. 2021 Oct:154:104558.
doi: 10.1016/j.ijmedinf.2021.104558. Epub 2021 Aug 18.

Emergence and evolution of big data science in HIV research: Bibliometric analysis of federally sponsored studies 2000-2019

Chen Liang et al. Int J Med Inform. 2021 Oct.

Abstract

Background: The rapid growth of inherently complex and heterogeneous data in HIV/AIDS research underscores the importance of Big Data Science. Recently, uptake of Big Data techniques has increased in the basic, clinical, and public health fields of HIV/AIDS research. However, no study has systematically examined the evolving applications of Big Data in HIV/AIDS research. We sought to explore the emergence and evolution of Big Data Science in HIV/AIDS-related publications funded by US federal agencies.

Methods: We identified HIV/AIDS and Big Data related publications that were funded by seven federal agencies from 2000 to 2019 by integrating data from the National Institutes of Health (NIH) ExPORTER, MEDLINE, and MeSH. Building on bibliometrics and Natural Language Processing (NLP) methods, we constructed co-occurrence networks using bibliographic metadata (e.g., countries, institutes, MeSH terms, and keywords) of the retrieved publications. We then detected clusters within the networks and traced the temporal dynamics of those clusters, followed by expert evaluation and consideration of clinical implications.
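As a rough illustration of the co-occurrence step described in the Methods, the sketch below counts how often pairs of terms appear together in the same publication record. It is not the authors' pipeline: the sample records, the term strings, and the function name `cooccurrence_edges` are all hypothetical, and the abstract does not state which software was used.

```python
from collections import Counter
from itertools import combinations

# Hypothetical records: each publication reduced to its set of MeSH terms/keywords.
publications = [
    {"HIV Infections", "Machine Learning", "Electronic Health Records"},
    {"HIV Infections", "Genomics"},
    {"HIV Infections", "Machine Learning"},
]


def cooccurrence_edges(records):
    """Count how often each unordered pair of terms appears in the same record."""
    edges = Counter()
    for terms in records:
        # Sort so that each pair has a single canonical (a, b) ordering.
        for a, b in combinations(sorted(terms), 2):
            edges[(a, b)] += 1
    return edges


edges = cooccurrence_edges(publications)
for (a, b), weight in edges.most_common():
    print(f"{a} -- {b}: {weight}")
```

Each key in `edges` is one edge of a weighted co-occurrence network, and the count is its weight. Clustering or burst detection would then operate on a network of this kind.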

Results: We retrieved nearly 600,000 publications related to HIV/AIDS, of which 19,528 publications relating to Big Data were included in the bibliometric analysis. Results showed that (1) the number of Big Data publications has been increasing since 2000, (2) US institutes have collaborated closely with China, Canada, and Germany, (3) some institutes (e.g., University of California system, MD Anderson Cancer Center, and Harvard Medical School) are among the most productive and were early adopters of Big Data in HIV/AIDS research, (4) Big Data research was not active in public health disciplines until 2015, and (5) research topics such as genomics, HIV comorbidities, population-based studies, Electronic Health Records (EHR), social media, and precision medicine, and methodologies such as machine learning, Deep Learning, radiomics, and data mining, have emerged quickly in recent years.

Conclusions: We identified a rapid growth in the cross-disciplinary research of HIV/AIDS and Big Data over the past two decades. Our findings demonstrated patterns and trends of prevailing research topics and Big Data applications in HIV/AIDS research and suggested a number of fast-evolving areas of Big Data Science in HIV/AIDS research including secondary analysis of EHR, machine learning, Deep Learning, predictive analysis, and NLP.

Keywords: AIDS; Bibliometrics; Big data; Data mining; Electronic health records; HIV; PLWH.

Conflict of interest statement

None declared.

Figures

Figure 1.
Data extraction flowchart.
Figure 2.
Number of publications by year.
Figure 3.
Time zone figure. Each node represents an institute, placed in the time zone of the year in which it began to produce relevant publications. Node size indicates the number of relevant publications by the institute. Colors indicate the year by which an institute started to produce a significant number of relevant publications, from 2000 (dark) to 2019 (light). Edges indicate collaborations between institutes. For clarity, edge thickness is uniform and does not represent frequency differences.
Figure 4.
Top 100 MeSH term and keyword bursts. The red bar indicates the course of bursts over time.
Figure 5.
Visualization of clusters. Clusters are color-coded and annotated with a name (red), which is the term with the highest log-likelihood ratio in that cluster. Other terms within the clusters are shown in maroon. The clusters are ranked by silhouette score, where a high value indicates that a cluster's terms are coherent with their own cluster.
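
For readers unfamiliar with the silhouette score mentioned in the Figure 5 caption, the standard definition is given below. This is the textbook formula, offered on the assumption that the paper uses the conventional measure; the abstract does not specify the exact variant. Here a(i) is the mean distance from item i to the other members of its own cluster, b(i) is the mean distance from item i to the members of the nearest other cluster, and the score of a cluster C is the mean over its members.

```latex
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad -1 \le s(i) \le 1,
\qquad
S(C) = \frac{1}{|C|} \sum_{i \in C} s(i)
```

Values near 1 indicate that members sit much closer to their own cluster than to any other, which is the sense in which a high score marks a coherent cluster.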
