Am J Epidemiol. 2022 Oct 20;191(11):1936-1943.
doi: 10.1093/aje/kwac117.

Using Machine Learning Techniques and National Tuberculosis Surveillance Data to Predict Excess Growth in Genotyped Tuberculosis Clusters

Sandy P Althomsons et al. Am J Epidemiol.

Abstract

The early identification of clusters of persons with tuberculosis (TB) that will grow to become outbreaks creates an opportunity for intervention to prevent future TB cases. We used surveillance data (2009–2018) from the United States, statistically derived definitions of unexpected growth, and machine-learning techniques to predict which clusters of genotype-matched TB cases are most likely to continue accumulating cases above expected growth within a 1-year follow-up period. We developed the model on a training and testing data set, and it generalized to a validation data set. Our model showed that characteristics of clusters were more important than the social, demographic, and clinical characteristics of the patients in those clusters. For instance, the time between cases before unexpected growth was identified as the most important of our predictors. A faster accumulation of cases increased the probability of excess growth being predicted during the follow-up period. We have demonstrated that combining the characteristics of clusters and cases with machine learning can add to existing tools to help prioritize which clusters may benefit most from public health interventions. For example, consideration of an entire cluster, not only an individual patient, may assist in interrupting ongoing transmission.

Keywords: cluster growth; machine learning; surveillance data; tuberculosis.
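
The abstract names the time between cases before unexpected growth as the most important predictor. A minimal sketch of how such a predictor could be computed from case dates is shown here. The function name, the choice of the mean gap in days, and the sample dates are all assumptions for illustration; the abstract does not give the authors' exact definition.

```python
from datetime import date


def mean_days_between_cases(case_dates):
    """Mean gap in days between consecutive cases in one cluster.

    Hypothetical helper: the abstract does not define "time between
    cases", so using the mean gap is an assumption.
    """
    ordered = sorted(case_dates)
    if len(ordered) < 2:
        return None  # a gap needs at least two cases
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)


# Made-up case dates for two illustrative clusters
slow_cluster = [date(2015, 1, 10), date(2015, 7, 9), date(2016, 1, 5)]
fast_cluster = [date(2015, 1, 10), date(2015, 2, 9), date(2015, 3, 11)]

print(mean_days_between_cases(slow_cluster))
print(mean_days_between_cases(fast_cluster))
```

Under the abstract's finding, a cluster resembling the second one (shorter gaps, faster accumulation) would receive a higher predicted probability of excess growth.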

Conflict of interest statement

Conflict of interest: none declared.

Figures

Figure 1.
Hypothetical epidemiologic curve of a cluster. Bars indicate the number of cases counted during the indicated quarter. Also indicated are an unexpected growth flag as an arrow, the corresponding baseline as a dashed line, follow-up period within a bracket, and accumulation of excess cases as a dotted line.
Figure 2.
Data set cohorts and tuberculosis surveillance data used to predict unexpected growth in cases, United States, 2011–2017.
Figure 3.
Importance plot for all predictors in the final model for predicting unexpected growth in tuberculosis cases, United States, 2011–2016. A final model was built using random forest on the 2-quarters time frame (the flagged unexpected growth quarter and 1 preceding quarter) with the “half” quantification (at least half the cases in the cluster were positive for the characteristic). We then calculated the variable importance for each of the predictors in this model. More details can be found in Web Appendixes 3 and 4. Predictors are shown from high to low importance, where higher importance means that the predictor affects the model more. Filled points indicate predictors based on cluster characteristics, while outlined points indicate predictors based on patient characteristics. A description of each predictor can be found in Web Table 2. HIV, human immunodeficiency virus.
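
The “half” quantification in the Figure 3 caption turns patient-level characteristics into one cluster-level predictor per characteristic. The sketch below shows one way to read that rule. The function names, the characteristic names, and the sample cases are hypothetical, and the authors' actual implementation (described in the Web Appendixes) may differ.

```python
def half_quantification(case_flags):
    """Return 1 if at least half the cases are positive, else 0."""
    if not case_flags:
        raise ValueError("cluster has no cases in the time frame")
    positives = sum(1 for flag in case_flags if flag)
    # "At least half" is read here as positives / total >= 0.5
    return 1 if 2 * positives >= len(case_flags) else 0


def cluster_predictors(cases, characteristics):
    """Collapse patient-level flags into one 0/1 predictor per characteristic."""
    return {
        name: half_quantification([case[name] for case in cases])
        for name in characteristics
    }


# Hypothetical cases from the flagged quarter and 1 preceding quarter
cases = [
    {"hiv_positive": False, "homeless": True},
    {"hiv_positive": False, "homeless": True},
    {"hiv_positive": True, "homeless": False},
    {"hiv_positive": False, "homeless": False},
]

predictors = cluster_predictors(cases, ["hiv_positive", "homeless"])
print(predictors)
```

Predictors built this way could then be combined with cluster-level characteristics as inputs to a random forest, which is the model type the caption reports.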
