BMC Med Res Methodol. 2025 Apr 16;25(1):99.
doi: 10.1186/s12874-025-02542-0.

Combining machine learning and dynamic system techniques to early detection of respiratory outbreaks in routinely collected primary healthcare records

Dérick G F Borges et al. BMC Med Res Methodol.

Abstract

Background: Methods that enable early outbreak detection represent powerful tools in epidemiological surveillance, allowing adequate planning and timely response to disease surges. Syndromic surveillance data collected from primary healthcare encounters can be used as a proxy for the incidence of confirmed cases of respiratory diseases. Deviations from historical trends in encounter numbers can provide valuable insights into emerging diseases with the potential to trigger widespread outbreaks.

Methods: Unsupervised machine learning methods and dynamical systems concepts were combined into the Mixed Model of Artificial Intelligence and Next-Generation (MMAING) ensemble, which aims to detect early signs of outbreaks based on primary healthcare encounters. We used data from 27 Brazilian health regions, which cover 41% of the country's territory, from 2017 to 2023 to identify anomalous increases in primary healthcare encounters that could be associated with an epidemic onset. Our validation approach comprised (i) a comparative analysis across Brazilian capitals; (ii) an analysis of warning signs for the COVID-19 period; and (iii) a comparison with related surveillance methods (namely EARS C1, C2, C3) based on real and synthetic labeled data.
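
For readers unfamiliar with the EARS reference methods named above, the following is a minimal sketch of the commonly described C1 and C2 rules: a count is flagged when it exceeds the mean of a 7-step baseline by more than 3 standard deviations, with C2 inserting a 2-step gap between the baseline and the current step. The function name `ears_alarms`, its parameters, and the handling of a flat baseline are assumptions made for illustration; this is not the configuration used in the study, and it does not represent MMAING itself.

```python
from statistics import mean, stdev


def ears_alarms(counts, baseline=7, lag=0, threshold=3.0):
    """Flag time steps whose count exceeds the baseline mean by > threshold SDs.

    lag=0 mimics the usual description of EARS C1; lag=2 mimics C2, where
    the baseline is separated from the current step by a two-step gap.
    """
    warmup = baseline + lag
    alarms = [False] * min(warmup, len(counts))  # no baseline available yet
    for t in range(warmup, len(counts)):
        window = counts[t - lag - baseline : t - lag]
        mu = mean(window)
        sd = stdev(window)
        if sd == 0:
            # Flat baseline: treat any increase as a deviation
            # (an assumption of this sketch, not part of EARS).
            alarms.append(counts[t] > mu)
        else:
            alarms.append((counts[t] - mu) / sd > threshold)
    return alarms


# Hypothetical encounter counts with a sudden jump at the last step.
encounters = [10, 12, 11, 9, 10, 11, 12, 10, 11, 40]
c1 = ears_alarms(encounters, lag=0)
c2 = ears_alarms(encounters, lag=2)
```

With these illustrative counts, both variants flag only the final step, where the number of encounters departs sharply from the recent baseline.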

Results: The MMAING ensemble demonstrated its effectiveness in early outbreak detection using both actual and synthetic data, outperforming other surveillance methods. It successfully detected early warning signals in synthetic data, achieving a probability of detection of 86%, a positive predictive value of 85%, and an average reliability of 79%. When compared to EARS C1, C2, and C3, it exhibited superior performance based on receiver operating characteristic (ROC) curve results on synthetic data. When evaluated on real-world data, MMAING performed on par with EARS C2. Notably, the MMAING ensemble accurately predicted the onset of the four waves of the COVID-19 period in Brazil, further validating its effectiveness in real-world scenarios.
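
The probability of detection and positive predictive value reported above can be illustrated with a short sketch on labeled data. It assumes one plausible set of definitions: probability of detection as the share of labeled outbreaks that receive at least one warning, and positive predictive value as the share of warnings that fall inside a labeled outbreak. The abstract does not spell out the exact definitions, so the function `detection_scores` and the example values are hypothetical.

```python
def detection_scores(alarm_times, outbreak_windows):
    """Return (POD, PPV) for warnings against labeled outbreak windows.

    alarm_times: time indices at which a warning was issued.
    outbreak_windows: list of (start, end) index pairs, both inclusive.
    """
    alarm_times = list(alarm_times)

    def inside(t, window):
        return window[0] <= t <= window[1]

    # Outbreaks that received at least one warning.
    detected = sum(
        any(inside(t, w) for t in alarm_times) for w in outbreak_windows
    )
    # Warnings that fall inside some labeled outbreak.
    true_alarms = sum(
        any(inside(t, w) for w in outbreak_windows) for t in alarm_times
    )
    pod = detected / len(outbreak_windows) if outbreak_windows else float("nan")
    ppv = true_alarms / len(alarm_times) if alarm_times else float("nan")
    return pod, ppv


# Hypothetical example: three warnings, four labeled outbreaks.
pod, ppv = detection_scores([5, 20, 33], [(4, 8), (18, 22), (40, 45), (50, 55)])
```

In this example two of the four outbreaks are detected and two of the three warnings fall inside an outbreak window.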

Conclusion: Identifying trends in time series data related to primary healthcare encounters indicated the possibility of developing a reliable method for the early detection of outbreaks. MMAING demonstrated consistent identification capabilities across various scenarios, outperforming established reference methods.

Keywords: Machine learning; Outbreak detection; Primary healthcare data; Reproduction number; Syndromic surveillance.

Conflict of interest statement

Declarations. Ethics approval and consent to participate: The study is based on secondary, aggregated, non-identified data, and was approved by the Ethical Review Board of Oswaldo Cruz Foundation - Fiocruz Bahia Regional Office, CAAE 61444122.0.0000.0040. Consent for publication: Not applicable. Competing interests: The authors declare no competing interests.

Figures

Fig. 1
Simplified MMAING workflow with main steps in the central column and details in the lateral ones. The green box represents the initial stages related to data acquisition from Brazil’s National Primary Health Information System (SISAB), filtering of diseases related to URTI using the International Classification of Diseases (ICD-10) and the International Classification of Primary Health Care (ICPC-2) codes, and Data Quality Index (DQI) evaluation. The blue box identifies the pre-processing stages, such as grouping the municipality-level data by Immediate Geographic Regions (IGR), calculation of the upper limit, and data splitting. The orange box indicates the stages for generating and cataloging the synthetic series. The red box describes the stages for Outbreak Detection, EWS Emission, and the comparison of MMAING with EARS on real and synthetic data. Access to the reports and dashboard is currently restricted to the responsible health teams at municipal, state and national levels
Fig. 2
Process of synthetic data generation: Red - real data from an IGR; Green - set of 200 simulated series; Blue - synthetic time series by averaging over green curves; Orange - synthetic time series with superimposed noise
Fig. 3
Number of URTI encounters in three Brazilian capitals, from 2020 to 2023, with EWS issued by MMAING indicated by red dots. a Belém (north region, IGR 150001); b Belo Horizonte (southeast region, IGR 310001); and c Porto Alegre (south region, IGR 430001)
Fig. 4
Details of MMAING’s results for URTI encounters in Belo Horizonte (as already displayed in Fig. 3-b), restricted to the COVID-19 period (2020–2022) and split into four successive time intervals shaded by different colors: initial outbreak (orange), second wave, marked by the arrival of Gamma variant (gray), third wave, influenced by Omicron variant (yellow), and fourth wave (pink), due to reinfections of Omicron and its sublineages. EWS points are indicated in red
Fig. 5
Details of the MMAING results for synthetic URTI encounters in Belo Horizonte. In a, 5 outbreaks were inserted, and in b, 4 outbreaks, both highlighted in green. The early warning signals (EWS) issued by MMAING are represented by the red points
Fig. 6
EWS events for three IGRs: a Belém (IGR 150001); b Belo Horizonte (IGR 310001) and c Porto Alegre (IGR 430001) from 2020 to 2023. The top and bottom graphs indicate EARS and MMAING detections. Blue and red markers indicate events by both (EWS Coinciding) or just one method. The sum of red and blue markers corresponds to the total of EWS events
Fig. 7
Distribution of evaluated scores across the different models, considering 30 simulations for each of the 27 IGRs of Brazilian state capitals
Fig. 8
Discriminatory capacity of MMAING and EARS variations (C1, C2 and C3) illustrated by their ROC curves
