Health Serv Res. 2023 Oct;58(5):1119-1130.
doi: 10.1111/1475-6773.14154. Epub 2023 Mar 28.

Disaggregating Latino nativity in equity research using electronic health records

Miguel Marino et al. Health Serv Res. 2023 Oct.

Abstract

Objective: To develop and validate prediction models for inference of Latino nativity to advance health equity research.

Data sources/study setting: This study used electronic health records (EHRs) from 19,985 Latino children with self-reported country of birth seeking care from January 1, 2012 to December 31, 2018 at 456 community health centers (CHCs) across 15 states along with census-tract geocoded neighborhood composition and surname data.

Study design: We constructed and evaluated prediction models for the estimation of Latino nativity within a broad machine learning framework (Super Learner). Outcomes included binary indicators denoting nativity (US-born vs. foreign-born) and Latino country of birth (Mexican, Cuban, Guatemalan). Model performance was compared using the area under the receiver operating characteristic curve (AUC) in an externally withheld patient sample.
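As an illustration of the evaluation metric only (not the authors' code), the AUC for a binary outcome such as US-born vs. foreign-born can be read as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case, with ties counted as one half. The sketch below assumes toy labels and scores that are invented for the example; the function name `auc` is hypothetical.

```python
def auc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    labels: iterable of 0/1 outcomes (1 = positive class, e.g. foreign-born).
    scores: iterable of predicted scores, higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration; not taken from the study.
toy_labels = [1, 0, 1, 0, 0, 1]
toy_scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
toy_auc = auc(toy_labels, toy_scores)
```

The pairwise loop is quadratic in sample size, which is fine for a sketch; a rank-based formula or a library routine would be the usual choice for a validation sample of realistic size.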

Data collection/extraction methods: Census surname lists, census neighborhood composition, and Forebears administrative data were linked to EHR data.
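A minimal sketch of what such a linkage could look like, assuming a lookup keyed on a normalized surname and another keyed on a census tract identifier. The abstract does not describe the linkage procedure, so every field name, table, and value below is hypothetical and serves only to show the join.

```python
import unicodedata
from dataclasses import dataclass


@dataclass
class Patient:
    patient_id: str
    surname: str
    tract_id: str  # hypothetical census-tract identifier from geocoding


def normalize_surname(surname):
    """Strip accents, spaces, and punctuation, then uppercase (assumed key format)."""
    ascii_only = (
        unicodedata.normalize("NFKD", surname)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    return "".join(ch for ch in ascii_only.upper() if ch.isalpha())


# Hypothetical lookup tables with invented values.
surname_table = {"GARCIA": 0.92, "SMITH": 0.02}   # surname-level feature
tract_table = {"T001": 0.35, "T002": 0.08}        # neighborhood-level feature


def link(patient, surnames, tracts):
    """Attach surname-level and tract-level features to one EHR record.

    Missing matches are returned as None rather than guessed.
    """
    return {
        "patient_id": patient.patient_id,
        "surname_feature": surnames.get(normalize_surname(patient.surname)),
        "tract_feature": tracts.get(patient.tract_id),
    }


linked = link(Patient("p1", "García", "T001"), surname_table, tract_table)
unmatched = link(Patient("p2", "Nguyen", "T999"), surname_table, tract_table)
```

Returning `None` for unmatched surnames or tracts keeps missingness explicit, so a downstream model can handle it deliberately instead of inheriting a silent default.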

Principal findings: Of the 19,985 Latino patients, 10.7% reported a non-US country of birth (5.1% Mexican, 4.7% Guatemalan, 0.8% Cuban). Overall, prediction models for nativity showed outstanding performance with external validation (US-born vs. foreign: AUC = 0.90; Mexican vs. non-Mexican: AUC = 0.89; Guatemalan vs. non-Guatemalan: AUC = 0.95; Cuban vs. non-Cuban: AUC = 0.99).

Conclusions: Health equity researchers in health services lack methods for data disaggregation, in particular methods for determining Latino country of birth (nativity) to inform disparities research. Recent interest in more robust health equity research has called attention to the importance of data disaggregation. In a multistate network of CHCs, we developed and validated novel prediction models that infer Latino nativity from available EHR data linked to surname and community data. These models are a potentially significant methodologic advance for health disparities research on this population in primary care and health services research.

Keywords: U.S. Census location; ethnicity; health disparities; machine learning; surname data.

Conflict of interest statement

None declared.

Figures

FIGURE 1
Comparison of cross‐validated area under the curve (AUC) by the two best and two worst performing prediction algorithms for Non‐US‐born, Mexican, Guatemalan, and Cuban models. Latino non‐US‐born only includes the following countries: Cuba, Guatemala, Mexico, Nicaragua, Panama. Specific prediction models for Panama and Nicaragua were not produced as the sample sizes were not conducive to proper modeling (Panama, N = 4; Nicaragua, N = 23). Algorithm labels specify the Super Learner wrapper, hyperparameter, and subset of inputs that give the cross‐validated AUC estimate. The EHR suffix refers to predictors that were derived only from patients' Electronic Health Records. Algorithms with a suffix of Name used EHR, neighborhood‐level data, and patient surname data in prediction. The performance of all prediction algorithms is reported in Figure S1 and Table S6. AUC, area under the curve; CI, confidence interval. [Color figure can be viewed at wileyonlinelibrary.com]
FIGURE 2
Receiver operating characteristic curves for Latino nativity prediction models using the Super Learner in the withheld validation data set. [Color figure can be viewed at wileyonlinelibrary.com]
