J Am Stat Assoc. 2013 Jan 1;108(501):34-47.
doi: 10.1080/01621459.2012.726889.

A Bayesian Procedure for File Linking to Analyze End-of-Life Medical Costs

Roee Gutman et al. J Am Stat Assoc.

Abstract

End-of-life medical expenses are a significant proportion of all health care expenditures. These costs were studied using costs of services from Medicare claims and cause of death (CoD) from death certificates. In the absence of a unique identifier linking the two datasets, common variables identified unique matches for only 33% of deaths. The remaining cases formed cells with multiple cases (32% in cells with an equal number of cases from each file and 35% in cells with an unequal number). We sampled from the joint posterior distribution of model parameters and the permutations that link cases from the two files within each cell. The linking models included the regression of location of death on CoD and other parameters, and the regression of cost measures with a monotone missing data pattern on CoD and other demographic characteristics. Permutations were sampled by enumerating the exact distribution for small cells and by the Metropolis algorithm for large cells. Sparse matrix data structures enabled efficient calculations despite the large dataset (≈1.7 million cases). The procedure generates m datasets in which the matches between the two files are imputed. The m datasets can be analyzed independently and results combined using Rubin's multiple imputation rules. Our approach can be applied in other file linking applications.
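The final combining step mentioned in the abstract can be illustrated with a minimal sketch of Rubin's multiple imputation rules for a scalar quantity. This is not the authors' code: the function name `combine_rubin` and the example numbers are hypothetical, and the sketch assumes that each of the m imputed datasets has already been analyzed to give a point estimate and its estimated variance.

```python
from statistics import mean, variance


def combine_rubin(estimates, variances):
    """Combine m completed-data results with Rubin's rules (scalar case).

    estimates: point estimates, one per imputed dataset (hypothetical input)
    variances: the corresponding within-imputation variance estimates
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need m >= 2 estimates and m matching variances")

    q_bar = mean(estimates)        # pooled point estimate
    w = mean(variances)            # average within-imputation variance
    b = variance(estimates)        # between-imputation variance (divisor m - 1)
    total = w + (1 + 1 / m) * b    # total variance of the pooled estimate
    return q_bar, total


# Illustrative numbers only; they are not taken from the paper.
pooled, total_var = combine_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled, total_var)
```

The between-imputation term is what carries the uncertainty about which records truly match: the more the m imputed linkages disagree, the larger the total variance.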

Keywords: Administrative Data; Bayesian Analysis; Missing Data; Record Linkage; Statistical Matching.

Figures

Figure 1. Expected information for ρ in bivariate normal model, as a function of ρ, for cells of size 2 (top curve) to 5 (lowest curve)
Figure 2. Simulated partial linkage: Comparison of estimating procedures to exactly known matches for Medicare Part A & B coefficients
Figure 3. Probability of most likely match / permutation in simulated partial linkage, for cells of size 2 (A) and 3 (B, C). Each bar represents a range of predicted probabilities that the most likely match or permutation is the correct one. The combined light/dark bars form a histogram of these predicted probabilities, while the dark part of each bar represents the instances in which the correct match or permutation was correctly identified. For example, for cells of size 2, the first bar indicates that among pairs with predicted probability around 0.53 of being matches, slightly over half were correct. The last bar indicates that among those with predicted probabilities near 0.97, almost all were correct.
Figure 4. Estimation of Part A & B coefficients under simulated partial linkage, comparing estimates with balanced cells to estimates with simulated missing cases in either EoL or VSM file
Figure 5. Estimates of mean expenditures by cause of death under simulated partial linkage, comparing estimates with balanced cells to estimates with simulated missing cases in either EoL or VSM file
Figure 6. Estimates of mean expenditures by cause of death, comparing estimates using permutation sampling to those using exactly matched cases

