2013 May 17;10(1):4.
doi: 10.1186/1742-7622-10-4.

Improving epidemiologic data analyses through multivariate regression modelling

Fraser I Lewis et al. Emerg Themes Epidemiol.

Abstract

Regression modelling is one of the most widely used approaches in epidemiological analyses. It provides a method of identifying statistical associations, from which potential causal associations relevant to disease control may then be investigated. Multivariable regression, in which a single dependent variable (the outcome, usually disease) is modelled with multiple independent variables (predictors), has long been the standard model. Generalizing multivariable regression to multivariate regression, in which all variables are potentially statistically dependent, offers a far richer modelling framework. Through a series of simple illustrative examples we compare and contrast these approaches. The technical methodology used to implement multivariate regression, Bayesian network structure discovery, is well established and, while a relative newcomer to the epidemiological literature, has a long history in computing science. Applications of multivariate analysis in epidemiological studies can provide a greater understanding of disease processes at the population level, leading to the design of better disease control and prevention programs.
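
As a minimal sketch of the distinction drawn in the abstract, the two model classes can be written as follows. The notation is an assumption introduced here for illustration (it is not taken from the article): $y$ is the single outcome, $x_1,\dots,x_k$ are predictors, and $\mathrm{pa}(x_i)$ denotes the parents of variable $x_i$ in a directed acyclic graph over all $n$ variables.

```latex
% Multivariable regression: one response, modelled conditionally on the predictors
p(y \mid x_1, \dots, x_k)

% Multivariate regression via a Bayesian network: a joint model of all n variables,
% factorised according to a directed acyclic graph
p(x_1, \dots, x_n) = \prod_{i=1}^{n} p\bigl(x_i \mid \mathrm{pa}(x_i)\bigr)
```

Under this reading, structure discovery amounts to searching over graphs, that is, over the parent sets $\mathrm{pa}(x_i)$, for a structure that scores well on the data.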


Figures

Figure 1
Globally optimal multivariable regression model with g5 as the response variable and globally optimal multivariate regression model of all 17 variables. (a) Globally optimal multivariable regression model with g5 as the response variable and covariates b3, b6, g9 and g10; log marginal likelihood = -8664.4. (b) Globally optimal multivariate regression model of all 17 variables; log marginal likelihood = -8311.6. The Markov blanket for variable g5 comprises the variables shown in grey. Squares denote binary variables, ovals continuous variables.
Figure 2
Globally optimal multivariable regression model with g2 as the response variable and globally optimal multivariate regression model of all 17 variables. (a) Globally optimal multivariable regression model with g2 as the response variable and covariates b4 and g3; log marginal likelihood = -8530.0. (b) Globally optimal multivariate regression model of all 17 variables; log marginal likelihood = -8311.6. The Markov blanket for variable g2 comprises the variables shown in grey. Squares denote binary variables, ovals continuous variables.
Figure 3
Globally optimal multivariable regression model with b3 as the response variable and globally optimal multivariate regression model of all 17 variables. (a) Globally optimal multivariable regression model with binary variable b3 as the response and covariates b4, g7 and g8; log marginal likelihood = -8670.9. This is a generalised linear model with a logit link function. (b) Globally optimal multivariate regression model of all 17 variables; log marginal likelihood = -8311.6. The Markov blanket for variable b3 comprises the variables shown in grey. Squares denote binary variables, ovals continuous variables.
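
The captions above refer to the Markov blanket of a variable. In a directed acyclic graph this is conventionally the variable's parents, its children, and the other parents of its children. The following is a minimal sketch of that definition in Python; the function name `markov_blanket` and the toy graph `toy_dag` are hypothetical and introduced only for illustration, and the graph is not the 17-variable structure shown in the figures.

```python
from typing import Dict, Set


def markov_blanket(parents: Dict[str, Set[str]], node: str) -> Set[str]:
    """Return the Markov blanket of `node` in a DAG.

    `parents` maps every variable to the set of its parents.
    The blanket is: parents + children + the children's other parents.
    """
    node_parents = set(parents.get(node, set()))
    # Children are the variables that list `node` among their parents.
    children = {v for v, ps in parents.items() if node in ps}
    # Co-parents: every parent of every child (node itself is removed below).
    co_parents: Set[str] = set()
    for child in children:
        co_parents |= parents[child]
    blanket = node_parents | children | co_parents
    blanket.discard(node)
    return blanket


# Hypothetical toy DAG (an assumption, not the paper's model):
# a -> b -> c <- d, and c -> e
toy_dag = {
    "a": set(),
    "b": {"a"},
    "c": {"b", "d"},
    "d": set(),
    "e": {"c"},
}

print(sorted(markov_blanket(toy_dag, "b")))  # ['a', 'c', 'd']
```

In the toy graph, `d` enters the blanket of `b` only because both are parents of `c`; a single-response regression on `b` has no counterpart for this kind of relationship.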


