Front Psychol. 2012 Mar 1;3:55.
doi: 10.3389/fpsyg.2012.00055. eCollection 2012.

Old and new ideas for data screening and assumption testing for exploratory and confirmatory factor analysis

David B Flora et al. Front Psychol. 2012.

Abstract

We provide a basic review of the data screening and assumption testing issues relevant to exploratory and confirmatory factor analysis along with practical advice for conducting analyses that are sensitive to these concerns. Historically, factor analysis was developed for explaining the relationships among many continuous test scores, which led to the expression of the common factor model as a multivariate linear regression model with observed, continuous variables serving as dependent variables, and unobserved factors as the independent, explanatory variables. Thus, we begin our paper with a review of the assumptions for the common factor model and data screening issues as they pertain to the factor analysis of continuous observed variables. In particular, we describe how principles from regression diagnostics also apply to factor analysis. Next, because modern applications of factor analysis frequently involve the analysis of the individual items from a single test or questionnaire, an important focus of this paper is the factor analysis of items. Although the traditional linear factor model is well-suited to the analysis of continuously distributed variables, commonly used item types, including Likert-type items, almost always produce dichotomous or ordered categorical variables. We describe how relationships among such items are often not well described by product-moment correlations, which has clear ramifications for the traditional linear factor analysis. An alternative, non-linear factor analysis using polychoric correlations has become more readily available to applied researchers and thus more popular. Consequently, we also review the assumptions and data-screening issues involved in this method. Throughout the paper, we demonstrate these procedures using an historic data set of nine cognitive ability variables.
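As a brief illustration of the regression form mentioned above, the common factor model can be written as follows. The notation is an assumption made here for illustration and is not taken from the abstract: y is the vector of observed variables, Λ the matrix of factor loadings, η the vector of unobserved factors, ε the vector of unique factors, Ψ the factor covariance matrix, and Θ the (typically diagonal) covariance matrix of the unique factors.

```latex
% Observed variables regressed on unobserved factors
\mathbf{y} = \boldsymbol{\Lambda}\boldsymbol{\eta} + \boldsymbol{\varepsilon}

% Model-implied covariance matrix, assuming
% \operatorname{Cov}(\boldsymbol{\eta}, \boldsymbol{\varepsilon}) = \mathbf{0}
\boldsymbol{\Sigma} = \boldsymbol{\Lambda}\boldsymbol{\Psi}\boldsymbol{\Lambda}^{\top} + \boldsymbol{\Theta}
```

Read this way, each observed variable plays the role of a dependent variable and the factors play the role of predictors, which is why regression diagnostics carry over to factor analysis.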

Keywords: assumption testing; confirmatory factor analysis; data screening; exploratory factor analysis; item factor analysis; regression diagnostics; structural equation modeling.


Figures

Figure 1. Scatterplot matrix for multivariate normal random sample consistent with Holzinger data (N = 100; no unusual cases).
Figure 2. Histograms of standardized residuals for each observed variable from three-factor model fitted to random sample data (N = 100; no unusual cases).
Figure 3. Distribution of Mahalanobis distance (MD) for multivariate normal random sample data (N = 100; no unusual cases).
Figure 4. Distribution of generalized Cook’s distance (gCD) for multivariate normal random sample data (N = 100; no unusual cases).
Figure 5. Scatterplot of “Remainders” by “Mixed Arithmetic” for perturbed sample with influential case indicated.
Figure 6. Histograms of standardized residuals for each observed variable from three-factor model fitted to perturbed sample data (N = 100).
Figure 7. Distribution of Mahalanobis distance (MD) for perturbed sample data (N = 100).
Figure 8. Distribution of generalized Cook’s distance (gCD) for perturbed sample data (N = 100).
Figure 9. Scatterplot of Case 1 items Word Meaning (WrdMean) by Sentence Completion (SntComp; N = 100).
Figure 10. Scatterplot of Case 2 items Word Meaning (WrdMean) by Sentence Completion (SntComp; N = 100).
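
The Mahalanobis distance screening summarized in the MD figure captions above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the data are simulated independent standard normal variables rather than the Holzinger data, and the function name `mahalanobis_sq` and the perturbation applied to the first case are hypothetical choices made only for the example.

```python
import numpy as np


def mahalanobis_sq(X):
    """Squared Mahalanobis distance of each row from the sample centroid."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    cov = np.cov(X, rowvar=False)  # sample covariance (n - 1 denominator)
    # Solve cov @ z = centered.T rather than forming an explicit inverse.
    solved = np.linalg.solve(cov, centered.T).T
    return np.einsum("ij,ij->i", centered, solved)


# Simulated stand-in data: 100 cases on 9 variables (assumption, not real data).
rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=np.zeros(9), cov=np.eye(9), size=100)

# Hypothetical perturbation: shift the first case far from the centroid.
X[0] += 6.0

d2 = mahalanobis_sq(X)
flagged = int(np.argmax(d2))  # index of the case farthest from the centroid
print(f"most extreme case: {flagged}, squared MD = {d2[flagged]:.1f}")
```

A case with a large distance is only a candidate for closer inspection. Whether it actually influences the factor solution is a separate question, which is what an influence measure such as generalized Cook’s distance addresses.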
