Analyzing the Measurement Equivalence of a Translated Test in a Statewide Assessment Program

Keywords: Measurement equivalence, structural equation modeling, confirmatory factor analysis


When a test is translated into one or more languages, the question arises of whether its items are equivalent across language forms. This equivalence can be assessed at the scale level by means of multiple-group confirmatory factor analysis (CFA) within a structural equation modeling framework. This study used a multi-group CFA approach to examine the measurement equivalence of a Spanish translation of a statewide Mathematics test originally constructed in English. The study used samples of native speakers of the target language of the translation taking the test in the source and target languages, specifically Hispanic examinees taking the test in English and in Spanish. Test items were grouped into twelve facet-representative parcels by combining items that covered similar content and computing the average score for each parcel. Four models were fitted to examine the equivalence of the test across groups. The multi-group CFA constrained factor loadings to be equal across groups, and the results supported the equivalence of the two language versions (English and Spanish) of the test. The statistical techniques implemented in this study can also be used to compare test performance across groups defined by dichotomous or dichotomized variables such as gender, socioeconomic status, geographic location, and other variables of interest.
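The parceling step described in the abstract (group items by similar content, then average within each group) can be illustrated with a minimal sketch. The item scores, the parcel names, and the item-to-parcel mapping below are hypothetical and chosen only for illustration. The study itself used twelve parcels, and its actual item assignment is not given here.

```python
from statistics import mean


def build_parcels(item_scores, parcel_map):
    """Average each examinee's item scores within each parcel.

    item_scores: one row of item scores per examinee.
    parcel_map: parcel name -> indices of the items assigned to that parcel.
    Returns: parcel name -> list of parcel scores, one per examinee.
    """
    return {
        parcel: [mean(row[i] for i in items) for row in item_scores]
        for parcel, items in parcel_map.items()
    }


# Hypothetical data: 3 examinees, 6 dichotomously scored items.
item_scores = [
    [1, 0, 1, 1, 0, 0],
    [1, 1, 1, 0, 1, 0],
    [0, 0, 1, 1, 1, 1],
]

# Hypothetical content-based grouping: two items per parcel.
parcel_map = {
    "parcel_A": [0, 1],
    "parcel_B": [2, 3],
    "parcel_C": [4, 5],
}

parcels = build_parcels(item_scores, parcel_map)
for name, scores in parcels.items():
    print(name, scores)
```

In a multi-group CFA, parcel scores of this kind, computed separately for each language group, would serve as the observed indicators in place of the individual items.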

Author Biographies

Jorge Carvajal-Espinoza, Universidad de Costa Rica

Licenciado in Math Education and Master’s in Educational Evaluation from the Universidad de Costa Rica; PhD in Educational Measurement from the University of Kansas. He is a professor at the School of Mathematics, Universidad de Costa Rica, where he has taught for more than 20 years, and a researcher at the Centro de Investigaciones Matemáticas y Meta-Matemáticas, Universidad de Costa Rica. He has published internationally and has presented at international conferences in the field of Educational Measurement. He supervises the development and statistical analysis of Prueba de Diagnóstico, an entrance placement test at the School of Mathematics, Universidad de Costa Rica.

Greg Welch, University of Nebraska

Received a Bachelor’s in Psychology and a Master’s in Applied Statistics from the University of Wyoming, and a Master’s and Doctorate in Research Methodology in Education from the University of Pittsburgh. Welch currently leads the evaluation efforts for the Center for Research on Children, Youth, Families & Schools at the University of Nebraska-Lincoln (UNL) and has provided formative and summative evaluation expertise on a number of privately and federally funded projects. He also serves as an adjunct faculty member for the Quantitative, Qualitative, and Psychometrics Methods Program in the Department of Educational Psychology at UNL. Welch has taught numerous graduate-level courses, including Introduction to Educational Measurement, Structural Equation Modeling, and Program Evaluation. He is a regular member of numerous doctoral committees for students in programs throughout the College of Education and Human Sciences. His research agenda focuses on utilizing advanced methodological approaches to address important educational policy-related issues.


How to Cite
Carvajal-Espinoza, J., & Welch, G. (2016). Analyzing the Measurement Equivalence of a Translated Test in a Statewide Assessment Program. Revista Electrónica Educare, 20(3), 1-18.
Articles (Peer Reviewed Section)
