
Marginalized Populations Deserve Our Very Best: Measurement Invariance as the Foundation of Mental Health Disparities Research

by Riley McDanal, M.A., Stony Brook University

Our ability to understand and reduce mental health problems among marginalized groups rests, in part, on our ability to accurately measure disparities in mental health problems across groups with different levels of privilege. If our measures are inadequate, then so are our results, conclusions, and applications. When we use a common, well-established measure to assess group differences — for example, the Patient Health Questionnaire-9 (PHQ-9) for depression severity — we tend to assume that the scale is measuring the same problem, in the same way, for different groups. This assumption should not be taken for granted.

Marginalized groups, such as gender-diverse people, demonstrate higher scores on the PHQ-9 relative to cisgender individuals (Borgogna et al., 2019). So, we would typically conclude that gender-diverse people show higher average depression severity than do cisgender people. But what if scores on a measure mean something different for each group? What if we aren't comparing apples to apples?

For measures like the PHQ-9, we assume that scores on each item carry the same weight — and therefore, have the same meaning and impact — across different groups of people. Consider the PHQ-9 item assessing self-harm. When both a cisgender person and a gender-diverse person give the same rating, we expect that this rating reflects an equal amount of increased depression severity for each person. But, consider that self-harm may be a stronger indicator of depression for gender-diverse people than it is for cisgender people (Borgogna et al., 2021). If so, then the same rating on this item might actually reflect different levels of severity for different groups of people. A rating of “2” for self-harm could indicate more severe depression for gender-diverse people than the same rating would for cisgender people. 

Enter measurement invariance. A scale's level of invariance (i.e., the extent to which it measures the construct in the same way across groups) determines our confidence in conclusions derived from cross-group comparisons of scale scores. For a scale that is strongly invariant across gender identity, we can more defensibly conclude that disparities in scale scores genuinely reflect real differences in symptom severity on the basis of gender (Eaton, 2020), disparities that could be driven by, for example, minority stress (Eaton et al., 2021).
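Concretely, these levels of invariance can be stated in standard factor-model terms (the notation here is mine, not from the original post):

```latex
% One-factor model for person i's response to item j in group g:
x_{ijg} = \tau_{jg} + \lambda_{jg}\,\eta_{ig} + \varepsilon_{ijg}
% Configural invariance: the same items load on the same factor in every group.
% Metric (weak) invariance: equal loadings, \lambda_{jg} = \lambda_j \text{ for all } g.
% Scalar (strong) invariance: additionally equal intercepts, \tau_{jg} = \tau_j.
```

Only under scalar (strong) invariance can a group difference in observed scale scores be defensibly attributed to a difference in the latent construct itself, rather than to group-specific loadings or intercepts.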

When a measure is less invariant, our comparisons are confounded by the possibility that observed differences in scores may reflect group differences in how mental health problems are fundamentally experienced. If depression is different for gender-diverse people (e.g., is more strongly defined by self-harm) relative to cisgender people, then we aren’t comparing the same problem across groups; we are comparing different forms of the problem. As such, insufficient measurement invariance in the PHQ-9 across gender means that we cannot tease apart whether differences are driven by true disparities for the same problem or group differences in the structure of the problem itself. 

Noninvariance can also indicate that items in our measures are interpreted differently across groups. In this case, we cannot determine whether estimated differences are driven by true symptom disparities or by different interpretations of what it means to experience a distressing level of a symptom. For example, on measures of obsessive-compulsive behaviors (e.g., the Maudsley Obsessional Compulsive Inventory; the Padua Inventory), Black adults on average may endorse the presence of compulsive behaviors at a lower level of symptom severity than do White adults (Thomas et al., 2000; Williams et al., 2005). To put it another way, even when Black adults and White adults demonstrate equal severity of compulsive behaviors as assessed by interview, the same Black adults are more likely than White adults to endorse self-report items assessing distressing compulsive behaviors (Thomas et al., 2000). This discrepancy, termed differential item functioning, can lead to overestimation of compulsions in Black adults. If a Black adult and a White adult experience the same level of symptom severity, and their diagnostic interviews reflect that equivalence, but the Black person's self-report responses are artificially elevated, then the Black person may be more likely to receive a clinical diagnosis if significant weight is given to the self-report responses. Statistically accounting for a measure's noninvariance can reduce the confounding influence of such differential item functioning.

Measurement invariance is often taken for granted in our research. Unfortunately, almost all studies simply assume complete invariance when reporting differences between groups. Assessing measurement invariance is not only crucial, but also quite feasible. Indeed, using R's semTools package (a companion to lavaan), a single call to compareFit() can compare model fit across successive levels of cross-group invariance (DataCamp, n.d.; McDanal, 2021). Thus, assessing and reporting on measurement invariance is easily within the skillset of graduate students, and it represents an important addition to their toolkit at a time when studies of diversity are beginning to receive the attention they deserve. Invariance analyses can inform readers about how — and whether — to defensibly interpret differences between groups on a given measure. Without sufficient measurement invariance, we should not assume that self-harm contributes equally to depression for gender-diverse people and for cisgender people. Without sufficient measurement invariance, we should not assume that Black people experience greater impairment from compulsive behaviors than do White people. Without measurement invariance (or without accounting for measurement noninvariance), our conclusions about cross-group disparities, and our resulting approaches to assessment and treatment, sit on questionable grounds.
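As a concrete illustration of that workflow, here is a minimal R sketch (simulated stand-in data, not the author's posted syntax; the item and variable names are hypothetical) that fits configural, metric, and scalar models with lavaan and compares them with semTools' compareFit():

```r
library(lavaan)    # CFA estimation
library(semTools)  # compareFit()

# Simulated PHQ-9-style item responses (0-3) with two gender groups.
set.seed(1)
n <- 400
dat <- data.frame(replicate(9, sample(0:3, n, replace = TRUE)))
names(dat) <- paste0("phq", 1:9)
dat$gender <- rep(c("cisgender", "gender_diverse"), each = n / 2)

# One-factor depression model across all nine items.
model <- 'depression =~ phq1 + phq2 + phq3 + phq4 + phq5 +
                        phq6 + phq7 + phq8 + phq9'

# Configural: same structure; metric: equal loadings;
# scalar: equal loadings and intercepts across groups.
configural <- cfa(model, data = dat, group = "gender")
metric     <- cfa(model, data = dat, group = "gender",
                  group.equal = "loadings")
scalar     <- cfa(model, data = dat, group = "gender",
                  group.equal = c("loadings", "intercepts"))

# A single call summarizes fit across all levels of invariance.
compareFit(configural, metric, scalar)
```

compareFit() reports nested model comparisons alongside fit indices such as CFI and RMSEA, so a marked worsening of fit at the metric or scalar step flags noninvariant loadings or intercepts.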

_______________________________________________________________

References

Borgogna, N. C., McDermott, R. C., Aita, S. L., & Kridel, M. M. (2019). Anxiety and depression across gender and sexual minorities: Implications for transgender, gender nonconforming, pansexual, demisexual, asexual, queer, and questioning individuals. Psychology of Sexual Orientation and Gender Diversity, 6(1), 54.

Borgogna, N. C., Brenner, R. E., & McDermott, R. C. (2021). Sexuality and gender invariance of the PHQ-9 and GAD-7: Implications for 16 identity groups. Journal of Affective Disorders, 278, 122-130.

DataCamp. (n.d.). compareFit: Build an object summarizing fit indices across multiple models. RDocumentation. Retrieved October 18, 2022, from https://www.rdocumentation.org/packages/semTools/versions/0.4-9/topics/compareFit

Eaton, N. R. (2020). Measurement and mental health disparities: psychopathology classification and identity assessment. Personality and Mental Health, 14, 76-87.

Eaton, N. R., Rodriguez-Seijas, C., & Pachankis, J. E. (2021). Transdiagnostic approaches to sexual and gender minority mental health. Current Directions in Psychological Science, 30(6), 510-518.

McDanal, R. (2021). Invariance (R syntax). Retrieved from https://osf.io/cjwfn

Thomas, J., Turkheimer, E., & Oltmanns, T. F. (2000). Psychometric analysis of racial differences on the Maudsley Obsessional Compulsive Inventory. Assessment, 7(3), 247-258.

Williams, M. T., Turkheimer, E., Schmidt, K. M., & Oltmanns, T. F. (2005). Ethnic identification biases responses to the Padua Inventory for obsessive-compulsive disorder. Assessment, 12(2), 174-185.

Disclaimer: The views and opinions expressed in this newsletter are those of the authors alone and do not necessarily reflect the official policy or position of the Psychological Clinical Science Accreditation System (PCSAS).