Chi-Square Tests

There are two chi-square tests for categorical attributes.

Chi-square goodness of fit - Useful for 1 categorical attribute
Chi-square test of independence - Useful for 2 categorical attributes

Let’s explore some data related to the General Social Survey.

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

head(gss_cat)

## # A tibble: 6 × 9
##    year marital         age race  rincome        partyid     relig denom tvhours
##   <int> <fct>         <int> <fct> <fct>          <fct>       <fct> <fct>   <int>
## 1  2000 Never married    26 White $8000 to 9999  Ind,near r… Prot… Sout…      12
## 2  2000 Divorced         48 White $8000 to 9999  Not str re… Prot… Bapt…      NA
## 3  2000 Widowed          67 White Not applicable Independent Prot… No d…       2
## 4  2000 Never married    39 White Not applicable Ind,near r… Orth… Not …       4
## 5  2000 Divorced         25 White Not applicable Not str de… None  Not …       1
## 6  2000 Married          25 White $20000 - 24999 Strong dem… Prot… Sout…      NA

Chi-square goodness of fit

Suppose we want to know whether the distribution of political affiliation in the GSS data is similar to the current distribution of political affiliation in the United States. This link gives the breakdown of the current political affiliation in the United States according to a continuously running Gallup poll.

The chi-square goodness of fit test explores the extent to which a categorical attribute follows a specified distribution. It is often framed as a way to evaluate whether the sample data are representative of the population. As such, the null hypothesis can be stated in either of the following ways.

\[ H_{0}:\ All\ categories\ follow\ the\ population\ distribution \]

\[ H_{0}:\ All\ categories\ follow\ the\ specified\ percentages\ or\ proportions \]

To frame this analysis in the context of the current data, let’s look at the partyid attribute in the data.

count(gss_cat, partyid) %>%
  mutate(prop = n / sum(n))

## # A tibble: 10 × 3
##    partyid                n      prop
##    <fct>              <int>     <dbl>
##  1 No answer            154 0.00717  
##  2 Don't know             1 0.0000465
##  3 Other party          393 0.0183   
##  4 Strong republican   2314 0.108    
##  5 Not str republican  3032 0.141    
##  6 Ind,near rep        1791 0.0834   
##  7 Independent         4119 0.192    
##  8 Ind,near dem        2499 0.116    
##  9 Not str democrat    3690 0.172    
## 10 Strong democrat     3490 0.162

For this example, I’m going to collapse categories. The following code does this task and then returns a similar table to that shown above. In particular, since the Gallup poll only asks about republicans, independents, or democrats, I’m going to group the remaining responses into an “other” category.

gss_cat <- gss_cat %>%
  mutate(partyid_collapse = fct_collapse(partyid,
    other = c("No answer", "Don't know", "Other party"),
    rep = c("Strong republican", "Not str republican"),
    ind = c("Ind,near rep", "Independent", "Ind,near dem"),
    dem = c("Not str democrat", "Strong democrat")
  ))

count(gss_cat, partyid_collapse) %>%
  mutate(prop = n / sum(n))

## # A tibble: 4 × 3
##   partyid_collapse     n   prop
##   <fct>            <int>  <dbl>
## 1 other              548 0.0255
## 2 rep               5346 0.249 
## 3 ind               8409 0.391 
## 4 dem               7180 0.334

Chi-Square GoF Mechanics

The chi-square goodness of fit compares the observed cell counts with the expected cell counts. More formally, the chi-square test statistic is as follows:

\[ \chi^2 = \sum \frac{( O - E ) ^ 2}{E} \]

where \(O\) are the observed cell counts and \(E\) are the expected cell counts. The expected cell counts are defined as the sample size times the hypothesized proportions/percentages (this is not completely statistically accurate; however, in many social science situations, this should be sufficient). For example:

\[ E = p_{H_{0}} * N \]

where \(p_{H_{0}}\) is the hypothesized proportions from the null hypothesis. The \(\chi^2\) statistic follows a chi-square distribution with \(k - 1\) degrees of freedom, where \(k\) is the number of categories.

Using the table above, these can be computed from the data and assuming the following as population proportions/percentages from the Gallup poll: Rep = 27%, Ind = 45%, Dem = 27%, other = 1%.

chi_tab <- count(gss_cat, partyid_collapse) %>%
  mutate(prop = n / sum(n), 
         prop_h0 = c(.01, .27, .45, .27),
         E = prop_h0 * sum(n))

chi_tab

## # A tibble: 4 × 5
##   partyid_collapse     n   prop prop_h0     E
##   <fct>            <int>  <dbl>   <dbl> <dbl>
## 1 other              548 0.0255    0.01  215.
## 2 rep               5346 0.249     0.27 5800.
## 3 ind               8409 0.391     0.45 9667.
## 4 dem               7180 0.334     0.27 5800.

The \(\chi^2\) statistic can be computed manually.

chi_tab %>%
  mutate(num = (n - E)^2,
         chi_cell = num / E) %>%
  summarise(chi_square = sum(chi_cell))

## # A tibble: 1 × 1
##   chi_square
##        <dbl>
## 1      1044.
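As a hedged sketch, the manually computed statistic can also be converted into a p-value with pchisq(), using the \(k - 1\) degrees of freedom described above. The counts and hypothesized proportions are copied from the tables above; the object names (observed, prop_h0, and so on) are only illustrative.

```r
# Observed counts and hypothesized proportions, copied from the tables above
observed <- c(other = 548, rep = 5346, ind = 8409, dem = 7180)
prop_h0  <- c(other = .01, rep = .27, ind = .45, dem = .27)

# Expected counts: hypothesized proportion times the sample size
expected <- prop_h0 * sum(observed)

# Chi-square statistic and its degrees of freedom (k - 1)
chi_square <- sum((observed - expected)^2 / expected)
df <- length(observed) - 1

# Upper-tail probability from the chi-square distribution
p_value <- pchisq(chi_square, df = df, lower.tail = FALSE)

# Critical value at the .05 level, for comparison with chi_square
critical <- qchisq(.95, df = df)

c(chi_square = chi_square, df = df, critical = critical)
```

A statistic far above the critical value corresponds to a very small p-value, which is the pattern seen in these data.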

In practice, using the chisq.test() function in R is easier. Its primary argument is a table of counts for the attribute being tested. The hypothesized proportions are passed through the optional p argument; if p is omitted, equal proportions across the categories are assumed.

xsq_got <- chisq.test(table(gss_cat$partyid_collapse), p = c(.01, .27, .45, .27))

xsq_got

## 
## 	Chi-squared test for given probabilities
## 
## data:  table(gss_cat$partyid_collapse)
## X-squared = 1044.2, df = 3, p-value < 2.2e-16

Explore Differences

It is often of interest to explore differences, particularly if the chi-square goodness of fit test has a small p-value. This would indicate that the counts likely do not follow the assumed distribution, but where are the differences found? Residuals can help with this. The residuals are the difference in the observed and expected values divided by the square root of the expected values.

\[ \chi^2_{resid} = \frac{(O - E)}{\sqrt{E}} \]

These can be extracted directly from the model object saved when running the chi-square test.

xsq_got$residuals

## 
##      other        rep        ind        dem 
##  22.730994  -5.966485 -12.798166  18.114264
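To make the residual formula concrete, here is a small sketch that recomputes the residuals by hand from the observed and expected counts. The counts and proportions are copied from the tables above, and resid_manual is just an illustrative name.

```r
# Observed counts and expected counts under the hypothesized proportions
observed <- c(other = 548, rep = 5346, ind = 8409, dem = 7180)
expected <- c(.01, .27, .45, .27) * sum(observed)

# Residual for each category: (O - E) / sqrt(E)
resid_manual <- (observed - expected) / sqrt(expected)
round(resid_manual, 2)

# The squared residuals add up to the chi-square statistic
sum(resid_manual^2)
```

Positive residuals mark categories with more observations than expected, negative residuals mark categories with fewer, and the largest absolute values contribute the most to the overall statistic.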

Chi-square Test of Independence

The chi-square test of independence is similar to the goodness of fit test, except that instead of a single attribute of interest, there are now two categorical attributes to be explored. The test of independence explores whether the observed attributes are independent of one another. That is, if the two categorical attributes are independent, the categories of one attribute would be distributed in the same proportions across every category of the other. The form of the \(\chi^2\) statistic is the same as for the GoF test:

\[ \chi^2 = \sum \frac{( O - E ) ^ 2}{E} \]

Unlike the goodness of fit test, however, the expected values come from the data rather than from hypothesized proportions. The expected cell counts are now defined as:

\[ E = N * p_{r} * p_{c} \]

where \(p_{r}\) is the marginal proportion for the rows, ignoring the columns, and \(p_{c}\) is the marginal proportion for the columns, ignoring the rows. Finally, the test has degrees of freedom equal to \((r - 1)(c - 1)\), where \(r\) is the number of rows and \(c\) is the number of columns.

Data Example

To explore this example, let’s see if political party affiliation differs across (or is associated with) the years in which the GSS data were collected. The data span 14 years, with a survey every other year.

count(gss_cat, year)

## # A tibble: 8 × 2
##    year     n
##   <int> <int>
## 1  2000  2817
## 2  2002  2765
## 3  2004  2812
## 4  2006  4510
## 5  2008  2023
## 6  2010  2044
## 7  2012  1974
## 8  2014  2538

Suppose we were interested in exploring whether political affiliation differed between the years before 2010 and the years from 2010 onward.

gss_cat <- gss_cat %>%
  mutate(year_2 = ifelse(year < 2010, "2000 to 2008", "2010 to 2014"))

addmargins(table(gss_cat$year_2, gss_cat$partyid_collapse))

##               
##                other   rep   ind   dem   Sum
##   2000 to 2008   327  3906  5734  4960 14927
##   2010 to 2014   221  1440  2675  2220  6556
##   Sum            548  5346  8409  7180 21483

Given this table, a single expected value can be computed manually.

\[ E_{1,1} = 21483 * (14927 / 21483) * (548 / 21483) = 380.766 \]

21483 * (14927 / 21483) * (548 / 21483)

## [1] 380.766

These could be computed for subsequent cell expected values, but these can be extracted directly when fitting the chi-square using the chisq.test() function.

xsq_ind <- chisq.test(table(gss_cat$year_2, gss_cat$partyid_collapse))

xsq_ind$expected

##               
##                  other      rep      ind      dem
##   2000 to 2008 380.766 3714.553 5842.813 4988.868
##   2010 to 2014 167.234 1631.447 2566.187 2191.132

xsq_ind

## 
## 	Pearson's Chi-squared test
## 
## data:  table(gss_cat$year_2, gss_cat$partyid_collapse)
## X-squared = 64.399, df = 3, p-value = 6.745e-14
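As a rough check on the mechanics, the sketch below rebuilds every expected count from the row and column margins and then recomputes the statistic by hand. The cell counts are copied from the table above; the object names are illustrative, and outer() is simply one convenient way to multiply the margins.

```r
# Observed counts, copied from the two-way table above
observed <- matrix(
  c(327, 3906, 5734, 4960,
    221, 1440, 2675, 2220),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("2000 to 2008", "2010 to 2014"),
                  c("other", "rep", "ind", "dem"))
)
n <- sum(observed)

# E = N * p_r * p_c, which simplifies to (row total * column total) / N
expected <- outer(rowSums(observed), colSums(observed)) / n
round(expected, 3)

# Chi-square statistic, degrees of freedom (r - 1)(c - 1), and p-value
chi_square <- sum((observed - expected)^2 / expected)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_value <- pchisq(chi_square, df = df, lower.tail = FALSE)

c(chi_square = chi_square, df = df, p_value = p_value)
```

The results should agree with the chisq.test() output up to rounding.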

xsq_ind$residuals

##               
##                     other        rep        ind        dem
##   2000 to 2008 -2.7553619  3.1411979 -1.4235351 -0.4087162
##   2010 to 2014  4.1576263 -4.7398226  2.1480035  0.6167208

Effect Sizes for chi-square tests

Effect sizes for chi-square tests can be important, particularly with large sample sizes, as the chi-square test is highly sensitive to sample size. In particular, very small differences can be statistically significant with large sample sizes.

For the goodness of fit test, Cohen’s W can be estimated as an effect size measure. This is computed as:

\[ W = \sqrt{\frac{\chi^2}{N}} \]

For the test of independence, Cramer’s V can be used.

\[ V = \sqrt{\frac{\chi^2}{N * df^{*}}} \] where \(df^{*}\) is the smallest of \(r - 1\) or \(c - 1\).

Cramer’s V ranges from 0 to 1, where values closer to 1 indicate a stronger association between the attributes (i.e., the attributes are not independent).

Cohen’s W is similar to Cramer’s V, but it is not limited to the range between 0 and 1.

sqrt(1044.2 / sum(chi_tab$n))

## [1] 0.2204674

sqrt(64.399 / (21483 * 1))

## [1] 0.05475101

The DescTools package can be used for Cramer’s V as well.

#install.packages("DescTools")
library(DescTools)

CramerV(table(gss_cat$year_2, gss_cat$partyid_collapse))

## [1] 0.05475087
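Both effect sizes can also be computed from the stored test objects rather than from typed-in statistics, which avoids copying a rounded or outdated value. The following is a minimal base R sketch that assumes the same counts as the tables above; the object names (gof_test, ind_test, df_star) are illustrative only.

```r
# Goodness of fit: counts and hypothesized proportions from the tables above
gof_counts <- c(other = 548, rep = 5346, ind = 8409, dem = 7180)
gof_test <- chisq.test(gof_counts, p = c(.01, .27, .45, .27))

# Cohen's W = sqrt(chi-square / N)
cohens_w <- sqrt(unname(gof_test$statistic) / sum(gof_counts))

# Test of independence: two-way counts from the table above
ind_counts <- matrix(
  c(327, 3906, 5734, 4960,
    221, 1440, 2675, 2220),
  nrow = 2, byrow = TRUE
)
ind_test <- chisq.test(ind_counts)

# Cramer's V = sqrt(chi-square / (N * df*)), df* = min(r - 1, c - 1)
df_star <- min(dim(ind_counts) - 1)
cramers_v <- sqrt(unname(ind_test$statistic) / (sum(ind_counts) * df_star))

c(cohens_w = cohens_w, cramers_v = cramers_v)
```

Because the statistic is pulled from the object at full precision, the Cramer’s V value here should match the DescTools result more closely than the version based on the rounded 64.399.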

Last updated on Apr 22, 2024