Chi-Square Tests
There are two chi-square tests for categorical attributes.
- Chi-square goodness of fit - Useful for 1 categorical attribute
- Chi-square test of independence - Useful for 2 categorical attributes
Let’s explore some data related to the General Social Survey.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
head(gss_cat)
## # A tibble: 6 × 9
## year marital age race rincome partyid relig denom tvhours
## <int> <fct> <int> <fct> <fct> <fct> <fct> <fct> <int>
## 1 2000 Never married 26 White $8000 to 9999 Ind,near r… Prot… Sout… 12
## 2 2000 Divorced 48 White $8000 to 9999 Not str re… Prot… Bapt… NA
## 3 2000 Widowed 67 White Not applicable Independent Prot… No d… 2
## 4 2000 Never married 39 White Not applicable Ind,near r… Orth… Not … 4
## 5 2000 Divorced 25 White Not applicable Not str de… None Not … 1
## 6 2000 Married 25 White $20000 - 24999 Strong dem… Prot… Sout… NA
Chi-square goodness of fit
Suppose we were interested in knowing whether the party affiliation proportions in the GSS data are similar to the current political affiliation in the United States. A continuously running Gallup poll gives the breakdown of current political affiliation in the United States.
The chi-square goodness of fit explores the extent to which a categorical attribute follows a specified distribution. The chi-square goodness of fit test is often framed as a way to evaluate if the sample data is representative of the population. As such, the following hypotheses would be a way to frame this analysis.
\[ H_{0}:\ All\ categories\ follow\ the\ population\ distribution \]
or
\[ H_{0}:\ All\ categories\ follow\ the\ specified\ percentages\ or\ proportions \]
To frame this analysis in the context of the current data, let’s look at the partyid attribute in the data.
count(gss_cat, partyid) %>%
mutate(prop = n / sum(n))
## # A tibble: 10 × 3
## partyid n prop
## <fct> <int> <dbl>
## 1 No answer 154 0.00717
## 2 Don't know 1 0.0000465
## 3 Other party 393 0.0183
## 4 Strong republican 2314 0.108
## 5 Not str republican 3032 0.141
## 6 Ind,near rep 1791 0.0834
## 7 Independent 4119 0.192
## 8 Ind,near dem 2499 0.116
## 9 Not str democrat 3690 0.172
## 10 Strong democrat 3490 0.162
For this example, I’m going to collapse categories. The following code does this task and then returns a table similar to the one shown above. In particular, since the Gallup poll only asks about Republicans, Independents, or Democrats, I’m going to group the remaining responses into an “other” category.
gss_cat <- gss_cat %>%
mutate(partyid_collapse = fct_collapse(partyid,
other = c("No answer", "Don't know", "Other party"),
rep = c("Strong republican", "Not str republican"),
ind = c("Ind,near rep", "Independent", "Ind,near dem"),
dem = c("Not str democrat", "Strong democrat")
))
count(gss_cat, partyid_collapse) %>%
mutate(prop = n / sum(n))
## # A tibble: 4 × 3
## partyid_collapse n prop
## <fct> <int> <dbl>
## 1 other 548 0.0255
## 2 rep 5346 0.249
## 3 ind 8409 0.391
## 4 dem 7180 0.334
Chi-Square GoF Mechanics
The chi-square goodness of fit compares the observed cell counts with the expected cell counts. More formally, the chi-square test statistic is as follows:
\[ \chi^2 = \sum \frac{( O - E ) ^ 2}{E} \]
where \(O\) are the observed cell counts and \(E\) are the expected cell counts. The expected cell counts are defined as the sample size times the hypothesized proportions/percentages (this is not completely statistically accurate; however, in many social science situations it should be sufficient). For example:
\[ E = p_{H_{0}} * N \]
where \(p_{H_{0}}\) is the hypothesized proportions from the null hypothesis. The \(\chi^2\) statistic follows a chi-square distribution with \(k - 1\) degrees of freedom, where \(k\) is the number of categories.
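As a quick sketch of the distributional piece, the critical value for a test with \(k = 4\) categories (so \(k - 1 = 3\) degrees of freedom) at the conventional .05 level can be found with qchisq():

```r
# Chi-square critical value with df = 3 at alpha = .05;
# observed statistics larger than this would lead to rejecting the null
qchisq(.95, df = 3)
## [1] 7.814728
```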
Using the table above, these can be computed from the data and assuming the following as population proportions/percentages from the Gallup poll: Rep = 27%, Ind = 45%, Dem = 27%, other = 1%.
chi_tab <- count(gss_cat, partyid_collapse) %>%
mutate(prop = n / sum(n),
prop_h0 = c(.01, .27, .45, .27),
E = prop_h0 * sum(n))
chi_tab
## # A tibble: 4 × 5
## partyid_collapse n prop prop_h0 E
## <fct> <int> <dbl> <dbl> <dbl>
## 1 other 548 0.0255 0.01 215.
## 2 rep 5346 0.249 0.27 5800.
## 3 ind 8409 0.391 0.45 9667.
## 4 dem 7180 0.334 0.27 5800.
The \(\chi^2\) statistic can be computed manually.
chi_tab %>%
mutate(num = (n - E)^2,
chi_cell = num / E) %>%
summarise(chi_square = sum(chi_cell))
## # A tibble: 1 × 1
## chi_square
## <dbl>
## 1 1044.
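The p-value for this statistic can then be obtained directly from the chi-square distribution; a sketch using the rounded statistic from above with \(k - 1 = 3\) degrees of freedom:

```r
# Upper-tail probability of the chi-square distribution with df = 3,
# evaluated at the manually computed statistic; effectively zero here
pchisq(1044, df = 3, lower.tail = FALSE)
```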
In practice, the chisq.test()
function in R is easier to use. This function takes one primary argument: the attribute on which to perform the chi-square goodness of fit test. The hypothesized proportions can be passed with the p argument; if they are omitted, equal proportions across the categories are assumed.
xsq_got <- chisq.test(table(gss_cat$partyid_collapse), p = c(.01, .27, .45, .27))
xsq_got
##
## Chi-squared test for given probabilities
##
## data: table(gss_cat$partyid_collapse)
## X-squared = 1044.2, df = 3, p-value < 2.2e-16
Explore Differences
It is often of interest to explore differences, particularly if the chi-square goodness of fit test has a small p-value. A small p-value indicates that the counts likely do not follow the assumed distribution, but where are the differences found? Residuals can help with this. The residuals are the difference between the observed and expected values divided by the square root of the expected values.
\[ \chi^2_{resid} = \frac{(O - E)}{\sqrt{E}} \]
These can be extracted directly from the model object saved when running the chi-square test.
xsq_got$residuals
##
## other rep ind dem
## 22.730994 -5.966485 -12.798166 18.114264
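These residuals can also be verified by hand; the sketch below rebuilds them from the collapsed counts and the Gallup proportions shown earlier:

```r
# Observed counts for each collapsed category, copied from the table above
O <- c(other = 548, rep = 5346, ind = 8409, dem = 7180)
# Hypothesized proportions from the Gallup poll
p_h0 <- c(.01, .27, .45, .27)
# Expected counts and residuals: (O - E) / sqrt(E)
E <- p_h0 * sum(O)
round((O - E) / sqrt(E), 2)
## other   rep   ind   dem
## 22.73 -5.97 -12.80 18.11
```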
Chi-square Test of Independence
The chi-square test of independence is similar to the goodness of fit test, except that instead of a single attribute of interest, there are now two categorical attributes to be explored. The test of independence explores whether the observed attributes are independent from one another. That is, if the two categorical attributes are independent, the two attributes would be proportionally distributed across all categories. The form of the \(\chi^2\) test is the same as the goodness of fit test:
\[ \chi^2 = \sum \frac{( O - E ) ^ 2}{E} \]
However, different from the goodness of fit test, the expected values are computed differently. The expected cell counts are now defined as:
\[ E = N * p_{r} * p_{c} \]
where \(p_{r}\) is the margin proportion for the rows, ignoring the columns (that is, marginal row proportion) and \(p_{c}\) is the margin proportion for the columns, ignoring the rows. Finally, the test has degrees of freedom equal to \((r - 1)(c - 1)\).
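Equivalently, the whole matrix of expected counts is the outer product of the row and column marginal proportions, scaled by \(N\). A small sketch with a made-up 2 x 3 table of counts:

```r
# Hypothetical 2 x 3 table of counts (made-up numbers for illustration)
tab <- matrix(c(20, 30, 40, 10, 25, 35), nrow = 2)
N <- sum(tab)
# Expected counts: N * p_r * p_c for every cell at once
E <- outer(rowSums(tab) / N, colSums(tab) / N) * N
E
```

Note the expected counts always share the same margins as the observed table, which is why the degrees of freedom reduce to \((r - 1)(c - 1)\).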
Data Example
To explore this example, let’s see if political party affiliation differs across (or is associated with) the years that the GSS data were collected. The data were collected every other year from 2000 to 2014.
count(gss_cat, year)
## # A tibble: 8 × 2
## year n
## <int> <int>
## 1 2000 2817
## 2 2002 2765
## 3 2004 2812
## 4 2006 4510
## 5 2008 2023
## 6 2010 2044
## 7 2012 1974
## 8 2014 2538
Suppose we were interested in exploring if there was a difference in political affiliation before and after 2010.
gss_cat <- gss_cat %>%
mutate(year_2 = ifelse(year < 2010, "2000 to 2008", "2010 to 2014"))
addmargins(table(gss_cat$year_2, gss_cat$partyid_collapse))
##
## other rep ind dem Sum
## 2000 to 2008 327 3906 5734 4960 14927
## 2010 to 2014 221 1440 2675 2220 6556
## Sum 548 5346 8409 7180 21483
Given this table, a single expected value can be computed manually.
\[ E_{1,1} = 21483 * (14927 / 21483) * (548 / 21483) = 380.766 \]
21483 * (14927 / 21483) * (548 / 21483)
## [1] 380.766
These could be computed for the remaining cells, but the expected values can be extracted directly after fitting the chi-square with the chisq.test()
function.
xsq_ind <- chisq.test(table(gss_cat$year_2, gss_cat$partyid_collapse))
xsq_ind$expected
##
## other rep ind dem
## 2000 to 2008 380.766 3714.553 5842.813 4988.868
## 2010 to 2014 167.234 1631.447 2566.187 2191.132
xsq_ind
##
## Pearson's Chi-squared test
##
## data: table(gss_cat$year_2, gss_cat$partyid_collapse)
## X-squared = 64.399, df = 3, p-value = 6.745e-14
xsq_ind$residuals
##
## other rep ind dem
## 2000 to 2008 -2.7553619 3.1411979 -1.4235351 -0.4087162
## 2010 to 2014 4.1576263 -4.7398226 2.1480035 0.6167208
Effect Sizes for Chi-Square Tests
Effect sizes for chi-square tests can be important because the chi-square statistic is highly sensitive to sample size. In particular, very small differences can be statistically significant with large sample sizes.
For the goodness of fit test, Cohen’s W can be estimated as an effect size measure. This is computed as:
\[ W = \sqrt{\frac{\chi^2}{N}} \]
For the test of independence, Cramer’s V can be used.
\[ V = \sqrt{\frac{\chi^2}{N * df^{*}}} \] where \(df^{*}\) is the smallest of \(r - 1\) or \(c - 1\).
Cramer’s V ranges from 0 to 1, where values closer to 1 indicate more variation is explained (i.e., the attributes are not independent).
Cohen’s W is similar to Cramer’s V, but it is not limited to the range between 0 and 1.
sqrt(1044.2 / sum(chi_tab$n))
## [1] 0.2204674
sqrt(64.399 / (21483 * 1))
## [1] 0.05475101
The DescTools package can be used for Cramer’s V as well.
#install.packages("DescTools")
library(DescTools)
CramerV(table(gss_cat$year_2, gss_cat$partyid_collapse))
## [1] 0.05475087
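This calculation could also be wrapped in a small helper that works on a saved chisq.test() fit (cramer_v below is a hypothetical name, not from an existing package); the table is rebuilt here from the counts shown above:

```r
# Hypothetical helper: Cramer's V from a saved chisq.test() object
cramer_v <- function(fit) {
  df_star <- min(dim(fit$observed)) - 1   # smaller of r - 1 and c - 1
  sqrt(unname(fit$statistic) / (sum(fit$observed) * df_star))
}

# Rebuild the 2 x 4 year by party table from the counts above
tab <- matrix(c(327, 221, 3906, 1440, 5734, 2675, 4960, 2220),
              nrow = 2,
              dimnames = list(c("2000 to 2008", "2010 to 2014"),
                              c("other", "rep", "ind", "dem")))
cramer_v(chisq.test(tab))  # matches the CramerV() output above
```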