More details on pairwise contrasts

Last week the idea of pairwise contrasts were introduced. Here is a more formal discussion of pairwise contrasts and controlling of familywise type I error rates.

Bonferroni Method

This method is introduced first, primarily due to its simplicity. This is not the best measure to use however as will be discussed below. The Bonferroni method makes the following adjustment:

$\alpha_{new} = \frac{\alpha}{m}$

Where $$m$$ is the number of hypotheses being tested. Let’s use a specific example to frame this based on the baseball data.

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0      ✔ purrr   0.3.5
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.4.1
## ✔ readr   2.1.3      ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(Lahman)
library(ggformula)
## Loading required package: ggstance
##
## Attaching package: 'ggstance'
##
## The following objects are masked from 'package:ggplot2':
##
##     geom_errorbarh, GeomErrorbarh
##
##
## Attaching package: 'scales'
##
## The following object is masked from 'package:purrr':
##
##
## The following object is masked from 'package:readr':
##
##     col_factor
##
##
## New to ggformula?  Try the tutorials:
##  learnr::run_tutorial("introduction", package = "ggformula")
##  learnr::run_tutorial("refining", package = "ggformula")
theme_set(theme_bw(base_size = 18))

career <- Batting %>%
filter(AB > 100) %>%
anti_join(Pitching, by = "playerID") %>%
filter(yearID > 1990) %>%
group_by(playerID, lgID) %>%
summarise(H = sum(H), AB = sum(AB)) %>%
mutate(average = H / AB)
## summarise() has grouped output by 'playerID'. You can override using the
## .groups argument.
career <- Appearances %>%
filter(yearID > 1990) %>%
select(-GS, -G_ph, -G_pr, -G_batting, -G_defense, -G_p, -G_lf, -G_cf, -G_rf) %>%
rowwise() %>%
mutate(g_inf = sum(c_across(G_1b:G_ss))) %>%
select(-G_1b, -G_2b, -G_3b, -G_ss) %>%
group_by(playerID, lgID) %>%
summarise(catcher = sum(G_c),
outfield = sum(G_of),
dh = sum(G_dh),
infield = sum(g_inf),
total_games = sum(G_all)) %>%
pivot_longer(catcher:infield,
names_to = "position") %>%
filter(value > 0) %>%
group_by(playerID, lgID) %>%
slice_max(value) %>%
select(playerID, lgID, position) %>%
inner_join(career)
## summarise() has grouped output by 'playerID'. You can override using the
## .groups argument.
## Joining, by = c("playerID", "lgID")
career <- People %>%
tbl_df() %>%
dplyr::select(playerID, nameFirst, nameLast) %>%
unite(name, nameFirst, nameLast, sep = " ") %>%
inner_join(career, by = "playerID")
## Warning: tbl_df() was deprecated in dplyr 1.0.0.
## ℹ Please use tibble::as_tibble() instead.
career <- career %>%
mutate(league_dummy = ifelse(lgID == 'NL', 1, 0))

head(career)
## # A tibble: 6 × 8
##   playerID  name               lgID  position     H    AB average league_dummy
##   <chr>     <chr>              <fct> <chr>    <int> <int>   <dbl>        <dbl>
## 1 abbotje01 Jeff Abbott        AL    outfield   127   459   0.277            0
## 2 abbotku01 Kurt Abbott        AL    infield     33   123   0.268            0
## 3 abbotku01 Kurt Abbott        NL    infield    455  1780   0.256            1
## 4 abercre01 Reggie Abercrombie NL    outfield    54   255   0.212            1
## 5 abernbr01 Brent Abernathy    AL    infield    194   767   0.253            0
## 6 abnersh01 Shawn Abner        AL    outfield    81   309   0.262            0
position_lm <- lm(average ~ 1 + position, data = career)

broom::tidy(position_lm)
## # A tibble: 4 × 5
##   term             estimate std.error statistic  p.value
##   <chr>               <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)        0.241    0.00140    172.   0
## 2 positiondh         0.0165   0.00345      4.79 1.75e- 6
## 3 positioninfield    0.0136   0.00161      8.44 4.80e-17
## 4 positionoutfield   0.0147   0.00163      9.00 4.05e-19
count(career, position)
## # A tibble: 4 × 2
##   position     n
##   <chr>    <int>
## 1 catcher    410
## 2 dh          81
## 3 infield   1248
## 4 outfield  1136

Given that there are 4 groups, if all pairwise comparisons were of interest, the following would be all possible NULL hypotheses. Note, I am assuming that each of these has a matching alternative hypothesis that states the groups differences are different from 0.

$H_{0}: \mu_{catcher} - \mu_{dh} = 0 \\ H_{0}: \mu_{catcher} - \mu_{infield} = 0 \\ H_{0}: \mu_{catcher} - \mu_{outfield} = 0 \\ H_{0}: \mu_{dh} - \mu_{infield} = 0 \\ H_{0}: \mu_{dh} - \mu_{outfield} = 0 \\ H_{0}: \mu_{infield} - \mu_{outfield} = 0 \\$

In general, to find the total number of hypotheses, you can use the combinations formula:

$C(n, r) = \binom{n}{r} = \frac{n!}{r!(n - r)!}$

For our example this would lead to:

$\binom{4}{2} = \frac{4!}{2!(4 - 2)!} = \frac{4 * 3 * 2 * 1}{2 * 1 * (2 * 1)} = \frac{24}{4} = 6$

Therefore, using bonferroni’s correction, our new alpha would be:

$\alpha_{new} = \frac{.05}{6} = .0083$

Alternatively, you could also adjust the p-values to make them smaller and still use .05 (or any other predetermined familywise $$\alpha$$ value).

$p_{new} = \frac{p_{original}}{m}$

This is what most software programs do automatically for you.

pairwise.t.test(career$average, career$position, p.adjust = 'bonf')
##
##  Pairwise comparisons using t tests with pooled SD
##
## data:  career$average and career$position
##
##          catcher dh infield
## dh       1.0e-05 -  -
## infield  2.9e-16 1  -
## outfield < 2e-16 1  1
##
## P value adjustment method: bonferroni

What is a type I error?

Before moving to why you should care, let’s more formally talk about type I errors. Suppose we have the following table of possible outcomes:

$$H_{0}$$ true $$H_{0}$$ false
Reject $$H_{0}$$ Type I Error ($$\alpha$$) Correct - Power ($$1 - \beta$$)
Retain $$H_{0}$$ Correct ($$1 - \alpha$$) Type II Error ($$\beta$$)

Type I error is when the null hypothesis is true in the population, but the statistical evidence supports the alternative hypothesis. Type II error is when the null hypothesis is false in the population, but the statistical evidence supports the null hypothesis.

Here is a good link for an interactive app that shows many of these terms visually: https://rpsychologist.com/d3/nhst/.

Type M and Type S errors

These were first defined by Gelman and Tuerlinckx (2000) and is also in section 4.4 of Regression and Other Stories textbook.

• Type S (sign) occurs when the sign of the estimated effect is in the opposite direction as the true effect.
• Type M (magnitude) occurse when the magnitude of the estimated effect is much different (often larger) than the true effect.

These really shift the narrative away from achieving statistical signficiance and moving toward proper estimation of effects or precision, this is a better goal in my opinion.

Holm-Bonferroni Procedure

The boneferroni procedure will be overly conservative and I wouldn’t recommend it’s use in practice. If you want a simple approach to p-value adjustment, I’d recommend just setting a specific value for the alpha value to be more conservative, for example setting it to 0.01.

The Holm-Bonferroni approach is an adjustment method that is more powerful then the original bonferroni procedure, but does not come with onerous assumptions. The Holm-Boneferroni procedure uses the following steps for adjustment.

Suppose there are $$m$$ p-values (ie, $$m$$ null hypotheses). First, order these from smallest to largest, be sure to keep track of which hypothese the p-value is associated with. Then, 1. Is the smallest p-value less than $$\alpha / m$$, if yes, provide evidence for the alternative hypothesis and proceed to the next p-value, it not stop. 2. Is the second smallest p-value less than $/ m - 1$, if eys, provide evidence for the alternative hypothesis and proceed to the next p-value, if not stop. 3. Continue, comparing the p-values in order to the following adjust alpha,

$\alpha_{new_{k}} = \frac{\alpha}{m + 1 - k}$

where is $$k$$ is the rank of the p-values when ordered from smallest to largest.

Similar to the bonferroni, the p-values can be adjusted with the following formula:

$p_{adj} = min(1, max((m + 1 - k) * p_{k} ))$

where is $$k$$ is the rank of the p-values when ordered from smallest to largest.

pairwise.t.test(career$average, career$position, p.adjust = 'holm')
##
##  Pairwise comparisons using t tests with pooled SD
##
## data:  career$average and career$position
##
##          catcher dh infield
## dh       7.0e-06 -  -
## infield  2.4e-16 1  -
## outfield < 2e-16 1  1
##
## P value adjustment method: holm
pairwise.t.test(career$average, career$position, p.adjust = 'none')$p.value ## catcher dh infield ## dh 1.747816e-06 NA NA ## infield 4.802093e-17 0.3745381 NA ## outfield 4.051053e-19 0.5776818 0.3567626 pairwise.t.test(career$average, career$position, p.adjust = 'none')$p.value[3] * (6 + 1 - 1)
## [1] 2.430632e-18
pairwise.t.test(career$average, career$position, p.adjust = 'none')$p.value[2] * (6 + 1 - 2) ## [1] 2.401046e-16 pairwise.t.test(career$average, career$position, p.adjust = 'none')$p.value[1] * (6 + 1 - 3)
## [1] 6.991264e-06
pairwise.t.test(career$average, career$position, p.adjust = 'none')$p.value[9] * (6 + 1 - 4) ## [1] 1.070288 pairwise.t.test(career$average, career$position, p.adjust = 'none')$p.value[5] * (6 + 1 - 5)
## [1] 0.7490762
pairwise.t.test(career$average, career$position, p.adjust = 'none')\$p.value[6] * (6 + 1 - 6)
## [1] 0.5776818

Exploring assumptions for model with categorical attributes

plot(position_lm, which = 1:5)

Previous