# Pairwise Contrasts

# More details on pairwise contrasts

Last week the idea of pairwise contrasts was introduced. Here is a more formal discussion of pairwise contrasts and of controlling the familywise type I error rate.

## Bonferroni Method

This method is introduced first primarily due to its simplicity; however, as discussed below, it is not the best method to use in practice. The Bonferroni method makes the following adjustment:

\[ \alpha_{new} = \frac{\alpha}{m} \]

where \(m\) is the number of hypotheses being tested. Let’s use a specific example, based on the baseball data, to frame this.

`library(tidyverse)`

```
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0 ✔ purrr 0.3.5
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.1 ✔ stringr 1.4.1
## ✔ readr 2.1.3 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
```

```
library(Lahman)
library(ggformula)
```

```
## Loading required package: ggstance
##
## Attaching package: 'ggstance'
##
## The following objects are masked from 'package:ggplot2':
##
## geom_errorbarh, GeomErrorbarh
##
## Loading required package: scales
##
## Attaching package: 'scales'
##
## The following object is masked from 'package:purrr':
##
## discard
##
## The following object is masked from 'package:readr':
##
## col_factor
##
## Loading required package: ggridges
##
## New to ggformula? Try the tutorials:
## learnr::run_tutorial("introduction", package = "ggformula")
## learnr::run_tutorial("refining", package = "ggformula")
```

```
theme_set(theme_bw(base_size = 18))
career <- Batting %>%
  filter(AB > 100) %>%
  anti_join(Pitching, by = "playerID") %>%
  filter(yearID > 1990) %>%
  group_by(playerID, lgID) %>%
  summarise(H = sum(H), AB = sum(AB)) %>%
  mutate(average = H / AB)
```

```
## `summarise()` has grouped output by 'playerID'. You can override using the
## `.groups` argument.
```

```
career <- Appearances %>%
  filter(yearID > 1990) %>%
  select(-GS, -G_ph, -G_pr, -G_batting, -G_defense, -G_p, -G_lf, -G_cf, -G_rf) %>%
  rowwise() %>%
  mutate(g_inf = sum(c_across(G_1b:G_ss))) %>%
  select(-G_1b, -G_2b, -G_3b, -G_ss) %>%
  group_by(playerID, lgID) %>%
  summarise(catcher = sum(G_c),
            outfield = sum(G_of),
            dh = sum(G_dh),
            infield = sum(g_inf),
            total_games = sum(G_all)) %>%
  pivot_longer(catcher:infield,
               names_to = "position") %>%
  filter(value > 0) %>%
  group_by(playerID, lgID) %>%
  slice_max(value) %>%
  select(playerID, lgID, position) %>%
  inner_join(career)
```

```
## `summarise()` has grouped output by 'playerID'. You can override using the
## `.groups` argument.
## Joining, by = c("playerID", "lgID")
```

```
career <- People %>%
  tbl_df() %>%
  dplyr::select(playerID, nameFirst, nameLast) %>%
  unite(name, nameFirst, nameLast, sep = " ") %>%
  inner_join(career, by = "playerID")
```

```
## Warning: `tbl_df()` was deprecated in dplyr 1.0.0.
## ℹ Please use `tibble::as_tibble()` instead.
```

```
career <- career %>%
  mutate(league_dummy = ifelse(lgID == 'NL', 1, 0))
head(career)
```

```
## # A tibble: 6 × 8
## playerID name lgID position H AB average league_dummy
## <chr> <chr> <fct> <chr> <int> <int> <dbl> <dbl>
## 1 abbotje01 Jeff Abbott AL outfield 127 459 0.277 0
## 2 abbotku01 Kurt Abbott AL infield 33 123 0.268 0
## 3 abbotku01 Kurt Abbott NL infield 455 1780 0.256 1
## 4 abercre01 Reggie Abercrombie NL outfield 54 255 0.212 1
## 5 abernbr01 Brent Abernathy AL infield 194 767 0.253 0
## 6 abnersh01 Shawn Abner AL outfield 81 309 0.262 0
```

```
position_lm <- lm(average ~ 1 + position, data = career)
broom::tidy(position_lm)
```

```
## # A tibble: 4 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.241 0.00140 172. 0
## 2 positiondh 0.0165 0.00345 4.79 1.75e- 6
## 3 positioninfield 0.0136 0.00161 8.44 4.80e-17
## 4 positionoutfield 0.0147 0.00163 9.00 4.05e-19
```

`count(career, position)`

```
## # A tibble: 4 × 2
## position n
## <chr> <int>
## 1 catcher 410
## 2 dh 81
## 3 infield 1248
## 4 outfield 1136
```

Given that there are 4 groups, if all pairwise comparisons were of interest, the following would be all possible **null** hypotheses. Note that I am assuming each of these has a matching alternative hypothesis stating that the group difference is *different* from 0.

\[ H_{0}: \mu_{catcher} - \mu_{dh} = 0 \\ H_{0}: \mu_{catcher} - \mu_{infield} = 0 \\ H_{0}: \mu_{catcher} - \mu_{outfield} = 0 \\ H_{0}: \mu_{dh} - \mu_{infield} = 0 \\ H_{0}: \mu_{dh} - \mu_{outfield} = 0 \\ H_{0}: \mu_{infield} - \mu_{outfield} = 0 \\ \]

In general, to find the total number of hypotheses, you can use the combinations formula:

\[ C(n, r) = \binom{n}{r} = \frac{n!}{r!(n - r)!} \]

For our example this would lead to:

\[ \binom{4}{2} = \frac{4!}{2!(4 - 2)!} = \frac{4 * 3 * 2 * 1}{2 * 1 * (2 * 1)} = \frac{24}{4} = 6 \]

Therefore, using the Bonferroni correction, our new alpha would be:

\[ \alpha_{new} = \frac{.05}{6} = .0083 \]
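These two calculations can be verified directly in R; the built-in `choose()` function computes the combinations formula:

```r
# number of pairwise comparisons among the 4 position groups
m <- choose(4, 2)
m
## [1] 6

# Bonferroni-adjusted alpha
0.05 / m
## [1] 0.008333333
```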

Alternatively, you could adjust the p-values themselves, making them *larger*, and still compare against .05 (or any other predetermined familywise \(\alpha\) value):

\[ p_{adj} = \min(1, m \times p_{original}) \]

This is what most software programs do automatically for you.
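In R, this adjustment is available through the built-in `p.adjust()` function; the three unadjusted p-values below are hypothetical. Each is multiplied by the number of tests, capped at 1:

```r
# hypothetical unadjusted p-values from m = 3 tests
p_raw <- c(0.004, 0.02, 0.40)

p.adjust(p_raw, method = "bonferroni")
## [1] 0.012 0.060 1.000
```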

`pairwise.t.test(career$average, career$position, p.adjust = 'bonf')`

```
##
## Pairwise comparisons using t tests with pooled SD
##
## data: career$average and career$position
##
## catcher dh infield
## dh 1.0e-05 - -
## infield 2.9e-16 1 -
## outfield < 2e-16 1 1
##
## P value adjustment method: bonferroni
```

## What is a type I error?

Before moving to why you should care, let’s more formally talk about type I errors. Suppose we have the following table of possible outcomes:

 | \(H_{0}\) true | \(H_{0}\) false |
---|---|---|
Reject \(H_{0}\) | Type I Error (\(\alpha\)) | Correct - Power (\(1 - \beta\)) |
Retain \(H_{0}\) | Correct (\(1 - \alpha\)) | Type II Error (\(\beta\)) |

**Type I error** is when the null hypothesis is true in the population, but the statistical evidence *supports the alternative hypothesis*.
**Type II error** is when the null hypothesis is false in the population, but the statistical evidence *supports the null hypothesis*.

Here is a good link for an interactive app that shows many of these terms visually: https://rpsychologist.com/d3/nhst/.
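To see why the adjustment matters, note that with \(m\) independent tests each conducted at \(\alpha = .05\), the chance of at least one type I error is \(1 - (1 - .05)^{m}\). A quick sketch:

```r
# familywise error rate for m independent tests at alpha = .05
m <- 1:6
fwer <- 1 - (1 - 0.05)^m
round(fwer, 3)
## [1] 0.050 0.098 0.143 0.185 0.226 0.265
```

With all 6 pairwise comparisons, the familywise error rate is already about 26%, far above the nominal 5%.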

### Type M and Type S errors

These were first defined by Gelman and Tuerlinckx (2000) and are also discussed in section 4.4 of the *Regression and Other Stories* textbook.

- A Type S (sign) error occurs when the sign of the estimated effect is in the opposite direction of the true effect.
- A Type M (magnitude) error occurs when the magnitude of the estimated effect is much different from (often larger than) the true effect.

These really shift the narrative away from achieving statistical significance and toward proper estimation of effects, or precision, which in my opinion is a better goal.
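A small simulation makes both error types concrete (the true effect of 0.1 and the standard error of 0.5 are hypothetical values chosen to give low power): conditional on reaching statistical significance, the estimates exaggerate the true effect (type M) and sometimes carry the wrong sign (type S).

```r
set.seed(2022)
true_effect <- 0.1   # hypothetical small true effect
se <- 0.5            # hypothetical large standard error (low power)

# simulate many replications of the estimated effect
est <- rnorm(10000, mean = true_effect, sd = se)

# keep only estimates reaching statistical significance (|z| > 1.96)
sig <- est[abs(est / se) > 1.96]

mean(abs(sig)) / true_effect  # type M: average exaggeration ratio, well above 1
mean(sig < 0)                 # type S: proportion of significant estimates with the wrong sign
```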

## Holm-Bonferroni Procedure

The Bonferroni procedure tends to be overly conservative, and I wouldn’t recommend its use in practice. If you want a simple alternative to p-value adjustment, I’d recommend just setting the alpha value itself to something more conservative, for example 0.01.

The Holm-Bonferroni approach is an adjustment method that is more powerful than the original Bonferroni procedure, but does not come with onerous assumptions. The Holm-Bonferroni procedure uses the following steps for adjustment.

Suppose there are \(m\) p-values (i.e., \(m\) null hypotheses). First, order these from smallest to largest, being sure to keep track of which hypothesis each p-value is associated with. Then,
1. Is the smallest p-value **less than** \(\alpha / m\)? If yes, provide evidence for the alternative hypothesis and proceed to the next p-value; if not, stop.
2. Is the second smallest p-value **less than** \(\alpha / (m - 1)\)? If yes, provide evidence for the alternative hypothesis and proceed to the next p-value; if not, stop.
3. Continue, comparing the p-values in order against the following adjusted alpha,

\[ \alpha_{new_{k}} = \frac{\alpha}{m + 1 - k} \]

where \(k\) is the rank of the p-value when ordered from smallest to largest.

Similar to the Bonferroni procedure, the p-values can instead be adjusted with the following formula:

\[ p_{adj_{(k)}} = \min\left(1, \max_{j \leq k} \left( (m + 1 - j) \, p_{(j)} \right) \right) \]

where \(p_{(j)}\) is the \(j\)th smallest p-value, \(k\) is the rank of the p-value when ordered from smallest to largest, and the maximum runs over all ranks up to and including \(k\).
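The stepwise rule can be sketched directly in base R; sorting the p-values, applying the \((m + 1 - k)\) multipliers, and using `cummax()` for the running maximum reproduces the Holm values from `p.adjust()` (the four unadjusted p-values here are hypothetical):

```r
p_raw <- c(0.030, 0.001, 0.040, 0.012)  # hypothetical unadjusted p-values
m <- length(p_raw)

ord <- order(p_raw)  # ranks, from smallest to largest
# multiply by (m + 1 - k), take the running maximum, cap at 1
holm <- pmin(1, cummax((m + 1 - seq_len(m)) * p_raw[ord]))
holm[order(ord)]     # return to the original order
## [1] 0.060 0.004 0.060 0.036

p.adjust(p_raw, method = "holm")  # matches
## [1] 0.060 0.004 0.060 0.036
```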

`pairwise.t.test(career$average, career$position, p.adjust = 'holm')`

```
##
## Pairwise comparisons using t tests with pooled SD
##
## data: career$average and career$position
##
## catcher dh infield
## dh 7.0e-06 - -
## infield 2.4e-16 1 -
## outfield < 2e-16 1 1
##
## P value adjustment method: holm
```

`pairwise.t.test(career$average, career$position, p.adjust = 'none')$p.value`

```
## catcher dh infield
## dh 1.747816e-06 NA NA
## infield 4.802093e-17 0.3745381 NA
## outfield 4.051053e-19 0.5776818 0.3567626
```

`pairwise.t.test(career$average, career$position, p.adjust = 'none')$p.value[3] * (6 + 1 - 1)`

`## [1] 2.430632e-18`

`pairwise.t.test(career$average, career$position, p.adjust = 'none')$p.value[2] * (6 + 1 - 2)`

`## [1] 2.401046e-16`

`pairwise.t.test(career$average, career$position, p.adjust = 'none')$p.value[1] * (6 + 1 - 3)`

`## [1] 6.991264e-06`

`pairwise.t.test(career$average, career$position, p.adjust = 'none')$p.value[9] * (6 + 1 - 4)`

`## [1] 1.070288`

`pairwise.t.test(career$average, career$position, p.adjust = 'none')$p.value[5] * (6 + 1 - 5)`

`## [1] 0.7490762`

`pairwise.t.test(career$average, career$position, p.adjust = 'none')$p.value[6] * (6 + 1 - 6)`

`## [1] 0.5776818`

## Exploring assumptions for model with categorical attributes

`plot(position_lm, which = 1:5)`