Effect Sizes
An effect size is the magnitude of the relationship between two attributes. This magnitude can be reported and interpreted on many different metrics, some of which we have already considered and some of which we have not. The term effect size is often misconstrued to mean a standardized magnitude of relationship, but standardization is not required; it can, however, be helpful in some situations.
As a slight aside, there is a field called meta-analysis that statistically combines effect sizes across multiple studies conducted on a similar topic. The goal of these analyses is to understand the aggregate (i.e., average) effect size, but also whether there is any heterogeneity in those effect sizes.
Simple Effect Size - slopes
To understand the magnitude of the effect, the slope estimates can serve as unstandardized effect size measures. This is true for both continuous and categorical predictor attributes. For a continuous attribute the estimate represents a true slope, but for a categorical predictor it represents the mean difference going from the reference group to the focal group.
Standardized effect sizes
Standardized effect sizes can also be considered; these can be helpful for comparing across studies where the samples differ or come from different populations. The standardization usually comes from taking the standard deviations into account, which turns the slope into a metric interpreted in standard deviation units. We discussed this earlier in the semester with standardized regression slopes.
Continuous predictor attributes
We can use the following formula to convert any unstandardized regression coefficient into a standardized metric.
\[ b^{'}_{k} = b_{k} * \frac{s_{x_{k}}}{s_{y}} \]
Let’s do an example as a reminder.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0 ✔ purrr 0.3.5
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.1 ✔ stringr 1.4.1
## ✔ readr 2.1.3 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(ggformula)
## Loading required package: ggstance
##
## Attaching package: 'ggstance'
##
## The following objects are masked from 'package:ggplot2':
##
## geom_errorbarh, GeomErrorbarh
##
## Loading required package: scales
##
## Attaching package: 'scales'
##
## The following object is masked from 'package:purrr':
##
## discard
##
## The following object is masked from 'package:readr':
##
## col_factor
##
## Loading required package: ggridges
##
## New to ggformula? Try the tutorials:
## learnr::run_tutorial("introduction", package = "ggformula")
## learnr::run_tutorial("refining", package = "ggformula")
library(mosaic)
## Registered S3 method overwritten by 'mosaic':
## method from
## fortify.SpatialPolygonsDataFrame ggplot2
##
## The 'mosaic' package masks several functions from core packages in order to add
## additional features. The original behavior of these functions should not be affected by this.
##
## Attaching package: 'mosaic'
##
## The following object is masked from 'package:Matrix':
##
## mean
##
## The following object is masked from 'package:scales':
##
## rescale
##
## The following objects are masked from 'package:dplyr':
##
## count, do, tally
##
## The following object is masked from 'package:purrr':
##
## cross
##
## The following object is masked from 'package:ggplot2':
##
## stat
##
## The following objects are masked from 'package:stats':
##
## binom.test, cor, cor.test, cov, fivenum, IQR, median, prop.test,
## quantile, sd, t.test, var
##
## The following objects are masked from 'package:base':
##
## max, mean, min, prod, range, sample, sum
theme_set(theme_bw(base_size = 18))
airquality <- readr::read_csv("https://raw.githubusercontent.com/lebebr01/psqf_6243/main/data/iowa_air_quality_2021.csv")
## Rows: 6917 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): date, site_name, aqs_parameter_desc, cbsa_name, county
## dbl (5): id, poc, pm2.5, daily_aqi, cbsa_code
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
wind <- readr::read_csv("https://raw.githubusercontent.com/lebebr01/psqf_6243/main/data/daily_WIND_2021-iowa.csv")
## Rows: 1537 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): date, cbsa_name
## dbl (3): avg_wind, max_wind, max_wind_hours
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
airquality <- airquality |>
left_join(wind, by = c('cbsa_name', 'date')) |>
drop_na() |>
mutate(avg_wind_mc = avg_wind - mean(avg_wind))
head(airquality)
## # A tibble: 6 × 14
## date id poc pm2.5 daily_aqi site_…¹ aqs_p…² cbsa_…³ cbsa_…⁴ county
## <chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <dbl> <chr> <chr>
## 1 1/1/21 190130009 1 15.1 57 Water … PM2.5 … 47940 Waterl… Black…
## 2 1/4/21 190130009 1 13.3 54 Water … PM2.5 … 47940 Waterl… Black…
## 3 1/7/21 190130009 1 20.5 69 Water … PM2.5 … 47940 Waterl… Black…
## 4 1/10/21 190130009 1 14.3 56 Water … PM2.5 … 47940 Waterl… Black…
## 5 1/13/21 190130009 1 13.7 54 Water … PM2.5 … 47940 Waterl… Black…
## 6 1/16/21 190130009 1 5.3 22 Water … PM2.5 … 47940 Waterl… Black…
## # … with 4 more variables: avg_wind <dbl>, max_wind <dbl>,
## # max_wind_hours <dbl>, avg_wind_mc <dbl>, and abbreviated variable names
## # ¹site_name, ²aqs_parameter_desc, ³cbsa_code, ⁴cbsa_name
air_lm_mc <- lm(daily_aqi ~ avg_wind_mc + max_wind, data = airquality)
broom::tidy(air_lm_mc)
## # A tibble: 3 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 27.4 1.47 18.6 2.80e-74
## 2 avg_wind_mc -4.43 0.300 -14.8 3.31e-48
## 3 max_wind 1.58 0.201 7.87 4.52e-15
There is a package called QuantPsyc that has a convenience function, lm.beta(), to compute standardized regression coefficients.
QuantPsyc::lm.beta(air_lm_mc) |> round(3)
## avg_wind_mc max_wind
## -0.584 0.312
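As a quick check, we can apply the standardization formula from above by hand, multiplying each slope by the ratio of the predictor's standard deviation to the outcome's standard deviation. This is a minimal sketch using the objects already created; the values should match the lm.beta() output.
# b'_k = b_k * s_x / s_y for each predictor
coef(air_lm_mc)["avg_wind_mc"] * sd(airquality$avg_wind_mc) / sd(airquality$daily_aqi)
coef(air_lm_mc)["max_wind"] * sd(airquality$max_wind) / sd(airquality$daily_aqi)
# these should reproduce the -0.584 and 0.312 shown by lm.beta() above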
Categorical predictor attributes
When including categorical attributes, the slope represents a mean difference between the reference group and the focal group.
air_county <- lm(daily_aqi ~ county, data = airquality)
broom::tidy(air_county)
## # A tibble: 8 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 38.7 0.923 42.0 0
## 2 countyClinton 3.56 1.20 2.97 0.00299
## 3 countyJohnson 4.37 1.36 3.21 0.00132
## 4 countyLinn 4.03 1.34 3.02 0.00256
## 5 countyMuscatine -1.48 1.06 -1.39 0.164
## 6 countyPolk -5.90 1.13 -5.21 0.000000197
## 7 countyPottawattamie -6.52 3.69 -1.76 0.0777
## 8 countyScott 1.84 1.09 1.69 0.0915
count(airquality, county)
## # A tibble: 8 × 2
## county n
## <chr> <int>
## 1 Black Hawk 405
## 2 Clinton 588
## 3 Johnson 348
## 4 Linn 371
## 5 Muscatine 1243
## 6 Polk 806
## 7 Pottawattamie 27
## 8 Scott 1033
There are two related effect sizes, Cohen's d and Hedges' g. Hedges' g applies a correction factor to Cohen's d, since Cohen's d tends to be slightly biased upward in small samples.
\[ d = \frac{\bar{Y}_{1} - \bar{Y}_{2}}{s_{within}} \]
where the numerator represents the raw mean difference between two independent groups. The denominator is the pooled standard deviation. The pooled standard deviation for two independent groups is:
\[ s_{within} = \sqrt{\frac{(n_{1} - 1) s^{2}_{1} + (n_{2} - 1) s^{2}_{2}}{n_{1} + n_{2} - 2}} \]
The correction factor to convert Cohen's d to Hedges' g is:
\[ J = 1 - \frac{3}{4 (n_{1} + n_{2} - 2) - 1} \]
Then, take
\[ g = J * d \]
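As a rough guide to when the correction matters, here is a quick sketch with a hypothetical pair of groups of 20 observations each; the correction shrinks d by about 2%, whereas for the large county samples used below it is essentially negligible.
# hypothetical example: two groups with n = 20 each
1 - 3 / (4 * (20 + 20 - 2) - 1)
# = 1 - 3/151, roughly 0.980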
In terms of regression coefficients, we can do this directly with the regression slopes that represent a mean difference.
\[ g = J * \frac{\hat{\beta_{1}}}{s_{within}} \]
broom::tidy(air_county)
## # A tibble: 8 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 38.7 0.923 42.0 0
## 2 countyClinton 3.56 1.20 2.97 0.00299
## 3 countyJohnson 4.37 1.36 3.21 0.00132
## 4 countyLinn 4.03 1.34 3.02 0.00256
## 5 countyMuscatine -1.48 1.06 -1.39 0.164
## 6 countyPolk -5.90 1.13 -5.21 0.000000197
## 7 countyPottawattamie -6.52 3.69 -1.76 0.0777
## 8 countyScott 1.84 1.09 1.69 0.0915
df_stats(daily_aqi ~ county, mean, sd, length, data = airquality)
## response county mean sd length
## 1 daily_aqi Black Hawk 38.73827 17.33495 405
## 2 daily_aqi Clinton 42.30272 20.06328 588
## 3 daily_aqi Johnson 43.10345 19.10914 348
## 4 daily_aqi Linn 42.76819 20.28354 371
## 5 daily_aqi Muscatine 37.25664 17.68327 1243
## 6 daily_aqi Polk 32.84243 18.57304 806
## 7 daily_aqi Pottawattamie 32.22222 13.75705 27
## 8 daily_aqi Scott 40.57696 18.52664 1033
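Connecting the two outputs above, each county slope is simply the difference between that county's mean and the reference group (Black Hawk) mean. For example, subtracting the Black Hawk mean from the Clinton mean reproduces the countyClinton slope.
# mean difference, Clinton vs. Black Hawk (reference group)
42.30272 - 38.73827
# about 3.56, matching the countyClinton estimate above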
pooled_sd <- sqrt(((405 - 1) * 17.3^2 + (348 - 1) * 19.1^2) / (405 + 348 - 2))
pooled_sd
## [1] 18.15389
round(4.365 / pooled_sd, 3)
## [1] 0.24
1 - (3 / (4 * (405 + 348 - 2) - 1))
## [1] 0.999001
round(0.999 * (4.365 / pooled_sd), 4)
## [1] 0.2402
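The steps above can be wrapped into a small helper function. This is a hypothetical sketch (the function name and arguments are not from any package) that takes the mean difference, the two group standard deviations, and the sample sizes, and returns Cohen's d and Hedges' g.
# hypothetical helper: d and g from summary statistics of two groups
smd_effect <- function(mean_diff, s1, s2, n1, n2) {
  s_within <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))
  d <- mean_diff / s_within
  J <- 1 - 3 / (4 * (n1 + n2 - 2) - 1)
  c(d = d, g = J * d)
}
# Johnson vs. Black Hawk, using the summary statistics shown above
smd_effect(mean_diff = 4.365, s1 = 17.33495, s2 = 19.10914, n1 = 405, n2 = 348)
# both values should come out near 0.24, in line with the hand computation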
When you have an ANCOVA-type model, you would want to use the regression slopes, which represent adjusted mean differences rather than raw mean differences.
air_county <- lm(daily_aqi ~ avg_wind_mc + max_wind + county, data = airquality)
broom::tidy(air_county)
## # A tibble: 10 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 29.0 1.77 16.4 8.95e-59
## 2 avg_wind_mc -4.29 0.303 -14.1 1.74e-44
## 3 max_wind 1.48 0.203 7.29 3.69e-13
## 4 countyClinton 0.203 1.17 0.174 8.62e- 1
## 5 countyJohnson 1.68 1.31 1.28 2.01e- 1
## 6 countyLinn 3.88 1.28 3.02 2.51e- 3
## 7 countyMuscatine -4.02 1.04 -3.88 1.07e- 4
## 8 countyPolk -2.83 1.09 -2.59 9.71e- 3
## 9 countyPottawattamie -9.37 3.54 -2.65 8.05e- 3
## 10 countyScott 0.862 1.04 0.827 4.08e- 1
round(0.999 * (1.679 / pooled_sd), 4)
## [1] 0.0924
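Rather than hard-coding the adjusted coefficient, we can pull it directly from the fitted model object. A minimal sketch, reusing pooled_sd from above and the same correction factor, would be:
# adjusted mean difference for Johnson vs. Black Hawk from the model with covariates
b_johnson <- coef(air_county)["countyJohnson"]
round(0.999 * (b_johnson / pooled_sd), 4)
# should reproduce the 0.0924 computed above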
For more details on effect sizes from regression models see:
Ariel M. Aloe, Christopher G. Thompson, Zhijiang Liu & Lifeng Lin (2022) Estimating Partial Standardized Mean Differences from Regression Models, The Journal of Experimental Education, 90:4, 898-915, DOI: 10.1080/00220973.2021.1966605