Effect Sizes

An effect size is the magnitude of the relationship between two attributes. This magnitude can be reported and interpreted on many different metrics, some of which we have already considered and some of which we have not. The term effect size is often misconstrued to mean a standardized magnitude of relationship; standardization is not required, but it can be helpful in some situations.

As a slight aside, there is a field called meta-analysis that statistically combines effect sizes across multiple studies conducted on a similar topic. The goal of these studies is to understand the aggregate effect size (i.e., average effect size), but also whether there is any heterogeneity in those effect sizes.

Simple Effect Size - slopes

To understand the magnitude of an effect, the slope estimates can serve as unstandardized effect size measures. This is true for both continuous and categorical predictor attributes. For a continuous attribute the estimate represents a true slope, whereas for a categorical predictor it represents a mean difference going from the reference group to the focal group.

Standardized effect sizes

Standardized effect sizes can also be considered; these can be helpful for comparing across studies in which the samples differ or come from different populations. The standardization usually comes from taking the standard deviations into account, which turns the slope into a metric interpreted in standard deviation units. We discussed this earlier in the semester with standardized regression slopes.

Continuous predictor attributes

The following formula converts any unstandardized regression coefficient into a standardized metric.

\[ b^{'}_{k} = b_{k} * \frac{s_{x_{k}}}{s_{y}} \]

Let’s do an example as a reminder.

library(tidyverse)
library(ggformula)
library(mosaic)
theme_set(theme_bw(base_size = 18))

airquality <- readr::read_csv("https://raw.githubusercontent.com/lebebr01/psqf_6243/main/data/iowa_air_quality_2021.csv")
## Rows: 6917 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): date, site_name, aqs_parameter_desc, cbsa_name, county
## dbl (5): id, poc, pm2.5, daily_aqi, cbsa_code
wind <- readr::read_csv("https://raw.githubusercontent.com/lebebr01/psqf_6243/main/data/daily_WIND_2021-iowa.csv")
## Rows: 1537 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): date, cbsa_name
## dbl (3): avg_wind, max_wind, max_wind_hours
airquality <- airquality |>
   left_join(wind, by = c('cbsa_name', 'date')) |> 
   drop_na() |>
   mutate(avg_wind_mc = avg_wind - mean(avg_wind))
head(airquality)
## # A tibble: 6 × 14
##   date           id   poc pm2.5 daily_aqi site_…¹ aqs_p…² cbsa_…³ cbsa_…⁴ county
##   <chr>       <dbl> <dbl> <dbl>     <dbl> <chr>   <chr>     <dbl> <chr>   <chr> 
## 1 1/1/21  190130009     1  15.1        57 Water … PM2.5 …   47940 Waterl… Black…
## 2 1/4/21  190130009     1  13.3        54 Water … PM2.5 …   47940 Waterl… Black…
## 3 1/7/21  190130009     1  20.5        69 Water … PM2.5 …   47940 Waterl… Black…
## 4 1/10/21 190130009     1  14.3        56 Water … PM2.5 …   47940 Waterl… Black…
## 5 1/13/21 190130009     1  13.7        54 Water … PM2.5 …   47940 Waterl… Black…
## 6 1/16/21 190130009     1   5.3        22 Water … PM2.5 …   47940 Waterl… Black…
## # … with 4 more variables: avg_wind <dbl>, max_wind <dbl>,
## #   max_wind_hours <dbl>, avg_wind_mc <dbl>, and abbreviated variable names
## #   ¹​site_name, ²​aqs_parameter_desc, ³​cbsa_code, ⁴​cbsa_name
air_lm_mc <- lm(daily_aqi ~ avg_wind_mc + max_wind, data = airquality)

broom::tidy(air_lm_mc)
## # A tibble: 3 × 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)    27.4      1.47      18.6  2.80e-74
## 2 avg_wind_mc    -4.43     0.300    -14.8  3.31e-48
## 3 max_wind        1.58     0.201      7.87 4.52e-15

There is a package called QuantPsyc that has a convenience function, lm.beta(), to compute standardized regression coefficients.

QuantPsyc::lm.beta(air_lm_mc) |> round(3)
## avg_wind_mc    max_wind 
##      -0.584       0.312
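
As a check on lm.beta(), the standardized coefficients can also be computed by hand from the formula above. The sketch below uses simulated data (the names x1, x2, and y are made up for illustration) and verifies that multiplying each slope by its standard deviation ratio matches a regression run on z-scored variables.

```r
set.seed(1)
n <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 2 + 3 * x1 - 1.5 * x2 + rnorm(n)

fit <- lm(y ~ x1 + x2)

# apply b'_k = b_k * s_x / s_y to each slope
b_std <- coef(fit)[-1] * c(sd(x1), sd(x2)) / sd(y)

# the same values come from regressing z-scores on z-scores
fit_z <- lm(scale(y) ~ scale(x1) + scale(x2))
all.equal(unname(b_std), unname(coef(fit_z)[-1]))
```

Either route gives the same standardized slopes, so lm.beta() is a convenience rather than a necessity.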

Categorical predictor attributes

When including categorical attributes, the slope represents a mean difference between the reference group and the focal group.

air_county <- lm(daily_aqi ~ county, data = airquality)

broom::tidy(air_county)
## # A tibble: 8 × 5
##   term                estimate std.error statistic     p.value
##   <chr>                  <dbl>     <dbl>     <dbl>       <dbl>
## 1 (Intercept)            38.7      0.923     42.0  0          
## 2 countyClinton           3.56     1.20       2.97 0.00299    
## 3 countyJohnson           4.37     1.36       3.21 0.00132    
## 4 countyLinn              4.03     1.34       3.02 0.00256    
## 5 countyMuscatine        -1.48     1.06      -1.39 0.164      
## 6 countyPolk             -5.90     1.13      -5.21 0.000000197
## 7 countyPottawattamie    -6.52     3.69      -1.76 0.0777     
## 8 countyScott             1.84     1.09       1.69 0.0915
count(airquality, county)
## # A tibble: 8 × 2
##   county            n
##   <chr>         <int>
## 1 Black Hawk      405
## 2 Clinton         588
## 3 Johnson         348
## 4 Linn            371
## 5 Muscatine      1243
## 6 Polk            806
## 7 Pottawattamie    27
## 8 Scott          1033

There are two related effect sizes, Cohen's d and Hedges' g. Hedges' g applies a correction factor to Cohen's d, as Cohen's d tends to have a slight upward bias in small samples.

\[ d = \frac{\bar{Y}_{1} - \bar{Y}_{2}}{s_{within}} \]

where the numerator represents the raw mean difference between two independent groups. The denominator is the pooled standard deviation. The pooled standard deviation for two independent groups is:

\[ s_{within} = \sqrt{\frac{(n_{1} - 1) s^{2}_{1} + (n_{2} - 1) s^{2}_{2}}{n_{1} + n_{2} - 2}} \]

The correction factor to convert Cohen's d to Hedges' g is:

\[ J = 1 - \frac{3}{4 (n_{1} + n_{2} - 2) - 1} \]

Then, take

\[ g = J * d \]
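
The three formulas can be wrapped into small helper functions; cohens_d() and hedges_g() are hypothetical names for this sketch, not functions from a package. Plugging in the Johnson and Black Hawk summary statistics reproduces the hand computation shown later in this section.

```r
# d = (ybar1 - ybar2) / s_within, where s_within is the pooled SD
cohens_d <- function(m1, m2, s1, s2, n1, n2) {
  s_within <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))
  (m1 - m2) / s_within
}

# g = J * d, applying the small-sample correction factor J
hedges_g <- function(m1, m2, s1, s2, n1, n2) {
  J <- 1 - 3 / (4 * (n1 + n2 - 2) - 1)
  J * cohens_d(m1, m2, s1, s2, n1, n2)
}

# Johnson vs. Black Hawk counties (means, SDs, and ns from df_stats())
cohens_d(43.103, 38.738, 19.109, 17.335, 348, 405)  # approximately 0.24
hedges_g(43.103, 38.738, 19.109, 17.335, 348, 405)  # approximately 0.24
```

With group sizes in the hundreds, J is nearly 1, so d and g agree to two decimals here; the correction matters most with small groups.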

In terms of regression coefficients, we can apply this directly to a regression slope that represents a mean difference.

\[ g = J * \frac{\hat{\beta_{1}}}{s_{within}} \]

broom::tidy(air_county)
## # A tibble: 8 × 5
##   term                estimate std.error statistic     p.value
##   <chr>                  <dbl>     <dbl>     <dbl>       <dbl>
## 1 (Intercept)            38.7      0.923     42.0  0          
## 2 countyClinton           3.56     1.20       2.97 0.00299    
## 3 countyJohnson           4.37     1.36       3.21 0.00132    
## 4 countyLinn              4.03     1.34       3.02 0.00256    
## 5 countyMuscatine        -1.48     1.06      -1.39 0.164      
## 6 countyPolk             -5.90     1.13      -5.21 0.000000197
## 7 countyPottawattamie    -6.52     3.69      -1.76 0.0777     
## 8 countyScott             1.84     1.09       1.69 0.0915
df_stats(daily_aqi ~ county, mean, sd, length, data = airquality)
##    response        county     mean       sd length
## 1 daily_aqi    Black Hawk 38.73827 17.33495    405
## 2 daily_aqi       Clinton 42.30272 20.06328    588
## 3 daily_aqi       Johnson 43.10345 19.10914    348
## 4 daily_aqi          Linn 42.76819 20.28354    371
## 5 daily_aqi     Muscatine 37.25664 17.68327   1243
## 6 daily_aqi          Polk 32.84243 18.57304    806
## 7 daily_aqi Pottawattamie 32.22222 13.75705     27
## 8 daily_aqi         Scott 40.57696 18.52664   1033
pooled_sd <- sqrt(((405 - 1) * 17.3^2 + (348 - 1) * 19.1^2) / (405 + 348 - 2))
pooled_sd
## [1] 18.15389
round(4.365 / pooled_sd, 3)
## [1] 0.24
1 - (3 / (4 * (405 + 348 - 2) - 1))
## [1] 0.999001
round(0.999 * (4.365 / pooled_sd), 4)
## [1] 0.2402

When you have an ANCOVA-type model, you would want to use the regression slopes, which represent adjusted mean differences rather than raw mean differences.

air_county <- lm(daily_aqi ~ avg_wind_mc + max_wind + county, data = airquality)

broom::tidy(air_county)
## # A tibble: 10 × 5
##    term                estimate std.error statistic  p.value
##    <chr>                  <dbl>     <dbl>     <dbl>    <dbl>
##  1 (Intercept)           29.0       1.77     16.4   8.95e-59
##  2 avg_wind_mc           -4.29      0.303   -14.1   1.74e-44
##  3 max_wind               1.48      0.203     7.29  3.69e-13
##  4 countyClinton          0.203     1.17      0.174 8.62e- 1
##  5 countyJohnson          1.68      1.31      1.28  2.01e- 1
##  6 countyLinn             3.88      1.28      3.02  2.51e- 3
##  7 countyMuscatine       -4.02      1.04     -3.88  1.07e- 4
##  8 countyPolk            -2.83      1.09     -2.59  9.71e- 3
##  9 countyPottawattamie   -9.37      3.54     -2.65  8.05e- 3
## 10 countyScott            0.862     1.04      0.827 4.08e- 1
round(0.999 * (1.679 / pooled_sd), 4)
## [1] 0.0924
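
The adjusted-mean-difference version of g can be scripted end to end. The sketch below simulates a two-group ANCOVA-style data set (all names are made up for illustration), fits a model with a covariate, and converts the group slope to g using the pooled within-group SD of the raw outcome, mirroring the hand computation above.

```r
set.seed(42)
n <- 300
group <- factor(sample(c("A", "B"), n, replace = TRUE))
covar <- rnorm(n)
y <- 10 + 2 * covar + 3 * (group == "B") + rnorm(n, sd = 4)

# the adjusted mean difference is the slope on the group indicator
fit <- lm(y ~ covar + group)
b_group <- coef(fit)[["groupB"]]

# pooled SD and correction factor from the per-group summaries
ns <- tapply(y, group, length)
sds <- tapply(y, group, sd)
s_within <- sqrt(((ns[[1]] - 1) * sds[[1]]^2 + (ns[[2]] - 1) * sds[[2]]^2) /
                   (sum(ns) - 2))
J <- 1 - 3 / (4 * (sum(ns) - 2) - 1)

g <- J * b_group / s_within
g
```

Note that s_within here pools the marginal group SDs of the outcome, matching the computation above; other choices (e.g., a covariate-adjusted SD) are discussed in the reference below.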

For more details on effect sizes from regression models see:

Ariel M. Aloe, Christopher G. Thompson, Zhijiang Liu & Lifeng Lin (2022) Estimating Partial Standardized Mean Differences from Regression Models, The Journal of Experimental Education, 90:4, 898-915, DOI: 10.1080/00220973.2021.1966605
