X = predictor, independent variable, exogenous variable
M = mediator, intermediate variable, endogenous variable
Y = outcome, endogenous variable

a = path from X to M
b = path from M to Y (controlling for X)
c' = path from X to Y (controlling for M)
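Written as equations, these paths come from two regressions, plus a third model for the total effect c. The intercepts $i_M$, $i_Y$, $i_T$ and residuals $e_M$, $e_Y$, $e_T$ are notation added here for illustration:

$$
\begin{aligned}
M &= i_M + aX + e_M \\
Y &= i_Y + c'X + bM + e_Y \\
Y &= i_T + cX + e_T
\end{aligned}
$$

With OLS estimates from the same sample, the total effect decomposes exactly as $c = ab + c'$, so the indirect effect can also be written as $c - c'$.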
The effect of X is transmitted through M.
The effect of X depends on the level of Moderator.
Why even mention it?
Mediation analysis has assumptions similar to those of regression, but they matter more in mediation
These are often difficult to assess and correct
Two Main Principles:
Example: If we are assessing religiosity (X) and heavy drinking behavior (Y), what are some variables that should be included?
If X is randomized (e.g., treatment vs. control), then, in expectation, no other variables are systematically related to X.
But we cannot randomize M (at least in a single study). So even if we can estimate causal effects of X on M and of X on Y, we cannot obtain a causal estimate of M on Y without assuming there is no unmeasured confounding of the M to Y relationship.
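A small simulation can make this concrete. The sketch below assumes made-up effect sizes and a hypothetical unmeasured confounder `U` of the M to Y relationship. Even though `X` is randomized, so the `a` path is estimated without bias, the `b` path is biased unless `U` is in the model:

```r
set.seed(2024)
n       <- 10000
true_a  <- 0.5
true_b  <- 0.4
true_cp <- 0.2                          # c', the direct effect (hypothetical)

X <- rbinom(n, 1, 0.5)                  # randomized: unrelated to U by design
U <- rnorm(n)                           # hypothetical unmeasured confounder of M and Y
M <- true_a * X + 0.8 * U + rnorm(n)
Y <- true_cp * X + true_b * M + 0.8 * U + rnorm(n)

a_hat   <- coef(lm(M ~ X))[["X"]]           # fine: X was randomized
b_naive <- coef(lm(Y ~ M + X))[["M"]]       # U omitted: b is biased
b_adj   <- coef(lm(Y ~ M + X + U))[["M"]]   # U included: b is recovered

round(c(a_hat = a_hat, true_b = true_b, b_naive = b_naive, b_adj = b_adj), 3)
```

In real data `U` is unobserved, which is why the M to Y path cannot be read causally without the no-unmeasured-confounding assumption.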
Measurement error is always a problem (unless we use latent variable methods):
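But it can be more pronounced in mediation analysis (e.g., error in M tends to attenuate the b path, shrinking the indirect effect and inflating c')
In many situations it is difficult to know how extensive measurement error is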
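The sketch below, using simulated data with an arbitrary (hypothetical) error SD of 1, shows the usual pattern: random measurement error in `M` shrinks the `b` estimate, and so the indirect effect, while part of that effect shows up in `c'` instead:

```r
set.seed(1)
n <- 10000
X      <- rnorm(n)
M_true <- 0.5 * X + rnorm(n)                 # a = 0.5
Y      <- 0.2 * X + 0.4 * M_true + rnorm(n)  # c' = 0.2, b = 0.4
M_obs  <- M_true + rnorm(n, sd = 1)          # mediator observed with random error

fit_true <- lm(Y ~ M_true + X)   # what we would get with a perfectly measured M
fit_obs  <- lm(Y ~ M_obs + X)    # what we actually get

est <- rbind(
  true_M     = unname(coef(fit_true)[c("M_true", "X")]),
  observed_M = unname(coef(fit_obs)[c("M_obs", "X")])
)
colnames(est) <- c("b", "c_prime")
round(est, 3)
```

With these settings, b drops from about 0.4 to about 0.2 and c' rises from about 0.2 to about 0.3.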
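The remaining assumptions are essentially those of regression (e.g., correctly specified functional form, independent observations, an appropriate error distribution)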
Mediation uses a series of regressions and combines results to draw conclusions about the overall model
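In R, the series of regressions and the combination step can be sketched with base `lm()`. The data, variable names, and effect sizes here are simulated and hypothetical:

```r
set.seed(42)
n  <- 500
df <- data.frame(X = rnorm(n))
df$M <- 0.5 * df$X + rnorm(n)                # hypothetical a = 0.5
df$Y <- 0.3 * df$X + 0.4 * df$M + rnorm(n)   # hypothetical c' = 0.3, b = 0.4

fit_a   <- lm(M ~ X, data = df)       # regression 1: a path
fit_bc  <- lm(Y ~ M + X, data = df)   # regression 2: b and c' paths
fit_tot <- lm(Y ~ X, data = df)       # regression 3: total effect c

a  <- coef(fit_a)[["X"]]
b  <- coef(fit_bc)[["M"]]
cp <- coef(fit_bc)[["X"]]

# Combine the pieces into the estimates of interest
effects <- c(indirect = a * b, direct = cp, total = a * b + cp)
round(effects, 3)

# For OLS on the same data, a * b + c' equals c from regression 3 (up to floating-point error)
all.equal(effects[["total"]], coef(fit_tot)[["X"]])
```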
Ordinary Least Squares (OLS) and Generalized Linear Models (GLM) Regression
Structural Equation Modeling (SEM)
These are closely related, but distinct, approaches
Here are two examples of OLS/GLM fitted mediation models
There are many, many others using SEM or regression frameworks.
Mediation models provide lots of information:
# | Estimate | What |
---|---|---|
1 | Individual paths | a, b, and c' paths |
2 | Indirect Effect | a path estimates * b path estimates |
3 | Direct Effect | c' estimate |
4 | Total Effect | a * b + c' |
Many resources suggest ways of looking at this, such as whether c' is significant or not (while a * b is significant). I think this paints an incomplete picture of the model because:
A better approach is to look at effect sizes: how big is the indirect effect compared to the direct or total effect?
When the mediator and outcome are both continuous (and roughly normal), interpretation is straightforward.
Example: If X is continuous, the indirect effect is 2.5, and the outcome is a quality of life rating, what is the interpretation?
But what if mediator(s) and/or outcome(s) are categorical?
Generalized Linear Models (GLMs) generalize the regression framework to more data situations.
To do so, they:
can use a different distribution 📊
use a link function ⛓
Examples: Logistic Regression, Poisson Regression
This presents a new challenge in interpreting the results
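To see the challenge, consider a binary mediator fit with logistic regression. In the sketch below (simulated data, hypothetical values), `a` is on the log-odds scale while `b` is in units of `Y` per unit of `M`, so their product has no direct interpretation. One common workaround, shown only as an illustration and not necessarily what the MarginalMediation package does internally, is to convert `a` into an average change in the probability that `M = 1`:

```r
set.seed(7)
n <- 5000
X <- rnorm(n)
M <- rbinom(n, 1, plogis(-0.5 + 1.0 * X))  # binary mediator; a = 1.0 on the log-odds scale
Y <- 0.3 * X + 2.0 * M + rnorm(n)          # b = 2.0 (units of Y per unit of M)

fit_a  <- glm(M ~ X, family = binomial)    # logit link
fit_bc <- lm(Y ~ M + X)

a_logodds <- coef(fit_a)[["X"]]
b         <- coef(fit_bc)[["M"]]

# Product of paths on different scales (log-odds times units of Y): hard to interpret
naive_ab <- a_logodds * b

# Average change in P(M = 1) when X increases by one unit (finite-difference average marginal effect)
p0     <- predict(fit_a, newdata = data.frame(X = X),     type = "response")
p1     <- predict(fit_a, newdata = data.frame(X = X + 1), type = "response")
a_prob <- mean(p1 - p0)

# a_prob is on the scale of M (a probability), and b converts a unit change in M to units of Y
round(c(naive_ab = naive_ab, marginal_ab = a_prob * b), 3)
```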
A few options:
Interpret the individual pathways and report the percent mediated. This approach is commonly used in the literature.
New: Marginal Mediation Analysis. It is being developed right now and shows serious promise for these situations.
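Three Steps:
1. Estimate the individual paths (a, b, c', and c, where c is the total effect of X on Y without M in the model)
2. Compare c to c'
3. Express the difference as a proportion of c: $\frac{c - c'}{c}$. This is a representation of how much of the total effect is mediated.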
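Step 3 is simple arithmetic. A minimal R helper (`prop_mediated` is a hypothetical name), applied to the total (`tot`) and direct (`dir`) estimates reported in the lavaan output for the TV-show example:

```r
# Proportion of the total effect that is mediated: (c - c') / c
prop_mediated <- function(c_total, c_prime) {
  (c_total - c_prime) / c_total
}

# tot (c) and dir (c') estimates from the lavaan parameter table
pm <- prop_mediated(c_total = -19.973, c_prime = -7.869)
round(pm, 3)
```

Here about 61% of the total effect is mediated. The ratio is unstable when c is near zero and can fall outside 0 to 1 when c and c' have opposite signs.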
I recommend two books to get more information about mediation topics:
Statistical Mediation Analysis by MacKinnon
Introduction to Mediation, Moderation, and Conditional Process Analysis by Hayes
R
If you are not an R user, you can ignore the syntax, but pay attention to the logic.
We'll use a fake data set about two popular TV shows, The Office and Parks and Recreation.
Note: We'll be ignoring some assumptions (like the fact the data are nested).
Check for small cells, understand missingness
Variable | SubsUse: No (n = 25) | SubsUse: Yes (n = 7) | P-Value |
---|---|---|---|
Income | 51.8 (16.0) | 32.1 (17.5) | 0.008 |
Productivity | 3.5 (1.2) | 1.6 (0.8) | <.001 |
Physical_Health | 5.4 (2.1) | 4.0 (2.2) | 0.145 |
Married: Yes | 8 (32%) | 1 (14.3%) | 0.656 |
Race | | | 0.558 |
White | 20 (80%) | 6 (85.7%) | |
Black | 2 (8%) | 0 (0%) | |
Mexican American | 1 (4%) | 1 (14.3%) | |
Indian | 2 (8%) | 0 (0%) | |
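A base R sketch of these checks on a small hypothetical data frame (`toy` and its columns are made up for illustration, not the example data):

```r
# Hypothetical toy data standing in for the TV-show data set
toy <- data.frame(
  subs    = factor(c("No", "No", "Yes", "No", "Yes", "No")),
  married = factor(c("Yes", "No", "No", NA, "No", "Yes")),
  income  = c(52, 48, NA, 61, 30, 45)
)

# Small cells: cross-tabulate a categorical variable against the grouping variable
table(toy$subs, toy$married, useNA = "ifany")

# Missingness: count and proportion of NA per variable
colSums(is.na(toy))
round(colMeans(is.na(toy)), 2)
```

An empty or near-empty cell (here, SubsUse = Yes and Married = Yes) is the kind of sparsity that can destabilize estimates.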
Check for high correlations (can cause multi-collinearity problems)
Variable | [1] | [2] | [3] |
---|---|---|---|
[1] Income | 1.00 | | |
[2] Productivity | 0.573 (<.001) | 1.00 | |
[3] Physical_Health | 0.609 (<.001) | 0.516 (0.002) | 1.00 |
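Base R's `cor()` and `cor.test()` give the same kind of information. The variables below are simulated stand-ins with hypothetical relationships, not the example data:

```r
set.seed(3)
n      <- 33
income <- rnorm(n, 50, 15)
prod1  <- 0.05 * income + rnorm(n)
health <- 0.08 * income + rnorm(n, sd = 2)
vars   <- data.frame(income, prod1, health)

# Correlation matrix; very high values among predictors/mediators signal multicollinearity risk
round(cor(vars, use = "pairwise.complete.obs"), 3)

# p-value for a single pair
cor.test(vars$income, vars$prod1)$p.value
```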
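```r
library(lavaan)

# Mediation model in lavaan syntax
model = "
  prod1 ~ a*subs              # a path: subs (X) -> prod1 (M)
  inco  ~ b*prod1 + c1*subs   # b path and c' (labeled c1)
  ind := a * b                # indirect effect
  dir := c1                   # direct effect
  tot := a * b + c1           # total effect
"
fit_sem = sem(model, data = df)
parameterEstimates(fit_sem)
fitMeasures(fit_sem)
```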
Parameter Estimates
```
    lhs op    rhs label     est     se      z pvalue ci.lower ci.upper
1 prod1  ~   subs     a  -2.005  0.457 -4.386  0.000   -2.902   -1.109
2  inco  ~  prod1     b   6.035  2.304  2.620  0.009    1.520   10.550
3  inco  ~   subs    c1  -7.869  7.614 -1.034  0.301  -22.791    7.053
4 prod1 ~~  prod1         1.153  0.284  4.062  0.000    0.597    1.710
5  inco ~~   inco       201.978 49.723  4.062  0.000  104.522  299.434
6  subs ~~   subs         0.167  0.000     NA     NA    0.167    0.167
7   ind :=    a*b   ind -12.103  5.382 -2.249  0.025  -22.651   -1.556
8   dir :=     c1   dir  -7.869  7.614 -1.034  0.301  -22.791    7.053
9   tot := a*b+c1   tot -19.973  6.651 -3.003  0.003  -33.009   -6.936
```
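Fit Statistics
Because this model is just-identified (df = 0), the fit is perfect by construction, so these indices say nothing about model quality here.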
```
               npar                fmin               chisq                  df
              5.000               0.000               0.000               0.000
             pvalue      baseline.chisq         baseline.df     baseline.pvalue
                 NA              29.361               3.000               0.000
                cfi                 tli                nnfi                 rfi
              1.000               1.000               1.000               1.000
                nfi                pnfi                 ifi                 rni
              1.000               0.000               1.000               1.000
               logl   unrestricted.logl                 aic                 bic
           -183.589            -183.589             377.177             384.660
             ntotal                bic2               rmsea      rmsea.ci.lower
             33.000             369.064               0.000               0.000
     rmsea.ci.upper        rmsea.pvalue                 rmr          rmr_nomean
              0.000                  NA               0.000               0.000
               srmr        srmr_bentler srmr_bentler_nomean                crmr
              0.000               0.000               0.000               0.000
        crmr_nomean          srmr_mplus   srmr_mplus_nomean               cn_05
              0.000               0.000               0.000               1.000
              cn_01                 gfi                agfi                pgfi
              1.000               1.000               1.000               0.000
                mfi                ecvi
              1.000               0.303
```
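```r
library(MarginalMediation)

patha  = glm(prod1 ~ subs, data = df)         # a path: mediator on predictor
pathbc = glm(inco ~ prod1 + subs, data = df)  # b and c' paths: outcome on mediator and predictor

# Indirect effect of subs through prod1, with 500 bootstrap samples
mma(pathbc, patha, ind_effects = c("subs-prod1"), boot = 500)
```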
```
calculating a paths... b and c paths... Done.

┌───────────────────────────────┐
│  Marginal Mediation Analysis  │
└───────────────────────────────┘
A marginal mediation model with:
   1 mediators
   1 indirect effects
   1 direct effects
   500 bootstrapped samples
   95% confidence interval
   n = 33

Formulas:
   ◌ inco ~ prod1 + subs
   ◌ prod1 ~ subs

Regression Models:

     inco ~
                      Est      SE   Est/SE P-Value
     (Intercept) 30.52837 9.12314  3.34626 0.00221
     prod1        6.03508 2.41608  2.49788 0.01821
     subs        -7.86921 7.98516 -0.98548 0.33227

     prod1 ~
                      Est      SE   Est/SE P-Value
     (Intercept)  3.57692 0.21730 16.46039 0.00000
     subs        -2.00549 0.47182 -4.25054 0.00018

Unstandardized Mediated Effects:

   Indirect Effects:

     inco ~
                     Indirect     Lower    Upper
       subs => prod1 -12.10332 -24.32765 -1.15705

   Direct Effects:

     inco ~
            Direct     Lower  Upper
       subs -7.86921 -26.49848 6.8156

Standardized Mediated Effects:

   Indirect Effects:

     inco ~
                     Indirect    Lower    Upper
       subs => prod1 -0.67622 -1.35919 -0.06464

   Direct Effects:

     inco ~
            Direct    Lower   Upper
       subs -0.43965 -1.48048 0.38079
```
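The `boot = 500` argument requests bootstrapped confidence intervals, which do not assume that a * b is normally distributed. As a rough illustration of the bootstrap idea only (not the MarginalMediation internals), here is a percentile bootstrap of a * b in base R on simulated data with hypothetical paths:

```r
set.seed(10)
n  <- 100
df <- data.frame(X = rnorm(n))
df$M <- -0.6 * df$X + rnorm(n)               # hypothetical a path
df$Y <-  0.5 * df$M - 0.2 * df$X + rnorm(n)  # hypothetical b and c' paths

# Indirect effect a * b estimated from one data set
indirect <- function(d) {
  a <- coef(lm(M ~ X, data = d))[["X"]]
  b <- coef(lm(Y ~ M + X, data = d))[["M"]]
  a * b
}

# Resample rows with replacement and re-estimate a * b each time
B       <- 500
boot_ab <- replicate(B, indirect(df[sample(nrow(df), replace = TRUE), ]))

# 95% percentile interval
ci <- quantile(boot_ab, c(0.025, 0.975))
round(c(estimate = indirect(df), ci), 3)
```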
Depends on type of model used but the basics: