Mediation Analysis

🤗

Tyson S. Barrett
StatStudio

Fall 2017

What is mediation analysis?

How do we interpret it?

Current Issues (and Solutions)

How do we use it? (in R)

What is Mediation Analysis?

X = predictor, independent variable, exogenous variable
M = mediator, intermediate variable, endogenous variable
Y = outcome, endogenous variable

a = path from X to M
b = path from M to Y (controlling for X)
c' = path from X to Y (controlling for M)

Different than Moderation (Interactions)

Mediation Analysis Model

The effect of X is transmitted through M.

Moderation (Interaction)

The effect of X depends on the level of Moderator.
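
A minimal sketch of how the two models differ in R syntax. The data are simulated, and every variable name (including the moderator w) and effect size is an assumption made for illustration only.

```r
# Simulated data; names and effect sizes are illustrative, not from the talk
set.seed(1)
n <- 500
x <- rnorm(n)
w <- rnorm(n)                      # hypothetical moderator
m <- 0.5 * x + rnorm(n)            # mediator depends on X
y <- 0.4 * m + 0.2 * x + rnorm(n)  # outcome depends on M and X

# Mediation: one regression per endogenous variable
fit_m <- lm(m ~ x)       # path a
fit_y <- lm(y ~ m + x)   # paths b and c'

# Moderation: a single regression with an interaction term
fit_mod <- lm(y ~ x * w)

names(coef(fit_mod))     # includes "x:w", the interaction
```

Mediation adds a second equation; moderation adds a product term to one equation.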

Mediation Analysis?

  • Built on theory, prior literature, and other observations
  • Has similar assumptions to regression
  • Built up of 2+ regression models
  • Can combine with moderation
  • Helps explain how X affects Y
  • Provides additional targets of intervention
  • Helps explain strange results (conflicting results)
  • Provides a more holistic view of the relationships

Built on Theory

Why even mention it?

Mediation analysis shares its assumptions with regression, but violating them tends to be somewhat more consequential in mediation

  • Well-behaved residuals (normality and homoskedasticity)
  • No omitted influences
  • No measurement error
  • Correct functional form

These are (often) difficult to assess and correct

No Omitted Influences

Two Main Principles:

  1. If a variable is related to both X (or M) and Y, it needs to be included in the model for paths b and c'
  2. If a variable is related to both X and M, it needs to be included in the model for path a
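
A small simulation can illustrate the first principle. This is a sketch under assumed values: u is a hypothetical variable related to both M and Y, and the true path b is set to 0.3.

```r
# Simulated data; u is a hypothetical omitted influence on both M and Y
set.seed(2)
n <- 5000
x <- rnorm(n)
u <- rnorm(n)
m <- 0.5 * x + 0.6 * u + rnorm(n)
y <- 0.3 * m + 0.2 * x + 0.6 * u + rnorm(n)   # true b = 0.3

# Path b with and without u in the b and c' model
b_omitted  <- unname(coef(lm(y ~ m + x))["m"])
b_adjusted <- unname(coef(lm(y ~ m + x + u))["m"])

c(omitted = b_omitted, adjusted = b_adjusted)
```

Leaving u out inflates the estimate of path b; including it recovers a value close to 0.3.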

Example: If we are assessing religiosity (X) and heavy drinking behavior (Y), what are some variables that should be included?

Slight Caveat

If X is randomized (e.g., treatment or control), then in expectation no other variables are related to X.

But we cannot randomize M (at least in a single study), so even if we can establish a causal effect of X on M and of X on Y, we cannot establish a causal effect of M on Y.

No Measurement Error

Measurement error is always a problem (unless we use latent variable methods):

  • But it can be more pronounced in mediation analysis

    • If M has measurement error, it biases not only M's estimate (path b) but also X's estimate (path c')
  • In many situations it is difficult to know how extensive the measurement error is
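
The point about M's measurement error spilling over into X's estimate can be sketched with simulated data. All values here are assumptions: the true b is 0.4, the true c' is 0.2, and m_obs is the mediator with added random error.

```r
# Simulated data; m_obs is a hypothetical error-prone measure of the mediator
set.seed(3)
n <- 5000
x <- rnorm(n)
m_true <- 0.5 * x + rnorm(n)
y <- 0.4 * m_true + 0.2 * x + rnorm(n)   # true b = 0.4, true c' = 0.2
m_obs <- m_true + rnorm(n, sd = 1)       # mediator measured with error

fit_true <- lm(y ~ m_true + x)
fit_obs  <- lm(y ~ m_obs + x)

coef(fit_true)   # b and c' close to their true values
coef(fit_obs)    # b shrinks toward zero, c' absorbs part of the indirect effect
```

With the error-prone mediator, path b is attenuated and path c' is overstated, so the indirect effect looks smaller than it is.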

The other assumptions are much like those of regression

Quick Review of Linear Regression

Mediation is 2+ Regression Models

Mediation uses a series of regressions and combines results to draw conclusions about the overall model
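
As a rough sketch of what "combining results" means for a simple model with continuous M and Y: fit one regression per path, then multiply a and b for the indirect effect. The data are simulated and the object names are illustrative.

```r
# Simulated data; effect sizes are illustrative
set.seed(4)
n <- 300
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)
y <- 0.4 * m + 0.2 * x + rnorm(n)

path_a  <- lm(m ~ x)       # path a
path_bc <- lm(y ~ m + x)   # paths b and c'
path_c  <- lm(y ~ x)       # total effect c

a        <- unname(coef(path_a)["x"])
b        <- unname(coef(path_bc)["m"])
c_prime  <- unname(coef(path_bc)["x"])
c_total  <- unname(coef(path_c)["x"])
indirect <- a * b

# With OLS fit on the same observations, total = direct + indirect
all.equal(c_total, c_prime + indirect)
```

The identity in the last line holds for linear models fit by OLS; it does not generally hold once a path is fit with a non-identity link (e.g., logistic regression).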

Break Time

Take a short break but be thinking about mediation models you have seen in your field

Mediation Frameworks

  1. Ordinary Least Squares (OLS) and Generalized Linear Models (GLM) Regression

  2. Structural Equation Modeling (SEM)

These are closely related, but distinct, approaches

Two Frameworks

OLS/GLM Regression

  • Multiple regressions, fit separately and then combined
  • Provides great flexibility (assumptions are lighter)
  • Provides model fit for each sub-model but not the entire model
  • Continuous, binary, categorical, count, proportion, and other variable types

SEM

  • Multiple regressions fit simultaneously
  • More restrictive assumptions
  • Provides more information regarding overall model fit
  • Mostly continuous variables (can handle binary, ordinal in some cases)

Here are two examples of OLS/GLM fitted mediation models

There are many, many others using SEM or Regression frameworks.

Interpretation of Mediation

Mediation models provide a lot of information:

  1. Individual path estimates
  2. Indirect Effect estimates
  3. Direct Effect estimates
  4. Total Effect estimates

Complete or Partial Mediation?

Many resources suggest ways of looking at this

  • I recommend not focusing on this but feel free to check
  • It's based on whether c' is significant or not (while a * b is significant)

I think it paints an incomplete picture of the model because:

  1. It only focuses on significance, not effect size
  2. To really make this conclusion, we need really large sample sizes
  3. It is almost always "partial" mediation

A better approach is to look at the effect sizes: how big is the indirect effect compared to the direct or total effect?

Continuous Mediators and Outcomes

When the mediator and outcome are both continuous (and roughly normal), interpretation is straightforward

  1. Paths are in terms of the corresponding endogenous variable's units
  2. Indirect effects are in the outcome's units
  3. Direct effects are in the outcome's units
  4. Total effect is in the outcome's units

Example: If X is continuous, the indirect effect is 2.5, and the outcome is a quality of life rating, what is the interpretation?

But what if mediator(s) and/or outcome(s) are categorical?

Mediation Analysis with Categorical Variables

Generalized Linear Models

These generalize the regression framework to more data situations.

To do so:

  1. Can use a different distribution 📊

  2. Uses a link function ⛓

Examples: Logistic Regression, Poisson Regression
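
A short sketch of the two examples using base R's glm(). The data are simulated; the family argument sets the distribution and the link function.

```r
# Simulated data; coefficients are illustrative
set.seed(5)
n <- 400
x <- rnorm(n)
y_bin   <- rbinom(n, size = 1, prob = plogis(-0.5 + 0.8 * x))  # binary outcome
y_count <- rpois(n, lambda = exp(0.2 + 0.5 * x))               # count outcome

# Logistic regression: binomial distribution, logit link
fit_logit <- glm(y_bin ~ x, family = binomial(link = "logit"))

# Poisson regression: Poisson distribution, log link
fit_poisson <- glm(y_count ~ x, family = poisson(link = "log"))

exp(coef(fit_logit)["x"])     # odds ratio for a one-unit change in x
exp(coef(fit_poisson)["x"])   # rate ratio for a one-unit change in x
```

The coefficients are on the link scale (log-odds, log-rate), which is why interpretation takes an extra step compared with OLS.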

Use GLMs with Mediation Analysis

But...

This presents a new challenge in interpreting the results

Interpretation with Categorical Mediator/Outcome

A few options:

  1. Interpret the individual pathways and note the percent of mediation. This approach is commonly used in the literature.

  2. New: Marginal Mediation Analysis. It is being prepared right now and shows serious promise for these situations.

Interpret Individual Pathways

Three Steps:

  1. Fit individual GLM regressions for all pathways (a, b, c' and c)
  2. Discuss basic effect size information for each pathway
  3. Evaluate the change from c to c' as a proportion of c: (c - c') / c. This is a representation of how much of the total effect is mediated.
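
The three steps can be sketched as follows for a binary outcome. The data are simulated and all names are illustrative. One caveat worth stating as an assumption of this sketch: c and c' come from two different logistic models, so the proportion is only a rough summary.

```r
# Simulated data; binary X and Y, continuous M
set.seed(6)
n <- 2000
x <- rbinom(n, 1, 0.5)
m <- 0.8 * x + rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 0.7 * m + 0.3 * x))

# Step 1: fit a GLM for each pathway
path_a  <- glm(m ~ x, family = gaussian())       # path a
path_bc <- glm(y ~ m + x, family = binomial())   # paths b and c'
path_c  <- glm(y ~ x, family = binomial())       # path c (total)

# Step 2: basic effect sizes, here odds ratios for the b and c' paths
exp(coef(path_bc)[c("m", "x")])

# Step 3: proportion of the total effect that is mediated, (c - c') / c
c_total <- unname(coef(path_c)["x"])
c_prime <- unname(coef(path_bc)["x"])
prop_mediated <- (c_total - c_prime) / c_total
prop_mediated
```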

Marginal Mediation Analysis

  • Uses Average Marginal Effects
  • Interpretation and steps for use are exactly like mediation with continuous mediators/outcomes (can interpret individual paths, indirect and direct effect sizes)
  • Uses bootstrapping to get confidence intervals (recommended in most situations)
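
To give a feel for the idea, here is a hand-rolled sketch of an average marginal effect (AME) for a binary mediator, computed in base R. It is not the MarginalMediation package's implementation. The data are simulated, and the AME formula used (coefficient times the mean of p(1 - p)) applies to a continuous predictor in a logit model.

```r
# Simulated data; binary mediator, continuous outcome
set.seed(7)
n <- 2000
x <- rnorm(n)
m <- rbinom(n, 1, plogis(-0.2 + 0.9 * x))
y <- 2 * m + 0.5 * x + rnorm(n)

path_a  <- glm(m ~ x, family = binomial())   # path a (logit scale)
path_bc <- lm(y ~ m + x)                     # paths b and c'

# AME of x on P(M = 1): average of beta * p * (1 - p) over the sample
p_hat <- fitted(path_a)
ame_a <- unname(coef(path_a)["x"]) * mean(p_hat * (1 - p_hat))

b        <- unname(coef(path_bc)["m"])
indirect <- ame_a * b                        # in the outcome's units
direct   <- unname(coef(path_bc)["x"])

c(ame_a = ame_a, b = b, indirect = indirect, direct = direct)
```

Because the a path is now expressed as a change in probability, the product with b reads the same way as in the all-continuous case.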

How to use it?

Before Talking About Syntax

I recommend two books to get more information about mediation topics

  1. Statistical Mediation Analysis by MacKinnon

  2. Introduction to Mediation, Moderation, and Conditional Process Analysis by Hayes

Break Time

If you do not care about learning how to do these analyses in R then feel free to take off (thanks for attending 😄)

Mediation Analysis in R

If you are not an R user you can ignore the syntax but pay attention to the logic of it

We'll use a fake data set about two popular TV shows--The Office and Parks and Recreation.


Note: We'll be ignoring some assumptions (like the fact the data are nested).

Dataset

Start with Cross-Tabulations

Check for small cells, understand missingness

─────────────────────────────────────────────────────
                    SubsUse
                    No           Yes          P-Value
                    n = 25       n = 7
------------------- -----------  -----------  -------
Income              51.8 (16.0)  32.1 (17.5)  0.008
Productivity        3.5 (1.2)    1.6 (0.8)    <.001
Physical_Health     5.4 (2.1)    4.0 (2.2)    0.145
Married: Yes        8 (32%)      1 (14.3%)    0.656
Race                                          0.558
   White            20 (80%)     6 (85.7%)
   Black            2 (8%)       0 (0%)
   Mexican American 1 (4%)       1 (14.3%)
   Indian           2 (8%)       0 (0%)
─────────────────────────────────────────────────────

And with Correlations

Check for high correlations (can cause multi-collinearity problems)

──────────────────────────────────────────────────────
                    [1]            [2]            [3]
[1] Income          1.00
[2] Productivity    0.573 (<.001)  1.00
[3] Physical_Health 0.609 (<.001)  0.516 (0.002)  1.00
──────────────────────────────────────────────────────

SEM Framework

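library(lavaan)

# Path model with labeled paths: subs -> prod1 -> inco
model <- "
  prod1 ~ a*subs               # path a
  inco  ~ b*prod1 + c1*subs    # paths b and c'
  ind := a * b                 # indirect effect
  dir := c1                    # direct effect
  tot := a * b + c1            # total effect
"

fit_sem <- sem(model, data = df)   # df is the TV show data set
parameterEstimates(fit_sem)        # path estimates with confidence intervals
fitMeasures(fit_sem)               # overall model fit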
Parameter Estimates

    lhs op      rhs label     est     se      z pvalue ci.lower ci.upper
1 prod1  ~     subs     a  -2.005  0.457 -4.386  0.000   -2.902   -1.109
2  inco  ~    prod1     b   6.035  2.304  2.620  0.009    1.520   10.550
3  inco  ~     subs    c1  -7.869  7.614 -1.034  0.301  -22.791    7.053
4 prod1 ~~    prod1         1.153  0.284  4.062  0.000    0.597    1.710
5  inco ~~     inco       201.978 49.723  4.062  0.000  104.522  299.434
6  subs ~~     subs         0.167  0.000     NA     NA    0.167    0.167
7   ind :=      a*b   ind -12.103  5.382 -2.249  0.025  -22.651   -1.556
8   dir :=       c1   dir  -7.869  7.614 -1.034  0.301  -22.791    7.053
9   tot :=   a*b+c1   tot -19.973  6.651 -3.003  0.003  -33.009   -6.936

Fit Statistics

npar                 fmin                 chisq                df
5.000                0.000                0.000                0.000
pvalue               baseline.chisq       baseline.df          baseline.pvalue
NA                   29.361               3.000                0.000
cfi                  tli                  nnfi                 rfi
1.000                1.000                1.000                1.000
nfi                  pnfi                 ifi                  rni
1.000                0.000                1.000                1.000
logl                 unrestricted.logl    aic                  bic
-183.589             -183.589             377.177              384.660
ntotal               bic2                 rmsea                rmsea.ci.lower
33.000               369.064              0.000                0.000
rmsea.ci.upper       rmsea.pvalue         rmr                  rmr_nomean
0.000                NA                   0.000                0.000
srmr                 srmr_bentler         srmr_bentler_nomean  crmr
0.000                0.000                0.000                0.000
crmr_nomean          srmr_mplus           srmr_mplus_nomean    cn_05
0.000                0.000                0.000                1.000
cn_01                gfi                  agfi                 pgfi
1.000                1.000                1.000                0.000
mfi                  ecvi
1.000                0.303
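
As a quick effect-size check on the parameter estimates above, the indirect effect can be expressed as a share of the total effect. The three numbers are copied from the ind, dir, and tot rows; the variable names are only for illustration.

```r
# Estimates copied from the ind, dir, and tot rows of the output above
ind <- -12.103
dir <- -7.869
tot <- -19.973

# Proportion of the total effect carried by the indirect path
prop_mediated <- ind / tot
round(prop_mediated, 3)   # about 0.606

# Direct and indirect effects add up to the total (within rounding)
abs((ind + dir) - tot) < 0.01
```

Roughly 60% of the total effect of subs on inco runs through prod1 in this fake data set, even though the direct effect is not statistically significant.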

OLS Framework (Using Marginal Mediation Analysis)

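library(MarginalMediation)

# Fit each path as its own (generalized) linear model
patha  <- glm(prod1 ~ subs, data = df)          # path a
pathbc <- glm(inco ~ prod1 + subs, data = df)   # paths b and c'

# Combine the models and bootstrap the confidence intervals
mma(pathbc, patha,
    ind_effects = c("subs-prod1"),
    boot = 500)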

calculating a paths... b and c paths... Done.
┌───────────────────────────────┐
│  Marginal Mediation Analysis  │
└───────────────────────────────┘
A marginal mediation model with:
   1 mediators
   1 indirect effects
   1 direct effects
   500 bootstrapped samples
   95% confidence interval
   n = 33

Formulas:
   ◌ inco ~ prod1 + subs
   ◌ prod1 ~ subs

Regression Models:

   inco ~
                        Est       SE    Est/SE   P-Value
      (Intercept)  30.52837  9.12314   3.34626   0.00221
      prod1         6.03508  2.41608   2.49788   0.01821
      subs         -7.86921  7.98516  -0.98548   0.33227

   prod1 ~
                        Est       SE    Est/SE   P-Value
      (Intercept)   3.57692  0.21730  16.46039   0.00000
      subs         -2.00549  0.47182  -4.25054   0.00018

Unstandardized Mediated Effects:

   Indirect Effects:

   inco ~
                       Indirect      Lower      Upper
      subs => prod1   -12.10332  -24.32765   -1.15705

   Direct Effects:

   inco ~
                Direct      Lower     Upper
      subs    -7.86921  -26.49848    6.8156

Standardized Mediated Effects:

   Indirect Effects:

   inco ~
                       Indirect      Lower      Upper
      subs => prod1    -0.67622   -1.35919   -0.06464

   Direct Effects:

   inco ~
                Direct      Lower     Upper
      subs    -0.43965   -1.48048   0.38079
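
For readers curious what a bootstrapped interval for an indirect effect involves, here is a base R sketch of a percentile bootstrap. It uses simulated data loosely shaped like the example (the names dat and indirect_effect are hypothetical) and is not how the package computes its intervals internally.

```r
# Simulated data; values are illustrative, not the TV show data set
set.seed(9)
n <- 200
x <- rbinom(n, 1, 0.3)
m <- 3.5 - 2 * x + rnorm(n)
y <- 30 + 6 * m - 8 * x + rnorm(n, sd = 14)
dat <- data.frame(x, m, y)

# Indirect effect a * b from the two path models
indirect_effect <- function(d) {
  a <- coef(lm(m ~ x, data = d))["x"]
  b <- coef(lm(y ~ m + x, data = d))["m"]
  unname(a * b)
}

# Resample rows with replacement and re-estimate 500 times
boot_est <- replicate(500, {
  idx <- sample(nrow(dat), replace = TRUE)
  indirect_effect(dat[idx, ])
})

# 95% percentile interval
ci <- quantile(boot_est, c(0.025, 0.975))
ci
```

Bootstrapping is popular here because the product a * b is usually not normally distributed, so intervals based on a normal approximation can be off.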

Some Final Considerations

Diagnostics

These depend on the type of model used, but the basics are:

  • Model fit (BIC, Chi-Square, R-Squared)
  • Multi-collinearity
  • Prediction Accuracy
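
A minimal sketch of the first two checks for one sub-model in the regression framework, using simulated data and base R only. The variance inflation factor (VIF) is computed by hand here; packages offer ready-made versions.

```r
# Simulated data; effect sizes are illustrative
set.seed(10)
n <- 300
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)
y <- 0.4 * m + 0.2 * x + rnorm(n)
path_bc <- lm(y ~ m + x)

# Model fit
bic_bc <- BIC(path_bc)
r2_bc  <- summary(path_bc)$r.squared

# Multi-collinearity: VIF for m is 1 / (1 - R^2) from regressing m on x
vif_m <- 1 / (1 - summary(lm(m ~ x))$r.squared)

c(BIC = bic_bc, R2 = r2_bc, VIF_m = vif_m)
```

A VIF near 1 suggests little collinearity; X and M are correlated by design in a mediation model, so some inflation is expected.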

Questions?
