Mediation Analysis 🤗

Tyson S. Barrett
StatStudio
Fall 2017

What is Real?

https://www.youtube.com/watch?v=ym6NEuUYHuE


What is mediation analysis?
How do we interpret it?
Current Issues (and Solutions)
How do we use it? (in R)

What is Mediation Analysis?

X = predictor, independent variable, exogenous variable
M = mediator, intermediate variable, endogenous variable
Y = outcome, endogenous variable

a = path from X to M
b = path from M to Y (controlling for X)
c' = path from X to Y (controlling for M)
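
Written as equations, the three paths correspond to two regression models. This is a sketch in standard notation; the intercepts $i_M$, $i_Y$ and error terms $e_M$, $e_Y$ are symbols introduced here for illustration and do not appear on the slides.

```latex
\begin{aligned}
M &= i_M + a\,X + e_M \\
Y &= i_Y + c'\,X + b\,M + e_Y
\end{aligned}
```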

Different than Moderation (Interactions)

Mediation Analysis Model

The effect of X is transmitted through M.

Moderation (Interaction)

The effect of X depends on the level of Moderator.
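
The contrast shows up directly in how the two kinds of models are specified. Below is a minimal sketch on simulated data; the variable names (`x`, `w`, `m`, `y`) and all effect sizes are made up for illustration.

```r
# Simulated data; every name and effect size here is hypothetical.
set.seed(1)
n <- 500
x <- rnorm(n)
w <- rnorm(n)                                     # a moderator
m <- 0.5 * x + rnorm(n)                           # a mediator: x influences m
y <- 0.4 * m + 0.2 * x + 0.3 * x * w + rnorm(n)

# Mediation: two models; the effect of x is transmitted through m.
path_a   <- lm(m ~ x)
path_bc  <- lm(y ~ m + x)
indirect <- coef(path_a)[["x"]] * coef(path_bc)[["m"]]

# Moderation: one model with an interaction; the slope of x depends on w.
mod_fit          <- lm(y ~ x * w)
interaction_term <- coef(mod_fit)[["x:w"]]
```

Mediation adds a second equation (with the mediator as an outcome), while moderation adds a product term to a single equation.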

Mediation Analysis?

Built on theory, prior literature, and other observations
Has similar assumptions to regression
Built up of 2+ regression models
Can combine with moderation

Helps explain how X affects Y
Provides additional targets of intervention
Helps explain strange results (conflicting results)
Provides a more holistic view of the relationships

Built on Theory

Why even mention it?

Mediation analysis has similar assumptions to regression, but violating them tends to be more consequential in mediation

Well-behaved residuals (normality and homoskedasticity)
No omitted influences

No measurement error
Correct functional form

These are (often) difficult to assess and correct

No Omitted Influences

Two Main Principles:

If something is related to X (or M) and Y, it needs to be in the model for paths b and c'
If something is related to X and M, it needs to be in the model for path a

Example: If we are assessing religiosity (X) and heavy drinking behavior (Y), what are some variables that should be included?
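
To see why the first principle matters, here is a small simulation sketch. The confounder `cf` and all coefficients are hypothetical; the point is only that leaving out a variable related to both M and Y distorts the b path.

```r
# Simulated data; the true b path is 0.3 by construction.
set.seed(42)
n  <- 2000
x  <- rnorm(n)
cf <- rnorm(n)                                   # hypothetical confounder of M and Y
m  <- 0.5 * x + 0.6 * cf + rnorm(n)
y  <- 0.3 * m + 0.2 * x + 0.6 * cf + rnorm(n)

b_omitted  <- coef(lm(y ~ m + x))[["m"]]         # confounder left out of the b/c' model
b_adjusted <- coef(lm(y ~ m + x + cf))[["m"]]    # confounder included

c(omitted = b_omitted, adjusted = b_adjusted)
```

In this setup the estimate without `cf` overstates the b path, while the adjusted estimate lands near the value used to generate the data.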


Slight Caveat

If X is randomized (e.g., treatment or control), then, in expectation, no other variables are related to X.

But we cannot randomize M (at least in a single study), so even if we can establish causal effects of X on M and of X on Y, randomizing X alone does not establish a causal effect of M on Y.


No Measurement Error

Measurement error is always a problem (unless we use latent variable methods)

But it can be more pronounced in mediation analysis
- If M has measurement error, it distorts not only M's estimate (path b) but also X's estimate (path c')
It is often difficult to know how extensive measurement error is
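
A simulation sketch of that point, with made-up coefficients: the outcome depends on the true mediator `m_true`, but the analyst only observes a noisy version `m_obs`. All names and values are assumptions for illustration.

```r
# Simulated data; true b = 0.4 and true c' = 0.1 by construction.
set.seed(7)
n      <- 5000
x      <- rnorm(n)
m_true <- 0.5 * x + rnorm(n)
y      <- 0.4 * m_true + 0.1 * x + rnorm(n)
m_obs  <- m_true + rnorm(n)                 # mediator measured with error

fit_true <- lm(y ~ m_true + x)              # what we would like to fit
fit_obs  <- lm(y ~ m_obs + x)               # what we can actually fit

rbind(true_mediator  = coef(fit_true)[c(2, 3)],
      noisy_mediator = coef(fit_obs)[c(2, 3)])
```

With the noisy mediator, the b path shrinks toward zero and part of the mediated effect is absorbed into the coefficient on `x`.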


The other assumptions work much like they do in regression

Quick Review of Linear Regression


Mediation is 2+ Regression Models

Mediation uses a series of regressions and combines results to draw conclusions about the overall model
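
As a rough sketch of what "combining results" means with a single mediator, the code below fits the separate regressions on simulated data and multiplies the path estimates. Variable names and effect sizes are hypothetical.

```r
# Simulated data for a single-mediator model.
set.seed(2017)
n <- 300
x <- rnorm(n)
m <- 0.6 * x + rnorm(n)
y <- 0.5 * m + 0.3 * x + rnorm(n)

fit_a  <- lm(m ~ x)        # path a
fit_bc <- lm(y ~ m + x)    # paths b and c'
fit_c  <- lm(y ~ x)        # total effect c

a       <- coef(fit_a)[["x"]]
b       <- coef(fit_bc)[["m"]]
c_prime <- coef(fit_bc)[["x"]]

indirect <- a * b
total    <- indirect + c_prime

# With OLS fit on the same rows, a * b + c' reproduces c exactly (up to rounding).
all.equal(total, coef(fit_c)[["x"]])
```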


Break Time

Take a short break, but be thinking about mediation models you have seen in your field

Mediation Frameworks

Ordinary Least Squares (OLS) and Generalized Linear Models (GLM) Regression
Structural Equation Modeling (SEM)

These are very related, but distinct, approaches

Two Frameworks

OLS/GLM Regression

Multiple regressions, fit separately and then combined
Provides great flexibility (assumptions are lighter)
Provides model fit for each sub-model but not the entire model
Continuous, binary, categorical, count, proportion, and other variable types

SEM

Multiple regressions fit simultaneously
More restrictive assumptions
Provides more information regarding overall model fit
Mostly continuous variables (can handle binary, ordinal in some cases)


Here are two examples of OLS/GLM fitted mediation models


There are many, many others using SEM or Regression frameworks.


Interpretation of Mediation

Mediation models provide a lot of information:

Individual path estimates
Indirect Effect estimates
Direct Effect estimates
Total Effect estimates


Estimate	What
Individual paths	a, b, and c' paths
Indirect Effect	a path estimate * b path estimate
Direct Effect	c' estimate
Total Effect	a * b + c'


Complete or Partial Mediation?

Many resources suggest ways of looking at this

I recommend not focusing on this, but feel free to check
It's based on whether c' is significant or not (while a * b is significant)

I think it paints an incomplete picture of the model because:

It only focuses on significance, not effect size
Drawing this conclusion reliably requires very large sample sizes
It is almost always "partial" mediation


A better approach is looking at the effect sizes -- how big is the indirect effect size compared to the direct or total effect sizes?

Continuous Mediators and Outcomes

When the mediator and outcome are both continuous (and roughly normal), interpretation is straightforward

Paths are in terms of the corresponding endogenous variable's units
Indirect effects are in the outcome's units
Direct effects are in the outcome's units
Total effect is in the outcome's units

Example: If X is continuous, the indirect effect is 2.5, and the outcome is in quality of life rating, what is the interpretation?

But what if mediator(s) and/or outcome(s) are categorical?


Mediation Analysis with Categorical Variables

Generalized Linear Models

These generalize the regression framework to more data situations.

To do so:

Can use a different distribution 📊
Uses a link function ⛓

Examples: Logistic Regression, Poisson Regression
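
The two examples can be sketched with base R's `glm()`, where the `family` argument selects the distribution and the link function. The data are simulated and the variable names are hypothetical.

```r
# Simulated binary and count outcomes.
set.seed(3)
n <- 400
x <- rnorm(n)
binary_y <- rbinom(n, size = 1, prob = plogis(-0.5 + 0.8 * x))
count_y  <- rpois(n, lambda = exp(0.2 + 0.4 * x))

# Logistic regression: binomial distribution, logit link.
logit_fit   <- glm(binary_y ~ x, family = binomial(link = "logit"))

# Poisson regression: Poisson distribution, log link.
poisson_fit <- glm(count_y ~ x, family = poisson(link = "log"))

exp(coef(logit_fit)[["x"]])     # odds ratio per one-unit increase in x
exp(coef(poisson_fit)[["x"]])   # rate ratio per one-unit increase in x
```

Because of the link function, the raw coefficients are on the log-odds or log-count scale rather than in the outcome's own units.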


Use GLMs with Mediation Analysis

But...

This presents a new challenge in interpreting the results

Interpretation with Categorical Mediator/Outcome

A few options:

Interpret the individual pathways and note the percent of mediation. This approach is commonly used in the literature.
New: Marginal Mediation Analysis, currently in preparation (shows serious promise for these situations).

Interpret Individual Pathways

Three Steps:

Fit individual GLM regressions for all pathways (a, b, c' and c)
Discuss basic effect size information for each pathway
Evaluate the change from c to c' as a proportion of c: $\frac{c-c'}{c}$. This represents how much of the total effect is mediated.
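
The three steps might look like the following for a binary outcome. This is only a sketch on simulated data with hypothetical names and coefficients; the proportion is computed here on the log-odds scale.

```r
# Simulated data: binary predictor, continuous mediator, binary outcome.
set.seed(11)
n <- 5000
x <- rbinom(n, 1, 0.5)
m <- 0.8 * x + rnorm(n)
y <- rbinom(n, 1, plogis(-1 + 0.7 * m + 0.3 * x))

# Step 1: fit a regression for each pathway.
fit_a  <- lm(m ~ x)                           # path a
fit_bc <- glm(y ~ m + x, family = binomial)   # paths b and c'
fit_c  <- glm(y ~ x, family = binomial)       # path c

# Step 2: basic effect size information, e.g. odds ratios for b and c'.
exp(coef(fit_bc)[c("m", "x")])

# Step 3: change from c to c' as a proportion of c.
c_total       <- coef(fit_c)[["x"]]
c_prime       <- coef(fit_bc)[["x"]]
prop_mediated <- (c_total - c_prime) / c_total
```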

Marginal Mediation Analysis

Uses Average Marginal Effects
Interpretation and steps for use are exactly like mediation with continuous mediators/outcomes (can interpret individual paths, indirect and direct effect sizes)
Uses bootstrapping to get confidence intervals (recommended in most situations)
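
To give a feel for the idea, here is a hand-rolled sketch with a binary mediator: the a path is converted to an average marginal effect on the probability scale, multiplied by the b path, and given a percentile bootstrap interval. This is not the MarginalMediation package's implementation; the helper `indirect_ame()` and the simulated data are hypothetical.

```r
# Simulated data: binary mediator, continuous outcome.
set.seed(21)
n   <- 500
x   <- rnorm(n)
m   <- rbinom(n, 1, plogis(0.9 * x))
y   <- 2 * m + 0.5 * x + rnorm(n)
dat <- data.frame(x, m, y)

# Hypothetical helper: indirect effect using the average marginal effect of the a path.
indirect_ame <- function(d) {
  fit_a  <- glm(m ~ x, family = binomial, data = d)
  fit_bc <- lm(y ~ m + x, data = d)
  p      <- fitted(fit_a)                              # predicted P(m = 1)
  # Average marginal effect of x on P(m = 1): mean of beta * p * (1 - p)
  ame_a  <- mean(coef(fit_a)[["x"]] * p * (1 - p))
  ame_a * coef(fit_bc)[["m"]]                          # indirect effect in units of y
}

estimate <- indirect_ame(dat)

# Percentile bootstrap: resample rows with replacement and re-estimate.
boot_est <- replicate(500, indirect_ame(dat[sample(n, replace = TRUE), ]))
ci       <- quantile(boot_est, c(0.025, 0.975))
```

Because the a path is expressed as a change in probability, the product with b is in the outcome's units, just as in the continuous case.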


How to use it?

Before Talking About Syntax

I recommend two books to get more information about mediation topics

Statistical Mediation Analysis by MacKinnon
Introduction to Mediation, Moderation, and Conditional Process Analysis by Hayes


Break Time

If you do not care about learning how to do these analyses in R then feel free to take off (thanks for attending 😄)

Mediation Analysis in `R`

If you are not an R user you can ignore the syntax but pay attention to the logic of it

We'll use a fake data set about two popular TV shows--The Office and Parks and Recreation.

Note: We'll be ignoring some assumptions (like the fact the data are nested).


Dataset

	nam	prod1	ment1	phys	marr	gend	race	inco	chil	subs	alco	spor
1	Michael	2	3	8	0	0	White	55	0	1	1	1
2	Pam	3	8	7	1	1	White	35	2	0	1	1
3	Jim	3	8	8	1	0	White	70	2	0	1	1
4	Dwight	5	6	8	0	0	White	70	0	0	1	0
5	Stanley	4	7	4	1	0	Black	70	1	0	1	0


Start with Cross-Tabulations

Check for small cells, understand missingness


                      ─────────────────────────────────────────────────────
                                                 SubsUse 
                                           No          Yes         P-Value
                                           n = 25      n = 7              
                       ------------------- ----------- ----------- -------
                       Income              51.8 (16.0) 32.1 (17.5) 0.008  
                       Productivity        3.5 (1.2)   1.6 (0.8)   <.001  
                       Physical_Health     5.4 (2.1)   4.0 (2.2)   0.145  
                       Married: Yes        8 (32%)     1 (14.3%)   0.656  
                       Race                                        0.558  
                          White            20 (80%)    6 (85.7%)          
                          Black            2 (8%)      0 (0%)             
                          Mexican American 1 (4%)      1 (14.3%)          
                          Indian           2 (8%)      0 (0%)             
                      ─────────────────────────────────────────────────────
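
The formatted table above comes from the slides. As a bare-bones sketch of the same checks in base R, the code below uses only the five rows printed on the Dataset slide; the data frame name `df_head` is hypothetical.

```r
# The five rows shown on the Dataset slide (a subset of the columns).
df_head <- data.frame(
  nam  = c("Michael", "Pam", "Jim", "Dwight", "Stanley"),
  marr = c(0, 1, 1, 0, 1),
  race = c("White", "White", "White", "White", "Black"),
  subs = c(1, 0, 0, 0, 0)
)

# Cross-tabulate substance use by marital status, keeping NAs visible if present.
tab <- table(subs = df_head$subs, marr = df_head$marr, useNA = "ifany")
tab                        # look for cells with very few observations

colSums(is.na(df_head))    # number of missing values per column
```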


And with Correlations

Check for high correlations (can cause multi-collinearity problems)


                      ──────────────────────────────────────────────────────
                                          [1]           [2]           [3]  
                       [1]Income          1.00                             
                       [2]Productivity    0.573 (<.001) 1.00               
                       [3]Physical_Health 0.609 (<.001) 0.516 (0.002) 1.00 
                      ──────────────────────────────────────────────────────
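
A base R sketch of the same check, again using only the five rows printed on the Dataset slide (far too few for meaningful estimates, so this only shows the syntax). The name `df_head` and the 0.8 cutoff are assumptions for illustration.

```r
# Income, productivity, and physical health for the five rows on the Dataset slide.
df_head <- data.frame(
  inco  = c(55, 35, 70, 70, 70),
  prod1 = c(2, 3, 3, 5, 4),
  phys  = c(8, 7, 8, 8, 4)
)

r <- cor(df_head)          # Pearson correlation matrix
round(r, 3)

# Flag pairs above an illustrative cutoff (upper triangle only, to avoid duplicates).
high <- which(abs(r) > 0.8 & upper.tri(r), arr.ind = TRUE)
high
```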


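SEM Framework

library(lavaan)

# a: subs -> prod1; b: prod1 -> inco; c1: direct effect of subs on inco
model = "
prod1 ~ a*subs
inco ~ b*prod1 + c1*subs
ind := a * b
dir := c1
tot := a * b + c1"
fit_sem = sem(model, data = df)
parameterEstimates(fit_sem)   # path estimates plus the defined ind, dir, and tot effects
fitMeasures(fit_sem)          # overall model fit indices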


        Parameter Estimates

            lhs op    rhs label     est     se      z pvalue ci.lower ci.upper
        1 prod1  ~   subs     a  -2.005  0.457 -4.386  0.000   -2.902   -1.109
        2  inco  ~  prod1     b   6.035  2.304  2.620  0.009    1.520   10.550
        3  inco  ~   subs    c1  -7.869  7.614 -1.034  0.301  -22.791    7.053
        4 prod1 ~~  prod1         1.153  0.284  4.062  0.000    0.597    1.710
        5  inco ~~   inco       201.978 49.723  4.062  0.000  104.522  299.434
        6  subs ~~   subs         0.167  0.000     NA     NA    0.167    0.167
        7   ind :=    a*b   ind -12.103  5.382 -2.249  0.025  -22.651   -1.556
        8   dir :=     c1   dir  -7.869  7.614 -1.034  0.301  -22.791    7.053
        9   tot := a*b+c1   tot -19.973  6.651 -3.003  0.003  -33.009   -6.936

        Fit Statistics

                       npar                fmin               chisq                  df 
                      5.000               0.000               0.000               0.000 
                     pvalue      baseline.chisq         baseline.df     baseline.pvalue 
                         NA              29.361               3.000               0.000 
                        cfi                 tli                nnfi                 rfi 
                      1.000               1.000               1.000               1.000 
                        nfi                pnfi                 ifi                 rni 
                      1.000               0.000               1.000               1.000 
                       logl   unrestricted.logl                 aic                 bic 
                   -183.589            -183.589             377.177             384.660 
                     ntotal                bic2               rmsea      rmsea.ci.lower 
                     33.000             369.064               0.000               0.000 
             rmsea.ci.upper        rmsea.pvalue                 rmr          rmr_nomean 
                      0.000                  NA               0.000               0.000 
                       srmr        srmr_bentler srmr_bentler_nomean                crmr 
                      0.000               0.000               0.000               0.000 
                crmr_nomean          srmr_mplus   srmr_mplus_nomean               cn_05 
                      0.000               0.000               0.000               1.000 
                      cn_01                 gfi                agfi                pgfi 
                      1.000               1.000               1.000               0.000 
                        mfi                ecvi 
                      1.000               0.303
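
The defined effects in the parameter table can be checked by hand from the path estimates. The numbers below are copied from the output above (rounded to three decimals), so the results differ slightly from the reported values in the last digit; the proportion at the end is an extra illustration, not part of the output.

```r
# Estimates copied from the parameter table above.
a  <- -2.005   # prod1 ~ subs
b  <-  6.035   # inco ~ prod1
c1 <- -7.869   # inco ~ subs (direct effect)

ind <- a * b       # indirect effect; the table reports -12.103
tot <- ind + c1    # total effect; the table reports -19.973

# Share of the total effect that runs through the mediator.
prop_indirect <- ind / tot
round(c(ind = ind, tot = tot, prop_indirect = prop_indirect), 3)
```

Note that the fit statistics show df = 0: this model is saturated, so the chi-square of 0 and the CFI of 1 say nothing about how well the model describes the data.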


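OLS Framework (Using Marginal Mediation Analysis)

library(MarginalMediation)

patha  = glm(prod1 ~ subs, data = df)          # a path: mediator model
pathbc = glm(inco ~ prod1 + subs, data = df)   # b and c' paths: outcome model
mma(pathbc, patha,
    ind_effects = c("subs-prod1"),             # predictor-mediator pair to test
    boot = 500)                                # number of bootstrap samples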



      calculating a paths... b and c paths... Done.

      ┌───────────────────────────────┐
      │  Marginal Mediation Analysis  │
      └───────────────────────────────┘
      A marginal mediation model with:
         1 mediators
         1 indirect effects
         1 direct effects
         500 bootstrapped samples
         95% confidence interval
         n = 33 
      Formulas:
         ◌ inco ~ prod1 + subs
         ◌ prod1 ~ subs 
      Regression Models: 
           inco ~ 
                               Est      SE   Est/SE P-Value
              (Intercept) 30.52837 9.12314  3.34626 0.00221
              prod1        6.03508 2.41608  2.49788 0.01821
              subs        -7.86921 7.98516 -0.98548 0.33227
           prod1 ~ 
                               Est      SE   Est/SE P-Value
              (Intercept)  3.57692 0.21730 16.46039 0.00000
              subs        -2.00549 0.47182 -4.25054 0.00018
      Unstandardized Mediated Effects: 
         Indirect Effects: 
           inco ~ 
                             Indirect     Lower    Upper
              subs => prod1 -12.10332 -24.32765 -1.15705
         Direct Effects: 
           inco ~ 
                     Direct     Lower  Upper
              subs -7.86921 -26.49848 6.8156
      Standardized Mediated Effects: 
         Indirect Effects: 
           inco ~ 
                            Indirect    Lower    Upper
              subs => prod1 -0.67622 -1.35919 -0.06464
         Direct Effects: 
           inco ~ 
                     Direct    Lower   Upper
              subs -0.43965 -1.48048 0.38079


Some Final Considerations

Diagnostics

Depends on type of model used but the basics:

Model fit (BIC, Chi-Square, R-Squared)
Multi-collinearity
Prediction Accuracy
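
For the regression framework, these basics can be sketched in base R as follows. The data are simulated, the names are hypothetical, and the variance inflation factor is computed by hand rather than with an add-on package.

```r
# Simulated single-mediator data and the outcome (b and c') model.
set.seed(5)
n <- 200
x <- rnorm(n)
m <- 0.6 * x + rnorm(n)
y <- 0.5 * m + 0.3 * x + rnorm(n)
fit_bc <- lm(y ~ m + x)

# Model fit
bic_bc <- BIC(fit_bc)                       # compare across candidate models; lower is better
r2_bc  <- summary(fit_bc)$r.squared         # proportion of variance explained

# Multi-collinearity: VIF for m = 1 / (1 - R^2 from regressing m on the other predictors)
vif_m <- 1 / (1 - summary(lm(m ~ x))$r.squared)

# Prediction accuracy (in-sample): root mean squared error
rmse <- sqrt(mean(residuals(fit_bc)^2))

c(BIC = bic_bc, R2 = r2_bc, VIF_m = vif_m, RMSE = rmse)
```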


Questions?
