18  Mediation

Mediation analysis is used to study the effect of an exposure variable (X) on an outcome (Y) through a third factor, known as the mediator (M). Mathematically, this relationship can be described as follows:

\[ Y \thicksim X * M \]

The contribution of a mediator is typically quantified in terms of the Average Causal Mediated Effect (ACME), that is, the portion of the association between X and Y that is explained by M. In practice, this corresponds to the difference between the Total Effect (TE) and the Average Direct Effect (ADE):

\[ ACME = TE - ADE \]
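For linear models without an exposure-mediator interaction, this decomposition can be checked numerically. The sketch below uses simulated data and base R only; the variable names and effect sizes are illustrative assumptions and have nothing to do with the dataset analysed later. It estimates the mediated effect as the product of the X-to-M and M-to-Y coefficients and compares it with TE - ADE.

```r
# Simulated example (all names and effect sizes are assumptions)
set.seed(42)
n <- 500
x <- rbinom(n, 1, 0.5)              # binary exposure X
m <- 0.6 * x + rnorm(n)             # mediator M depends on X
y <- 0.3 * x + 0.5 * m + rnorm(n)   # outcome Y depends on X and M

# Total effect: coefficient of X in Y ~ X
te <- unname(coef(lm(y ~ x))["x"])

# Direct effect: coefficient of X once M is accounted for
fit_y <- lm(y ~ x + m)
ade <- unname(coef(fit_y)["x"])

# Mediated effect as the product of the X -> M and M -> Y paths
a <- unname(coef(lm(m ~ x))["x"])
b <- unname(coef(fit_y)["m"])
acme <- a * b

# With ordinary least squares, the two definitions agree
c(TE = te, ADE = ade, ACME = acme, TE_minus_ADE = te - ade)
```

In this linear setting the equality is an algebraic property of least squares, so it holds in any sample and not only on average.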

The microbiome can mediate the effects of multiple environmental stimuli on human health. However, the importance of its role as a mediator depends on the nature of the stimulus. For example, the effect of dietary fiber intake on host behaviour is largely mediated by the gut microbiome (Logan and Jacka 2014). In contrast, the indirect impact of antibiotic use on mental health through an altered microbiome represents a more subtle process (Dinan and Dinan 2022).

Logan, Alan C, and Felice N Jacka. 2014. “Nutritional Psychiatry Research: An Emerging Discipline and Its Intersection with Global Urbanization, Environmental Challenges and the Evolutionary Mismatch.” Journal of Physiological Anthropology 33: 1–16.
Dinan, Katherine, and Timothy Dinan. 2022. “Antibiotics and Mental Health: The Good, the Bad and the Ugly.” Journal of Internal Medicine 292 (6): 858–69.

In general, mediation effects can be divided into two classes, partial mediation and complete mediation (Figure 18.1).

Figure 18.1: Directed acyclic graphs for two possible relationships involving mediation. A. The effect of x on y is mostly direct, and only a portion thereof is mediated by m. B. The effect of x on y is completely mediated by m.
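The two scenarios in Figure 18.1 can be mimicked with simulated data. The following base R sketch is only an illustration under assumed effect sizes; the helper `decompose` is hypothetical and uses plain linear models. In scenario A the direct effect is clearly different from zero, whereas in scenario B it is close to zero and the effect of x travels through m.

```r
# Simulated illustration of the two scenarios (assumed effect sizes)
set.seed(1)
n <- 5000
x <- rbinom(n, 1, 0.5)

# Hypothetical helper: direct and mediated effects from two linear models
decompose <- function(x, m, y) {
    fit_m <- lm(m ~ x)
    fit_y <- lm(y ~ x + m)
    c(
        ADE = unname(coef(fit_y)["x"]),
        ACME = unname(coef(fit_m)["x"] * coef(fit_y)["m"])
    )
}

# A. Partial mediation: x affects y directly and through m
m_a <- 0.5 * x + rnorm(n)
y_a <- 0.6 * x + 0.8 * m_a + rnorm(n)
partial <- decompose(x, m_a, y_a)

# B. Complete mediation: x affects y only through m
m_b <- 0.5 * x + rnorm(n)
y_b <- 0.8 * m_b + rnorm(n)
complete <- decompose(x, m_b, y_b)

round(rbind(partial, complete), 3)
```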

Mediation analysis is based on the assumption that the exposure variable, the mediator and the outcome follow one another in this temporal sequence. Therefore, investigating mediation is a suitable analytical choice in longitudinal studies, whereas it is discouraged in cross-sectional ones (Fairchild and McDaniel 2017).

Fairchild, Amanda J, and Heather L McDaniel. 2017. “Best (but Oft-Forgotten) Practices: Mediation Analysis.” The American Journal of Clinical Nutrition 105 (6): 1259–71.

We demonstrate a standard mediation analysis with the hitchip1006 dataset from the miaTime package, which contains a genus-level assay for 1006 Western adults of 6 different nationalities.

# Import libraries
library(mia)
library(miaViz)
library(scater)
library(patchwork)
library(knitr)

# Load dataset
data(hitchip1006, package = "miaTime")
tse <- hitchip1006

In our analyses, nationality and BMI group will represent the exposure (X) and outcome (Y) variables, respectively. We make the broad assumption that nationality reflects differences in the living environment between subjects.
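Using BMI group as the outcome of a linear model requires a numeric encoding. The sketch below uses a made-up factor, not the actual column of the dataset, to show how `as.numeric` maps factor levels to integer codes in level order. It is therefore worth checking that the levels run from the lowest to the highest BMI class before converting.

```r
# Hypothetical BMI classes; the level order here is an assumption
bmi <- factor(
    c("obese", "lean", "overweight", "lean"),
    levels = c("lean", "overweight", "obese")
)

# Inspect the level order before converting
levels(bmi)

# Each value is replaced by the position of its level in levels(bmi)
bmi_num <- as.numeric(bmi)
bmi_num
```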

# Convert BMI variable to numeric
tse$bmi_group <- as.numeric(tse$bmi_group)

# Agglomerate features by phylum
tse <- agglomerateByRank(tse, rank = "Phylum")

# Apply clr transformation to counts assay
tse <- transformAssay(
    tse,
    method = "clr",
    pseudocount = 1
)
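The centered log-ratio (clr) transformation expresses each abundance relative to the geometric mean of its sample, which is one common way of handling compositional data. The base R sketch below illustrates the idea on a toy count matrix with invented values (features in rows, samples in columns). It is not meant to reproduce the internals of transformAssay.

```r
# Toy count matrix: 3 features (rows) x 2 samples (columns), invented values
counts <- matrix(
    c(10, 0, 90,
      5, 45, 50),
    nrow = 3,
    dimnames = list(c("taxonA", "taxonB", "taxonC"), c("s1", "s2"))
)

# Add a pseudocount so that zeros can be log-transformed
log_counts <- log(counts + 1)

# Subtract each sample's mean log abundance (log of its geometric mean)
clr <- sweep(log_counts, 2, colMeans(log_counts), FUN = "-")

# clr values sum to zero within each sample (up to floating-point error)
colSums(clr)
```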

In the following examples, the effect of living environment on BMI mediated by the microbiome is investigated in three different steps:

  1. global contribution by alpha diversity
  2. individual contributions by assay features
  3. joint contributions by reduced dimensions

18.1 Alpha diversity as mediator

First, we ask whether alpha diversity mediates the effect of living environment on BMI. Using the getMediation function, the variables X, Y and M are specified with the arguments treatment, outcome and mediator, respectively. We control for sex and age and limit comparisons to two nationality groups, Central Europeans (control) vs. Scandinavians (treatment).

# Analyse mediated effect of nationality on BMI via alpha diversity
# 100 bootstrap simulations are used to speed up execution, but ~1000 are recommended
med_df <- getMediation(
    tse,
    treatment = "nationality",
    outcome = "bmi_group",
    mediator = "diversity",
    covariates = c("sex", "age"),
    treat.value = "Scandinavia",
    control.value = "CentralEurope",
    boot = TRUE, sims = 100
)
 
# Plot results as a forest plot
plotMediation(med_df, layout = "forest")

The forest plot above shows significance for both ACME and ADE, which suggests that alpha diversity partially mediates the effect of living environment on BMI. In contrast, if ACME but not ADE were significant, complete mediation would be inferred. The negative sign of the effects means that lower BMI and alpha diversity are associated with the treatment group (Scandinavians).
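In the mediation package, on which this method is based, `boot = TRUE` requests a nonparametric bootstrap and `sims` sets the number of resamples. The sketch below illustrates the idea in base R under simulated data and a plain product-of-coefficients estimator. The helper `acme_hat` is hypothetical, and this is not the routine that getMediation actually calls.

```r
# Simulated data (assumed effect sizes)
set.seed(123)
n <- 300
dat <- data.frame(x = rbinom(n, 1, 0.5))
dat$m <- 0.5 * dat$x + rnorm(n)
dat$y <- 0.3 * dat$x + 0.7 * dat$m + rnorm(n)

# Hypothetical helper: product-of-coefficients estimate of the mediated effect
acme_hat <- function(d) {
    a <- coef(lm(m ~ x, data = d))["x"]
    b <- coef(lm(y ~ x + m, data = d))["m"]
    unname(a * b)
}

# Resample subjects with replacement and re-estimate the effect each time
sims <- 1000
boot_acme <- replicate(sims, {
    idx <- sample(nrow(dat), replace = TRUE)
    acme_hat(dat[idx, ])
})

# Point estimate with a 95% percentile confidence interval
estimate <- acme_hat(dat)
ci <- quantile(boot_acme, c(0.025, 0.975))
c(estimate = estimate, ci)
```

With few resamples the interval bounds and p-values are coarse, which is why a larger number of simulations is preferable for final results.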

18.2 Assay features as mediators

If we suspect that only certain features of the microbiome act as mediators, we can estimate their individual contributions by fitting one model for each feature in a selected assay. As multiple tests are performed, it is good practice to adjust the resulting p-values with a multiple-testing correction method of choice.

# Analyse mediated effect of nationality on BMI via clr-transformed features
# 100 bootstrap simulations are used to speed up execution, but ~1000 are recommended
tse <- addMediation(
    tse, name = "assay_mediation",
    treatment = "nationality",
    outcome = "bmi_group",
    assay.type = "clr",
    covariates = c("sex", "age"),
    treat.value = "Scandinavia",
    control.value = "CentralEurope",
    boot = TRUE, sims = 100,
    p.adj.method = "fdr"
)

# View results
kable(metadata(tse)$assay_mediation)
|   | mediator | acme | acme_pval | acme_lower | acme_upper | ade | ade_pval | ade_lower | ade_upper | total | total_lower | total_upper | total_pval | acme_padj | ade_padj | total_padj |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | Bacteroidetes | -0.1130 | 0.00 | -0.1777 | -0.0534 | -0.2417 | 0 | -0.3866 | -0.1167 | -0.3547 | -0.5145 | -0.2204 | 0 | 0.000 | 0 | 0 |
| 4 | Firmicutes | -0.0378 | 0.00 | -0.0782 | -0.0142 | -0.3168 | 0 | -0.4400 | -0.1570 | -0.3547 | -0.4930 | -0.1975 | 0 | 0.000 | 0 | 0 |
| 6 | Proteobacteria | -0.0411 | 0.00 | -0.0833 | -0.0113 | -0.3136 | 0 | -0.4780 | -0.1694 | -0.3547 | -0.5146 | -0.2043 | 0 | 0.000 | 0 | 0 |
| 8 | Verrucomicrobia | -0.0345 | 0.00 | -0.0690 | -0.0099 | -0.3201 | 0 | -0.4443 | -0.1947 | -0.3547 | -0.4942 | -0.2335 | 0 | 0.000 | 0 | 0 |
| 1 | Actinobacteria | -0.0458 | 0.06 | -0.0982 | 0.0032 | -0.3088 | 0 | -0.4469 | -0.1766 | -0.3547 | -0.4826 | -0.2318 | 0 | 0.096 | 0 | 0 |
| 3 | Cyanobacteria | -0.0037 | 0.66 | -0.0334 | 0.0280 | -0.3510 | 0 | -0.5070 | -0.2277 | -0.3547 | -0.5152 | -0.2261 | 0 | 0.880 | 0 | 0 |
| 5 | Fusobacteria | 0.0001 | 0.92 | -0.0229 | 0.0282 | -0.3547 | 0 | -0.4951 | -0.1960 | -0.3547 | -0.4851 | -0.2010 | 0 | 0.940 | 0 | 0 |
| 7 | Spirochaetes | 0.0034 | 0.94 | -0.0194 | 0.0206 | -0.3581 | 0 | -0.4974 | -0.1987 | -0.3547 | -0.4940 | -0.1996 | 0 | 0.940 | 0 | 0 |
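To see what the adjustment does, the acme_pval column of the table above can be corrected by hand. This sketch assumes that the adjustment is equivalent to base R's `p.adjust`, which is not confirmed here; the p-values are copied from the table.

```r
# Unadjusted ACME p-values copied from the table above
acme_pval <- c(
    Bacteroidetes = 0.00, Firmicutes = 0.00, Proteobacteria = 0.00,
    Verrucomicrobia = 0.00, Actinobacteria = 0.06, Cyanobacteria = 0.66,
    Fusobacteria = 0.92, Spirochaetes = 0.94
)

# Benjamini-Hochberg correction ("fdr" is an alias of "BH" in p.adjust)
acme_padj <- p.adjust(acme_pval, method = "fdr")
round(acme_padj, 3)
```

The adjusted values agree with the acme_padj column. Actinobacteria, for instance, moves from 0.06 to 0.096 and thus further away from the conventional 0.05 threshold.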

For convenience, results can be visualized with a heatmap, where rows represent features and columns correspond to the coefficients for TE, ADE and ACME. Significant findings can be marked with p-values or stars.

# Plot results as a heatmap
plotMediation(
    tse, "assay_mediation",
    layout = "heatmap",
    add.significance = "symbol"
)

Results suggest that only four out of eight features (Bacteroidetes, Firmicutes, Proteobacteria and Verrucomicrobia) partially mediate the effect of living environment on BMI. As the sign is negative, a smaller abundance of these mediators is found in Scandinavians compared to Central Europeans, which matches the negative trend of alpha diversity.

While analyses were conducted at the phylum level to simplify results, using original assays without agglomeration also represents a valid option. However, the increase in phylogenetic resolution also implies a higher probability of spurious findings, which in turn necessitates a stronger correction for multiple comparisons. A solution to this issue is proposed in the following section.

18.3 Reduced dimensions as mediators

Performing mediation analysis for each feature provides insight into individual contributions. However, this approach greatly increases the number of tests to correct for and thus reduces statistical power. To overcome this issue, it is possible to assess the joint contributions of groups of features by means of dimensionality reduction.

# Reduce dimensions with PCA
tse <- runPCA(
    tse, name = "PCA",
    assay.type = "clr",
    ncomponents = 3
)

# Analyse mediated effect of nationality on BMI via principal components 
# 100 bootstrap simulations are used to speed up execution, but ~1000 are recommended
tse <- addMediation(
    tse, name = "reddim_mediation",
    treatment = "nationality",
    outcome = "bmi_group",
    dimred = "PCA",
    covariates = c("sex", "age"),
    treat.value = "Scandinavia",
    control.value = "CentralEurope",
    boot = TRUE, sims = 100,
    p.adj.method = "fdr"
)

# View results
kable(metadata(tse)$reddim_mediation)
| mediator | acme | acme_pval | acme_lower | acme_upper | ade | ade_pval | ade_lower | ade_upper | total | total_lower | total_upper | total_pval | acme_padj | ade_padj | total_padj |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PCA1 | -0.1317 | 0.00 | -0.2231 | -0.0716 | -0.2230 | 0 | -0.3927 | -0.0750 | -0.3547 | -0.5065 | -0.2337 | 0 | 0.00 | 0 | 0 |
| PCA2 | 0.0104 | 0.28 | -0.0099 | 0.0322 | -0.3651 | 0 | -0.4924 | -0.2411 | -0.3547 | -0.4846 | -0.2309 | 0 | 0.42 | 0 | 0 |
| PCA3 | -0.0033 | 0.46 | -0.0146 | 0.0068 | -0.3514 | 0 | -0.4848 | -0.2125 | -0.3547 | -0.4886 | -0.2128 | 0 | 0.46 | 0 | 0 |

Results can be displayed as one forest plot for each reduced dimension. Combined with a heatmap of the feature loadings by dimension, these plots help deduce whether certain groups of features act as mediators.

# Plot results as multiple forest plots
p1 <- plotMediation(
    tse, "reddim_mediation",
    layout = "forest"
)

# Plot loadings by principal component
p2 <- plotLoadings(
    tse, "PCA",
    ncomponents = 3, n = 8,
    layout = "heatmap"
)

# Combine plots
p1 / p2

The plot above suggests that only PC1 partially mediates the effect of living environment on BMI. Within this dimension, Bacteroidetes and Actinobacteria are the largest contributors in opposite directions. Interestingly, in the previous section the former but not the latter appeared significant individually, which implies that mediation might emerge from their joint contribution.
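The reason why a principal component can capture a joint contribution is that its scores are weighted sums of the features, with the loadings as weights. The base R sketch below illustrates this on a simulated matrix with hypothetical feature names. Unlike the assays above, samples are in rows here, as expected by `prcomp`.

```r
# Toy clr-like matrix: 50 samples (rows) x 4 features (columns), simulated
set.seed(7)
feat <- matrix(
    rnorm(200), nrow = 50,
    dimnames = list(NULL, paste0("taxon", 1:4))
)

# PCA on centered features; keep the first 3 components
pca <- prcomp(feat, center = TRUE, scale. = FALSE, rank. = 3)

# Loadings: weight of each feature in each component
loadings <- pca$rotation

# Scores are weighted sums of the centered features
centered <- scale(feat, center = TRUE, scale = FALSE)
scores <- centered %*% loadings

# Same values as the scores returned by prcomp
all.equal(scores, pca$x, check.attributes = FALSE)
```

Two features with large loadings of opposite sign therefore enter the component as a contrast, which a feature-by-feature analysis cannot represent.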

18.4 Final remarks

This chapter introduced the concept of mediation and demonstrated a standard analysis of the microbiome as mediator at three different levels (global, individual and joint contributions). Importantly, the provided method is based on the mediation package and is limited to univariate comparisons and binary conditions for the exposure variable (Tingley et al. 2014). Therefore, it is recommended to reduce the number of mediators under study by means of a knowledge-based strategy to preserve statistical power.

Tingley, Dustin, Teppei Yamamoto, Kentaro Hirose, Luke Keele, and Kosuke Imai. 2014. “mediation: R Package for Causal Mediation Analysis.” Journal of Statistical Software 59 (5): 1–38. http://www.jstatsoft.org/v59/i05/.
Xia, Yinglin. 2021. “Mediation Analysis of Microbiome Data and Detection of Causality in Microbiome Studies.” Inflammation, Infection, and Microbiome in Cancers: Evidence, Mechanisms, and Implications, 457–509.

A few methods for multivariate mediation analysis of high-dimensional omic data also exist (Xia 2021). However, no single solution has yet emerged as the gold standard in microbiome data analysis, mainly because the available approaches only partially accommodate the specific properties of microbiome data, such as compositionality, sparsity and hierarchical structure. While this chapter proposed a standard approach to mediation analysis, fine-tuned solutions for the microbiome may also become common in the future.
