Appendix B — Meta-analyses

Meta-analysis is a statistical analysis that combines the results of multiple scientific studies by appropriately weighting the individual study-specific effect sizes (Y. H. Lee 2019), (Finckh and Tramèr 2008), (Israel and Richter 2011). A common meta-analytic combination strategy is to take a weighted average of the study-specific summary measures. In fixed-effects meta-analysis, the weights are based on the assumption that there is a single true parameter underlying all the studies, while in random-effects meta-analysis, the weights are based on a model where the true parameter varies across studies according to a probability distribution.

Lee, Young Ho. 2019. “Strengths and Limitations of Meta-Analysis.” The Korean Journal of Internal Medicine 94 (5): 391–95. https://doi.org/https://doi.org/10.3904/kjm.2019.94.5.391.

Finckh, Axel, and Martin R Tramèr. 2008. “Primer: Strengths and Weaknesses of Meta-Analysis.” Nature Clinical Practice Rheumatology 4 (3): 146–52. https://doi.org/https://doi.org/10.1038/ncprheum0732.

Israel, Heidi, and Randy R. Richter. 2011. “A Guide to Understanding Meta-Analysis.” Journal of Orthopaedic & Sports Physical Therapy 41 (7): 496–504. https://doi.org/https://www.jospt.org/doi/10.2519/jospt.2011.3333.

Hoffman, Julien I. E. 2019. “Basic Biostatistics for Medical & Biomedical Practitioners.” Indian Journal of Medical Research 154 (6): 899–900. https://doi.org/https://doi.org/10.1016/C2018-0-02190-8.

Haidich, A. B. 2010. “Meta-Analysis in Medical Research.” Hippokratia 14 (Suppl 1): 29–37. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3049418/.

Meta-analysis is particularly useful when individual studies are small, and a joint analysis of these studies can enhance the precision of effect size estimation and improve statistical power. Especially in microbiome studies, individual effect sizes from each study tend to be small (Hoffman 2019), (Haidich 2010), (Ma et al. 2022). Only through meta-analysis can these small effects reach statistical significance. Therefore, by jointly analyzing these studies, we amplify the collective voice of the data, offering a more comprehensive understanding of the effects under investigation.

In this chapter, our focus is on using the R/Bioconductor package MMUPHin (Meta-Analysis Methods with a Uniform Pipeline for Heterogeneity in microbiome studies) (Ma et al. 2022) to perform the meta-analysis of microbiome data. We utilize publicly available cancer microbiome datasets from the R/Bioconductor package curatedMetagenomicData (Pasolli et al. 2017).

Ma, Siyuan, Dmitry Shungin, Himel Mallick, Melanie Schirmer, Long H. Nguyen, Raivo Kolde, Eric Franzosa, Hera Vlamakis, Ramnik Xavier, and Curtis Huttenhower. 2022. “Population Structure Discovery in Meta-Analyzed Microbial Communities and Inflammatory Bowel Disease Using MMUPHin.” Genome Biology 23 (1): 208. https://doi.org/https://doi.org/10.1186/s13059-022-02753-4.

Pasolli, Edoardo, Lucas Schiffer, Paolo Manghi, Audrey Renson, Valerie Obenchain, Duy Tin Truong, Francesco Beghini, et al. 2017. “Accessible, Curated Metagenomic Data Through ExperimentHub.” Nature Methods 14 (11): 1023–24. https://doi.org/https://doi.org/10.1038/nmeth.4468.

B.1 Meta-analysis of cancer microbiome studies using relative abundance data

Recent work has indicated a possible involvement of the gut microbiome in influencing the efficacy of immune checkpoint inhibitor (ICI) treatment strategies for PD-1/PD-L1 (Programmed Cell Death Protein 1/Programmed Cell Death Ligand 1) targeting checkpoint inhibitors Peters et al. (2019). Several investigations into the microbiome have provided clinical data suggesting that the gut microbiome modulates the response to inhibitors of the PD-1/PD-L1 axis. However, published cancer microbiome studies face challenges, such as limited sample sizes, inconsistent data analysis methods, and a lack of functionally relevant consensus signatures.

Elucidation of the mechanisms by which the gut microbiome alters the function of the immune system to enable or promote cancer development may reveal novel pathways to explore in cancer therapy. Consequently, a high-quality re-analysis of public cancer microbiome data through a systematic meta-analysis approach could provide valuable insights into the microbiome’s role in cancer development and progression in a rapid and cost-effective manner.

Using advanced melanoma as a model, extensively studied in the context of PD-1/PD-L1 immune checkpoint inhibitors, our aim is to identify functional biomarkers positively and negatively associated with ICI response by analyzing both taxonomic and functional profiles with ICI response.

library(curatedMetagenomicData)

In the following section, we provide detailed examples of how to perform batch (study) effect correction and meta-analytic differential abundance testing on publicly available cancer microbiome data from multiple studies. We use the R package MMUPHin for these tasks. MMUPHin is a recently developed R/Bioconductor package designed for meta-analysis of microbiome taxonomic and functional profiles. It is agnostic to the data type and leverages another R/Bioconductor package, MaAsLin2 (Mallick et al. 2021), as a backend to conduct the meta-analysis. MaAsLin2 is particularly designed to be applicable to various microbial community data types (taxonomy or functional profiles) and environments (human or otherwise). It is modular, including implementations of alternative normalization/transformation schemes and statistical models (e.g., amplicon vs. shotgun metagenomic profiles). Leveraging this flexible framework under the hood, MMUPHin performs normalization, study (batch) effect correction, and meta-analysis using the default random effects meta-regression.

Mallick, Himel, Ali Rahnavard, Lauren J. McIver, Siyuan Ma, Yancong Zhang, Long H. Nguyen, Timothy L. Tickle, et al. 2021. “Multivariable Association Discovery in Population-Scale Meta-Omics Studies.” PLOS Computational Biology 17 (11): e1009442. https://doi.org/https://doi.org/10.1371/journal.pcbi.1009442.

Frankel, Arthur E, Laura A Coughlin, Jiwoong Kim, Thomas W Froehlich, Yang Xie, Eugene P Frenkel, and Andrew Y Koh. 2017. “Metagenomic Shotgun Sequencing and Unbiased Metabolomic Profiling Identify Specific Human Gut Microbiota and Metabolites Associated with Immune Checkpoint Therapy Efficacy in Melanoma Patients.” Neoplasia 19 (10): 848–55. https://doi.org/10.1016/j.neo.2017.08.004.

Lee, Karla A., Andrew Maltez Thomas, Laura A. Bolte, Johannes R. Björk, Laura Kist de Ruijter, Federica Armanini, Francesco Asnicar, et al. 2022. “Cross-Cohort Gut Microbiome Associations with Immune Checkpoint Inhibitor Response in Advanced Melanoma.” Nature Medicine 28 (3): 535–44. https://doi.org/https://doi.org/10.1038/s41591-022-01695-5.

Matson, Vyara, Jessica Fessler, Riyue Bao, Tara Chongsuwat, Yuanyuan Zha, Maria-Luisa Alegre, Jason J Luke, and Thomas F Gajewski. 2018. “The Commensal Microbiome Is Associated with Anti-PD-1 Efficacy in Metastatic Melanoma Patients.” Science 359 (6371): 104–8. https://doi.org/10.1126/science.aao3290.

Peters, Brandilyn A., Melissa Wilson, Una Moran, Anna Pavlick, Allison Izsak, Todd Wechter, Jeffrey S. Weber, Iman Osman, and Jiyoung Ahn. 2019. “Relating the Gut Metagenome and Metatranscriptome to Immunotherapy Responses in Melanoma Patients.” Genome Medicine 11 (1): 61. https://doi.org/10.1186/s13073-019-0672-4.

Wind, Thijs T, Ranko Gacesa, Arnau Vich Vila, Jacco J de Haan, Mathilde Jalving, Rinse K Weersma, and Geke A P Hospers. 2020. “Gut Microbial Species and Metabolic Pathways Associated with Response to Treatment with Immune Checkpoint Inhibitors in Metastatic Melanoma.” Melanoma Research 30 (3): 235–46. https://doi.org/10.1097/CMR.0000000000000656.

Here, we will first perform the batch effect correction analysis. We use both the relative abundance data and the pathway abundance data from 5 public datasets: Frankel et al. (2017), K. A. Lee et al. (2022), Matson et al. (2018), Peters et al. (2019), and Wind et al. (2020). All these datasets are available from the curatedMetagenomicData package, with a total of 285 subjects included.

######################
# Load melanoma data #
######################

library(dplyr)

se_relative <- sampleMetadata |>
    filter(study_name %in% c(
        "FrankelAE_2017", "GopalakrishnanV_2018", "LeeKA_2022",
        "MatsonV_2018", "PetersBA_2019", "WindTT_2020")) |>
    returnSamples("relative_abundance", rownames = "short")

se_pathway <- sampleMetadata |>
    filter(study_name %in% c(
        "FrankelAE_2017", "GopalakrishnanV_2018", "LeeKA_2022",
        "MatsonV_2018", "PetersBA_2019", "WindTT_2020")) |>
    returnSamples("pathway_abundance", rownames = "short")

First, let’s look into the relative abundance data.

B.1.1 Performing batch (study) effect adjustment with adjust_batch for relative abundance

In this analysis, we are using two input files. The first input file is the relative abundance data file, which consists of a feature-by-sample matrix. The second input is the metadata associated with the relative abundance file, which includes the study names and overall response rate (ORR) the response variable along with other metadata. By the end of this step, we obtain a batch-adjusted relative abundance matrix.

library(MMUPHin)

##########################
# Create sample metadata #
##########################
data_meta <- select(
    as.data.frame(colData(se_relative)),
    c("study_name", "ORR", "PFS12", "anti_PD_1"))

# Define response variable
data_meta$resvar <- NA
data_meta$anti_PD_1[data_meta$anti_PD_1 == "responder"] <- "yes"
data_meta$anti_PD_1[data_meta$anti_PD_1 == "non_responder"] <- "no"
data_meta$resvar[!is.na(data_meta$anti_PD_1)] <- data_meta$anti_PD_1[
    !is.na(data_meta$anti_PD_1)]
data_meta$resvar[!is.na(data_meta$PFS12)] <- data_meta$PFS12[
    !is.na(data_meta$PFS12)]
data_meta$resvar[!is.na(data_meta$ORR)] <- data_meta$ORR[!is.na(data_meta$ORR)]

# Filter individuals without response variable
data_meta <- data_meta[!is.na(data_meta$resvar), c("study_name", "resvar")]

# Convert the "study_name" to factor variable
data_meta$study_name <- as.factor(data_meta$study_name)

###########################
# Create Species Features #
###########################
# Transpose the abundance matrix and change the value of abundance data to
# proportion unit
data_abd <- assay(se_relative)
data_abd <- data_abd / 100

# Match the individuals in the data_abd
data_abd <- data_abd[, colnames(data_abd) %in% rownames(data_meta)]

#  Change the rownames names for variables of interest
rownames(data_abd) <- sub('.*s__', '', rownames(data_abd))

# Use adjust_batch to correct for differences in the five studies, while
# controlling for the effect of cases versus control (variable resvar in
# data_meta).
fit_adjust_batch <- adjust_batch(
    feature_abd = data_abd,
    batch = "study_name",
    covariates = "resvar",
    data = data_meta,
    control = list(verbose = FALSE, diagnostic_plot = NULL))

data_abd_adj <- fit_adjust_batch$feature_abd_adj

The first figure shows the relationship between the original batch mean parameters (Gamma) and the shrunk (regularized) batch mean parameters (Gamma (shrunken)). The points represent the 5 different study groups within the dataset. The var_batch points represent the variance within each batch, with different colors indicating different levels or categories of variance. The scatter plot on the right represents the change in mean abundance between the original and adjusted ones, displaying an obvious decrease after the batch effect adjustment analysis.

B.1.2 Evaluation for batch effect adjustment

After obtaining the adjusted abundance matrix, we can evaluate the improvement of the batch effect adjustment. Here, the total variation from study difference will be assessed through the PERMANOVA test (Tang, Chen, and Alekseyenko 2016).

Tang, Zhengzheng, Guanhua Chen, and Alexander V Alekseyenko. 2016. “PERMANOVA-S: Association Test for Microbial Community Composition That Accommodates Confounders and Multiple Distances.” Bioinformatics 32 (17): 2618–25. https://doi.org/10.1093/bioinformatics/btw311.

library(vegan)

D_before <- vegdist(t(data_abd))
D_after <- vegdist(t(data_abd_adj))

set.seed(1)
fit_adonis_before <- adonis2(D_before ~ study_name, data = data_meta)
fit_adonis_after <- adonis2(D_after ~ study_name, data = data_meta)

print(fit_adonis_before$aov.tab)
##  NULL
print(fit_adonis_after$aov.tab)
##  NULL

Based on the results, we can see that the study differences can explain a total of 11.922% (\(R^2\)) of the variability in microbial abundance profiles before study effect adjustment, whereas after adjustment this was reduced to 2.806% (\(R^2\)).

Let’s visualize the results of the PERMANOVA test.

library(ggplot2)
library(patchwork)

##############
# Ordination #
##############

# Before
R2_before <- round(fit_adonis_before$aov.tab[1, 5]*100, 1)
pcoa_before <- cmdscale(D_before, eig = TRUE)
ord_before <- as.data.frame(pcoa_before$points)
percent_var_before <- round(pcoa_before$eig / sum(pcoa_before$eig) * 100, 1)[1:2]
before_labels <- c(
    paste('Axis 1 (', percent_var_before[1], '%)', sep = ''),
    paste('Axis 2 (', percent_var_before[2], '%)', sep = ''))

before_phrase <- paste('Before (PERMANOVA R2 = ', R2_before, '%)', sep = '')
colnames(ord_before) <- c('PC1', 'PC2')
ord_before$Study <- data_meta$study_name

# After
R2_after <- round(fit_adonis_after$aov.tab[1, 5]*100, 1)
pcoa_after <- cmdscale(D_after, eig = TRUE)
ord_after <- as.data.frame(pcoa_after$points)
percent_var_after <- round(pcoa_after$eig / sum(pcoa_after$eig) * 100, 1)[1:2]
after_labels <- c(paste('Axis 1 (', percent_var_after[1], '%)', sep = ''),
                paste('Axis 2 (', percent_var_after[2], '%)', sep = ''))

after_phrase <- paste('After (PERMANOVA R2 = ', R2_after, '%)', sep = '')
colnames(ord_after) <- c('PC1', 'PC2')
ord_after$Study <- data_meta$study_name

# Ordination Plot
p_before <- ggplot(ord_before, aes(x = PC1, y = PC2, color = Study)) +
    geom_point(size = 4) +
    theme_bw() +
    xlab(before_labels[1]) +
    ylab(before_labels[2]) +
    ggtitle(before_phrase) +
    theme(plot.title = element_text(hjust = 0.5)) +
    theme(legend.position ="none")

p_after <- ggplot(ord_after, aes(x = PC1, y = PC2, color = Study)) +
    geom_point(size = 4) +
    theme_bw() +
    xlab(after_labels[1]) +
    ylab(after_labels[2]) +
    ggtitle(after_phrase) +
    theme(plot.title = element_text(hjust = 0.5)) +
    theme(legend.position ="none")

p_before + p_after

B.1.3 Meta-analytical differential abundance testing with lm_meta

Here, we illustrate the details of the meta-analytical differential abundance testing. We have several choices for the analysis methods. First, we apply linear models (LM) for individual study-specific analyses with various transformations.

library(Maaslin2)

transform <- c("NONE", "AST", "LOGIT", "LOG")
num <- NULL

for (i in 1:length(transform)) {
    fit_lm_meta <- lm_meta(
        feature_abd = data_abd_adj,
        batch = "study_name",
        exposure = "resvar",
        data = data_meta,
        control = list(analysis_method = 'LM', transform = transform[i]))

    num[i] <- sum(fit_lm_meta$meta_fits$qval.fdr < 0.05, na.rm = TRUE)
}

print(num)
##  [1] 0 0 0 0

Since none of the linear models are able to generate significant results, we next apply the Tweedie model (using the “CPLM” analysis method in MaAsLin2, also implemented in the R package Tweedieverse (Mallick et al. 2022).

Mallick, Himel, Suvo Chatterjee, Shrabanti Chowdhury, Saptarshi Chatterjee, Ali Rahnavard, and Stephanie C Hicks. 2022. “Differential Expression of Single-Cell RNA-Seq Data Using Tweedie Models.” Statistics in Medicine 41 (18): 3492–3510. https://doi.org/10.1002/sim.9430.

fit_lm_meta <- lm_meta(
    feature_abd = data_abd_adj,
    batch = "study_name",
    exposure = "resvar",
    data = data_meta,
    control = list(analysis_method = 'CPLM', transform = 'NONE'))

meta_fits <- fit_lm_meta$meta_fits

meta_fits |>
    filter(qval.fdr < 0.05) |>
    arrange(coef) |>
    mutate(feature = factor(feature, levels = feature)) |>
    ggplot(aes(y = coef, x = feature)) +
    geom_bar(stat = "identity") +
    coord_flip()

Based on the results, we see that there are 22 significant features in total, among which the has the strongest positive effect, while the species has the strongest negative effect.

B.2 Meta-analysis for pathway abundance data

In this section, we repeat all the analysis mentioned above using the pathway relative abundance data. We first prepare the input data for the analysis.

# Prepare the meta data
data_meta <- select(
    as.data.frame(colData(se_pathway)),
    c("study_name", "ORR", "PFS12", "anti_PD_1"))

# Define response variable
data_meta$resvar <- NA
data_meta$anti_PD_1[data_meta$anti_PD_1 == "responder"] <- "yes"
data_meta$anti_PD_1[data_meta$anti_PD_1 == "non_responder"] <- "no"
data_meta$resvar[!is.na(data_meta$anti_PD_1)] <- data_meta$anti_PD_1[
    !is.na(data_meta$anti_PD_1)]
data_meta$resvar[!is.na(data_meta$PFS12)] <- data_meta$PFS12[
    !is.na(data_meta$PFS12)]
data_meta$resvar[!is.na(data_meta$ORR)] <- data_meta$ORR[!is.na(data_meta$ORR)]

# Filter individuals without response variable
data_meta <- data_meta[!is.na(data_meta$resvar), c("study_name", "resvar")]

# Convert the "study_name" to factor variable
data_meta$study_name <- as.factor(data_meta$study_name)

# Prepare the abundance data
# Transpose the abundance matrix and change the value of abundance data to
# proportion unit
data_abd <- assay(se_pathway)
data_abd <- data_abd/100

# Filter out stratified features
data_abd<-data_abd[!grepl("\\|", rownames(data_abd)),]
data_abd<-as.data.frame(data_abd)
data_abd<-data_abd[-c(1:2), ]

# Match the individuals in the data_abd
data_abd <- data_abd[, colnames(data_abd) %in% rownames(data_meta)]

# Use adjust_batch to correct for differences in the five studies, while
# controlling for the effect of cases versus control (variable resvar in
# data_meta).
fit_adjust_batch <- adjust_batch(
    feature_abd = data_abd,
    batch = "study_name",
    covariates = "resvar",
    data = data_meta,
    control = list(verbose = FALSE, diagnostic_plot = NULL))

data_abd_adj <- fit_adjust_batch$feature_abd_adj

Next, we will evaluate the improvement of the batch effect adjustment and apply the PERMANOVA test.

D_before <- vegdist(t(data_abd))
D_after <- vegdist(t(data_abd_adj))

set.seed(1)
fit_adonis_before <- adonis2(D_before ~ study_name, data = data_meta)
fit_adonis_after <- adonis2(D_after ~ study_name, data = data_meta)

print(fit_adonis_before$aov.tab)
##  NULL
print(fit_adonis_after$aov.tab)
##  NULL

Based on the results, we see that the study differences can explain a total of 15.508% (\(R^2\)) of the variability in microbial pathway profiles before study effect adjustment, whereas after adjustment this was reduced to 5.037% (\(R^2\)).

Let’s also visualize the results of the PERMANOVA test.

# Ordination
# Before
R2_before <- round(fit_adonis_before$aov.tab[1, 5]*100, 1)
pcoa_before <- cmdscale(D_before, eig = TRUE)
ord_before <- as.data.frame(pcoa_before$points)
percent_var_before <- round(pcoa_before$eig / sum(pcoa_before$eig) * 100, 1)[1:2]
before_labels <- c(
    paste('Axis 1 (', percent_var_before[1], '%)', sep = ''),
    paste('Axis 2 (', percent_var_before[2], '%)', sep = ''))

before_phrase <- paste('Before (PERMANOVA R2 = ', R2_before, '%)', sep = '')
colnames(ord_before) <- c('PC1', 'PC2')
ord_before$Study <- data_meta$study_name

# After
R2_after <- round(fit_adonis_after$aov.tab[1, 5]*100, 1)
pcoa_after <- cmdscale(D_after, eig = TRUE)
ord_after <- as.data.frame(pcoa_after$points)
percent_var_after <- round(pcoa_after$eig / sum(pcoa_after$eig) * 100, 1)[1:2]
after_labels <- c(
    paste('Axis 1 (', percent_var_after[1], '%)', sep = ''),
    paste('Axis 2 (', percent_var_after[2], '%)', sep = ''))

after_phrase <- paste('After (PERMANOVA R2 = ', R2_after, '%)', sep = '')
colnames(ord_after) <- c('PC1', 'PC2')
ord_after$Study <- data_meta$study_name

# Ordination Plot
p_before <- ggplot(ord_before, aes(x = PC1, y = PC2, color = Study)) +
    geom_point(size = 4) +
    theme_bw() +
    xlab(before_labels[1]) +
    ylab(before_labels[2]) +
    ggtitle(before_phrase) +
    theme(plot.title = element_text(hjust = 0.5)) +
    theme(legend.position ="none")

p_after <- ggplot(ord_after, aes(x = PC1, y = PC2, color = Study)) +
    geom_point(size = 4) +
    theme_bw() +
    xlab(after_labels[1]) +
    ylab(after_labels[2]) +
    ggtitle(after_phrase) +
    theme(plot.title = element_text(hjust = 0.5)) +
    theme(legend.position ="none")

p <- p_before + p_after
p

As before, we have several choices on the analysis methods. However, since we don’t have significance at the FDR level, we use the unadjusted p-values to showcase the ``top hits’’.

transform <- c("NONE", "AST", "LOGIT", "LOG")
num <- NULL

for (i in 1:length(transform)) {
    fit_lm_meta <- lm_meta(
        feature_abd = data_abd_adj,
        batch = "study_name",
        exposure = "resvar",
        data = data_meta,
        control = list(analysis_method = 'LM', transform = transform[i]))

    num[i] <- sum(fit_lm_meta$meta_fits$pval < 0.05, na.rm = TRUE)
}

print(num)
##  [1] 11 10  9 11

fit_lm_meta2 <- lm_meta(
    feature_abd = data_abd_adj,
    batch = "study_name",
    exposure = "resvar",
    data = data_meta,
    control = list(analysis_method = 'CPLM', transform = 'NONE'))

num <- sum(fit_lm_meta2$meta_fits$pval < 0.05, na.rm = TRUE)
print(num)
##  [1] 15

Here is the summary table for the number of significant pathways using various meta-analytic differential analysis methods and transformations:

library(knitr)

analysis_data <- data.frame(
    Analysis_Method = c("LM", "LM", "LM", "LM", "CPLM"),
    Transform_Parameter = c("NONE", "AST", "LOGIT", "LOG", "NONE"),
    Number_Of_Significant_Pathways = c(11, 10, 9, 11, 15)
)

names(analysis_data) <- c(
    "Analysis Method", "Transformation", "Number of Significant Pathways")

kable(analysis_data, caption = "Analysis Results", align = c('l','l','r'))

Analysis Results
Analysis Method	Transformation	Number of Significant Pathways
LM	NONE	11
LM	AST	10
LM	LOGIT	9
LM	LOG	11
CPLM	NONE	15

Finally, let’s visualize the results using the “CPLM” analysis method, which identified largest number of significant pathways.

# Extract the results
meta_fits2 <- fit_lm_meta2$meta_fits
# meta_fits3 <- meta_fits2[meta_fits2$pval < 0.05 & !is.na(meta_fits2$pval), ]

# Create the figure
meta_fits2 |>
    filter(pval < 0.05) |>
    arrange(coef) |>
    mutate(feature = factor(feature, levels = feature)) |>
    ggplot(aes(y = coef, x = feature)) +
    theme(axis.text.y = element_text(size = 10)) +
    geom_bar(stat = "identity") +
    coord_flip()

Based on the results, we observe 15 significant features in total. Among these, queuosine biosynthesis exhibits the strongest negative effects. Additionally, L-lysine degradation XI is the most positively significant pathway, while xylose degradation IV is the most negatively significant pathway.