16  Differential abundance

An important component of many microbiome studies is differential abundance analysis (DAA). It is mainly used to identify differences in the abundances of individual taxa (at any taxonomic level) between two or more groups, such as treatment versus control groups.

The goal of DAA is thus to identify biomarkers of a certain phenotype or condition, and gain understanding of a complex system by looking at its isolated components. For example, the identification of a bacterial taxon that is more abundant in healthy patients compared to diseased patients can lead to important insights into the underlying mechanisms of the disease. In other words, differentially abundant taxa can be involved in the dynamics of the disease, which in turn helps to understand the system as a whole.

16.1 Challenges in differential abundance analysis

As discussed in Section 11.1, microbiome data display unique properties, such as compositionality and skewed/zero-inflated distributions. Evidently, these properties need to be addressed properly also in DAA. Unfortunately, there is currently no consensus on how that should be done. Instead, there exist dozens of methods employing varying techniques and, consequently, providing highly discordant results.

The discordancy of results provided by different DAA methods was recently well illustrated by Nearing et al. (2022). In this study, multiple DAA methods were applied to 38 different datasets and their results were compared to one another. Similar findings were also made by Yang and Chen (2022) who evaluated DAA methods 106 real datasets (along with simulated data). They found especially that on many datasets almost any taxon was detected by at least one DAA method (see Figure 7 in their article). They thus raised concerns of potential cherry-picking, that is, a researcher would be likely to find the taxa in favor of their hypothesis by just trying out several different DAA methods.

16.2 Recommendations based on recent literature

Usually, when a new DAA method is introduced, the authors demonstrate it to perform better compared to some of the existing alternatives. That is often achieved by employing simulated data and possibly one or more real datasets. The authors may have, however, selected their simulation models, example datasets, evaluation metrics, compared methods, etc. specifically to show the introduced method in a favorable light. Assessing the performance of different methods based solely on the studies where they were originally introduced may therefore not be advisable.

A more neutral evaluation of the performance of different DAA methods can be found from several independent benchmarking studies (e.g. Weiss et al. (2017), Hawinkel et al. (2019), Calgaro et al. (2020), Wallen (2021), Nearing et al. (2022), Yang and Chen (2022), Cappellato, Baruzzo, and Camillo (2022), Roche and Mukherjee (2022), Swift et al. (2023), Wirbel et al. (2024), Pelto et al. (2025), Gamboa-Tuz et al. (2025)). Unfortunately, these studies have ended up with very discordant conclusions on the methods. However, if we look specifically at some benchmarking studies that have used only real datasets (instead of simulated data), we can see that analysis of relative abundances (TSS normalized counts) with a simple non-parametric method or linear regression has performed well.

Hawinkel, Stijn, Federico Mattiello, Luc Bijnens, and Olivier Thas. 2019. “A Broken Promise: Microbiome Differential Abundance Methods Do Not Control the False Discovery Rate.” Briefings in Bioinformatics 20 (January): 210–21. https://doi.org/10.1093/BIB/BBX104.
Wallen, Zachary D. 2021. “Comparison Study of Differential Abundance Testing Methods Using Two Large Parkinson Disease Gut Microbiome Datasets Derived from 16S Amplicon Sequencing.” BMC Bioinformatics 22 (December): 1–29. https://doi.org/10.1186/S12859-021-04193-6/FIGURES/4.
Cappellato, Marco, Giacomo Baruzzo, and Barbara Di Camillo. 2022. “Investigating Differential Abundance Methods in Microbiome Data: A Benchmark Study.” PLOS Computational Biology 18 (September): e1010467. https://doi.org/10.1371/JOURNAL.PCBI.1010467.
Swift, Dionne, Kellen Cresswell, Robert Johnson, Spiro Stilianoudakis, and Xingtao Wei. 2023. “A Review of Normalization and Differential Abundance Methods for Microbiome Counts Data.” Wiley Interdisciplinary Reviews: Computational Statistics 15 (January): e1586. https://doi.org/10.1002/WICS.1586.
Wirbel, Jakob, Morgan Essex, Sofia Kirke Forslund, and Georg Zeller. 2024. “A Realistic Benchmark for Differential Abundance Testing and Confounder Adjustment in Human Microbiome Studies.” Genome Biology 2024 25:1 25 (September): 1–26. https://doi.org/10.1186/S13059-024-03390-9.
  • Nearing et al. (2022) studied the performance of DAA methods using 38 datasets. They evaluated the number of detected taxa (ASVs), consistency of the detected taxa between methods, replication of results between studies and error rate on datasets with randomly assigned groups. They emphasized conservatism (low error rate) in their recommendations and ended up recommending ALDEx2 (Fernandes et al. 2014) and ANCOMBC (Mandal et al. 2015). Additionally, they recommended Maaslin2 (Mallick, Rahnavard, and McIver 2020) (i.e. analyzing arcsine square root transformed relative abundances with linear regression) as a more powerful option.

  • In addition to simulations, Yang and Chen (2022) estimated the FDR of different DAA methods on 106 real datasets with randomly created groups. They found that, the observed FDR was at the nominal level when, for instance, analyzing relative abundances either with the Wilcoxon test or with linear regression (Maaslin2).

  • Pelto et al. (2025) studied how the results from different DAA methods replicate between random partitions of datasets and between datasets from independent studies. They found that when considering consistency along with sensitivity, the best performance was obtained by analyzing relative abundances with a non-parametric method or (after log transformation) with linear regression/t-test (Maaslin2) or analyzing presence/absence of taxa with logistic regression.

  • Gamboa-Tuz et al. (2025) employed three real datasets to benchmark different DAA methods against biological ground truth. They found that methods employing the TSS normalization (with e.g. the Wilcoxon test) and RNA-seq-derived methods performed best.

Nearing, Jacob T., Gavin M. Douglas, Molly G. Hayes, Jocelyn MacDonald, Dhwani K. Desai, Nicole Allward, Casey M. A. Jones, et al. 2022. “Microbiome Differential Abundance Methods Produce Different Results Across 38 Datasets.” Nature Communications 13 (1): 342. https://doi.org/10.1038/s41467-022-28034-z.
Fernandes, Andrew D, Jennifer NS Reid, Jean M Macklaim, Thomas A McMurrough, David R Edgell, and Gregory B Gloor. 2014. “Unifying the Analysis of High-Throughput Sequencing Datasets: Characterizing RNA-Seq, 16S rRNA Gene Sequencing and Selective Growth Experiments by Compositional Data Analysis.” Microbiome 2 (December): 15. https://doi.org/10.1186/2049-2618-2-15.
Mandal, Siddhartha, Will Van Treuren, Richard A. White, Merete Eggesbø, Rob Knight, and Shyamal D. Peddada. 2015. “Analysis of Composition of Microbiomes: A Novel Method for Studying Microbial Composition.” Microbial Ecology in Health & Disease 26 (0). https://doi.org/10.3402/mehd.v26.27663.

We therefore recommend performing DAA by analyzing relative (TSS normalized) abundances by a non-parametric method (e.g., the Wilcoxon test or ordinal regression model) or, after a log transformation, with t-test/linear regression. In addition to their good performance on real data they have the benefit of being intuitive and easy to understand.

We note, however, that the performance of DAA methods likely depends on the characteristics of the analyzed data (Weiss et al. (2017), Calgaro et al. (2020)). Consequently, while the methods recommended above seem to be good general default methods, they may not necessarily be the optimal methods in many cases. If one wants to evaluate which method may perform best specifically on their data, one option to use the benchdamic package by Calgaro et al. (2022) (see vignette here).

Weiss, Sophie, Zhenjiang Zech Xu, Shyamal Peddada, Amnon Amir, Kyle Bittinger, Antonio Gonzalez, Catherine Lozupone, et al. 2017. “Normalization and Microbial Differential Abundance Strategies Depend Upon Data Characteristics.” Microbiome 5 (March): 1–18. https://doi.org/10.1186/S40168-017-0237-Y/FIGURES/8.
Calgaro, Matteo, Chiara Romualdi, Levi Waldron, Davide Risso, and Nicola Vitulo. 2020. “Assessment of Statistical Methods from Single Cell, Bulk RNA-Seq, and Metagenomics Applied to Microbiome Data.” Genome Biology 21 (1): 191. https://doi.org/10.1186/s13059-020-02104-1.
Calgaro, Matteo, Chiara Romualdi, Davide Risso, and Nicola Vitulo. 2022. “Benchdamic: Benchmarking of Differential Abundance Methods for Microbiome Data.” Bioinformatics 39 (1). https://doi.org/10.1093/bioinformatics/btac778.
Giliberti, Sara AND Mauriello, Renato AND Cavaliere. 2022. “Host Phenotype Classification from Human Microbiome Data Is Mainly Driven by the Presence of Microbial Taxa.” PLOS Computational Biology 18 (4): 1–22. https://doi.org/10.1371/journal.pcbi.1010066.
Karwowska, Zuzanna, Oliver Aasmets, Mait Metspalu, Andres Metspalu, Lili Milani, Tõnu Esko, Tomasz Kosciolek, and Elin Org. 2025. “Effects of Data Transformation and Model Selection on Feature Importance in Microbiome Classification Data.” Microbiome 13 (1). https://doi.org/10.1186/s40168-024-01996-6.

Lastly, based on recent studies on classification accuracy of healthy vs. non-healthy patients, it seems that a high proportion of information in microbiome data is included in the mere presence/absence of taxa (Giliberti (2022) Karwowska et al. (2025)). A viable, and even simpler, alternative to DAA is thus differential prevalence analysis (see also Pelto et al. (2025)).

16.3 Performing differential abundance analysis

Following the recommendations above, we show in this section how to perform DAA by analyzing (log transformed) relative abundances with linear regression models. We first illustrate the analysis in the simplest case where two groups are compared, and no other covariates are included in the analysis. After that we illustrate more complex analyses, namely, the analysis with additional covariate(s) included and the analysis with interaction of two variables. Furthermore, we illustrate how to perform differential prevalence analysis and the differential analysis of positive (non-zero) abundances.

16.3.1 Preparing the data

We use the Tengeler2020 dataset from mia package. The dataset includes samples from people with ADHD (N = 13) and without ADHD (Control group, N = 14). The people are from three different cohorts.

# Load the needed packages
library(mia)
library(knitr)
library(dplyr)

# Import dataset
data("Tengeler2020", package = "mia")
tse <- Tengeler2020

# Show patient status by cohort
table(tse$patient_status, tse$cohort) |>
    kable()
Cohort_1 Cohort_2 Cohort_3
ADHD 4 5 4
Control 6 5 3

Before starting the analysis, we agglomerate the features by genus and filter them by a prevalence threshold of 10%. Furthermore, we create assays for relative abundances and presence/absence of taxa. Lastly, we transform variable patient_status to a factor with Control as the reference level. (It is a standard practice to use the control group as the reference group in many statistical analyses.)

# Agglomerate by genus and subset by prevalence
tse <- subsetByPrevalent(tse, rank = "Genus", prevalence = 0.10)

# Transform count data and create assay of relative abundances
tse <- transformAssay(tse, assay.type = "counts", method = "relabundance")

# Transform count data and create assay of presence/absence of taxa
tse <- transformAssay(tse, assay.type = "counts", method = "pa")

# Transform patient_status to a factor and set "Control" as the reference level
tse$patient_status <- factor(tse$patient_status, levels = c("Control", "ADHD"))

# We also transform cohort as a factor
tse$cohort <- factor(tse$cohort, levels = c("Cohort_1", "Cohort_2", "Cohort_3"))

16.3.2 Simple two-group comparison

Here we perform the simplest version of DAA, namely, the comparison of two groups without any other covariates included in the analysis. In our case this means comparing the relative abundances of genera between the people with and without ADHD.

We employ the Maaslin2 function from the Maaslin2 package (Mallick, Rahnavard, and McIver 2020). The function

Mallick, Himel, Ali Rahnavard, and Lauren J. McIver. 2020. MaAsLin 2: Multivariable Association in Population-Scale Meta-Omics Studies. http://huttenhower.sph.harvard.edu/maaslin2.
  • transforms counts to relative abundances,
  • replaces zeros by half of the smallest non-zero relative abundance, (separately for each taxon),
  • applies log2 transformation to the relative abundances, and
  • runs a linear regression model for each taxon using the transformed relative abundances as the response variable.

To run the function we need to provide it with the count matrix and metadata as inputs. We also need to provide it with the variable indicating the compared groups, i.e. patient_status in our case. Furthermore, we need to specify the folder for some output files. Lastly, we prevent Maaslin2() to create certain plots. (Note that Maaslin2() prints a lot of some output which is suppressed here for convenience.)

knitr::opts_chunk$set(eval = FALSE)
library(Maaslin2)

obj_daa_basic <- Maaslin2(
    input_data = assay(tse), # The count matrix
    input_metadata = colData(tse) |> as.data.frame(), # The metadata
    fixed_effects = "patient_status", # The predictor variable

    output = "output", # The name of the folder for output files.
    plot_heatmap = FALSE,
    plot_scatter = FALSE
)

Next, we extract the data.frame of results from the list object returned by Maaslin2(). We also calculate the 95% confidence intervals and transform the results to a slightly more convenient format. (The sample size N is used as an approximation for the residual degrees of freedom (df) in the calculation of confidence intervals.)

# Extract the results from the list object and calculate the 95% CIs
res_daa_basic <- obj_daa_basic$results |>
    mutate(
        ci_lwr = coef - qt(.975, df = N) * stderr,
        ci_upr = coef + qt(.975, df = N) * stderr
    ) |>
    select(feature, name, coef, pval, qval, ci_lwr, ci_upr)

# Print the results for taxa with the smallest p-values
res_daa_basic |>
    arrange(pval) |>
    head() |>
    kable()

The columns of res_daa_basic are as follows:

  • feature indicates the name of a taxon (here genus).
  • name indicates the predictor variable. As patient_status is a categorical variable, it also tells the group used as the non-reference level (here ADHD as Control was set as the reference level).
  • coef is the log2-fold change estimate, a standard way to express effect size in DAA. Most importantly, positive values indicate abundance being higher in the non-reference group (here ADHD). More precisely, 2 ^ coef would be interpreted as the ratio of geometric means of the relative abundances between the ADHD and Control groups.
  • pval is the unadjusted p-value.
  • qval (q-value) is the FDR adjusted p-value. This is generally used to define statistical significance. Typically, q-values < .05 are considered (statistically) significant.
  • ci_lwr and ci_upr are the lower and upper bounds of the 95% confidence intervals for the log2-fold change.

Now, if we use the standard criterion for statistical significance, namely, q-value < 0.05, we can see that the DAA result is statistically significant for the genera named as [Ruminococcus] gauvreauii group, Faecalibacterium, Catabacter, [Clostridium] innocuum group and Erysipelatoclostridium. (Maaslin2 has some problems with special characters such as [ or ].) Moreover, for instance, the positive log2-fold change estimate of Faecalibacterium tells that the average relative abundance of Faecalibacterium is higher in the ADHD group.

16.3.2.1 Visualizing the results

While the numbers above provide summarized information about the group differences for each taxon, it is often useful to further examine these associations in more detail. That can be done, for instance, using strip plots with boxplots. We show here how such plots can be created for the significant taxa by using plotBoxplot function from the miaViz package.

library(miaViz)

# Create strip plots with boxplots (and prevalence bars)
plotBoxplot(
    tse,
    features = c(
        "[Ruminococcus]_gauvreauii_group",
        "Faecalibacterium",
        "Catabacter",
        "Erysipelatoclostridium",
        "[Clostridium]_innocuum_group"
    ),
    assay.type = "relabundance", # To plot relative abundances
    x = "patient_status", # The variable on the x (horizontal) axis
    facet.by = "rownames", # To create a separate facet for each feature
    scales = "free_y", # To allow different y-axis scales
    add.proportion = TRUE, # Add bars indicating prevalence
    point.offset = "quasirandom", # To plot points nicely
    point.size = 2, # The size of the points
    point.color = "black", # The color of the edges of the points
    point.alpha = 1, # The transparency of the edges of the points
    box.alpha = .20
) + # The transparency of the boxplots
    theme(strip.text = element_text(size = 7)) # The size of the facet labels

In each facet,

  • the thicker horizontal line in each boxplot indicates the median,
  • each box indicates the interquartile range (IQR), namely, the range between the 25th (Q1) and 75th percentile (Q3),
  • the lower whisker indicates the smallest value above \(Q1 - 1.5 \times IQR\), and
  • the lower whisker indicates the largest value below \(Q3 + 1.5 \times IQR\).
  • The horizontal bars at the bottom indicate the prevalence of each genus in each group.

The strip plots and boxplots clearly show, for instance, how the relative abundance of Faecalibacterium is higher in the ADHD group, as was indicated by the positive the log2-fold estimate (coef) in the results table above. Moreover, for instance, the prevalence bars of Erysipelatoclostridium show how Erysipelatoclostridium is more prevalent in the control group.

Lastly, we show how the results for all genera can be summarized visually by plotting the log2-fold estimates and their 95% confidence intervals.

# First arrange the genera by log2-fold estimate
top_features <- res_daa_basic[order(res_daa_basic[["coef"]], decreasing = TRUE), "feature"]
res_daa_basic[["genus_ord"]] <- factor(res_daa_basic[["feature"]], levels = top_features)

# Plot the log2-fold changes and the 95% confidence intervals
ggplot(res_daa_basic, aes(x = coef, y = genus_ord)) +
    geom_vline(xintercept = 0, linetype = "dashed") +
    geom_point() +
    geom_errorbarh(aes(xmin = ci_lwr, xmax = ci_upr), height = 0.2) +
    labs(x = "Log2-fold change (95% CI)") +
    theme_light() +
    theme(
        axis.title.y = element_blank(),
        axis.text.y = element_text(size = 7),
        plot.title = element_text(hjust = 0.5)
    )

16.3.3 DAA with additional covariates

Here we show how to perform DAA with an additional covariate included in the analysis. One usually wants to include a variable as a covariate if they suspect that it may act as a confounding variable, namely, a variable that causes a spurious association between the variable of main interest and the abundances.

We use here cohort as a covariate. We proceed as above except that we now set fixed_effects = c("patient_status", "cohort"). Furthermore, we now calculate the q-values manually so that they are calculated separately for each variable in the model (indicated by the variable metadata in obj_daa_cov$results). This is because Maaslin2() calculates the q-values over all variables in the model which is not what here as we are only interested in the variable patient_status.

# Run Maaslin2 with patient_status and cohort as predictor variables
obj_daa_cov <- Maaslin2(
    input_data = assay(tse),
    input_metadata = colData(tse) |> as.data.frame(),
    fixed_effects = c("patient_status", "cohort"),
    output = "output",
    plot_heatmap = FALSE,
    plot_scatter = FALSE
)

# Extract the results and calculate the 95% CIs and q-values
res_daa_cov <- obj_daa_cov$results |>
    mutate(
        ci_lwr = coef - qt(.975, df = N) * stderr,
        ci_upr = coef + qt(.975, df = N) * stderr
    ) |>
    group_by(metadata) |>
    mutate(qval = p.adjust(pval, method = "fdr")) |>
    ungroup() |>
    select(feature, name, coef, pval, qval, ci_lwr, ci_upr)
# Print a sample of the results
res_daa_cov |>
    arrange(feature, name) |>
    head() |>
    kable()

Note that there now appear results also for the variable cohort. As cohort consists of three groups, namely, Cohort_1 (the reference group), Cohort_2 and Cohort_3, there are results separately for the comparison between Cohort_2 vs. Cohort_1 (cohortCohort_2) and for the comparison between Cohort_3 vs. Cohort_1 (cohortCohort_3).

We can now examine how the inclusion of cohort as a covariate affects the results for patient_status.

# Print the results for taxa with the smallest p-values for patient_status
res_daa_cov |>
    filter(name == "patient_statusADHD") |>
    arrange(pval) |>
    head() |>
    kable()

When comparing these results to the results in Section 16.3.2, we can see that, for instance, the q-values of Faecalibacterium and Catabacter are now somewhat smaller. Nevertheless, the results look overall rather similar to the results without the covariate included. Therefore, cohort does not seem to be a confounder causing spurious associations between patient status and the abundances of any taxa.

16.3.4 DAA with an interaction

Here we show a more complex version of DAA, namely, a case where the interaction effect of two variables is analyzed. We give this example here for illustration purposes but note that such complex analysis may not be advisable in a small dataset such as our example dataset.

We analyze the interaction between patient_status and cohort. We proceed as in Section 16.3.3 above except that we now add also the interaction term into the formula. As Maaslin2() does not directly support the analysis with interactions we must first create the interaction dummy variables in to the tse object. The main effects patient_status and cohort as well as the interaction dummy variables are then included in the Maaslin2() function.

# Create dummy variables for the interaction term
tse$patient_statusADHD_cohortCohort_2 <-
    ifelse(tse$patient_status == "ADHD" & tse$cohort == "Cohort_2", 1, 0)
tse$patient_statusADHD_cohortCohort_3 <-
    ifelse(tse$patient_status == "ADHD" & tse$cohort == "Cohort_3", 1, 0)

# Run Maaslin2 with patient_status, cohort and their interaction terms
obj_daa_ia <- Maaslin2(
    input_data = assay(tse),
    input_metadata = colData(tse) |> as.data.frame(),
    fixed_effects = c(
        "patient_status",
        "cohort",
        "patient_statusADHD_cohortCohort_2",
        "patient_statusADHD_cohortCohort_3"
    ),
    output = "output",
    plot_heatmap = FALSE,
    plot_scatter = FALSE
)

# Extract the results and calculate the 95% CIs and q-values
res_daa_ia <- obj_daa_ia$results |>
    mutate(
        ci_lwr = coef - qt(.975, df = N) * stderr,
        ci_upr = coef + qt(.975, df = N) * stderr,
        metadata = ifelse(
            metadata %in% c(
                "patient_statusADHD_cohortCohort_2",
                "patient_statusADHD_cohortCohort_3"
            ),
            "patient_satus:cohort",
            metadata
        )
    ) |>
    group_by(metadata) |>
    mutate(qval = p.adjust(pval, method = "fdr")) |>
    ungroup() |>
    select(feature, name, coef, pval, qval, ci_lwr, ci_upr)
# Print a sample of the results
res_daa_ia |>
    arrange(feature, name) |>
    head() |>
    kable()

The results object res_daa_ia now also includes the interaction terms patient_statusADHD_cohortCohort_2 and patient_statusADHD_cohortCohort_3.

We examine the term patient_status_cohortCohort_3. It indicates how the differences between the ADHD and Control groups vary between cohorts 1 and 3.

# Print the results for the interaction term patient_statusADHD_cohortCohort_3
res_daa_ia |>
    filter(name == "patient_statusADHD_cohortCohort_3") |>
    arrange(pval) |>
    head() |>
    kable()

The smallest p-value is observed for Coprobacter. As the corresponding log2-fold change estimate is negative it means that the difference in the relative abundance of Coprobacter between the ADHD and Control groups is higher in Cohort 1 than in Cohort 3.

This interaction can be better illustrated by plotting the relative abundances of Coprobacter by patient_status and cohort.

# Plot the abundances of Coprobacter by patient_status and cohort
plotBoxplot(
    tse,
    features = c("Coprobacter"),
    assay.type = "relabundance", # To plot relative abundances
    x = "patient_status", # The variable on the x (horizontal) axis
    facet.by = "cohort", # To create a separate plot for each cohort
    scales = "fixed", # To have the same y-axis scale in each facet
    add.proportion = TRUE, # Add bars indicating prevalence
    point.offset = "quasirandom", # To plot points nicely
    point.size = 2, # The size of the points
    point.color = "black", # The color of the edges of the points
    point.alpha = 1, # The transparency of the edges of the points
    box.alpha = .20
) # The transparency of the boxplots

Here we can clearly see how the relative abundances of Coprobacter are higher in the ADHD group in Cohort 1 but they are higher in the Control group in Cohort 3. As the association is positive in Cohort 1 (the baseline cohort) and negative in Cohort 3, sign of interaction estimate is negative (the association becomes more negative when we move from the baseline cohort to a non-baseline cohort).

Note that a similar plot could also be used to illustrate a possible confounding effect.

knitr::opts_chunk$set(eval = TRUE)

16.3.5 Differential prevalence analysis (DPA)

We show here how to perform DPA with logistic regression. We show the case where we compare subjects with and without ADHD while controlling for cohort. Due to the convenient implementation of logistic regression for DPA, we employ here the maaslin3() function from the maaslin3 package (Nickols et al. 2026).

Nickols, William A., Thomas Kuntz, Jiaxian Shen, Sagun Maharjan, Himel Mallick, Eric A. Franzosa, Kelsey N. Thompson, Jacob T. Nearing, and Curtis Huttenhower. 2026. MaAsLin 3: Refining and Extending Generalized Multivariable Linear Models for Meta-Omic Association Discovery.” Nature Methods, January, 1–11. https://doi.org/10.1038/s41592-025-02923-9.

To run the maaslin3() function, we need to provide it with

  • a tse object,
  • a model formula, in this case ~ patient_status + cohort + total, and
  • a folder for output files

Note that we also control for the sequencing depth/library size/total read count (total) in the analysis. This is because the detection (observed presence) of taxa may depend on the sequencing depth.

library(scater)
library(maaslin3)

# Calculate the sequencing depth (the total number of reads/counts) for each
# sample
tse <- addPerCellQC(tse)

# Run maaslin3 with patient_status, cohort and reads as the predictor variables
obj_DPA_cov <- maaslin3(
    tse,
    formula = ~ patient_status + cohort + total, # Model formula
    output = "output", # The name of the folder for some output files
    plot_associations = FALSE, # To prevent some unnecessary plots
    plot_summary_plot = FALSE,
    verbosity = "ERROR"
) # To prevent some unnecessary output

Next, we extract the data.frame of results from the list object returned by maaslin3(). We remove the variable total from the output as it is merely a technical covariate that we are not interested in. We also calculate the 95% confidence intervals. Furthermore, as with Maaslin2, we calculate the q-values separately for patient_status and cohort. Lastly, we transform the results to a slightly more convenient format.

# Extract the results and calculate the 95% CIs and q-values
res_DPA_cov <- obj_DPA_cov$fit_data_prevalence$results |>
    filter(name != "total") |>
    mutate(
        ci_lwr = coef - qnorm(.975) * stderr,
        ci_upr = coef + qnorm(.975) * stderr
    ) |>
    group_by(metadata) |>
    mutate(qval = p.adjust(pval_individual, method = "fdr")) |>
    ungroup() |>
    select(feature, name, coef, pval = pval_individual, qval, ci_lwr, ci_upr)

# Print the results for taxa with the smallest p-values for patient_status
res_DPA_cov |>
    filter(name == "patient_statusADHD") |>
    arrange(pval) |>
    head() |>
    kable()

The results are now in the same format as the DAA results in Section 16.3.3. The only difference is coef now indicates the log-odds ratio for the difference in prevalences. The odds ratios (OR) would thus be calculated as or = exp(coef). Note also that the results are not provided for taxa that were present in all samples (as it is not sensible to analyze presence/absence differences for such taxa).

By examining the results we see that the smallest p-value was observed for genus Erysipelatoclostridium. As the sign of the estimate is negative is means that the genus is more prevalent in the control group. (This is can be seen from the prevalence bars of Erysipelatoclostridium in the strip plots in Section 16.3.2.1.)

16.3.5.1 DPA with an interaction

Contrary to Maaslin2(), the maaslin3() function supports also the analysis with interaction terms. Therefore, in the analysis with interaction(s), one can just add the interaction term(s) in to the model formula by using the standard R syntax, e.g., formula = ~ patient_status + cohort + patient_status:cohort + reads or alternatively formula = ~ patient_status * cohort + reads

16.3.6 Analysis of positive (non-zero) abundances

It may sometimes be biologically relevant to analyze only the positive (non-zero) relative abundances of taxa. We thus briefly show here how to perform such analysis using the maaslin3() function. Purely for illustration purposes, we show here the analysis with an interaction term.

We proceed as in the previous sections. The difference compared to DPA (Section 16.3.5) is that we now have to extract fit_data_abundance (and not fit_data_prevalence) from the list object returned by maaslin3(). We also need to set median_comparison_abundance = FALSE in the function call as we want to estimate the differences of (positive) relative abundances and not make any compositional adjustments.

# Run maaslin3 function.
obj_daa_pos_ia <- maaslin3(
    tse,
    output = "output",
    formula = ~ patient_status * cohort,
    median_comparison_abundance = FALSE,
    verbosity = "ERROR",
    plot_associations = FALSE,
    plot_summary_plot = FALSE
)

# Extract the results and calculate the 95% confidence intervals and q-values
res_daa_pos_ia <- obj_daa_pos_ia$fit_data_abundance$results |>
    filter(!is.na(pval_individual) & N_not_zero > 6) |>
    mutate(
        ci_lwr = coef - qt(.975, df = N_not_zero - 6) * stderr,
        ci_upr = coef + qt(.975, df = N_not_zero - 6) * stderr
    ) |>
    group_by(metadata) |>
    mutate(qval = p.adjust(pval_individual, method = "fdr")) |>
    ungroup() |>
    select(feature, name, coef,
        pval = pval_individual, qval, ci_lwr, ci_upr,
        N_not_zero
    )

# The results for the interaction between patient_status and cohort (when
# comparing Cohort_3 to Cohort_1) for the taxa with the smallest p-values
res_daa_pos_ia |>
    filter(name == "patient_statusADHD:cohortCohort_3") |>
    arrange(pval) |>
    head() |>
    kable()

The output is similar to the outputs in the previous sections. It, however, now includes also N_not_zero indicating the number of samples with positive (non-zero) relative abundances, that is, the sample size used in the analysis for each taxon.

The results are interpreted in a similar manner as in normal DAA (Section 16.3.4). For instance, the estimates (coef) indicate the log2-fold changes of positive relative abundances (as maaslin3() performs log2 transformation to the positive relative abundances).

16.3.7 Alternative options

16.3.7.1 Advanced compositionality adjustments

We have shown above (sections 16.3.2, 16.3.3, 16.3.4 and 16.3.6) how to perform DAA by analyzing relative abundances (TSS normalized counts) without any advanced adjustments for compositionality. As explained in Section 16.2, this choice was motivated by some recent benchmark studies that have employed only real data, especially the study by Gamboa-Tuz et al. (2025).

Gamboa-Tuz, Samuel D., Marcel Ramos, Eric Franzosa, Curtis Huttenhower, Nicola Segata, Sehyun Oh, and Levi Waldron. 2025. “Commonly Used Compositional Data Analysis Implementations Are Not Advantageous in Microbial Differential Abundance Analyses Benchmarked Against Biological Ground Truth.” bioRxiv, February, 2025.02.13.638109. https://doi.org/10.1101/2025.02.13.638109.
Roche, Kimberly E, and Sayan Mukherjee. 2022. “The Accuracy of Absolute Differential Abundance Analysis from Relative Count Data.” https://doi.org/10.1371/journal.pcbi.1010284.

We note, however, that in some benchmark studies such simple versions of DAA have performed worse to some more advanced methods when compared on simulated data with strong compositional effects (e.g. Roche and Mukherjee (2022), Yang and Chen (2022)). We therefore give below some options for incorporating more advanced compositionality adjustments in DAA.

  • In Maaslin2, the default normalization method (TSS) can be replaced by TMM normalization (Robinson and Oshlack 2010) by including NORM = "TMM" in the Maaslin2() function.
  • In the analysis of positive abundances with maaslin3, the adjustment for compositional bias can be enabled by setting median_comparison_abundance = TRUE in maaslin3() function.
  • One can use DAA methods that are specifically designed to account for the compositionality of microbiome data. Such methods include, for instance,
Robinson, Mark D., and Alicia Oshlack. 2010. “A Scaling Normalization Method for Differential Expression Analysis of RNA-Seq Data.” Genome Biology 11 (March): 1–9. https://doi.org/10.1186/GB-2010-11-3-R25/FIGURES/3.
Lin, Huang, and Shyamal Das Peddada. 2020. “Analysis of Compositions of Microbiomes with Bias Correction.” Nature Communications 11 (1): 1–11. https://doi.org/https://doi.org/10.1038/s41467-020-17041-7.
Zhou, Huijuan, Kejun He, Jun Chen, and Xianyang Zhang. 2022. LinDA: Linear Models for Differential Abundance Analysis of Microbiome Compositional Data.” Genome Biology 23 (1): 95. https://doi.org/10.1186/s13059-022-02655-5.
Yang, Lu, and Jun Chen. 2022. “A Comprehensive Evaluation of Microbial Differential Abundance Analysis Methods: Current Status and Potential Solutions.” Microbiome 10 (130): 2049–2618. https://doi.org/10.1186/s40168-022-01320-0.

16.3.7.2 A non-parametic alternative to MaAsLin2

As MaAsLin2 analyzes log transformed relative abundances, it needs to replace the zeros by some positive numbers before the transformation. If one wants to avoid this somewhat arbitrary imputation, we suggest performing DAA by using non/semi-parametric ordinal regression models (ORM). ORM can be seen as an extension of the familiar Wilcoxon test in the sense that it provides p-values similar to those of the Wilcoxon test in the simple two group comparison, but it can also incorporate covariates (and interaction terms) (Harrell (2001), Liu et al. (2017)). ORM has also performed well in one benchmark study (Pelto et al. 2025).

Harrell, Frank E. 2001. Regression Modeling Strategies. Springer New York. https://doi.org/10.1007/978-1-4757-3462-1.
Liu, Qi, Bryan E. Shepherd, Chun Li, and Frank E. Harrell. 2017. “Modeling Continuous Response Variables Using Ordinal Regression.” Statistics in Medicine 36 (November): 4316. https://doi.org/10.1002/SIM.7433.
Pelto, Juho, Kari Auranen, Janne V Kujala, and Leo Lahti. 2025. “Elementary Methods Provide More Replicable Results in Microbial Differential Abundance Analysis.” Briefings in Bioinformatics 26 (March): 130. https://doi.org/10.1093/BIB/BBAF130.

As there does not currently exist a good implementation of ORM for DAA one must use a custom function. Such custom function with application examples is given in Appendix C. We also note that, in contrast to Maaslin2() function, the custom function provides confidence intervals and q-values (calculated for the variable of interest) automatically in its output.

16.3.7.3 A more powerful alternative to MaAsLin3 in DPA

While the maaslin3 package provides a convenient implementation of logistic regression for microbial DPA, we note, that it may provide somewhat conservative results, that is, it rarely provides false positive results but it may fail to detect some true effects. To obtain higher statistical power, we recommend using a version of logistic regression introduced by Firth (1993). As there currently exists no convenient implementation of Firth logistic regression for DPA, one needs to use a custom function. Such custom function with application examples is given in the Appendix C.

Firth, David. 1993. “Bias Reduction of Maximum Likelihood Estimates.” Biometrika 80 (March): 27. https://doi.org/10.2307/2336755.
TipExercises

Goal: The goal of these exercises it to learn how to perform differential abundance analysis and how to visualize the results.

Exercises: Differential abundance analysis (DAA)

  1. Load any of the example data sets mentioned in Section 4.2.

  2. Observe colData. Ensure that the data includes a meaningful way to group samples (e.g., Region, Treatment, Diagnosis). If it does not, choose another dataset.

  3. Agglomerate the data to the genus level and filter by a prevalence of 10%.

  4. Run the basic two-group DAA.

  5. Explore the results tables. Are there any taxa with significant abundance differences?

  6. Visualize the results (for significant taxa) with strip plots and boxplots. Can you “see” how the abundances of (possible) significant taxa differ between the groups?

  7. Run the same analysis now with an additional covariate(s). Do the results differ? Why?

  8. You can try also more complex analysis, e.g. with several covariates and/or with interactions.

  9. Bonus 1: Try also differential prevalence analysis (DPA) and/or the analysis of positive abundances with your data.

  10. Bonus 2: Try to run DAA and DPA with ordinal regression model and Firth logistic regression with the custom functions provided in the extra materials. See how the results differ from the results obtained with Maaslin2() and maaslin3().

Useful functions:

SummarizedExperiment::colData(), mia::subsetByPrevalent(), Maaslin2::Maaslin2(), maaslin3::maaslin3()

Back to top