Differential Abundance

Overview

Differential abundance analysis (DA analysis, or DAA) identifies taxa that are significantly more or less abundant in a condition of interest than in the control.

Many methods are available, including:

  • ALDEx2

  • ANCOMBC

  • LinDA

A few things to keep in mind when performing DAA:

  • DAA software normally takes the counts assay as input, because the methods apply their own normalisation suited to count data

  • DAA results will be more reproducible if extremely rare taxa and singletons are removed in advance

  • It is recommended to run different methods on the same data and compare the results
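
The last recommendation can be sketched in base R: once each method has produced its set of significant taxa, a simple tally shows which taxa are supported by more than one method. The taxon names and per-method results below are placeholders, not output from Tengeler2020.

```r
# Hypothetical sets of significant taxa reported by three DA methods
# (placeholder names, not real results)
signif_taxa <- list(
  ALDEx2  = c("taxon_A", "taxon_B"),
  ANCOMBC = c("taxon_A", "taxon_B", "taxon_C"),
  LinDA   = c("taxon_A", "taxon_C", "taxon_D")
)

# Count how many methods flag each taxon
method_counts <- table(unlist(signif_taxa))

# Keep taxa flagged by at least two methods
consensus <- names(method_counts[method_counts >= 2])
consensus
```

Requiring agreement between at least two methods is only one possible rule; a stricter or looser cutoff may suit a given study better.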

Example 1.1: Preparing for DA

First, we load the required packages, including the DA package MicrobiomeStat, and import the Tengeler2020 dataset.

library(mia)
library(MicrobiomeStat)
library(tidyverse)

# Import Tengeler2020
data("Tengeler2020", package = "mia")
tse <- Tengeler2020
mean_abund <- round(mean(rowMeans(assay(tse, "counts"))), 2)
paste0("Taxa: ", nrow(tse), ", Mean abundance: ", mean_abund)
[1] "Taxa: 151, Mean abundance: 119.19"

For DA analysis, it is preferable to reduce the dimensionality and sparsity of the data.

# Agglomerate by Genus and filter by prevalence and detection
tse_genus <- agglomerateByPrevalence(tse,
                                     rank = "Genus",
                                     detection = 0.001,
                                     prevalence = 0.1)
mean_abund_genus <- round(mean(rowMeans(assay(tse_genus, "counts"))), 2)
paste0("Taxa: ", nrow(tse_genus), ", Mean abundance: ", mean_abund_genus)
[1] "Taxa: 49, Mean abundance: 355.52"
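
To make the two thresholds concrete, the following base R sketch applies the same kind of filter to a small made-up count matrix: a taxon counts as detected in a sample when its value exceeds the detection threshold, and it is kept when the fraction of samples in which it is detected exceeds the prevalence threshold. This illustrates the idea only. The sketch applies the detection threshold directly to the values in the matrix; whether agglomerateByPrevalence applies it to counts or to relative abundances, and how it handles boundary values and the taxa that fail the filter, should be checked in the mia documentation.

```r
# Toy count matrix: 4 taxa (rows) x 5 samples (columns); values are made up
counts <- matrix(
  c(10, 12,  0,  8, 15,   # taxon_A: observed in 4 of 5 samples
     0,  0,  0,  1,  0,   # taxon_B: observed in 1 of 5 samples
     5,  0,  7,  0,  0,   # taxon_C: observed in 2 of 5 samples
     0,  0,  0,  0,  0),  # taxon_D: never observed
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("taxon_", LETTERS[1:4]), paste0("S", 1:5))
)

detection  <- 0.001  # a value above this counts as "detected"
prevalence <- 0.3    # illustrative; higher than above so the toy data show an effect

# Fraction of samples in which each taxon is detected
prev <- rowMeans(counts > detection)

# Keep only taxa whose prevalence exceeds the threshold
filtered <- counts[prev > prevalence, , drop = FALSE]
rownames(filtered)
```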

Example 1.2: Performing DA

Here, we run LinDA. We first extract the counts assay and convert it into a data frame.

otu.tab <- assay(tse_genus, "counts") |>
  as.data.frame()

We also need to select the columns of the colData that contain the independent variables to include in the model.

meta <- colData(tse_genus) |>
  as.data.frame() |>
  select(patient_status, cohort)
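
Because the count table and the metadata are passed as separate objects, it is worth confirming that they describe the same samples in the same order before fitting the model. The sketch below shows such a check on small made-up objects; `toy_counts` and `toy_meta` are hypothetical stand-ins for otu.tab and meta, and their values are invented.

```r
# Made-up stand-ins for otu.tab (taxa x samples) and meta (samples x variables)
toy_counts <- data.frame(S1 = c(3, 0), S2 = c(1, 4), S3 = c(0, 2),
                         row.names = c("taxon_A", "taxon_B"))
toy_meta <- data.frame(patient_status = c("ADHD", "Control", "ADHD"),
                       cohort = c("Cohort_1", "Cohort_1", "Cohort_2"),
                       row.names = c("S3", "S1", "S2"))  # deliberately shuffled

# Reorder metadata rows to follow the sample order of the count table
toy_meta <- toy_meta[colnames(toy_counts), , drop = FALSE]

# Stop early if the sample names still do not match
stopifnot(identical(colnames(toy_counts), rownames(toy_meta)))
toy_meta$patient_status
```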

We are ready to run LinDA, which takes the count table (otu.tab) and the sample metadata (meta). The formula defines the model: the main independent variable followed by any covariates. The remaining arguments are optional but worth knowing.

res <- linda(otu.tab, meta,
             formula = "~ patient_status + cohort", 
             feature.dat.type = "count")
0  features are filtered!
The filtered data has  27  samples and  49  features will be tested!
Imputation approach is used.
Fit linear models ...
Completed.

Example 1.3: Interpreting Results

Finally, we select the significantly DA taxa and list them in Table 1.

signif_res <- res$output$patient_statusControl |>
  filter(reject) |>
  select(stat, padj) |>
  arrange(padj)

knitr::kable(signif_res)
Table 1: DA bacterial genera. If stat > 0, abundance is higher in control; otherwise it is higher in ADHD.

Genus                              stat        padj
[Ruminococcus]_gauvreauii_group     4.891159   0.0024419
Faecalibacterium                   -4.694520   0.0024419
Catabacter                         -3.616601   0.0236808
Erysipelatoclostridium              3.357042   0.0334163
Ruminococcaceae_UCG-014            -3.224143   0.0368033
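
The padj and reject columns can be read as follows: padj is a p-value adjusted for testing many taxa at once, and reject flags the taxa whose adjusted p-value falls below the significance level. The base R sketch below reproduces that logic on made-up numbers. It assumes Benjamini-Hochberg adjustment and a 0.05 cutoff; these are believed to match the linda defaults but should be confirmed in the MicrobiomeStat documentation.

```r
# Made-up test results for five hypothetical taxa
toy_res <- data.frame(
  stat   = c(4.2, -3.9, 2.1, -0.4, 1.0),
  pvalue = c(0.001, 0.002, 0.04, 0.70, 0.30),
  row.names = paste0("taxon_", LETTERS[1:5])
)

# Adjust for multiple testing (assumed method: Benjamini-Hochberg)
toy_res$padj <- p.adjust(toy_res$pvalue, method = "BH")

# Flag significant taxa at an assumed significance level of 0.05
toy_res$reject <- toy_res$padj <= 0.05

# The sign of stat gives the direction of the difference
toy_res$direction <- ifelse(toy_res$stat > 0,
                            "higher in Control", "higher in ADHD")

toy_res[toy_res$reject, c("stat", "padj", "direction")]
```

Note that taxon_C has a raw p-value below 0.05 but is no longer flagged after adjustment, which is why the filtering above relies on reject rather than on the unadjusted p-values.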

Exercise 1

Extra:

Resources