Differential Abundance

Overview

Differential Abundance (DA) analysis identifies taxa that are significantly more or less abundant in one condition (e.g. a patient group) compared with a control group.

Many methods are available, including:

ALDEx2
ANCOMBC
LinDA

A few things to keep in mind when performing DA analysis (DAA):

DAA methods normally take the raw counts assay as input, because they apply their own normalisation suited to count data
DAA results are more reproducible if extremely rare taxa and singletons are removed beforehand
It is recommended to run several methods on the same data and compare their results
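The second point can be illustrated with a small base-R sketch on a hypothetical count table. It assumes a singleton is a taxon with a total count of 1 across all samples and uses an arbitrary 50% prevalence cut-off; `filter_rare()` is a hypothetical helper, not part of mia. The filtered table still holds raw counts, so it remains a valid input for count-based DA methods.

```r
# Toy count table: rows are taxa, columns are samples (hypothetical data)
counts <- matrix(
  c(120,  95, 130, 110,   # abundant, present in every sample
      0,   1,   0,   0,   # singleton: a single read in the whole dataset
      3,   0,   0,   0,   # rare: present in only 1 of 4 samples
     40,   0,  55,  38),  # moderately prevalent
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("taxon_", 1:4), paste0("sample_", 1:4))
)

# Hypothetical helper: drop singletons, then keep taxa detected
# (count > 0) in at least `min_prevalence` of the samples
filter_rare <- function(counts, min_prevalence = 0.5) {
  not_singleton <- rowSums(counts) > 1
  prevalence <- rowMeans(counts > 0)
  counts[not_singleton & prevalence >= min_prevalence, , drop = FALSE]
}

filtered <- filter_rare(counts)
rownames(filtered)
# [1] "taxon_1" "taxon_4"
```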

Example 1.1: Preparing for DA

First, we import the Tengeler2020 dataset and load the required packages, including the DA package MicrobiomeStat.

library(mia)
library(MicrobiomeStat)
library(tidyverse)

# Import Tengeler2020
data("Tengeler2020", package = "mia")
tse <- Tengeler2020

mean_abund <- round(mean(rowMeans(assay(tse, "counts"))), 2)
paste0("Taxa: ", nrow(tse), ", Mean abundance: ", mean_abund)

[1] "Taxa: 151, Mean abundance: 119.19"
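The "Mean abundance" printed above is the average count per taxon per sample: averaging the per-taxon means gives the same value as dividing the total count by the number of taxon-sample cells. A quick check on a hypothetical toy matrix:

```r
# Toy count table (hypothetical): 3 taxa x 4 samples
m <- matrix(c(10, 20, 30, 40,
               0,  0,  5,  3,
               7,  1,  0,  2), nrow = 3, byrow = TRUE)

# Mean abundance as computed above: average of the per-taxon mean counts
mean_of_row_means <- mean(rowMeans(m))

# Same value: total count divided by the number of taxon-sample cells
grand_mean <- sum(m) / length(m)

all.equal(mean_of_row_means, grand_mean)
# [1] TRUE
```

Because agglomeration pools the counts of several taxa into a single row, this statistic is expected to rise once the data are collapsed to fewer rows.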

For DA analysis, it is preferable to reduce the dimensionality and sparsity of the data.

# Agglomerate by Genus and filter by prevalence and detection
tse_genus <- agglomerateByPrevalence(tse,
                                     rank = "Genus",
                                     detection = 0.001,
                                     prevalence = 0.1)

mean_abund_genus <- round(mean(rowMeans(assay(tse_genus, "counts"))), 2)
paste0("Taxa: ", nrow(tse_genus), ", Mean abundance: ", mean_abund_genus)

[1] "Taxa: 49, Mean abundance: 355.52"
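The agglomeration and filtering step can be sketched in base R to show what the thresholds mean. This is not mia's implementation: the toy table, the genus labels, applying `detection` to relative abundances, and the use of strict inequalities are all assumptions made for illustration.

```r
# Toy species-level count table with a genus label per row (hypothetical)
counts <- matrix(
  c(5000, 3000, 4500, 6000,
     200,  100,    0,  150,
       0,    0,    1,    0,
      80,   90,   70,    0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("sp1", "sp2", "sp3", "sp4"), paste0("S", 1:4))
)
genus <- c("GenusA", "GenusA", "GenusB", "GenusC")

# 1. Agglomerate: sum the rows that share a genus label
genus_counts <- rowsum(counts, group = genus)

# 2. Prevalence: fraction of samples in which a genus exceeds the
#    detection threshold (applied here to relative abundance)
rel_abund <- sweep(genus_counts, 2, colSums(genus_counts), "/")
detection <- 0.001
prevalence <- rowMeans(rel_abund > detection)

# 3. Keep genera whose prevalence exceeds the prevalence threshold
genus_filtered <- genus_counts[prevalence > 0.1, , drop = FALSE]
rownames(genus_filtered)
# [1] "GenusA" "GenusC"
```

GenusB is dropped because its single read never exceeds 0.1% of a sample's total, so its prevalence is 0.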

Example 1.2: Performing DA

Here, we run LinDA. We first extract the counts assay and convert it into a data frame.

otu.tab <- assay(tse_genus, "counts") |>
  as.data.frame()

We also need to select the colData columns that contain the independent variables to include in the model.

meta <- colData(tse_genus) |>
  as.data.frame() |>
  select(patient_status, cohort)
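The variables selected in `meta` enter the model through a formula, which R expands into a design matrix. The sketch below uses hypothetical metadata with the same column names and base R's `model.matrix()`; it assumes `patient_status` has the levels ADHD and Control, so that under R's default treatment contrasts ADHD becomes the reference level. The cohort labels are invented for illustration.

```r
# Hypothetical sample metadata with the same two variables as `meta`
toy_meta <- data.frame(
  patient_status = factor(c("ADHD", "ADHD", "Control", "Control")),
  cohort = factor(c("C1", "C2", "C1", "C2"))
)

# Expand the formula into the design matrix a linear model would use.
# Non-reference levels become indicator columns named <variable><level>:
# "(Intercept)", "patient_statusControl", "cohortC2"
design <- model.matrix(~ patient_status + cohort, data = toy_meta)
colnames(design)
```

This naming convention is why the results for patient status are later accessed as `res$output$patient_statusControl`.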

We are now ready to run LinDA, which takes the count table (otu.tab) and the sample metadata (meta) as input. The model formula should contain the main independent variable followed by any covariates, and feature.dat.type = "count" tells LinDA that the input consists of raw counts. The other arguments are optional but good to know (see ?linda).

res <- linda(otu.tab, meta,
             formula = "~ patient_status + cohort", 
             feature.dat.type = "count")

0  features are filtered!
The filtered data has  27  samples and  49  features will be tested!
Imputation approach is used.
Fit linear models ...
Completed.
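As a rough illustration of the idea behind LinDA, the sketch below simulates hypothetical counts, CLR-transforms them with a pseudo-count, fits one linear model per taxon, and then subtracts the mode of the coefficients to correct for compositional bias. It is a simplified sketch, not MicrobiomeStat's code: zero imputation, winsorisation, and the computation of test statistics and p-values are omitted, and the simulation settings are arbitrary.

```r
set.seed(1)
n_taxa <- 30
n_samples <- 20
group <- factor(rep(c("ADHD", "Control"), each = n_samples / 2))

# Hypothetical counts: taxa 1-3 are 4 times more abundant in Control
counts <- matrix(rpois(n_taxa * n_samples, lambda = 50), nrow = n_taxa)
counts[1:3, group == "Control"] <- counts[1:3, group == "Control"] * 4

# 1. CLR transform (log2 scale) after adding a pseudo-count of 0.5
log_counts <- log2(counts + 0.5)
clr <- sweep(log_counts, 2, colMeans(log_counts), "-")

# 2. One ordinary least-squares fit per taxon: clr ~ group
design <- model.matrix(~ group)
coefs <- qr.coef(qr(design), t(clr))  # 2 x n_taxa coefficient matrix
beta <- coefs[2, ]                    # log2 fold change, Control vs ADHD

# 3. CLR coefficients share a common compositional shift (they sum to
#    zero across taxa); estimate it as the mode of their density and remove it
d <- density(beta)
bias <- d$x[which.max(d$y)]
beta_corrected <- beta - bias

round(beta_corrected[1:5], 2)
```

In this simulation the corrected coefficients of the first three taxa should sit near log2(4) = 2 and the others near 0, whereas the uncorrected coefficients are all shifted downward.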

Example 1.3: Interpreting Results

Finally, we select the significantly DA taxa and list them in Table 1.

signif_res <- res$output$patient_statusControl |>
  filter(reject) |>
  select(stat, padj) |>
  arrange(padj)

knitr::kable(signif_res)

Table 1: DA bacterial genera. If stat > 0, abundance is higher in control, otherwise it is higher in ADHD.

Genus	stat	padj
[Ruminococcus]_gauvreauii_group	4.891159	0.0024419
Faecalibacterium	-4.694520	0.0024419
Catabacter	-3.616601	0.0236808
Erysipelatoclostridium	3.357042	0.0334163
Ruminococcaceae_UCG-014	-3.224143	0.0368033
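The `padj` column contains p-values adjusted for multiple testing, and `reject` flags the taxa declared significant. A minimal sketch of that logic on hypothetical p-values, assuming a Benjamini-Hochberg adjustment and a 0.05 significance level (check `?linda` for the defaults of your installed version):

```r
# Hypothetical raw p-values for five taxa
pvalues <- c(taxon_1 = 0.0004, taxon_2 = 0.008, taxon_3 = 0.02,
             taxon_4 = 0.045, taxon_5 = 0.5)

# Benjamini-Hochberg adjustment controls the false discovery rate
padj <- p.adjust(pvalues, method = "BH")

# Flag taxa whose adjusted p-value falls below the significance level
alpha <- 0.05
reject <- padj < alpha
names(which(reject))
# [1] "taxon_1" "taxon_2" "taxon_3"
```

Here taxon_4 has a raw p-value below 0.05 but is not flagged, because its adjusted p-value (0.05625) accounts for the five tests performed.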

Exercise 1

DA analysis with LinDA: exercise 8.2
DA analysis with ALDEx2: exercise 8.1

Extra:

Comparing DA methods: exercise 8.3

Resources

OMA Chapter - Differential Abundance