Differential Abundance

Overview

Differential abundance analysis (DAA) identifies taxa that are significantly more or less abundant in one condition than in a control group. For more details, read this chapter of the OMA book.

Many methods are available, including:

  • ALDEx2

  • ANCOMBC

  • LinDA

A few things to keep in mind when performing DAA:

  • DAA software normally takes the counts assay as input, because it applies a normalisation suited to count data

  • DAA results will be more reproducible if extremely rare taxa and singletons are removed in advance

  • It is recommended to run different methods on the same data and compare the results
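
As a minimal sketch of how results from two methods could be compared, the base R snippet below intersects two sets of significant genera. The genus names and the two result vectors are hypothetical placeholders, not output from any real method.

```r
# Hypothetical sets of significant genera from two DAA methods (made-up names).
signif_method_a <- c("genus_1", "genus_2", "genus_3")
signif_method_b <- c("genus_2", "genus_3", "genus_4")

# Genera flagged by both methods.
shared <- intersect(signif_method_a, signif_method_b)

# Genera flagged by only one of the two methods.
only_a <- setdiff(signif_method_a, signif_method_b)
only_b <- setdiff(signif_method_b, signif_method_a)

# Jaccard index: 1 means identical sets, 0 means no overlap.
jaccard <- length(shared) / length(union(signif_method_a, signif_method_b))
```

Genera that appear in the shared set are arguably the more robust candidates, while those found by a single method may deserve a closer look.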

Preparing for DAA

Before performing DAA, it is important to agglomerate to a meaningful taxonomic rank and select only taxa above a certain prevalence and detection threshold, as this has been shown to make results more reproducible.

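library(mia)             # Tengeler2020, agglomerateByPrevalence()
library(dplyr)           # select(), filter(), arrange()
library(MicrobiomeStat)  # linda()

data("Tengeler2020", package = "mia")
tse <- Tengeler2020
# Agglomerate to genus level and keep only sufficiently prevalent genera
tse_genus <- agglomerateByPrevalence(tse,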
                                     rank = "Genus",
                                     detection = 0.001,
                                     prevalence = 0.1)
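
To illustrate what the detection and prevalence thresholds mean, here is a minimal base R sketch on a made-up count matrix. It assumes the detection threshold is applied to relative abundances; this is an assumption for illustration and may not match exactly how mia applies it. The toy thresholds are also chosen for readability and differ from the ones used above.

```r
# Toy count matrix: rows are taxa, columns are samples (made-up numbers).
counts <- matrix(
  c(50, 40, 60, 55,
     0,  0,  1,  0,
    10,  0, 20, 15),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("taxonA", "taxonB", "taxonC"), paste0("s", 1:4))
)

# Relative abundance per sample (each column sums to 1).
rel <- sweep(counts, 2, colSums(counts), "/")

# Prevalence: fraction of samples in which a taxon exceeds the detection threshold.
detection <- 0.05
prevalence <- rowMeans(rel > detection)

# Keep taxa detected in more than half of the samples.
keep <- names(prevalence)[prevalence > 0.5]
```

In this toy example, taxonB is a near-singleton and gets dropped, while taxonA and taxonC are retained.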

Performing DAA

For this tutorial, we run the LinDA method. We first extract the counts assay and convert it into a data frame.

otu.tab <- assay(tse_genus, "counts") |>
  as.data.frame()

We also select the colData columns that contain the independent variables to include in the model.

meta <- colData(tse_genus) |>
  as.data.frame() |>
  select(patient_status, cohort)
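
Here both tables come from the same object, so the samples should already line up. In the general case, a quick sanity check that the columns of the count table match the rows of the metadata can prevent silent mismatches. The sketch below uses made-up toy tables (otu_toy and meta_toy are hypothetical names) and is not part of the LinDA workflow itself.

```r
# Toy count table: taxa in rows, samples in columns (made-up numbers).
otu_toy <- data.frame(s1 = c(5, 0), s2 = c(3, 2), s3 = c(0, 7),
                      row.names = c("taxonA", "taxonB"))

# Toy metadata with samples in a different order.
meta_toy <- data.frame(group = c("Control", "ADHD", "Control"),
                       row.names = c("s3", "s1", "s2"))

# Reorder metadata rows to follow the column order of the count table.
meta_toy <- meta_toy[colnames(otu_toy), , drop = FALSE]

# Stop early if the sample names still disagree.
stopifnot(identical(colnames(otu_toy), rownames(meta_toy)))
```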

We are ready to run LinDA, which takes the count table (otu.tab) and the sample metadata (meta). The model formula should name the main independent variable followed by any covariates. The remaining arguments are optional but worth knowing.

res <- linda(otu.tab,
             meta,
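             formula = "~ patient_status + cohort",  # main variable + covariate
             alpha = 0.05,                            # significance level for adjusted p-values
             prev.filter = 0,                         # no additional prevalence filtering
             mean.abund.filter = 0,                   # no additional mean abundance filtering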
             feature.dat.type = "count")
0  features are filtered!
The filtered data has  27  samples and  49  features will be tested!
Imputation approach is used.
Fit linear models ...
Completed.
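
The results are indexed by coefficient name, such as patient_statusControl. These names presumably follow R's standard factor coding, where the first factor level acts as the reference. The base R sketch below shows how a formula like ours expands into coefficient names; the toy metadata (meta_toy) and its cohort labels are made up for illustration.

```r
# Toy metadata with the same variable names as in the tutorial (made-up values).
meta_toy <- data.frame(
  patient_status = factor(c("ADHD", "ADHD", "Control", "Control")),
  cohort = factor(c("Cohort_1", "Cohort_2", "Cohort_1", "Cohort_2"))
)

# Expand the formula into a design matrix using default treatment contrasts.
design <- model.matrix(~ patient_status + cohort, data = meta_toy)

# Non-reference levels become coefficients named <variable><level>.
coef_names <- colnames(design)
```

With this coding, ADHD is the reference level of patient_status, so a positive coefficient for patient_statusControl means higher abundance in control.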

Interpreting results

Finally, we select the significantly differentially abundant (DA) taxa and list them in Table 1.

signif_res <- res$output$patient_statusControl |>
  filter(reject) |>
  select(stat, padj) |>
  arrange(padj)

knitr::kable(signif_res)
Table 1: DA bacterial genera. If stat > 0, abundance is higher in control, otherwise it is higher in ADHD.

Genus                              stat        padj
Faecalibacterium                   -4.694520   0.0024419
[Ruminococcus]_gauvreauii_group     4.891159   0.0024419
Catabacter                         -3.616601   0.0236808
Erysipelatoclostridium              3.357042   0.0334163
Ruminococcaceae_UCG-014            -3.224143   0.0368033
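
The padj column holds p-values adjusted for multiple testing, and the reject flag marks taxa whose adjusted p-value passes the alpha level. As a minimal sketch of that logic, the snippet below applies a Benjamini-Hochberg correction to made-up p-values. We assume here that Benjamini-Hochberg is the adjustment in use; this tutorial does not state which method linda applies.

```r
# Made-up raw p-values for five hypothetical taxa.
pvalue <- c(0.001, 0.01, 0.02, 0.2, 0.5)

# Benjamini-Hochberg adjustment (assumed method, see the note above).
padj <- p.adjust(pvalue, method = "BH")

# Flag taxa whose adjusted p-value passes the significance level.
alpha <- 0.05
reject <- padj <= alpha
```

Only the taxa with reject equal to TRUE would end up in a table like Table 1.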

Good job reading this tutorial. Now go to this chapter of the OMA book and try out other DAA methods on Tengeler2020.