library(mia)  # provides the example data and the SummarizedExperiment helpers used below
data("Tengeler2020", package = "mia")
tse <- Tengeler2020
Differential Abundance
Overview
Differential Abundance Analysis (DAA) is used to identify taxa that are significantly more or less abundant in one condition than in a control. For more details, see the corresponding chapter of the OMA (Orchestrating Microbiome Analysis) book.
Many methods are available, including:
ALDEx2
ANCOMBC
LinDA
A few things to keep in mind when performing DAA:
DAA tools normally take the counts assay as input, because they apply their own normalisation suited to count data
DAA results tend to be more reproducible if extremely rare taxa and singletons are removed in advance
It is recommended to run several methods on the same data and compare the results
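One simple way to compare methods is to look at which taxa are flagged by all of them and how many methods flag each taxon. The sketch below uses base R only; the method names and taxon names are placeholders invented for illustration, not results from any real analysis.

```r
# Hypothetical sets of significant taxa reported by three methods
# (placeholder names, not real results).
hits <- list(
  method_A = c("taxon_1", "taxon_2", "taxon_3"),
  method_B = c("taxon_2", "taxon_3", "taxon_4"),
  method_C = c("taxon_2", "taxon_5")
)

# Taxa flagged by every method
consensus <- Reduce(intersect, hits)

# Number of methods that flagged each taxon, most frequent first
votes <- sort(table(unlist(hits)), decreasing = TRUE)

print(consensus)
print(votes)
```

Taxa that appear in the consensus set, or that collect the most votes, are usually the safest candidates to report.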
Preparing for DAA
Before performing DAA, it is important to agglomerate to a meaningful taxonomic rank and select only taxa above a certain prevalence and detection threshold, as this has been shown to make results more reproducible.
tse_genus <- agglomerateByPrevalence(tse,
                                     rank = "Genus",
                                     detection = 0.001,
                                     prevalence = 0.1)
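To make the two thresholds concrete, here is a minimal base R sketch of a prevalence filter on a toy matrix. It is not mia's implementation: the counts are made up, the thresholds are applied to relative abundances purely for illustration, and the prevalence cut-off of 0.5 is chosen only so that the toy example filters something. Which assay and scale mia applies the thresholds to depends on the function's arguments, so check its documentation.

```r
# Toy count matrix: rows are taxa, columns are samples (made-up numbers).
counts <- matrix(
  c(120, 80, 95, 110,
      0,  0,  3,   0,
     10,  0, 25,  40,
      0,  1,  0,   0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("taxon_", 1:4), paste0("sample_", 1:4))
)

# Relative abundance of each taxon within each sample
rel <- sweep(counts, 2, colSums(counts), "/")

detection  <- 0.001  # a taxon counts as "present" above this abundance
prevalence <- 0.5    # illustrative cut-off, not the value used above

# Prevalence = fraction of samples in which the taxon is detected
prev <- rowMeans(rel > detection)

# Keep only taxa detected in more than the required fraction of samples
filtered <- counts[prev > prevalence, , drop = FALSE]
print(prev)
print(rownames(filtered))
```

In this toy example the two taxa that are detected in a single sample are dropped, while the two widespread taxa are kept.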
Performing DAA
For this tutorial, we run the LinDA method. We first extract the counts assay and convert it into a data frame.
otu.tab <- assay(tse_genus, "counts") |>
  as.data.frame()
We also need to select the columns of the colData that contain the independent variables we want to include in the model.
library(dplyr)  # select(), and later filter() and arrange()
meta <- colData(tse) |>
  as.data.frame() |>
  select(patient_status, cohort)
We are ready to run LinDA, which takes the count table (otu.tab) and the sample metadata (meta). A model formula containing the main independent variable plus any covariates must be defined. The other arguments are optional but good to know.
library(MicrobiomeStat)  # provides linda()
res <- linda(otu.tab,
             meta,
             formula = "~ patient_status + cohort",
             alpha = 0.05,
             prev.filter = 0,
             mean.abund.filter = 0,
             feature.dat.type = "count")
0 features are filtered!
The filtered data has 27 samples and 49 features will be tested!
Imputation approach is used.
Fit linear models ...
Completed.
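When the results are inspected in the next step, they are accessed through a name such as patient_statusControl. That name appears to follow R's usual convention for design-matrix columns: the variable name followed by the non-reference factor level. The base R sketch below illustrates the convention with toy metadata; the rows and level names are assumptions made for illustration and are not taken from Tengeler2020.

```r
# Toy metadata with made-up rows; the level names are illustrative assumptions.
meta_toy <- data.frame(
  patient_status = factor(c("ADHD", "ADHD", "Control", "Control")),
  cohort = factor(c("Cohort_1", "Cohort_2", "Cohort_1", "Cohort_2"))
)

# Column names of the design matrix built from the same kind of formula
design <- model.matrix(~ patient_status + cohort, data = meta_toy)
coef_names <- colnames(design)
print(coef_names)

# Changing the reference level changes the coefficient name and flips
# the direction of the comparison.
meta_toy$patient_status <- relevel(meta_toy$patient_status, ref = "Control")
design_releveled <- model.matrix(~ patient_status + cohort, data = meta_toy)
print(colnames(design_releveled))
```

Under this convention, a coefficient called patient_statusControl compares Control against the reference level, so the sign of a statistic is read relative to that reference.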
Interpreting results
Finally, we select the significantly differentially abundant (DA) taxa and list them in Table 1.
signif_res <- res$output$patient_statusControl |>
  filter(reject) |>
  select(stat, padj) |>
  arrange(padj)

knitr::kable(signif_res)
Taxon | stat | padj |
---|---|---|
Faecalibacterium | -4.694520 | 0.0024419 |
[Ruminococcus]_gauvreauii_group | 4.891159 | 0.0024419 |
Catabacter | -3.616601 | 0.0236808 |
Erysipelatoclostridium | 3.357042 | 0.0334163 |
Ruminococcaceae_UCG-014 | -3.224143 | 0.0368033 |
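The padj column holds p-values adjusted for multiple testing, and the reject column used in the filter marks taxa that pass the alpha threshold. The base R sketch below shows how such a decision can be derived, assuming the Benjamini-Hochberg procedure and a "less than or equal to alpha" rule; both are assumptions about the method's defaults, so check the p.adj.method argument of linda() before relying on them. The raw p-values are made up.

```r
# Made-up raw p-values for five hypothetical taxa
p_raw <- c(taxon_1 = 0.001, taxon_2 = 0.008, taxon_3 = 0.039,
           taxon_4 = 0.041, taxon_5 = 0.20)
alpha <- 0.05

# Benjamini-Hochberg adjustment controls the false discovery rate
p_adj <- p.adjust(p_raw, method = "BH")

# A taxon is called differentially abundant if its adjusted p-value
# does not exceed alpha (assumed decision rule).
reject <- p_adj <= alpha

print(data.frame(p_raw, p_adj, reject))
```

Note that taxon_3 and taxon_4 have raw p-values below 0.05 but are no longer significant after adjustment, which is why the adjusted values, not the raw ones, should be reported.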
Good job reading this tutorial. Now go to the corresponding chapter of the OMA book and try out other DAA methods on Tengeler2020.