Differential abundance (DA) analysis, or DAA, identifies taxa that are significantly more or less abundant in a condition of interest than in the control.
Many methods are available, including:
ALDEx2
ANCOMBC
LinDA
A few things to keep in mind when performing DAA:
DAA software normally takes the counts assay as input, because it applies its own normalisation suited to count data
DAA results are more reproducible if extremely rare taxa and singletons are removed beforehand
It is recommended to run different methods on the same data and compare the results
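The point about rare taxa and singletons can be illustrated on a toy count matrix. The sketch below uses base R only; the data, the 20% prevalence cut-off and all object names are made up for illustration and are not taken from the Tengeler2020 analysis.

```r
# Hypothetical counts: taxa in rows, samples in columns
counts <- matrix(
  c(10, 12,  8, 15, 11,  9,   # taxon1: detected in every sample
     0,  0,  1,  0,  0,  0,   # taxon2: singleton (one read in total)
     0,  5,  0,  0,  0,  0,   # taxon3: detected in one sample only
     3,  0,  4,  6,  0,  2),  # taxon4: detected in four of six samples
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("taxon", 1:4), paste0("sample", 1:6))
)

# Prevalence: fraction of samples in which each taxon is detected
prevalence <- rowMeans(counts > 0)

# Singletons: taxa supported by a single read across all samples
is_singleton <- rowSums(counts) == 1

# Keep taxa above an (arbitrary) 20% prevalence cut-off that are not singletons
keep <- prevalence >= 0.2 & !is_singleton
filtered <- counts[keep, , drop = FALSE]

rownames(filtered)
```

The cut-off is a judgment call that depends on sample size and sequencing depth; the value used here is only a placeholder.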
First, we import Tengeler2020 and load the DA library MicrobiomeStat.
[1] "Taxa: 151, Mean abundance: 119.19"
For DA analysis, it is preferable to reduce the dimensionality and sparsity of the data.
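One common way to reduce dimensionality and sparsity is to sum the counts of features that belong to the same higher-rank taxon. The following is a minimal base R sketch on hypothetical data: the feature names, genus labels and counts are invented, and the original analysis may have used a different procedure, for instance a dedicated function from mia.

```r
# Hypothetical feature-level counts: features in rows, samples in columns
counts <- matrix(
  c(5, 0, 2,
    0, 1, 0,
    7, 3, 4,
    0, 0, 6),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("feature", 1:4), paste0("sample", 1:3))
)

# Hypothetical genus assignment for each feature
genus <- c("GenusA", "GenusA", "GenusB", "GenusB")

# Sum the rows that share a genus label
genus_counts <- rowsum(counts, group = genus)

# Compare the number of rows and the fraction of zero entries before and after
c(features = nrow(counts), genera = nrow(genus_counts))
c(before = mean(counts == 0), after = mean(genus_counts == 0))
```

Agglomeration trades taxonomic resolution for fewer, less sparse rows, so the rank to agglomerate at depends on the question being asked.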
Here, we run LinDA. We first extract the counts assay and convert it into a data frame.
We also need to select the columns of the colData that contain the independent variables to include in the model.
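The two extraction steps could look like the sketch below. It assumes that the mia package is installed and that the dataset exposes a "counts" assay and the colData columns patient_status and cohort; the choice of cohort as a covariate is an assumption made for illustration.

```r
library(mia)  # provides Tengeler2020 and attaches SummarizedExperiment

# Load the dataset shipped with mia
data("Tengeler2020", package = "mia")
tse <- Tengeler2020

# Counts assay as a data frame: taxa in rows, samples in columns
otu.tab <- as.data.frame(assay(tse, "counts"))

# Sample metadata restricted to the variables used in the model
# (the selected columns are an assumption for this sketch)
meta <- as.data.frame(colData(tse))[, c("patient_status", "cohort"), drop = FALSE]

# Sample identifiers must line up between the two tables
stopifnot(identical(colnames(otu.tab), rownames(meta)))
```

Checking that sample identifiers match before modelling is cheap and catches reordered or dropped samples early.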
We are now ready to run LinDA, which takes the count assay (otu.tab) and the table of sample variables (meta). A formula for the model, with the main independent variable plus any covariates, should be defined. The other arguments are optional but good to know.
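To show the shape of the call without depending on a particular dataset, the sketch below runs linda() from MicrobiomeStat on simulated counts. The simulated data, the covariate cohort and the argument values are assumptions chosen for illustration, so the output will not reproduce the results reported for Tengeler2020.

```r
library(MicrobiomeStat)

set.seed(42)
n_taxa <- 30
n_samples <- 40

# Simulated counts (hypothetical data): taxa in rows, samples in columns
otu.tab <- as.data.frame(matrix(
  rnbinom(n_taxa * n_samples, mu = 50, size = 1),
  nrow = n_taxa,
  dimnames = list(paste0("taxon", seq_len(n_taxa)),
                  paste0("sample", seq_len(n_samples)))
))

# Simulated sample metadata with a main variable and one covariate
meta <- data.frame(
  patient_status = factor(rep(c("ADHD", "Control"), each = n_samples / 2)),
  cohort = factor(rep(c("Cohort_1", "Cohort_2"), times = n_samples / 2)),
  row.names = colnames(otu.tab)
)

# Main independent variable plus covariate; filters disabled here because
# filtering is assumed to have been done beforehand
res <- linda(
  otu.tab, meta,
  formula = "~ patient_status + cohort",
  alpha = 0.05,            # significance level for the adjusted p-values
  prev.filter = 0,         # no additional prevalence filtering
  mean.abund.filter = 0    # no additional mean-abundance filtering
)

# One result table per model coefficient
names(res$output)
head(res$output$patient_statusControl)
```

Each element of res$output is named after a model coefficient, which is why the contrast for patient_status appears as patient_statusControl.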
Finally, we select the significantly DA taxa and list them in Table 1.
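library(dplyr)  # provides filter(), select() and arrange()

signif_res <- res$output$patient_statusControl |>
  filter(reject) |>       # keep taxa flagged as significant
  select(stat, padj) |>   # test statistic and adjusted p-value
  arrange(padj)           # most significant first

knitr::kable(signif_res)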
Taxon | stat | padj |
---|---|---|
[Ruminococcus]_gauvreauii_group | 4.891159 | 0.0024419 |
Faecalibacterium | -4.694520 | 0.0024419 |
Catabacter | -3.616601 | 0.0236808 |
Erysipelatoclostridium | 3.357042 | 0.0334163 |
Ruminococcaceae_UCG-014 | -3.224143 | 0.0368033 |
Extra: