Alpha Diversity
Overview
This notebook guides you through a basic alpha diversity analysis: you first estimate alpha diversity with a few indices, then plot them for the different study groups and compare the results across indices.
The following packages are needed to successfully run the examples in this notebook:
mia: tools for microbiome data analysis
scater: plotting data from TreeSummarizedExperiments
patchwork: combining plots
Estimation
First of all, we load the packages, import Tengeler2020 from the mia package and store it in the variable tse.
# load the packages and the dataset, and store the dataset in tse
library(mia)
library(scater)
library(patchwork)
data("Tengeler2020", package = "mia")
tse <- Tengeler2020
We calculate alpha diversity in terms of the coverage, Shannon, inverse Simpson and Faith indices based on the counts assay. The first three indices differ from one another in how much weight they give to rare taxa: coverage considers all taxa equally important, whereas Shannon and, even more so, inverse Simpson give more importance to abundant taxa. Unlike the others, the Faith index measures phylogenetic diversity and thus requires a phylogenetic tree (stored as rowTree in the TreeSE).
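To make the weighting of rare taxa concrete, the following base R sketch computes the Shannon and inverse Simpson indices by hand for two toy samples. The counts and the helper functions shannon() and inverse_simpson() are hypothetical and only illustrate the textbook formulas (with the natural logarithm assumed for Shannon); they are not the mia implementation.

```r
# Toy counts for two hypothetical samples with four taxa each
even   <- c(25, 25, 25, 25)  # all taxa equally abundant
uneven <- c(97, 1, 1, 1)     # one dominant taxon, three rare ones

# Shannon index: -sum(p * log(p)) over taxa with non-zero abundance
# (natural logarithm assumed)
shannon <- function(counts) {
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}

# Inverse Simpson index: 1 / sum(p^2)
inverse_simpson <- function(counts) {
  p <- counts / sum(counts)
  1 / sum(p^2)
}

shannon(even)            # equals log(4)
inverse_simpson(even)    # equals 4
shannon(uneven)          # much lower than for the even sample
inverse_simpson(uneven)  # close to 1
```

Both samples contain four observed taxa, yet both indices drop sharply for the uneven sample. The inverse Simpson index falls close to 1 because it is driven almost entirely by the dominant taxon.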
# estimate the specified indices based on the counts assay
tse <- estimateDiversity(tse,
                         assay.type = "counts",
                         index = c("coverage", "shannon",
                                   "inverse_simpson", "faith"))
Visualisation
Next, we plot the four indices, with patient status on the x axis and alpha diversity on the y axis. We can also colour by cohort to check for batch effects.
# plot coverage vs patient_status
p_coverage <- plotColData(tse, "coverage", "patient_status",
                          colour_by = "cohort", show_median = TRUE) +
  labs(x = "Patient Status")
# plot shannon diversity vs patient_status
p_shannon <- plotColData(tse, "shannon", "patient_status",
                         colour_by = "cohort", show_median = TRUE) +
  labs(x = "Patient Status")
# plot inverse simpson index vs patient_status
p_simpson <- plotColData(tse, "inverse_simpson", "patient_status",
                         colour_by = "cohort", show_median = TRUE) +
  labs(x = "Patient Status")
# plot faith index vs patient_status
p_faith <- plotColData(tse, "faith", "patient_status",
                       colour_by = "cohort", show_median = TRUE) +
  labs(x = "Patient Status")
The alpha diversity indices follow different scales, but they seem to agree when comparing the distributions of the two patient groups (see Figure 1).
# combine the plots with patchwork
p <- (p_coverage | p_shannon) / (p_simpson | p_faith) +
  plot_layout(guides = "collect") +
  plot_annotation(tag_levels = "A")
p
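One possible way to complement the visual comparison is a rank correlation between indices, since ranks are unaffected by the different scales. The sketch below uses made-up values for five samples (the vectors shannon_toy and inverse_simpson_toy are hypothetical and not taken from Tengeler2020) and only illustrates the idea.

```r
# Hypothetical index values for five samples (not taken from Tengeler2020)
shannon_toy         <- c(1.2, 2.5, 0.8, 3.1, 1.9)
inverse_simpson_toy <- c(2.1, 6.4, 1.5, 11.0, 3.8)

# Spearman correlation compares ranks, so differing scales do not matter
rho <- cor(shannon_toy, inverse_simpson_toy, method = "spearman")
rho  # 1 here, because both toy vectors order the samples identically
```

With the real data, the same call could presumably be applied to the index columns stored in colData(tse), for example colData(tse)$shannon and colData(tse)$inverse_simpson.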