Alpha Diversity

Overview

This notebook guides you through a basic alpha diversity analysis, where you first estimate alpha diversity in terms of a few indices, plot them for the different study groups and compare the results for the different indices.

The following packages are needed to succesfully run the examples in this notebook:

library(mia)
library(scater)

Example 1.1: Estimation

First of all, we import Tengeler2020 from the mia package and store it into a variable.

# load dataset and store it into tse
data("Tengeler2020", package = "mia")
tse <- Tengeler2020

We calculate alpha diversity in terms of coverage, Shannon, inverse Simpson and Faith indices based on the counts assay. The first three indices differ from one another in how much weight they give to rare taxa: coverage considers all taxa equally important, whereas Shannon and - even more - Simpson give more importance to abundant taxa. Unlike all others, Faith index measures the phylogenetic diversity and thus requires a phylogenetic tree (stored as rowTree in the TreeSE).

# estimate the specified indices based on the counts assay
tse <- estimateDiversity(tse,
                         assay.type = "counts",
                         index = "shannon")

Example 1.2: Visualisation

Next, we plot the four indices, with patient status on the x axis and alpha diversity on the y axis. We can also colour by cohort to check for batch effects.

# Plot shannon diversity vs patient_status
p_shannon <- plotColData(tse, "shannon", "patient_status",
                         colour_by = "cohort", show_median = TRUE) +
  xlab("Patient Status")

p_shannon

Example 1.3: Comparison

The three metrics for alpha diversity follow different scales, but they seem to agree when comparing the distributions of the two patient groups.

Show code
library(patchwork)

# Calculate diversity metrics
tse <- estimateDiversity(tse, assay.type = "counts",
                         index = c("coverage", "inverse_simpson", "faith"))

# Generate a plot for each metric
plots <- lapply(c("coverage", "shannon", "inverse_simpson", "faith"),
                plotColData, object = tse, x = "patient_status",
                colour_by = "cohort", show_median = TRUE)

# Combine plots
wrap_plots(plots) +
  plot_layout(guides = "collect") +
  plot_annotation(tag_levels = "A")

Exercise 1

Extra:

This two exercise could be explained in a second example.

Resources