6  Community Diversity

Community diversity is a central concept in microbiome research. A number of diversity indices are available in ecological literature.

The main categories of diversity indices include species richness, evenness, and diversity; each of them emphasizes different aspects of the community heterogeneity (Whittaker 1960), (Willis 2019). The Hill coefficient combines many standard indices into a single equation that provides observed richness, inverse Simpson, Shannon diversity, and generalized diversity as special cases, with varying levels of emphasis on species abundance values. Thus, the term alpha diversity is often used to refer collectively to all these variants.

Whittaker, R. H. 1960. “Vegetation of the Siskiyou Mountains, Oregon and California.” Ecological Monographs 30 (3): 279–338. https://doi.org/https://doi.org/10.2307/1943563.
Willis, Amy D. 2019. “Rarefaction, Alpha Diversity, and Statistics.” Frontiers in Microbiology 10. https://doi.org/10.3389/fmicb.2019.02407.
Faith, D. P. 1992. “Conservation Evaluation and Phylogenetic Diversity.” Biological Conservation 61 (1): 10. https://doi.org/https://doi.org/10.1016/0006-3207(92)91201-3.
W, Kembel Steven, Cowan Peter D, Helmus Matthew R, Cornwell William K, Morlon Helene, Ackerly David D, Blomberg Simon P, and Webb Campbell O. 2010. Picante: R tools for integrating phylogenies and ecology.” Bioinformatics 26 (11): 1463–64. https://doi.org/https://doi.org/10.1093/bioinformatics/btq166.

Diversity (estimateDiversity), summarizes the distribution of species abundances in a given sample into a single number that depends on both species richness and evenness (see below). Diversity indices measure the overall community heterogeneity that considers both of these aspects simultaneously. A number of ecological diversity measures are available. In general, diversity increases together with increasing richness and evenness. Phylogenetic diversity (PD), (Faith 1992) is a variant that incorporates information from phylogenetic relationships between species, unlike most other commonly used diversity indices. The estimateDiversity function uses a faster re-implementation of the widely used function in picante W et al. (2010). The method uses the default rowTree from the TreeSummarizedExperiment object (tse).

Richness (estimateRichness) refers to the total number of species in a community (sample). The simplest richness index is the number of species observed in a sample (observed richness). Assuming limited sampling from the community, however, this may underestimate the true species richness. Several estimators have been developed to address this, including for instance ACE (A and SM 1992) and Chao1 (A 1984) indices. Richness estimates do not aim to characterize variations in species abundances.

A, Chao, and Lee SM. 1992. “Estimating the Number of Classes via Sample Coverage.” Journal of the American Statistical Association 87 (417): 210–17. https://doi.org/https://doi.org/10.1080/01621459.1992.10475194.
A, Chao. 1984. “Non-Parametric Estimation of the Number of Classes in a Population.” Scandinavian Journal of Statistics 11 (4): 265–70. https://www.jstor.org/stable/4615964.

Evenness (estimateEvenness) focuses on the distribution of species abundances, and it can thus complement the number of species. Pielou’s evenness is a commonly used index, obtained by normalizing Shannon diversity by (the natural logarithm of) observed richness.

These main classes of alpha diversity are sometimes complemented by indices of dominance or rarity:

Dominance (estimateDominance) indices are in general negatively correlated with alpha diversity. A high dominance is obtained when one or few species have a high share of the total species abundance in the community. Note that dominance indices are generally inversely correlated with other alpha diversity indices.

Rarity (estimateRarity) indices characterize the concentration of species at low abundance. Prevalence and detection thresholds determine rare species whose total concentration will determine the value of a rarity index.

6.1 Alpha diversity estimation in practice

Alpha diversity can be estimated with wrapper functions that interact with other packages implementing the calculation, such as vegan (Oksanen et al. 2020).

Oksanen, Jari, F. Guillaume Blanchet, Michael Friendly, Roeland Kindt, Pierre Legendre, Dan McGlinn, Peter R. Minchin, et al. 2020. Vegan: Community Ecology Package. https://CRAN.R-project.org/package=vegan.

These functions calculate the given indices, and add them to the colData slot of the SummarizedExperiment object with the given name.

The estimated values can then be retrieved and analyzed directly from the colData, for example by plotting them using plotColData from the scater package (McCarthy et al. 2020). Here, we use the observed species as a measure of richness.

McCarthy, Davis, Kieran Campbell, Aaron Lun, and Quin Wills. 2020. Scater: Single-Cell Analysis Toolkit for Gene Expression Data in r. http://bioconductor.org/packages/scater/.
# Let us first load some example data.
library(mia)
data("GlobalPatterns", package="mia")
tse <- GlobalPatterns

# Estimate (observed) richness
tse <- mia::estimateRichness(tse, 
                             assay.type = "counts", 
                             index = "observed", 
                             name="observed")

# Check some of the first values in colData
head(tse$observed)
##      CL3     CC1     SV1 M31Fcsw M11Fcsw M31Plmr 
##     6964    7679    5729    2667    2574    3214

Let us visualize results against selected colData variables (sample type and final barcode).

library(scater)
plotColData(tse, 
            "observed", 
            "SampleType", 
            colour_by = "Final_Barcode") +
    theme(axis.text.x = element_text(angle=45,hjust=1)) + 
  labs(y=expression(Richness[Observed]))

Shannon diversity estimates plotted grouped by sample type with colour-labeled barcode.

6.2 Visualizing alpha diversities

6.2.1 Comparing alpha diversity measures

Let us now compare the estimated Shannon and Faith indices across samples. First we add Shannon and Faith indices.

tse <- mia::estimateDiversity(tse, 
                              assay.type = "counts",
                              index = c("shannon", "faith"), 
                              name = c("shannon", "faith"))

Scatterplot for the estimated Shannon and Faith indices across samples.

ggplot(colData(tse), aes(x=shannon, y=faith)) +
  geom_point() +
  labs(x="Shannon index", y="Faith (phylogenetic) index")

6.2.2 Alpha diversity measures and sample grouping

Let us visualize results from all alpha diversity measures calculated above against a given sample grouping available in colData (here, sample type). These have been readily stored in the colData slot, and they are thus directly available for plotting.

library(patchwork)

# Create the plots
plots <- lapply(c("observed", "shannon", "faith"),
                plotColData,
                object = tse,
                x = "SampleType",
                colour_by = "SampleType")

# Fine-tune visual appearance
plots <- lapply(plots, "+", 
                theme(axis.text.x = element_blank(),
                      axis.title.x = element_blank(),
                      axis.ticks.x = element_blank()))

# Plot the figures
(plots[[1]] | plots[[2]] | plots[[3]]) +
  plot_layout(guides = "collect")

6.2.3 Visualizing significance in group-wise comparisons

Let us next compare Shannon index between sample groups using the standard ggplot tools, and illustrate individual data points with geom_jitter.

The geom_signif function provides tools to test whether these differences are statistically significant; the function adds (adjusted) p-values in the plot.

library(ggsignif)
library(ggplot2)

# Determine the different combinations for significance testing
comb <- split(t(combn(levels(tse$SampleType), 2)), 
           seq(nrow(t(combn(levels(tse$SampleType), 2)))))

ggplot(colData(tse), aes(x = SampleType, y = shannon)) +
  # Outliers are removed, because otherwise each 
  # data point would be plotted twice; 
  # as an outlier of boxplot and as a point of dotplot.
  geom_boxplot(outlier.shape = NA) + 
  geom_jitter(width = 0.2) + 
  geom_signif(comparisons = comb,
              map_signif_level = FALSE,
              correction="fdr") + #corrects the p-values
  theme(text = element_text(size = 10))

The ggpubr package provides further flexibility for estimating and highlighting the significances.

Back to top