Chapter 6 Beta diversity
Beta diversity is another name for sample dissimilarity. It quantifies differences in the overall taxonomic composition between two samples.
Common indices include Bray-Curtis, Unifrac, Jaccard index, and the Aitchison distance. For more background information and examples, you can check the dedicated section in online book.
6.1 Examples of PCoA with different settings
Beta diversity estimation generates a (dis)similarity matrix that contains for each sample (rows) the dissimilarity to any other sample (columns).
This complex set of pairwise relations can be visualized in informative ways, and even coupled with other explanatory variables. As a first step, we compress the information to a lower dimensionality, or fewer principal components, and then visualize sample similarity based on that using ordination techniques, such as Principal Coordinate Analysis (PCoA). PCoA is a non-linear dimension reduction technique, and with Euclidean distances it is is identical to the linear PCA (except for potential scaling).
We typically retain just the two (or three) most informative top components, and ignore the other information. Each sample has a score on each of these components, and each component measures the variation across a set of correlated taxa. The top components are then easily visualized on a two (or three) dimensional display.
Let us next look at some concrete examples.
6.1.1 PCoA for ASV-level data with Bray-Curtis
Let us start with PCoA based on a Bray-Curtis dissimilarity matrix.
# Pick the relative abundance table
<- assays(mae[[1]])$relabundance
rel_abund_assay
# Calculates Bray-Curtis distances between samples. Because taxa is in
# columns, it is used to compare different samples. We transpose the
# assay to get taxa to columns
<- vegan::vegdist(t(rel_abund_assay), method = "bray")
bray_curtis_dist
# PCoA
<- ecodist::pco(bray_curtis_dist)
bray_curtis_pcoa
# All components could be found here:
# bray_curtis_pcoa$vectors
# But we only need the first two to demonstrate what we can do:
<- data.frame(pcoa1 = bray_curtis_pcoa$vectors[,1],
bray_curtis_pcoa_df pcoa2 = bray_curtis_pcoa$vectors[,2])
# Create a plot
<- ggplot(data = bray_curtis_pcoa_df, aes(x=pcoa1, y=pcoa2)) +
bray_curtis_plot geom_point() +
labs(x = "PC1",
y = "PC2",
title = "Bray-Curtis PCoA") +
theme_bw(12) # makes titles smaller
bray_curtis_plot
6.2 Highlighting external variables
We can map other variables on the same plot for example by coloring the points accordingly.
The following is an example with a discrete grouping variable (Diet) shown with colors:
# Add diet information to data.frame
$Diet <- colData(mae)$Diet
bray_curtis_pcoa_df
# Creates a plot
<- ggplot(data = bray_curtis_pcoa_df, aes_string(x = "pcoa1", y = "pcoa2", color = "Diet")) +
plot geom_point() +
labs(x = "PC1",
y = "PC2",
title = "Bray-Curtis PCoA") +
theme_bw(12)
plot
6.3 Estimating associations with an external variable
Next to visualizing whether any variable is associated with differences between samples, we can also quantify the strength of the association between community composition (beta diversity) and external factors.
The standard way to do this is to perform a so-called permutational multivariate analysis of variance (PERMANOVA). This method takes as input the abundance table, which measure of distance you want to base the test on and a formula that tells the model how you think the variables are associated with each other.
# First we get the relative abundance table
<- assays(mae[[1]])$relabundance
rel_abund_assay
# again transpose it to get taxa to columns
<- t(rel_abund_assay)
rel_abund_assay
# then we can perform the method
<- vegan::adonis(rel_abund_assay ~ Diet,
permanova_diet data = colData(mae),
permutations = 99)
# we can obtain a the p value for our predictor:
print(paste0("The test result p-value: ",
as.data.frame(permanova_diet$aov.tab)["Diet", "Pr(>F)"]))
## [1] "The test result p-value: 0.01"
The diet variable is significantly associated with microbiota composition (p-value is less than 0.05).
We can visualize those taxa whose abundances drive the differences between diets. We first need to extract the model coefficients of taxa:
# Gets the coefficients
<- coefficients(permanova_diet)["Diet1",]
coef
# Gets the highest coefficients
<- sort(head(coef[rev(order(abs(coef)))],20))
top.coef
# Plots the coefficients
<- ggplot(data.frame(x = top.coef,
top_taxa_coeffient_plot y = factor(names(top.coef),
unique(names(top.coef)))),
aes(x = x, y = y)) +
geom_bar(stat="identity") +
labs(x="", y="", title="Top Taxa") +
theme_bw()
top_taxa_coeffient_plot
The above plot shows taxa as code names, and it is hard to tell which bacterial groups they represent. However, it is easy to add human readable names. We can fetch those from our rowData. Here we use Genus level names:
# Gets corresponding Genus level names and stores them to top.coef
<- rowData(mae[[1]])[names(top.coef), ][,"Genus"]
names
# Adds new labels to the plot
<- top_taxa_coeffient_plot +
top_taxa_coeffient_plot scale_y_discrete(labels = names) # Adds new labels
top_taxa_coeffient_plot
The same test can be conducted using the ordination from PCoA as follows:
$Diet <- colData(mae)$Diet
bray_curtis_pcoa_df<- list()
p_values for(pc in c("pcoa1", "pcoa2")){
# Creates a formula from objects
<- as.formula(paste0(pc, " ~ ", "Diet"))
formula # Does the permanova analysis
<- vegan::adonis(formula, data = bray_curtis_pcoa_df,
p_values[[pc]] permutations = 9999, method = "euclidean"
$aov.tab["Diet", "Pr(>F)"]
)
}
# Creates a plot
<- ggplot(data = bray_curtis_pcoa_df, aes_string(x = "pcoa1", y = "pcoa2", color = "Diet")) +
plot geom_point(size = 3) +
labs(title = paste0("PCoA beta diversity ordination for microbiome samples"), x = paste0("PC1 (p = ", p_values[["pcoa1"]], ")"), y = paste0("PC2 (p = ", p_values[["pcoa2"]], ")")) +
theme_bw()
plot
There are many alternative and complementary methods for analysing community composition. For more examples, see a dedicated section on beta diversity in the online book.
6.4 Community typing
A dedicated section presenting examples on community typing is in the online book.