Principal Coordinate Analysis
Why PCoA?
Principal Coordinate Analysis (PCoA) is a method to study beta diversity, or the diversity between microbial communities from different samples.
Example 1.1
To get started, we import Tengeler2020 from mia and store it into a variable.
# Load libraries and import dataset
library(mia)
library(scater)
data("Tengeler2020", package = "mia")
tse <- Tengeler2020
After that, we transform the counts assay to relative abundances and store the new assay back into the TreeSE.
# Transform counts to relative abundance
tse <- transformAssay(tse, method = "relabundance")
Example 1.2: Ordination
Here, we run PCoA on the relative abundance assay to reduce the dimensionality of the data. We set method
to Bray-Curtis dissimilarity.
# Run PCoA with Bray-Curtis dissimilarity
tse <- runMDS(tse,
assay.type = "relabundance",
FUN = vegan::vegdist,
method = "bray",
name = "Bray")
The reduced dimensions can be retrieved with reducedDim
.
head(reducedDim(tse, "Bray"))
[,1] [,2]
A110 0.35648020 0.11979800
A12 -0.01837414 -0.35553853
A15 0.14308108 0.36730630
A19 0.03031075 -0.34670413
A21 0.07539063 -0.31823525
A23 0.17993859 0.09476674
Example 1.2: Visualisation
Then, we visualise the first two dimensions.
# Visualise Bray-Curtis dissimilarity between samples
plotReducedDim(tse, "Bray", colour_by = "patient_status", shape_by = "cohort")
Example 2.1: PCA
When Euclidean distance is used, PCoA reduces to PCA.
tse <- runPCA(tse,
assay.type = "relabundance",
ncomp = 5,
name = "Euclidean")
We can retrieve a list of all reduced dimensions with reducedDimNames
.
If you need only their names, these can be accessed with reducedDims
.
List of length 2
names(2): Bray Euclidean
Example 2.2: PCA Contributors
Some taxa contribute more than others to the generation of reduced dimensions. They can be determined from the PCA loadings.
Show code
# Define function to visualise top contributors
plot_loads <- function(loads, comp, n = 6) {
loads <- sort(loads[ , comp], decreasing = TRUE)
loads <- c(head(loads, round(n / 2)), tail(loads, round(n / 2)))
barplot(loads, horiz = TRUE, las = 1,
cex.names = 0.5, main = comp, xlab = "Loadings")
}
The top PCA loadings for the first and fourth dimension are visualised below.
Show code
# Fetch feature loadings
taxa_loads <- attr(reducedDim(tse, "Euclidean"), "rotation")
# Plot loadings for PCA 1 and 4
par(mfrow = c(1, 2))
plot_loads(taxa_loads, "PC1")
plot_loads(taxa_loads, "PC4")
Example 2.3: PCoA Contributors
PCoA does not return feature loadings, but they can be estimated as the correlation between taxa and reduced dimensions.
Show code
# Compute correlation between features and reduced dimensions
comp_loads <- apply(
assay(tse, "relabundance"),
MARGIN = 1, simplify = FALSE,
function(x) cor(x, reducedDim(tse, "Bray"), method = "kendall")
)
# Prepare matrix of feature loadings
taxa_loads <- do.call(rbind, comp_loads)
colnames(taxa_loads) <- paste0("Bray", seq(ncol(taxa_loads)))
rownames(taxa_loads) <- rownames(tse)
The top PCoA loadings for the first two dimensions are visualised below.
Show code
# Visualise loadings for PCoA 1 and 2
par(mfrow = c(1, 2))
plot_loads(taxa_loads, "Bray1")
plot_loads(taxa_loads, "Bray2")
Exercise 2
Extra:
Find the top 5 contributor taxa for principal component 1.
Example 3.1: Other Distances
A different distance function can be specified with FUN
, such as phylogenetic distance.
# Run PCoA with Unifrac distance
tse <- runMDS(tse, assay.type = "counts", name = "Unifrac", ncomp = 3,
FUN = mia::calculateUnifrac, tree = rowTree(tse))
The number of dimensions to visualise can also be adjusted with ncomp
.
# Visualise Unifrac distance between samples
plotReducedDim(tse, "Unifrac", ncomp = 3, colour_by = "patient_status")
Example 3.2: Comparison
Different ordination methods return considerably different results, which can be compared to achieve a better understanding of the data.
Show code
library(patchwork)
# Generate plots for
plots <- lapply(reducedDimNames(tse),
plotReducedDim,
object = tse,
colour_by = "patient_status")
# Generate multi-panel plot
wrap_plots(plots) +
plot_layout(guides = "collect") +
plot_annotation(tag_levels = "A")
Exercise 3
Run MDS on the CLR assay with Euclidean distance and compare the results with the previous PCoA and PCA.
Extra:
Make a plot with the first three dimensions, and a plot with the second and fourth dimensions.