Principal Coordinate Analysis

Concepts

Principal Coordinate Analysis (PCoA) is a method to find the dimensions of the data that explain most of its variance. The diversity between samples can be expressed in terms of several ecological indices, such as Bray-Curtis and Aitchison dissimilarities. If Euclidean distance is used, PCoA becomes Principal Component Analysis (PCA). You can learn more about PCoA in OMA chapter 7.

The following packages are necessary to execute the code in this presentation:

mia: methods to analyse microbiome data
scater: utils to visualise data stored in TreeSE objects
patchwork: framework to combine multiple ggplot objects

Example 1.1

To get started, we import Tengeler2020 from mia and store it into a variable.

# load dataset and store it into tse
data("Tengeler2020", package = "mia")
tse <- Tengeler2020

# Get summary about the object
# What dimensions does the data have?
tse

class: TreeSummarizedExperiment 
dim: 151 27 
metadata(0):
assays(1): counts
rownames(151): 1726470 1726471 ... 17264756 17264757
rowData names(6): Kingdom Phylum ... Family Genus
colnames(27): A110 A12 ... A35 A38
colData names(4): patient_status cohort patient_status_vs_cohort
  sample_name
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):
rowLinks: a LinkDataFrame (151 rows)
rowTree: 1 phylo tree(s) (151 leaves)
colLinks: NULL
colTree: NULL

Example 1.2

After that, we transform the counts assay to relative abundances and store the new assay back into the TreeSE.

# Transform counts to relative abundance
tse <- transformAssay(tse, method = "relabundance")

Here, we run multi-dimensional scaling (another word for PCoA) on the relative abundance assay to reduce the data to fewer dimensions.

# Reduce number of dimensions from 151 to 3 by PCoA
tse <- runMDS(tse,
              assay.type = "relabundance",
              FUN = vegan::vegdist,
              method = "bray",
              name = "Bray")

# The new dimensions are stored in the reducedDim slot
head(reducedDim(tse, "Bray"))

            [,1]        [,2]
A110  0.35648020  0.11979800
A12  -0.01837414 -0.35553853
A15   0.14308108  0.36730630
A19   0.03031075 -0.34670413
A21   0.07539063 -0.31823525
A23   0.17993859  0.09476674

Example 1.3

We then visualise the first two dimensions.

# The new dimensions can be used to visualise diversity among samples
p1 <- plotReducedDim(tse, "Bray",
                     colour_by = "patient_status")

p1

Figure 1: Ordination plots based on Bray-Curtis index. Samples are coloured by patient status.

Example 2.1

By default, Bray-Curtis dissimilarity is used. However, other metrics can be specified with the method argument.

# Reduce number of dimensions with a different metric
tse <- runMDS(tse,
              assay.type = "relabundance",
              FUN = vegan::vegdist,
              method = "jaccard",
              name = "Jaccard")

reducedDimNames(tse)

[1] "Bray"    "Jaccard"

Example 2.2

# Visualise samples with the newly reduced dimensions
p3 <- plotReducedDim(tse, "Jaccard",
                     colour_by = "patient_status")

p3

Figure 2: Ordination plot based on Unifrac index. Samples are coloured by patient status.

Example 3.1

example with different distance function. (could be custom)

tse <- runMDS(tse,
              assay.type = "counts",
              FUN = mia::calculateUnifrac,
              name = "Unifrac",
              tree = rowTree(tse),
              ntop = nrow(tse))

reducedDimNames(tse)

[1] "Bray"    "Jaccard" "Unifrac"

Example 3.2

p2 <- plotReducedDim(tse, "Unifrac",
                     colour_by = "patient_status",
                     shape_by = "cohort")

p2

Figure 3: Ordination plot based on Unifrac index. Samples are coloured by patient status and shaped by cohort.

Example 4.1

Example with different algorithm.

It is also possible to specify the number of output dimensions with the argument ncomponents. Here, we show it with the UMAP ordination method.

tse <- runUMAP(tse,
               assay.type = "counts",
               ncomponents = 3)

# The new dimensions are stored in the reducedDim slot
head(reducedDim(tse, "UMAP"))

          UMAP1      UMAP2      UMAP3
A110 -0.2306184  0.7279943  1.5806612
A12  -0.9707269  0.5182520  1.2675595
A15   0.9813374  0.3229846 -0.6461952
A19  -1.1380295  0.5971419  0.8782231
A21   0.9880926 -0.6312433 -0.5866096
A23   1.2138464 -0.1371741 -0.5889314

Example 4.2

We then plot all three dimensions, but you could also plot a pair of dimensions (1 and 3, 2 and 3 or 1 and 2) with the ncomponents argument.

p4 <- plotReducedDim(tse, "UMAP",
                    ncomponents = 3,
                    colour_by = "patient_status",
                    shape_by = "cohort")

p4

Figure 4: UMAP plot of the first three dimensions.

Comparison

Finally, we combine the two plots with the patchwork syntax and compare them in ?@fig-beta.

(p1 | p2 | p3) +
  plot_layout(guides = "collect") +
  plot_annotation(tag_levels = "A")

Figure 5: Ordination plots based on (A) Bray-Curtis, (B) Jaccard and (C) Unifrac indices. Samples are coloured by patient status and shaped by cohort.

Try yourself

exercise 7.2
exercise 7.3.
exercise with aitchison