Nowadays, increasing number of microbiota researchers are incorporating temporal aspects in study designs. These may be either long-term evaluation of soil microbiota or intervention studies in human microbiome research. Here, we present some codes and visualition of longitudinal microbiota sampling. We hope to add some functionality in future releases of microbiome
package.
In this, tutorial we demonstrate how to plot trajactory of a microbiota through time We use the human microbiome time series data moving_pictures
from Caporaso et al., 2011 Genome Biol.. These data as the reprocessed files as a part of the Earth microbiome project. This data is packaged in an R package jeevanuDB for using it in this tutorial.
#### To install this pkg ####
# install.packages("devtools")
# devtools::install_github("microsud/jeevanuDB")
############################
library(microbiome)
library(jeevanuDB) # external database pkg for microbiome pkg with test data
library(dplyr)
library(ggplot2)
library(viridis)
library(knitr)
We use the moving_pictures
dataset.
# Example data
data("moving_pictures")
# Rename
ps <- moving_pictures
Check data for which and how many samples are present each subject.
kable(table(meta(ps)$host_subject_id, meta(ps)$sample_type))
saliva | skin | stool | |
---|---|---|---|
F4 | 135 | 268 | 131 |
M3 | 373 | 723 | 336 |
Use only the stool (gut) microbiota data.
ps.gut <- subset_samples(ps, sample_type == "stool")
taxa_names(ps.gut) <- paste0("ASV-", seq(ntaxa(ps.gut))) # rename sequences to ids
# remove asvs which are zero in all of these samples
ps.gut <- prune_taxa(taxa_sums(ps.gut) > 0, ps.gut)
# remove samples with less than 500 reads Note: this is user choice
# here we just show example
ps.gut <- prune_samples(sample_sums(ps.gut) > 500, ps.gut)
# Covnert to relative abundances
ps.gut.rel <- microbiome::transform(ps.gut, "compositional")
Using phyloseq
we do ordination analysis.
# Ordination object
ps.ord <- ordinate(ps.gut.rel, "PCoA")
# Ordination object plus all metadata, note: we use justDF=T. This will not return a plot but a data.frame
ordip <- plot_ordination(ps.gut.rel, ps.ord, justDF = T)
Now, let’s start to make a custom visualization.
# Get axis 1 and 2 variation
evals1 <- round(ps.ord$values$Eigenvalues[1] / sum(ps.ord$values$Eigenvalues) * 100, 2)
evals2 <- round(ps.ord$values$Eigenvalues[2] / sum(ps.ord$values$Eigenvalues) * 100, 2)
Set some nice colors
# theme_set(theme_bw(14))
# set colors
subject_cols <- c(F4 = "#457b9d", M3 = "#e63946")
Add trajectory for the subject of interest. Here, we randomnly choose subject F4
# choose data for subject F4
dfs <- subset(ordip, host_subject_id == "F4")
# arrange according to sampling time. Sort with increasing time
dfs <- dfs %>%
arrange(days_since_experiment_start)
Initiate plotting
# use the ordip
# first step get blank plot
p <- ggplot(ordip, aes(x = Axis.1, y = Axis.2))
# add path (lines) join only those samples that are from F4
p2 <- p +
geom_path(
data = dfs, alpha = 0.7,
arrow = arrow(
angle = 15, length = unit(0.1, "inches"),
ends = "last", type = "closed"
)
) +
# now add a layer of points
geom_point(aes(color = host_subject_id), alpha = 0.6, size = 3) +
scale_color_manual("Subject", values = subject_cols) +
xlab(paste("PCoA 1 (", evals1, "%)", sep = "")) +
ylab(paste("PCoA 2 (", evals2, "%)", sep = "")) +
theme_bw() +
theme(
panel.grid.major = element_blank(),
panel.grid.minor = element_blank()
)
# coord_fixed(sqrt(evals[2] / evals[1]))
# Print figure
print(p2)
Alternatively we can just focus on one subject.
# subset data for only M3
ps.gut.rel.m3 <- subset_samples(ps.gut.rel, host_subject_id == "M3")
# remove asvs which are zero in all of these samples
ps.gut.rel.m3 <- prune_taxa(taxa_sums(ps.gut.rel.m3) > 0, ps.gut.rel.m3)
ps.ord.m3 <- ordinate(ps.gut.rel.m3, "PCoA")
ordip.m3 <- plot_ordination(ps.gut.rel.m3, ps.ord.m3, justDF = T)
# Get axis 1 and 2 variation
evals1 <- round(ordip.m3$values$Eigenvalues[1] / sum(ordip.m3$values$Eigenvalues) * 100, 2)
evals2 <- round(ordip.m3$values$Eigenvalues[2] / sum(ordip.m3$values$Eigenvalues) * 100, 2)
# arrange according to sampling time
ordip.m3 <- ordip.m3 %>%
arrange(days_since_experiment_start) # important to arrange the time
Plot data
# Visualize
# blank plot initiate
p1 <- ggplot(ordip.m3, aes(x = Axis.1, y = Axis.2))
# add layers
p3 <- p1 +
# add arrows with geom_path
geom_path(alpha = 0.5, arrow = arrow(
angle = 30, length = unit(0.1, "inches"),
ends = "last", type = "closed"
)) +
# add points
geom_point(aes(color = days_since_experiment_start), size = 3) +
# add gradient colors
scale_color_viridis("Days from first sampling") +
# add x and y labels
xlab(paste("PCoA 1 (", evals1, "%)", sep = "")) +
ylab(paste("PCoA 2 (", evals2, "%)", sep = "")) +
theme_bw() +
# remove grids in the plot
theme(
panel.grid.major = element_blank(),
panel.grid.minor = element_blank()
)
print(p3)