This tutorial gets You started with R tools for microbial ecology. In particular, to provide an introduction to
Launch R/RStudio and install the microbiome R package (see installation instructions).
To initiate reproducible documentation, do the following in RStudio:
Example data set will be the HITChip Atlas, which is available via the microbiome R package in phyloseq format. This data set from Lahti et al. Nat. Comm. 5:4344, 2014 comes with 130 genus-like taxonomic groups across 1006 western adults with no reported health complications. Some subjects have also short time series. Load the data in R with:
# Data citation doi: 10.1038/ncomms5344
library(microbiome)
library(knitr)
data(atlas1006)
print(atlas1006)
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 130 taxa and 1151 samples ]
## sample_data() Sample Data: [ 1151 samples by 10 sample variables ]
## tax_table() Taxonomy Table: [ 130 taxa by 3 taxonomic ranks ]
# Rename the data
pseq <- atlas1006
Explore the phyloseq data format. See examples on microbiome data processing.
Explore the estimation and analysis of various diversity indices and taxonomic composition.
alpha <- microbiome::alpha
tab <- alpha(pseq, index = "all")
kable(head(tab))
observed | chao1 | diversity_inverse_simpson | diversity_gini_simpson | diversity_shannon | diversity_fisher | diversity_coverage | evenness_camargo | evenness_pielou | evenness_simpson | evenness_evar | evenness_bulla | dominance_dbp | dominance_dmn | dominance_absolute | dominance_relative | dominance_simpson | dominance_core_abundance | dominance_gini | rarity_log_modulo_skewness | rarity_low_abundance | rarity_rare_abundance | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Sample-1 | 99 | 107.0769 | 12.983930 | 0.9229817 | 3.187815 | 16.07126 | 5 | 0.3675967 | 0.6937392 | 0.1311508 | 0.1669647 | 0.3447496 | 0.1759515 | 0.3379428 | 1336 | 0.1759515 | 0.0770183 | 0.9599631 | 0.8488881 | 2.059488 | 0.0264718 | 0.0159357 |
Sample-2 | 98 | 106.6667 | 16.595578 | 0.9397430 | 3.394462 | 15.04043 | 7 | 0.3437543 | 0.7403459 | 0.1693426 | 0.1483521 | 0.3958182 | 0.1716594 | 0.2821246 | 1742 | 0.1716594 | 0.0602570 | 0.9017540 | 0.8188881 | 2.055369 | 0.0198069 | 0.0120221 |
Sample-3 | 99 | 108.5455 | 8.703493 | 0.8851036 | 2.864855 | 16.26890 | 4 | 0.3242763 | 0.6234560 | 0.0879141 | 0.1906035 | 0.2680793 | 0.2793437 | 0.3985416 | 1992 | 0.2793437 | 0.1148964 | 0.9394194 | 0.8806975 | 2.053348 | 0.0412284 | 0.0259431 |
Sample-4 | 100 | 109.5455 | 10.709023 | 0.9066208 | 3.056922 | 15.21763 | 4 | 0.3924486 | 0.6638021 | 0.1070902 | 0.1559785 | 0.3336218 | 0.1957623 | 0.3813911 | 2125 | 0.1957623 | 0.0933792 | 0.9509903 | 0.8604110 | 2.057767 | 0.0247812 | 0.0140028 |
Sample-5 | 98 | 109.6667 | 12.248008 | 0.9183541 | 3.073742 | 14.59865 | 4 | 0.4112500 | 0.6703956 | 0.1249797 | 0.1429785 | 0.3194244 | 0.1686667 | 0.3334167 | 2024 | 0.1686667 | 0.0816459 | 0.9441667 | 0.8672179 | 2.057402 | 0.0234167 | 0.0099167 |
Sample-6 | 99 | 110.6667 | 9.665190 | 0.8965359 | 2.941993 | 15.94374 | 3 | 0.2819936 | 0.6402429 | 0.0976282 | 0.1824172 | 0.3030473 | 0.2273187 | 0.3802123 | 1799 | 0.2273187 | 0.1034641 | 0.9560273 | 0.8735717 | 2.060100 | 0.0380339 | 0.0198383 |
Assign the estimated diversity to sample metadata
sample_data(pseq)$diversity <- tab$diversity_shannon
Visualize the data
p <- plot_regression(diversity ~ age, meta(pseq)) +
labs(x = "Age", y = "Alpha diversity")
Explore potential technical biases in the data. DNA extraction method has a remarkable effect on sample grouping.
# Use relative abundance data
ps <- microbiome::transform(pseq, "compositional")
# Pick core taxa
ps <- core(ps, detection = 0, prevalence = 80/100)
# For this example, choose samples with DNA extraction information available
ps <- subset_samples(ps, !is.na(DNA_extraction_method))
# Illustrate sample similarities with PCoA (NMDS)
plot_landscape(ps, "NMDS", "bray", col = "DNA_extraction_method")
Core microbiota refers to the set of species shared by (almost) all individuals.
A full phyloseq object with just the core taxa is obtained as follows:
# Transform to compositional abundances
pseq.rel <- microbiome::transform(pseq, "compositional")
# Pick the core (>0.1% relative abundance in >50% of the samples)
pseq.core <- core(pseq.rel, detection = 0.1/100, prevalence = 50/100)
Visualizing the core. Using all data can give a visual of which taxa are core at various detection thresholds. The pseq.core object is filtered to have only core taxa based on the user provided values. We suggest using all data for plotting with plot_core and then changing settings for prevalence to limit which taxa you want to visualise.
# Core with compositionals:
prevalences <- seq(.05, 1, .05)
detections <- round(10^seq(log10(5e-3), log10(.2), length = 10), 3)
p <- plot_core(pseq.rel, plot.type = "heatmap",
prevalences = prevalences, detections = detections, min.prevalence = 0.5) +
xlab("Detection Threshold (Relative Abundance)") +
theme(axis.text.x = element_text(size = 9))
print(p)
```
Explore further tools in microbiome tutorial.