This tutorial gets You started with R tools for microbial ecology. In particular, to provide an introduction to

Installation

Launch R/RStudio and install the microbiome R package (see installation instructions).

To initiate reproducible documentation, do the following in RStudio:

  1. Open a new Rmarkdown (.html) file
  2. Convert that .html file with the ‘knit’ button
  3. Modify the file and knit again to make your own reproducible report

Example data: Intestinal microbiota of 1006 Western adults

Example data set will be the HITChip Atlas, which is available via the microbiome R package in phyloseq format. This data set from Lahti et al. Nat. Comm. 5:4344, 2014 comes with 130 genus-like taxonomic groups across 1006 western adults with no reported health complications. Some subjects have also short time series. Load the data in R with:

# Data citation doi: 10.1038/ncomms5344
library(microbiome)
library(knitr)
data(atlas1006) 
print(atlas1006)
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 130 taxa and 1151 samples ]
## sample_data() Sample Data:       [ 1151 samples by 10 sample variables ]
## tax_table()   Taxonomy Table:    [ 130 taxa by 3 taxonomic ranks ]
# Rename the data
pseq <- atlas1006

Phyloseq data stucture for taxonomic profiling

Explore the phyloseq data format. See examples on microbiome data processing.

Alpha diversity

Explore the estimation and analysis of various diversity indices and taxonomic composition.

alpha <- microbiome::alpha
tab <- alpha(pseq, index = "all")
kable(head(tab))
observed chao1 diversity_inverse_simpson diversity_gini_simpson diversity_shannon diversity_fisher diversity_coverage evenness_camargo evenness_pielou evenness_simpson evenness_evar evenness_bulla dominance_dbp dominance_dmn dominance_absolute dominance_relative dominance_simpson dominance_core_abundance dominance_gini rarity_log_modulo_skewness rarity_low_abundance rarity_rare_abundance
Sample-1 99 107.0769 12.983930 0.9229817 3.187815 16.07126 5 0.3675967 0.6937392 0.1311508 0.1669647 0.3447496 0.1759515 0.3379428 1336 0.1759515 0.0770183 0.9599631 0.8488881 2.059488 0.0264718 0.0159357
Sample-2 98 106.6667 16.595578 0.9397430 3.394462 15.04043 7 0.3437543 0.7403459 0.1693426 0.1483521 0.3958182 0.1716594 0.2821246 1742 0.1716594 0.0602570 0.9017540 0.8188881 2.055369 0.0198069 0.0120221
Sample-3 99 108.5455 8.703493 0.8851036 2.864855 16.26890 4 0.3242763 0.6234560 0.0879141 0.1906035 0.2680793 0.2793437 0.3985416 1992 0.2793437 0.1148964 0.9394194 0.8806975 2.053348 0.0412284 0.0259431
Sample-4 100 109.5455 10.709023 0.9066208 3.056922 15.21763 4 0.3924486 0.6638021 0.1070902 0.1559785 0.3336218 0.1957623 0.3813911 2125 0.1957623 0.0933792 0.9509903 0.8604110 2.057767 0.0247812 0.0140028
Sample-5 98 109.6667 12.248008 0.9183541 3.073742 14.59865 4 0.4112500 0.6703956 0.1249797 0.1429785 0.3194244 0.1686667 0.3334167 2024 0.1686667 0.0816459 0.9441667 0.8672179 2.057402 0.0234167 0.0099167
Sample-6 99 110.6667 9.665190 0.8965359 2.941993 15.94374 3 0.2819936 0.6402429 0.0976282 0.1824172 0.3030473 0.2273187 0.3802123 1799 0.2273187 0.1034641 0.9560273 0.8735717 2.060100 0.0380339 0.0198383

Assign the estimated diversity to sample metadata

sample_data(pseq)$diversity <- tab$diversity_shannon

Visualize the data

p <- plot_regression(diversity ~ age, meta(pseq)) +
       labs(x = "Age", y = "Alpha diversity")

Technical biases

Explore potential technical biases in the data. DNA extraction method has a remarkable effect on sample grouping.

# Use relative abundance data
ps <- microbiome::transform(pseq, "compositional")

# Pick core taxa
ps <- core(ps, detection = 0, prevalence = 80/100)

# For this example, choose samples with DNA extraction information available
ps <- subset_samples(ps, !is.na(DNA_extraction_method))

# Illustrate sample similarities with PCoA (NMDS)
plot_landscape(ps, "NMDS", "bray", col = "DNA_extraction_method")

Core microbiota

Core microbiota refers to the set of species shared by (almost) all individuals.

A full phyloseq object with just the core taxa is obtained as follows:

# Transform to compositional abundances
pseq.rel <- microbiome::transform(pseq, "compositional")

# Pick the core (>0.1% relative abundance in >50% of the samples)
pseq.core <- core(pseq.rel, detection = 0.1/100, prevalence = 50/100)

Visualizing the core. Using all data can give a visual of which taxa are core at various detection thresholds. The pseq.core object is filtered to have only core taxa based on the user provided values. We suggest using all data for plotting with plot_core and then changing settings for prevalence to limit which taxa you want to visualise.

# Core with compositionals:
prevalences <- seq(.05, 1, .05)

detections <- round(10^seq(log10(5e-3), log10(.2), length = 10), 3)

p <- plot_core(pseq.rel, plot.type = "heatmap",
  prevalences = prevalences, detections = detections, min.prevalence = 0.5) +
  xlab("Detection Threshold (Relative Abundance)") +
  theme(axis.text.x = element_text(size = 9))

print(p)

```

Other tools

Explore further tools in microbiome tutorial.