This tutorial gets You started with R tools for microbial ecology. In particular, to provide an introduction to

R tools for microbial ecology
Role of custom data formats and tools in data analytical workflows
Reproducible document generation
Possibilities and challenges in population-level microbiome profiling studies

Installation

Launch R/RStudio and install the microbiome R package (see installation instructions).

To initiate reproducible documentation, do the following in RStudio:

Open a new Rmarkdown (.html) file
Convert that .html file with the ‘knit’ button
Modify the file and knit again to make your own reproducible report

Example data: Intestinal microbiota of 1006 Western adults

Example data set will be the HITChip Atlas, which is available via the microbiome R package in phyloseq format. This data set from Lahti et al. Nat. Comm. 5:4344, 2014 comes with 130 genus-like taxonomic groups across 1006 western adults with no reported health complications. Some subjects have also short time series. Load the data in R with:

# Data citation doi: 10.1038/ncomms5344
library(microbiome)
library(knitr)
data(atlas1006) 
print(atlas1006)

## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 130 taxa and 1151 samples ]
## sample_data() Sample Data:       [ 1151 samples by 10 sample variables ]
## tax_table()   Taxonomy Table:    [ 130 taxa by 3 taxonomic ranks ]

# Rename the data
pseq <- atlas1006

Phyloseq data stucture for taxonomic profiling

Explore the phyloseq data format. See examples on microbiome data processing.

Alpha diversity

Explore the estimation and analysis of various diversity indices and taxonomic composition.

alpha <- microbiome::alpha
tab <- alpha(pseq, index = "all")
kable(head(tab))

	observed	chao1	diversity_inverse_simpson	diversity_gini_simpson	diversity_shannon	diversity_fisher	diversity_coverage	evenness_camargo	evenness_pielou	evenness_simpson	evenness_evar	evenness_bulla	dominance_dbp	dominance_dmn	dominance_absolute	dominance_relative	dominance_simpson	dominance_core_abundance	dominance_gini	rarity_log_modulo_skewness	rarity_low_abundance	rarity_rare_abundance
Sample-1	99	107.0769	12.983930	0.9229817	3.187815	16.07126	5	0.3675967	0.6937392	0.1311508	0.1669647	0.3447496	0.1759515	0.3379428	1336	0.1759515	0.0770183	0.9599631	0.8488881	2.059488	0.0264718	0.0159357
Sample-2	98	106.6667	16.595578	0.9397430	3.394462	15.04043	7	0.3437543	0.7403459	0.1693426	0.1483521	0.3958182	0.1716594	0.2821246	1742	0.1716594	0.0602570	0.9017540	0.8188881	2.055369	0.0198069	0.0120221
Sample-3	99	108.5455	8.703493	0.8851036	2.864855	16.26890	4	0.3242763	0.6234560	0.0879141	0.1906035	0.2680793	0.2793437	0.3985416	1992	0.2793437	0.1148964	0.9394194	0.8806975	2.053348	0.0412284	0.0259431
Sample-4	100	109.5455	10.709023	0.9066208	3.056922	15.21763	4	0.3924486	0.6638021	0.1070902	0.1559785	0.3336218	0.1957623	0.3813911	2125	0.1957623	0.0933792	0.9509903	0.8604110	2.057767	0.0247812	0.0140028
Sample-5	98	109.6667	12.248008	0.9183541	3.073742	14.59865	4	0.4112500	0.6703956	0.1249797	0.1429785	0.3194244	0.1686667	0.3334167	2024	0.1686667	0.0816459	0.9441667	0.8672179	2.057402	0.0234167	0.0099167
Sample-6	99	110.6667	9.665190	0.8965359	2.941993	15.94374	3	0.2819936	0.6402429	0.0976282	0.1824172	0.3030473	0.2273187	0.3802123	1799	0.2273187	0.1034641	0.9560273	0.8735717	2.060100	0.0380339	0.0198383

Assign the estimated diversity to sample metadata

sample_data(pseq)$diversity <- tab$diversity_shannon

Visualize the data

p <- plot_regression(diversity ~ age, meta(pseq)) +
       labs(x = "Age", y = "Alpha diversity")

Technical biases

Explore potential technical biases in the data. DNA extraction method has a remarkable effect on sample grouping.

# Use relative abundance data
ps <- microbiome::transform(pseq, "compositional")

# Pick core taxa
ps <- core(ps, detection = 0, prevalence = 80/100)

# For this example, choose samples with DNA extraction information available
ps <- subset_samples(ps, !is.na(DNA_extraction_method))

# Illustrate sample similarities with PCoA (NMDS)
plot_landscape(ps, "NMDS", "bray", col = "DNA_extraction_method")

Core microbiota

Core microbiota refers to the set of species shared by (almost) all individuals.

A full phyloseq object with just the core taxa is obtained as follows:

# Transform to compositional abundances
pseq.rel <- microbiome::transform(pseq, "compositional")

# Pick the core (>0.1% relative abundance in >50% of the samples)
pseq.core <- core(pseq.rel, detection = 0.1/100, prevalence = 50/100)

Visualizing the core. Using all data can give a visual of which taxa are core at various detection thresholds. The pseq.core object is filtered to have only core taxa based on the user provided values. We suggest using all data for plotting with plot_core and then changing settings for prevalence to limit which taxa you want to visualise.

# Core with compositionals:
prevalences <- seq(.05, 1, .05)

detections <- round(10^seq(log10(5e-3), log10(.2), length = 10), 3)

p <- plot_core(pseq.rel, plot.type = "heatmap",
  prevalences = prevalences, detections = detections, min.prevalence = 0.5) +
  xlab("Detection Threshold (Relative Abundance)") +
  theme(axis.text.x = element_text(size = 9))

print(p)

```

Other tools

Explore further tools in microbiome tutorial.

Tutorial on Microbiome Data Analysis