Using the alpha
function in microbiome R packge
you can calculate a wide variaty of diversity indices. Comparison and visualising group based differecences or similarities is also important. Here, we show steps from calculating diversity indices using microbiome R package and visualising the differences and/or similarities between groups. A useful R package is ggpubr. If you have not installed it, please install it.
Load libraries and data.
library(microbiome)
library(ggpubr)
library(knitr)
library(dplyr)
data(dietswap)
pseq <- dietswap
This returns a table with selected diversity indicators. Check a separate page on Alpha for other functions.
ps1 <- prune_taxa(taxa_sums(pseq) > 0, pseq)
tab <- microbiome::alpha(ps1, index = "all")
kable(head(tab))
observed | chao1 | diversity_inverse_simpson | diversity_gini_simpson | diversity_shannon | diversity_fisher | diversity_coverage | evenness_camargo | evenness_pielou | evenness_simpson | evenness_evar | evenness_bulla | dominance_dbp | dominance_dmn | dominance_absolute | dominance_relative | dominance_simpson | dominance_core_abundance | dominance_gini | rarity_log_modulo_skewness | rarity_low_abundance | rarity_rare_abundance | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Sample-1 | 104 | 113.2308 | 7.561722 | 0.8677550 | 2.940698 | 16.69360 | 4 | 0.1876313 | 0.6331719 | 0.0727089 | 0.1709714 | 0.3298916 | 0.3279347 | 0.4297198 | 2774 | 0.3279347 | 0.1322450 | 0.9276510 | 0.8547129 | 2.057691 | 0.0289632 | 0.0146589 |
Sample-2 | 110 | 120.0000 | 8.102943 | 0.8765881 | 2.822472 | 15.20257 | 3 | 0.1471282 | 0.6004646 | 0.0736631 | 0.1334372 | 0.2755652 | 0.2428626 | 0.4656170 | 5121 | 0.2428626 | 0.1234119 | 0.9330361 | 0.8777889 | 2.057552 | 0.0299725 | 0.0349047 |
Sample-3 | 103 | 106.6000 | 4.291085 | 0.7669587 | 2.407963 | 13.42077 | 2 | 0.1911955 | 0.5195476 | 0.0416610 | 0.1373098 | 0.2541325 | 0.4594744 | 0.5603989 | 13271 | 0.4594744 | 0.2330413 | 0.9514247 | 0.9030912 | 2.052324 | 0.0339646 | 0.0093134 |
Sample-4 | 105 | 110.6250 | 7.930799 | 0.8739093 | 2.992482 | 15.56061 | 4 | 0.2674871 | 0.6429969 | 0.0755314 | 0.1711334 | 0.3176017 | 0.3230653 | 0.3957720 | 4279 | 0.3230653 | 0.1260907 | 0.8621367 | 0.8487989 | 2.051175 | 0.0348056 | 0.0367686 |
Sample-5 | 103 | 109.5000 | 3.170738 | 0.6846160 | 2.106022 | 14.53671 | 1 | 0.1183671 | 0.4544002 | 0.0307839 | 0.1661081 | 0.2310307 | 0.5450144 | 0.6317579 | 9456 | 0.5450144 | 0.3153840 | 0.9535447 | 0.9167798 | 2.059634 | 0.0441499 | 0.0105476 |
Sample-6 | 105 | 112.5833 | 2.953753 | 0.6614476 | 2.071136 | 14.55619 | 1 | 0.1533745 | 0.4450266 | 0.0281310 | 0.1625409 | 0.2308364 | 0.5693811 | 0.6430163 | 11243 | 0.5693811 | 0.3385524 | 0.9258078 | 0.9146230 | 2.059421 | 0.0375772 | 0.0168642 |
Now, get the metadata (sample_data) from the phyloseq
object
ps1.meta <- meta(ps1)
kable(head(ps1.meta))
subject | sex | nationality | group | sample | timepoint | timepoint.within.group | bmi_group | |
---|---|---|---|---|---|---|---|---|
Sample-1 | byn | male | AAM | DI | Sample-1 | 4 | 1 | obese |
Sample-2 | nms | male | AFR | HE | Sample-2 | 2 | 1 | lean |
Sample-3 | olt | male | AFR | HE | Sample-3 | 2 | 1 | overweight |
Sample-4 | pku | female | AFR | HE | Sample-4 | 2 | 1 | obese |
Sample-5 | qjy | female | AFR | HE | Sample-5 | 2 | 1 | overweight |
Sample-6 | riv | female | AFR | HE | Sample-6 | 2 | 1 | obese |
Add the diversity table to metadata
ps1.meta$Shannon <- tab$diversity_shannon
ps1.meta$InverseSimpson <- tab$diversity_inverse_simpson
Let’s say we want to compare differences in Shannon index between bmi group of the study subjects.
# create a list of pairwise comaprisons
bmi <- levels(ps1.meta$bmi_group) # get the variables
# make a pairwise list that we want to compare.
bmi.pairs <- combn(seq_along(bmi), 2, simplify = FALSE, FUN = function(i)bmi[i])
print(bmi.pairs)
## [[1]]
## [1] "lean" "overweight"
##
## [[2]]
## [1] "lean" "obese"
##
## [[3]]
## [1] "overweight" "obese"
Sometimes that variables can be stored as characters and sometime as factors. In such a senario levels()
may return an empty vector. A work around for this can be found here.
Using ggpubr
a violin plot will be created
#ps1.meta$'' <- alpha(ps1, index = 'shannon')
p1 <- ggviolin(ps1.meta, x = "bmi_group", y = "Shannon",
add = "boxplot", fill = "bmi_group", palette = c("#a6cee3", "#b2df8a", "#fdbf6f"))
print(p1)
Pairwise comparision using non-parametric test (Wilcoxon test).
p1 <- p1 + stat_compare_means(comparisons = bmi.pairs)
print(p1)
For more information and useful tips and suggestions check the Statistical tools for high-throughput data analysis.