In may instances one would wish to compare core taxa that are shared between multiple groups. Here, we demonstrate how this can be achieved by microbiome
and eulerr. The test data is stored in the microbiomeutilities R package and the original source of data is Zackular et al., 2014: The Gut Microbiome Modulates Colon Tumorigenesis.
Load packages.
#install.packages("eulerr") # If not installed
library(eulerr)
library(microbiome)
#devtools::install_github('microsud/microbiomeutilities')
library(microbiomeutilities)
Get data
data("zackular2014")
pseq <- zackular2014
We will aim to identify shared core taxa between nationalities.
# simple way to count number of samples in each group
table(meta(pseq)$DiseaseState, useNA = "always")
##
## CRC H nonCRC <NA>
## 30 30 28 0
There are 30 CRC and Healthy samples while 28 CRC samples.
# convert to relative abundances
pseq.rel <- microbiome::transform(pseq, "compositional")
Make a list of DiseaseStates.
disease_states <- unique(as.character(meta(pseq.rel)$DiseaseState))
print(disease_states)
## [1] "nonCRC" "CRC" "H"
Write a for loop
to go through each of the disease_states
one by one and combine identified core taxa into a list.
list_core <- c() # an empty object to store information
for (n in disease_states){ # for each variable n in DiseaseState
#print(paste0("Identifying Core Taxa for ", n))
ps.sub <- subset_samples(pseq.rel, DiseaseState == n) # Choose sample from DiseaseState by n
core_m <- core_members(ps.sub, # ps.sub is phyloseq selected with only samples from g
detection = 0.001, # 0.001 in atleast 90% samples
prevalence = 0.75)
print(paste0("No. of core taxa in ", n, " : ", length(core_m))) # print core taxa identified in each DiseaseState.
list_core[[n]] <- core_m # add to a list core taxa for each group.
#print(list_core)
}
## [1] "No. of core taxa in nonCRC : 11"
## [1] "No. of core taxa in CRC : 8"
## [1] "No. of core taxa in H : 14"
print(list_core)
# Specify colors and plot venn
# supplying colors in the order they appear in list_core
mycols <- c(nonCRC="#d6e2e9", CRC="#cbf3f0", H="#fcf5c7")
plot(venn(list_core),
fills = mycols)
print(list_core)
## $nonCRC
## [1] "d__denovo24" "d__denovo15" "d__denovo1" "d__denovo5" "d__denovo8" "d__denovo9"
## [7] "d__denovo10" "d__denovo40" "d__denovo2" "d__denovo11" "d__denovo14"
##
## $CRC
## [1] "d__denovo15" "d__denovo1" "d__denovo3" "d__denovo4" "d__denovo5" "d__denovo9" "d__denovo2"
## [8] "d__denovo14"
##
## $H
## [1] "d__denovo24" "d__denovo22" "d__denovo15" "d__denovo80" "d__denovo31" "d__denovo1"
## [7] "d__denovo4" "d__denovo8" "d__denovo9" "d__denovo10" "d__denovo41" "d__denovo2"
## [13] "d__denovo11" "d__denovo14"
Here, we see that the ids are only names without taxonomic information. This is because the rownames of otu_table
and tax_table
are usually IDs or sequences.
The format_to_besthit()
function in microbiomeutilites
can be useful which add best taxonomic information that is available as demonstrated below.
# use the pseq.rel object created at the begening of this tutorial.
taxa_names(pseq.rel)[1:5]
## [1] "d__denovo1773" "d__denovo1771" "d__denovo1776" "d__denovo1777" "d__denovo1775"
The taxa names are ids.
# first remove "d__"
taxa_names(pseq.rel) <- gsub( "d__","", taxa_names(pseq.rel))
taxa_names(pseq.rel)[1:5]
## [1] "denovo1773" "denovo1771" "denovo1776" "denovo1777" "denovo1775"
# format names
pseq.rel.f <- format_to_besthit(pseq.rel)
# check names
taxa_names(pseq.rel.f)[1:5]
## [1] "denovo1773:Bacteroides" "denovo1771:Bacteroides" "denovo1776:f__Ruminococcaceae"
## [4] "denovo1777:Bacteroides" "denovo1775:Blautia"
Run the for loop
to go through each of the disease_states
one by one and combine identified core taxa into a list.
list_core <- c() # an empty object to store information
for (n in disease_states){ # for each variable n in DiseaseState
#print(paste0("Identifying Core Taxa for ", n))
ps.sub <- subset_samples(pseq.rel.f, DiseaseState == n) # Choose sample from DiseaseState by n
core_m <- core_members(ps.sub, # ps.sub is phyloseq selected with only samples from g
detection = 0.001, # 0.001 in atleast 90% samples
prevalence = 0.75)
print(paste0("No. of core taxa in ", n, " : ", length(core_m))) # print core taxa identified in each DiseaseState.
list_core[[n]] <- core_m # add to a list core taxa for each group.
#print(list_core)
}
## [1] "No. of core taxa in nonCRC : 11"
## [1] "No. of core taxa in CRC : 8"
## [1] "No. of core taxa in H : 14"
This is more useful information.
print(list_core)
## $nonCRC
## [1] "denovo24:Blautia" "denovo15:f__Lachnospiraceae"
## [3] "denovo1:Blautia" "denovo5:Escherichia/Shigella"
## [5] "denovo8:Roseburia" "denovo9:Anaerostipes"
## [7] "denovo10:Faecalibacterium" "denovo40:Lachnospiracea_incertae_sedis"
## [9] "denovo2:Bacteroides" "denovo11:Blautia"
## [11] "denovo14:Blautia"
##
## $CRC
## [1] "denovo15:f__Lachnospiraceae" "denovo1:Blautia" "denovo3:Bacteroides"
## [4] "denovo4:Bacteroides" "denovo5:Escherichia/Shigella" "denovo9:Anaerostipes"
## [7] "denovo2:Bacteroides" "denovo14:Blautia"
##
## $H
## [1] "denovo24:Blautia" "denovo22:Bacteroides" "denovo15:f__Lachnospiraceae"
## [4] "denovo80:Dorea" "denovo31:Dorea" "denovo1:Blautia"
## [7] "denovo4:Bacteroides" "denovo8:Roseburia" "denovo9:Anaerostipes"
## [10] "denovo10:Faecalibacterium" "denovo41:Coprococcus" "denovo2:Bacteroides"
## [13] "denovo11:Blautia" "denovo14:Blautia"
Note: There are limitations to the number of groups that can be plotted with the eulerr::venn()
. Check eulerr documentation for more detailed information.