Shared core with venn diagram

In may instances one would wish to compare core taxa that are shared between multiple groups. Here, we demonstrate how this can be achieved by microbiome and eulerr. The test data is stored in the microbiomeutilities R package and the original source of data is Zackular et al., 2014: The Gut Microbiome Modulates Colon Tumorigenesis.

Load packages.

#install.packages("eulerr") # If not installed
library(eulerr)
library(microbiome)
#devtools::install_github('microsud/microbiomeutilities')
library(microbiomeutilities)

Get data

data("zackular2014")
pseq <- zackular2014

We will aim to identify shared core taxa between nationalities.

# simple way to count number of samples in each group
table(meta(pseq)$DiseaseState, useNA = "always")

## 
##    CRC      H nonCRC   <NA> 
##     30     30     28      0

There are 30 CRC and Healthy samples while 28 CRC samples.

# convert to relative abundances
pseq.rel <- microbiome::transform(pseq, "compositional")

Make a list of DiseaseStates.

disease_states <- unique(as.character(meta(pseq.rel)$DiseaseState))
print(disease_states)

## [1] "nonCRC" "CRC"    "H"

Write a for loop to go through each of the disease_states one by one and combine identified core taxa into a list.

list_core <- c() # an empty object to store information

for (n in disease_states){ # for each variable n in DiseaseState
    #print(paste0("Identifying Core Taxa for ", n))
    
    ps.sub <- subset_samples(pseq.rel, DiseaseState == n) # Choose sample from DiseaseState by n
    
    core_m <- core_members(ps.sub, # ps.sub is phyloseq selected with only samples from g 
                           detection = 0.001, # 0.001 in atleast 90% samples 
                           prevalence = 0.75)
    print(paste0("No. of core taxa in ", n, " : ", length(core_m))) # print core taxa identified in each DiseaseState.
    list_core[[n]] <- core_m # add to a list core taxa for each group.
    #print(list_core)
}

## [1] "No. of core taxa in nonCRC : 11"
## [1] "No. of core taxa in CRC : 8"
## [1] "No. of core taxa in H : 14"

print(list_core)

# Specify colors and plot venn
# supplying colors in the order they appear in list_core
mycols <- c(nonCRC="#d6e2e9", CRC="#cbf3f0", H="#fcf5c7") 
plot(venn(list_core),
     fills = mycols)

print(list_core)

## $nonCRC
##  [1] "d__denovo24" "d__denovo15" "d__denovo1"  "d__denovo5"  "d__denovo8"  "d__denovo9" 
##  [7] "d__denovo10" "d__denovo40" "d__denovo2"  "d__denovo11" "d__denovo14"
## 
## $CRC
## [1] "d__denovo15" "d__denovo1"  "d__denovo3"  "d__denovo4"  "d__denovo5"  "d__denovo9"  "d__denovo2" 
## [8] "d__denovo14"
## 
## $H
##  [1] "d__denovo24" "d__denovo22" "d__denovo15" "d__denovo80" "d__denovo31" "d__denovo1" 
##  [7] "d__denovo4"  "d__denovo8"  "d__denovo9"  "d__denovo10" "d__denovo41" "d__denovo2" 
## [13] "d__denovo11" "d__denovo14"

Here, we see that the ids are only names without taxonomic information. This is because the rownames of otu_table and tax_table are usually IDs or sequences.
The format_to_besthit() function in microbiomeutilites can be useful which add best taxonomic information that is available as demonstrated below.

# use the pseq.rel object created at the begening of this tutorial. 
taxa_names(pseq.rel)[1:5]

## [1] "d__denovo1773" "d__denovo1771" "d__denovo1776" "d__denovo1777" "d__denovo1775"

The taxa names are ids.

# first remove "d__"
taxa_names(pseq.rel) <- gsub( "d__","", taxa_names(pseq.rel))
taxa_names(pseq.rel)[1:5]

## [1] "denovo1773" "denovo1771" "denovo1776" "denovo1777" "denovo1775"

# format names
pseq.rel.f <- format_to_besthit(pseq.rel)
# check names
taxa_names(pseq.rel.f)[1:5]

## [1] "denovo1773:Bacteroides"        "denovo1771:Bacteroides"        "denovo1776:f__Ruminococcaceae"
## [4] "denovo1777:Bacteroides"        "denovo1775:Blautia"

Run the for loop to go through each of the disease_states one by one and combine identified core taxa into a list.

list_core <- c() # an empty object to store information

for (n in disease_states){ # for each variable n in DiseaseState
    #print(paste0("Identifying Core Taxa for ", n))
    
    ps.sub <- subset_samples(pseq.rel.f, DiseaseState == n) # Choose sample from DiseaseState by n
    
    core_m <- core_members(ps.sub, # ps.sub is phyloseq selected with only samples from g 
                           detection = 0.001, # 0.001 in atleast 90% samples 
                           prevalence = 0.75)
    print(paste0("No. of core taxa in ", n, " : ", length(core_m))) # print core taxa identified in each DiseaseState.
    list_core[[n]] <- core_m # add to a list core taxa for each group.
    #print(list_core)
}

## [1] "No. of core taxa in nonCRC : 11"
## [1] "No. of core taxa in CRC : 8"
## [1] "No. of core taxa in H : 14"

This is more useful information.

print(list_core)

## $nonCRC
##  [1] "denovo24:Blautia"                       "denovo15:f__Lachnospiraceae"           
##  [3] "denovo1:Blautia"                        "denovo5:Escherichia/Shigella"          
##  [5] "denovo8:Roseburia"                      "denovo9:Anaerostipes"                  
##  [7] "denovo10:Faecalibacterium"              "denovo40:Lachnospiracea_incertae_sedis"
##  [9] "denovo2:Bacteroides"                    "denovo11:Blautia"                      
## [11] "denovo14:Blautia"                      
## 
## $CRC
## [1] "denovo15:f__Lachnospiraceae"  "denovo1:Blautia"              "denovo3:Bacteroides"         
## [4] "denovo4:Bacteroides"          "denovo5:Escherichia/Shigella" "denovo9:Anaerostipes"        
## [7] "denovo2:Bacteroides"          "denovo14:Blautia"            
## 
## $H
##  [1] "denovo24:Blautia"            "denovo22:Bacteroides"        "denovo15:f__Lachnospiraceae"
##  [4] "denovo80:Dorea"              "denovo31:Dorea"              "denovo1:Blautia"            
##  [7] "denovo4:Bacteroides"         "denovo8:Roseburia"           "denovo9:Anaerostipes"       
## [10] "denovo10:Faecalibacterium"   "denovo41:Coprococcus"        "denovo2:Bacteroides"        
## [13] "denovo11:Blautia"            "denovo14:Blautia"

Note: There are limitations to the number of groups that can be plotted with the eulerr::venn(). Check eulerr documentation for more detailed information.

Shared core with venn diagram

Sudarshan Shetty, Leo Lahti, et al.