Summarizing microbiome data

To query a SummarizedExperiment for interesting features, several functions are available.

getTop(
  x,
  top = 5L,
  method = c("mean", "sum", "median"),
  assay.type = assay_name,
  assay_name = "counts",
  na.rm = TRUE,
  ...
)

# S4 method for class 'SummarizedExperiment'
getTop(
  x,
  top = 5L,
  method = c("mean", "sum", "median", "prevalence"),
  assay.type = assay_name,
  assay_name = "counts",
  na.rm = TRUE,
  ...
)

getUnique(x, ...)

# S4 method for class 'SummarizedExperiment'
getUnique(x, rank = NULL, ...)

summarizeDominance(x, group = NULL, name = "dominant_taxa", ...)

# S4 method for class 'SummarizedExperiment'
summarizeDominance(x, group = NULL, name = "dominant_taxa", ...)

# S4 method for class 'SummarizedExperiment'
summary(object, assay.type = assay_name, assay_name = "counts")

Arguments

x: A SummarizedExperiment object, such as a TreeSummarizedExperiment.
top: Numeric scalar. The number of top taxa to return. (Default: 5)
method: Character scalar. The method used to determine the top taxa. One of "sum", "mean", "median" or "prevalence". (Default: "mean")
assay.type: Character scalar. Specifies the name of the assay used in calculation. (Default: "counts")
assay_name: Deprecated. Use assay.type instead.
na.rm: Logical scalar. Should NA values be omitted? (Default: TRUE)
...: Additional arguments passed on to agglomerateByRank() when rank is specified for summarizeDominance.
rank: Character scalar. Defines a taxonomic rank. Must be a value of the output of taxonomyRanks(). (Default: NULL)
group: Character scalar. Groups the observations in the overview. Must be one of the column names of colData. (Default: NULL)
name: Character scalar. A name for the column of the colData where results will be stored. (Default: "dominant_taxa")
object: A SummarizedExperiment object.

Value

getTop returns a vector of the “FeatureID”s of the most abundant features.

getUnique returns a vector of the unique taxa present at a particular rank.

summarizeDominance returns an overview in a tibble. It contains the dominant taxa in a column named *name*, together with the number of samples in which each taxon is dominant (n) and the corresponding relative frequency (rel_freq).

summary returns a list with two tibbles, samples and features.

Details

getTop extracts the “FeatureID”s of the most abundant features in a SummarizedExperiment object.
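
As a rough illustration of the ranking idea, and not mia's actual implementation, the base R sketch below ranks the rows of a toy count matrix by their mean and by prevalence. The matrix, the helper names top_features and prevalence, and the assumption that prevalence is the fraction of samples whose count exceeds the detection threshold are all made up for this example.

```r
# Toy count matrix (hypothetical data): rows are features, columns are samples.
counts <- matrix(
  c( 50,  0, 0,
      5,  5, 5,
    100, 80, 0,
      0,  1, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("f1", "f2", "f3", "f4"), c("s1", "s2", "s3"))
)

# Hypothetical helper, not part of mia: score each row with `stat`,
# sort the scores in decreasing order and keep the first `top` names.
top_features <- function(mat, top = 2L, stat = rowMeans, ...) {
  scores <- sort(stat(mat, ...), decreasing = TRUE)
  head(names(scores), top)
}

# Rank by mean abundance.
top_by_mean <- top_features(counts, top = 2L, na.rm = TRUE)

# Assumed definition of prevalence: the fraction of samples in which
# the count is above the detection threshold.
prevalence <- function(mat, detection = 0, ...) {
  rowMeans(mat > detection, ...)
}

# Rank by prevalence at a detection threshold of 4.
top_by_prev <- top_features(counts, top = 2L, stat = prevalence, detection = 4)
```

In this toy data the two rankings differ: f1 has a high mean because of one large count, whereas f2 has low counts but is detected in every sample.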

getUnique is a basic function for accessing the different taxa at a particular taxonomic rank.

summarizeDominance returns information about the most dominant taxa in a tibble. The information includes how many samples each taxon dominates, both as an absolute count and as a proportion of the samples considered.
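
The shape of that overview can be sketched in base R. This is only an approximation under stated assumptions: the toy matrix is invented, the dominant taxon of a sample is taken to be the row with the largest count, and ties are resolved by which.max (first match), which may not match how mia handles them.

```r
# Toy count matrix (hypothetical data): rows are taxa, columns are samples.
counts <- matrix(
  c(10, 2, 1,
     3, 9, 8,
     1, 1, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("taxA", "taxB", "taxC"), c("s1", "s2", "s3"))
)

# Dominant taxon per sample: the row holding the largest count in each column.
dominant <- rownames(counts)[apply(counts, 2, which.max)]

# Count how many samples each taxon dominates, most frequent first.
n <- sort(table(dominant), decreasing = TRUE)

# Assemble an overview with the same column names as in the examples.
overview <- data.frame(
  dominant_taxa = names(n),
  n = as.integer(n),
  rel_freq = as.integer(n) / ncol(counts),
  stringsAsFactors = FALSE
)
```

Here rel_freq is n divided by the number of samples, which agrees with the ungrouped example output, where 5 of 26 samples gives 0.192.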

summary returns a summary of the counts for all samples and features in a SummarizedExperiment object.
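
The per-sample part of that summary appears to describe the column sums of the assay (in the example output, total_counts divided by the 26 samples equals mean_counts). A base R sketch under that assumption follows. The toy matrix is invented, and the features tibble is left out because the definitions of its columns are not stated here.

```r
# Toy count matrix (hypothetical data): rows are features, columns are samples.
counts <- matrix(
  c( 10,  0, 30,
      5,  5,  5,
    100, 80, 90),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("f1", "f2", "f3"), c("s1", "s2", "s3"))
)

# Total counts per sample.
lib_sizes <- colSums(counts)

# One-row summary using the column names shown in the example output.
sample_summary <- data.frame(
  total_counts  = sum(lib_sizes),
  min_counts    = min(lib_sizes),
  max_counts    = max(lib_sizes),
  median_counts = median(lib_sizes),
  mean_counts   = mean(lib_sizes),
  stdev_counts  = sd(lib_sizes)
)
```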

Examples

data(GlobalPatterns)
top_taxa <- getTop(GlobalPatterns,
                       method = "mean",
                       top = 5,
                       assay.type = "counts")
top_taxa
#> [1] "549656" "331820" "279599" "360229" "317182"

# Use 'detection' to select detection threshold when using prevalence method
top_taxa <- getTop(GlobalPatterns,
                       method = "prevalence",
                       top = 5,
                       assay.type = "counts",
                       detection = 100)
top_taxa
#> [1] "549656" "331820" "94166"  "317182" "279599"
                       
# Top taxa of a specific rank
getTop(agglomerateByRank(GlobalPatterns,
                             rank = "Genus",
                             na.rm = TRUE))
#> [1] "Bacteroides"      "Dolichospermum"   "Faecalibacterium" "Neisseria"       
#> [5] "Haemophilus"     

# Gets the overview of dominant taxa
dominant_taxa <- summarizeDominance(GlobalPatterns,
                                   rank = "Genus")
dominant_taxa
#> # A tibble: 17 × 3
#>    dominant_taxa                n rel_freq
#>    <chr>                    <int>    <dbl>
#>  1 Bacteroides                  5   0.192 
#>  2 Crenothrix                   3   0.115 
#>  3 Faecalibacterium             2   0.0769
#>  4 Prochlorococcus              2   0.0769
#>  5 Streptococcus                2   0.0769
#>  6 CandidatusNitrososphaera     1   0.0385
#>  7 CandidatusPortiera           1   0.0385
#>  8 CandidatusSolibacter         1   0.0385
#>  9 Corynebacterium              1   0.0385
#> 10 Desulfuromonas               1   0.0385
#> 11 Dolichospermum               1   0.0385
#> 12 Luteolibacter                1   0.0385
#> 13 MC18                         1   0.0385
#> 14 Neisseria                    1   0.0385
#> 15 Nitrosopumilus               1   0.0385
#> 16 Polaribacter                 1   0.0385
#> 17 Veillonella                  1   0.0385

# With group, it is possible to group observations based on specified groups
# Gets the overview of dominant taxa
dominant_taxa <- summarizeDominance(GlobalPatterns,
                                   rank = "Genus",
                                   group = "SampleType",
                                   na.rm = TRUE)

dominant_taxa
#> # A tibble: 20 × 4
#> # Groups:   SampleType [9]
#>    SampleType         dominant_taxa                n rel_freq
#>    <fct>              <chr>                    <int>    <dbl>
#>  1 Mock               Bacteroides                  3    1    
#>  2 Feces              Bacteroides                  2    0.5  
#>  3 Feces              Faecalibacterium             2    0.5  
#>  4 Freshwater (creek) Crenothrix                   2    0.667
#>  5 Skin               Streptococcus                2    0.667
#>  6 Freshwater         Dolichospermum               1    0.5  
#>  7 Freshwater         Prochlorococcus              1    0.5  
#>  8 Freshwater (creek) Luteolibacter                1    0.333
#>  9 Ocean              CandidatusPortiera           1    0.333
#> 10 Ocean              Polaribacter                 1    0.333
#> 11 Ocean              Prochlorococcus              1    0.333
#> 12 Sediment (estuary) Crenothrix                   1    0.333
#> 13 Sediment (estuary) Desulfuromonas               1    0.333
#> 14 Sediment (estuary) Nitrosopumilus               1    0.333
#> 15 Skin               Corynebacterium              1    0.333
#> 16 Soil               CandidatusNitrososphaera     1    0.333
#> 17 Soil               CandidatusSolibacter         1    0.333
#> 18 Soil               MC18                         1    0.333
#> 19 Tongue             Neisseria                    1    0.5  
#> 20 Tongue             Veillonella                  1    0.5  

# Get an overview of sample and taxa counts
summary(GlobalPatterns, assay.type = "counts")
#> $samples
#> # A tibble: 1 × 6
#>   total_counts min_counts max_counts median_counts mean_counts stdev_counts
#>          <dbl>      <dbl>      <dbl>         <dbl>       <dbl>        <dbl>
#> 1     28216678      58688    2357181       1106849    1085257.      650145.
#> 
#> $features
#> # A tibble: 1 × 3
#>   total singletons per_sample_avg
#>   <int>      <int>          <dbl>
#> 1 19216       2134          4022.
#> 

# Get unique taxa at a particular taxonomic rank
# sort = TRUE means that output is sorted in alphabetical order
# With na.rm = TRUE, it is possible to remove NAs
# sort and na.rm can also be used in function getTop
getUnique(GlobalPatterns, "Phylum", sort = TRUE)
#>  [1] "ABY1_OD1"         "AC1"              "AD3"              "Acidobacteria"   
#>  [5] "Actinobacteria"   "Armatimonadetes"  "BRC1"             "Bacteroidetes"   
#>  [9] "CCM11b"           "Caldiserica"      "Caldithrix"       "Chlamydiae"      
#> [13] "Chlorobi"         "Chloroflexi"      "Crenarchaeota"    "Cyanobacteria"   
#> [17] "Elusimicrobia"    "Euryarchaeota"    "Fibrobacteres"    "Firmicutes"      
#> [21] "Fusobacteria"     "GAL15"            "GN02"             "GN04"            
#> [25] "GN06"             "GN12"             "GOUTA4"           "Gemmatimonadetes"
#> [29] "Hyd24-12"         "KSB1"             "LCP-89"           "LD1"             
#> [33] "Lentisphaerae"    "MVP-15"           "NC10"             "NKB19"           
#> [37] "Nitrospirae"      "OP11"             "OP3"              "OP8"             
#> [41] "OP9"              "PAUC34f"          "Planctomycetes"   "Proteobacteria"  
#> [45] "SAR406"           "SBR1093"          "SC3"              "SC4"             
#> [49] "SM2F11"           "SPAM"             "SR1"              "Spirochaetes"    
#> [53] "Synergistetes"    "TG3"              "TM6"              "TM7"             
#> [57] "Tenericutes"      "Thermi"           "Thermotogae"      "Verrucomicrobia" 
#> [61] "WPS-2"            "WS1"              "WS2"              "WS3"             
#> [65] "ZB2"              "ZB3"              NA
