To query a SummarizedExperiment for interesting features, several functions are available.

getTopFeatures(
  x,
  top = 5L,
  method = c("mean", "sum", "median"),
  assay.type = assay_name,
  assay_name = "counts",
  na.rm = TRUE,
  ...
)

# S4 method for SummarizedExperiment
getTopFeatures(
  x,
  top = 5L,
  method = c("mean", "sum", "median", "prevalence"),
  assay.type = assay_name,
  assay_name = "counts",
  na.rm = TRUE,
  ...
)

getTopTaxa(x, ...)

# S4 method for SummarizedExperiment
getTopTaxa(x, ...)

getUniqueFeatures(x, ...)

# S4 method for SummarizedExperiment
getUniqueFeatures(x, rank = NULL, ...)

getUniqueTaxa(x, ...)

# S4 method for SummarizedExperiment
getUniqueTaxa(x, ...)

countDominantFeatures(x, group = NULL, name = "dominant_taxa", ...)

# S4 method for SummarizedExperiment
countDominantFeatures(x, group = NULL, name = "dominant_taxa", ...)

countDominantTaxa(x, ...)

# S4 method for SummarizedExperiment
countDominantTaxa(x, ...)

# S4 method for SummarizedExperiment
summary(object, assay.type = assay_name, assay_name = "counts")

Arguments

x

A SummarizedExperiment object.

top

Numeric value specifying how many top taxa to return. By default, the top five taxa are returned.

method

The method used to determine the top taxa. One of 'sum', 'mean', 'median' or 'prevalence'. Default is 'mean'.

assay.type

A single character value selecting the assay to use. Must be one of assayNames(x). By default, count data ("counts") is expected.

assay_name

A single character value specifying which assay to use for the calculation. (Please use assay.type instead. At some point assay_name will be disabled.)

na.rm

For getTopFeatures, a logical value indicating whether NA values are removed when applying the calculation specified by the method argument. Default is TRUE.

...

Additional arguments passed on to agglomerateByRank() when rank is specified for countDominantFeatures.

rank

A single character value defining a taxonomic rank. Must be one of the values returned by taxonomyRanks().

group

With group, it is possible to group the observations in an overview. Must be one of the column names of colData.

name

The column name for the features. The default is 'dominant_taxa'.

object

A SummarizedExperiment object.

Value

getTopFeatures returns a vector of the most abundant “FeatureID”s.

getUniqueFeatures returns a vector of the unique taxa present at a particular rank.

countDominantFeatures returns an overview in a tibble. It contains the dominant taxa in a column named by the name argument, together with how often each taxon is dominant in the data set (columns n and rel_freq).

summary returns a list with two tibbles, samples and features.

Details

getTopFeatures extracts the most abundant “FeatureID”s in a SummarizedExperiment object.
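The idea behind ranking features can be illustrated in base R on a toy matrix. This is only a sketch under stated assumptions, not the package's implementation: the helper top_features, the count matrix and the strict "greater than detection" rule used for prevalence are all hypothetical choices made for illustration.

```r
# Toy count matrix (hypothetical data): rows are features, columns are samples.
counts <- matrix(
  c(10, 0,  5,
     3, 4,  2,
     0, 0, 90,
     7, 8,  9),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("f1", "f2", "f3", "f4"), c("s1", "s2", "s3"))
)

# Hypothetical helper: score each feature row-wise, then return the names of
# the `top` highest-scoring features.
top_features <- function(mat, top = 2L,
                         method = c("mean", "sum", "median", "prevalence"),
                         detection = 0, na.rm = TRUE) {
  method <- match.arg(method)
  score <- switch(method,
    mean       = rowMeans(mat, na.rm = na.rm),
    sum        = rowSums(mat, na.rm = na.rm),
    median     = apply(mat, 1, median, na.rm = na.rm),
    # Assumption: prevalence is the fraction of samples whose count is
    # strictly above the detection threshold.
    prevalence = rowMeans(mat > detection, na.rm = na.rm)
  )
  head(names(score)[order(score, decreasing = TRUE)], top)
}

top_by_mean       <- top_features(counts, method = "mean")
top_by_median     <- top_features(counts, method = "median")
top_by_prevalence <- top_features(counts, method = "prevalence", detection = 4)
```

In this toy data, f3 has a single very large count, so it ranks first by mean but not by median or prevalence. This is why the choice of method can change which features are reported.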

getUniqueFeatures is a basic function for accessing the different taxa at a particular taxonomic rank.
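Conceptually, this corresponds to taking the unique values of one column of a taxonomy table. The base R sketch below uses a hypothetical data frame and a hypothetical helper unique_at_rank. Its sort and na.rm arguments are named after the options mentioned in the Examples, but their behavior here is an assumption of the sketch.

```r
# Hypothetical taxonomy table: one row per feature, one column per rank.
taxonomy <- data.frame(
  Phylum = c("Firmicutes", "Bacteroidetes", "Firmicutes", NA, "Proteobacteria"),
  Genus  = c("Faecalibacterium", "Bacteroides", "Veillonella", NA, "Neisseria"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: unique values at one rank, optionally sorted
# alphabetically and optionally with NA removed.
unique_at_rank <- function(tax, rank, sort = FALSE, na.rm = FALSE) {
  stopifnot(rank %in% colnames(tax))
  values <- unique(tax[[rank]])
  if (na.rm) values <- values[!is.na(values)]
  # base::sort is called explicitly because the argument is also named `sort`.
  if (sort) values <- base::sort(values, na.last = TRUE)
  values
}

phyla       <- unique_at_rank(taxonomy, "Phylum", sort = TRUE)
phyla_clean <- unique_at_rank(taxonomy, "Phylum", sort = TRUE, na.rm = TRUE)
```

Without na.rm = TRUE, a missing rank assignment appears as a single NA entry, placed last when sorting.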

countDominantFeatures returns information about the most dominant taxa in a tibble. For each taxon, this includes the number of samples in which it is dominant (n) and the corresponding relative frequency (rel_freq).
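The counting step can be sketched in base R as follows. The toy matrix is hypothetical, and the code shows the general technique (the most abundant feature per sample, then a frequency table) rather than the package's own code. A plain data frame stands in for the tibble.

```r
# Toy count matrix (hypothetical data): 3 taxa in rows, 6 samples in columns.
counts <- matrix(
  c(10, 1, 2, 0, 1, 0,
     3, 9, 1, 8, 7, 2,
     0, 2, 7, 1, 0, 5),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("taxA", "taxB", "taxC"), paste0("s", 1:6))
)

# Dominant taxon per sample: the row holding the largest count in each column.
dominant <- rownames(counts)[apply(counts, 2, which.max)]

# Tabulate how often each taxon is dominant, most frequent first.
n <- sort(table(dominant), decreasing = TRUE)

overview <- data.frame(
  dominant_taxa = names(n),
  n             = as.integer(n),
  rel_freq      = as.numeric(n) / ncol(counts),  # fraction of all samples
  stringsAsFactors = FALSE
)
```

Here rel_freq is n divided by the number of samples, which matches the example output further down (5 of 26 samples gives 0.192).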

summary returns a summary of counts for all samples and features in a SummarizedExperiment object.
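The column names in the summary output suggest statistics like those in the base R sketch below. The toy matrix is hypothetical. The readings of singletons (features whose total count is 1) and per_sample_avg (mean number of features observed per sample) are assumptions of this sketch, not documented definitions.

```r
# Toy count matrix (hypothetical data): rows are features, columns are samples.
counts <- matrix(
  c(5, 0, 1,
    0, 1, 0,
    2, 3, 4),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("f1", "f2", "f3"), c("s1", "s2", "s3"))
)

# Sample-level summary: statistics of the total counts per sample.
lib_size <- colSums(counts)
sample_summary <- c(
  total_counts  = sum(lib_size),
  min_counts    = min(lib_size),
  max_counts    = max(lib_size),
  median_counts = median(lib_size),
  mean_counts   = mean(lib_size),
  stdev_counts  = sd(lib_size)
)

# Feature-level summary (definitions assumed, see the note above).
feature_summary <- c(
  total          = nrow(counts),
  singletons     = sum(rowSums(counts) == 1),
  per_sample_avg = mean(colSums(counts > 0))
)
```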

Author

Leo Lahti, Tuomas Borman and Sudarshan A. Shetty

Examples

data(GlobalPatterns)
top_taxa <- getTopFeatures(GlobalPatterns,
                       method = "mean",
                       top = 5,
                       assay.type = "counts")
top_taxa
#> [1] "549656" "331820" "279599" "360229" "317182"

# Use 'detection' to select detection threshold when using prevalence method
top_taxa <- getTopFeatures(GlobalPatterns,
                       method = "prevalence",
                       top = 5,
                       assay.type = "counts",
                       detection = 100)
top_taxa
#> [1] "549656" "331820" "94166"  "317182" "279599"
                       
# Top taxa of a specific rank
getTopFeatures(agglomerateByRank(GlobalPatterns,
                             rank = "Genus",
                             na.rm = TRUE))
#> [1] "Bacteroides"      "Dolichospermum"   "Faecalibacterium" "Neisseria"       
#> [5] "Haemophilus"     

# Gets the overview of dominant taxa
dominant_taxa <- countDominantFeatures(GlobalPatterns,
                                   rank = "Genus")
dominant_taxa
#> # A tibble: 17 × 3
#>    dominant_taxa                  n rel_freq
#>    <chr>                      <int>    <dbl>
#>  1 Genus:Bacteroides              5   0.192 
#>  2 Order:Stramenopiles            4   0.154 
#>  3 Family:Desulfobulbaceae        2   0.0769
#>  4 Genus:Streptococcus            2   0.0769
#>  5 Class:Chloracidobacteria       1   0.0385
#>  6 Family:ACK-M1                  1   0.0385
#>  7 Family:Flavobacteriaceae       1   0.0385
#>  8 Family:Moraxellaceae           1   0.0385
#>  9 Family:Ruminococcaceae         1   0.0385
#> 10 Genus:CandidatusSolibacter     1   0.0385
#> 11 Genus:Dolichospermum           1   0.0385
#> 12 Genus:Faecalibacterium         1   0.0385
#> 13 Genus:MC18                     1   0.0385
#> 14 Genus:Neisseria                1   0.0385
#> 15 Genus:Prochlorococcus          1   0.0385
#> 16 Genus:Veillonella              1   0.0385
#> 17 Order:Chromatiales             1   0.0385

# With group, it is possible to group observations based on specified groups
# Gets the overview of dominant taxa
dominant_taxa <- countDominantFeatures(GlobalPatterns,
                                   rank = "Genus",
                                   group = "SampleType",
                                   na.rm = TRUE)

dominant_taxa
#> # A tibble: 20 × 4
#> # Groups:   SampleType [9]
#>    SampleType         dominant_taxa                n rel_freq
#>    <fct>              <chr>                    <int>    <dbl>
#>  1 Mock               Bacteroides                  3    1    
#>  2 Feces              Bacteroides                  2    0.5  
#>  3 Feces              Faecalibacterium             2    0.5  
#>  4 Freshwater (creek) Crenothrix                   2    0.667
#>  5 Skin               Streptococcus                2    0.667
#>  6 Freshwater         Dolichospermum               1    0.5  
#>  7 Freshwater         Prochlorococcus              1    0.5  
#>  8 Freshwater (creek) Luteolibacter                1    0.333
#>  9 Ocean              CandidatusPortiera           1    0.333
#> 10 Ocean              Polaribacter                 1    0.333
#> 11 Ocean              Prochlorococcus              1    0.333
#> 12 Sediment (estuary) Crenothrix                   1    0.333
#> 13 Sediment (estuary) Desulfuromonas               1    0.333
#> 14 Sediment (estuary) Nitrosopumilus               1    0.333
#> 15 Skin               Corynebacterium              1    0.333
#> 16 Soil               CandidatusNitrososphaera     1    0.333
#> 17 Soil               CandidatusSolibacter         1    0.333
#> 18 Soil               MC18                         1    0.333
#> 19 Tongue             Neisseria                    1    0.5  
#> 20 Tongue             Veillonella                  1    0.5  

# Get an overview of sample and taxa counts
summary(GlobalPatterns, assay.type = "counts")
#> $samples
#> # A tibble: 1 × 6
#>   total_counts min_counts max_counts median_counts mean_counts stdev_counts
#>          <dbl>      <dbl>      <dbl>         <dbl>       <dbl>        <dbl>
#> 1     28216678      58688    2357181       1106849    1085257.      650145.
#> 
#> $features
#> # A tibble: 1 × 3
#>   total singletons per_sample_avg
#>   <int>      <int>          <dbl>
#> 1 19216       2134          4022.
#> 

# Get unique taxa at a particular taxonomic rank
# sort = TRUE means that output is sorted in alphabetical order
# With na.rm = TRUE, it is possible to remove NAs
# sort and na.rm can also be used in function getTopFeatures
getUniqueFeatures(GlobalPatterns, "Phylum", sort = TRUE)
#>  [1] "ABY1_OD1"         "AC1"              "AD3"              "Acidobacteria"   
#>  [5] "Actinobacteria"   "Armatimonadetes"  "BRC1"             "Bacteroidetes"   
#>  [9] "CCM11b"           "Caldiserica"      "Caldithrix"       "Chlamydiae"      
#> [13] "Chlorobi"         "Chloroflexi"      "Crenarchaeota"    "Cyanobacteria"   
#> [17] "Elusimicrobia"    "Euryarchaeota"    "Fibrobacteres"    "Firmicutes"      
#> [21] "Fusobacteria"     "GAL15"            "GN02"             "GN04"            
#> [25] "GN06"             "GN12"             "GOUTA4"           "Gemmatimonadetes"
#> [29] "Hyd24-12"         "KSB1"             "LCP-89"           "LD1"             
#> [33] "Lentisphaerae"    "MVP-15"           "NC10"             "NKB19"           
#> [37] "Nitrospirae"      "OP11"             "OP3"              "OP8"             
#> [41] "OP9"              "PAUC34f"          "Planctomycetes"   "Proteobacteria"  
#> [45] "SAR406"           "SBR1093"          "SC3"              "SC4"             
#> [49] "SM2F11"           "SPAM"             "SR1"              "Spirochaetes"    
#> [53] "Synergistetes"    "TG3"              "TM6"              "TM7"             
#> [57] "Tenericutes"      "Thermi"           "Thermotogae"      "Verrucomicrobia" 
#> [61] "WPS-2"            "WS1"              "WS2"              "WS3"             
#> [65] "ZB2"              "ZB3"              NA