To query a SummarizedExperiment object for interesting features, several functions are available.
getTopFeatures(
  x,
  top = 5L,
  method = c("mean", "sum", "median"),
  assay.type = assay_name,
  assay_name = "counts",
  na.rm = TRUE,
  ...
)
# S4 method for SummarizedExperiment
getTopFeatures(
  x,
  top = 5L,
  method = c("mean", "sum", "median", "prevalence"),
  assay.type = assay_name,
  assay_name = "counts",
  na.rm = TRUE,
  ...
)
getTopTaxa(x, ...)
# S4 method for SummarizedExperiment
getTopTaxa(x, ...)
getUniqueFeatures(x, ...)
# S4 method for SummarizedExperiment
getUniqueFeatures(x, rank = NULL, ...)
getUniqueTaxa(x, ...)
# S4 method for SummarizedExperiment
getUniqueTaxa(x, ...)
countDominantFeatures(x, group = NULL, name = "dominant_taxa", ...)
# S4 method for SummarizedExperiment
countDominantFeatures(x, group = NULL, name = "dominant_taxa", ...)
countDominantTaxa(x, ...)
# S4 method for SummarizedExperiment
countDominantTaxa(x, ...)
# S4 method for SummarizedExperiment
summary(object, assay.type = assay_name, assay_name = "counts")
x: A SummarizedExperiment object.
top: Numeric value, how many top taxa to return. By default, the top five taxa are returned.
method: The method used to determine the top taxa: "sum", "mean", "median" or "prevalence". Default is "mean".
assay.type: A single character value selecting an assay by one of its assayNames. By default, count data ("counts") are expected.
assay_name: A single character value specifying which assay to use for calculation. (Please use assay.type instead. At some point assay_name will be disabled.)
na.rm: For getTopFeatures, a logical value specifying whether NA values are removed in the calculation selected by method. Default is TRUE.
...: Additional arguments passed on to agglomerateByRank() when rank is specified for countDominantFeatures.
rank: A single character value defining a taxonomic rank. Must be one of the values returned by taxonomyRanks().
group: Groups the observations in the overview. Must be one of the column names of colData.
name: The column name for the features. The default is "dominant_taxa".
object: A SummarizedExperiment object.
getTopFeatures returns a vector of the top most abundant “FeatureID”s.
getUniqueFeatures returns a vector of the unique taxa present at a particular rank.
countDominantFeatures returns an overview in a tibble. It contains the dominant taxa in a column named *name* and how often each taxon is dominant in the data set.
summary returns a list with two tibbles.
getTopFeatures extracts the top most abundant “FeatureID”s in a SummarizedExperiment object.
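To illustrate what "top" means for each method, here is a minimal base-R sketch on a plain count matrix. It is not mia's implementation: `top_features_sketch()` and the toy data are hypothetical, and the prevalence score assumes the fraction of samples whose count is strictly above `detection` (the threshold that the examples below pass through `detection`).

```r
# Hypothetical helper, not part of mia: score each row (feature) of a plain
# count matrix and return the names of the `top` highest-scoring features.
top_features_sketch <- function(counts, top = 5L,
                                method = c("mean", "sum", "median", "prevalence"),
                                na.rm = TRUE, detection = 0) {
  method <- match.arg(method)
  score <- switch(method,
    mean   = rowMeans(counts, na.rm = na.rm),
    sum    = rowSums(counts, na.rm = na.rm),
    median = apply(counts, 1, median, na.rm = na.rm),
    # Assumed definition: fraction of samples with a count above `detection`
    prevalence = rowMeans(counts > detection, na.rm = na.rm)
  )
  # Order features from highest to lowest score and keep the first `top`
  head(names(score)[order(score, decreasing = TRUE)], top)
}

# Toy data: 4 features (rows) x 3 samples (columns)
toy_counts <- matrix(c(10,  0, 5,
                        1,  1, 1,
                        0, 50, 0,
                        3,  4, 2),
                     nrow = 4, byrow = TRUE,
                     dimnames = list(paste0("f", 1:4), paste0("s", 1:3)))

top_features_sketch(toy_counts, top = 2, method = "mean")    # "f3" "f1"
top_features_sketch(toy_counts, top = 2, method = "median")  # "f1" "f4"
```

The two calls show why the method matters: f3 has a single very large count, so it ranks first by mean but last by median.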
getUniqueFeatures is a basic function to access the different taxa at a particular taxonomic rank.
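Conceptually, this amounts to taking the distinct values of one rank column of the taxonomy table. The base-R sketch below uses a hypothetical helper and a toy data frame standing in for the rowData of a SummarizedExperiment. Its `sort` and `na.rm` switches only mimic the options used in the examples. As in the `getUniqueFeatures(..., sort = TRUE)` output, an NA is kept at the end of the sorted output unless it is removed.

```r
# Toy taxonomy table standing in for the rowData of a SummarizedExperiment
toy_tax <- data.frame(
  Phylum = c("Firmicutes", "Bacteroidetes", "Firmicutes", NA),
  Genus  = c("Streptococcus", "Bacteroides", "Veillonella", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper, not part of mia
unique_taxa_sketch <- function(tax, rank, sort = FALSE, na.rm = FALSE) {
  taxa <- unique(tax[[rank]])            # distinct values, first-seen order
  if (na.rm) taxa <- taxa[!is.na(taxa)]  # drop missing assignments
  if (sort) taxa <- base::sort(taxa, na.last = TRUE)  # keep NA at the end
  taxa
}

unique_taxa_sketch(toy_tax, "Phylum")
unique_taxa_sketch(toy_tax, "Phylum", sort = TRUE, na.rm = TRUE)
```

`na.last = TRUE` is needed because `sort()` silently drops NA values by default.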
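countDominantFeatures returns information about the most dominant taxa in a tibble. For each dominant taxon, it reports the number of samples it dominates (n) and that number relative to the total number of samples, or to the group size when group is given (rel_freq).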
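The n and rel_freq columns in the examples can be reproduced conceptually with a short base-R sketch. For example, in the ungrouped GlobalPatterns output, n sums to the 26 samples and 5 / 26 ≈ 0.192. The sketch assumes that the dominant taxon of a sample is its most abundant feature. `dominant_sketch()` and the toy matrix are hypothetical, and this is not mia's implementation, which also handles agglomeration by rank.

```r
# Hypothetical helper, not part of mia: count how often each feature is the
# most abundant one in a sample, optionally within groups of samples.
dominant_sketch <- function(counts, group = NULL) {
  grouped <- !is.null(group)
  if (!grouped) group <- rep("all", ncol(counts))
  # Name of the most abundant feature in each sample (first one on ties)
  dominant <- rownames(counts)[apply(counts, 2, which.max)]
  tab <- table(group = group, dominant_taxa = dominant)
  out <- as.data.frame(tab, responseName = "n", stringsAsFactors = FALSE)
  out <- out[out$n > 0, ]
  # Relative frequency: n divided by the number of samples in the group
  out$rel_freq <- out$n / ave(out$n, out$group, FUN = sum)
  out <- out[order(-out$n, out$dominant_taxa), ]
  if (!grouped) out$group <- NULL
  rownames(out) <- NULL
  out
}

# Toy data: 3 features (rows) x 4 samples (columns)
toy_counts <- matrix(c(9, 1, 0, 2,
                       1, 8, 7, 1,
                       0, 1, 3, 9),
                     nrow = 3, byrow = TRUE,
                     dimnames = list(c("A", "B", "C"), paste0("s", 1:4)))

dominant_sketch(toy_counts)
dominant_sketch(toy_counts, group = c("g1", "g1", "g2", "g2"))
```

Each sample has exactly one dominant feature, so summing n within a group gives the group size used as the denominator.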
summary returns a summary of counts for all samples and features in a SummarizedExperiment object.
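The $samples tibble in the example below appears to describe per-sample total counts (library sizes). For GlobalPatterns, total_counts 28216678 divided by its 26 samples gives about 1085257, which is the reported mean_counts. The following base-R sketch computes those six statistics from column sums under that assumption. `sample_summary_sketch()` is hypothetical. The $features columns (singletons, per_sample_avg) are not reproduced because their exact definitions are not stated here.

```r
# Hypothetical helper, not part of mia: statistics of per-sample totals
sample_summary_sketch <- function(counts) {
  lib_sizes <- colSums(counts)  # total count of each sample (column)
  data.frame(
    total_counts  = sum(lib_sizes),
    min_counts    = min(lib_sizes),
    max_counts    = max(lib_sizes),
    median_counts = median(lib_sizes),
    mean_counts   = mean(lib_sizes),
    stdev_counts  = sd(lib_sizes)
  )
}

toy_counts <- matrix(1:6, nrow = 2)  # sample totals: 3, 7, 11
sample_summary_sketch(toy_counts)
```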
library(mia)
data(GlobalPatterns)
top_taxa <- getTopFeatures(GlobalPatterns,
method = "mean",
top = 5,
assay.type = "counts")
top_taxa
#> [1] "549656" "331820" "279599" "360229" "317182"
# Use 'detection' to select detection threshold when using prevalence method
top_taxa <- getTopFeatures(GlobalPatterns,
method = "prevalence",
top = 5,
assay.type = "counts",
detection = 100)
top_taxa
#> [1] "549656" "331820" "94166" "317182" "279599"
# Top taxa of a specific rank
getTopFeatures(agglomerateByRank(GlobalPatterns,
rank = "Genus",
na.rm = TRUE))
#> [1] "Bacteroides" "Dolichospermum" "Faecalibacterium" "Neisseria"
#> [5] "Haemophilus"
# Get an overview of the dominant taxa at genus level
dominant_taxa <- countDominantFeatures(GlobalPatterns,
rank = "Genus")
dominant_taxa
#> # A tibble: 17 × 3
#> dominant_taxa n rel_freq
#> <chr> <int> <dbl>
#> 1 Genus:Bacteroides 5 0.192
#> 2 Order:Stramenopiles 4 0.154
#> 3 Family:Desulfobulbaceae 2 0.0769
#> 4 Genus:Streptococcus 2 0.0769
#> 5 Class:Chloracidobacteria 1 0.0385
#> 6 Family:ACK-M1 1 0.0385
#> 7 Family:Flavobacteriaceae 1 0.0385
#> 8 Family:Moraxellaceae 1 0.0385
#> 9 Family:Ruminococcaceae 1 0.0385
#> 10 Genus:CandidatusSolibacter 1 0.0385
#> 11 Genus:Dolichospermum 1 0.0385
#> 12 Genus:Faecalibacterium 1 0.0385
#> 13 Genus:MC18 1 0.0385
#> 14 Genus:Neisseria 1 0.0385
#> 15 Genus:Prochlorococcus 1 0.0385
#> 16 Genus:Veillonella 1 0.0385
#> 17 Order:Chromatiales 1 0.0385
# With group, dominant taxa are counted separately within each group of samples
dominant_taxa <- countDominantFeatures(GlobalPatterns,
rank = "Genus",
group = "SampleType",
na.rm = TRUE)
dominant_taxa
#> # A tibble: 20 × 4
#> # Groups: SampleType [9]
#> SampleType dominant_taxa n rel_freq
#> <fct> <chr> <int> <dbl>
#> 1 Mock Bacteroides 3 1
#> 2 Feces Bacteroides 2 0.5
#> 3 Feces Faecalibacterium 2 0.5
#> 4 Freshwater (creek) Crenothrix 2 0.667
#> 5 Skin Streptococcus 2 0.667
#> 6 Freshwater Dolichospermum 1 0.5
#> 7 Freshwater Prochlorococcus 1 0.5
#> 8 Freshwater (creek) Luteolibacter 1 0.333
#> 9 Ocean CandidatusPortiera 1 0.333
#> 10 Ocean Polaribacter 1 0.333
#> 11 Ocean Prochlorococcus 1 0.333
#> 12 Sediment (estuary) Crenothrix 1 0.333
#> 13 Sediment (estuary) Desulfuromonas 1 0.333
#> 14 Sediment (estuary) Nitrosopumilus 1 0.333
#> 15 Skin Corynebacterium 1 0.333
#> 16 Soil CandidatusNitrososphaera 1 0.333
#> 17 Soil CandidatusSolibacter 1 0.333
#> 18 Soil MC18 1 0.333
#> 19 Tongue Neisseria 1 0.5
#> 20 Tongue Veillonella 1 0.5
# Get an overview of sample and taxa counts
summary(GlobalPatterns, assay.type = "counts")
#> $samples
#> # A tibble: 1 × 6
#> total_counts min_counts max_counts median_counts mean_counts stdev_counts
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 28216678 58688 2357181 1106849 1085257. 650145.
#>
#> $features
#> # A tibble: 1 × 3
#> total singletons per_sample_avg
#> <int> <int> <dbl>
#> 1 19216 2134 4022.
#>
# Get unique taxa at a particular taxonomic rank
# sort = TRUE sorts the output alphabetically
# na.rm = TRUE removes NAs
# sort and na.rm can also be used in getTopFeatures
getUniqueFeatures(GlobalPatterns, "Phylum", sort = TRUE)
#> [1] "ABY1_OD1" "AC1" "AD3" "Acidobacteria"
#> [5] "Actinobacteria" "Armatimonadetes" "BRC1" "Bacteroidetes"
#> [9] "CCM11b" "Caldiserica" "Caldithrix" "Chlamydiae"
#> [13] "Chlorobi" "Chloroflexi" "Crenarchaeota" "Cyanobacteria"
#> [17] "Elusimicrobia" "Euryarchaeota" "Fibrobacteres" "Firmicutes"
#> [21] "Fusobacteria" "GAL15" "GN02" "GN04"
#> [25] "GN06" "GN12" "GOUTA4" "Gemmatimonadetes"
#> [29] "Hyd24-12" "KSB1" "LCP-89" "LD1"
#> [33] "Lentisphaerae" "MVP-15" "NC10" "NKB19"
#> [37] "Nitrospirae" "OP11" "OP3" "OP8"
#> [41] "OP9" "PAUC34f" "Planctomycetes" "Proteobacteria"
#> [45] "SAR406" "SBR1093" "SC3" "SC4"
#> [49] "SM2F11" "SPAM" "SR1" "Spirochaetes"
#> [53] "Synergistetes" "TG3" "TM6" "TM7"
#> [57] "Tenericutes" "Thermi" "Thermotogae" "Verrucomicrobia"
#> [61] "WPS-2" "WS1" "WS2" "WS3"
#> [65] "ZB2" "ZB3" NA