R/getPrevalence.R
agglomerateByPrevalence.Rd
Agglomerate data based on population prevalence
agglomerateByPrevalence(x, ...)
# S4 method for class 'SummarizedExperiment'
agglomerateByPrevalence(
  x,
  rank = NULL,
  other.name = other_label,
  other_label = "Other",
  ...
)
# S4 method for class 'TreeSummarizedExperiment'
agglomerateByPrevalence(
  x,
  rank = NULL,
  other.name = other_label,
  other_label = "Other",
  update.tree = FALSE,
  ...
)
...: Arguments passed to the agglomerateByRank function for SummarizedExperiment objects and to other functions. See agglomerateByRank for more details.
rank: Character scalar. Defines a taxonomic rank. Must be a value returned by taxonomyRanks().
other.name: Character scalar. Used as the label for the summary of non-prevalent taxa. (Default: "Other")
other_label: Deprecated. Use other.name instead.
update.tree: Logical scalar. Should rowTree() also be merged? (Default: FALSE)
agglomerateByPrevalence returns a taxonomically agglomerated object of the same class as x, based on prevalent taxonomic results.
agglomerateByPrevalence sums the assay values at the taxonomic level specified by rank (by default, the highest taxonomic level available) and keeps the summed rows whose population prevalence exceeds the given prevalence threshold at the given detection level. The remaining summed values (below the threshold) are agglomerated into one additional row whose name is set by other.name (by default, "Other").
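The steps described above can be sketched in base R on a toy matrix. This is only an illustration of the logic, not the package's implementation: the count matrix, the phylum vector, and the threshold values are invented for the example, and the use of strict ">" comparisons for both thresholds is an assumption.

```r
# Toy count matrix: 6 taxa (rows) x 4 samples (columns). All values are made up.
counts <- matrix(
    c(120, 90, 200, 150,
       30, 10,   0,  20,
        0,  5,   0,   0,
       60, 80,  40,  70,
        2,  0,   1,   0,
       10,  0,   0,   5),
    nrow = 6, byrow = TRUE,
    dimnames = list(paste0("taxon", 1:6), paste0("S", 1:4))
)
# Hypothetical phylum assignment for each taxon.
phylum <- c("A", "B", "C", "A", "D", "B")

# Step 1: sum the counts within each phylum.
summed <- rowsum(counts, group = phylum)

# Step 2: convert to relative abundance per sample.
rel <- sweep(summed, 2, colSums(summed), "/")

# Step 3: prevalence = fraction of samples where a phylum exceeds the
# detection level (strict ">" is an assumption in this sketch).
detection <- 0.05
threshold <- 0.5
prevalence <- rowMeans(rel > detection)
prevalent <- prevalence > threshold

# Step 4: keep prevalent phyla and collapse the rest into one "Other" row.
result <- rbind(
    summed[prevalent, , drop = FALSE],
    Other = colSums(summed[!prevalent, , drop = FALSE])
)
result
```

In this sketch phyla A and B pass both thresholds, while C and D are summed into "Other". Per-sample totals are unchanged, because every count ends up either in a kept row or in "Other".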
## Data can be aggregated based on prevalent taxonomic results
data(GlobalPatterns)
tse <- GlobalPatterns
tse <- transformAssay(tse, method = "relabundance")
tse <- agglomerateByPrevalence(
    tse,
    rank = "Phylum",
    assay.type = "relabundance",
    detection = 1/100,
    prevalence = 50/100)
tse
#> class: TreeSummarizedExperiment
#> dim: 6 26
#> metadata(2): agglomerated_by_rank agglomerated_by_rank
#> assays(2): counts relabundance
#> rownames(6): Actinobacteria Bacteroidetes ... Proteobacteria Other
#> rowData names(7): Kingdom Phylum ... Genus Species
#> colnames(26): CL3 CC1 ... Even2 Even3
#> colData names(7): X.SampleID Primer ... SampleType Description
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> rowLinks: a LinkDataFrame (6 rows)
#> rowTree: 1 phylo tree(s) (19216 leaves)
#> colLinks: NULL
#> colTree: NULL
# Here the data are aggregated at the taxonomic level "Phylum". The five phyla
# that exceed the population prevalence threshold of 50/100 make up the first
# five rows of the assay in the aggregated data. The sixth and last row, named
# "Other" by default, holds the summed values of all the other phyla, which
# fall below the prevalence threshold.
assay(tse)[,1:5]
#>                   CL3    CC1    SV1 M31Fcsw M11Fcsw
#> Actinobacteria  39601  90280 121703    2540     841
#> Bacteroidetes   67395  96398  93436  804395 1424107
#> Cyanobacteria    1955   3353  16676     423  212812
#> Firmicutes       8584   4726   3524  700084  330423
#> Proteobacteria 294228 361327 224004   18798   86614
#> Other          452307 579321 238100   17211   21679