Agglomerate data based on population prevalence

agglomerateByPrevalence(x, ...)

# S4 method for class 'SummarizedExperiment'
agglomerateByPrevalence(
  x,
  rank = NULL,
  other.name = other_label,
  other_label = "Other",
  ...
)

# S4 method for class 'TreeSummarizedExperiment'
agglomerateByPrevalence(
  x,
  rank = NULL,
  other.name = other_label,
  other_label = "Other",
  update.tree = FALSE,
  ...
)

Arguments

x

A SummarizedExperiment or TreeSummarizedExperiment object.

...

Additional arguments passed on to agglomerateByRank (for SummarizedExperiment objects) and to other functions, for example the assay.type, detection and prevalence arguments used in the Examples. See agglomerateByRank for more details.

rank

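Character scalar. Defines the taxonomic rank at which to agglomerate. Must be one of the values returned by taxonomyRanks(). (Default: NULL, which uses the highest taxonomic level available)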

other.name

Character scalar. The name of the row that holds the summed values of the non-prevalent taxa. (Default: "Other")

other_label

Deprecated. Use other.name instead.

update.tree

Logical scalar. Should rowTree() also be merged? (Default: FALSE)

Value

agglomerateByPrevalence returns an object of the same class as x, agglomerated at the requested taxonomic rank. Prevalent taxa are kept as separate rows, and the non-prevalent taxa are combined into a single row.

Details

agglomerateByPrevalence sums the assay values at the taxonomic level given by rank (by default, the highest taxonomic level available) and keeps the summed rows whose population prevalence exceeds the given threshold at the given detection level. The remaining rows, those below the threshold, are summed into one additional row named by other.name (by default "Other").
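
The logic described above can be illustrated with a minimal base R sketch. It is not the package's implementation: the toy counts, the group labels in phylum, and the threshold values are made up for illustration, and strict "greater than" comparisons are assumed for both the detection and the prevalence threshold.

```r
# Toy count matrix: rows are taxa, columns are samples (made-up values).
counts <- matrix(
    c(50, 40, 60, 30,
      10,  0,  0,  5,
      30, 20, 25, 45,
       0,  1,  0,  0,
      10, 39, 15, 20),
    nrow = 5, byrow = TRUE,
    dimnames = list(paste0("taxon", 1:5), paste0("S", 1:4))
)
# Hypothetical rank assignment of each taxon (standing in for e.g. Phylum).
phylum <- c("A", "B", "A", "C", "D")

# Illustrative thresholds and label (assumed values, not package defaults).
detection  <- 0.08
prevalence <- 0.5
other.name <- "Other"

# Step 1: sum the rows that share the same group label.
agg <- rowsum(counts, group = phylum)

# Step 2: convert to relative abundance per sample.
rel <- sweep(agg, 2, colSums(agg), "/")

# Step 3: prevalence = fraction of samples in which a group exceeds the
# detection level; keep the groups whose prevalence exceeds the threshold.
prev <- rowMeans(rel > detection)
keep <- prev > prevalence

# Step 4: sum all remaining groups into a single extra row.
other  <- colSums(agg[!keep, , drop = FALSE])
result <- rbind(agg[keep, , drop = FALSE], other)
rownames(result)[nrow(result)] <- other.name

result
# Rows of result: A, D, Other
```

Because the non-prevalent groups are summed rather than dropped, the column totals of result equal those of the original counts matrix.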

Examples

## Data can be aggregated based on prevalent taxonomic results
library(mia)
data(GlobalPatterns)
tse <- GlobalPatterns
tse <- transformAssay(tse, method = "relabundance")
tse <- agglomerateByPrevalence(
    tse,
    rank = "Phylum",
    assay.type = "relabundance",
    detection = 1/100,
    prevalence = 50/100)

tse
#> class: TreeSummarizedExperiment 
#> dim: 6 26 
#> metadata(2): agglomerated_by_rank agglomerated_by_rank
#> assays(2): counts relabundance
#> rownames(6): Actinobacteria Bacteroidetes ... Proteobacteria Other
#> rowData names(7): Kingdom Phylum ... Genus Species
#> colnames(26): CL3 CC1 ... Even2 Even3
#> colData names(7): X.SampleID Primer ... SampleType Description
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> rowLinks: a LinkDataFrame (6 rows)
#> rowTree: 1 phylo tree(s) (19216 leaves)
#> colLinks: NULL
#> colTree: NULL

# Here the data are aggregated at the taxonomic level "Phylum". The five phyla
# that exceed the population prevalence threshold of 50/100 make up the first
# five rows of the aggregated assay. The sixth and last row, named "Other" by
# default, holds the summed values of all phyla that are below the prevalence
# threshold.

assay(tse)[,1:5]
#>                   CL3    CC1    SV1 M31Fcsw M11Fcsw
#> Actinobacteria  39601  90280 121703    2540     841
#> Bacteroidetes   67395  96398  93436  804395 1424107
#> Cyanobacteria    1955   3353  16676     423  212812
#> Firmicutes       8584   4726   3524  700084  330423
#> Proteobacteria 294228 361327 224004   18798   86614
#> Other          452307 579321 238100   17211   21679