Split TreeSummarizedExperiment column-wise or row-wise based on a grouping variable

splitOn(x, ...)

# S4 method for SummarizedExperiment
splitOn(x, f = NULL, ...)

# S4 method for SingleCellExperiment
splitOn(x, f = NULL, ...)

# S4 method for TreeSummarizedExperiment
splitOn(x, f = NULL, update_rowTree = FALSE, ...)

unsplitOn(x, ...)

# S4 method for list
unsplitOn(x, update_rowTree = FALSE, ...)

# S4 method for SimpleList
unsplitOn(x, update_rowTree = FALSE, ...)

# S4 method for SingleCellExperiment
unsplitOn(x, altExpNames = names(altExps(x)), keep_reducedDims = FALSE, ...)

Arguments

x

A SummarizedExperiment object or a list of SummarizedExperiment objects.

...

Arguments passed to the mergeRows/mergeCols functions for SummarizedExperiment objects, and to other functions. See mergeRows for more details.

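  • use_names A single boolean value selecting whether to name the list elements by their group names.

  • MARGIN A single numeric value, 1 (rows) or 2 (columns), selecting the dimension to split along. Required when f matches both dimensions.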
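The effect of use_names can be pictured with base R's split(), which always names its output by group. The sketch below is plain base R, not mia's implementation: dropping the names with unname() is roughly what use_names = FALSE produces. An unnamed list is also why assigning the result to altExps in the examples triggers the "'names(value)' is NULL" warning.

```r
# Base-R illustration only; splitOn itself works on SummarizedExperiment objects.
groups <- c(2, 1, 3, 1, 2)

# split() names each element by its group, similar to use_names = TRUE
named <- split(seq_along(groups), groups)
names(named)             # "1" "2" "3"

# Dropping the names mimics use_names = FALSE
unnamed <- unname(named)
is.null(names(unnamed))  # TRUE
```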

f

A single character value selecting the grouping variable from rowData or colData, or a factor or vector with the same length as one of the dimensions. If f matches both dimensions, MARGIN must be specified. Splitting by columns is discouraged, since the result cannot be stored in altExps.
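The rule for choosing a dimension when f is a vector can be sketched as follows. resolve_margin() is a hypothetical helper written for illustration, not a mia function, and it only covers the vector case; a single character value is instead looked up in rowData or colData.

```r
# Hypothetical helper (NOT part of mia): picks the dimension a grouping
# vector f applies to, following the rule described for the f argument.
resolve_margin <- function(f, n_rows, n_cols, MARGIN = NULL) {
  matches <- c(length(f) == n_rows, length(f) == n_cols)
  if (!is.null(MARGIN)) {
    if (!(MARGIN %in% 1:2)) stop("MARGIN must be 1 (rows) or 2 (columns)")
    if (!matches[MARGIN]) stop("length(f) does not match the chosen dimension")
    return(MARGIN)
  }
  # Ambiguous: the same length fits both rows and columns
  if (all(matches)) stop("f matches both dimensions; please specify MARGIN")
  if (!any(matches)) stop("f matches neither dimension")
  which(matches)
}

# A vector of length 26 fits the 26 columns of a 19216 x 26 object
resolve_margin(rep(1:3, length.out = 26), n_rows = 19216, n_cols = 26)  # 2
```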

update_rowTree

TRUE or FALSE: Should the rowTree be updated based on the split data? This option is only available when x is a TreeSummarizedExperiment object or a list of such objects. (By default: update_rowTree = FALSE)

altExpNames

A character vector specifying the alternative experiments to be unsplit. (By default: altExpNames = names(altExps(x)))

keep_reducedDims

TRUE or FALSE: Should reducedDims(x) be transferred to the result? Note that this breaks the link between the reduced dimensions and the data used to calculate them. (By default: keep_reducedDims = FALSE)

Value

For splitOn: SummarizedExperiment objects in a SimpleList.

For unsplitOn: x, with rowData and assay data replaced by the unsplit data. The colData of x is kept, and any existing rowTree is dropped, since the existing rowLinks are no longer valid.
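What unsplitting means for the assay can be sketched in base R: row-wise pieces are stacked back into one matrix. The rows come back in group order rather than the original order, which gives one intuition for why row-level links built for the original object cannot simply be reused. This is an illustration on a plain matrix, not how unsplitOn is implemented.

```r
# Toy assay: 4 features x 3 samples
counts <- matrix(1:12, nrow = 4,
                 dimnames = list(paste0("taxon", 1:4), paste0("sample", 1:3)))
groups <- c("A", "B", "A", "B")

# Row-wise split, one piece per group
pieces <- lapply(split(seq_len(nrow(counts)), groups),
                 function(i) counts[i, , drop = FALSE])

# Unsplit: stack the pieces back together; row order now follows the groups
combined <- do.call(rbind, unname(pieces))
rownames(combined)   # "taxon1" "taxon3" "taxon2" "taxon4"

# The original order can be restored by matching row names
restored <- combined[rownames(counts), , drop = FALSE]
```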

Details

splitOn splits data based on a grouping variable. Splitting can be done column-wise or row-wise. The returned value is a list of SummarizedExperiment objects, each element containing the members of one group.
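Row-wise versus column-wise splitting can be sketched on a plain count matrix standing in for an assay. This base-R snippet is an illustration only: splitOn works on whole SummarizedExperiment objects, so rowData and colData are subset along with the assay. Note that the row-wise pieces all keep the same column names, which is what makes them suitable for storing in altExps, while the column-wise pieces do not.

```r
# Toy count matrix standing in for an assay: 6 features x 4 samples
counts <- matrix(1:24, nrow = 6,
                 dimnames = list(paste0("taxon", 1:6), paste0("sample", 1:4)))
row_groups <- c("A", "B", "A", "C", "B", "A")
col_groups <- c("x", "y", "x", "y")

# Row-wise split: one sub-matrix per row group, all columns kept
split_rows <- lapply(split(seq_len(nrow(counts)), row_groups),
                     function(i) counts[i, , drop = FALSE])

# Column-wise split: one sub-matrix per column group, all rows kept
split_cols <- lapply(split(seq_len(ncol(counts)), col_groups),
                     function(j) counts[, j, drop = FALSE])

names(split_rows)          # "A" "B" "C"
sapply(split_rows, nrow)   # A: 3, B: 2, C: 1
sapply(split_cols, ncol)   # x: 2, y: 2
```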

Author

Leo Lahti and Tuomas Borman. Contact: microbiome.github.io

Examples

library(mia)
data(GlobalPatterns)
tse <- GlobalPatterns
# Split data based on SampleType. 
se_list <- splitOn(tse, f = "SampleType")

# List of SE objects is returned. 
se_list
#> List of length 9
#> names(9): Soil Feces Skin Tongue ... Ocean Sediment (estuary) Mock

# Create arbitrary groups
rowData(tse)$group <- sample(1:3, nrow(tse), replace = TRUE)
colData(tse)$group <- sample(1:3, ncol(tse), replace = TRUE)

# Split based on rows
# Each element is named by its group name. If you don't want to name the
# elements, use use_names = FALSE. Since "group" is found in both rowData and
# colData, you must specify MARGIN.
se_list <- splitOn(tse, f = "group", use_names = FALSE, MARGIN = 1)

# When column names are shared between elements, you can store the list in altExps
altExps(tse) <- se_list
#> Warning: 'names(value)' is NULL, replacing with 'unnamed'

altExps(tse)
#> List of length 3
#> names(3): unnamed1 unnamed2 unnamed3

# To split on columns and update the rowTree:
se_list <- splitOn(tse, f = colData(tse)$group, update_rowTree = TRUE)
#> Warning: 'keep.nodes' does specify all the tips from 'tree'. The tree is not agglomerated.
#> Warning: 'keep.nodes' does specify all the tips from 'tree'. The tree is not agglomerated.
#> Warning: 'keep.nodes' does specify all the tips from 'tree'. The tree is not agglomerated.

# To combine the groups back together, use unsplitOn
unsplitOn(se_list)
#> class: TreeSummarizedExperiment 
#> dim: 19216 26 
#> metadata(0):
#> assays(1): counts
#> rownames(19216): 549322 522457 ... 200359 271582
#> rowData names(8): Kingdom Phylum ... Species group
#> colnames(26): CL3 M11Fcsw ... Even1 Even3
#> colData names(8): X.SampleID Primer ... Description group
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> rowLinks: a LinkDataFrame (19216 rows)
#> rowTree: 1 phylo tree(s) (19216 leaves)
#> colLinks: NULL
#> colTree: NULL