subsampleCounts randomly subsamples the counts in a SummarizedExperiment and returns a modified object in which every sample has the same total number of observations (counts/reads).
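The core idea can be sketched in base R. This is a minimal illustration on a made-up matrix, not mia's implementation. It assumes sampling with replacement, which amounts to one multinomial draw of a fixed number of reads per sample, with probabilities proportional to that sample's observed counts.

```r
# Made-up feature x sample count matrix with unequal library sizes
counts <- matrix(c(30, 10, 0, 5, 60, 15, 12, 3, 40), nrow = 3,
                 dimnames = list(paste0("f", 1:3), paste0("s", 1:3)))
colSums(counts)  # library sizes: s1 = 40, s2 = 80, s3 = 55

set.seed(42)
depth <- min(colSums(counts))  # target depth = smallest library size (40)

# For each sample (column), draw `depth` reads with replacement in proportion
# to the observed counts; rmultinom() returns a 3 x 1 matrix per sample
rarefied <- apply(counts, 2, function(col) rmultinom(1, size = depth, prob = col))
dimnames(rarefied) <- dimnames(counts)

colSums(rarefied)  # every sample now sums to 40
```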
subsampleCounts(
x,
assay.type = assay_name,
assay_name = "counts",
min_size = min(colSums2(assay(x))),
replace = TRUE,
name = "subsampled",
verbose = TRUE,
...
)
# S4 method for SummarizedExperiment
subsampleCounts(
x,
assay.type = assay_name,
assay_name = "counts",
min_size = min(colSums2(assay(x))),
replace = TRUE,
name = "subsampled",
verbose = TRUE,
...
)
x: A SummarizedExperiment object.
assay.type: A single character value selecting the SummarizedExperiment assay used for random subsampling. Only raw counts are valid input; passing transformed data will give meaningless output.
assay_name: A single character value specifying which assay to use for the calculation. (Please use assay.type instead; assay_name will eventually be disabled.)
min_size: A single integer giving the number of counts drawn for each sample. It can equal the lowest total count found in any sample (the default) or be a user-specified number.
replace: Logical. Default is TRUE, i.e. subsampling with replacement. See phyloseq::rarefy_even_depth for details on the implications of this parameter.
name: A single character value specifying the name of the assay in which the subsampled abundance table is stored.
verbose: Logical. Default is TRUE. When TRUE, an additional message about the random number used is printed.
...: Additional arguments (not used).
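The effect of min_size and replace can be illustrated on a made-up count matrix. The following is a base-R sketch, not mia's implementation; base colSums() stands in for the colSums2() call used in the default of min_size.

```r
# Made-up counts: 3 features (rows) x 3 samples (columns)
counts <- matrix(c(50, 3, 1, 20, 20, 0, 8, 2, 0), nrow = 3,
                 dimnames = list(c("a", "b", "c"), c("s1", "s2", "s3")))
lib_sizes <- colSums(counts)   # s1 = 54, s2 = 40, s3 = 10

# Default min_size: the smallest library size
min_size <- min(lib_sizes)     # 10
# A user-specified min_size of 40 would keep only samples s1 and s2
kept <- names(lib_sizes)[lib_sizes >= 40]

set.seed(1)
col <- counts[, "s1"]
# replace = FALSE: draw 40 of the 54 individual reads, so no feature can
# exceed its observed count
reads <- rep(names(col), times = col)
no_repl <- table(factor(sample(reads, 40), levels = names(col)))
# replace = TRUE: each of the 40 draws picks a feature with probability
# proportional to its count, so a feature's drawn count can exceed its
# observed count
with_repl <- table(factor(sample(names(col), 40, replace = TRUE, prob = col),
                          levels = names(col)))
```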
subsampleCounts returns x with the subsampled data stored in a new assay (see name).
Although subsampling is highly debated in microbiome research, the subsampleCounts function is included because it can be useful in some instances. Note that the output of subsampleCounts is not equivalent to its input, and any results should be verified against the original dataset.
To ensure reproducibility, set the random seed with set.seed() before calling this function.
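As a small base-R illustration of why the seed matters (the helper rarefy_once() is hypothetical, not part of mia), two runs of the same random draw reproduce exactly when they start from the same seed:

```r
# Hypothetical helper: rarefy one sample's counts to `size` reads with replacement
rarefy_once <- function(counts, size) {
  as.vector(rmultinom(1, size = size, prob = counts))
}

counts <- c(120, 40, 15, 5)  # made-up counts for a single sample

set.seed(123)
run1 <- rarefy_once(counts, 50)
set.seed(123)
run2 <- rarefy_once(counts, 50)

identical(run1, run2)
#> [1] TRUE
```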
McMurdie PJ, Holmes S. Waste not, want not: why rarefying microbiome data is inadmissible. PLoS Computational Biology. 2014;10(4):e1003531.
Gloor GB, Macklaim JM, Pawlowsky-Glahn V, Egozcue JJ. Microbiome datasets are compositional: and this is not optional. Frontiers in Microbiology. 2017;8:2224. doi:10.3389/fmicb.2017.02224
Weiss S, Xu ZZ, Peddada S, Amir A, Bittinger K, Gonzalez A, Lozupone C, Zaneveld JR, Vázquez-Baeza Y, Birmingham A, Hyde ER. Normalization and microbial differential abundance strategies depend upon data characteristics. Microbiome. 2017;5(1):1-8.
# Samples in the TreeSE whose total counts are below min_size are removed.
# Features that are absent from all samples after subsampling are also removed.
library(mia)
data(GlobalPatterns)
tse <- GlobalPatterns
set.seed(123)
tse.subsampled <- subsampleCounts(tse,
min_size = 60000,
name = "subsampled"
)
#> Warning: Subsampling/Rarefying may undermine downstream analyses and have unintended consequences. Therefore, make sure this normalization is appropriate for your data.
#> 1 samples removed because they contained fewer reads than `min_size`.
#> 6808 features removed because they are not present in all samples after subsampling.
tse.subsampled
#> class: TreeSummarizedExperiment
#> dim: 12408 25
#> metadata(1): subsampleCounts_min_size
#> assays(2): counts subsampled
#> rownames(12408): 549322 255340 ... 200359 271582
#> rowData names(7): Kingdom Phylum ... Genus Species
#> colnames(25): CL3 CC1 ... Even2 Even3
#> colData names(7): X.SampleID Primer ... SampleType Description
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> rowLinks: a LinkDataFrame (12408 rows)
#> rowTree: 1 phylo tree(s) (19216 leaves)
#> colLinks: NULL
#> colTree: NULL
dim(tse)
#> [1] 19216 26
dim(tse.subsampled)
#> [1] 12408 25
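The two filtering steps noted in the example comments (dropping samples below min_size, then dropping features that are absent everywhere after subsampling) can be sketched in base R. The helper rarefy_matrix() is hypothetical and only illustrates the logic on made-up data; it is not mia's implementation.

```r
# Hypothetical helper, for illustration only (not part of mia)
rarefy_matrix <- function(counts, min_size, replace = TRUE) {
  stopifnot(is.matrix(counts), min_size >= 1)
  # 1. Drop samples whose total count is below min_size
  counts <- counts[, colSums(counts) >= min_size, drop = FALSE]
  # 2. Subsample every remaining sample to exactly min_size reads
  sub <- vapply(seq_len(ncol(counts)), function(j) {
    col <- counts[, j]
    if (replace) {
      # With replacement: one multinomial draw proportional to the counts
      as.numeric(rmultinom(1, size = min_size, prob = col))
    } else {
      # Without replacement: expand into individual reads, draw a subset
      reads <- rep(seq_along(col), times = col)
      as.numeric(tabulate(reads[sample.int(length(reads), min_size)],
                          nbins = length(col)))
    }
  }, numeric(nrow(counts)))
  sub <- matrix(sub, nrow = nrow(counts), dimnames = dimnames(counts))
  # 3. Drop features that are absent from every remaining sample
  sub[rowSums(sub) > 0, , drop = FALSE]
}

# Made-up data: s3 has only 3 reads, and f3 occurs only in s3
counts <- matrix(c(10, 0, 0, 5, 5, 0, 1, 1, 1), nrow = 3,
                 dimnames = list(paste0("f", 1:3), paste0("s", 1:3)))
set.seed(123)
res <- rarefy_matrix(counts, min_size = 8, replace = FALSE)
res  # s3 and f3 are removed; s1 and s2 each sum to 8
```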