rarefyAssay randomly subsamples counts within a SummarizedExperiment object and returns a new SummarizedExperiment containing the original assay and the new subsampled assay.
optional arguments:
verbose: Logical scalar. Choose whether to show messages. (Default: TRUE)
assay.type: Character scalar. Specifies which assay to use for calculation. (Default: "counts")
Deprecated. Use assay.type instead.
sample: Integer scalar. Indicates the number of counts being simulated, i.e. the rarefying depth. This can be equal to the lowest total count found in a sample or a user-specified number.
Deprecated. Use sample instead.
replace: Logical scalar. Whether to perform subsampling with replacement. This works similarly to sample(..., replace = TRUE). (Default: FALSE)
name: Character scalar. The name under which the transformed assay is stored. (Default: method)
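The effect of the rarefying depth and of sampling with or without replacement can be illustrated on a single sample in base R. This is only a minimal sketch, not the code used by rarefyAssay: the count vector, the taxon names and the variable `depth` (standing in for the `sample` argument) are all hypothetical.

```r
# Hypothetical counts for one sample; taxon names are illustrative only.
counts <- c(taxonA = 50, taxonB = 30, taxonC = 15, taxonD = 5)
depth <- 20  # stands in for the `sample` argument (rarefying depth)

# Expand the counts into one entry per read.
reads <- rep(names(counts), times = counts)

# Helper (hypothetical) that draws `depth` reads and tabulates them per taxon.
subsample_reads <- function(reads, depth, replace) {
    drawn <- sample(reads, size = depth, replace = replace)
    tabulate(match(drawn, names(counts)), nbins = length(counts))
}

set.seed(42)
no_repl <- subsample_reads(reads, depth, replace = FALSE)
with_repl <- subsample_reads(reads, depth, replace = TRUE)

# Both draws contain exactly `depth` reads.
sum(no_repl) == depth
sum(with_repl) == depth

# Without replacement, no taxon can exceed its original count.
all(no_repl <= counts)
```

Without replacement, the subsampled counts are bounded by the original counts. With replacement, a rare taxon can in principle be drawn more often than it was observed.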
rarefyAssay returns x with subsampled data.
Although the subsampling approach is highly debated in microbiome research, we include the rarefyAssay function because there may be some instances where it can be useful.
Note that the output of rarefyAssay is not equivalent to the input, and any results have to be verified against the original dataset.
Subsampling/Rarefying may undermine downstream analyses and have unintended consequences. Therefore, make sure this normalization is appropriate for your data.
To ensure reproducibility, set the seed with set.seed() before calling this function.
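Why the seed matters can be shown with any random draw in base R. In the sketch below, `draw_with_seed` is a hypothetical helper and the draw from 1:100 merely stands in for a subsampling step.

```r
# Hypothetical helper: fix the seed, then perform a random draw.
draw_with_seed <- function(seed) {
    set.seed(seed)
    sample(1:100, size = 5)
}

# The same seed reproduces the same draw.
same_seed <- identical(draw_with_seed(123), draw_with_seed(123))
same_seed
```

The same principle applies to any function that relies on R's random number generator: calling set.seed() with the same value immediately before the call makes the result repeatable.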
When replace = FALSE, the function internally uses vegan::rarefy, whereas with replacement enabled it uses its own implementation, inspired by phyloseq::rarefy_even_depth.
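As a rough illustration of subsampling with replacement on a whole count matrix, the sketch below uses base R only. It is not the implementation used by rarefyAssay, vegan or phyloseq. It assumes that drawing a fixed number of reads with replacement can be modelled as a multinomial draw with probabilities proportional to the observed counts. The matrix `mat` and the variable `depth` are hypothetical.

```r
# Hypothetical feature-by-sample count matrix (filled column by column).
mat <- matrix(
    c(40, 30, 20, 10,
       5,  3,  1,  0,
      60,  0, 25, 15),
    nrow = 4,
    dimnames = list(paste0("feature", 1:4), paste0("sample", 1:3))
)
depth <- 50  # stands in for the rarefying depth

# Drop samples whose total count is below the target depth.
keep <- colSums(mat) >= depth
mat_kept <- mat[, keep, drop = FALSE]

set.seed(1)
# Drawing `depth` reads with replacement is a multinomial draw with
# probabilities proportional to the observed counts.
sub <- apply(mat_kept, 2, function(col) rmultinom(1, size = depth, prob = col))
rownames(sub) <- rownames(mat_kept)

# Drop features that have zero counts in every remaining sample.
sub <- sub[rowSums(sub) > 0, , drop = FALSE]

# Every remaining sample now has exactly `depth` counts.
colSums(sub)
```

Here sample2 has only 9 counts in total, so it is dropped before subsampling, and each remaining column sums to the chosen depth.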
McMurdie PJ, Holmes S. Waste not, want not: why rarefying microbiome data is inadmissible. PLoS computational biology. 2014 Apr 3;10(4):e1003531.
Gloor GB, Macklaim JM, Pawlowsky-Glahn V & Egozcue JJ (2017) Microbiome Datasets Are Compositional: And This Is Not Optional. Frontiers in Microbiology 8: 2224. doi: 10.3389/fmicb.2017.02224
Weiss S, Xu ZZ, Peddada S, Amir A, Bittinger K, Gonzalez A, Lozupone C, Zaneveld JR, Vázquez-Baeza Y, Birmingham A, Hyde ER. Normalization and microbial differential abundance strategies depend upon data characteristics. Microbiome. 2017 Dec;5(1):1-8.
# Samples in the TreeSE whose total count is lower than the specified
# `sample` are removed. Features that are not present in any sample
# after subsampling are also removed.
data(GlobalPatterns)
tse <- GlobalPatterns
set.seed(123)
tse_subsampled <- rarefyAssay(tse, sample = 60000, name = "subsampled")
#> 1 samples removed because they contained fewer reads than `sample`.
#> 6726 features removed because they are not present in any of the samples after subsampling.
tse_subsampled
#> class: TreeSummarizedExperiment
#> dim: 12490 25
#> metadata(0):
#> assays(2): counts subsampled
#> rownames(12490): 549322 522457 ... 200359 271582
#> rowData names(7): Kingdom Phylum ... Genus Species
#> colnames(25): CL3 CC1 ... Even2 Even3
#> colData names(7): X.SampleID Primer ... SampleType Description
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> rowLinks: a LinkDataFrame (12490 rows)
#> rowTree: 1 phylo tree(s) (19216 leaves)
#> colLinks: NULL
#> colTree: NULL
dim(tse)
#> [1] 19216 26
dim(assay(tse_subsampled, "subsampled"))
#> [1] 12490 25