rarefyAssay randomly subsamples counts within a SummarizedExperiment object and returns a new SummarizedExperiment containing the original assay and the new subsampled assay.

rarefyAssay(
  x,
  assay.type = assay_name,
  assay_name = "counts",
  sample = min_size,
  min_size = min(colSums2(assay(x))),
  replace = TRUE,
  name = "subsampled",
  verbose = TRUE,
  ...
)

# S4 method for class 'SummarizedExperiment'
rarefyAssay(
  x,
  assay.type = assay_name,
  assay_name = "counts",
  sample = min_size,
  min_size = min(colSums2(assay(x, assay.type))),
  replace = TRUE,
  name = "subsampled",
  verbose = TRUE,
  ...
)

Arguments

x

SummarizedExperiment, for example a TreeSummarizedExperiment.

assay.type

Character scalar. Specifies which assay to use for calculation. (Default: "counts")

assay_name

Deprecated. Use assay.type instead.

sample

Integer scalar. The number of counts to draw per sample, i.e. the rarefying depth. This can be the lowest total count found in any sample or a user-specified number. (Default: the lowest total count across samples)

min_size

Deprecated. Use sample instead.

replace

Logical scalar. Whether to subsample with replacement. See phyloseq::rarefy_even_depth for details on the implications of this parameter. (Default: TRUE)

name

Character scalar. A name for the new assay in which the subsampled counts will be stored. (Default: "subsampled")

verbose

Logical scalar. Choose whether to show messages. (Default: TRUE)

...

Additional arguments. Not used.

Value

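rarefyAssay returns x with the subsampled counts stored as an additional assay named by name. Samples with fewer total counts than sample are removed, as are features that are not present in any sample after subsampling.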

Details

Although the subsampling approach is highly debated in microbiome research, we include the rarefyAssay function because there may be some instances where it can be useful. Note that the output of rarefyAssay is not equivalent to the input, and any results have to be verified against the original dataset.

Subsampling/Rarefying may undermine downstream analyses and have unintended consequences. Therefore, make sure this normalization is appropriate for your data.

To keep results reproducible, set the random seed with set.seed() before calling this function.
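
The effect of the seed and of replace can be illustrated on a single sample with base R. The sketch below is not the implementation used by rarefyAssay; rarefy_vector and the taxon names are hypothetical and exist only for this illustration. It assumes a sample is a vector of integer counts, expands it into individual reads, draws a fixed number of them, and tabulates the draw back into counts.

```r
# Hypothetical helper for illustration only; not part of mia.
# Subsamples one sample's count vector to a fixed depth.
rarefy_vector <- function(counts, depth, replace = TRUE) {
  # Expand the counts into one entry per read, labelled by feature index.
  reads <- rep(seq_along(counts), times = counts)
  if (!replace && depth > length(reads)) {
    stop("`depth` exceeds the total count; cannot draw without replacement.")
  }
  # Draw `depth` reads; sample.int avoids the special case in sample()
  # when `reads` has length one.
  drawn <- reads[sample.int(length(reads), size = depth, replace = replace)]
  # Tabulate the drawn reads back into a per-feature count vector.
  out <- tabulate(drawn, nbins = length(counts))
  names(out) <- names(counts)
  out
}

# Made-up counts for four hypothetical taxa (total of 1000 reads).
counts <- c(taxonA = 120L, taxonB = 30L, taxonC = 0L, taxonD = 850L)

# Resetting the seed before each call reproduces the same draw.
set.seed(123)
first <- rarefy_vector(counts, depth = 100)
set.seed(123)
second <- rarefy_vector(counts, depth = 100)

identical(first, second)
sum(first)
```

Without a fixed seed, repeated calls would generally return different count vectors, which is why the seed should be set immediately before the subsampling step.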

References

McMurdie PJ, Holmes S. Waste not, want not: why rarefying microbiome data is inadmissible. PLoS computational biology. 2014 Apr 3;10(4):e1003531.

Gloor GB, Macklaim JM, Pawlowsky-Glahn V & Egozcue JJ (2017) Microbiome Datasets Are Compositional: And This Is Not Optional. Frontiers in Microbiology 8: 2224. doi: 10.3389/fmicb.2017.02224

Weiss S, Xu ZZ, Peddada S, Amir A, Bittinger K, Gonzalez A, Lozupone C, Zaneveld JR, Vázquez-Baeza Y, Birmingham A, Hyde ER. Normalization and microbial differential abundance strategies depend upon data characteristics. Microbiome. 2017 Dec;5(1):1-8.

Examples

# Samples in the TreeSE with fewer total counts than the specified `sample`
# are removed. Features that are not present in any sample after subsampling
# are removed as well.
data(GlobalPatterns)
tse <- GlobalPatterns
set.seed(123)
tse_subsampled <- rarefyAssay(tse, sample = 60000, name = "subsampled")
#> 1 samples removed because they contained fewer reads than `sample`.
#> 6808 features removed because they are not present in all samples after subsampling.
tse_subsampled
#> class: TreeSummarizedExperiment 
#> dim: 12408 25 
#> metadata(1): rarefyAssay_sample
#> assays(2): counts subsampled
#> rownames(12408): 549322 255340 ... 200359 271582
#> rowData names(7): Kingdom Phylum ... Genus Species
#> colnames(25): CL3 CC1 ... Even2 Even3
#> colData names(7): X.SampleID Primer ... SampleType Description
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> rowLinks: a LinkDataFrame (12408 rows)
#> rowTree: 1 phylo tree(s) (19216 leaves)
#> colLinks: NULL
#> colTree: NULL
dim(tse)
#> [1] 19216    26
dim(assay(tse_subsampled, "subsampled"))
#> [1] 12408    25
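
As a follow-up check on the example above, the per-sample totals can be compared before and after subsampling. This is a minimal sketch that assumes mia is installed and that the counts and subsampled assays can be summed with colSums(); the variable names depths and rarefied_depths are introduced here for illustration.

```r
library(mia)

data(GlobalPatterns, package = "mia")
tse <- GlobalPatterns

# Total counts of each sample in the original counts assay.
depths <- colSums(assay(tse, "counts"))
# Samples below the chosen depth; these are the ones expected to be dropped.
names(depths)[depths < 60000]

set.seed(123)
tse_subsampled <- rarefyAssay(tse, sample = 60000, name = "subsampled")

# After rarefying, every retained sample should total exactly `sample` counts.
rarefied_depths <- colSums(assay(tse_subsampled, "subsampled"))
all(rarefied_depths == 60000)
```

Inspecting depths before choosing sample makes the trade-off explicit: a higher depth keeps more reads per sample but discards every sample that falls below it.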