These functions perform Non-negative Matrix Factorization on data stored in a TreeSummarizedExperiment object.

getNMF(x, ...)

addNMF(x, ...)

# S4 method for class 'SummarizedExperiment'
getNMF(x, k = 2, assay.type = "counts", eval.metric = "evar", ...)

# S4 method for class 'SummarizedExperiment'
addNMF(
  x,
  k = 2,
  assay.type = "counts",
  eval.metric = "evar",
  name = "NMF",
  ...
)

Arguments

x

a TreeSummarizedExperiment object.

...

optional arguments passed to nmf::NMF.

k

numeric vector. A number of latent vectors/topics. (Default: 2)

assay.type

Character scalar. Specifies which assay to use for NMF ordination. (Default: "counts")

eval.metric

Character scalar. Specifies the evaluation metric that will be used to select the model with the best fit. Must be one of the following options: "evar" (explained variance; maximized), "sparseness.basis" (degree of sparsity in the basis matrix; maximized), "sparseness.coef" (degree of sparsity in the coefficient matrix; maximized), "rss" (residual sum of squares; minimized), "silhouette.coef" (quality of clustering based on the coefficient matrix; maximized), "silhouette.basis" (quality of clustering based on the basis matrix; maximized), "cophenetic" (correlation between cophenetic distances and original distances; maximized), "dispersion" (spread of data points within clusters; minimized). (Default: "evar")

name

Character scalar. The name to be used to store the result in the reducedDims of the output. (Default: "NMF")

Value

For getNMF, the ordination matrix with feature loadings matrix as attribute "loadings".

For addNMF, a TreeSummarizedExperiment object is returned containing the ordination matrix in reducedDims(x, name) with the following attributes:

  • "loadings" which is a matrix containing the feature loadings

  • "NMF_output" which is the output of function nmf::NMF

  • "best_fit" which is the result of the best fit if k is a vector of integers

Details

The functions getNMF and addNMF internally use nmf::NMF compute the ordination matrix and feature loadings.

If k is a vector of integers, NMF output is calculated for all the rank values contained in k, and the best fit is selected based on eval.metric value.

References

Brunet J, Tamayo P, Golub TR, Mesirov JP (2004). “Metagenes and Molecular Pattern Discovery Using Matrix Factorization.” Proceedings of the National Academy of Sciences, 101(12), 4164–4169. doi:10.1073/pnas.0308531101 .

Examples

data(GlobalPatterns)
tse <- GlobalPatterns

# Reduce the number of features
tse <- agglomerateByPrevalence(tse, rank = "Phylum")

# Run NMF and add the result to reducedDim(tse, "NMF").
tse <- addNMF(tse, k = 2, name = "NMF")
#> Loading required package: registry
#> Loading required package: rngtools
#> Loading required package: cluster
#> NMF - BioConductor layer [OK] | Shared memory capabilities [NO: bigmemory] | Cores 2/2
#>   To enable shared memory capabilities, try: install.extras('
#> NMF
#> ')
#> 
#> Attaching package: ‘NMF’
#> The following object is masked from ‘package:S4Vectors’:
#> 
#>     nrun
#> The following object is masked from ‘package:generics’:
#> 
#>     fit

# Extract feature loadings
loadings_NMF <- getReducedDimAttribute(tse, "NMF", "loadings")
head(loadings_NMF)
#>                         [,1]         [,2]
#> AD3             1.116660e-07 8.334903e-04
#> Acidobacteria   1.177592e-05 3.092582e-02
#> Actinobacteria  2.321099e-02 9.255390e-02
#> Armatimonadetes 2.220446e-16 4.489929e-04
#> BRC1            6.716210e-08 6.025701e-05
#> Bacteroidetes   1.751361e-01 8.184782e-02

# Estimate models with number of topics from 2 to 4. Perform 2 runs.
tse <- addNMF(tse, k = c(2, 3, 4), name = "NMF_4", nrun = 2)
#> NMF - BioConductor layer [OK] | Shared memory capabilities [NO: bigmemory] | Cores 2/2
#>   To enable shared memory capabilities, try: install.extras('
#> NMF
#> ')
#> 
#> Attaching package: ‘NMF’
#> The following object is masked from ‘package:S4Vectors’:
#> 
#>     nrun
#> The following object is masked from ‘package:generics’:
#> 
#>     fit

# Extract feature loadings
loadings_NMF_4 <- getReducedDimAttribute(tse, "NMF_4", "loadings")
head(loadings_NMF_4)
#>                         [,1]         [,2]         [,3]         [,4]
#> AD3             6.578609e-04 2.220446e-16 2.220446e-16 1.853284e-07
#> Acidobacteria   2.252985e-02 2.006144e-03 1.213772e-05 2.220446e-16
#> Actinobacteria  7.341793e-02 1.021829e-02 1.958249e-02 2.220446e-16
#> Armatimonadetes 2.280797e-04 1.344906e-04 2.220446e-16 2.220446e-16
#> BRC1            3.879111e-05 9.354747e-06 2.220446e-16 6.530121e-08
#> Bacteroidetes   1.746377e-02 3.202705e-03 2.074859e-02 2.742031e-01