These functions perform Latent Dirichlet Allocation (LDA) on data stored in a TreeSummarizedExperiment object.
getLDA(x, ...)
addLDA(x, ...)
# S4 method for class 'SummarizedExperiment'
getLDA(x, k = 2, assay.type = "counts", eval.metric = "perplexity", ...)
# S4 method for class 'SummarizedExperiment'
addLDA(x, k = 2, assay.type = "counts", name = "LDA", ...)
x
A TreeSummarizedExperiment object.
...
Optional arguments passed to LDA.
k
Integer vector. The number of latent vectors/topics. If several values are given, a model is estimated for each of them. (Default: 2)
assay.type
Character scalar. Specifies which assay to use for LDA ordination. (Default: "counts")
eval.metric
Character scalar. Specifies the evaluation metric used to select the model with the best fit. Must be either "perplexity" (topicmodels::perplexity) or "coherence" (topicdoc::topic_coherence; the best model is selected based on mean coherence). (Default: "perplexity")
name
Character scalar. The name under which the result is stored in the reducedDims of the output. (Default: "LDA")
For getLDA, the ordination matrix, with the feature loadings matrix attached as attribute "loadings".
For addLDA, a TreeSummarizedExperiment object containing the ordination matrix in reducedDim(..., name), with the feature loadings matrix attached as attribute "loadings".
The functions getLDA and addLDA internally use LDA from the topicmodels package to compute the ordination matrix and feature loadings.
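The selection step described for eval.metric = "perplexity" can be illustrated with topicmodels directly. This is a rough sketch on a made-up toy count matrix, not the package's actual implementation: the names counts, ks, models, perp, best, scores and loadings are illustrative, and the assumption is that one model is fitted per value of k and the one with the lowest perplexity is kept.

```r
library(topicmodels)

# Toy count matrix (made-up values): 6 samples in rows, 5 features in columns.
# Adding 1 guarantees that no row is empty, which LDA() requires.
set.seed(1)
counts <- matrix(
    rpois(30, lambda = 5) + 1L, nrow = 6,
    dimnames = list(paste0("sample", 1:6), paste0("taxon", 1:5))
)

# Fit one model per candidate number of topics
ks <- 2:3
models <- lapply(ks, function(k) LDA(counts, k = k, control = list(seed = 1)))

# Lower perplexity indicates a better fit
perp <- vapply(models, perplexity, numeric(1))
best <- models[[which.min(perp)]]

# Per-sample topic proportions and per-topic feature probabilities
post <- posterior(best)
scores <- post$topics       # samples x topics
loadings <- t(post$terms)   # features x topics
```

Here scores plays the role of the ordination matrix and loadings that of the feature loadings; each row of scores and each column of loadings sums to one.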
library(mia)
data(GlobalPatterns)
tse <- GlobalPatterns
# Reduce the number of features
tse <- agglomerateByPrevalence(tse, rank = "Phylum")
# Run LDA and add the result to reducedDim(tse, "LDA")
tse <- addLDA(tse)
# Extract feature loadings
loadings <- getReducedDimAttribute(tse, "LDA", "loadings")
head(loadings)
#>                            1            2
#> AD3             9.984562e-04 1.280627e-14
#> Acidobacteria   3.283583e-02 6.378740e-03
#> Actinobacteria  5.602671e-02 1.313161e-01
#> Armatimonadetes 1.812458e-04 5.387297e-04
#> BRC1            5.580952e-05 2.486462e-05
#> Bacteroidetes   2.811754e-01 8.897488e-02
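When the TreeSummarizedExperiment should stay unmodified, getLDA returns the ordination directly. A minimal sketch, assuming (as the return-value description states) that the loadings are attached as the "loadings" attribute; the variable names lda and lda_loadings are illustrative.

```r
library(mia)
data(GlobalPatterns)
tse <- agglomerateByPrevalence(GlobalPatterns, rank = "Phylum")

# Compute the ordination without storing it in the object
lda <- getLDA(tse, k = 2)

# One row per sample, as required for anything stored in reducedDim
dim(lda)

# The feature loadings travel with the matrix as an attribute
lda_loadings <- attr(lda, "loadings")
head(lda_loadings)
```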
# Estimate models with number of topics from 2 to 10
tse <- addLDA(tse, k = 2:10, name = "LDA_10")
# Get the evaluation metrics
tab <- getReducedDimAttribute(tse, "LDA_10", "eval_metrics")
# Plot
plot(tab[["k"]], tab[["perplexity"]], xlab = "k", ylab = "perplexity")
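The plot can be complemented by picking the number of topics with the lowest perplexity programmatically. A sketch, assuming the "eval_metrics" attribute is a table with the columns k and perplexity used in the plot above; best_k and the name "LDA_4" are illustrative, and a smaller range of k keeps the run short.

```r
library(mia)
data(GlobalPatterns)
tse <- agglomerateByPrevalence(GlobalPatterns, rank = "Phylum")

# Estimate models with 2 to 4 topics
tse <- addLDA(tse, k = 2:4, name = "LDA_4")
tab <- getReducedDimAttribute(tse, "LDA_4", "eval_metrics")

# Lower perplexity indicates a better fit
best_k <- tab[["k"]][which.min(tab[["perplexity"]])]
best_k
```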