Latent Dirichlet Allocation

These functions perform Latent Dirichlet Allocation on data stored in a TreeSummarizedExperiment object.

getLDA(x, ...)

addLDA(x, ...)

# S4 method for class 'SummarizedExperiment'
getLDA(x, k = 2, assay.type = "counts", eval.metric = "perplexity", ...)

# S4 method for class 'SummarizedExperiment'
addLDA(x, k = 2, assay.type = "counts", name = "LDA", ...)

Arguments

x: a TreeSummarizedExperiment object.
...: optional arguments passed to LDA
k: Integer vector. A number of latent vectors/topics. (Default: 2)
assay.type: Character scalar. Specifies which assay to use for LDA ordination. (Default: "counts")
eval.metric: Character scalar. Specifies evaluation metric that will be used to select the model with the best fit. Must be either "perplexity" (topicmodels::perplexity) or "coherence" (topicdoc::topic_coherence, the best model is selected based on mean coherence). (Default: "perplexity")
name: Character scalar. The name to be used to store the result in the reducedDims of the output. (Default: "LDA")

Value

For getLDA, the ordination matrix with feature loadings matrix as attribute "loadings".

For addLDA, a TreeSummarizedExperiment object is returned containing the ordination matrix in reducedDim(..., name) with feature loadings matrix as attribute "loadings".

Details

The functions getLDA and addLDA internally use LDA to compute the ordination matrix and feature loadings.

Examples

data(GlobalPatterns)
tse <- GlobalPatterns

# Reduce the number of features
tse <- agglomerateByPrevalence(tse, rank="Phylum")

# Run LDA and add the result to reducedDim(tse, "LDA")
tse <- addLDA(tse)

# Extract feature loadings
loadings <- getReducedDimAttribute(tse, "LDA", "loadings")
head(loadings)
#>                            1            2
#> AD3             1.716339e-07 0.0014797888
#> Acidobacteria   2.599123e-05 0.0548929864
#> Actinobacteria  2.923594e-02 0.1690892787
#> Armatimonadetes 8.806890e-06 0.0007842721
#> BRC1            1.221643e-07 0.0001069415
#> Bacteroidetes   2.916072e-01 0.0773615571

# Estimate models with number of topics from 2 to 10
tse <- addLDA(tse, k = c(2, 3, 4, 5, 6, 7, 8, 9, 10), name = "LDA_10")
# Get the evaluation metrics
tab <- getReducedDimAttribute(tse, "LDA_10","eval_metrics")
# Plot
plot(tab[["k"]], tab[["perplexity"]], xlab = "k", ylab = "perplexity")