Run (joint) robust principal component analysis (RPCA)

These functions implement robust principal component analysis (RPCA) for single tables and joint RPCA for multiple tables. *RPCA functions run RPCA for single table. *JointRPCA runs the analysis jointly for multiple tables. get* functions return the results of the analysis while add* adds them to the input object.

getRPCA(x, ...)

addRPCA(x, ...)

getJointRPCA(x, ...)

addJointRPCA(x, ...)

# S4 method for class 'SingleCellExperiment'
getRPCA(x, ...)

# S4 method for class 'SummarizedExperiment'
getRPCA(x, assay.type = "counts", ...)

# S4 method for class 'SummarizedExperiment'
addRPCA(x, name = "RPCA", ...)

# S4 method for class 'MultiAssayExperiment'
getJointRPCA(x, experiments, assay.types, ...)

# S4 method for class 'SingleCellExperiment'
getJointRPCA(x, experiments, assay.types, ...)

# S4 method for class 'SingleCellExperiment'
addJointRPCA(x, name = "JointRPCA", ...)

# S4 method for class 'MultiAssayExperiment'
addJointRPCA(x, name = "JointRPCA", ...)

Arguments

x

a SummarizedExperiment object.

...

additional arguments:

ncomponents:Integer scalar. The number of components to estimate. (Default: 3)
max.iterations:Integer scalar. The number of iterations run in OptSpace algorithm. (Default: 3)
tolerance:Numeric scalar. Accepted error between lower rank representation obtained by OptSpace algorithm and original table. (Default: 3)

assay.type

Character scalar. Specifies the name of assay used in calculation. (Default: "counts")

name

Character scalar. Name of results stored in x. (Default: "RPCA" or "JointRPCA" depending on the method)

experiments

Character vector or integer scalar. Names or indices of experiments selected from x.

assay.types

Character vector. Names assays selected from experiments.

Value

matrix, TreeSummarizedExperiment or MultiAssayExperiment object.

Details

These functions perform robust principal component analysis (RPCA) using a low-rank matrix approximation followed by principal component analysis. Missing values are handled using matrix completion based on the OptSpace algorithm.

Single-table RPCA

For a single assay, the workflow is:

Extract the selected assay matrix
Estimate a low-rank approximation of the matrix using the OptSpace algorithm, which reconstructs missing values and reduces noise.
Apply double centering (row and column centering).
Apply PCA (via singular value decomposition) to the centered matrix.
Return the sample coordinates in PCA space together with additional information such as loadings, explained variance, and pairwise distances.

Joint RPCA

When multiple assays are provided, joint RPCA estimates a shared low-rank representation across all tables. Each table may contain different features, but they must share the same samples.

The workflow is:

Extract the selected experiments and assays.
Estimate a joint low-rank representation using a joint OptSpace algorithm that learns shared sample factors while allowing table-specific feature loadings.
Apply double centering (row and column centering).
Perform PCA on the centered matrix. to obtain the final sample coordinates.
Return the sample coordinates in PCA space together with additional information such as loadings, explained variance, and pairwise distances.

To assess generalization, the joint RPCA procedure optionally splits the samples into training and test sets. The low-rank model is learned using the training data and the test samples are projected into the resulting PCA space. Reconstruction error for the test set is returned as a measure of model fit.

The returned object contains the PCA sample scores as the main result, with additional information (e.g., rotation matrix, explained variance, reconstructed matrix, and reconstruction error) stored in attributes.

References

Martino, C. and Shenhav, L. et al. (2020) Context-aware dimensionality reduction deconvolutes gut microbial community dynamics. Nat. Biotechnol. doi:10.1038/s41587-020-0660-7

Author

The RPCA method is reported in Martino et al. (2020) and the R/Bioconductor implementation utilizes the robust Aitchison distance from vegan::decostand.

The Joint-RPCA method was adapted from the original Python-based implementation in biocore/Gemelli by Bianca Cordazzo Vargas, Liat Shenhav, and Cameron Martino. The R/Bioconductor implementation was subsequently prepared by Aituar Bektanov, Tuomas Borman, and Leo Lahti.

Examples

data("ibdmdb")
mae <- ibdmdb

# Apply data transformations. With impute=FALSE, missing values are preserved
# and not imputed.
mae[[1]] <- transformAssay(
    mae[[1]], assay.type = "mgx", method = "rclr", impute = FALSE)
mae[[2]] <- transformAssay(
    mae[[2]], assay.type = "mtx", method = "rclr", impute = FALSE)

# Run joint-RPCA
res <- getJointRPCA(
    mae,
    experiments = c(1, 2),
    assay.types = c("rclr", "rclr")
)

# Run RPCA for single experiment
res <- getRPCA(mae[[1]], assay.type = "rclr")