12 Multi-Assay Analyses

library(mia)

Multi-omics approaches integrate data from multiple sources. For example, we can integrate taxonomic abundance profiles with metabolomic or other biomolecular profiling data to observe associations, make predictions, or aim at causal inferences. Integrating evidence across multiple sources can lead to enhanced predictions, more holistic understanding, or facilitate the discovery of novel biomarkers. In this section we demonstrate common multi-assay data integration tasks.

Cross-correlation analysis is a straightforward approach that can reveal strength and type of associations between data sets. For instance, we can analyze if higher presence of a specific taxon relates to higher levels of a biomolecule.

The analyses can be facilitated by the multi-assay data containers, TreeSummarizedExperiment and MultiAssayExperiment. These are scalable and contain different types of data in a single container, making this framework particularly suited for multi-assay microbiome data incorporating different types of complementary data sources in a single, reproducible workflow. Solutions to a number of data integration problems are discussed in more detail in Section Chapter 2. An alternative experiment can be stored in altExp slot of the SE data container. Alternatively, both experiments can be stored side-by-side in an MAE data container (see sections Section 2.2.5 and Section 2.2.6 to learn more about altExp and MAE objects, respectively). Different experiments are first imported as single-assay data containers similarly to the case in which only one experiment is present. After that, the different experiments can be combined into one multi-assay data container. The result is a MAE object with multiple experiments in its experiment slot, or a TreeSE object with alternative experiments in the altExp slot.

As an example, we use a dataset from the following publication: (2021) Xylo-Oligosaccharides in prevention of hepatic steatosis and adipose tissue inflammation: associating taxonomic and metabolomic patterns in fecal microbiota with biclustering. In this study, mice were fed either with a high-fat or a low-fat diet, and with or without prebiotics, for the purpose of studying whether prebiotics attenuate the negative impact of a high-fat diet on health.

Hintikka, Jukka, Sanna Lensu, Elina Mäkinen, Sira Karvinen, Marjaana Honkanen, Jere Lindén, Tim Garrels, Satu Pekkala, and Leo Lahti. 2021. “Xylo-Oligosaccharides in Prevention of Hepatic Steatosis and Adipose Tissue Inflammation: Associating Taxonomic and Metabolomic Patterns in Fecal Microbiomes with Biclustering.” International Journal of Environmental Research and Public Health 18 (8): 4049.

This example data can be loaded from microbiomeDataSets. The data is already in MAE format. It includes three different experiments: microbial abundance data, metabolite concentrations, and data about different biomarkers. If you like to construct the same data object from the original files instead, here you can find help for importing data into an SE object.

# Load the data
data(HintikkaXOData, package = "mia")
mae <- HintikkaXOData

library(stringr)
# Drop those bacteria that do not include information in Phylum or lower levels
mae[[1]] <- mae[[1]][!is.na(rowData(mae[[1]])$Phylum), ]
# Clean taxonomy data, so that names do not include additional characters
rowData(mae[[1]]) <- DataFrame(apply(rowData(mae[[1]]), 2, 
                                     str_remove, pattern = "._[0-9]__"))

# Available alternative experiments
experiments(mae)
##  ExperimentList class object of length 3:
##   [1] microbiota: TreeSummarizedExperiment with 12613 rows and 40 columns
##   [2] metabolites: TreeSummarizedExperiment with 38 rows and 40 columns
##   [3] biomarkers: TreeSummarizedExperiment with 39 rows and 40 columns

# Microbiome data
getWithColData(mae, "microbiota")
##  class: TreeSummarizedExperiment 
##  dim: 12613 40 
##  metadata(0):
##  assays(1): counts
##  rownames(12613): GAYR01026362.62.2014 CVJT01000011.50.2173 ...
##    JRJTB:03787:02429 JRJTB:03787:02478
##  rowData names(7): Phylum Class ... Species OTU
##  colnames(40): C1 C2 ... C39 C40
##  colData names(6): Sample Rat ... Fat XOS
##  reducedDimNames(0):
##  mainExpName: NULL
##  altExpNames(0):
##  rowLinks: NULL
##  rowTree: NULL
##  colLinks: NULL
##  colTree: NULL

# Metabolite data
getWithColData(mae, "metabolites")
##  class: TreeSummarizedExperiment 
##  dim: 38 40 
##  metadata(0):
##  assays(1): nmr
##  rownames(38): Butyrate Acetate ... Malonate 1,3-dihydroxyacetone
##  rowData names(0):
##  colnames(40): C1 C2 ... C39 C40
##  colData names(6): Sample Rat ... Fat XOS
##  reducedDimNames(0):
##  mainExpName: NULL
##  altExpNames(0):
##  rowLinks: NULL
##  rowTree: NULL
##  colLinks: NULL
##  colTree: NULL

# Biomarker data
getWithColData(mae, "biomarkers")
##  class: TreeSummarizedExperiment 
##  dim: 39 40 
##  metadata(0):
##  assays(1): signals
##  rownames(39): Triglycerides_liver CLSs_epi ... NPY_serum
##    Glycogen_liver
##  rowData names(0):
##  colnames(40): C1 C2 ... C39 C40
##  colData names(6): Sample Rat ... Fat XOS
##  reducedDimNames(0):
##  mainExpName: NULL
##  altExpNames(0):
##  rowLinks: NULL
##  rowTree: NULL
##  colLinks: NULL
##  colTree: NULL

12.1 Cross-correlation Analysis

Next we can perform a cross-correlation analysis. Let us analyze if individual bacteria genera are correlated with concentrations of individual metabolites. This helps to answer the following question: “If bacterium X is present, is the concentration of metabolite Y lower or higher”?

# Agglomerate microbiome data at family level
mae[[1]] <- mergeFeaturesByPrevalence(mae[[1]], rank = "Family")
# Does log10 transform for microbiome data
mae[[1]] <- transformAssay(mae[[1]], method = "log10", pseudocount = TRUE)

# Give unique names, so that we do not have problems when we are creating a plot
rownames(mae[[1]]) <- getTaxonomyLabels(mae[[1]])

# Cross correlates data sets
correlations <- getCrossAssociation(mae, 
                                    experiment1 = 1,
                                    experiment2 = 2,
                                    assay.type1 = "log10", 
                                    assay.type2 = "nmr",
                                    method = "spearman",
                                    test.signif = TRUE,
                                    p_adj_threshold = NULL,
                                    cor_threshold = NULL,
                                    # Remove when mia is fixed
                                    mode = "matrix",
                                    sort = TRUE,
                                    show.warnings = FALSE)

Next, we create a heatmap depicting all cross-correlations between bacterial genera and metabolite concentrations.

library(ComplexHeatmap) 

# Create a heatmap and store it
plot <- Heatmap(correlations$cor,
                # Print values to cells
                cell_fun = function(j, i, x, y, width, height, fill) {
                    # If the p-value is under threshold
                    if( !is.na(correlations$p_adj[i, j]) & correlations$p_adj[i, j] < 0.05 ){
                        # Print "X"
                        grid.text(sprintf("%s", "X"), x, y, gp = gpar(fontsize = 10, col = "#1dff00"))
                        }
                    },
                heatmap_legend_param = list(title = "", legend_height = unit(5, "cm"))
                )
plot

12.2 Multi-Omics Factor Analysis

Multi-Omics Factor Analysis (MOFA) is an unsupervised method for integrating multi-omic data sets in a downstream analysis (Argelaguet 2018). It could be seen as a generalization of principal component analysis. Yet, with the ability to infer a latent (low-dimensional) representation, shared among the multiple (-omics) data sets in hand.

Argelaguet, Ricard et al. 2018. “Multi-Omics Factor Analysis—a Framework for Unsupervised Integration of Multi-Omics Data Sets.” Molecular Systems Biology 14 (6): e8124. https://doi.org/10.15252/msb.20178124.

We use the R MOFA2 package for the analysis, and install the corresponding dependencies.

library(MOFA2)

# For inter-operability between Python and R, and setting Python dependencies,
# reticulate package is needed
library(reticulate)
# Let us assume that these have been installed already.
#reticulate::install_miniconda(force = TRUE)
#reticulate::use_miniconda(condaenv = "env1", required = FALSE)
#reticulate::py_install(packages = c("mofapy2"), pip = TRUE, python_version=3.6)

The mae object could be used straight to create the MOFA model. Yet, we transform our assays since the model assumes normality per default. We can also use Poisson or Bernoulli distributions among others.

Note that duplicates, such as “uncultured”, might appear when aggregating the microbiome data by a taxonomic rank. To check for duplicates, run any(duplicated(rownames(mae[[1]]))). If it returns TRUE, then the duplicates are present. We can add rownames(mae[[1]]) <- getTaxonomyLabels(mae[[1]], make.unique=TRUE) to remove them.

library(MOFA2)
# For simplicity, classify all high-fat diets as high-fat, and all the low-fat 
# diets as low-fat diets
colData(mae)$Diet <- ifelse(colData(mae)$Diet == "High-fat" | 
                              colData(mae)$Diet == "High-fat + XOS", 
                            "High-fat", "Low-fat")

# Transforming microbiome data with rclr
mae[[1]] <- transformAssay(mae[[1]], method = "relabundance")
mae[[1]] <- transformAssay(mae[[1]], assay.type = "relabundance", method = "rclr")

# Transforming metabolomic data with log10
mae[[2]] <- transformAssay(mae[[2]], assay.type = "nmr",
                            MARGIN = "cols",
                            method = "log10")

# Transforming biomarker data with standardize-transform
mae[[3]] <- transformAssay(mae[[3]], assay.type = "signals",
                           MARGIN = "rows",
                           method = "standardize", pseudocount = 1)

# Removing the assays no longer needed
assay(mae[[1]], "counts") <- NULL
assay(mae[[1]], "log10") <- NULL
assay(mae[[2]], "nmr") <- NULL
assay(mae[[3]], "signals") <- NULL

# Building our mofa model
model <- create_mofa_from_MultiAssayExperiment(mae,
                                               groups = "Diet", 
                                               extract_metadata = TRUE)
model

Model options can be defined as follows:

model_opts <- get_default_model_options(model)
model_opts$num_factors <- 5
head(model_opts)

Training options for the model are defined in the following way:

train_opts <- get_default_training_options(model)
head(train_opts)

The model is then prepared with prepare_mofa and trained with run_mofa:

model.prepared <- prepare_mofa(
  object = model,
  model_options = model_opts
)

# Some systems may require the specification `use_basilisk = TRUE`
# so it has been added to the following code
model.trained <- run_mofa(model.prepared, use_basilisk = TRUE)

The explained variance is visualized with the plot_variance_explained function:

library(patchwork)
library(ggplot2)

plot_list <- plot_variance_explained(model.trained,
                                     x = "view", y = "factor",
                                     plot_total = TRUE)

wrap_plots(plot_list, nrow = 2) +
  plot_annotation(title = "Variance Explained per factor and assay",
                  theme = theme(plot.title = element_text(hjust = 0.5)))

The top weights for each assay using all five factors:

custom_plotter <- function(name) {
  
  p <- plot_top_weights(model.trained,
                        view = name,
                        factors = "all",
                        nfeatures = 10) +
    labs(title = paste0("Top weights of the ", name, " assay"))
  
}

plot_list <- lapply(c("microbiota", "metabolites", "biomarkers"), custom_plotter)

wrap_plots(plot_list, nrow = 3) & theme(text = element_text(size = 8))

More tutorials and examples of using the package are found at link