2 miaverse
This chapter provides an overview of the miaverse ecosystem. Section 2.1 describes the relationship between the data containers used in miaverse. Section 2.2 details the packages involved, while Section 2.3 provides guidance on installing these packages.
miaverse (MIcrobiome Analysis uniVERSE) is an actively developed R/Bioconductor framework for microbiome downstream analysis. It becomes particularly relevant when working with abundance tables derived from sequencing data, whether from shotgun metagenomics or 16S rRNA sequencing. Before utilizing miaverse, sequencing data must undergo preprocessing to convert raw sequence reads into abundance tables.
miaverse consists of multiple R/Bioconductor packages and this online book. The idea is not only to offer tools for microbiome downstream analysis but also to serve as a resource for guidance on conducting microbiome data analysis and developing effective microbiome data science workflows.
The key concept of miaverse lies in its use of SummarizedExperiment-based data containers. This design choice enhances interoperability and versatility within the broader Bioconductor framework, facilitating access to an expanding array of tools. In practice, this approach allows for the integration of promising methods from related fields, such as single-cell sequencing.
2.1 Data containers
As discussed, miaverse is built upon the TreeSummarizedExperiment (TreeSE) data container. TreeSummarizedExperiment extends SingleCellExperiment (SCE) by incorporating additional slots tailored for microbiome analysis. The SingleCellExperiment class is designed for single-cell sequencing (Lun and Risso 2020). Bioconductor offers a wide variety of tools for this field, including the online book Orchestrating Single-Cell Analysis in Bioconductor (OSCA) (Amezquita et al. 2020). SingleCellExperiment, in turn, is derived from the SummarizedExperiment (SE) class. This hierarchical relationship among data containers means that all methods applicable to SingleCellExperiment and SummarizedExperiment objects can also be applied to TreeSummarizedExperiment objects.
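The inheritance chain can be checked directly in R. The following is a minimal sketch, assuming the TreeSummarizedExperiment package is installed; the count matrix and its taxon and sample names are made up for illustration.

```r
library(TreeSummarizedExperiment)

# Toy count matrix: 3 taxa (rows) x 2 samples (columns); values are made up
counts <- matrix(
    c(10, 0, 5, 3, 8, 1), nrow = 3,
    dimnames = list(paste0("taxon", 1:3), c("S1", "S2"))
)

tse <- TreeSummarizedExperiment(assays = list(counts = counts))

# A TreeSE object is also a SingleCellExperiment and a SummarizedExperiment
is(tse, "SingleCellExperiment")
is(tse, "SummarizedExperiment")

# Accessors defined for the parent classes therefore work on the TreeSE
assayNames(tse)
dim(tse)
```

Because the class checks succeed, any function written for SummarizedExperiment or SingleCellExperiment input can be called on `tse` without conversion.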
SummarizedExperiment (SE) (Morgan et al. 2020) is a generic and highly optimized container for complex data structures. It has become a common choice for analyzing various types of biomedical profiling data, such as RNA-seq, ChIP-seq, microarrays, flow cytometry, proteomics, and single-cell sequencing.
SingleCellExperiment (SCE) (Lun and Risso 2020) was developed as an extension to store alternative representations of the data, such as reduced dimensions and alternative experiments, in the same data container.
TreeSummarizedExperiment (TreeSE) (R. Huang 2020) was developed as an extension to incorporate hierarchical information (such as phylogenetic trees and sample hierarchies) and reference sequences.
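To illustrate what each layer adds, the sketch below fills one slot from each extension. It assumes the TreeSummarizedExperiment and ape packages are installed; the random tree, the counts and the reduced dimension named "toy" are hypothetical placeholders, not real data.

```r
library(TreeSummarizedExperiment)

set.seed(1)

# Toy data: 4 taxa x 3 samples; values are made up
counts <- matrix(
    1:12, nrow = 4,
    dimnames = list(paste0("taxon", 1:4), paste0("S", 1:3))
)

# Random tree whose tip labels match the row names (illustration only)
tree <- ape::rtree(4, tip.label = rownames(counts))

# TreeSE-specific slot: a tree linked to the rows
tse <- TreeSummarizedExperiment(
    assays = list(counts = counts),
    rowTree = tree
)
rowTree(tse)

# SCE-specific slot: a sample-level reduced dimension (random numbers here)
reducedDim(tse, "toy") <- matrix(
    rnorm(6), nrow = 3,
    dimnames = list(colnames(tse), c("D1", "D2"))
)
reducedDimNames(tse)

# SE-level slot: the assays themselves
assay(tse, "counts")
```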
MultiAssayExperiment (MAE) (Ramos et al. 2017) provides an organized way to bind several different data containers together in a single object. For example, we can bind microbiome data (in a TreeSE container) with metabolomic profiling data (in an SE container), with (partially) shared sample metadata. This is convenient and robust, for instance, in subsetting and other data manipulation tasks. Microbiome data can be part of multiomics experiments and analysis strategies. We highlight how the methods used throughout this book relate to this data framework by using the TreeSE, MAE, and classes beyond.
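The binding described above can be sketched as follows, assuming the TreeSummarizedExperiment and MultiAssayExperiment packages are installed. All names and values (the experiment names "microbiome" and "metabolites", the taxa, metabolites and sample groups) are hypothetical.

```r
library(TreeSummarizedExperiment)
library(MultiAssayExperiment)

samples <- c("S1", "S2", "S3")

# Microbiome counts in a TreeSE (toy values)
tse <- TreeSummarizedExperiment(assays = list(counts = matrix(
    1:6, nrow = 2,
    dimnames = list(c("taxonA", "taxonB"), samples)
)))

# Metabolite intensities in a plain SE (toy values)
se <- SummarizedExperiment(assays = list(intensity = matrix(
    c(0.1, 0.5, 0.9, 1.2, 0.3, 0.7), nrow = 2,
    dimnames = list(c("metab1", "metab2"), samples)
)))

# Sample metadata shared by both experiments
sample_data <- DataFrame(
    group = c("case", "control", "case"),
    row.names = samples
)

mae <- MultiAssayExperiment(
    experiments = ExperimentList(microbiome = tse, metabolites = se),
    colData = sample_data
)

# Subsetting by sample name is applied to every experiment at once
mae_sub <- mae[, c("S1", "S2")]
ncol(mae_sub[["microbiome"]])
ncol(mae_sub[["metabolites"]])
```

Subsetting the MAE once keeps the two experiments and the sample metadata in sync, which is the robustness benefit mentioned above.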
2.2 Package ecosystem
Methods for the (Tree)SummarizedExperiment and MultiAssayExperiment data containers are provided by multiple independent developers through R/Bioconductor packages. Some of these are listed below (tips on new packages are welcome).
Bioconductor packages in particular include comprehensive manuals, as documentation is required. Follow the links below to find package vignettes and other materials showing how the packages and their methods are used.
2.2.1 mia package family
The mia package family provides general methods for microbiome data wrangling, analysis and visualization.
- mia: Microbiome analysis tools (Ernst, Shetty, and Lahti 2020)
- miaViz: Microbiome analysis specific visualization (Ernst, Borman, and Lahti 2022)
- miaSim: Microbiome data simulations (Simsek et al. 2021)
- miaTime: Microbiome time series analysis (Lahti 2021)
2.2.2 SE supporting packages
The following differential abundance (DA) packages support (Tree)SummarizedExperiment.
- ANCOMBC (Lin and Peddada 2020) for differential abundance analysis
- benchdamic (Calgaro et al. 2022) for benchmarking differential abundance methods
- ALDEx2 (Gloor, Macklaim, and Fernandes 2016) for differential abundance analysis
2.2.3 Other relevant packages
- MGnifyR for accessing and processing MGnify data in R
- LinDA (Zhou et al. 2022) for differential abundance analysis
- vegan (Oksanen et al. 2020) for community ecologists
- CBEA (Nguyen QP, n.d.) for taxonomic enrichment analysis
- microSTASIS (Sánchez-Sánchez, Santonja, and Benítez-Páez 2022) for microbiota stability assessment via iterative clustering
- PLSDAbatch (Wang and Lê Cao 2023) for batch effect correction
- treeclimbR (S. Huang Ruizhu 2021) for finding optimal signal levels in a tree
- dar for differential abundance testing
- iSEEtree for interactive visualisation of microbiome data
- philr (Silverman et al. 2017) for the phylogeny-aware PhILR transformation
- IntegratedLearner for multiomics classification and prediction
- MicrobiotaProcess (Xu et al. 2023) for the “tidy” analysis of microbiome and other ecological data
- The Tools for Microbiome Analysis site listed over 130 R packages for microbiome data science in 2023. Many of these are not in Bioconductor or do not directly support the data containers used in this book, but they can often be used with minor modifications. The phyloseq-based tools can be used by converting the TreeSE data into phyloseq with convertToPhyloseq() (see Chapter 5).
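As a minimal sketch of this conversion, the example below uses the GlobalPatterns demonstration dataset shipped with mia. It assumes a mia version that provides convertToPhyloseq() and that the phyloseq package is installed; the default choice of assay is left to the function.

```r
library(mia)

# Example TreeSE dataset distributed with the mia package
data("GlobalPatterns", package = "mia")
tse <- GlobalPatterns

# Convert the TreeSE object into a phyloseq object
ps <- convertToPhyloseq(tse)
ps
```

The resulting `ps` object can then be passed to phyloseq-based tools.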
2.2.4 Open microbiome data
Hundreds of published microbiome datasets are readily available in these data containers (see Section 4.2).
2.3 Installation
2.3.1 Installing all packages
You can install all packages that are required to run every example in this book via the following command:
remotes::install_github("microbiome/OMA", dependencies = TRUE, upgrade = TRUE)
Alternatively, you can install all packages, or only those that are still missing, with the following script.
# URL of the raw CSV file on GitHub. It lists all packages needed.
url <- "https://raw.githubusercontent.com/microbiome/OMA/devel/oma_packages/oma_packages.csv"
# Read the CSV file directly into R
df <- read.csv(url)
packages <- df[[1]]
# Get packages that are already installed
packages_already_installed <- packages[ packages %in% rownames(installed.packages()) ]
# Get packages that need to be installed
packages_need_to_install <- setdiff( packages, packages_already_installed )
# Load BiocManager into the session. Install it if it is not already installed.
if( !require("BiocManager") ){
    install.packages("BiocManager")
    library("BiocManager")
}
# If there are packages that need to be installed, install them with BiocManager.
# This also updates old packages without asking.
if( length(packages_need_to_install) > 0 ) {
    BiocManager::install(packages_need_to_install, ask = FALSE)
}
# Load all packages into the session. Stop if some packages were not
# successfully loaded.
pkgs_not_loaded <- !sapply(packages, require, character.only = TRUE)
pkgs_not_loaded <- names(pkgs_not_loaded)[ pkgs_not_loaded ]
if( length(pkgs_not_loaded) > 0 ){
    stop("Error in loading the following packages into the session: '", paste0(pkgs_not_loaded, collapse = "', '"), "'")
}
2.3.2 Installing specific packages
You can install R packages of your choice with the following procedures.
The Bioconductor release version is the most stable and tested version but may lack some of the latest methods and updates.
BiocManager::install("mia")
The Bioconductor development version requires the installation of the latest R beta version. This is primarily recommended for those who already have experience with R/Bioconductor and need access to the latest updates.
BiocManager::install("mia", version = "devel")
The GitHub development version provides access to the latest but potentially unstable features. This is useful when you want access to all available tools.
devtools::install_github("microbiome/mia")
2.3.3 Troubleshooting installation
If you encounter installation issues related to package dependencies, please see the troubleshooting page and Chapter 30.
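Before digging into dependency problems, it can help to check which packages are actually available in the current R library. The following is a minimal base R sketch; the package names in `pkgs` are only examples and should be replaced with the packages you tried to install.

```r
# Packages to check; adjust this vector as needed (example names only)
pkgs <- c("mia", "miaViz", "TreeSummarizedExperiment")

# TRUE for packages whose namespace can be loaded, FALSE otherwise
is_available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Installed version for available packages, NA for missing ones
versions <- vapply(pkgs, function(pkg) {
    if (is_available[[pkg]]) as.character(packageVersion(pkg)) else NA_character_
}, character(1))

status <- data.frame(
    package = pkgs,
    available = is_available,
    version = versions,
    row.names = NULL
)
status
```

If BiocManager is installed, BiocManager::valid() can additionally report packages that are out of date or too new for the Bioconductor version in use.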
- TreeSummarizedExperiment is derived from the SummarizedExperiment class.
- miaverse is based on the TreeSummarizedExperiment data container.
- We can borrow methods from packages utilizing SingleCellExperiment and SummarizedExperiment.