Data containers in R/Bioconductor

Getting started

  • CSC notebook access OK?
  • R, RStudio, and R package installation OK?
  • First task: reproducible workflow & Quarto documents (in a moment)

Data containers support collaborative workflow development

Standardized data containers

Central to the R/Bioconductor ecosystem: phyloseq, (Tree)SummarizedExperiment, MultiAssayExperiment

Data containers

phyloseq: microbiome data container

  • The first microbiome data container, from around 2010.

  • Has become a standard for (16S) microbiome bioinformatics in R (J. McMurdie, S. Holmes et al.)

TreeSummarizedExperiment

New, alternative microbiome data container.

Huang et al. F1000, 2021

Current framework

  • (Tree)SummarizedExperiment for single omics
  • MultiAssayExperiment for multi-omics
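
To make the single-omics container concrete, here is a minimal sketch of building a TreeSummarizedExperiment by hand. The counts, taxon names, sample names, and the Group column are hypothetical toy values, not part of the course data; the point is only that abundances, feature annotations, and sample annotations travel together in one object.

```r
library(S4Vectors)                 # DataFrame(), SimpleList()
library(SummarizedExperiment)      # assay(), rowData(), colData()
library(TreeSummarizedExperiment)  # TreeSummarizedExperiment() constructor

# Hypothetical toy counts: 3 taxa (rows) x 2 samples (columns)
counts <- matrix(
  c(10L, 0L, 5L, 3L, 7L, 1L),
  nrow = 3,
  dimnames = list(c("taxon1", "taxon2", "taxon3"),
                  c("sampleA", "sampleB"))
)

# Feature annotations: one row per taxon
row_data <- DataFrame(
  Phylum = c("Firmicutes", "Bacteroidetes", "Firmicutes"),
  row.names = rownames(counts)
)

# Sample annotations: one row per sample
col_data <- DataFrame(
  Group = c("control", "treated"),
  row.names = colnames(counts)
)

toy_tse <- TreeSummarizedExperiment(
  assays  = SimpleList(counts = counts),
  rowData = row_data,
  colData = col_data
)

# Subsetting rows keeps the assay and rowData in sync
firmicutes <- toy_tse[rowData(toy_tse)$Phylum == "Firmicutes", ]
dim(firmicutes)
```

Because the subset is taken on the container rather than on separate tables, the abundance matrix and the taxonomy annotations cannot drift out of alignment.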

Benefits

  • Reduce overlapping efforts

  • Improve interoperability

  • Ensure sustainability

  • Transparency

  • Reproducibility

  • Collaboration

Orchestrating Microbiome Analysis with R and Bioconductor – online book: beta version

Figure source: Moreno-Indias et al. (2021) Statistical and Machine Learning Techniques in Human Microbiome Studies: Contemporary Challenges and Solutions. Frontiers in Microbiology 12:11.

Data containers

Demo data

Introduction to the afternoon assignment and data set: HintikkaXOData

Task: load microbiome data

Load an example data set from the mia R package with:

library(mia)
data(HintikkaXOData)

Source: Hintikka et al. (2021). Xylo-oligosaccharides in prevention of hepatic steatosis and adipose tissue inflammation: Associating taxonomic and metabolomic patterns in fecal microbiomes with biclustering. International Journal of Environmental Research and Public Health 18(8) https://doi.org/10.3390/ijerph18084049

This is a MultiAssayExperiment data object. Let us check which experiments it contains.

mae <- HintikkaXOData
experiments(mae)
ExperimentList class object of length 3:
 [1] microbiota: TreeSummarizedExperiment with 12706 rows and 40 columns
 [2] metabolites: TreeSummarizedExperiment with 38 rows and 40 columns
 [3] biomarkers: TreeSummarizedExperiment with 39 rows and 40 columns
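
Beyond listing the experiments, a MultiAssayExperiment can be explored with a few standard accessors. The following is a sketch, assuming the mia and MultiAssayExperiment packages are installed; the exact content of the sample metadata depends on the data set version.

```r
library(mia)                   # provides the HintikkaXOData example data
library(MultiAssayExperiment)  # experiments(), colData(), sampleMap()

data(HintikkaXOData)
mae <- HintikkaXOData

# Names of the individual experiments
names(experiments(mae))

# Sample-level metadata shared across the experiments
colData(mae)

# Table linking each sample to a column in each experiment
head(sampleMap(mae))
```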

Let us pick the microbiota data, which is a TreeSummarizedExperiment object.

tse <- mae[["microbiota"]]
tse
class: TreeSummarizedExperiment 
dim: 12706 40 
metadata(0):
assays(1): counts
rownames(12706): GAYR01026362.62.2014 CVJT01000011.50.2173 ...
  JRJTB:03787:02429 JRJTB:03787:02478
rowData names(7): Phylum Class ... Species OTU
colnames(40): C1 C2 ... C39 C40
colData names(0):
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):
rowLinks: NULL
rowTree: NULL
colLinks: NULL
colTree: NULL
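
The slots listed in the printout can be retrieved with the usual SummarizedExperiment accessors. This is a sketch, assuming mia is installed; it only reads from the object and does not modify it.

```r
library(mia)  # provides HintikkaXOData and attaches the container classes
data(HintikkaXOData)
tse <- HintikkaXOData[["microbiota"]]

dim(tse)         # number of features (rows) and samples (columns)
assayNames(tse)  # names of the available abundance tables

# Abundance matrix: features in rows, samples in columns
counts <- assay(tse, "counts")
counts[1:3, 1:3]

# Taxonomic annotations for the first few features
head(rowData(tse), 3)

# Sample annotations stored in this experiment
colData(tse)
```

Note that the printout reports `colData names(0)` for this experiment, so `colData(tse)` carries no sample variables here.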

Open microbiome data resources

Open microbiome data resources supporting TreeSummarizedExperiment:

See also the OMA chapter on available data sets.

Julia packages