Orchestrating Microbiome Analysis with Bioconductor

Bioconductor Africa Seminar Series, May 2026

Wednesday, May 20, 2026

About me

Tuomas Borman

  • Doctoral Researcher at the University of Turku, Finland (Turku Data Science Group)

  • Developing microbiome data science R/Bioconductor packages

European Bioconductor Conference 2026 in Turku!

Topics

Week 1. Ecosystems
Week 2. End-to-end workflow
Week 3. Statistical analysis

Microbiome data science ecosystems

Bioconductor

  • Community-driven, global open-source project
  • Started in 2001 from genomics
  • High impact across bioinformatics

Collaborative development

  • Software & data repository
  • Tutorials & documentation
  • Community support & events

Open (microbiome) data science ecosystems

Bioconductor software

  • ~2,300 R packages
  • Review, testing, documentation

Bioconductor ecosystems

Orchestrating Microbiome Analysis, OMA

  • Microbiome data science ecosystem
  • “Downstream” statistical analysis
  • Microbiome Analysis (mia): > 15,000 yearly downloads from distinct IPs (top 8.05% in Bioconductor)

Benefits

  1. Tightly linked with broader Bioconductor ecosystem
  2. Improved scalability
  3. Improved multi-table integration

Community-driven ecosystem of tools

Tools shown: mia, MGnifyR, HoloFoodR, iSEE, MultiAssayExperiment (MAE), SummarizedExperiment (SE), SingleCellExperiment (SCE), scater, benchdamic, NetCoMi, radEmu, DESeq2, bioBakery, anansi

Orchestrating Microbiome Analysis with Bioconductor

  • Resources and tutorials for microbiome analysis
  • Community-built best practices
  • Open to contributions!

Go to the Orchestrating Microbiome Analysis (OMA) online book

Take home message

  1. Community is the key

Data containers

  • The core of any software
  • Structured, standardized way to manage complex data
  • Enables modular, efficient workflows

Optimal container for microbiome data?

  • Multiple assays: seamless interlinking
  • Hierarchical data: supporting samples & features
  • Side information: extended capabilities & data types
  • Optimized: for speed & memory
  • Integrated: with other applications & frameworks

TreeSummarizedExperiment

(Huang et al. 2021)

TreeSummarizedExperiment class

SummarizedExperiment

(Huber et al. 2015)

Take home messages

  1. Community is the key
  2. (Tree)SummarizedExperiment is a structured way to handle complex (biological) data
TreeSummarizedExperiment class

Orchestrating Microbiome Analysis with Bioconductor, session 2/3

Topics

Week 1. Ecosystems
Week 2. End-to-end workflow
Week 3. Statistical analysis

Take home messages from session 1

  1. Community is the key
  2. (Tree)SummarizedExperiment is a structured way to handle complex (biological) data
TreeSummarizedExperiment class

End-to-end workflow

  1. Sampling and sequencing
  2. Upstream: taxonomic mapping
  3. Downstream: statistical analysis

Microbiome data science workflow

Data generation

  • Different study systems
  • Sample collection
  • Sequencing

Microbiome data science workflow

Microbiome data

  1. Who is there? (16S & shotgun)
  2. What can they do? (shotgun)
  3. What are they doing? (transcriptomics & metabolomics)

16S amplicon sequencing

  • Targets the 16S rRNA marker gene
  • Cost-effective, with mature reference databases
  • Taxonomic resolution typically limited to genus level

Source: Wikipedia

Shotgun metagenomic sequencing

  • Random sequencing of all DNA fragments in a sample
  • Higher taxonomic resolution and insight into functional potential
  • More expensive

Source: france-genomique.org/technological-expertises/metagenomics/shotgun-metagenomics

Microbiome data science workflow

Upstream bioinformatics

  • Input: FASTQ files
  • Output: abundance table

16S

Shotgun

Data to import

  • Abundance table (from taxonomy annotation pipeline)
  • Taxonomy table (from taxonomy annotation pipeline)
  • Phylogenetic tree (from the reference database used)
  • Sample metadata

Microbiome data science workflow

TreeSummarizedExperiment

(Huang et al. 2021)

TreeSummarizedExperiment class

Demonstration

Example data: (Gupta et al., mSystems 2019)

library(mia)
library(ape)

# Import data
abundance_table <- read.csv("taxonomy_abundance.csv", row.names = 1)
taxonomy_table <- read.csv("taxonomy_table.csv", row.names = 1)
sample_meta <- read.csv("sample_metadata.csv", row.names = 1)
tree <- read.tree("phylogeny.tree")

# Abundance table must be a matrix
abundance_table <- abundance_table |> as.matrix()

# Construct TreeSE
tse <- TreeSummarizedExperiment(
    assays = list(counts = abundance_table),
    rowData = taxonomy_table,
    colData = sample_meta,
    rowTree = tree
)

print(tse)
class: TreeSummarizedExperiment 
dim: 19216 26 
metadata(0):
assays(1): counts
rownames(19216): 549322 522457 ... 200359 271582
rowData names(7): Kingdom Phylum ... Genus Species
colnames(26): CL3 CC1 ... Even2 Even3
colData names(7): X.SampleID Primer ... SampleType Description
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):
rowLinks: a LinkDataFrame (19216 rows)
rowTree: 1 phylo tree(s) (19216 leaves)
colLinks: NULL
colTree: NULL

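# Agglomerate to every taxonomy rank; each rank is stored as an alternative experiment (altExp)
tse <- agglomerateByRanks(tse)
print(tse)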
class: TreeSummarizedExperiment 
dim: 19216 26 
metadata(0):
assays(1): counts
rownames(19216): 549322 522457 ... 200359 271582
rowData names(7): Kingdom Phylum ... Genus Species
colnames(26): CL3 CC1 ... Even2 Even3
colData names(7): X.SampleID Primer ... SampleType Description
reducedDimNames(0):
mainExpName: NULL
altExpNames(7): Kingdom Phylum ... Genus Species
rowLinks: a LinkDataFrame (19216 rows)
rowTree: 1 phylo tree(s) (19216 leaves)
colLinks: NULL
colTree: NULL
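Agglomeration merges all features assigned to the same taxon at a given rank by summing their counts. As an illustration only (not mia's implementation), the idea can be sketched in base R with rowsum(); the toy matrix, feature names and phylum labels below are made up.

```r
# Hypothetical toy counts: 4 features (rows) x 3 samples (columns)
counts <- matrix(
    c(10, 0, 5,
       3, 7, 0,
       0, 2, 8,
       4, 4, 4),
    nrow = 4, byrow = TRUE,
    dimnames = list(paste0("ASV", 1:4), c("S1", "S2", "S3"))
)

# Hypothetical phylum assignment for each feature (row)
phylum <- c("Firmicutes", "Firmicutes", "Bacteroidetes", "Proteobacteria")

# Sum the counts of all features that share a phylum (one row per phylum)
phylum_counts <- rowsum(counts, group = phylum)
phylum_counts
```

Sample totals are unchanged by agglomeration; only the feature dimension shrinks.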

# Transform counts to relative abundances, also in the rank-level altExps
tse <- transformAssay(
    tse,
    assay.type = "counts",
    method = "relabundance",
    altexp = altExpNames(tse)
)

print(tse)
class: TreeSummarizedExperiment 
dim: 19216 26 
metadata(0):
assays(2): counts relabundance
rownames(19216): 549322 522457 ... 200359 271582
rowData names(7): Kingdom Phylum ... Genus Species
colnames(26): CL3 CC1 ... Even2 Even3
colData names(7): X.SampleID Primer ... SampleType Description
reducedDimNames(0):
mainExpName: NULL
altExpNames(7): Kingdom Phylum ... Genus Species
rowLinks: a LinkDataFrame (19216 rows)
rowTree: 1 phylo tree(s) (19216 leaves)
colLinks: NULL
colTree: NULL
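The relative abundance of a feature in a sample is its count divided by that sample's total count, so every sample (column) sums to 1. A base-R sketch on a made-up toy matrix, not mia's implementation:

```r
# Hypothetical toy counts: 3 features (rows) x 2 samples (columns)
counts <- matrix(
    c(10, 30, 60,   # sample S1, total 100
       0,  5, 15),  # sample S2, total 20
    nrow = 3,
    dimnames = list(c("TaxonA", "TaxonB", "TaxonC"), c("S1", "S2"))
)

# Divide each column by its column (sample) total
relabundance <- sweep(counts, MARGIN = 2, STATS = colSums(counts), FUN = "/")
relabundance
```

prop.table(counts, margin = 2) gives the same result in a single call.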

library(miaViz)

# Bar plot of relative abundances at phylum level
plotAbundance(
    tse,
    rank = "Phylum",
    as.relative = TRUE
)

Take home messages

Microbiome data science workflow

Orchestrating Microbiome Analysis with Bioconductor, session 3/3

Topics

Week 1. Ecosystems
Week 2. End-to-end workflow
Week 3. Statistical analysis

tidyomics (Hutchison et al. 2024)

library(tidySummarizedExperiment)
library(tidymodels)

# Convert to tibble table
tse_df <- as_tibble(tse)

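# Specify a logistic regression model
model <- logistic_reg() %>%
    set_engine("glm")

# Fit the model
# NOTE: assumes colData(tse) contains a factor column "disease";
# the tibble is in long format (one row per feature-sample pair)
fit <- model %>%
    fit(disease ~ counts, data = tse_df)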

Microbiome data science workflow

Alpha diversity

  • Shannon diversity, Faith’s phylogenetic diversity, …
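Shannon diversity combines richness and evenness: H = -sum(p_i * ln(p_i)), where p_i is the relative abundance of taxon i in a sample. A minimal base-R sketch of this textbook definition follows; the helper name shannon() is hypothetical, and mia's addAlpha() may differ in details such as the logarithm base.

```r
# Hypothetical helper: Shannon diversity of one sample's counts
shannon <- function(x) {
    p <- x[x > 0] / sum(x)  # relative abundances; zero counts contribute nothing
    -sum(p * log(p))        # natural logarithm
}

# Hypothetical toy counts: 4 taxa (rows) x 2 samples (columns)
counts <- cbind(
    even   = c(25, 25, 25, 25),  # perfectly even community
    skewed = c(97, 1, 1, 1)      # dominated by a single taxon
)

# One diversity value per sample (column)
res <- apply(counts, 2, shannon)
res
```

A perfectly even community of S taxa reaches the maximum, ln(S); dominance by one taxon pushes H towards 0.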

Beta diversity

  • Principal Component Analysis (PCA), Principal Coordinate Analysis (PCoA), …
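Beta diversity compares samples pairwise. The Bray-Curtis dissimilarity between samples x and y is sum(|x_i - y_i|) / sum(x_i + y_i); PCoA (classical multidimensional scaling, MDS) then embeds the resulting dissimilarity matrix in a few dimensions. The sketch below uses made-up relative abundances and stats::cmdscale() rather than mia's addMDS(); the helper bray_curtis() is hypothetical.

```r
# Hypothetical relative abundances: 3 taxa (rows) x 4 samples (columns)
rel <- matrix(
    c(0.7, 0.2, 0.1,   # S1
      0.6, 0.3, 0.1,   # S2
      0.1, 0.2, 0.7,   # S3
      0.2, 0.1, 0.7),  # S4
    nrow = 3,
    dimnames = list(c("TaxonA", "TaxonB", "TaxonC"), paste0("S", 1:4))
)

# Hypothetical helper: Bray-Curtis dissimilarity between two samples
bray_curtis <- function(x, y) sum(abs(x - y)) / sum(x + y)

# Pairwise sample-by-sample dissimilarity matrix
n <- ncol(rel)
d <- matrix(0, n, n, dimnames = list(colnames(rel), colnames(rel)))
for (i in seq_len(n)) {
    for (j in seq_len(n)) {
        d[i, j] <- bray_curtis(rel[, i], rel[, j])
    }
}

# PCoA: classical MDS on the dissimilarities, keeping 2 axes
pcoa <- cmdscale(as.dist(d), k = 2)
pcoa
```

Samples S1 and S2 (and S3 and S4) have similar compositions, so they land on the same side of the first axis.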

Differential abundance analysis

Demonstration

Example data: (Gupta et al., mSystems 2019)

# Compute alpha diversity indices and store them in colData
tse <- addAlpha(tse)

# Compare Faith's phylogenetic diversity across diets
plotBoxplot(tse, col.var = "faith_diversity", x = "Diet")

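# PCoA (MDS) on phylogeny-aware UniFrac dissimilarity
tse <- addMDS(tse, method = "unifrac")

# PCoA (MDS) on Bray-Curtis dissimilarity; with the default name this replaces the "MDS" result above
tse <- addMDS(tse, method = "bray")

# Plot the ordination, coloured by diet
library(scater)
plotReducedDim(tse, dimred = "MDS", colour_by = "Diet")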

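library(maaslin3)

# Differential abundance analysis with MaAsLin 3:
# total-sum scaling (TSS) normalization followed by log transformation;
# results and figures are written to the "maaslin3_output" folder
daa_res <- maaslin3(
    input_data = tse,
    formula = "~ Diet",
    normalization = "TSS",
    transform = "LOG",
    output = "maaslin3_output"
)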

# Show the summary figure produced by MaAsLin 3
file_path <- file.path("maaslin3_output", "figures", "summary_plot.png")
knitr::include_graphics(file_path)

Next steps

European Bioconductor Conference 2026 will have live streaming!

Thank you for your time!

References

Amezquita, Robert A, Aaron T L Lun, Etienne Becht, Vince J Carey, Lindsay N Carpp, Ludwig Geistlinger, Federico Marini, et al. 2020. “Orchestrating Single-Cell Analysis with Bioconductor.” Nature Methods 17 (2): 137–45. https://doi.org/10.1038/s41592-019-0654-x.
Beghini, Francesco, Lauren J McIver, Aitor Blanco-Míguez, Leonard Dubois, Francesco Asnicar, Sagun Maharjan, Ana Mailyan, et al. 2021. “Integrating Taxonomic, Functional, and Strain-Level Profiling of Diverse Microbial Communities with bioBakery 3.” eLife 10 (May). https://doi.org/10.7554/elife.65088.
Blanco-Míguez, Aitor, Francesco Beghini, Fabio Cumbo, Lauren J. McIver, Kelsey N. Thompson, Moreno Zolfo, Paolo Manghi, et al. 2023. “Extending and Improving Metagenomic Taxonomic Profiling with Uncharacterized Species Using MetaPhlAn 4.” Nature Biotechnology 41 (11): 1633–44. https://doi.org/10.1038/s41587-023-01688-w.
Bolyen, Evan, Jai Ram Rideout, Matthew R. Dillon, Nicholas A. Bokulich, Christian C. Abnet, Gabriel A. Al-Ghalith, Harriet Alexander, et al. 2019. “Reproducible, Interactive, Scalable and Extensible Microbiome Data Science Using QIIME 2.” Nature Biotechnology 37 (8): 852–57. https://doi.org/10.1038/s41587-019-0209-9.
Callahan, Benjamin J, Paul J McMurdie, Michael J Rosen, Andrew W Han, Amy Jo A Johnson, and Susan P Holmes. 2016. “DADA2: High-Resolution Sample Inference from Illumina Amplicon Data.” Nature Methods 13 (7): 581–83. https://doi.org/10.1038/nmeth.3869.
Huang, Ruizhu, Charlotte Soneson, Felix G. M. Ernst, et al. 2021. “TreeSummarizedExperiment: A S4 Class for Data with Hierarchical Structure.” F1000Research 9: 1246. https://doi.org/10.12688/f1000research.26669.2.
Huber, W., V. J. Carey, R. Gentleman, S. Anders, M. Carlson, B. S. Carvalho, H. C. Bravo, et al. 2015. “Orchestrating High-Throughput Genomic Analysis with Bioconductor.” Nature Methods 12 (2): 115–21. https://doi.org/10.1038/nmeth.3252.
Hutchison, William J., Timothy J. Keyes, Helena L. Crowell, Jacques Serizay, Charlotte Soneson, Eric S. Davis, Noriaki Sato, et al. 2024. “The Tidyomics Ecosystem: Enhancing Omic Data Analyses.” Nature Methods 21 (7): 1166–70. https://doi.org/10.1038/s41592-024-02299-2.
Lu, Jennifer, Natalia Rincon, Derrick E. Wood, Florian P. Breitwieser, Christopher Pockrandt, Ben Langmead, Steven L. Salzberg, and Martin Steinegger. 2022. “Metagenome Analysis Using the Kraken Software Suite.” Nature Protocols 17 (12): 2815–39. https://doi.org/10.1038/s41596-022-00738-y.
McMurdie, Paul J., and Susan Holmes. 2013. “phyloseq: An R Package for Reproducible Interactive Analysis and Graphics of Microbiome Census Data.” PLoS ONE 8: e61217. https://doi.org/10.1371/journal.pone.0061217.
Schloss, Patrick D., Sarah L. Westcott, Thomas Ryabin, Justine R. Hall, Martin Hartmann, Emily B. Hollister, Ryan A. Lesniewski, et al. 2009. “Introducing Mothur: Open-Source, Platform-Independent, Community-Supported Software for Describing and Comparing Microbial Communities.” Applied and Environmental Microbiology 75 (23): 7537–41. https://doi.org/10.1128/AEM.01541-09.