Agglomeration

Tuesday, February 3, 2026

Combining features into higher-level taxa
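
As a rough illustration of what agglomeration does to a count table, the base R sketch below sums the rows of features that share a phylum label. The toy matrix, feature names, and phylum assignments are invented for illustration; mia's agglomeration functions additionally update the row data and the tree, which this sketch does not attempt.

```r
# Toy count table: 4 features (rows) x 3 samples (columns); values are invented
counts <- matrix(
    c(5, 0, 2,
      1, 3, 0,
      0, 4, 4,
      2, 2, 1),
    nrow = 4, byrow = TRUE,
    dimnames = list(paste0("ASV", 1:4), paste0("S", 1:3))
)

# Hypothetical phylum assignment for each feature
phylum <- c("Firmicutes", "Firmicutes", "Bacteroidetes", "Firmicutes")

# Sum the counts of all features that belong to the same phylum
phylum_counts <- rowsum(counts, group = phylum)
print(phylum_counts)
```

Per-sample totals are unchanged by this operation; only the number of rows shrinks.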

Example data (Gupta et al., mSystems 2019)

  • Download zip file from Slack (day1)

  • Import the data (script also online):

library(mia)
library(ape)

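# Folder extracted from the downloaded zip file
dir <- "GuptaA_2019"

# Read taxonomy data into TreeSE
# rowData: taxonomy table (one row per feature)
rd <- read.csv(file.path(dir, "taxonomy_table.csv"), row.names = 1L) |> DataFrame()
# colData: sample metadata (one row per sample)
cd <- read.csv(file.path(dir, "sample_metadata.csv"), row.names = 1L) |> DataFrame()
# Abundance table (features x samples)
assay <- read.csv(file.path(dir, "taxonomy_abundance.csv"), row.names = 1L) |> as.matrix()
# Phylogenetic tree
tree <- read.tree(file.path(dir, "phylogeny.tree"))

# Combine everything into a single TreeSummarizedExperiment
tse <- TreeSummarizedExperiment(
    assays = SimpleList(counts = assay),
    rowData = rd,
    colData = cd,
    rowTree = tree
)

# Save the object so that later sessions can reload it directly
saveRDS(tse, "Gupta2019.rds")
tse <- readRDS("Gupta2019.rds")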

Why?

  • Focus on biologically meaningful units
  • Reduce sparsity and noise
  • Improve interpretability of downstream analyses
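
The sparsity point can be checked on a small example. The sketch below, again in base R with an invented count table and hypothetical genus labels, compares the fraction of zero entries before and after summing features by genus.

```r
# Toy count table with many zeros; values and labels are invented
counts <- matrix(
    c(3, 0, 0, 0,
      0, 2, 0, 0,
      0, 0, 5, 0,
      0, 0, 0, 1,
      4, 0, 0, 6,
      0, 7, 0, 0),
    nrow = 6, byrow = TRUE,
    dimnames = list(paste0("ASV", 1:6), paste0("S", 1:4))
)

# Hypothetical genus assignment for each feature
genus <- c("GenusA", "GenusA", "GenusA", "GenusB", "GenusB", "GenusB")

# Fraction of zero entries in a matrix
sparsity <- function(x) mean(x == 0)

# Agglomerate by summing features within each genus
genus_counts <- rowsum(counts, group = genus)

sparsity(counts)        # before agglomeration
sparsity(genus_counts)  # after agglomeration
```

How much sparsity drops on real data depends on how the features are distributed across taxa and samples.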

Demonstration

library(mia)

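# Store the phylum-level result in a new object so that tse keeps the original features
tse_phylum <- agglomerateByRank(tse, rank = "Phylum")
print(tse_phylum)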
class: TreeSummarizedExperiment 
dim: 66 26 
metadata(1): agglomerated_by_rank
assays(1): counts
rownames(66): ABY1_OD1 AC1 ... ZB2 ZB3
rowData names(7): Kingdom Phylum ... Genus Species
colnames(26): CL3 CC1 ... Even2 Even3
colData names(7): X.SampleID Primer ... SampleType Description
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):
rowLinks: a LinkDataFrame (66 rows)
rowTree: 1 phylo tree(s) (66 leaves)
colLinks: NULL
colTree: NULL

tse <- agglomerateByRanks(tse)
print(tse)
class: TreeSummarizedExperiment 
dim: 19216 26 
metadata(0):
assays(1): counts
rownames(19216): 549322 522457 ... 200359 271582
rowData names(7): Kingdom Phylum ... Genus Species
colnames(26): CL3 CC1 ... Even2 Even3
colData names(7): X.SampleID Primer ... SampleType Description
reducedDimNames(0):
mainExpName: NULL
altExpNames(7): Kingdom Phylum ... Genus Species
rowLinks: a LinkDataFrame (19216 rows)
rowTree: 1 phylo tree(s) (19216 leaves)
colLinks: NULL
colTree: NULL
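
The agglomerated tables produced by agglomerateByRanks() are stored as alternative experiments, as the altExpNames(7) line in the output indicates. A minimal sketch of retrieving one level follows; it uses the GlobalPatterns dataset shipped with mia as a stand-in and assumes a mia version that provides agglomerateByRanks().

```r
library(mia)

# GlobalPatterns ships with mia; used here only as a stand-in dataset
data("GlobalPatterns", package = "mia")
tse <- GlobalPatterns

# Agglomerate to every taxonomic rank; results are stored as alternative experiments
tse <- agglomerateByRanks(tse)

# List the available ranks, then extract the phylum-level table
altExpNames(tse)
tse_phylum <- altExp(tse, "Phylum")
dim(tse_phylum)
```

The extracted object is itself a TreeSummarizedExperiment with the same samples, so it can be passed to downstream functions in the same way as the original.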

Exercises

From the OMA online book, Chapter 10: Agglomeration

  • 1.2, 1.3, 1.4, 1.5, 1.6
