5 Taxonomic Information

library(mia)
data("GlobalPatterns", package = "mia")
tse <- GlobalPatterns

Taxonomic information is a key part of analyzing microbiome data, and without it, any type of data analysis probably will not make much sense. However, the degree of detail of taxonomic information differs depending on the dataset and annotation data used.

Therefore, the mia package expects a loose assembly of taxonomic information and assumes certain key aspects:

Taxonomic information is given as character vectors or factors in the rowData of a SummarizedExperiment object.
The columns containing the taxonomic information must be named domain, kingdom, phylum, class, order, family, genus, species or with a capital first letter.
The columns must be given in the order shown above.
column can be omitted, but the order must remain

In this chapter, we will refer to the co-abundant groups as CAGs, which are clusters of taxa that co-vary across samples.

5.1 Assigning taxonomic information

There are a number of methods to assign taxonomic information. We like to give a short introduction about the methods available without ranking one over the other. This has to be your choice based on the result for the individual dataset.

5.1.1 DADA2

The dada2 package (Callahan et al. 2016) implements the assignTaxonomy function, which takes as input the ASV sequences associated with each row of data and a training dataset. For more information visit the dada2 homepage.

Callahan, Benjamin J, Paul J McMurdie, Michael J Rosen, Andrew W Han, Amy Jo A Johnson, and Susan P Holmes. 2016. “DADA2: High-Resolution Sample Inference from Illumina Amplicon Data.” Nature Methods 13: 581–83. https://doi.org/10.1038/nmeth.3869.

5.1.2 DECIPHER

The DECIPHER package (Wright 2020) implements the IDTAXA algorithm to assign either taxonomic information or function information. For mia only the first option is of interest for now and more information can be found on the DECIPHER website.

Wright, Erik. 2020. DECIPHER: Tools for Curating, Analyzing, and Manipulating Biological Sequences.

5.2 Functions to access taxonomic information

checkTaxonomy checks whether the taxonomic information is usable for mia

checkTaxonomy(tse)
##  [1] TRUE

Since the rowData can contain other data, taxonomyRanks will return the columns mia assumes to contain the taxonomic information.

taxonomyRanks(tse)
##  [1] "Kingdom" "Phylum"  "Class"   "Order"   "Family"  "Genus"   "Species"

This can then be used to subset the rowData to columns needed.

rowData(tse)[, taxonomyRanks(tse)]
##  DataFrame with 19216 rows and 7 columns
##             Kingdom        Phylum        Class        Order        Family
##         <character>   <character>  <character>  <character>   <character>
##  549322     Archaea Crenarchaeota Thermoprotei           NA            NA
##  522457     Archaea Crenarchaeota Thermoprotei           NA            NA
##  951        Archaea Crenarchaeota Thermoprotei Sulfolobales Sulfolobaceae
##  244423     Archaea Crenarchaeota        Sd-NA           NA            NA
##  586076     Archaea Crenarchaeota        Sd-NA           NA            NA
##  ...            ...           ...          ...          ...           ...
##  278222    Bacteria           SR1           NA           NA            NA
##  463590    Bacteria           SR1           NA           NA            NA
##  535321    Bacteria           SR1           NA           NA            NA
##  200359    Bacteria           SR1           NA           NA            NA
##  271582    Bacteria           SR1           NA           NA            NA
##               Genus                Species
##         <character>            <character>
##  549322          NA                     NA
##  522457          NA                     NA
##  951     Sulfolobus Sulfolobusacidocalda..
##  244423          NA                     NA
##  586076          NA                     NA
##  ...            ...                    ...
##  278222          NA                     NA
##  463590          NA                     NA
##  535321          NA                     NA
##  200359          NA                     NA
##  271582          NA                     NA

taxonomyRankEmpty checks for empty values in the given rank and returns a logical vector of length(x).

all(!taxonomyRankEmpty(tse, rank = "Kingdom"))
##  [1] TRUE
table(taxonomyRankEmpty(tse, rank = "Genus"))
##  
##  FALSE  TRUE 
##   8008 11208
table(taxonomyRankEmpty(tse, rank = "Species"))
##  
##  FALSE  TRUE 
##   1413 17803

getTaxonomyLabels is a multi-purpose function, which turns taxonomic information into a character vector of length(x)

head(getTaxonomyLabels(tse))
##  [1] "Class:Thermoprotei"               "Class:Thermoprotei_1"            
##  [3] "Species:Sulfolobusacidocaldarius" "Class:Sd-NA"                     
##  [5] "Class:Sd-NA_1"                    "Class:Sd-NA_2"

By default, this will use the lowest non-empty information to construct a string with the following scheme level:value. If all levels are the same, this part is omitted, but can be added by setting with.rank = TRUE.

phylum <- !is.na(rowData(tse)$Phylum) &
    vapply(data.frame(apply(rowData(tse)[, taxonomyRanks(tse)[3:7]], 1L, is.na)), all, logical(1))
head(getTaxonomyLabels(tse[phylum,]))
##  [1] "Crenarchaeota"    "Crenarchaeota_1"  "Crenarchaeota_2" 
##  [4] "Actinobacteria"   "Actinobacteria_1" "Spirochaetes"
head(getTaxonomyLabels(tse[phylum,], with.rank = TRUE))
##  [1] "Phylum:Crenarchaeota"    "Phylum:Crenarchaeota_1" 
##  [3] "Phylum:Crenarchaeota_2"  "Phylum:Actinobacteria"  
##  [5] "Phylum:Actinobacteria_1" "Phylum:Spirochaetes"

By default the return value of getTaxonomyLabels contains only unique elements by passing it through make.unique. This step can be omitted by setting make.unique = FALSE.

head(getTaxonomyLabels(tse[phylum,], with.rank = TRUE, make.unique = FALSE))
##  [1] "Phylum:Crenarchaeota"  "Phylum:Crenarchaeota"  "Phylum:Crenarchaeota" 
##  [4] "Phylum:Actinobacteria" "Phylum:Actinobacteria" "Phylum:Spirochaetes"

To apply the loop resolving function resolveLoop from the TreeSummarizedExperiment package (Huang 2020) within getTaxonomyLabels, set resolve.loops = TRUE.

The function getUnique gives a list of unique taxa for the specified taxonomic rank.

head(getUnique(tse, rank = "Phylum"))
##  [1] "Crenarchaeota"  "Euryarchaeota"  "Actinobacteria" "Spirochaetes"  
##  [5] "MVP-15"         "Proteobacteria"

5.2.1 Generate a hierarchy tree on the fly

A hierarchy tree is not an actual phylogenetic tree. A hierarchy tree shows mapping between the taxonomic levels in taxonomic rank table (included in rowData), rather than the detailed phylogenetic relations included in a phylogenetic tree

For instance, the TreeSummarizedExperiment dataset, Tengeler2020, contains an actual phylogenetic tree.

data(Tengeler2020)
tse2 <- Tengeler2020
phylo_tree <- rowTree(tse2)
phylo_tree
##  
##  Phylogenetic tree with 151 tips and 149 internal nodes.
##  
##  Tip labels:
##    [Eubacterium]_coprostanoligenes_group, [Clostridium]_innocuum_group, [Clostridium]_innocuum_group_1, Dielma, Unidentified_Lachnospiraceae_12, Epulopiscium, ...
##  Node labels:
##    node_1, node_2, node_3, node_4, node_5, node_6, ...
##  
##  Unrooted; includes branch lengths.

To create a hierarchy tree, getHierarchyTree used the information and returns a phylo object. Duplicate information from the rowData is removed.

getHierarchyTree(tse)
##  
##  Phylogenetic tree with 1645 tips and 1089 internal nodes.
##  
##  Tip labels:
##    Species:Cenarchaeumsymbiosum, Species:pIVWA5, Species:CandidatusNitrososphaeragargensis, Species:SCA1145, Species:SCA1170, Species:Sulfolobusacidocaldarius, ...
##  Node labels:
##    root:ALL, Kingdom:Archaea, Phylum:Crenarchaeota, Class:C2, Class:Sd-NA, Class:Thaumarchaeota, ...
##  
##  Rooted; includes branch lengths.

tse <- addHierarchyTree(tse)
tse
##  class: TreeSummarizedExperiment 
##  dim: 19216 26 
##  metadata(0):
##  assays(1): counts
##  rownames(19216): 549322 522457 ... 200359 271582
##  rowData names(7): Kingdom Phylum ... Genus Species
##  colnames(26): CL3 CC1 ... Even2 Even3
##  colData names(7): X.SampleID Primer ... SampleType Description
##  reducedDimNames(0):
##  mainExpName: NULL
##  altExpNames(0):
##  rowLinks: a LinkDataFrame (19216 rows)
##  rowTree: 1 phylo tree(s) (1645 leaves)
##  colLinks: NULL
##  colTree: NULL

The implementation is based on the toTree function from the TreeSummarizedExperiment package (Huang 2020).

Huang, Ruizhu. 2020. TreeSummarizedExperiment: A S4 Class for Data with Tree Structures.

5.3 Data agglomeration

One of the main applications of taxonomic information in regards to count data is to agglomerate count data on taxonomic levels and track the influence of changing conditions through these levels. For this mia contains the mergeFeaturesByRank function. The ideal location to store the agglomerated data is as an alternative experiment.

tse <- transformAssay(tse, assay.type = "counts", method = "relabundance")
altExp(tse, "Family") <- mergeFeaturesByRank(tse, rank = "Family",
                                           agglomerateTree = TRUE)
altExp(tse, "Family")
##  class: TreeSummarizedExperiment 
##  dim: 341 26 
##  metadata(1): agglomerated_by_rank
##  assays(2): counts relabundance
##  rownames(341): 125ds10 211ds20 ... vadinHA31 wb1_P06
##  rowData names(7): Kingdom Phylum ... Genus Species
##  colnames(26): CL3 CC1 ... Even2 Even3
##  colData names(7): X.SampleID Primer ... SampleType Description
##  reducedDimNames(0):
##  mainExpName: NULL
##  altExpNames(0):
##  rowLinks: a LinkDataFrame (341 rows)
##  rowTree: 1 phylo tree(s) (341 leaves)
##  colLinks: NULL
##  colTree: NULL

If multiple assays (counts and relabundance) exist, both will be agglomerated.

assayNames(tse)
##  [1] "counts"       "relabundance"
assayNames(altExp(tse, "Family"))
##  [1] "counts"       "relabundance"

assay(altExp(tse, "Family"), "relabundance")[1:5, 1:7]
##                CL3       CC1       SV1   M31Fcsw   M11Fcsw   M31Plmr
##  125ds10 5.324e-05 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
##  211ds20 1.389e-05 3.699e-05 4.731e-04 0.000e+00 0.000e+00 5.564e-06
##  5B-12   1.180e-04 2.272e-04 2.638e-04 0.000e+00 0.000e+00 0.000e+00
##  A714017 3.472e-06 3.523e-06 1.434e-06 0.000e+00 2.890e-06 1.391e-06
##  ACK-M1  5.902e-05 5.636e-05 2.151e-04 4.147e-05 4.671e-05 4.312e-05
##            M11Plmr
##  125ds10 0.000e+00
##  211ds20 8.297e-05
##  5B-12   0.000e+00
##  A714017 9.219e-06
##  ACK-M1  3.342e-04

assay(altExp(tse, "Family"), "counts")[1:5, 1:7]
##          CL3 CC1 SV1 M31Fcsw M11Fcsw M31Plmr M11Plmr
##  125ds10  46   0   0       0       0       0       0
##  211ds20  12  42 330       0       0       4      36
##  5B-12   102 258 184       0       0       0       0
##  A714017   3   4   1       0       6       1       4
##  ACK-M1   51  64 150      64      97      31     145

altExpNames now consists of Family level data. This can be extended to use any taxonomic level listed in mia::taxonomyRanks(tse).

Rare taxa can also be aggregated into a single group “Other” instead of filtering them out. A suitable function for this is agglomerateByRank. The number of rare taxa is higher on the species level, which causes the need for data agglomeration by prevalence.

altExp(tse, "Species_byPrevalence") <- agglomerateByRank(tse, 
                                                               rank = "Species", 
                                                               other.label = "Other", 
                                                               prevalence = 5 / 100, 
                                                               detection = 1 / 100, 
                                                               as.relative = TRUE)
altExp(tse, "Species_byPrevalence")
##  class: TreeSummarizedExperiment 
##  dim: 944 26 
##  metadata(1): agglomerated_by_rank
##  assays(2): counts relabundance
##  rownames(944): Abiotrophiadefectiva Achromatiumoxaliferum ...
##    proteobacteriumsymbiontofOsedaxsp.MB4 symbiontofNoeetapupillata
##  rowData names(7): Kingdom Phylum ... Genus Species
##  colnames(26): CL3 CC1 ... Even2 Even3
##  colData names(7): X.SampleID Primer ... SampleType Description
##  reducedDimNames(0):
##  mainExpName: NULL
##  altExpNames(0):
##  rowLinks: a LinkDataFrame (944 rows)
##  rowTree: 1 phylo tree(s) (1645 leaves)
##  colLinks: NULL
##  colTree: NULL

assay(altExp(tse, "Species_byPrevalence"), "relabundance")[1:5, 1:3]
##                                  CL3       CC1       SV1
##  Abiotrophiadefectiva      0.000e+00 0.000e+00 0.000e+00
##  Achromatiumoxaliferum     0.000e+00 0.000e+00 0.000e+00
##  Acidaminococcusfermentans 0.000e+00 0.000e+00 0.000e+00
##  Acidiphiliumcryptum       0.000e+00 0.000e+00 1.434e-06
##  Acidomonasmethanolica     1.157e-06 3.523e-06 0.000e+00

# Saving the tse for later
tseGlobalPatterns <- tse

5.3.1 Taxa clustering

Another way to agglomerate the data is to cluster the taxa. To do so, we usually start by doing a compositionality aware transformation such as CLR, followed by the application of a standard clustering method.

Here is an example that does a CLR transformation followed by the hierarchical clustering algorithm.

First, we import the library bluster that simplifies the clustering.

library(bluster)

Then we do the CLR transform followed by the clustering. We will cluster with two different distances: the euclidean distance and the kendall distance.

# Get the data
data("peerj13075", package = "mia")
tse <- peerj13075

# The result of the CLR transform is stored in the assay clr
tse <- transformAssay(tse, method = "clr", pseudocount = 1)

tse <- transformAssay(tse, assay.type = "clr", method = "standardize", 
                      MARGIN = "rows")

# Cluster (with euclidean distance) on the features of the `standardize` assay
tse <- addCluster(tse,
               assay.type = "standardize",
               clust.col = "hclustEuclidean",
                 MARGIN = "rows",
               HclustParam(dist.fun = stats::dist, method = "ward.D2"))

# Declare the Kendall dissimilarity computation function
kendall_dissimilarity <- function(x) {
    as.dist(1 - cor(t(x), method = "kendall"))
}

# Cluster (with Kendall dissimilarity) on the features of the `standardize` assay
tse <- addCluster(tse,
               assay.type = "standardize",
               clust.col = "hclustKendall",
               MARGIN = "rows",            
               HclustParam(dist.fun = kendall_dissimilarity, method = "ward.D2"))

Let us store the resulting cluster indices in the rowData column specified with the clust.col parameter.

# Checking the clusters
clusters_euclidean <- rowData(tse)$hclustEuclidean
head(clusters_euclidean, 10)
##   OTU1  OTU2  OTU7  OTU9 OTU10 OTU12 OTU14 OTU15 OTU18 OTU19 
##      1     2     1     1     1     1     3     4     3     2 
##  Levels: 1 2 3 4 5

clusters_kendall <- rowData(tse)$hclustKendall
head(clusters_kendall, 10)
##   OTU1  OTU2  OTU7  OTU9 OTU10 OTU12 OTU14 OTU15 OTU18 OTU19 
##      1     2     1     3     3     1     3     1     1     3 
##  Levels: 1 2 3 4

To better visualize the results and the distribution of the clusters, we can plot the histogram of the clusters.

library(ggplot2)
library(patchwork) # TO arrange several plots as a grid
plot1 <- ggplot(rowData(tse), aes(x = clusters_euclidean)) +
    geom_bar() +
    labs(title = "CAG size distribution (Euclidean distance)",
         x = "Clusters", y = "Feature count (n)")
plot2 <- ggplot(rowData(tse), aes(x = clusters_kendall)) +
    geom_bar() +
    labs(title = "CAG size distribution (1 - tau)",
         x = "Clusters", y = "Feature count (n)")
plot1 + plot2 + plot_layout(ncol = 2)

It’s also possible to merge the rows by cluster.

# Aggregate clusters as a sum of each cluster values
tse_merged <- mergeFeatures(tse, clusters_euclidean)
tse_merged
##  class: TreeSummarizedExperiment 
##  dim: 5 58 
##  metadata(0):
##  assays(3): counts clr standardize
##  rownames(5): 1 2 3 4 5
##  rowData names(8): kingdom phylum ... hclustEuclidean hclustKendall
##  colnames(58): ID1 ID2 ... ID57 ID58
##  colData names(5): Sample Geographical_location Gender Age Diet
##  reducedDimNames(0):
##  mainExpName: NULL
##  altExpNames(0):
##  rowLinks: NULL
##  rowTree: NULL
##  colLinks: NULL
##  colTree: NULL

We can note that it worked as planned since there were 5 clusters and there are now 5 rows.

5.4 Data transformation

Data transformations are common in (microbial) ecology (Legendre and Gallagher 2001) and used to improve compatibility with assumptions related to specific statistical methods, mitigate biases, enhance the comparability of samples, or features, or to obtain more interpretable values.

Legendre, Pierre, and Eugene D. Gallagher. 2001. “Ecologically Meaningful Transformations for Ordination of Species Data.” Oecologia 129 (2): 271–80. https://doi.org/10.1007/s004420100716.

Examples include the logarithmic transformation, the calculation of relative abundances (percentages), and compositionality-aware transformations such as the centered log-ratio transformation (clr).

Let us summarize some commonly used transformations in microbiome data science; further details and benchmarkings available in the references.

‘relabundance’ Relative transformation, also known as total sum scaling (TSS) and compositional transformation. This converts counts into percentages (at the scale [0, 1]) that sum up to
1. Much of the currently available taxonomic abundance data from high-throughput assays (16S, metagenomic sequencing) is compositional by nature, even if the data is provided as counts (Gloor et al. 2017).
‘clr’ Centered log ratio transformation (Aitchison 1986) is used to reduce data skewness and compositionality bias in relative abundances, while bringing the data to the logarithmic scale. This transformation is frequently applied in microbial ecology (Gloor et al. 2017). However, this transformation only applies to positive values. Usual solution is to add pseudocount, which adds another type of bias in the data. The robust clr transformation (‘rclr’) aims to circumvent the need to add a pseudocount. While the resulting values from these transformations are difficult interpret directly, this transformation may enhance comparability of relative differences between samples. It is part of a broader Aitchison family of transformations; the additive log ratio transformation (`alr’) is also available. The robust clr (“rclr”) is similar to regular clr (see above) but allows data with zeroes and avoids the need to add pseudocount Martino et al. (2019).
‘pa’ Presence/Absence transformation ignores abundances and only indicates whether the given feature is detected above the given threshold (default: 0). This simple transformation is relatively widely used in ecological research. It has shown good performance in microbiome-based classification performance (Giliberti et al. 2022, Karwowska2024).
‘standardize’ Standardize(or ‘z-score’) transformation scales data to zero mean and unit variance. This is used to bring features (or samples) to more comparable levels in terms of mean and scale of the values. This can enhance visualization and interpretation of the data
‘log’, ‘log2’, ‘log10’ Logarithmic transformations, used e.g. to reduce data skewness. With compositional data, the clr (or rclr) transformation is often preferred.
‘hellinger’ Hellinger transformation is equal to the square root of relative abundances. This ecological transformation can be useful if we are interested in changes in relative abundances.
‘rank’ Rank transformation replaces each value by its rank. Also see ‘rrank’ (relative rank transformation). This has use, for instance, in non-parametric statistics.
Other available transformations include Chi square (‘chi.square’), Frequency transformation (‘frequency’), and Make margin sum of squares equal to one (‘normalize’)

Gloor, GB, JM Macklaim, V Pawlowsky-Glahn, and JJ Egozcue. 2017. “Microbiome Datasets Are Compositional: And This Is Not Optional.” Frontiers in Microbiology 8. https://doi.org/10.3389/fmicb.2017.02224.

Aitchison, J. 1986. The Statistical Analysis of Compositional Data. London, UK: Chapman & Hall.

Martino, C, J. T. Morton, C. A. Marotz, L. R. Thompson, A Tripathi, R Knight, and K Zengler. 2019. “A Novel Sparse Compositional Technique Reveals Microbial Perturbations.” mSystems 4.

5.4.1 Transforming the data in practice

Transformations on abundance assays can be performed with mia::transformAssay(), keeping both the original and the transformed assay(s). The transformed abundance assay is then stored back to the ‘assays’ slot in the data object. The function applies sample-wise or column-wise transformation when MARGIN = ‘cols’, feature-wise or row-wise transformation when MARGIN = ‘rows’.

A complete list of available transformations and parameters, is available in the function help.

# Load example data
library(mia)
data("GlobalPatterns", package = "mia")
tse <- GlobalPatterns

# Transform "counts" assay to relative abundances ("relabundance"), with pseudocount 1 
tse <- transformAssay(tse, assay.type = "counts", method = "relabundance", pseudocount = 1)

# Transform relative abundance assay ("relabundance") to "clr", using pseudocount if necessary;
# name the resulting assay to "clr" 
tse <- transformAssay(x = tse, assay.type = "relabundance", method = "clr", 
                      pseudocount = TRUE, name = "clr")

Get the values in the resulting assay, and view some of the first entries of it with the head command.

head(assay(tse, "clr"))
##             CL3    CC1     SV1 M31Fcsw M11Fcsw M31Plmr M11Plmr F21Plmr
##  549322 -0.9056 -1.054 -0.7109  -0.264 -0.2319 -0.3381 -0.4562 -0.2601
##  522457 -0.9056 -1.054 -0.7109  -0.264 -0.2319 -0.3381 -0.4562 -0.2601
##  951    -0.9056 -1.054 -0.7109  -0.264 -0.2319 -0.3381  0.1924 -0.2601
##  244423 -0.9056 -1.054 -0.7109  -0.264 -0.2319 -0.3381 -0.4562 -0.2601
##  586076 -0.9056 -1.054 -0.7109  -0.264 -0.2319 -0.3381 -0.4562 -0.2601
##  246140 -0.9056 -1.054 -0.7109  -0.264 -0.2319 -0.3381 -0.4562 -0.2601
##         M31Tong M11Tong LMEpi24M SLEpi20M AQC1cm  AQC4cm  AQC7cm     NP2
##  549322 -0.2193 -0.1554  -0.3118   0.2891  2.465  3.4909  3.8475  0.4076
##  522457 -0.2193 -0.1554  -0.3118  -0.2950 -0.653  0.1236  0.9658 -0.2330
##  951    -0.2193 -0.1554  -0.3118  -0.2950 -0.653 -0.7237 -0.7218 -0.2330
##  244423 -0.2193 -0.1554  -0.3118  -0.2950 -0.653  2.0279  2.3827 -0.2330
##  586076 -0.2193 -0.1554  -0.3118  -0.2950 -0.653  0.1236 -0.1711 -0.2330
##  246140 -0.2193 -0.1554  -0.3118  -0.2950 -0.653 -0.2128  0.4424 -0.2330
##             NP3     NP5 TRRsed1 TRRsed2 TRRsed3    TS28    TS29   Even1
##  549322 -0.3929 -0.3238 -0.2659 -0.4643 -0.4281 -0.2514 -0.2354 -0.3146
##  522457 -0.3929 -0.3238 -0.2659 -0.4643 -0.4281 -0.2514 -0.2354 -0.3146
##  951    -0.3929 -0.3238 -0.2659 -0.4643 -0.4281 -0.2514 -0.2354 -0.3146
##  244423 -0.3929 -0.3238 -0.2659 -0.4643 -0.4281 -0.2514 -0.2354 -0.3146
##  586076 -0.3929 -0.3238 -0.2659 -0.4643 -0.4281 -0.2514 -0.2354 -0.3146
##  246140 -0.3929 -0.3238 -0.2659 -0.4643 -0.4281 -0.2514 -0.2354 -0.3146
##           Even2   Even3
##  549322 -0.2334 -0.2185
##  522457 -0.2334 -0.2185
##  951    -0.2334 -0.2185
##  244423 -0.2334 -0.2185
##  586076 -0.2334 -0.2185
##  246140 -0.2334 -0.2185

In ‘pa’ transformation, abundance table is converted to presence/absence table that ignores abundances and only indicates whether the given feature is detected. This simple transformation is relatively widely used in ecological research. It has shown good performance in microbiome-based classification performance (Giliberti et al. 2022, Karwowska2024).

Giliberti, R, S Cavaliere, IE Mauriello, D Ercolini, and E Pasolli. 2022. “Host Phenotype Classification from Human Microbiome Data Is Mainly Driven by the Presence of Microbial Taxa.” PLoS Comput Biol. 18 (4): e1010066. https://doi.org/10.1371/journal.pcbi.1010066.

# Here, `assay.type` is not explicitly specified.
# Then The function uses the "counts" assay for the transformation.
tse <- transformAssay(tse, method = "pa")
head(assay(tse, "pa"))
##         CL3 CC1 SV1 M31Fcsw M11Fcsw M31Plmr M11Plmr F21Plmr M31Tong M11Tong
##  549322   0   0   0       0       0       0       0       0       0       0
##  522457   0   0   0       0       0       0       0       0       0       0
##  951      0   0   0       0       0       0       1       0       0       0
##  244423   0   0   0       0       0       0       0       0       0       0
##  586076   0   0   0       0       0       0       0       0       0       0
##  246140   0   0   0       0       0       0       0       0       0       0
##         LMEpi24M SLEpi20M AQC1cm AQC4cm AQC7cm NP2 NP3 NP5 TRRsed1 TRRsed2
##  549322        0        1      1      1      1   1   0   0       0       0
##  522457        0        0      0      1      1   0   0   0       0       0
##  951           0        0      0      0      0   0   0   0       0       0
##  244423        0        0      0      1      1   0   0   0       0       0
##  586076        0        0      0      1      1   0   0   0       0       0
##  246140        0        0      0      1      1   0   0   0       0       0
##         TRRsed3 TS28 TS29 Even1 Even2 Even3
##  549322       0    0    0     0     0     0
##  522457       0    0    0     0     0     0
##  951          0    0    0     0     0     0
##  244423       0    0    0     0     0     0
##  586076       0    0    0     0     0     0
##  246140       0    0    0     0     0     0

You can now view the entire list of abundance assays in your data object with:

assays(tse)
##  List of length 4
##  names(4): counts relabundance clr pa