These functions are designed to calculate dissimilarities on data stored
within a
TreeSummarizedExperiment
object. For overlap, Unifrac, and Jensen-Shannon Divergence (JSD)
dissimilarities, the functions use mia internal functions, while for other
types of dissimilarities, they rely on vegdist
by default.
addDissimilarity(x, method, ...)
getDissimilarity(x, method, ...)
# S4 method for class 'SummarizedExperiment'
addDissimilarity(x, method = "bray", name = method, ...)
# S4 method for class 'SummarizedExperiment'
getDissimilarity(
x,
method = "bray",
assay.type = "counts",
niter = NULL,
transposed = FALSE,
...
)
# S4 method for class 'TreeSummarizedExperiment'
getDissimilarity(
x,
method = "bray",
assay.type = "counts",
niter = NULL,
transposed = FALSE,
...
)
# S4 method for class 'ANY'
getDissimilarity(x, method = "bray", niter = NULL, ...)
TreeSummarizedExperiment
or matrix
.
Character scalar
. Specifies which dissimilarity to
calculate. (Default: "bray"
)
other arguments passed into avgdist
,
vegdist
, or into mia internal functions:
sample
: The sampling depth in rarefaction.
(Default: min(rowSums2(x))
)
dis.fun
: Character scalar
. Specifies the dissimilarity
function to be used.
transf
: Function
. Specifies the optional
transformation applied before calculating the dissimilarity matrix.
tree.name
: (Unifrac) Character scalar
. Specifies the
name of the tree from rowTree(x)
that is used in calculation.
Disabled when tree
is specified. (Default: "phylo"
)
tree
: (Unifrac) phylo
. A phylogenetic tree used in
calculation. (Default: NULL
)
weighted
: (Unifrac) Logical scalar
. Should use
weighted-Unifrac calculation?
Weighted-Unifrac takes into account the relative abundance of
species/taxa shared between samples, whereas unweighted-Unifrac only
considers presence/absence. Default is FALSE
, meaning the
unweighted-Unifrac dissimilarity is calculated for all pairs of samples.
(Default: FALSE
)
node.label
(Unifrac) character vector
. Used only if
x
is a matrix. Specifies links between rows/columns and tips of
tree
. The length must equal the number of rows/columns of x
.
Furthermore, all the node labs must be present in tree
.
chunkSize
: (JSD) Integer scalar
. Defines the size of
data send to the individual worker. Only has an effect, if BPPARAM
defines more than one worker. (Default: nrow(x)
)
BPPARAM
: (JSD)
BiocParallelParam
.
Specifies whether the calculation should be parallelized.
detection
: (Overlap) Numeric scalar
.
Defines detection threshold for absence/presence of features. Feature that
has abundance under threshold in either of samples, will be discarded when
evaluating overlap between samples. (Default: 0
)
Character scalar
. The name to be used to store the result
in metadata of the output. (Default: method
)
Character scalar
. Specifies the name of assay
used in calculation. (Default: "counts"
)
Integer scalar
. Specifies the number of
rarefaction rounds. Rarefaction is not applied when niter=NULL
(see Details section). (Default: NULL
)
Logical scalar
. Specifies if x is transposed with
cells in rows. (Default: FALSE
)
getDissimilarity
returns a sample-by-sample dissimilarity matrix.
addDissimilarity
returns x
that includes dissimilarity matrix
in its metadata.
Overlap reflects similarity between sample-pairs. When overlap is calculated using relative abundances, the higher the value the higher the similarity is. When using relative abundances, overlap value 1 means that all the abundances of features are equal between two samples, and 0 means that samples have completely different relative abundances.
Unifrac is calculated with rbiom:unifrac()
.
If rarefaction is enabled, vegan:avgdist()
is
utilized.
Rarefaction can be used to control uneven sequencing depths. Although, it is highly debated method. Some think that it is the only option that successfully controls the variation caused by uneven sampling depths. The biggest argument against rarefaction is the fact that it omits data.
Rarefaction works by sampling the counts randomly. This random sampling
is done niter
times. In each sampling iteration, sample
number
of random samples are drawn, and dissimilarity is calculated for this
subset. After the iterative process, there are niter
number of
result that are then averaged to get the final result.
Refer to Schloss (2024) for more details on rarefaction.
For unifrac dissimilarity: http://bmf.colorado.edu/unifrac/
See also additional descriptions of Unifrac in the following articles:
Lozupone, Hamady and Knight, “Unifrac - An Online Tool for Comparing Microbial Community Diversity in a Phylogenetic Context.”, BMC Bioinformatics 2006, 7:371
Lozupone, Hamady, Kelley and Knight, “Quantitative and qualitative (beta) diversity measures lead to different insights into factors that structure microbial communities.” Appl Environ Microbiol. 2007
Lozupone C, Knight R. “Unifrac: a new phylogenetic method for comparing microbial communities.” Appl Environ Microbiol. 2005 71 (12):8228-35.
For JSD dissimilarity: Jensen-Shannon Divergence and Hilbert space embedding. Bent Fuglede and Flemming Topsoe University of Copenhagen, Department of Mathematics http://www.math.ku.dk/~topsoe/ISIT2004JSD.pdf
For rarefaction: Schloss PD (2024) Rarefaction is currently the best approach to control for uneven sequencing effort in amplicon sequence analyses. mSphere 28;9(2):e0035423. doi: 10.1128/msphere.00354-23
library(mia)
library(scater)
# load dataset
data(GlobalPatterns)
tse <- GlobalPatterns
### Overlap dissimilarity
tse <- addDissimilarity(tse, method = "overlap", detection = 0.25)
metadata(tse)[["overlap"]][1:6, 1:6]
#> CL3 CC1 SV1 M31Fcsw M11Fcsw M31Plmr
#> CL3 0.0 985499.5 692281 951745 1217896 466124
#> CC1 985499.5 0.0 848128 975408 1253552 626083
#> SV1 692281.0 848128.0 0 808197 1083470 493475
#> M31Fcsw 951745.0 975408.0 808197 0 1785250 1014236
#> M11Fcsw 1217896.5 1253551.5 1083470 1785250 0 1281626
#> M31Plmr 466124.0 626083.0 493475 1014236 1281626 0
### JSD dissimilarity
tse <- addDissimilarity(tse, method = "jsd")
metadata(tse)[["jsd"]][1:6, 1:6]
#> CL3 CC1 SV1 M31Fcsw M11Fcsw M31Plmr
#> CL3 0.0000000 0.2547080 0.5051481 0.6809263 0.6813418 0.6713074
#> CC1 0.2547080 0.0000000 0.4378036 0.6863213 0.6869444 0.6620168
#> SV1 0.5051481 0.4378036 0.0000000 0.6848186 0.6850675 0.6649845
#> M31Fcsw 0.6809263 0.6863213 0.6848186 0.0000000 0.2864774 0.6720375
#> M11Fcsw 0.6813418 0.6869444 0.6850675 0.2864774 0.0000000 0.6831013
#> M31Plmr 0.6713074 0.6620168 0.6649845 0.6720375 0.6831013 0.0000000
# Multi Dimensional Scaling applied to JSD dissimilarity matrix
tse <- addMDS(tse, method = "overlap", assay.type = "counts")
reducedDim(tse, "MDS") |> head()
#> [,1] [,2]
#> CL3 -43284.813 -45143.87
#> CC1 -4790.904 -40203.23
#> SV1 26147.150 -15952.00
#> M31Fcsw -133864.366 -14824.29
#> M11Fcsw 610755.430 803373.51
#> M31Plmr -78217.650 3434.70
### Unifrac dissimilarity
res <- getDissimilarity(tse, method = "unifrac", weighted = FALSE)
dim(as.matrix(res))
#> [1] 26 26
tse <- addDissimilarity(tse, method = "unifrac", weighted = TRUE)
metadata(tse)[["unifrac"]][1:6, 1:6]
#> CL3 CC1 SV1 M31Fcsw M11Fcsw M31Plmr
#> CL3 0.0000000 0.1720386 0.2817016 0.5772531 0.5625487 0.4504863
#> CC1 0.1720386 0.0000000 0.2473207 0.5782640 0.5632083 0.4493349
#> SV1 0.2817016 0.2473207 0.0000000 0.5701039 0.5517097 0.4609602
#> M31Fcsw 0.5772531 0.5782640 0.5701039 0.0000000 0.1924949 0.5244770
#> M11Fcsw 0.5625487 0.5632083 0.5517097 0.1924949 0.0000000 0.5484897
#> M31Plmr 0.4504863 0.4493349 0.4609602 0.5244770 0.5484897 0.0000000
### Bray dissimilarity
# Bray is usually applied to relative abundances so we have to apply
# transformation first
tse <- transformAssay(tse, method = "relabundance")
res <- getDissimilarity(tse, method = "bray", assay.type = "relabundance")
as.matrix(res)[1:6, 1:6]
#> CL3 CC1 SV1 M31Fcsw M11Fcsw M31Plmr
#> CL3 0.0000000 0.5902949 0.8438458 0.9947506 0.9948792 0.9844500
#> CC1 0.5902949 0.0000000 0.7835588 0.9969718 0.9972773 0.9765050
#> SV1 0.8438458 0.7835588 0.0000000 0.9961894 0.9963561 0.9815678
#> M31Fcsw 0.9947506 0.9969718 0.9961894 0.0000000 0.5468973 0.9874252
#> M11Fcsw 0.9948792 0.9972773 0.9963561 0.5468973 0.0000000 0.9955337
#> M31Plmr 0.9844500 0.9765050 0.9815678 0.9874252 0.9955337 0.0000000
# If applying rarefaction, the input must be count matrix and transformation
# method specified in function call (Note: increase niter)
rclr <- function(x){
vegan::decostand(x, method="rclr")
}
res <- getDissimilarity(
tse, method = "euclidean", transf = rclr, niter = 2L)
as.matrix(res)[1:6, 1:6]
#> CL3 CC1 SV1 M31Fcsw M11Fcsw M31Plmr
#> CL3 0.00000 92.41533 111.91988 100.26503 96.37790 104.02765
#> CC1 92.41533 0.00000 107.63145 101.04726 96.98399 105.23199
#> SV1 111.91988 107.63145 0.00000 97.99801 94.11056 102.95526
#> M31Fcsw 100.26503 101.04726 97.99801 0.00000 53.73290 80.20711
#> M11Fcsw 96.37790 96.98399 94.11056 53.73290 0.00000 75.98772
#> M31Plmr 104.02765 105.23199 102.95526 80.20711 75.98772 0.00000