Calculate dissimilarities — addDissimilarity • mia

These functions are designed to calculate dissimilarities on data stored within a TreeSummarizedExperiment object. For overlap, Unifrac, and Jensen-Shannon Divergence (JSD) dissimilarities, the functions use mia internal functions, while for other types of dissimilarities, they rely on vegdist by default.

addDissimilarity(x, method, ...)

getDissimilarity(x, method, ...)

# S4 method for class 'SummarizedExperiment'
addDissimilarity(x, method = "bray", name = method, ...)

# S4 method for class 'SummarizedExperiment'
getDissimilarity(
  x,
  method = "bray",
  assay.type = "counts",
  niter = NULL,
  transposed = FALSE,
  ...
)

# S4 method for class 'TreeSummarizedExperiment'
getDissimilarity(
  x,
  method = "bray",
  assay.type = "counts",
  niter = NULL,
  transposed = FALSE,
  ...
)

# S4 method for class 'ANY'
getDissimilarity(x, method = "bray", niter = NULL, ...)

Arguments

x

TreeSummarizedExperiment or matrix.

method

Character scalar. Specifies which dissimilarity to calculate. (Default: "bray")

...

other arguments passed into avgdist, vegdist, or into mia internal functions:

sample: The sampling depth in rarefaction. (Default: min(rowSums2(x)))
dis.fun: Character scalar. Specifies the dissimilarity function to be used.
transf: Function. Specifies the optional transformation applied before calculating the dissimilarity matrix.
tree.name: (Unifrac) Character scalar. Specifies the name of the tree from rowTree(x) that is used in calculation. Disabled when tree is specified. (Default: "phylo")
tree: (Unifrac) phylo. A phylogenetic tree used in calculation. (Default: NULL)
weighted: (Unifrac) Logical scalar. Whether to calculate weighted Unifrac. Weighted Unifrac takes into account the relative abundance of species/taxa shared between samples, whereas unweighted Unifrac only considers presence/absence. (Default: FALSE)
node.label: (Unifrac) Character vector. Used only if x is a matrix. Specifies links between rows/columns and tips of tree. All node labels must be present in tree. For the links, you can provide a vector whose length equals the number of rows/columns in x. Alternatively, you can provide a named vector where names represent names in the abundance table and values their corresponding nodes in tree.
chunkSize: (JSD) Integer scalar. Defines the size of the data chunks sent to each individual worker. Only has an effect if BPPARAM defines more than one worker. (Default: nrow(x))
BPPARAM: (JSD) BiocParallelParam. Specifies whether the calculation should be parallelized.
detection: (Overlap) Numeric scalar. Defines the detection threshold for presence/absence of features. A feature whose abundance is below the threshold in either sample of a pair is discarded when evaluating the overlap between those samples. (Default: 0)
binary: Logical scalar. Whether to perform presence/absence transformation before dissimilarity calculation. For Jaccard index the default is TRUE. For other dissimilarity metrics, please see vegdist.
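
To illustrate what the binary option means, the following base-R sketch applies a presence/absence transformation to two made-up abundance vectors and computes the Jaccard dissimilarity by hand. The vectors and the helper name jaccard_binary are hypothetical and are not part of mia or vegan; in practice the calculation is delegated to vegdist.

```r
# Two hypothetical samples with abundances for four features
a <- c(5, 0, 3, 0)
b <- c(1, 2, 0, 0)

# Hypothetical helper (not a mia or vegan function):
# Jaccard dissimilarity computed on presence/absence data
jaccard_binary <- function(x, y) {
    # Presence/absence transformation: any abundance above zero is "present"
    px <- x > 0
    py <- y > 0
    shared <- sum(px & py)  # features present in both samples
    total <- sum(px | py)   # features present in at least one sample
    1 - shared / total
}

jaccard_binary(a, b)
```

Here one of the three detected features is shared, so the dissimilarity is 2/3. The abundance values themselves no longer matter once the data are binarized.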

name

Character scalar. The name to be used to store the result in metadata of the output. (Default: method)

assay.type

Character scalar. Specifies the name of assay used in calculation. (Default: "counts")

niter

Integer scalar. Specifies the number of rarefaction rounds. Rarefaction is not applied when niter=NULL (see Details section). (Default: NULL)

transposed

Logical scalar. Specifies if x is transposed with samples in rows. (Default: FALSE)

Value

getDissimilarity returns a sample-by-sample dissimilarity matrix.

addDissimilarity returns x with the dissimilarity matrix stored in its metadata.

Details

Overlap reflects similarity between sample pairs. When overlap is calculated using relative abundances, higher values indicate higher similarity: a value of 1 means that the abundances of all features are equal between the two samples, and 0 means that the samples have completely different relative abundances.

Unifrac is calculated with rbiom::unifrac().

If rarefaction is enabled, vegan::avgdist() is utilized.

Rarefaction can be used to control for uneven sequencing depths, although it is a highly debated method. Some think that it is the only option that successfully controls the variation caused by uneven sampling depths. The biggest argument against rarefaction is that it omits data.

Rarefaction works by sampling the counts randomly. This random sampling is done niter times. In each sampling iteration, the counts are randomly subsampled to the depth given by the sample argument, and the dissimilarity is calculated for this subset. After the iterative process, there are niter results, which are averaged to get the final result.

Refer to Schloss (2024) for more details on rarefaction.
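
The iterative procedure described above can be sketched in base R. This is only an illustration of the idea under simplifying assumptions, not the code that mia or vegan::avgdist() actually runs: the toy count matrix, the helper names rarefy_row and bray, and the choice of Bray-Curtis as the dissimilarity are all made up for the example.

```r
set.seed(1)

# Toy count matrix: 3 samples (rows) x 4 features (columns)
counts <- matrix(
    c(10, 5,  0,  5,
      20, 0, 10, 10,
       3, 3,  3,  3),
    nrow = 3, byrow = TRUE,
    dimnames = list(paste0("S", 1:3), paste0("F", 1:4)))

# Hypothetical helper: draw `depth` reads without replacement from one sample
rarefy_row <- function(v, depth) {
    reads <- rep(seq_along(v), times = v)  # one entry per observed read
    picked <- reads[sample.int(length(reads), depth)]
    tabulate(picked, nbins = length(v))
}

# Hypothetical helper: Bray-Curtis dissimilarity between all pairs of rows
bray <- function(m) {
    n <- nrow(m)
    d <- matrix(0, n, n, dimnames = list(rownames(m), rownames(m)))
    for (i in seq_len(n)) {
        for (j in seq_len(n)) {
            d[i, j] <- sum(abs(m[i, ] - m[j, ])) / sum(m[i, ] + m[j, ])
        }
    }
    d
}

niter <- 10L
depth <- min(rowSums(counts))  # rarefy every sample to the smallest depth

# One dissimilarity matrix per rarefaction round
iters <- lapply(seq_len(niter), function(i) {
    sub <- t(apply(counts, 1, rarefy_row, depth = depth))
    bray(sub)
})

# Average the niter matrices element-wise to get the final result
avg <- Reduce(`+`, iters) / niter
avg
```

Each round yields a slightly different matrix because the subsampling is random. Averaging over more rounds reduces that noise, which is why real analyses typically use a much larger niter than a toy example does.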

References

For Unifrac dissimilarity: http://bmf.colorado.edu/unifrac/

See also additional descriptions of Unifrac in the following articles:

Lozupone, Hamady and Knight, “Unifrac - An Online Tool for Comparing Microbial Community Diversity in a Phylogenetic Context.”, BMC Bioinformatics 2006, 7:371

Lozupone, Hamady, Kelley and Knight, “Quantitative and qualitative (beta) diversity measures lead to different insights into factors that structure microbial communities.” Appl Environ Microbiol. 2007

Lozupone C, Knight R. “Unifrac: a new phylogenetic method for comparing microbial communities.” Appl Environ Microbiol. 2005 71 (12):8228-35.

For JSD dissimilarity: Jensen-Shannon Divergence and Hilbert space embedding. Bent Fuglede and Flemming Topsoe University of Copenhagen, Department of Mathematics http://www.math.ku.dk/~topsoe/ISIT2004JSD.pdf

For rarefaction: Schloss PD (2024) Rarefaction is currently the best approach to control for uneven sequencing effort in amplicon sequence analyses. mSphere 9(2):e0035423. doi: 10.1128/msphere.00354-23

Examples

library(mia)
library(scater)

# load dataset
data(GlobalPatterns)
tse <- GlobalPatterns

### Overlap dissimilarity

tse <- addDissimilarity(tse, method = "overlap", detection = 0.25)
metadata(tse)[["overlap"]][1:6, 1:6]
#>               CL3       CC1     SV1 M31Fcsw M11Fcsw M31Plmr
#> CL3           0.0  985499.5  692281  951745 1217896  466124
#> CC1      985499.5       0.0  848128  975408 1253552  626083
#> SV1      692281.0  848128.0       0  808197 1083470  493475
#> M31Fcsw  951745.0  975408.0  808197       0 1785250 1014236
#> M11Fcsw 1217896.5 1253551.5 1083470 1785250       0 1281626
#> M31Plmr  466124.0  626083.0  493475 1014236 1281626       0

### JSD dissimilarity

tse <- addDissimilarity(tse, method = "jsd")
metadata(tse)[["jsd"]][1:6, 1:6]
#>               CL3       CC1       SV1   M31Fcsw   M11Fcsw   M31Plmr
#> CL3     0.0000000 0.2547080 0.5051481 0.6809263 0.6813418 0.6713074
#> CC1     0.2547080 0.0000000 0.4378036 0.6863213 0.6869444 0.6620168
#> SV1     0.5051481 0.4378036 0.0000000 0.6848186 0.6850675 0.6649845
#> M31Fcsw 0.6809263 0.6863213 0.6848186 0.0000000 0.2864774 0.6720375
#> M11Fcsw 0.6813418 0.6869444 0.6850675 0.2864774 0.0000000 0.6831013
#> M31Plmr 0.6713074 0.6620168 0.6649845 0.6720375 0.6831013 0.0000000

# Multidimensional scaling (MDS) applied to the overlap dissimilarity matrix
tse <- addMDS(tse, method = "overlap", assay.type = "counts")
reducedDim(tse, "MDS") |> head()
#>                [,1]      [,2]
#> CL3      -43284.813 -45143.87
#> CC1       -4790.904 -40203.23
#> SV1       26147.150 -15952.00
#> M31Fcsw -133864.366 -14824.29
#> M11Fcsw  610755.430 803373.51
#> M31Plmr  -78217.650   3434.70

### Unifrac dissimilarity

res <- getDissimilarity(tse, method = "unifrac", weighted = FALSE)
dim(as.matrix(res))
#> [1] 26 26

tse <- addDissimilarity(tse, method = "unifrac", weighted = TRUE)
metadata(tse)[["unifrac"]][1:6, 1:6]
#>               CL3       CC1       SV1   M31Fcsw   M11Fcsw   M31Plmr
#> CL3     0.0000000 0.2196313 0.3601371 0.8567702 0.8661073 0.5892417
#> CC1     0.2196313 0.0000000 0.3145117 0.8543620 0.8633159 0.5846985
#> SV1     0.3601371 0.3145117 0.0000000 0.8433292 0.8466807 0.6006535
#> M31Fcsw 0.8567702 0.8543620 0.8433292 0.0000000 0.3350245 0.7913482
#> M11Fcsw 0.8661073 0.8633159 0.8466807 0.3350245 0.0000000 0.8579621
#> M31Plmr 0.5892417 0.5846985 0.6006535 0.7913482 0.8579621 0.0000000

### Bray dissimilarity

# Bray is usually applied to relative abundances so we have to apply
# transformation first
tse <- transformAssay(tse, method = "relabundance")
res <- getDissimilarity(tse, method = "bray", assay.type = "relabundance")
as.matrix(res)[1:6, 1:6]
#>               CL3       CC1       SV1   M31Fcsw   M11Fcsw   M31Plmr
#> CL3     0.0000000 0.5902949 0.8438458 0.9947506 0.9948792 0.9844500
#> CC1     0.5902949 0.0000000 0.7835588 0.9969718 0.9972773 0.9765050
#> SV1     0.8438458 0.7835588 0.0000000 0.9961894 0.9963561 0.9815678
#> M31Fcsw 0.9947506 0.9969718 0.9961894 0.0000000 0.5468973 0.9874252
#> M11Fcsw 0.9948792 0.9972773 0.9963561 0.5468973 0.0000000 0.9955337
#> M31Plmr 0.9844500 0.9765050 0.9815678 0.9874252 0.9955337 0.0000000

# If applying rarefaction, the input must be a count matrix and the
# transformation must be specified in the function call (Note: increase niter)
rclr <- function(x){
    vegan::decostand(x, method="rclr")
}
res <- getDissimilarity(
    tse, method = "euclidean", transf = rclr, niter = 2L)
#> Warning: The following sampling units were removed because they were below sampling depth: CL3, CC1, SV1, M31Fcsw, M11Fcsw, M31Plmr, M11Plmr, F21Plmr, M31Tong, M11Tong, LMEpi24M, SLEpi20M, AQC1cm, AQC4cm, AQC7cm, NP2, NP3, NP5, TRRsed1, TRRsed2, TRRsed3, TS28, TS29, Even1, Even2, Even3
as.matrix(res)[1:6, 1:6]
#>           1         2         3         4         5         6
#> 1   0.00000  18.91211  56.61105 122.83058 112.14053 134.94832
#> 2  18.91211   0.00000  45.98824 127.87625 116.45978 129.18737
#> 3  56.61105  45.98824   0.00000 108.33852  96.92338  92.90449
#> 4 122.83058 127.87625 108.33852   0.00000  12.47208  77.54683
#> 5 112.14053 116.45978  96.92338  12.47208   0.00000  71.20355
#> 6 134.94832 129.18737  92.90449  77.54683  71.20355   0.00000