R/calculateUnifrac.R
calculateUnifrac.Rd
This function calculates the (Fast) Unifrac distance for all sample-pairs
in a TreeSummarizedExperiment
object.
calculateUnifrac(x, tree, ...)
# S4 method for ANY,phylo
calculateUnifrac(
x,
tree,
weighted = FALSE,
normalized = TRUE,
BPPARAM = SerialParam(),
...
)
# S4 method for TreeSummarizedExperiment,missing
calculateUnifrac(
x,
assay.type = assay_name,
assay_name = exprs_values,
exprs_values = "counts",
tree_name = "phylo",
transposed = FALSE,
...
)
runUnifrac(
x,
tree,
weighted = FALSE,
normalized = TRUE,
nodeLab = NULL,
BPPARAM = SerialParam(),
...
)
a numeric matrix or a
TreeSummarizedExperiment
object containing a tree.
Please note that runUnifrac
expects a matrix with samples per row
and not per column. This is implemented to be compatible with other
distance calculations such as dist
as much as
possible.
if x
is a matrix, a
phylo
object matching the
matrix. This means that the phylo object and the columns should relate
to the same type of features (aka. microorganisms).
optional arguments not used.
TRUE
or FALSE
: Should use weighted-Unifrac
calculation? Weighted-Unifrac takes into account the relative abundance of
species/taxa shared between samples, whereas unweighted-Unifrac only
considers presence/absence. Default is FALSE
, meaning the
unweighted-Unifrac distance is calculated for all pairs of samples.
TRUE
or FALSE
: Should the output be
normalized such that values range from 0 to 1 independent of branch length
values? Default is TRUE
. Note that (unweighted) Unifrac
is
always normalized by total branch-length, and so this value is ignored when
weighted == FALSE
.
A
BiocParallelParam
object specifying whether the Unifrac calculation should be parallelized.
a single character
value for specifying which
assay to use for calculation.
a single character
value for specifying which
assay to use for calculation.
(Please use assay.type
instead. At some point assay_name
will be disabled.)
a single character
value for specifying which
assay to use for calculation.
(Please use assay.type
instead.)
a single character
value for specifying which
tree will be used in calculation.
(By default: tree_name = "phylo"
)
Logical scalar, is x transposed with cells in rows, i.e.,
is Unifrac distance calculated based on rows (FALSE) or columns (TRUE).
(By default: transposed = FALSE
)
if x
is a matrix,
a character
vector specifying links between rows/columns and tips of tree
.
The length must equal the number of rows/columns of x
. Furthermore, all the
node labs must be present in tree
.
a sample-by-sample distance matrix, suitable for NMDS, etc.
Please note that if calculateUnifrac
is used as a FUN
for
runMDS
, the argument ntop
has to be set to nrow(x)
.
http://bmf.colorado.edu/unifrac/
The main implementation (Fast Unifrac) is adapted from the algorithm's description in:
Hamady, Lozupone, and Knight, ``Fast UniFrac: facilitating high-throughput phylogenetic analyses of microbial communities including analysis of pyrosequencing and PhyloChip data.'' The ISME Journal (2010) 4, 17--27.
See also additional descriptions of Unifrac in the following articles:
Lozupone, Hamady and Knight, ``Unifrac - An Online Tool for Comparing Microbial Community Diversity in a Phylogenetic Context.'', BMC Bioinformatics 2006, 7:371
Lozupone, Hamady, Kelley and Knight, ``Quantitative and qualitative (beta) diversity measures lead to different insights into factors that structure microbial communities.'' Appl Environ Microbiol. 2007
Lozupone C, Knight R. ``Unifrac: a new phylogenetic method for comparing microbial communities.'' Appl Environ Microbiol. 2005 71 (12):8228-35.
data(esophagus)
library(scater)
calculateUnifrac(esophagus, weighted = FALSE)
#> B C
#> C 0.5175550
#> D 0.5182284 0.5422394
calculateUnifrac(esophagus, weighted = TRUE)
#> B C
#> C 0.2035424
#> D 0.2603371 0.2477016
calculateUnifrac(esophagus, weighted = TRUE, normalized = FALSE)
#> B C
#> C 0.1050480
#> D 0.1401124 0.1422409
# for using calculateUnifrac in conjunction with runMDS the tree argument
# has to be given separately. In addition, subsetting using ntop must
# be disabled
esophagus <- runMDS(esophagus, FUN = calculateUnifrac, name = "Unifrac",
tree = rowTree(esophagus),
exprs_values = "counts",
ntop = nrow(esophagus))
reducedDim(esophagus)
#> [,1] [,2]
#> B 0.1802389 -0.25452056
#> C -0.3288021 -0.01688879
#> D 0.1485632 0.27140935
#> attr(,"eig")
#> [1] 0.1626679 0.1387290 0.0000000
#> attr(,"GOF")
#> [1] 1 1