R/getCrossAssociation.R
getCrossAssociation.Rd
Calculate correlations between features of two experiments.
getCrossAssociation(x, ...)
# S4 method for class 'MultiAssayExperiment'
getCrossAssociation(
x,
experiment1 = 1,
experiment2 = 2,
assay.type1 = assay_name1,
assay_name1 = "counts",
assay.type2 = assay_name2,
assay_name2 = "counts",
altexp1 = NULL,
altexp2 = NULL,
col.var1 = colData_variable1,
colData_variable1 = NULL,
col.var2 = colData_variable2,
colData_variable2 = NULL,
by = MARGIN,
MARGIN = 1,
method = c("kendall", "spearman", "categorical", "pearson"),
mode = "table",
p.adj.method = p_adj_method,
p_adj_method = c("fdr", "BH", "bonferroni", "BY", "hochberg", "holm", "hommel", "none"),
p.adj.threshold = p_adj_threshold,
p_adj_threshold = NULL,
cor.threshold = cor_threshold,
cor_threshold = NULL,
sort = FALSE,
filter.self.cor = filter_self_correlations,
filter_self_correlations = FALSE,
verbose = TRUE,
test.signif = test_significance,
test_significance = FALSE,
show.warnings = show_warnings,
show_warnings = TRUE,
paired = FALSE,
...
)
# S4 method for class 'SummarizedExperiment'
getCrossAssociation(x, experiment2 = x, ...)
A
MultiAssayExperiment
or
SummarizedExperiment
object.
Additional arguments:
symmetric
: Logical scalar
. Specifies if
measure is symmetric or not. When symmetric = TRUE
, associations
are calculated only for unique variable-pairs, and they are assigned to
corresponding variable-pair. This decreases the number of calculations in
2-fold meaning faster execution. (By default: FALSE
)
association.fun
: A function that is used to calculate
(dis-)similarity between features. Function must take matrix as an input
and give numeric values as an output. Adjust method
and other
parameters correspondingly. Supported functions are, for example,
stats::dist
and vegan::vegdist
.
Character scalar
or numeric scalar
.
Selects the experiment 1 from experiments(x)
of
MultiassayExperiment
object. (Default: 1
)
Character scalar
or numeric scalar
.
Selects the experiment 2 fromexperiments(x)
of
MultiAssayExperiment
object or
altExp(x)
of TreeSummarizedExperiment
object. Alternatively,
experiment2
can also be TreeSE
object when x
is
TreeSE
object. (Default: 2
when x
is MAE
and
x
when x
is TreeSE
)
Character scalar
. Specifies the name of the assay
in experiment 1 to be transformed.. (Default: "counts"
)
Deprecated. Use assay.type1
instead.
Character scalar
. Specifies the name of the
assay in experiment 2 to be transformed.. (Default: "counts"
)
Deprecated. Use assay.type2
instead.
Character scalar
or numeric scalar
. Specifies
alternative experiment from the altExp of experiment 1. If NULL, then the
experiment is itself and altExp option is disabled. (Default: NULL
)
Character scalar
or numeric scalar
. Specifies
alternative experiment from the altExp of experiment 2. If NULL, then the
experiment is itself and altExp option is disabled. (Default: NULL
)
Character scalar
. Specifies column(s) from
colData
of experiment 1. If col.var1 is used, assay.type1 is disabled.
(Default: NULL
)
Deprecated. Use col.var1
instead.
Character scalar
. Specifies column(s) from colData
of experiment 2. If col.var2 is used, assay.type2 is disabled.
(Default: NULL
)
Deprecated. Use col.var2
instead.
ACharacter scalar
. Determines if association are calculated
row-wise / for features ('rows') or column-wise / for samples ('cols').
Must be 'rows'
or 'cols'
.
Deprecated. Use by
instead.
Character scalar
. Defines the association method
('kendall', pearson', or 'spearman' for continuous/numeric; 'categorical'
for discrete) (Default: "kendall"
)
Character scalar
. Specifies the output format
Available formats are 'table' and 'matrix'. (Default: "table"
)
Character scalar
. Specifies adjustment method of
p-values. Passed to p.adjust
function.
(Default: "fdr"
)
Deprecated. Use p.adj.method
instead.
Numeric scalar
. From 0 to 1
, specifies
adjusted p-value threshold for filtering.
(Default: NULL
)
Deprecated. Use p.dj.threshold
instead.
Numeric scalar
. From 0 to 1
, specifies
correlation threshold for filtering.
(Default: NULL
)
Deprecated. Use cor.threshold
instead.
Logical scalar
. Specifies whether to sort features or not
in result matrices. Used method is hierarchical clustering.
(Default: FALSE
)
Logical scalar
. Specifies whether to
filter out correlations between identical items. Applies only when
correlation between experiment itself is tested, i.e., when assays are
identical. (Default: FALSE
)
Deprecated. Use filter.self.cor
instead.
Logical scalar
. Specifies whether to get messages
about progress of calculation. (Default: FALSE
)
Logical scalar
. Specifies whether to test
statistical significance of associations.
(Default: FALSE
)
Deprecated. Use test.signif
instead.
Logical scalar
. specifies whether to show
warnings that might occur when correlations and p-values are calculated.
(Default: FALSE
)
Deprecated. use show.warnings
instead.
Logical scalar
. Specifies if samples are paired or not.
colnames
must match between twp experiments. paired
is disabled
when by = 1
. (Default: FALSE
)
This function returns associations in table or matrix format. In table format, returned value is a data frame that includes features and associations (and p-values) in columns. In matrix format, returned value is a one matrix when only associations are calculated. If also significances are tested, then returned value is a list of matrices.
The function getCrossAssociation
calculates associations between
features of two experiments. By default, it not only computes associations
but also tests their significance. If desired, setting
test.signif
to FALSE disables significance calculation.
We recommend the non-parametric Kendall's tau as the default method for association analysis. Kendall's tau has desirable statistical properties and robustness at lower sample sizes. Spearman rank correlation can provide faster solutions when running times are critical.
data(HintikkaXOData)
mae <- HintikkaXOData
# Subset so that less observations / quicker to run, just for example
mae[[1]] <- mae[[1]][1:20, 1:10]
#> harmonizing input:
#> removing 30 sampleMap rows with 'colname' not in colnames of experiments
mae[[2]] <- mae[[2]][1:20, 1:10]
#> harmonizing input:
#> removing 30 sampleMap rows with 'colname' not in colnames of experiments
# Several rows in the counts assay have a standard deviation of zero
# Remove them, since they do not add useful information about
# cross-association
mae[[1]] <- mae[[1]][rowSds(assay(mae[[1]])) > 0, ]
# Transform data
mae[[1]] <- transformAssay(mae[[1]], method = "rclr")
# Calculate cross-correlations
result <- getCrossAssociation(mae, method = "pearson", assay.type2 = "nmr")
#> Calculating correlations...
#> altexp1: -, altexp2: -, assay.type1: counts, col.var1: -, assay.type2: nmr, col.var2: -
#> by: 1, function: stats::cor, method: pearson, test.signif: FALSE, p.adj.method: -, paired: FALSE, show.warnings: TRUE
#> Warning: the standard deviation is zero
# Show first 5 entries
head(result, 5)
#> Var1 Var2 cor
#> 1 CVJT01000011.50.2173 Butyrate -0.15394918
#> 2 AYSG01000002.292.2076 Butyrate -0.17674170
#> 3 CCPS01000022.154.1916 Butyrate 0.43730668
#> 4 EU622687.1.1587 Butyrate 0.06046832
#> 5 KY646024.1.1591 Butyrate 0.96598301
# Use altExp option to specify alternative experiment from the experiment
altExp(mae[[1]], "Phylum") <- agglomerateByRank(mae[[1]], rank = "Phylum")
#> Warning: 'rclr' includes negative values.
#> Agglomeration of it might lead to meaningless values.
#> Check the assay, and consider doing transformation againmanually with agglomerated data.
# Transform data
altExp(mae[[1]], "Phylum") <- transformAssay(
altExp(mae[[1]], "Phylum"), method = "relabundance")
# When mode = "matrix", the return value is a matrix
result <- getCrossAssociation(
mae, experiment2 = 2, assay.type1 = "relabundance", assay.type2 = "nmr",
altexp1 = "Phylum", method = "pearson", mode = "matrix")
#> Calculating correlations...
#> altexp1: Phylum, altexp2: -, assay.type1: relabundance, col.var1: -, assay.type2: nmr, col.var2: -
#> by: 1, function: stats::cor, method: pearson, test.signif: FALSE, p.adj.method: -, paired: FALSE, show.warnings: TRUE
#> Converting table into matrices...
# Show first 5 entries
head(result, 5)
#> Butyrate Acetate Propionate Valerate Isovalerate
#> D_1__Firmicutes -0.2126789 0.6345497 0.7073503 0.3731159 -0.01672613
#> D_1__Proteobacteria 0.2126789 -0.6345497 -0.7073503 -0.3731159 0.01672613
#> Isobutyrate Formate Glucose Glycerol Threonine
#> D_1__Firmicutes 0.1945935 -0.0818866 -0.0253739 -0.695665 0.2707132
#> D_1__Proteobacteria -0.1945935 0.0818866 0.0253739 0.695665 -0.2707132
#> Leucine Methionine Valine Proline Lactate
#> D_1__Firmicutes -0.6423713 -0.5407879 -0.08533952 -0.2741442 -0.5819794
#> D_1__Proteobacteria 0.6423713 0.5407879 0.08533952 0.2741442 0.5819794
#> Glutamate Ethanol Tryptophan Aspartate Alanine
#> D_1__Firmicutes -0.03341937 -0.3118914 -0.005711173 0.09035227 -0.01673503
#> D_1__Proteobacteria 0.03341937 0.3118914 0.005711173 -0.09035227 0.01673503
# If test.signif = TRUE, then getCrossAssociation additionally returns
# significances
# filter.self.cor = TRUE filters self correlations
# p.adj.threshold can be used to filter those features that do not
# have any correlations whose p-value is lower than the threshold
result <- getCrossAssociation(
mae[[1]], experiment2 = mae[[1]], method = "pearson",
filter.self.cor = TRUE, p.adj.threshold = 0.05, test.signif = TRUE)
#> Calculating correlations...
#> altexp1: -, altexp2: -, assay.type1: counts, col.var1: -, assay.type2: counts, col.var2: -
#> by: 1, function: stats::cor.test, method: pearson, test.signif: TRUE, p.adj.method: fdr, paired: FALSE, show.warnings: TRUE
#> Filtering results...
#> p_adj_threshold: 0.05, cor.threshold: -, filter.self.cor: TRUE
# Show first 5 entries
head(result, 5)
#> Var1 Var2 cor pval
#> 27 LG042641.312212.313771 CCPS01000022.154.1916 0.8005191 0.005403200
#> 48 AB559692.1.1569 KY646024.1.1591 1.0000000 0.000000000
#> 63 CCPS01000022.154.1916 LG042641.312212.313771 0.8005191 0.005403200
#> 69 AY349382.1.1568 LG042641.312212.313771 0.8259905 0.003232798
#> 75 KY646024.1.1591 AB559692.1.1569 1.0000000 0.000000000
#> p_adj
#> 27 0.03377000
#> 48 0.00000000
#> 63 0.03377000
#> 69 0.02309142
#> 75 0.00000000
# Returned value is a list of matrices
names(result)
#> [1] "Var1" "Var2" "cor" "pval" "p_adj"
# Calculate Bray-Curtis dissimilarity between samples. If dataset includes
# paired samples, you can use paired = TRUE.
result <- getCrossAssociation(
mae[[1]], mae[[1]], by = 2, paired = FALSE,
association.fun = getDissimilarity, method = "bray")
#> Calculating correlations...
#> altexp1: -, altexp2: -, assay.type1: counts, col.var1: -, assay.type2: counts, col.var2: -
#> by: 2, function: getDissimilarity, method: bray, test.signif: FALSE, p.adj.method: -, paired: FALSE, show.warnings: TRUE
# If experiments are equal and measure is symmetric
# (e.g., taxa1 vs taxa2 == taxa2 vs taxa1),
# it is possible to speed-up calculations by calculating association only
# for unique variable-pairs. Use "symmetric" to choose whether to measure
# association for only other half of of variable-pairs.
result <- getCrossAssociation(
mae, experiment1 = "microbiota", experiment2 = "microbiota",
assay.type1 = "counts", assay.type2 = "counts", symmetric = TRUE)
#> Calculating correlations...
#> altexp1: -, altexp2: -, assay.type1: counts, col.var1: -, assay.type2: counts, col.var2: -
#> by: 1, function: stats::cor, method: kendall, test.signif: FALSE, p.adj.method: -, paired: FALSE, show.warnings: TRUE
# For big data sets, the calculations might take a long time.
# To speed them up, you can take a random sample from the data.
# When dealing with complex biological problems, random samples can be
# enough to describe the data. Here, our random sample is 30 % of whole data.
sample_size <- 0.3
tse <- mae[[1]]
tse_sub <- tse[ sample( seq_len( nrow(tse) ), sample_size * nrow(tse) ), ]
result <- getCrossAssociation(tse_sub)
#> Calculating correlations...
#> altexp1: -, altexp2: -, assay.type1: counts, col.var1: -, assay.type2: counts, col.var2: -
#> by: 1, function: stats::cor, method: kendall, test.signif: FALSE, p.adj.method: -, paired: FALSE, show.warnings: TRUE
# It is also possible to choose variables from colData and calculate
# association between assay and sample metadata or between variables of
# sample metadata
mae[[1]] <- addAlpha(mae[[1]])
#> Warning: 'faith' index can be calculated only for TreeSE with rowTree(x) populated or with 'tree' provided separately.
# col.var works similarly to assay.type. Instead of fetching
# an assay named assay.type from assay slot, it fetches a column named
# col.var from colData.
result <- getCrossAssociation(
mae[[1]], assay.type1 = "counts",
col.var2 = c("shannon_diversity", "coverage_diversity"),
test.signif = TRUE)
#> Calculating correlations...
#> altexp1: -, altexp2: -, assay.type1: counts, col.var1: -, assay.type2: -, col.var2: shannon_diversity + coverage_diversity
#> by: 1, function: stats::cor.test, method: kendall, test.signif: TRUE, p.adj.method: fdr, paired: FALSE, show.warnings: TRUE
#> Warning: Cannot compute exact p-value with ties
#> Warning: Cannot compute exact p-value with ties
#> Warning: Cannot compute exact p-value with ties
#> Warning: Cannot compute exact p-value with ties
#> Warning: Cannot compute exact p-value with ties
#> Warning: Cannot compute exact p-value with ties
#> Warning: Cannot compute exact p-value with ties
#> Warning: Cannot compute exact p-value with ties
#> Warning: Cannot compute exact p-value with ties
#> Warning: Cannot compute exact p-value with ties
#> Warning: Cannot compute exact p-value with ties
#> Warning: Cannot compute exact p-value with ties
#> Warning: Cannot compute exact p-value with ties
#> Warning: Cannot compute exact p-value with ties
#> Warning: Cannot compute exact p-value with ties
#> Warning: Cannot compute exact p-value with ties
#> Warning: Cannot compute exact p-value with ties