Calculate correlations between features of two experiments.

getExperimentCrossAssociation(x, ...)

# S4 method for MultiAssayExperiment
getExperimentCrossAssociation(
  x,
  experiment1 = 1,
  experiment2 = 2,
  assay.type1 = assay_name1,
  assay_name1 = "counts",
  assay.type2 = assay_name2,
  assay_name2 = "counts",
  altexp1 = NULL,
  altexp2 = NULL,
  colData_variable1 = NULL,
  colData_variable2 = NULL,
  MARGIN = 1,
  method = c("kendall", "spearman", "categorical", "pearson"),
  mode = "table",
  p_adj_method = c("fdr", "BH", "bonferroni", "BY", "hochberg", "holm", "hommel", "none"),
  p_adj_threshold = NULL,
  cor_threshold = NULL,
  sort = FALSE,
  filter_self_correlations = FALSE,
  verbose = TRUE,
  test_significance = FALSE,
  show_warnings = TRUE,
  paired = FALSE,
  ...
)

# S4 method for SummarizedExperiment
getExperimentCrossAssociation(x, experiment2 = x, ...)

testExperimentCrossAssociation(x, ...)

# S4 method for ANY
testExperimentCrossAssociation(x, ...)

testExperimentCrossCorrelation(x, ...)

# S4 method for ANY
testExperimentCrossCorrelation(x, ...)

getExperimentCrossCorrelation(x, ...)

# S4 method for ANY
getExperimentCrossCorrelation(x, ...)

Arguments

x

A MultiAssayExperiment or SummarizedExperiment object.

...

Additional arguments:

  • symmetric A single boolean value for specifying if measure is symmetric or not. When symmetric = TRUE, associations are calculated only for unique variable-pairs, and they are assigned to corresponding variable-pair. This decreases the number of calculations in 2-fold meaning faster execution. (By default: symmetric = FALSE)

  • association_FUN A function that is used to calculate (dis-)similarity between features. Function must take matrix as an input and give numeric values as an output. Adjust method and other parameters correspondingly. Supported functions are, for example, stats::dist and vegan::vegdist.

experiment1

A single character or numeric value for selecting the experiment 1 from experiments(x) of MultiassayExperiment object. (By default: experiment1 = 1)

experiment2

A single character or numeric value for selecting the experiment 2 fromexperiments(x) of MultiAssayExperiment object or altExp(x) of TreeSummarizedExperiment object. Alternatively, experiment2 can also be TreeSE object when x is TreeSE object. (By default: experiment2 = 2 when x is MAE and experiment2 = x when x is TreeSE)

assay.type1

A single character value for selecting the assay of experiment 1 to be transformed. (By default: assay.type1 = "counts")

assay_name1

a single character value for specifying which assay of experiment 1 to use for calculation. (Please use assay.type1 instead. At some point assay_name1 will be disabled.)

assay.type2

A single character value for selecting the assay of experiment 2 to be transformed. (By default: assay.type2 = "counts")

assay_name2

a single character value for specifying which assay of experiment 2 to use for calculation. (Please use assay.type2 instead. At some point assay_name2 will be disabled.)

altexp1

A single numeric or character value specifying alternative experiment from the altExp of experiment 1. If NULL, then the experiment is itself and altExp option is disabled. (By default: altexp1 = NULL)

altexp2

A single numeric or character value specifying alternative experiment from the altExp of experiment 2. If NULL, then the experiment is itself and altExp option is disabled. (By default: altexp2 = NULL)

colData_variable1

A character value specifying column(s) from colData of experiment 1. If colData_variable1 is used, assay.type1 is disabled. (By default: colData_variable1 = NULL)

colData_variable2

A character value specifying column(s) from colData of experiment 2. If colData_variable2 is used, assay.type2 is disabled. (By default: colData_variable2 = NULL)

MARGIN

A single numeric value for selecting if association are calculated row-wise / for features (1) or column-wise / for samples (2). Must be 1 or 2. (By default: MARGIN = 1)

method

A single character value for selecting association method ('kendall', pearson', or 'spearman' for continuous/numeric; 'categorical' for discrete) (By default: method = "kendall")

mode

A single character value for selecting output format Available formats are 'table' and 'matrix'. (By default: mode = "table")

p_adj_method

A single character value for selecting adjustment method of p-values. Passed to p.adjust function. (By default: p_adj_method = "fdr")

p_adj_threshold

A single numeric value (from 0 to 1) for selecting adjusted p-value threshold for filtering. (By default: p_adj_threshold = NULL)

cor_threshold

A single numeric absolute value (from 0 to 1) for selecting correlation threshold for filtering. (By default: cor_threshold = NULL)

sort

A single boolean value for selecting whether to sort features or not in result matrices. Used method is hierarchical clustering. (By default: sort = FALSE)

filter_self_correlations

A single boolean value for selecting whether to filter out correlations between identical items. Applies only when correlation between experiment itself is tested, i.e., when assays are identical. (By default: filter_self_correlations = FALSE)

verbose

A single boolean value for selecting whether to get messages about progress of calculation.

test_significance

A single boolean value for selecting whether to test statistical significance of associations.

show_warnings

A single boolean value for selecting whether to show warnings that might occur when correlations and p-values are calculated.

paired

A single boolean value for specifying if samples are paired or not. colnames must match between twp experiments. paired is disabled when MARGIN = 1. (By default: paired = FALSE)

Value

These functions return associations in table or matrix format. In table format, returned value is a data frame that includes features and associations (and p-values) in columns. In matrix format, returned value is a one matrix when only associations are calculated. If also significances are tested, then returned value is a list of matrices.

Details

These functions calculates associations between features of two experiments. getExperimentCrossAssociation calculates only associations by default. testExperimentCrossAssociation calculates also significance of associations.

We recommend the non-parametric Kendall's tau as the default method for association analysis. Kendall's tau has desirable statistical properties and robustness at lower sample sizes. Spearman rank correlation can provide faster solutions when running times are critical.

Author

Leo Lahti and Tuomas Borman. Contact: microbiome.github.io

Examples

data(HintikkaXOData)
mae <- HintikkaXOData

# Subset so that less observations / quicker to run, just for example
mae[[1]] <- mae[[1]][1:20, 1:10]
#> harmonizing input:
#>   removing 30 sampleMap rows with 'colname' not in colnames of experiments
mae[[2]] <- mae[[2]][1:20, 1:10]
#> harmonizing input:
#>   removing 30 sampleMap rows with 'colname' not in colnames of experiments
# Several rows in the counts assay have a standard deviation of zero
# Remove them, since they do not add useful information about cross-association
mae[[1]] <- mae[[1]][rowSds(assay(mae[[1]])) > 0, ]
# Transform data
mae[[1]] <- transformAssay(mae[[1]], method = "rclr")

# Calculate cross-correlations
result <- getExperimentCrossAssociation(mae, method = "pearson", assay.type2 = "nmr")
#> Calculating correlations...
#> altexp1: -, altexp2: -, assay.type1: counts, colData_variable1: -, assay.type2: nmr, colData_variable2: -
#> MARGIN: 1, function: stats::cor, method: pearson, test_significance: FALSE, p_adj_method: -, paired: FALSE, show_warnings: TRUE
#> Warning: the standard deviation is zero
# Show first 5 entries
head(result, 5)
#>                    Var1     Var2         cor
#> 1  CVJT01000011.50.2173 Butyrate -0.15394918
#> 2 AYSG01000002.292.2076 Butyrate -0.17674170
#> 3 CCPS01000022.154.1916 Butyrate  0.43730668
#> 4       EU622687.1.1587 Butyrate  0.06046832
#> 5       KY646024.1.1591 Butyrate  0.96598301

# Use altExp option to specify alternative experiment from the experiment
altExp(mae[[1]], "Phylum") <- agglomerateByRank(mae[[1]], rank = "Phylum")
#> Warning: 'rclr' includes negative values.
#> Agglomeration of it might lead to meaningless values.
#> Check the assay, and consider doing transformation again manually with agglomerated data.
# Transform data
altExp(mae[[1]], "Phylum") <- transformAssay(altExp(mae[[1]], "Phylum"), method = "relabundance")
# When mode = "matrix", the return value is a matrix
result <- getExperimentCrossAssociation(mae, experiment2 = 2, 
                                        assay.type1 = "relabundance", assay.type2 = "nmr",
                                        altexp1 = "Phylum", 
                                        method = "pearson", mode = "matrix")
#> Calculating correlations...
#> altexp1: Phylum, altexp2: -, assay.type1: relabundance, colData_variable1: -, assay.type2: nmr, colData_variable2: -
#> MARGIN: 1, function: stats::cor, method: pearson, test_significance: FALSE, p_adj_method: -, paired: FALSE, show_warnings: TRUE
#> Converting table into matrices...
# Show first 5 entries
head(result, 5)
#>                       Butyrate    Acetate Propionate   Valerate Isovalerate
#> D_1__Firmicutes     -0.2126789  0.6345497  0.7073503  0.3731159 -0.01672613
#> D_1__Proteobacteria  0.2126789 -0.6345497 -0.7073503 -0.3731159  0.01672613
#>                     Isobutyrate    Formate    Glucose  Glycerol  Threonine
#> D_1__Firmicutes       0.1945935 -0.0818866 -0.0253739 -0.695665  0.2707132
#> D_1__Proteobacteria  -0.1945935  0.0818866  0.0253739  0.695665 -0.2707132
#>                        Leucine Methionine      Valine    Proline    Lactate
#> D_1__Firmicutes     -0.6423713 -0.5407879 -0.08533952 -0.2741442 -0.5819794
#> D_1__Proteobacteria  0.6423713  0.5407879  0.08533952  0.2741442  0.5819794
#>                       Glutamate    Ethanol   Tryptophan   Aspartate     Alanine
#> D_1__Firmicutes     -0.03341937 -0.3118914 -0.005711173  0.09035227 -0.01673503
#> D_1__Proteobacteria  0.03341937  0.3118914  0.005711173 -0.09035227  0.01673503

# testExperimentCorrelation additionally returns significances
# filter_self_correlations = TRUE filters self correlations
# p_adj_threshold can be used to filter those features that do not
# have any correlations whose p-value is lower than the threshold
result <- testExperimentCrossAssociation(mae[[1]], experiment2 = mae[[1]], method = "pearson",
                                         filter_self_correlations = TRUE,
                                         p_adj_threshold = 0.05)
#> Calculating correlations...
#> altexp1: -, altexp2: -, assay.type1: counts, colData_variable1: -, assay.type2: counts, colData_variable2: -
#> MARGIN: 1, function: stats::cor.test, method: pearson, test_significance: TRUE, p_adj_method: fdr, paired: FALSE, show_warnings: TRUE
#> Filtering results...
#> p_adj_threshold: 0.05, cor_threshold: -, filter_self_correlations: TRUE
# Show first 5 entries
head(result, 5)
#>                      Var1                   Var2       cor        pval
#> 27 LG042641.312212.313771  CCPS01000022.154.1916 0.8005191 0.005403200
#> 48        AB559692.1.1569        KY646024.1.1591 1.0000000 0.000000000
#> 63  CCPS01000022.154.1916 LG042641.312212.313771 0.8005191 0.005403200
#> 69        AY349382.1.1568 LG042641.312212.313771 0.8259905 0.003232798
#> 75        KY646024.1.1591        AB559692.1.1569 1.0000000 0.000000000
#>         p_adj
#> 27 0.03377000
#> 48 0.00000000
#> 63 0.03377000
#> 69 0.02309142
#> 75 0.00000000

# getExperimentCrossAssociation also returns significances when 
# test_significance = TRUE
# Warnings can be suppressed by using show_warnings = FALSE
result <- getExperimentCrossAssociation(mae[[1]], experiment2 = mae[[2]], method = "pearson",
                                        assay.type2 = "nmr",
                                        mode = "matrix", test_significance = TRUE,
                                        show_warnings = FALSE)
#> Calculating correlations...
#> altexp1: -, altexp2: -, assay.type1: counts, colData_variable1: -, assay.type2: nmr, colData_variable2: -
#> MARGIN: 1, function: stats::cor.test, method: pearson, test_significance: TRUE, p_adj_method: fdr, paired: FALSE, show_warnings: FALSE
#> Converting table into matrices...
                                        
# Returned value is a list of matrices
names(result)
#> [1] "cor"   "pval"  "p_adj"

# Calculate Bray-Curtis dissimilarity between samples. If dataset includes
# paired samples, you can use paired = TRUE.
result <- getExperimentCrossAssociation(mae[[1]], mae[[1]], MARGIN = 2, paired = FALSE,
                                        association_FUN = vegan::vegdist, method = "bray")
#> Calculating correlations...
#> altexp1: -, altexp2: -, assay.type1: counts, colData_variable1: -, assay.type2: counts, colData_variable2: -
#> MARGIN: 2, function: vegan::vegdist, method: bray, test_significance: FALSE, p_adj_method: -, paired: FALSE, show_warnings: TRUE
                                        

# If experiments are equal and measure is symmetric (e.g., taxa1 vs taxa2 == taxa2 vs taxa1),
# it is possible to speed-up calculations by calculating association only for unique
# variable-pairs. Use "symmetric" to choose whether to measure association for only
# other half of of variable-pairs.
result <- getExperimentCrossAssociation(mae, experiment1 = "microbiota", experiment2 = "microbiota", 
                                        assay.type1 = "counts", assay.type2 = "counts",
                                        symmetric = TRUE)
#> Calculating correlations...
#> altexp1: -, altexp2: -, assay.type1: counts, colData_variable1: -, assay.type2: counts, colData_variable2: -
#> MARGIN: 1, function: stats::cor, method: kendall, test_significance: FALSE, p_adj_method: -, paired: FALSE, show_warnings: TRUE

# For big data sets, the calculations might take a long time.
# To speed them up, you can take a random sample from the data. 
# When dealing with complex biological problems, random samples can be 
# enough to describe the data. Here, our random sample is 30 % of whole data.
sample_size <- 0.3
tse <- mae[[1]]
tse_sub <- tse[ sample( seq_len( nrow(tse) ), sample_size * nrow(tse) ), ]
result <- testExperimentCrossAssociation(tse_sub)
#> Calculating correlations...
#> altexp1: -, altexp2: -, assay.type1: counts, colData_variable1: -, assay.type2: counts, colData_variable2: -
#> MARGIN: 1, function: stats::cor.test, method: kendall, test_significance: TRUE, p_adj_method: fdr, paired: FALSE, show_warnings: TRUE
#> Warning: Cannot compute exact p-value with ties
#> Warning: Cannot compute exact p-value with ties
#> Warning: Cannot compute exact p-value with ties
#> Warning: Cannot compute exact p-value with ties
#> Warning: Cannot compute exact p-value with ties
#> Warning: Cannot compute exact p-value with ties
#> Warning: Cannot compute exact p-value with ties
#> Warning: Cannot compute exact p-value with ties

# It is also possible to choose variables from colData and calculate association
# between assay and sample metadata or between variables of sample metadata
mae[[1]] <- estimateDiversity(mae[[1]])
#> Warning: Faith diversity has been excluded from the results since it cannot be calculated without rowTree. This requires a rowTree in the input argument x. Make sure that 'rowTree(x)' is not empty, or make sure to specify 'tree_name' in the input arguments. Warning is also provided if the tree does not have any branches. You can consider adding rowTree to include this index.
# colData_variable works similarly to assay.type. Instead of fetching an assay
# named assay.type from assay slot, it fetches a column named colData_variable
# from colData.
result <- getExperimentCrossAssociation(mae[[1]], assay.type1 = "counts", 
                                        colData_variable2 = c("shannon", "coverage"))
#> Calculating correlations...
#> altexp1: -, altexp2: -, assay.type1: counts, colData_variable1: -, assay.type2: -, colData_variable2: shannon + coverage
#> MARGIN: 1, function: stats::cor, method: kendall, test_significance: FALSE, p_adj_method: -, paired: FALSE, show_warnings: TRUE