Estimate divergence against a given reference sample.

addDivergence(x, name = "divergence", ...)

# S4 method for class 'SummarizedExperiment'
addDivergence(x, name = "divergence", ...)

getDivergence(
  x,
  assay.type = assay_name,
  assay_name = "counts",
  reference = "median",
  method = "bray",
  ...
)

# S4 method for class 'SummarizedExperiment'
getDivergence(
  x,
  assay.type = assay_name,
  assay_name = "counts",
  reference = "median",
  method = "bray",
  ...
)

Arguments

x

a SummarizedExperiment object.

name

Character scalar. The name of the colData column in which the result is stored. (Default: "divergence")

...

optional arguments passed to addDissimilarity. Additionally:

  • dimred: Character scalar. Specifies the name of dimension reduction result from reducedDim(x). If used, these values are used to calculate divergence instead of the assay. Can be disabled with NULL. (Default: NULL)

assay.type

Character scalar. Specifies which assay to use for calculation. (Default: "counts")

assay_name

Deprecated. Use assay.type instead.

reference

Character scalar. Either a column name from colData(x), or one of "mean" or "median". (Default: "median")

method

Character scalar. Specifies which dissimilarity to calculate. (Default: "bray")

Value

x with an additional colData column named name

Details

Microbiota divergence (heterogeneity / spread) within a given sample set can be quantified by the average sample dissimilarity or beta diversity with respect to a given reference sample.
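As a rough illustration of the idea, the base R sketch below computes the Bray-Curtis dissimilarity of each sample against a median reference profile. The toy matrix and the helper bray_to_reference() are hypothetical names made up for this illustration; this is not the package's implementation, and edge cases such as all-zero samples are ignored.

```r
# Toy abundance matrix: rows are taxa, columns are samples (made-up values).
counts <- matrix(
    c(10,  0,  5,
      20,  5,  5,
       0, 15, 10,
       5,  5,  5),
    nrow = 4, byrow = TRUE,
    dimnames = list(paste0("taxon", 1:4), paste0("sample", 1:3))
)

# Reference profile: per-taxon median across all samples.
reference <- apply(counts, 1, median)

# Hypothetical helper: Bray-Curtis dissimilarity between one abundance
# vector and the reference profile.
bray_to_reference <- function(abund, ref) {
    sum(abs(abund - ref)) / sum(abund + ref)
}

# One divergence value per sample (column).
divergence <- apply(counts, 2, bray_to_reference, ref = reference)
divergence
```

In this toy data the third sample coincides with the median profile, so its divergence is 0; samples further from the reference get larger values.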

The calculation makes use of the function getDissimilarity(). The divergence measure is sensitive to sample size. Subsampling or bootstrapping can be applied to equalize sample sizes between comparisons.
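One way to equalize sample sizes is sketched below in base R, assuming that "sample size" means the number of samples in each group being compared. Each group is repeatedly subsampled down to the size of the smallest group, and the mean divergence is averaged over the draws. The simulated data, the grouping, and the helper mean_divergence() are all hypothetical and not part of the package.

```r
set.seed(1)  # make the random draws reproducible

# Made-up data: 5 taxa; group A has 12 samples, group B has 6.
counts <- matrix(
    rpois(5 * 18, lambda = 20), nrow = 5,
    dimnames = list(paste0("taxon", 1:5), paste0("s", 1:18))
)
group <- rep(c("A", "B"), times = c(12, 6))

# Hypothetical helper: mean Bray-Curtis dissimilarity of the samples in
# `mat` to their own per-taxon median profile.
mean_divergence <- function(mat) {
    ref <- apply(mat, 1, median)
    mean(apply(mat, 2, function(v) sum(abs(v - ref)) / sum(v + ref)))
}

# Subsample every group down to the size of the smallest group.
n_min <- min(table(group))
n_iter <- 100

subsampled <- sapply(split(seq_along(group), group), function(idx) {
    mean(replicate(n_iter, {
        keep <- sample(idx, n_min)  # draw n_min samples without replacement
        mean_divergence(counts[, keep, drop = FALSE])
    }))
})
subsampled
```

Bootstrapping would follow the same pattern with sample(idx, n_min, replace = TRUE).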

Examples

library(mia)
data(GlobalPatterns)
tse <- GlobalPatterns

# By default, the reference is the median of all samples. The colData column
# that stores the results is named "divergence" by default, but the name can
# be specified.
tse <- addDivergence(tse)

# The dissimilarity method and the reference can both be specified. Here,
# Euclidean distance is used and the reference is the first sample. It is
# recommended to add the reference to colData.
tse[["reference"]] <- rep(colnames(tse)[[1]], ncol(tse))
tse <- addDivergence(
    tse, name = "divergence_first_sample", 
    reference = "reference",
    method = "euclidean")

# Here we compare samples to the global mean
tse <- addDivergence(tse, name = "divergence_average", reference = "mean")
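
# A sketch only, with no verified output: getDivergence() takes the same
# reference and method arguments. Assuming it returns the computed values
# rather than storing them in colData, the result can be inspected directly.
div <- getDivergence(tse, reference = "mean", method = "bray")
str(div)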

# All three divergence results are stored in colData.
colData(tse)
#> DataFrame with 26 rows and 11 columns
#>         X.SampleID   Primer Final_Barcode Barcode_truncated_plus_T
#>           <factor> <factor>      <factor>                 <factor>
#> CL3        CL3      ILBC_01        AACGCA                   TGCGTT
#> CC1        CC1      ILBC_02        AACTCG                   CGAGTT
#> SV1        SV1      ILBC_03        AACTGT                   ACAGTT
#> M31Fcsw    M31Fcsw  ILBC_04        AAGAGA                   TCTCTT
#> M11Fcsw    M11Fcsw  ILBC_05        AAGCTG                   CAGCTT
#> ...            ...      ...           ...                      ...
#> TS28         TS28   ILBC_25        ACCAGA                   TCTGGT
#> TS29         TS29   ILBC_26        ACCAGC                   GCTGGT
#> Even1        Even1  ILBC_27        ACCGCA                   TGCGGT
#> Even2        Even2  ILBC_28        ACCTCG                   CGAGGT
#> Even3        Even3  ILBC_29        ACCTGT                   ACAGGT
#>         Barcode_full_length SampleType
#>                    <factor>   <factor>
#> CL3             CTAGCGTGCGT      Soil 
#> CC1             CATCGACGAGT      Soil 
#> SV1             GTACGCACAGT      Soil 
#> M31Fcsw         TCGACATCTCT      Feces
#> M11Fcsw         CGACTGCAGCT      Feces
#> ...                     ...        ...
#> TS28            GCATCGTCTGG      Feces
#> TS29            CTAGTCGCTGG      Feces
#> Even1           TGACTCTGCGG      Mock 
#> Even2           TCTGATCGAGG      Mock 
#> Even3           AGAGAGACAGG      Mock 
#>                                        Description divergence   reference
#>                                           <factor>  <numeric> <character>
#> CL3     Calhoun South Carolina Pine soil, pH 4.9     0.989114         CL3
#> CC1     Cedar Creek Minnesota, grassland, pH 6.1     0.991217         CL3
#> SV1     Sevilleta new Mexico, desert scrub, pH 8.3   0.986994         CL3
#> M31Fcsw M3, Day 1, fecal swab, whole body study      0.995435         CL3
#> M11Fcsw M1, Day 1, fecal swab, whole body study      0.996395         CL3
#> ...                                            ...        ...         ...
#> TS28                                       Twin #1   0.991388         CL3
#> TS29                                       Twin #2   0.992698         CL3
#> Even1                                      Even1     0.990063         CL3
#> Even2                                      Even2     0.989827         CL3
#> Even3                                      Even3     0.991461         CL3
#>         divergence_first_sample divergence_average
#>                       <numeric>          <numeric>
#> CL3                         0.0           0.879196
#> CC1                     83210.0           0.875744
#> SV1                     73809.5           0.915286
#> M31Fcsw                419594.0           0.842727
#> M11Fcsw                626574.7           0.870541
#> ...                         ...                ...
#> TS28                     185596           0.813599
#> TS29                     352153           0.863493
#> Even1                    225268           0.809229
#> Even2                    194434           0.808371
#> Even3                    204304           0.814546