Estimate divergence against a given reference sample.

addDivergence(
  x,
  assay.type = assay_name,
  assay_name = "counts",
  name = "divergence",
  reference = "median",
  FUN = vegan::vegdist,
  method = "bray",
  ...
)

# S4 method for SummarizedExperiment
addDivergence(
  x,
  assay.type = assay_name,
  assay_name = "counts",
  name = "divergence",
  reference = "median",
  FUN = vegan::vegdist,
  method = "bray",
  ...
)

Arguments

x

a SummarizedExperiment object.

assay.type

the name of the assay used for calculation of the sample-wise estimates.

assay_name

a single character value for specifying which assay to use for calculation. (Please use assay.type instead. At some point assay_name will be disabled.)

name

a name for the colData column in which the results are stored. By default, name is "divergence".

reference

a numeric vector whose length equals the number of features, or a non-empty character value, either "median" or "mean". reference specifies the reference that is used to calculate divergence. By default, reference is "median".

FUN

a function for distance calculation. The function must expect the input matrix as its first argument, with rows as samples and columns as features. By default, FUN is vegan::vegdist.

method

a method that is used to calculate the distance. method is passed to the function that is specified by FUN. By default, method is "bray".

...

optional arguments

Value

x with an additional colData column, named according to name

Details

Microbiota divergence (heterogeneity / spread) within a given sample set can be quantified by the average sample dissimilarity or beta diversity with respect to a given reference sample.
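As a rough illustration of the idea, the following base R sketch computes divergence by hand on a made-up count matrix: it builds a median reference profile and takes the Bray-Curtis dissimilarity of each sample to it. The toy matrix and the helper bray_to_reference are hypothetical, and this is not necessarily how addDivergence computes its result internally.

```r
# Toy count matrix (made-up data): rows are features, columns are samples,
# matching the layout of an assay in a SummarizedExperiment.
# matrix() fills column-wise, so each line of values below is one sample.
counts <- matrix(
  c(10, 0, 5,
    12, 1, 4,
     0, 9, 6,
    11, 0, 5),
  nrow = 3,
  dimnames = list(paste0("taxon", 1:3), paste0("sample", 1:4))
)

# Reference profile: feature-wise median across all samples.
reference <- apply(counts, 1, median)

# Hypothetical helper: Bray-Curtis dissimilarity between one sample
# profile and the reference profile.
bray_to_reference <- function(profile, reference) {
  sum(abs(profile - reference)) / sum(profile + reference)
}

# One divergence value per sample (column).
divergence <- apply(counts, 2, bray_to_reference, reference = reference)
divergence
```

In this toy data, sample3 has a composition unlike the others, so it ends up with the largest dissimilarity to the median profile.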

This measure is sensitive to sample size. Subsampling or bootstrapping can be applied to equalize sample sizes between comparisons.
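One possible way to equalize sample sizes is sketched below in base R, assuming two groups of made-up Poisson counts with unequal numbers of samples. The helper mean_divergence and the simulated data are hypothetical and only illustrate subsampling the larger group before comparing average divergence.

```r
set.seed(1)  # make the random subsample reproducible

# Made-up data: two groups with unequal numbers of samples (features in rows).
group_a <- matrix(rpois(5 * 12, lambda = 20), nrow = 5)  # 12 samples
group_b <- matrix(rpois(5 * 6, lambda = 20), nrow = 5)   # 6 samples

# Hypothetical helper: mean Bray-Curtis dissimilarity of the samples
# to their feature-wise median profile.
mean_divergence <- function(counts) {
  reference <- apply(counts, 1, median)
  mean(apply(counts, 2, function(profile) {
    sum(abs(profile - reference)) / sum(profile + reference)
  }))
}

# Subsample the larger group down to the size of the smaller one so that
# both estimates are based on the same number of samples.
n <- min(ncol(group_a), ncol(group_b))
group_a_sub <- group_a[, sample(ncol(group_a), n), drop = FALSE]

result <- c(a = mean_divergence(group_a_sub), b = mean_divergence(group_b))
result
```

Repeating the subsampling step many times and averaging the results would give a more stable estimate than a single draw.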

Author

Leo Lahti and Tuomas Borman. Contact: microbiome.github.io

Examples

library(mia)
data(GlobalPatterns)
tse <- GlobalPatterns

# By default, reference is the median of all samples. The name of the column where
# the results are stored is "divergence" by default, but it can be specified.
tse <- addDivergence(tse)

# The method used to calculate distance and the reference can be specified.
# Here, Euclidean distance and the dist function from the stats package are used.
# The reference is the first sample.
tse <- addDivergence(tse, name = "divergence_first_sample", 
                          reference = assays(tse)$counts[,1], 
                          FUN = stats::dist, method = "euclidean")

# Reference can also be median or mean of all samples. 
# By default, divergence is calculated by using median. Here, mean is used.
tse <- addDivergence(tse, name = "divergence_average", reference = "mean")

# All three divergence results are stored in colData.
colData(tse)
#> DataFrame with 26 rows and 10 columns
#>         X.SampleID   Primer Final_Barcode Barcode_truncated_plus_T
#>           <factor> <factor>      <factor>                 <factor>
#> CL3        CL3      ILBC_01        AACGCA                   TGCGTT
#> CC1        CC1      ILBC_02        AACTCG                   CGAGTT
#> SV1        SV1      ILBC_03        AACTGT                   ACAGTT
#> M31Fcsw    M31Fcsw  ILBC_04        AAGAGA                   TCTCTT
#> M11Fcsw    M11Fcsw  ILBC_05        AAGCTG                   CAGCTT
#> ...            ...      ...           ...                      ...
#> TS28         TS28   ILBC_25        ACCAGA                   TCTGGT
#> TS29         TS29   ILBC_26        ACCAGC                   GCTGGT
#> Even1        Even1  ILBC_27        ACCGCA                   TGCGGT
#> Even2        Even2  ILBC_28        ACCTCG                   CGAGGT
#> Even3        Even3  ILBC_29        ACCTGT                   ACAGGT
#>         Barcode_full_length SampleType
#>                    <factor>   <factor>
#> CL3             CTAGCGTGCGT      Soil 
#> CC1             CATCGACGAGT      Soil 
#> SV1             GTACGCACAGT      Soil 
#> M31Fcsw         TCGACATCTCT      Feces
#> M11Fcsw         CGACTGCAGCT      Feces
#> ...                     ...        ...
#> TS28            GCATCGTCTGG      Feces
#> TS29            CTAGTCGCTGG      Feces
#> Even1           TGACTCTGCGG      Mock 
#> Even2           TCTGATCGAGG      Mock 
#> Even3           AGAGAGACAGG      Mock 
#>                                        Description divergence
#>                                           <factor>  <numeric>
#> CL3     Calhoun South Carolina Pine soil, pH 4.9     0.989114
#> CC1     Cedar Creek Minnesota, grassland, pH 6.1     0.991217
#> SV1     Sevilleta new Mexico, desert scrub, pH 8.3   0.986994
#> M31Fcsw M3, Day 1, fecal swab, whole body study      0.995435
#> M11Fcsw M1, Day 1, fecal swab, whole body study      0.996395
#> ...                                            ...        ...
#> TS28                                       Twin #1   0.991388
#> TS29                                       Twin #2   0.992698
#> Even1                                      Even1     0.990063
#> Even2                                      Even2     0.989827
#> Even3                                      Even3     0.991461
#>         divergence_first_sample divergence_average
#>                       <numeric>          <numeric>
#> CL3                         0.0           0.879196
#> CC1                     83210.0           0.875744
#> SV1                     73809.5           0.915286
#> M31Fcsw                419594.0           0.842727
#> M11Fcsw                626574.7           0.870541
#> ...                         ...                ...
#> TS28                     185596           0.813599
#> TS29                     352153           0.863493
#> Even1                    225268           0.809229
#> Even2                    194434           0.808371
#> Even3                    204304           0.814546