Several functions for calculating community richness indices are available via
wrapper functions. They are implemented via the vegan package.
estimateRichness(
  x,
  assay.type = assay_name,
  assay_name = "counts",
  index = c("ace", "chao1", "hill", "observed"),
  name = index,
  detection = 0,
  ...,
  BPPARAM = SerialParam()
)
# S4 method for SummarizedExperiment
estimateRichness(
  x,
  assay.type = assay_name,
  assay_name = "counts",
  index = c("ace", "chao1", "hill", "observed"),
  name = index,
  detection = 0,
  ...,
  BPPARAM = SerialParam()
)
x: a SummarizedExperiment object.
assay.type: the name of the assay used for calculation of the sample-wise estimates.
assay_name: a single character value for specifying which assay to use for calculation. (Please use assay.type instead. At some point assay_name will be disabled.)
index: a character vector, specifying the richness measures to be calculated.
name: a name for the column(s) of the colData the results should be stored in.
detection: a numeric value for selecting the detection threshold for the abundances. The default detection threshold is 0.
...: additional parameters passed to estimateRichness
BPPARAM: A BiocParallelParam object specifying whether calculation of estimates should be parallelized.
x with additional colData named *name*
These include the ‘ace’, ‘Chao1’, ‘Hill’, and ‘Observed’ richness measures. See details for more information and references.
The richness is calculated per sample. This is a standard index in community ecology, and it provides an estimate of the number of unique species in the community. This is often not directly observed for the whole community but only for a limited sample from the community. This has led to alternative richness indices that provide different ways to estimate the species richness.
The richness index differs from the concepts of species diversity and evenness in that it ignores species abundance and focuses on the binary presence/absence values that simply indicate whether a species was detected.
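The presence/absence idea can be illustrated with a minimal base R sketch. The matrix `counts` and its values are made up for illustration and are not part of the package; the sketch only shows that two samples with very different abundance profiles have the same richness when the same species are detected.

```r
# Hypothetical toy count matrix: rows are species, columns are samples
counts <- matrix(c(50, 1, 1, 0,
                   13, 13, 13, 0),
                 nrow = 4,
                 dimnames = list(paste0("sp", 1:4), c("uneven", "even")))

# Presence/absence: TRUE wherever a species was detected at all
detected <- counts > 0

# Richness counts detected species per sample and ignores their abundances
richness <- colSums(detected)
richness
```

Both samples contain three detected species, so their richness is identical even though one sample is dominated by a single species.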
The function takes all index names in full lowercase. The user can provide
the desired spelling through the argument name (see examples).
The following richness indices are provided.
'ace': The abundance-based coverage estimator (ACE) is a nonparametric
richness index that uses sample coverage, defined based on the sum of the
probabilities of the observed species. This method divides the species in a
sample into abundant (more than 10 reads or observations) and rare groups,
and it tends to underestimate the real number of species. The ACE index
ignores the abundance information for the abundant species, based on the
assumption that the abundant species are observed regardless of their
exact abundance. We use here the bias-corrected version
(O'Hara 2005, Chiu et al. 2014) implemented in estimateR.
For an exact formulation, see estimateR.
Note that this index comes with an additional column with standard
error information.
'chao1': This is a nonparametric estimator of species richness. It
assumes that rare species carry information about the (unknown) number
of unobserved species. We use here the bias-corrected version
(O'Hara 2005, Chiu et al. 2014) implemented in estimateR. This index
implicitly assumes that every taxon has an equal probability of being
observed. Note that it gives a lower bound to species richness. For an
exact formulation, see estimateR.
This estimator uses only the singleton and doubleton counts, and
hence it gives more weight to the low-abundance species.
Note that this index comes with an additional column with standard
error information.
'hill': Effective species richness, also known as the Hill index (see e.g. Chao et al. 2016). Currently only the case of order 1 (1D) is implemented. This corresponds to the exponential of the Shannon diversity. Intuitively, the effective richness indicates the number of species whose even distribution would lead to the same diversity as the observed community, where the species abundances are unevenly distributed.
'observed': The observed richness gives the number of species that
are detected above a given detection threshold in the observed sample
(default 0). This is conceptually the simplest richness index. The
corresponding index in the vegan package is "richness".
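The definitions above can be sketched in base R for a single sample. This is an illustration only, not the package's implementation: the helper names `observed_richness`, `hill_q1` and `chao1_bc` and the toy vector `x` are hypothetical, and the Chao1 helper assumes the common bias-corrected form S_obs + F1 (F1 - 1) / (2 (F2 + 1)), where F1 and F2 are the numbers of singletons and doubletons. ACE and the standard errors are left to estimateR.

```r
# Observed richness: number of species above the detection threshold
observed_richness <- function(x, detection = 0) {
  sum(x > detection)
}

# Hill number of order 1: exponential of the Shannon diversity
hill_q1 <- function(x) {
  p <- x[x > 0] / sum(x)   # relative abundances of detected species
  exp(-sum(p * log(p)))
}

# Bias-corrected Chao1 (assumed form, see the lead-in above)
chao1_bc <- function(x) {
  s_obs <- sum(x > 0)      # observed species
  f1 <- sum(x == 1)        # singletons
  f2 <- sum(x == 2)        # doubletons
  s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
}

# Hypothetical counts for one sample
x <- c(10, 5, 2, 2, 1, 1, 1, 0)

observed_richness(x)                 # 7 species detected
observed_richness(x, detection = 1)  # 4 species with more than one read
chao1_bc(x)                          # 7 + 3 * 2 / (2 * 3) = 8
hill_q1(x)
```

For a perfectly even sample such as `c(5, 5, 5, 5)`, `hill_q1` equals the observed richness, which is the intuition behind calling it the effective richness.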
Chao A. (1984) Non-parametric estimation of the number of classes in a population. Scand J Stat. 11:265–270.
Chao A, Chiu C-H, Jost L (2016). Phylogenetic Diversity Measures and Their Decomposition: A Framework Based on Hill Numbers. Biodiversity Conservation and Phylogenetic Systematics, Springer International Publishing, pp. 141–172, doi:10.1007/978-3-319-22461-9_8.
Chiu, C.H., Wang, Y.T., Walther, B.A. & Chao, A. (2014). Improved nonparametric lower bound of species richness via a modified Good-Turing frequency formula. Biometrics 70, 671-682.
O'Hara, R.B. (2005). Species richness estimators: how many species can dance on the head of a pin? J. Anim. Ecol. 74, 375-386.
data(esophagus)
# Calculates all richness indices by default
esophagus <- estimateRichness(esophagus)
# Shows all indices
colData(esophagus)
#> DataFrame with 3 rows and 6 columns
#>         ace    ace_se     chao1  chao1_se      hill  observed
#>   <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
#> B   49.0970   4.12372   39.1429   8.22425   9.48173        28
#> C   40.9465   3.10030   37.5000   4.10381  15.83763        33
#> D   88.9768   6.50671   71.0000  19.32178   7.63309        38
# Shows Hill index
colData(esophagus)$hill
#>         B         C         D
#>  9.481732 15.837635  7.633094
# Deletes hill index
colData(esophagus)$hill <- NULL
# Shows all indices, hill is deleted
colData(esophagus)
#> DataFrame with 3 rows and 5 columns
#>         ace    ace_se     chao1  chao1_se  observed
#>   <numeric> <numeric> <numeric> <numeric> <numeric>
#> B   49.0970   4.12372   39.1429   8.22425        28
#> C   40.9465   3.10030   37.5000   4.10381        33
#> D   88.9768   6.50671   71.0000  19.32178        38
# Deletes the remaining index columns (the standard error columns
# ace_se and chao1_se are not listed here, so they are kept)
colData(esophagus)[, c("observed", "chao1", "ace")] <- NULL
# Calculates all four richness indices and saves them with specific names
esophagus <- estimateRichness(esophagus,
    index = c("observed", "chao1", "ace", "hill"),
    name = c("Observed", "Chao1", "ACE", "Hill"))
# Show the new indices
colData(esophagus)
#> DataFrame with 3 rows and 8 columns
#>      ace_se  chao1_se  Observed     Chao1  Chao1_se       ACE    ACE_se
#>   <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
#> B   4.12372   8.22425        28   39.1429   8.22425   49.0970   4.12372
#> C   3.10030   4.10381        33   37.5000   4.10381   40.9465   3.10030
#> D   6.50671  19.32178        38   71.0000  19.32178   88.9768   6.50671
#>        Hill
#>   <numeric>
#> B   9.48173
#> C  15.83763
#> D   7.63309
# Deletes all colData (including the indices)
colData(esophagus) <- NULL
# Calculate observed richness excluding singletons (detection limit 1)
esophagus <- estimateRichness(esophagus, index="observed", detection = 1)
# Deletes all colData (including the indices)
colData(esophagus) <- NULL
# Indices must be written correctly (all lowercase), otherwise an error
# gets thrown
esophagus <- estimateRichness(esophagus, index="ace")
# Calculates Chao1 and ACE indices only
esophagus <- estimateRichness(esophagus, index = c("chao1", "ace"),
    name = c("Chao1", "ACE"))
# Deletes all colData (including the indices)
colData(esophagus) <- NULL
# Names of columns can be chosen arbitrarily, but the length of arguments
# must match.
esophagus <- estimateRichness(esophagus,
    index = c("ace", "chao1"),
    name = c("index1", "index2"))
# Shows all indices
colData(esophagus)
#> DataFrame with 3 rows and 4 columns
#>      index1 index1_se    index2 index2_se
#>   <numeric> <numeric> <numeric> <numeric>
#> B   49.0970   4.12372   39.1429   8.22425
#> C   40.9465   3.10030   37.5000   4.10381
#> D   88.9768   6.50671   71.0000  19.32178