These functions estimate alpha diversity indices, optionally using rarefaction.
addAlpha(x, ...)
getAlpha(x, ...)
# S4 method for class 'SummarizedExperiment'
addAlpha(x, ...)
# S4 method for class 'SummarizedExperiment'
getAlpha(
  x,
  assay.type = "counts",
  index = c("coverage_diversity", "fisher_diversity", "faith_diversity",
    "gini_simpson_diversity", "inverse_simpson_diversity",
    "log_modulo_skewness_diversity", "shannon_diversity", "absolute_dominance",
    "dbp_dominance", "core_abundance_dominance", "gini_dominance", "dmn_dominance",
    "relative_dominance", "simpson_lambda_dominance", "camargo_evenness",
    "pielou_evenness", "simpson_evenness", "evar_evenness", "bulla_evenness",
    "ace_richness", "chao1_richness", "hill_richness", "observed_richness"),
  name = index,
  niter = NULL,
  BPPARAM = SerialParam(),
  ...
)
x: A SummarizedExperiment object.
...: Optional arguments:
sample: Integer scalar. Specifies the rarefaction depth, i.e., the number of counts drawn from each sample. (Default: min(colSums2(assay(x, assay.type))))
tree.name: Character scalar. Specifies which rowTree will be used (Faith's index). (Default: "phylo")
node.label: Character vector or NULL. Specifies the links between rows and node labels of the phylogeny tree specified by tree.name. If a certain row is not linked with the tree, the missing instance should be noted as NA. When NULL, all the rownames should be found from the tree (Faith's index). (Default: NULL)
only.tips: (Faith's index). Logical scalar. Specifies whether to remove internal nodes when Faith's index is calculated. When only.tips=TRUE, rows that are not tips of the tree are removed. (Default: FALSE)
threshold: (Coverage and all evenness indices). Numeric scalar. From 0 to 1, determines the threshold for coverage and evenness indices. When evenness indices are calculated, values under or equal to this threshold are denoted as zeroes. For the coverage index, see Details. (Default: 0.5 for coverage, 0 for evenness indices)
quantile: (log modulo skewness index). Numeric scalar. Arithmetic abundance classes are evenly cut up to this quantile of the data. The assumption is that abundances higher than this are not common, and they are classified in their own group. (Default: 0.5)
nclasses: (log modulo skewness index). Integer scalar. The number of arithmetic abundance classes from zero to the quantile cutoff indicated by quantile. (Default: 50)
ntaxa: (absolute and relative indices). Integer scalar. The n-th position of the dominant taxa to consider. (Default: 1)
aggregate: (absolute, dbp, dmn, and relative indices). Logical scalar. Whether to aggregate the values for the top members selected by ntaxa. If TRUE, the sum of relative abundances is returned. Otherwise, the relative abundance is returned for the single taxon with the indicated rank. (Default: TRUE)
detection: (observed index). Numeric scalar. Selects the detection threshold for the abundances. (Default: 0)
assay.type: Character scalar. Specifies the name of the assay used in calculation. (Default: "counts")
index: Character vector. Specifies the alpha diversity indices to be calculated.
name: Character vector. A name for the column of the colData where results will be stored. (Default: index)
niter: Integer scalar. Specifies the number of rarefaction rounds. Rarefaction is not applied when niter=NULL (see Details section). (Default: NULL)
BPPARAM: A BiocParallelParam object specifying whether the calculation should be parallelized.
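The sample and niter arguments control rarefaction. As a rough illustration of the idea only (not mia's internal code), one rarefaction round can be sketched in base R as drawing `depth` reads without replacement from a sample's counts. The helpers `rarefy_once()` and `rarefied_index()` are hypothetical, and averaging the index over the niter rounds is an assumption about how rounds are combined.

```r
# Hypothetical helper: one rarefaction round for a single sample.
rarefy_once <- function(counts, depth) {
  # One entry per read, labelled by the index of the feature it belongs to
  reads <- rep(seq_along(counts), times = counts)
  # sample.int() avoids sample()'s special case for length-1 vectors
  drawn <- reads[sample.int(length(reads), size = depth)]
  # Number of drawn reads per feature (zeros kept for absent features)
  tabulate(drawn, nbins = length(counts))
}

# Hypothetical helper: mean of an index over `niter` rarefied copies
# (assumption: rounds are combined by averaging)
rarefied_index <- function(counts, depth, niter, index_fun) {
  mean(replicate(niter, index_fun(rarefy_once(counts, depth))))
}

set.seed(42)
counts <- c(50, 30, 15, 5, 0)
rarefy_once(counts, depth = 20)
rarefied_index(counts, depth = 20, niter = 10,
               index_fun = function(x) sum(x > 0))
```

Every rarefied copy has exactly `depth` reads, which makes index values comparable across samples of different sequencing depth.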
getAlpha returns a DataFrame. addAlpha returns x with additional colData column(s) named name.
Alpha diversity is a joint quantity that combines elements of community richness and evenness. Diversity increases, in general, when species richness or evenness increase.
The following diversity indices are available:
'coverage': Number of species needed to cover a given fraction of the ecosystem (50 percent by default). Tune this with the threshold argument.
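The coverage index can be illustrated with a small base-R sketch: sort relative abundances in decreasing order and count how many species are needed before the cumulative share reaches the threshold. The function `coverage()` is hypothetical, and whether mia compares with `>=` or `>` at the threshold is an assumption.

```r
# Hypothetical sketch (not mia's code): number of most abundant species
# whose cumulative relative abundance reaches `threshold`.
coverage <- function(counts, threshold = 0.5) {
  p <- sort(counts / sum(counts), decreasing = TRUE)
  # First position where the cumulative share reaches the threshold
  which(cumsum(p) >= threshold)[1]
}

coverage(c(40, 30, 20, 10))                   # 2 (0.4 + 0.3 >= 0.5)
coverage(c(40, 30, 20, 10), threshold = 0.8)  # 3 (0.4 + 0.3 + 0.2 >= 0.8)
```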
'faith': Faith's phylogenetic alpha diversity index measures the total branch length of the phylogenetic tree that connects the taxa present in the sample. Larger values represent higher diversity. Using this index requires a rowTree. (Faith 1992)
If the data include features that are not tips of the tree but internal nodes, there are two options. First, you can keep those features and prune the tree so that each tip can be found among the features. The other option is to remove all features that are not tips (see the only.tips parameter).
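To make the branch-length idea concrete, here is a toy base-R sketch that sums the lengths of all edges on the paths from the present tips to the root. The edge table and `faith_pd()` are hypothetical; including the path up to the root is an assumption, and real analyses should use the rowTree machinery rather than this illustration.

```r
# Toy rooted tree as an edge table (hypothetical, not a real phylogeny):
#            R
#      1.0 /   \ 2.0
#         A     t3
#   0.5 /   \ 0.5
#     t1     t2
edges <- data.frame(
  parent = c("R", "R", "A", "A"),
  child  = c("A", "t3", "t1", "t2"),
  length = c(1.0, 2.0, 0.5, 0.5),
  stringsAsFactors = FALSE
)

# Sum branch lengths on the union of root-to-tip paths of present tips
faith_pd <- function(present_tips, edges) {
  used <- character(0)
  for (tip in present_tips) {
    node <- tip
    # Walk upwards until reaching a node without a parent (the root)
    while (node %in% edges$child) {
      used <- union(used, node)
      node <- edges$parent[edges$child == node]
    }
  }
  sum(edges$length[edges$child %in% used])
}

faith_pd(c("t1", "t2"), edges)  # 1.0 + 0.5 + 0.5 = 2
faith_pd(c("t1", "t3"), edges)  # 1.0 + 2.0 + 0.5 = 3.5
```

Two closely related taxa (t1, t2) share the branch to A, so they add less diversity than two distant taxa (t1, t3).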
'fisher': Fisher's alpha, as implemented in vegan::fisher.alpha (Fisher et al. 1943).
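Fisher's alpha is the parameter \(\alpha\) of the log-series model that satisfies \(S = \alpha \ln(1 + N/\alpha)\), where S is the number of observed species and N the total count. A minimal sketch solves this equation numerically with `uniroot()`; `fisher_alpha()` is a hypothetical helper, and vegan::fisher.alpha should be preferred in practice.

```r
# Hypothetical sketch: solve S = a * log(1 + N / a) for a.
# A finite root exists only when S < N.
fisher_alpha <- function(counts) {
  counts <- counts[counts > 0]
  S <- length(counts)  # observed species
  N <- sum(counts)     # total number of individuals
  f <- function(a) a * log(1 + N / a) - S
  uniroot(f, lower = 1e-6, upper = 1e6, tol = 1e-10)$root
}

a <- fisher_alpha(c(50, 30, 15, 5, 1, 1))
a
```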
'gini_simpson': Gini-Simpson diversity, i.e., \(1 - \lambda\), where \(\lambda\) is the Simpson index, calculated as the sum of squared relative abundances. This corresponds to the diversity index 'simpson' in vegan::diversity. This is also called the Gibbs–Martin or Blau index in sociology, psychology and management studies. The Gini-Simpson index (\(1 - \lambda\)) should not be confused with Simpson's dominance (\(\lambda\)), the Gini index, or the inverse Simpson index (\(1/\lambda\)).
'inverse_simpson': Inverse Simpson diversity, \(1/\lambda\), where \(\lambda = \sum p^2\) and \(p\) refers to relative abundances. This corresponds to the diversity index 'invsimpson' in vegan::diversity. Do not confuse this with the closely related Gini-Simpson index.
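The three Simpson-family quantities differ only in how \(\lambda\) is transformed. A short base-R illustration on toy counts (hypothetical data):

```r
counts <- c(40, 30, 20, 10)
p <- counts / sum(counts)   # relative abundances

lambda <- sum(p^2)          # Simpson's lambda (dominance): 0.3
gini_simpson <- 1 - lambda  # Gini-Simpson diversity: 0.7
inv_simpson <- 1 / lambda   # inverse Simpson diversity: 3.333333
```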
'log_modulo_skewness': The rarity index characterizes the concentration of species at low abundance. Here, we use the skewness of the frequency distribution of arithmetic abundance classes (see Magurran & McGill 2011). These are typically right-skewed; to avoid taking log of occasional negative skews, we follow Locey & Lennon (2016) and use the log-modulo transformation that adds a value of one to each measure of skewness to allow logarithmization.
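The steps described above (cut abundances into arithmetic classes up to a quantile, tabulate class frequencies, take their skewness, shift by one and log) can be sketched in base R. This is a hypothetical illustration: it assumes a per-sample quantile cutoff, nclasses classes up to the cutoff plus one open-ended class, and the moment-based skewness estimator. mia's implementation may differ in each of these choices.

```r
# Hypothetical sketch of the log-modulo skewness rarity index.
log_modulo_skewness <- function(counts, quantile = 0.5, nclasses = 50) {
  # Upper cutoff for the evenly spaced classes (assumed: per-sample quantile)
  cutoff <- stats::quantile(counts, probs = quantile, names = FALSE)
  if (cutoff <= 0) stop("quantile cutoff must be positive")
  # nclasses evenly spaced classes up to the cutoff, plus one open class
  breaks <- c(seq(0, cutoff, length.out = nclasses + 1), Inf)
  cls <- cut(counts, breaks, labels = FALSE, include.lowest = TRUE)
  # Number of species in each abundance class, empty classes included
  freq <- tabulate(cls, nbins = nclasses + 1)
  # Moment-based skewness of the class frequencies
  m <- mean(freq)
  skew <- mean((freq - m)^3) / mean((freq - m)^2)^1.5
  # Add one before taking the log, as described above
  log(1 + skew)
}

log_modulo_skewness(c(1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144))
```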
'shannon': Shannon diversity (entropy).
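Shannon diversity is \(H = -\sum_i p_i \ln p_i\). The sketch below assumes the natural logarithm, which is also the default base in vegan::diversity; `shannon()` is a hypothetical helper.

```r
shannon <- function(counts) {
  # Drop zeros: 0 * log(0) is treated as 0
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))  # natural logarithm
}

shannon(c(25, 25, 25, 25))  # log(4) = 1.386294, the maximum for 4 species
```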
A dominance index quantifies the dominance of one or few species in a community. Greater values indicate higher dominance.
Dominance indices are in general negatively correlated with alpha diversity indices (species richness, evenness, diversity, rarity). More dominant communities are less diverse.
The following community dominance indices are available:
'absolute': The absolute index equals the absolute abundance of the most dominant n species of the sample (specify the number with the argument ntaxa). The index gives positive integer values.
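A hypothetical sketch of how ntaxa and aggregate interact for the absolute index: with aggregate = TRUE the counts of the top ntaxa species are summed, otherwise only the species at rank ntaxa is returned. This mirrors the argument descriptions; it is not mia's code.

```r
absolute_dominance <- function(counts, ntaxa = 1, aggregate = TRUE) {
  top <- sort(counts, decreasing = TRUE)
  if (aggregate) sum(top[seq_len(ntaxa)]) else top[ntaxa]
}

absolute_dominance(c(40, 30, 20, 10))                               # 40
absolute_dominance(c(40, 30, 20, 10), ntaxa = 2)                    # 70
absolute_dominance(c(40, 30, 20, 10), ntaxa = 2, aggregate = FALSE) # 30
```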
'dbp': The Berger-Parker index (see Berger & Parker 1970) is a special case of the 'relative' index. dbp is the relative abundance of the most abundant species of the sample. The index gives values in the interval 0 to 1, where larger values represent greater dominance.
$$dbp = \frac{N_1}{N_{tot}}$$ where \(N_1\) is the absolute abundance of the most dominant species and \(N_{tot}\) is the sum of absolute abundances of all species.
'core_abundance': The core abundance index is related to core species. Core species are species that are most abundant in all samples, i.e., in the whole data set. Core species are defined as those species that have prevalence over 50%, i.e., a species must be present in over 50% of the samples to be a core species. These core species are used to calculate the core abundance index, which is the sum of relative abundances of core species in the sample. The index gives values in the interval 0 to 1, where larger values represent greater dominance.
$$\mathrm{core\_abundance} = \frac{N_{core}}{N_{tot}}$$ where \(N_{core}\) is the sum of absolute abundances of the core species and \(N_{tot}\) is the sum of absolute abundances of all species.
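Unlike the other dominance indices, core abundance depends on the whole data set, because core species are chosen by prevalence across samples. A base-R sketch on a hypothetical species-by-sample count matrix, assuming a species is "present" when its count is above zero:

```r
# Toy counts: rows = species, columns = samples (hypothetical data)
mat <- matrix(c(10, 0, 5, 20,   # sample s1
                 8, 2, 0, 10,   # sample s2
                 6, 0, 0,  4),  # sample s3
              nrow = 4,
              dimnames = list(paste0("sp", 1:4), paste0("s", 1:3)))

# Prevalence: fraction of samples in which each species is present
prevalence <- rowMeans(mat > 0)
core <- prevalence > 0.5  # sp1 and sp4 are core species

# Share of each sample's total count that belongs to core species
core_abundance <- colSums(mat[core, , drop = FALSE]) / colSums(mat)
core_abundance  # s1 = 30/35, s2 = 18/20, s3 = 10/10
```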
'gini': Gini index is probably best-known from socio-economic contexts (Gini 1921). In economics, it is used to measure, for example, how unevenly income is distributed among population. Here, Gini index is used similarly, but income is replaced with abundance.
If a small group of species represents a large portion of the total abundance of microbes, inequality is large and the Gini index is closer to 1. If all species have equal abundances, equality is perfect and the Gini index equals 0. This index should not be confused with the Gini-Simpson index, which quantifies diversity.
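One common formulation of the Gini coefficient is the mean absolute difference between all pairs of abundances divided by twice the mean abundance. The sketch below uses that formulation; mia may use a variant (for example a different normalization), so treat the hypothetical `gini()` as illustrative only.

```r
gini <- function(counts) {
  n <- length(counts)
  # Sum of |x_i - x_j| over all ordered pairs, scaled by 2 * n^2 * mean
  sum(abs(outer(counts, counts, "-"))) / (2 * n^2 * mean(counts))
}

gini(c(25, 25, 25, 25))  # 0: perfectly even abundances
gini(c(100, 0, 0, 0))    # 0.75 = (n - 1) / n, the maximum for n = 4
```

With this normalization the maximum is \((n-1)/n\), which approaches 1 as the number of species grows.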
'dmn': McNaughton’s index is the sum of relative abundances of the two most abundant species of the sample (McNaughton & Wolf, 1970). The index gives values in the unit interval:
$$dmn = \frac{N_1 + N_2}{N_{tot}}$$
where \(N_1\) and \(N_2\) are the absolute abundances of the two most dominant species and \(N_{tot}\) is the sum of absolute abundances of all species.
'relative': The relative index equals the relative abundance of the most dominant n species of the sample (specify the number with the argument ntaxa). This index gives values in the interval 0 to 1. For n = 1:
$$relative = \frac{N_1}{N_{tot}}$$
where \(N_1\) is the absolute abundance of the most dominant species and \(N_{tot}\) is the sum of absolute abundances of all species.
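The relative index also connects the dbp and dmn indices: with ntaxa = 1 it reduces to the Berger-Parker index, and with ntaxa = 2 and aggregation it equals McNaughton's index. A hypothetical base-R sketch (not mia's code):

```r
relative_dominance <- function(counts, ntaxa = 1, aggregate = TRUE) {
  p <- sort(counts / sum(counts), decreasing = TRUE)
  if (aggregate) sum(p[seq_len(ntaxa)]) else p[ntaxa]
}

counts <- c(40, 30, 20, 10)
relative_dominance(counts)                                # 0.4, as dbp
relative_dominance(counts, ntaxa = 2)                     # 0.7, as dmn
relative_dominance(counts, ntaxa = 2, aggregate = FALSE)  # 0.3
```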
'simpson_lambda': Simpson's (dominance) index, or Simpson's lambda, is the sum of squared relative abundances. This index gives values in the unit interval. This value equals the probability that two randomly chosen individuals belong to the same species. The higher the probability, the greater the dominance (see e.g. Simpson 1949).
$$\lambda = \sum_i p_i^2$$
where \(p_i\) is the relative abundance of species \(i\).
There is also a more advanced Simpson dominance index (Simpson 1949). However, it is not provided; the simpler sum of squared relative abundances is used instead, because the alternative index is not in the unit interval and is highly correlated with the simpler variant implemented here.
Evenness is a standard index in community ecology, and it quantifies how evenly the abundances of different species are distributed. By default, this function returns all indices. The available evenness indices include the following (all in lowercase):
'camargo': Camargo's evenness (Camargo 1992)
'simpson_evenness': Simpson’s evenness is calculated as inverse Simpson diversity (1/lambda) divided by observed species richness S: (1/lambda)/S.
'pielou': Pielou's evenness (Pielou, 1966), also known as Shannon or Shannon-Weaver/Wiener/Weiner evenness; H/ln(S). The Shannon-Weaver is the preferred term; see Spellerberg and Fedor (2003).
'evar': Smith and Wilson’s Evar index (Smith & Wilson 1996).
'bulla': Bulla’s index (O) (Bulla 1994).
Desirable statistical evenness metrics avoid strong bias towards very large or very small abundances, are independent of richness, and range within the unit interval with increasing evenness (Smith & Wilson 1996). Evenness metrics that fulfill these criteria include at least camargo, simpson, evar (Smith and Wilson), and bulla. Also see Magurran & McGill (2011) and Beisel et al. (2003) for further details.
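The five evenness indices can be compared side by side with a base-R sketch using their textbook definitions: Camargo \(1 - \sum_{i<j} |p_i - p_j| / S\), Simpson \((1/\lambda)/S\), Pielou \(H/\ln S\), Evar \(1 - \frac{2}{\pi}\arctan\) of the variance of log abundances, and Bulla \((O - 1/S)/(1 - 1/S)\) with \(O = \sum_i \min(p_i, 1/S)\). The helper `evenness_indices()` is hypothetical, assumes at least two species, and ignores the threshold argument.

```r
evenness_indices <- function(counts) {
  x <- counts[counts > 0]
  S <- length(x)  # observed richness (assumed >= 2)
  p <- x / sum(x)
  H <- -sum(p * log(p))
  lambda <- sum(p^2)
  # Camargo: 1 - sum over pairs i < j of |p_i - p_j|, divided by S
  d <- abs(outer(p, p, "-"))
  camargo <- 1 - sum(d[upper.tri(d)]) / S
  # Smith & Wilson Evar: population variance of log abundances
  lx <- log(x)
  evar <- 1 - (2 / pi) * atan(sum((lx - mean(lx))^2) / S)
  # Bulla: overlap O with a perfectly even community, rescaled
  O <- sum(pmin(p, 1 / S))
  bulla <- (O - 1 / S) / (1 - 1 / S)
  c(camargo = camargo,
    simpson_evenness = (1 / lambda) / S,
    pielou = H / log(S),
    evar = evar,
    bulla = bulla)
}

evenness_indices(c(25, 25, 25, 25))  # all indices equal 1
evenness_indices(c(70, 20, 5, 5))    # all indices below 1
```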
The richness is calculated per sample. This is a standard index in community ecology, and it provides an estimate of the number of unique species in the community. This is often not directly observed for the whole community but only for a limited sample from the community. This has led to alternative richness indices that provide different ways to estimate the species richness.
Richness index differs from the concept of species diversity or evenness in that it ignores species abundance, and focuses on the binary presence/absence values that indicate simply whether the species was detected.
The function takes all index names in full lowercase. The user can provide the desired spelling through the argument name (see examples).
The following richness indices are provided.
'ace': The abundance-based coverage estimator (ACE) is another nonparametric richness index that uses sample coverage, defined based on the sum of the probabilities of the observed species. This method divides the species into abundant (more than 10 reads or observations) and rare groups in a sample and tends to underestimate the real number of species. The ACE index ignores the abundance information for the abundant species, based on the assumption that the abundant species are observed regardless of their exact abundance. We use here the bias-corrected version (O'Hara 2005, Chiu et al. 2014) implemented in estimateR. For an exact formulation, see estimateR.
Note that this index comes with an additional column with standard error information.
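A sketch of the classic ACE formula helps show how the rare group drives the estimate: \(S_{ace} = S_{abund} + S_{rare}/C_{ace} + (F_1/C_{ace})\gamma^2\), with \(C_{ace} = 1 - F_1/N_{rare}\). The helper `ace()` is hypothetical, uses a rare cutoff of 10, and may differ in detail from estimateR, which should be used for real analyses. It returns Inf when every rare species is a singleton.

```r
ace <- function(counts, rare = 10) {
  x <- counts[counts > 0]
  s_abund <- sum(x > rare)   # abundant species
  xr <- x[x <= rare]         # abundances of rare species
  s_rare <- length(xr)
  n_rare <- sum(xr)
  f1 <- sum(xr == 1)         # singletons
  c_ace <- 1 - f1 / n_rare   # estimated sample coverage of the rare group
  # F_i: number of rare species observed exactly i times
  fi <- tabulate(xr, nbins = rare)
  i <- seq_len(rare)
  # Squared coefficient of variation, truncated at zero
  gamma2 <- max(s_rare / c_ace * sum(i * (i - 1) * fi) /
                  (n_rare * (n_rare - 1)) - 1, 0)
  s_abund + s_rare / c_ace + f1 / c_ace * gamma2
}

ace(c(1, 1, 2, 3, 5, 12, 40))  # about 8.65, above the 7 observed species
```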
'chao1': This is a nonparametric estimator of species richness. It assumes that rare species carry information about the (unknown) number of unobserved species. We use here the bias-corrected version (O'Hara 2005, Chiu et al. 2014) implemented in estimateR. This index implicitly assumes that every taxon has equal probability of being observed. Note that it gives a lower bound to species richness. For an exact formulation of the bias-corrected estimator, see estimateR.
This estimator uses only the singleton and doubleton counts, and hence it gives more weight to the low-abundance species.
Note that this index comes with an additional column with standard error information.
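The reliance on singletons (\(F_1\)) and doubletons (\(F_2\)) is easiest to see in the commonly cited bias-corrected Chao1 form \(S_{obs} + F_1(F_1 - 1)/(2(F_2 + 1))\). The sketch below uses that form; estimateR's exact computation may include additional small-sample terms, so `chao1()` is a hypothetical illustration.

```r
chao1 <- function(counts) {
  x <- counts[counts > 0]
  f1 <- sum(x == 1)  # singletons
  f2 <- sum(x == 2)  # doubletons
  # The +1 in the denominator keeps the estimate finite when f2 = 0
  length(x) + f1 * (f1 - 1) / (2 * (f2 + 1))
}

chao1(c(1, 1, 1, 2, 5, 9))  # 6 + 3 * 2 / (2 * 2) = 7.5
```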
'hill': Effective species richness, also known as the Hill index (see e.g. Chao et al. 2016). Currently only the case of order 1 (1D) is implemented. This corresponds to the exponential of Shannon diversity. Intuitively, the effective richness indicates the number of species whose even distribution would lead to the same diversity as the observed community, where the species abundances are unevenly distributed.
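Because order-1 Hill richness is \(\exp(H)\), an even community of S species has effective richness exactly S, and unevenness lowers it. A minimal sketch, assuming natural-log Shannon diversity; `hill1()` is a hypothetical helper.

```r
hill1 <- function(counts) {
  p <- counts[counts > 0] / sum(counts)
  exp(-sum(p * log(p)))  # exponential of Shannon diversity
}

hill1(c(25, 25, 25, 25))  # 4: four equally abundant species
hill1(c(70, 20, 5, 5))    # fewer than 4 effective species
```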
'observed': The observed richness gives the number of species that are detected above a given detection threshold in the observed sample (default 0). This is conceptually the simplest richness index. The corresponding index in the vegan package is "richness".
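Observed richness reduces to counting abundances above the detection threshold. A one-line sketch, where the hypothetical `observed()` treats "above" as a strict inequality:

```r
observed <- function(counts, detection = 0) sum(counts > detection)

observed(c(0, 1, 3, 0, 12))                 # 3
observed(c(0, 1, 3, 0, 12), detection = 2)  # 2
```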
Beisel J-N. et al. (2003) A Comparative Analysis of Diversity Index Sensitivity. Internat. Rev. Hydrobiol. 88(1):3-15. https://portais.ufg.br/up/202/o/2003-comparative_evennes_index.pdf
Berger WH & Parker FL (1970) Diversity of Planktonic Foraminifera in Deep-Sea Sediments. Science 168(3937):1345-1347. doi: 10.1126/science.168.3937.1345
Bulla L. (1994) An index of diversity and its associated diversity measure. Oikos 70:167–171
Camargo, JA. (1992) New diversity index for assessing structural alterations in aquatic communities. Bull. Environ. Contam. Toxicol. 48:428–434.
Chao A. (1984) Non-parametric estimation of the number of classes in a population. Scand J Stat. 11:265–270.
Chao A, Chun-Huo C, Jost L (2016). Phylogenetic Diversity Measures and Their Decomposition: A Framework Based on Hill Numbers. Biodiversity Conservation and Phylogenetic Systematics, Springer International Publishing, pp. 141–172, doi:10.1007/978-3-319-22461-9_8.
Chiu, C.H., Wang, Y.T., Walther, B.A. & Chao, A. (2014). Improved nonparametric lower bound of species richness via a modified Good-Turing frequency formula. Biometrics 70, 671-682.
Faith D.P. (1992) Conservation evaluation and phylogenetic diversity. Biological Conservation 61(1):1-10.
Fisher R.A., Corbet, A.S. & Williams, C.B. (1943) The relation between the number of species and the number of individuals in a random sample of animal population. Journal of Animal Ecology 12, 42-58.
Gini C (1921) Measurement of Inequality of Incomes. The Economic Journal 31(121): 124-126. doi: 10.2307/2223319
Locey KJ and Lennon JT. (2016) Scaling laws predict global microbial diversity. PNAS 113(21):5970-5975; doi:10.1073/pnas.1521291113.
Magurran AE, McGill BJ, eds (2011) Biological Diversity: Frontiers in Measurement and Assessment (Oxford Univ Press, Oxford), Vol 12.
McNaughton, SJ and Wolf LL. (1970). Dominance and the niche in ecological systems. Science 167:131–139.
O'Hara, R.B. (2005). Species richness estimators: how many species can dance on the head of a pin? J. Anim. Ecol. 74, 375-386.
Pielou, EC. (1966) The measurement of diversity in different types of biological collections. J Theoretical Biology 13:131–144.
Simpson EH (1949) Measurement of Diversity. Nature 163(688). doi: 10.1038/163688a0
Smith B and Wilson JB. (1996) A Consumer's Guide to Evenness Indices. Oikos 76(1):70-82.
Spellerberg and Fedor (2003). A tribute to Claude Shannon (1916–2001) and a plea for more rigorous use of species richness, species diversity and the ‘Shannon–Wiener’ Index. Global Ecology & Biogeography 12, 177–197.
library(mia)
data("GlobalPatterns", package = "mia")
tse <- GlobalPatterns
# Calculate the Shannon index without rarefaction
tse <- addAlpha(tse, index = "shannon")
# Shows the estimated Shannon index
tse$shannon
#> [1] 6.576517 6.776603 6.498494 3.828368 3.287666 4.289269 4.849999 4.874747
#> [9] 2.672103 3.905419 3.093981 3.651142 3.552736 3.372495 4.027716 4.230515
#> [17] 4.483806 4.563943 6.157462 4.869817 5.461840 4.126538 3.452772 4.083665
#> [25] 3.956909 4.006375
# Calculate observed richness with 10 rarefaction rounds
tse <- addAlpha(
    tse,
    assay.type = "counts",
    index = "observed_richness",
    sample = min(colSums(assay(tse, "counts")), na.rm = TRUE),
    niter = 10)
# Shows the estimated observed richness
tse$observed_richness
#> [1] 3502 3864 2845 740 588 1347 2261 1765 553 1597 792 889 2233 2034 2258
#> [16] 989 1162 872 2995 2076 2421 827 678 889 696 654
# One can also calculate the indices and get the results without adding
# them to colData
res <- getAlpha(tse, index = "shannon")
res |> head()
#> DataFrame with 6 rows and 1 column
#> shannon
#> <numeric>
#> CL3 6.57652
#> CC1 6.77660
#> SV1 6.49849
#> M31Fcsw 3.82837
#> M11Fcsw 3.28767
#> M31Plmr 4.28927