R/getPrevalence.R
getPrevalence.Rd
These functions calculate the population prevalence for taxonomic ranks in a
SummarizedExperiment-class object.
getPrevalence(x, ...)
# S4 method for class 'ANY'
getPrevalence(
x,
detection = 0,
include.lowest = include_lowest,
include_lowest = FALSE,
sort = FALSE,
na.rm = TRUE,
...
)
# S4 method for class 'SummarizedExperiment'
getPrevalence(
x,
assay.type = assay_name,
assay_name = "counts",
rank = NULL,
...
)
getPrevalent(x, ...)
# S4 method for class 'ANY'
getPrevalent(
x,
prevalence = 50/100,
include.lowest = include_lowest,
include_lowest = FALSE,
...
)
# S4 method for class 'SummarizedExperiment'
getPrevalent(
x,
rank = NULL,
prevalence = 50/100,
include.lowest = include_lowest,
include_lowest = FALSE,
...
)
getRare(x, ...)
# S4 method for class 'ANY'
getRare(
x,
prevalence = 50/100,
include.lowest = include_lowest,
include_lowest = FALSE,
...
)
# S4 method for class 'SummarizedExperiment'
getRare(
x,
rank = NULL,
prevalence = 50/100,
include.lowest = include_lowest,
include_lowest = FALSE,
...
)
subsetByPrevalent(x, ...)
# S4 method for class 'SummarizedExperiment'
subsetByPrevalent(x, rank = NULL, ...)
# S4 method for class 'TreeSummarizedExperiment'
subsetByPrevalent(x, update.tree = FALSE, ...)
subsetByRare(x, ...)
# S4 method for class 'SummarizedExperiment'
subsetByRare(x, rank = NULL, ...)
# S4 method for class 'TreeSummarizedExperiment'
subsetByRare(x, update.tree = FALSE, ...)
getPrevalentAbundance(
x,
assay.type = assay_name,
assay_name = "relabundance",
...
)
# S4 method for class 'ANY'
getPrevalentAbundance(
x,
assay.type = assay_name,
assay_name = "relabundance",
...
)
# S4 method for class 'SummarizedExperiment'
getPrevalentAbundance(x, assay.type = assay_name, assay_name = "counts", ...)
...: Additional arguments.
If !is.null(rank), arguments are passed on to agglomerateByRank. See ?agglomerateByRank for more details.
For getPrevalent, getRare, subsetByPrevalent and subsetByRare: additional parameters passed to getPrevalence.
For getPrevalentAbundance: additional parameters passed to getPrevalent.
detection: Numeric scalar. Detection threshold for absence/presence. If as_relative = FALSE, it sets the counts threshold for a taxon to be considered present. If as_relative = TRUE, it sets the relative abundance threshold for a taxon to be considered present. (Default: 0)
include.lowest: Logical scalar. Should the lower boundary of the detection and prevalence cutoffs be included? (Default: FALSE)
include_lowest: Deprecated. Use include.lowest instead.
sort: Logical scalar. Should the result be sorted by prevalence? (Default: FALSE)
na.rm: Logical scalar. Should NA values be omitted? (Default: TRUE)
assay.type: Character scalar. Specifies which assay to use for calculation. (Default: "counts")
assay_name: Deprecated. Use assay.type instead.
rank: Character scalar. Defines a taxonomic rank. Must be a value of the taxonomyRanks() function.
prevalence: Prevalence threshold (in 0 to 1). The required prevalence is strictly greater by default. To include the limit, set include.lowest to TRUE.
update.tree: Logical scalar. Should rowTree() also be agglomerated? (Default: FALSE)
subsetByPrevalent and subsetByRare return a subset of x.
All other functions return named vectors:
getPrevalence returns a numeric vector with the names being set to either the row names of x or the names after agglomeration.
getPrevalentAbundance returns a numeric vector whose names correspond to the column names of x and whose values are the joint abundance of prevalent taxa.
getPrevalent and getRare return a character vector containing only the names of the taxa that exceed (getPrevalent) or do not exceed (getRare) the threshold set by prevalence, if the rownames of x are set. Otherwise an integer vector is returned matching the rows in x.
getPrevalence calculates, for each feature, the fraction of samples in which
the feature exceeds the detection threshold. For SummarizedExperiment objects,
the prevalence is calculated for the selected taxonomic rank, otherwise for the
rows. The absolute population prevalence can be obtained by multiplying the
prevalence by the number of samples (ncol(x)).
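The calculation described above can be sketched in base R. This is a minimal illustration on a hypothetical toy matrix, not the package's implementation; the object names (counts, prevalence_freq, prevalence_count) are assumptions made for the example.

```r
# Toy count matrix (hypothetical data): rows are taxa, columns are samples
counts <- matrix(
    c(0, 5, 12, 0,
      3, 0,  0, 0,
      7, 9,  4, 2),
    nrow = 3, byrow = TRUE,
    dimnames = list(c("taxonA", "taxonB", "taxonC"), paste0("S", 1:4))
)

detection <- 0

# Fraction of samples in which each taxon strictly exceeds the threshold
prevalence_freq <- rowSums(counts > detection, na.rm = TRUE) / ncol(counts)

# Absolute prevalence: number of samples in which the taxon is detected
prevalence_count <- prevalence_freq * ncol(counts)
```

Here taxonA is detected in 2 of 4 samples, so its prevalence is 0.5 and its absolute prevalence is 2.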
The core abundance index from getPrevalentAbundance gives the relative
proportion of the core species (between 0 and 1). The core taxa are
defined as those that exceed the given population prevalence threshold at the
given detection level as set for getPrevalent.
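As a rough sketch of the core abundance index, the base R code below selects the core taxa from a hypothetical toy matrix and sums their relative abundances per sample. It only illustrates the definition given above and is not the package's code; all object names are assumptions.

```r
# Toy count matrix (hypothetical data): rows are taxa, columns are samples
counts <- matrix(
    c(0, 5, 12, 0,
      3, 0,  0, 0,
      7, 9,  4, 2),
    nrow = 3, byrow = TRUE,
    dimnames = list(c("taxonA", "taxonB", "taxonC"), paste0("S", 1:4))
)

# Convert counts to relative abundances so that each sample sums to 1
relab <- sweep(counts, 2, colSums(counts), "/")

detection <- 0
prevalence <- 50/100

# Core taxa: prevalence strictly greater than the prevalence threshold
prev <- rowSums(relab > detection) / ncol(relab)
core <- names(prev)[prev > prevalence]

# Core abundance index: joint relative abundance of the core taxa per sample
core_abundance <- colSums(relab[core, , drop = FALSE])
```

With these thresholds only taxonC qualifies as core, so the index for each sample is simply the relative abundance of taxonC in that sample.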
subsetByPrevalent and subsetByRare return a subset of x.
The subset includes the prevalent or rare taxa, as determined by
getPrevalent or getRare, respectively.
getPrevalent returns the taxa whose prevalence exceeds the prevalence
threshold at the given detection threshold for the selected taxonomic rank.
getRare returns the complement of getPrevalent.
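The relationship between prevalent and rare taxa, and the effect of including the lower boundary, can be illustrated with a small base R sketch. The prevalence values below are hypothetical, and the code mimics the documented behaviour rather than calling the package.

```r
# Hypothetical prevalence values for three taxa (fraction of samples)
prev <- c(taxonA = 0.5, taxonB = 0.25, taxonC = 1)
prevalence <- 50/100

# Default behaviour: the threshold is exclusive (strictly greater)
prevalent_strict <- names(prev)[prev > prevalence]
rare_strict <- setdiff(names(prev), prevalent_strict)

# With the lower boundary included, taxonA (exactly 0.5) becomes prevalent
prevalent_incl <- names(prev)[prev >= prevalence]
rare_incl <- setdiff(names(prev), prevalent_incl)
```

In both cases the prevalent and rare sets are disjoint and together cover all taxa; only a taxon sitting exactly on the threshold changes sides.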
A Salonen et al. The adult intestinal core microbiota is determined by analysis depth and health status. Clinical Microbiology and Infection 18(S4):16-20, 2012.
To cite the R package, see citation('mia').
data(GlobalPatterns)
tse <- GlobalPatterns
# Get prevalence estimates for individual ASV/OTU
prevalence.frequency <- getPrevalence(tse,
                                      detection = 0,
                                      sort = TRUE)
head(prevalence.frequency)
#> 145149 114821 108747 526804 332405 98605
#> 1 1 1 1 1 1
# Get prevalence estimates for phyla
# - the getPrevalence function itself always returns population frequencies
prevalence.frequency <- getPrevalence(tse,
                                      rank = "Phylum",
                                      detection = 0,
                                      sort = TRUE)
head(prevalence.frequency)
#> WS3 WPS-2 Verrucomicrobia Tenericutes Spirochaetes
#> 1 1 1 1 1
#> Proteobacteria
#> 1
# - to obtain population counts, multiply frequencies with the sample size,
# which answers the question "In how many samples is this phylum detectable"
prevalence.count <- prevalence.frequency * ncol(tse)
head(prevalence.count)
#> WS3 WPS-2 Verrucomicrobia Tenericutes Spirochaetes
#> 26 26 26 26 26
#> Proteobacteria
#> 26
# Detection threshold 10 (strictly greater by default);
# Note that the data (GlobalPatterns) is here in absolute counts
# (and not compositional, relative abundances)
# Prevalence threshold 50 percent (strictly greater by default)
prevalent <- getPrevalent(
    tse,
    rank = "Phylum",
    detection = 10,
    prevalence = 50/100)
head(prevalent)
#> [1] "Acidobacteria" "Actinobacteria" "Bacteroidetes" "Chlorobi"
#> [5] "Chloroflexi" "Crenarchaeota"
# Add relative abundance data
tse <- transformAssay(tse, assay.type = "counts", method = "relabundance")
# Get a subset of the object that includes only the prevalent taxa
altExp(tse, "prevalent") <- subsetByPrevalent(tse,
                                              rank = "Family",
                                              assay.type = "relabundance",
                                              detection = 0.001,
                                              prevalence = 0.55)
altExp(tse, "prevalent")
#> class: TreeSummarizedExperiment
#> dim: 3 26
#> metadata(1): agglomerated_by_rank
#> assays(2): counts relabundance
#> rownames(3): Flavobacteriaceae Lachnospiraceae Rhodobacteraceae_1
#> rowData names(7): Kingdom Phylum ... Genus Species
#> colnames(26): CL3 CC1 ... Even2 Even3
#> colData names(7): X.SampleID Primer ... SampleType Description
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> rowLinks: a LinkDataFrame (3 rows)
#> rowTree: 1 phylo tree(s) (19216 leaves)
#> colLinks: NULL
#> colTree: NULL
# getRare returns the complement: taxa that are not prevalent
rare <- getRare(tse,
                rank = "Phylum",
                assay.type = "relabundance",
                detection = 1/100,
                prevalence = 50/100)
head(rare)
#> [1] "ABY1_OD1" "AC1" "AD3" "Acidobacteria"
#> [5] "Armatimonadetes" "BRC1"
# Get a subset of the object that includes only the rare taxa
altExp(tse, "rare") <- subsetByRare(
    tse,
    rank = "Class",
    assay.type = "relabundance",
    detection = 0.001,
    prevalence = 0.001)
altExp(tse, "rare")
#> class: TreeSummarizedExperiment
#> dim: 71 26
#> metadata(1): agglomerated_by_rank
#> assays(2): counts relabundance
#> rownames(71): 09D2Y74 12-24 ... koll11 vadinHA49
#> rowData names(7): Kingdom Phylum ... Genus Species
#> colnames(26): CL3 CC1 ... Even2 Even3
#> colData names(7): X.SampleID Primer ... SampleType Description
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> rowLinks: a LinkDataFrame (71 rows)
#> rowTree: 1 phylo tree(s) (19216 leaves)
#> colLinks: NULL
#> colTree: NULL
# The names of both alternative experiments, prevalent and rare, are listed
# under altExpNames
tse
#> class: TreeSummarizedExperiment
#> dim: 19216 26
#> metadata(0):
#> assays(2): counts relabundance
#> rownames(19216): 549322 522457 ... 200359 271582
#> rowData names(7): Kingdom Phylum ... Genus Species
#> colnames(26): CL3 CC1 ... Even2 Even3
#> colData names(7): X.SampleID Primer ... SampleType Description
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(2): prevalent rare
#> rowLinks: a LinkDataFrame (19216 rows)
#> rowTree: 1 phylo tree(s) (19216 leaves)
#> colLinks: NULL
#> colTree: NULL
data(esophagus)
getPrevalentAbundance(esophagus, assay.type = "counts")
#> B C D
#> 0.9605911 0.8980392 0.9086758