These functions calculate the population prevalence for taxonomic ranks in a SummarizedExperiment-class object.

getPrevalence(x, ...)

# S4 method for ANY
getPrevalence(
  x,
  detection = 0,
  include_lowest = FALSE,
  sort = FALSE,
  na.rm = TRUE,
  ...
)

# S4 method for SummarizedExperiment
getPrevalence(
  x,
  assay.type = assay_name,
  assay_name = "counts",
  as_relative = FALSE,
  rank = NULL,
  ...
)

getPrevalentFeatures(x, ...)

# S4 method for ANY
getPrevalentFeatures(x, prevalence = 50/100, include_lowest = FALSE, ...)

# S4 method for SummarizedExperiment
getPrevalentFeatures(
  x,
  rank = NULL,
  prevalence = 50/100,
  include_lowest = FALSE,
  ...
)

getPrevalentTaxa(x, ...)

# S4 method for ANY
getPrevalentTaxa(x, ...)

getRareFeatures(x, ...)

# S4 method for ANY
getRareFeatures(x, prevalence = 50/100, include_lowest = FALSE, ...)

# S4 method for SummarizedExperiment
getRareFeatures(
  x,
  rank = NULL,
  prevalence = 50/100,
  include_lowest = FALSE,
  ...
)

getRareTaxa(x, ...)

# S4 method for ANY
getRareTaxa(x, ...)

subsetByPrevalentFeatures(x, ...)

# S4 method for SummarizedExperiment
subsetByPrevalentFeatures(x, rank = NULL, ...)

subsetByPrevalentTaxa(x, ...)

# S4 method for ANY
subsetByPrevalentTaxa(x, ...)

subsetByRareFeatures(x, ...)

# S4 method for SummarizedExperiment
subsetByRareFeatures(x, rank = NULL, ...)

subsetByRareTaxa(x, ...)

# S4 method for ANY
subsetByRareTaxa(x, ...)

getPrevalentAbundance(
  x,
  assay.type = assay_name,
  assay_name = "relabundance",
  ...
)

# S4 method for ANY
getPrevalentAbundance(
  x,
  assay.type = assay_name,
  assay_name = "relabundance",
  ...
)

# S4 method for SummarizedExperiment
getPrevalentAbundance(x, assay.type = assay_name, assay_name = "counts", ...)

Arguments

x

a SummarizedExperiment object

...

additional arguments

  • If !is.null(rank), arguments are passed on to agglomerateByRank. See ?agglomerateByRank for more details. Note that whether empty ranks are removed is controlled with agg.na.rm rather than na.rm. (default: FALSE)

  • for getPrevalentFeatures, getRareFeatures, subsetByPrevalentFeatures and subsetByRareFeatures additional parameters passed to getPrevalence

  • for getPrevalentAbundance additional parameters passed to getPrevalentFeatures

detection

Detection threshold for absence/presence. Either an absolute value compared directly to the values of x, or a relative value between 0 and 1 if as_relative = TRUE.

include_lowest

logical scalar: Should the lower boundary of the detection and prevalence cutoffs be included? (default: FALSE)

sort

logical scalar: Should the result be sorted by prevalence? (default: FALSE)

na.rm

logical scalar: Should NA values be omitted when calculating prevalence? (default: na.rm = TRUE)

assay.type

A single character value for selecting the assay to use for prevalence calculation.

assay_name

a single character value for specifying which assay to use for calculation. (Please use assay.type instead. At some point assay_name will be disabled.)

as_relative

logical scalar: Should the detection threshold be applied on compositional (relative) abundances? (default: FALSE)

rank

a single character value defining a taxonomic rank. Must be one of the values returned by taxonomyRanks().

prevalence

Prevalence threshold (in 0 to 1). The required prevalence is strictly greater by default. To include the limit, set include_lowest to TRUE.

Value

subsetByPrevalentFeatures and subsetByRareFeatures return a subset of x.

All other functions return named vectors:

  • getPrevalence returns a numeric vector with the names being set to either the row names of x or the names after agglomeration.

  • getPrevalentAbundance returns a numeric vector whose names correspond to the column names of x and whose values are the joint abundance of the prevalent taxa in each sample.

  • getPrevalentFeatures and getRareFeatures return a character vector containing only the names that pass the threshold set by prevalence, if the rownames of x are set. Otherwise an integer vector matching the rows of x is returned.

Details

getPrevalence calculates the relative frequency of samples that exceed the detection threshold. For SummarizedExperiment objects, the prevalence is calculated for the selected taxonomic rank, otherwise for the rows. The absolute population prevalence can be obtained by multiplying the prevalence by the number of samples (ncol(x)). If as_relative = TRUE, the relative abundances (between 0 and 1) are checked against the detection threshold.
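
The calculation described above can be sketched in base R. This is a minimal illustration, not mia's implementation: the helper prevalence_sketch and the toy matrix are hypothetical, and the sketch assumes that relative abundances are obtained by dividing each sample by its column total.

```r
# Toy count matrix: 3 features (rows) x 4 samples (columns); values are made up.
counts <- matrix(c( 0,  5,  0,  2,
                   10, 20, 30, 40,
                    0,  0,  0,  1),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("taxonA", "taxonB", "taxonC"),
                                 paste0("S", 1:4)))

# Hypothetical helper (not part of mia): fraction of samples in which
# each feature passes the detection threshold.
prevalence_sketch <- function(mat, detection = 0, include_lowest = FALSE,
                              as_relative = FALSE, na.rm = TRUE) {
  if (as_relative) {
    # Assumption: convert each sample (column) to proportions summing to 1
    mat <- sweep(mat, 2, colSums(mat, na.rm = TRUE), "/")
  }
  # Strictly greater by default; include the boundary on request
  detected <- if (include_lowest) mat >= detection else mat > detection
  rowMeans(detected, na.rm = na.rm)
}

# Relative frequencies (between 0 and 1)
prev <- prevalence_sketch(counts, detection = 0)

# Absolute prevalence: number of samples in which each feature is detected
prev_count <- prev * ncol(counts)

# Detection threshold of 10 percent applied to relative abundances
rel_prev <- prevalence_sketch(counts, detection = 0.1, as_relative = TRUE)
```

With as_relative = TRUE the threshold is read as a proportion, so a value such as 0.1 means 10 percent of a sample's total rather than a count of 0.1.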

The core abundance index from getPrevalentAbundance gives the relative proportion of the core species (between 0 and 1) in each sample. The core taxa are defined as those that exceed the given population prevalence threshold at the given detection level, as set for getPrevalentFeatures.
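
As a rough illustration of the core abundance index, the base R sketch below sums the relative abundance of the core taxa in each sample. It is a sketch under stated assumptions rather than the package's code: the toy matrix is made up, detection is fixed at 0, and the prevalence threshold is 50 percent (strictly greater).

```r
# Toy count matrix: 3 features (rows) x 4 samples (columns); values are made up.
counts <- matrix(c( 0,  5,  3,  2,
                   10, 20, 30, 40,
                    5,  0,  0,  1),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("taxonA", "taxonB", "taxonC"),
                                 paste0("S", 1:4)))

# Relative abundances: each sample (column) sums to 1
rel <- sweep(counts, 2, colSums(counts), "/")

# Prevalence at detection 0: fraction of samples where the taxon is present
prev <- rowMeans(rel > 0)

# Core taxa: prevalence strictly greater than 50 percent
core <- names(prev)[prev > 0.5]

# Core abundance index: joint relative abundance of the core taxa per sample
core_abundance <- colSums(rel[core, , drop = FALSE])
```

Here taxonC is present in exactly half of the samples, so it is not counted as core under the strict comparison, and the index for a sample falls below 1 wherever taxonC contributes reads.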

subsetByPrevalentFeatures and subsetByRareFeatures return a subset of x. The subset includes the prevalent or rare taxa as determined by getPrevalentFeatures or getRareFeatures, respectively.

getPrevalentFeatures returns the taxa whose prevalence exceeds the prevalence threshold at the given detection threshold for the selected taxonomic rank.

getRareFeatures returns the complement of getPrevalentFeatures.
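
The relationship between the prevalent and rare sets, and the effect of include_lowest on a feature that sits exactly at the threshold, can be illustrated with a small base R sketch. The helper split_by_prevalence and the prevalence values are hypothetical, and the sketch simply assumes that the rare set is whatever the prevalent set leaves out.

```r
# Made-up prevalence values for three features
prev <- c(taxonA = 0.5, taxonB = 1, taxonC = 0.25)

# Hypothetical helper (not part of mia): split features into prevalent and
# rare sets; the rare set is defined as the complement of the prevalent set.
split_by_prevalence <- function(prev, prevalence = 50/100,
                                include_lowest = FALSE) {
  keep <- if (include_lowest) prev >= prevalence else prev > prevalence
  list(prevalent = names(prev)[keep],
       rare = names(prev)[!keep])
}

# Default: strictly greater, so taxonA (exactly 0.5) counts as rare
strict <- split_by_prevalence(prev)

# include_lowest = TRUE: the boundary is included, so taxonA is prevalent
inclusive <- split_by_prevalence(prev, include_lowest = TRUE)
```

In both cases every feature lands in exactly one of the two sets, which is what makes the rare set a complement.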

References

A Salonen et al. The adult intestinal core microbiota is determined by analysis depth and health status. Clinical Microbiology and Infection 18(S4):16-20, 2012. To cite the R package, see citation('mia').

Author

Leo Lahti. For getPrevalentAbundance: Leo Lahti and Tuomas Borman. Contact: microbiome.github.io

Examples

data(GlobalPatterns)
tse <- GlobalPatterns
# Get prevalence estimates for individual ASV/OTU
prevalence.frequency <- getPrevalence(tse,
                                      detection = 0,
                                      sort = TRUE,
                                      as_relative = TRUE)
head(prevalence.frequency)
#> 145149 114821 108747 526804 332405  98605 
#>      1      1      1      1      1      1 

# Get prevalence estimates for phyla
# - the getPrevalence function itself always returns population frequencies
prevalence.frequency <- getPrevalence(tse,
                                      rank = "Phylum",
                                      detection = 0,
                                      sort = TRUE,
                                      as_relative = TRUE)
head(prevalence.frequency)
#>     Phylum:Chloroflexi           Phylum:WPS-2      Phylum:Firmicutes 
#>                      1                      1                      1 
#>  Phylum:Planctomycetes Phylum:Verrucomicrobia             Phylum:WS3 
#>                      1                      1                      1 

# - to obtain population counts, multiply frequencies with the sample size,
# which answers the question "In how many samples is this phylum detectable"
prevalence.count <- prevalence.frequency * ncol(tse)
head(prevalence.count)
#>     Phylum:Chloroflexi           Phylum:WPS-2      Phylum:Firmicutes 
#>                     26                     26                     26 
#>  Phylum:Planctomycetes Phylum:Verrucomicrobia             Phylum:WS3 
#>                     26                     26                     26 

# Detection threshold 10 (strictly greater by default);
# Note that the data (GlobalPatterns) is here in absolute counts
# (and not compositional, relative abundances)
# Prevalence threshold 50 percent (strictly greater by default)
prevalent <- getPrevalentFeatures(tse,
                              rank = "Phylum",
                              detection = 10,
                              prevalence = 50/100,
                              as_relative = FALSE)
head(prevalent)
#> [1] "Phylum:Crenarchaeota"  "Phylum:Euryarchaeota"  "Phylum:Actinobacteria"
#> [4] "Phylum:Spirochaetes"   "Phylum:Proteobacteria" "Phylum:Fusobacteria"  

# Gets a subset of object that includes prevalent taxa
altExp(tse, "prevalent") <- subsetByPrevalentFeatures(tse,
                                       rank = "Family",
                                       detection = 0.001,
                                       prevalence = 0.55,
                                       as_relative = TRUE)
altExp(tse, "prevalent")                                 
#> class: TreeSummarizedExperiment 
#> dim: 5 26 
#> metadata(1): agglomerated_by_rank
#> assays(1): counts
#> rownames(5): Order:Stramenopiles Family:Rhodobacteraceae_1
#>   Family:Flavobacteriaceae Order:Sphingobacteriales
#>   Family:Lachnospiraceae
#> rowData names(7): Kingdom Phylum ... Genus Species
#> colnames(26): CL3 CC1 ... Even2 Even3
#> colData names(7): X.SampleID Primer ... SampleType Description
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> rowLinks: a LinkDataFrame (5 rows)
#> rowTree: 1 phylo tree(s) (19216 leaves)
#> colLinks: NULL
#> colTree: NULL

# getRareFeatures returns the complement (the taxa that are not prevalent)
rare <- getRareFeatures(tse,
                    rank = "Phylum",
                    detection = 1/100,
                    prevalence = 50/100,
                    as_relative = TRUE)
head(rare)
#> [1] "Phylum:Crenarchaeota" "Phylum:Euryarchaeota" "Phylum:Spirochaetes" 
#> [4] "Phylum:MVP-15"        "Phylum:SBR1093"       "Phylum:Fusobacteria" 

# Gets a subset of object that includes rare taxa
altExp(tse, "rare") <- subsetByRareFeatures(tse,
                             rank = "Class",
                             detection = 0.001,
                             prevalence = 0.001,
                             as_relative = TRUE)
altExp(tse, "rare")      
#> class: TreeSummarizedExperiment 
#> dim: 105 26 
#> metadata(1): agglomerated_by_rank
#> assays(1): counts
#> rownames(105): Class:Thermoprotei Class:Sd-NA ... Class:Thermotogae
#>   Class:Synergistia
#> rowData names(7): Kingdom Phylum ... Genus Species
#> colnames(26): CL3 CC1 ... Even2 Even3
#> colData names(7): X.SampleID Primer ... SampleType Description
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> rowLinks: a LinkDataFrame (105 rows)
#> rowTree: 1 phylo tree(s) (19216 leaves)
#> colLinks: NULL
#> colTree: NULL

# Names of both alternative experiments, prevalent and rare, are listed under altExpNames
tse
#> class: TreeSummarizedExperiment 
#> dim: 19216 26 
#> metadata(0):
#> assays(1): counts
#> rownames(19216): 549322 522457 ... 200359 271582
#> rowData names(7): Kingdom Phylum ... Genus Species
#> colnames(26): CL3 CC1 ... Even2 Even3
#> colData names(7): X.SampleID Primer ... SampleType Description
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(2): prevalent rare
#> rowLinks: a LinkDataFrame (19216 rows)
#> rowTree: 1 phylo tree(s) (19216 leaves)
#> colLinks: NULL
#> colTree: NULL
                         
data(esophagus)
getPrevalentAbundance(esophagus, assay.type = "counts")
#> Warning: The 'getPrevalentTaxa' function is deprecated. Use 'getPrevalentFeatures' instead.
#>         B         C         D 
#> 0.9605911 0.8980392 0.9086758