These functions work on data present in rowData and define a way to represent taxonomic data alongside the features of a SummarizedExperiment.

TAXONOMY_RANKS

taxonomyRanks(x)

# S4 method for SummarizedExperiment
taxonomyRanks(x)

taxonomyRankEmpty(
  x,
  rank = taxonomyRanks(x)[1L],
  empty.fields = c(NA, "", " ", "\t", "-", "_")
)

# S4 method for SummarizedExperiment
taxonomyRankEmpty(
  x,
  rank = taxonomyRanks(x)[1],
  empty.fields = c(NA, "", " ", "\t", "-", "_")
)

checkTaxonomy(x, ...)

# S4 method for SummarizedExperiment
checkTaxonomy(x)

setTaxonomyRanks(ranks)

getTaxonomyRanks()

getTaxonomyLabels(x, ...)

# S4 method for SummarizedExperiment
getTaxonomyLabels(
  x,
  empty.fields = c(NA, "", " ", "\t", "-", "_"),
  with_rank = FALSE,
  make_unique = TRUE,
  resolve_loops = FALSE,
  ...
)

mapTaxonomy(x, ...)

# S4 method for SummarizedExperiment
mapTaxonomy(x, taxa = NULL, from = NULL, to = NULL, use_grepl = FALSE)

IdTaxaToDataFrame(from)

Format

A character vector of length 8 containing the recognized taxonomy ranks. Functions match these ranks case-insensitively.

Arguments

x

a SummarizedExperiment object

rank

a single character value defining a taxonomic rank. Must be one of the values returned by taxonomyRanks(x).

empty.fields

a character vector defining which values should be regarded as empty. (Default: c(NA, "", " ", "\t", "-", "_")). They will be removed if na.rm = TRUE before agglomeration.

...

optional arguments not used currently.

ranks

A vector of ranks to be set

with_rank

TRUE or FALSE: Should the rank be added as a prefix? For example: "Phylum:Crenarchaeota" (default: with_rank = FALSE)

make_unique

TRUE or FALSE: Should the labels be made unique, if there are any duplicates? (default: make_unique = TRUE)

resolve_loops

TRUE or FALSE: Should resolveLoops be applied to the taxonomic data? Please note that this only has an effect if the data is unique. (default: resolve_loops = FALSE)

taxa

a character vector used for subsetting the taxonomic information. If no information is found, NULL is returned for the individual element. (default: NULL)

from

  • For mapTaxonomy: a scalar character value, which must be a valid taxonomic rank. (default: NULL)

  • otherwise a Taxa object as returned by IdTaxa

to

a scalar character value, which must be a valid taxonomic rank. (default: NULL)

use_grepl

TRUE or FALSE: should pattern matching via grepl be used? Otherwise literal matching is used. (default: FALSE)

Value

  • taxonomyRanks: a character vector with all the taxonomic ranks found in colnames(rowData(x))

  • taxonomyRankEmpty: a logical vector with one value per row of x

  • mapTaxonomy: a list per element of taxa. Each element is either a DataFrame, a character or NULL. If all character results have the length of one, a single character vector is returned.

Details

taxonomyRanks returns the columns of rowData(x) that are regarded as containing taxonomic information.
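The case-insensitive column lookup can be pictured with a small base R sketch. The rank list and column names below are assumptions made for illustration (they are not read from TAXONOMY_RANKS), and the matching via tolower() is only one plausible way to do it, not necessarily the package's implementation.

```r
# Assumed rank names, lower case; illustrative only, not read from TAXONOMY_RANKS
known_ranks <- c("kingdom", "phylum", "class", "order",
                 "family", "genus", "species")

# Hypothetical column names of a rowData-like table
cols <- c("Kingdom", "Phylum", "Class", "Confidence", "Genus")

# Keep the columns whose lower-cased name is a known rank
found <- cols[tolower(cols) %in% known_ranks]
found
```

Columns that do not correspond to a rank, such as the made-up "Confidence" column, are simply left out.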

taxonomyRankEmpty checks whether the selected rank is empty, i.e. contains no information.
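As a rough illustration of what "empty" means here, the following base R sketch flags the entries of a rank column that match the default empty.fields values. The vector species is made-up data, and the comparison via %in% is an assumption about how such a check could be written, not a statement about how taxonomyRankEmpty is implemented.

```r
# Hypothetical rank column; values are invented for illustration
species <- c("Sulfolobusacidocaldarius", NA, "", "-", "Jonquetellaanthropi")

# Default values regarded as empty (copied from the usage section)
empty.fields <- c(NA, "", " ", "\t", "-", "_")

# %in% treats NA as a matchable value, so NA entries are flagged as well
is_empty <- species %in% empty.fields
table(is_empty)
```

The result is one logical value per entry, which is why it can be tabulated in the same way as the taxonomyRankEmpty output in the examples.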

checkTaxonomy checks whether the taxonomy information is valid and whether it contains any problems. This is a soft test that reports some diagnostics and might mature into a data validator used upon object creation.

getTaxonomyLabels generates a character vector with one label per row, consisting of the lowest taxonomic information available. If data from different levels is mixed, the taxonomic level is prepended by default.
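The idea of "lowest taxonomic information available" can be sketched in base R as follows. The table tax is invented data, and the steps (find the right-most non-empty rank, prepend the rank name, then de-duplicate with make.unique) are an assumed reading of the description above rather than the package's actual code.

```r
# Hypothetical taxonomy table; values are invented for illustration
tax <- data.frame(
  Phylum = c("Crenarchaeota", "Crenarchaeota", "Crenarchaeota"),
  Class  = c("Thermoprotei", "Thermoprotei", "Thaumarchaeota"),
  Order  = c(NA, NA, "Cenarchaeales"),
  stringsAsFactors = FALSE
)
empty.fields <- c(NA, "", " ", "\t", "-", "_")

# Work on a character matrix so that rows can be scanned uniformly
m <- as.matrix(tax)

# For each row, index of the lowest (right-most) non-empty rank
lowest <- apply(m, 1L, function(row) max(which(!(row %in% empty.fields))))

# Take the value at that rank and prepend the rank name
labels <- paste0(colnames(m)[lowest], ":", m[cbind(seq_len(nrow(m)), lowest)])

# Disambiguate duplicated labels with a numeric suffix
labels <- make.unique(labels, sep = "_")
labels
```

The sketch assumes that every row has at least one non-empty rank; a row without any would need separate handling.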

IdTaxaToDataFrame extracts taxonomic results from results of IdTaxa.

mapTaxonomy maps the given features (taxonomic groups; taxa) to the specified taxonomic level (to argument) in rowData of the SummarizedExperiment data object (i.e. rowData(x)[,taxonomyRanks(x)]). If the argument to is not provided, then all matching taxonomy rows in rowData will be returned. This function allows handy conversions between different taxonomic levels.
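A minimal base R sketch of this kind of lookup is shown below, using literal matching (the use_grepl = FALSE case). The table tax and the variable names are hypothetical, and the logic is an assumed simplification, not the mapTaxonomy implementation.

```r
# Hypothetical taxonomy table; values are invented for illustration
tax <- data.frame(
  Family = c("Enterobacteriaceae", "Sulfolobaceae"),
  Genus  = c("Escherichia", "Sulfolobus"),
  stringsAsFactors = FALSE
)

taxa <- "Escherichia"  # the feature to look up
to   <- "Family"       # the rank to map it to

# Literal matching: keep rows in which any rank equals the query
hit <- rowSums(tax == taxa, na.rm = TRUE) > 0

# Return the requested rank for the matching rows, named by the query
res <- unique(tax[hit, to])
names(res) <- taxa
res
```

Leaving out the selection by to and returning tax[hit, ] instead would correspond to the case where all matching taxonomy rows are returned.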

Taxonomic information from the IdTaxa function of the DECIPHER package is returned as a special class. With as(taxa,"DataFrame") the information can be easily converted to a DataFrame compatible with storing the taxonomic information as rowData. Please note that the assigned confidence information is returned as metadata and can be accessed using metadata(df)$confidence.

Examples

data(GlobalPatterns)
GlobalPatterns
#> class: TreeSummarizedExperiment 
#> dim: 19216 26 
#> metadata(0):
#> assays(1): counts
#> rownames(19216): 549322 522457 ... 200359 271582
#> rowData names(7): Kingdom Phylum ... Genus Species
#> colnames(26): CL3 CC1 ... Even2 Even3
#> colData names(7): X.SampleID Primer ... SampleType Description
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> rowLinks: a LinkDataFrame (19216 rows)
#> rowTree: 1 phylo tree(s) (19216 leaves)
#> colLinks: NULL
#> colTree: NULL
taxonomyRanks(GlobalPatterns)
#> [1] "Kingdom" "Phylum"  "Class"   "Order"   "Family"  "Genus"   "Species"

checkTaxonomy(GlobalPatterns)
#> [1] TRUE

table(taxonomyRankEmpty(GlobalPatterns,"Kingdom"))
#> 
#> FALSE 
#> 19216 
table(taxonomyRankEmpty(GlobalPatterns,"Species"))
#> 
#> FALSE  TRUE 
#>  1413 17803 

getTaxonomyLabels(GlobalPatterns[1:20,])
#>  [1] "Class:Thermoprotei"               "Class:Thermoprotei_1"            
#>  [3] "Species:Sulfolobusacidocaldarius" "Class:Sd-NA"                     
#>  [5] "Class:Sd-NA_1"                    "Class:Sd-NA_2"                   
#>  [7] "Order:NRP-J"                      "Order:NRP-J_1"                   
#>  [9] "Order:NRP-J_2"                    "Order:NRP-J_3"                   
#> [11] "Order:NRP-J_4"                    "Family:SAGMA-X"                  
#> [13] "Family:SAGMA-X_1"                 "Family:SAGMA-X_2"                
#> [15] "Family:Cenarchaeaceae"            "Family:Cenarchaeaceae_1"         
#> [17] "Family:Cenarchaeaceae_2"          "Family:Cenarchaeaceae_3"         
#> [19] "Family:Cenarchaeaceae_4"          "Family:Cenarchaeaceae_5"         

# mapTaxonomy
## returns the unique taxonomic information
mapTaxonomy(GlobalPatterns)
#> DataFrame with 2307 rows and 7 columns
#>            Kingdom        Phylum          Class                  Order
#>        <character>   <character>    <character>            <character>
#> 549322     Archaea Crenarchaeota   Thermoprotei                     NA
#> 951        Archaea Crenarchaeota   Thermoprotei           Sulfolobales
#> 244423     Archaea Crenarchaeota          Sd-NA                     NA
#> 143239     Archaea Crenarchaeota          Sd-NA                  NRP-J
#> 215972     Archaea Crenarchaeota Thaumarchaeota          Cenarchaeales
#> ...            ...           ...            ...                    ...
#> 246195    Bacteria Synergistetes    Synergistia          Synergistales
#> 484439    Bacteria Synergistetes    Synergistia          Synergistales
#> 579616    Bacteria Synergistetes    Synergistia          Synergistales
#> 546622    Bacteria    Firmicutes     Clostridia Thermoanaerobacterales
#> 278222    Bacteria           SR1             NA                     NA
#>                        Family             Genus                Species
#>                   <character>       <character>            <character>
#> 549322                     NA                NA                     NA
#> 951             Sulfolobaceae        Sulfolobus Sulfolobusacidocalda..
#> 244423                     NA                NA                     NA
#> 143239                     NA                NA                     NA
#> 215972                SAGMA-X                NA                     NA
#> ...                       ...               ...                    ...
#> 246195 Dethiosulfovibrionac..               TG5                     NA
#> 484439 Dethiosulfovibrionac..       Jonquetella    Jonquetellaanthropi
#> 579616 Dethiosulfovibrionac..    Pyramidobacter                     NA
#> 546622   Thermodesulfobiaceae Coprothermobacter                     NA
#> 278222                     NA                NA                     NA
# returns specific unique taxonomic information
mapTaxonomy(GlobalPatterns, taxa = "Escherichia")
#> $Escherichia
#> DataFrame with 1 row and 7 columns
#>            Kingdom         Phylum               Class             Order
#>        <character>    <character>         <character>       <character>
#> 249227    Bacteria Proteobacteria Gammaproteobacteria Enterobacteriales
#>                    Family       Genus     Species
#>               <character> <character> <character>
#> 249227 Enterobacteriaceae Escherichia          NA
#> 
# returns information on a single output
mapTaxonomy(GlobalPatterns, taxa = "Escherichia",to="Family")
#>          Escherichia 
#> "Enterobacteriaceae" 

# setTaxonomyRanks
tse <- GlobalPatterns
colnames(rowData(tse))[1] <- "TAXA1"

setTaxonomyRanks(colnames(rowData(tse)))
# Taxonomy ranks set to: taxa1 phylum class order family genus species 

# getTaxonomyRanks returns the currently set ranks; the first rank is now "taxa1"
getTaxonomyRanks()
#> [1] "taxa1"   "phylum"  "class"   "order"   "family"  "genus"   "species"