These functions work on data present in rowData and define a way to represent taxonomic data alongside the features of a SummarizedExperiment.
TAXONOMY_RANKS
taxonomyRanks(x)
# S4 method for SummarizedExperiment
taxonomyRanks(x)
taxonomyRankEmpty(
x,
rank = taxonomyRanks(x)[1L],
empty.fields = c(NA, "", " ", "\t", "-", "_")
)
# S4 method for SummarizedExperiment
taxonomyRankEmpty(
x,
rank = taxonomyRanks(x)[1],
empty.fields = c(NA, "", " ", "\t", "-", "_")
)
checkTaxonomy(x, ...)
# S4 method for SummarizedExperiment
checkTaxonomy(x)
setTaxonomyRanks(ranks)
getTaxonomyRanks()
getTaxonomyLabels(x, ...)
# S4 method for SummarizedExperiment
getTaxonomyLabels(
x,
empty.fields = c(NA, "", " ", "\t", "-", "_"),
with_rank = FALSE,
make_unique = TRUE,
resolve_loops = FALSE,
...
)
mapTaxonomy(x, ...)
# S4 method for SummarizedExperiment
mapTaxonomy(x, taxa = NULL, from = NULL, to = NULL, use_grepl = FALSE)
IdTaxaToDataFrame(from)
TAXONOMY_RANKS: a character vector of length 8 containing the recognized taxonomy ranks. Functions match against it case-insensitively.
x: a SummarizedExperiment object
rank: a single character value defining a taxonomic rank. Must be one of the values returned by taxonomyRanks().
empty.fields: a character vector defining which values should be regarded as empty (default: c(NA, "", " ", "\t", "-", "_")). Such values are removed before agglomeration if na.rm = TRUE.
...: optional arguments, currently not used.
ranks: a character vector of ranks to be set
with_rank: TRUE or FALSE: should the rank be added as a prefix? For example: "Phylum:Crenarchaeota" (default: with_rank = FALSE)
make_unique: TRUE or FALSE: should the labels be made unique if there are any duplicates? (default: make_unique = TRUE)
resolve_loops: TRUE or FALSE: should resolveLoops be applied to the taxonomic data? Note that this only has an effect if the data is unique. (default: resolve_loops = FALSE)
taxa: a character vector, which is used for subsetting the taxonomic information. If no information is found, NULL is returned for the individual element. (default: NULL)
from: for mapTaxonomy, a scalar character value, which must be a valid taxonomic rank (default: NULL). For IdTaxaToDataFrame, a Taxa object as returned by IdTaxa.
to: a scalar character value, which must be a valid taxonomic rank. (default: NULL)
use_grepl: TRUE or FALSE: should pattern matching via grepl be used? Otherwise literal matching is used. (default: FALSE)
taxonomyRanks: a character vector with all the taxonomic ranks found in colnames(rowData(x))
taxonomyRankEmpty: a logical value
mapTaxonomy: a list with one element per element of taxa. Each element is either a DataFrame, a character vector or NULL. If all character results have length one, a single character vector is returned.
taxonomyRanks returns which columns of rowData(x) are regarded as columns containing taxonomic information.
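The case-insensitive matching of column names against the known ranks can be pictured with a small base-R sketch. It is not the package's code: the helper find_taxonomy_columns() and the constant taxonomy_ranks_default are hypothetical, and the constant assumes the eight default ranks are domain, kingdom, phylum, class, order, family, genus and species.

```r
# Hypothetical stand-in for TAXONOMY_RANKS (assumed lower-case default ranks).
taxonomy_ranks_default <- c("domain", "kingdom", "phylum", "class",
                            "order", "family", "genus", "species")

# Sketch: keep the column names whose lower-cased form is a known rank,
# preserving their original spelling and order.
find_taxonomy_columns <- function(col_names, ranks = taxonomy_ranks_default) {
  col_names[tolower(col_names) %in% ranks]
}

# A toy table standing in for rowData(x); "Abundance" is not a rank.
row_data <- data.frame(
  Kingdom = c("Bacteria", "Archaea"),
  Phylum = c("Firmicutes", "Crenarchaeota"),
  Abundance = c(10, 3)
)
find_taxonomy_columns(colnames(row_data))
# returns c("Kingdom", "Phylum")
```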
taxonomyRankEmpty checks whether a selected rank is empty of information.
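A minimal base-R sketch of such an emptiness check, assuming a plain data.frame in place of rowData(x). The helper rank_is_empty() is hypothetical; it reuses the documented empty.fields default and relies on %in% treating NA as equal to NA.

```r
# Documented default for values regarded as empty.
empty_fields <- c(NA, "", " ", "\t", "-", "_")

# Sketch: TRUE for every row whose value at `rank` is one of the empty fields.
rank_is_empty <- function(row_data, rank, empty.fields = empty_fields) {
  row_data[[rank]] %in% empty.fields
}

row_data <- data.frame(
  Genus = c("Sulfolobus", NA, "", "-"),
  Species = c(NA, NA, "Jonquetellaanthropi", "_"),
  stringsAsFactors = FALSE
)
rank_is_empty(row_data, "Genus")
# returns c(FALSE, TRUE, TRUE, TRUE)
table(rank_is_empty(row_data, "Species"))
```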
checkTaxonomy checks whether the taxonomy information is valid and whether it contains any problems. This is a soft test, which reports some diagnostics and might mature into a data validator used upon object creation.
getTaxonomyLabels generates a character vector with one label per row, consisting of the lowest taxonomic information available. If data from different levels is to be mixed, the taxonomic level is prepended by default.
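The labelling idea can be illustrated with a base-R sketch under simplifying assumptions: lowest_rank_labels() is hypothetical, it assumes every row has at least one non-empty rank, and it does not implement resolve_loops. Here make.unique() is used to imitate de-duplicated labels such as "Class:Thermoprotei_1"; whether the package uses the same function is an assumption.

```r
# Documented default for values regarded as empty.
empty_fields <- c(NA, "", " ", "\t", "-", "_")

# Sketch: for each row, use the value at the lowest non-empty rank; prefix it
# with "Rank:" when rows end up on different ranks (or with_rank = TRUE);
# optionally de-duplicate with make.unique().
lowest_rank_labels <- function(tax, with_rank = FALSE, make_unique = TRUE,
                               empty.fields = empty_fields) {
  m <- as.matrix(tax)
  filled <- matrix(!(m %in% empty.fields), nrow = nrow(m))
  stopifnot(all(rowSums(filled) > 0))  # assumption: no fully empty rows
  lowest <- apply(filled, 1, function(r) max(which(r)))
  used <- colnames(m)[lowest]
  labels <- unname(m[cbind(seq_len(nrow(m)), lowest)])
  if (with_rank || length(unique(used)) > 1L) {
    labels <- paste0(used, ":", labels)
  }
  if (make_unique) labels <- make.unique(labels, sep = "_")
  labels
}

tax <- data.frame(
  Phylum = c("Crenarchaeota", "Crenarchaeota", "Crenarchaeota"),
  Class  = c("Thermoprotei", "Thermoprotei", "Thermoprotei"),
  Genus  = c(NA, NA, "Sulfolobus"),
  stringsAsFactors = FALSE
)
lowest_rank_labels(tax)
# returns c("Class:Thermoprotei", "Class:Thermoprotei_1", "Genus:Sulfolobus")
lowest_rank_labels(tax[1:2, ], make_unique = FALSE)
# returns c("Thermoprotei", "Thermoprotei")
```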
IdTaxaToDataFrame extracts taxonomic results from the output of IdTaxa.
mapTaxonomy maps the given features (taxonomic groups; taxa) to the specified taxonomic level (to argument) in rowData of the SummarizedExperiment data object (i.e. rowData(x)[,taxonomyRanks(x)]). If the argument to is not provided, all matching taxonomy rows in rowData are returned. This function allows handy conversions between different taxonomic levels.
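As a rough illustration of the lookup, here is a hypothetical base-R helper, map_taxon(), working on a plain data.frame of ranks. Unlike mapTaxonomy, it handles one taxon at a time and returns either the unique matching rows or the unique values of the to column; the from and use_grepl arguments only mirror the documented ones in spirit.

```r
# Sketch: rows where `taxon` occurs in any rank column (or only in the
# `from` column), reduced to unique rows or to the unique values of `to`.
map_taxon <- function(tax, taxon, from = NULL, to = NULL, use_grepl = FALSE) {
  cols <- if (is.null(from)) colnames(tax) else from
  # Literal matching by default; regular-expression matching if requested.
  match_col <- function(col) {
    if (use_grepl) grepl(taxon, col) else col %in% taxon
  }
  hit <- Reduce(`|`, lapply(tax[cols], match_col))
  rows <- unique(tax[hit, , drop = FALSE])
  if (is.null(to)) rows else unique(rows[[to]])
}

tax <- data.frame(
  Family = c("Enterobacteriaceae", "Enterobacteriaceae", "Sulfolobaceae"),
  Genus  = c("Escherichia", "Escherichia", "Sulfolobus"),
  stringsAsFactors = FALSE
)
map_taxon(tax, "Escherichia", to = "Family")
# returns "Enterobacteriaceae"
map_taxon(tax, "^Sulfo", to = "Genus", use_grepl = TRUE)
# returns "Sulfolobus"
```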
Taxonomic information from the IdTaxa function of the DECIPHER package is returned as a special class. With as(taxa,"DataFrame"), the information can be easily converted to a DataFrame compatible with storing the taxonomic information in rowData. Please note that the assigned confidence information is returned as metadata and can be accessed using metadata(df)$confidence.
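The shape of this conversion can be sketched in base R on a hand-made mock. This is not IdTaxaToDataFrame: the mock, the helper taxa_to_table() and the rank names are hypothetical, each element is assumed to hold a taxon vector starting at "Root" plus a matching confidence vector, and a plain attribute stands in for metadata(df)$confidence.

```r
# Hypothetical mock of two IdTaxa-style assignments.
id_taxa_mock <- list(
  seq1 = list(taxon = c("Root", "Bacteria", "Firmicutes"),
              confidence = c(100, 95, 70)),
  seq2 = list(taxon = c("Root", "Archaea"),
              confidence = c(100, 88))
)

# Sketch: drop "Root", pad every assignment with NA up to the number of
# ranks, and keep confidences in a parallel matrix attached as an attribute.
taxa_to_table <- function(res, ranks) {
  pad <- function(v) {
    v <- v[-1]                  # remove the "Root" level
    length(v) <- length(ranks)  # extend with NA (or truncate) to fit
    v
  }
  taxon <- t(vapply(res, function(r) pad(r$taxon), character(length(ranks))))
  conf <- t(vapply(res, function(r) pad(r$confidence), numeric(length(ranks))))
  colnames(taxon) <- ranks
  colnames(conf) <- ranks
  df <- as.data.frame(taxon, stringsAsFactors = FALSE)
  attr(df, "confidence") <- conf
  df
}

df <- taxa_to_table(id_taxa_mock, ranks = c("kingdom", "phylum"))
df$phylum
# returns c("Firmicutes", NA)
attr(df, "confidence")["seq2", "kingdom"]
# returns 88
```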
library(mia)
data(GlobalPatterns)
GlobalPatterns
#> class: TreeSummarizedExperiment
#> dim: 19216 26
#> metadata(0):
#> assays(1): counts
#> rownames(19216): 549322 522457 ... 200359 271582
#> rowData names(7): Kingdom Phylum ... Genus Species
#> colnames(26): CL3 CC1 ... Even2 Even3
#> colData names(7): X.SampleID Primer ... SampleType Description
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> rowLinks: a LinkDataFrame (19216 rows)
#> rowTree: 1 phylo tree(s) (19216 leaves)
#> colLinks: NULL
#> colTree: NULL
taxonomyRanks(GlobalPatterns)
#> [1] "Kingdom" "Phylum" "Class" "Order" "Family" "Genus" "Species"
checkTaxonomy(GlobalPatterns)
#> [1] TRUE
table(taxonomyRankEmpty(GlobalPatterns,"Kingdom"))
#>
#> FALSE
#> 19216
table(taxonomyRankEmpty(GlobalPatterns,"Species"))
#>
#> FALSE TRUE
#> 1413 17803
getTaxonomyLabels(GlobalPatterns[1:20,])
#> [1] "Class:Thermoprotei" "Class:Thermoprotei_1"
#> [3] "Species:Sulfolobusacidocaldarius" "Class:Sd-NA"
#> [5] "Class:Sd-NA_1" "Class:Sd-NA_2"
#> [7] "Order:NRP-J" "Order:NRP-J_1"
#> [9] "Order:NRP-J_2" "Order:NRP-J_3"
#> [11] "Order:NRP-J_4" "Family:SAGMA-X"
#> [13] "Family:SAGMA-X_1" "Family:SAGMA-X_2"
#> [15] "Family:Cenarchaeaceae" "Family:Cenarchaeaceae_1"
#> [17] "Family:Cenarchaeaceae_2" "Family:Cenarchaeaceae_3"
#> [19] "Family:Cenarchaeaceae_4" "Family:Cenarchaeaceae_5"
# mapTaxonomy
## returns the unique taxonomic information
mapTaxonomy(GlobalPatterns)
#> DataFrame with 2307 rows and 7 columns
#> Kingdom Phylum Class Order
#> <character> <character> <character> <character>
#> 549322 Archaea Crenarchaeota Thermoprotei NA
#> 951 Archaea Crenarchaeota Thermoprotei Sulfolobales
#> 244423 Archaea Crenarchaeota Sd-NA NA
#> 143239 Archaea Crenarchaeota Sd-NA NRP-J
#> 215972 Archaea Crenarchaeota Thaumarchaeota Cenarchaeales
#> ... ... ... ... ...
#> 246195 Bacteria Synergistetes Synergistia Synergistales
#> 484439 Bacteria Synergistetes Synergistia Synergistales
#> 579616 Bacteria Synergistetes Synergistia Synergistales
#> 546622 Bacteria Firmicutes Clostridia Thermoanaerobacterales
#> 278222 Bacteria SR1 NA NA
#> Family Genus Species
#> <character> <character> <character>
#> 549322 NA NA NA
#> 951 Sulfolobaceae Sulfolobus Sulfolobusacidocalda..
#> 244423 NA NA NA
#> 143239 NA NA NA
#> 215972 SAGMA-X NA NA
#> ... ... ... ...
#> 246195 Dethiosulfovibrionac.. TG5 NA
#> 484439 Dethiosulfovibrionac.. Jonquetella Jonquetellaanthropi
#> 579616 Dethiosulfovibrionac.. Pyramidobacter NA
#> 546622 Thermodesulfobiaceae Coprothermobacter NA
#> 278222 NA NA NA
# returns specific unique taxonomic information
mapTaxonomy(GlobalPatterns, taxa = "Escherichia")
#> $Escherichia
#> DataFrame with 1 row and 7 columns
#> Kingdom Phylum Class Order
#> <character> <character> <character> <character>
#> 249227 Bacteria Proteobacteria Gammaproteobacteria Enterobacteriales
#> Family Genus Species
#> <character> <character> <character>
#> 249227 Enterobacteriaceae Escherichia NA
#>
# returns a single value per taxon when 'to' is specified
mapTaxonomy(GlobalPatterns, taxa = "Escherichia",to="Family")
#> Escherichia
#> "Enterobacteriaceae"
# setTaxonomyRanks
tse <- GlobalPatterns
colnames(rowData(tse))[1] <- "TAXA1"
setTaxonomyRanks(colnames(rowData(tse)))
# Taxonomy ranks set to: taxa1 phylum class order family genus species
# getTaxonomyRanks() returns the currently set ranks; "TAXA1" is now stored as "taxa1"
getTaxonomyRanks()
#> [1] "taxa1" "phylum" "class" "order" "family" "genus" "species"
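The example shows that the stored ranks come back in lower case. One simple way to obtain that behaviour is a private environment holding the active ranks, sketched below with the hypothetical helpers set_ranks() and get_ranks(). This is an assumption for illustration, not necessarily how the package stores its setting.

```r
# Hypothetical private store for the currently active ranks.
rank_store <- new.env(parent = emptyenv())
assign("ranks", c("domain", "kingdom", "phylum", "class",
                  "order", "family", "genus", "species"), envir = rank_store)

set_ranks <- function(ranks) {
  # Reject anything that is not a non-empty character vector.
  stopifnot(is.character(ranks), length(ranks) > 0L)
  # Lower-case the names so later matching can be case-insensitive.
  assign("ranks", tolower(ranks), envir = rank_store)
  invisible(get("ranks", envir = rank_store))
}

get_ranks <- function() get("ranks", envir = rank_store)

set_ranks(c("TAXA1", "Phylum", "Class"))
get_ranks()
# returns c("taxa1", "phylum", "class")
```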