These functions work on data present in rowData and define a way to represent taxonomic data alongside the features of a SummarizedExperiment.
taxonomyRanks(x)

# S4 method for class 'SummarizedExperiment'
taxonomyRanks(x)

taxonomyRankEmpty(
  x,
  rank = taxonomyRanks(x)[1L],
  empty.fields = c(NA, "", " ", "\t", "-", "_")
)

# S4 method for class 'SummarizedExperiment'
taxonomyRankEmpty(
  x,
  rank = taxonomyRanks(x)[1],
  empty.fields = c(NA, "", " ", "\t", "-", "_")
)

checkTaxonomy(x, ...)

# S4 method for class 'SummarizedExperiment'
checkTaxonomy(x)

setTaxonomyRanks(ranks)

getTaxonomyRanks()

getTaxonomyLabels(x, ...)

# S4 method for class 'SummarizedExperiment'
getTaxonomyLabels(
  x,
  empty.fields = c(NA, "", " ", "\t", "-", "_"),
  with.rank = with_rank,
  with_rank = FALSE,
  make.unique = make_unique,
  make_unique = TRUE,
  resolve.loops = resolve_loops,
  resolve_loops = FALSE,
  ...
)

mapTaxonomy(x, ...)

# S4 method for class 'SummarizedExperiment'
mapTaxonomy(
  x,
  taxa = NULL,
  from = NULL,
  to = NULL,
  use.grepl = use_grepl,
  use_grepl = FALSE
)

IdTaxaToDataFrame(from)
rank: Character scalar. Defines a taxonomic rank. Must be one of the values returned by taxonomyRanks().
empty.fields: Character vector. Defines which values should be regarded as empty. (Default: c(NA, "", " ", "\t", "-", "_")). They will be removed if na.rm = TRUE before agglomeration.
...: Optional arguments, currently not used.
ranks: Character vector. A vector of ranks to be set.
with.rank: Logical scalar. Should the rank be added as a prefix? For example: "Phylum:Crenarchaeota". (Default: FALSE)
with_rank: Deprecated. Use with.rank instead.
make.unique: Logical scalar. Should the labels be made unique if there are any duplicates? (Default: TRUE)
make_unique: Deprecated. Use make.unique instead.
resolve.loops: Logical scalar. Should resolveLoops be applied to the taxonomic data? Please note that this only has an effect if the data is unique. (Default: FALSE)
resolve_loops: Deprecated. Use resolve.loops instead.
taxa: Character vector. Used for subsetting the taxonomic information. If no information is found, NULL is returned for the individual element. (Default: NULL)
from: For mapTaxonomy: Character scalar. Must be a valid taxonomic rank. (Default: NULL) Otherwise, a Taxa object as returned by IdTaxa.
to: Character scalar. Must be a valid taxonomic rank. (Default: NULL)
use.grepl: Logical scalar. Should pattern matching via grepl be used? Otherwise literal matching is used. (Default: FALSE)
use_grepl: Deprecated. Use use.grepl instead.
taxonomyRanks: a character vector with all the taxonomic ranks found in colnames(rowData(x))
taxonomyRankEmpty: a logical vector with one value per row of x
mapTaxonomy: a list with one element per element of taxa. Each element is either a DataFrame, a character or NULL. If all character results have a length of one, a single character vector is returned.
taxonomyRanks returns the columns of rowData(x) that are regarded as containing taxonomic information.
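As a rough illustration of this column selection, the base R sketch below keeps only the column names that match a set of known ranks. It assumes that matching ignores case, which is suggested by the lowercase ranks that setTaxonomyRanks reports in the examples; the vectors known_ranks and cols are invented for the illustration and the code is not the package's implementation.

```r
# Assumed set of known ranks (lowercase, as reported by getTaxonomyRanks())
known_ranks <- c("kingdom", "phylum", "class", "order",
                 "family", "genus", "species")

# Invented column names standing in for colnames(rowData(x))
cols <- c("Kingdom", "Phylum", "Genus", "Confidence", "Species")

# Keep the columns whose lowercase name is a known rank
tax_cols <- cols[tolower(cols) %in% known_ranks]
print(tax_cols)
```

In this toy input, "Confidence" is dropped because it is not a known rank, while the remaining names keep their original capitalisation.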
taxonomyRankEmpty checks whether a selected rank is empty of information.
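The emptiness check can be pictured as a membership test of each value of the selected rank against empty.fields. The following is a minimal base R sketch under that assumption, not the package's implementation; the helper is_rank_empty and the toy species vector are hypothetical.

```r
# Hypothetical helper (not part of the package): flags values that count as
# empty according to `empty.fields`.
is_rank_empty <- function(values,
                          empty.fields = c(NA, "", " ", "\t", "-", "_")) {
    # %in% matches NA against NA, so missing values are flagged as well
    values %in% empty.fields
}

# Toy rank column, invented for this example
species <- c("Sulfolobusacidocaldarius", NA, "", "-", "Jonquetellaanthropi")
empty <- is_rank_empty(species)
print(empty)
table(empty)
```

Using %in% rather than == avoids NA results for missing values, so the output can be passed straight to table() or used for subsetting.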
checkTaxonomy checks whether the taxonomy information is valid and whether it contains any problems. This is a soft test, which reports some diagnostics and might mature into a data validator used upon object creation.
getTaxonomyLabels generates a character vector with one label per row, built from the lowest taxonomic rank that carries information. If labels from different ranks are mixed, the rank is prepended by default.
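To make the labelling rule more concrete, here is a minimal base R sketch that works on a plain data.frame of ranks. It is an approximation under stated assumptions, not the package's code: the helper lowest_rank_labels and the toy table are hypothetical, columns are assumed to be ordered from highest to lowest rank, and every row is assumed to have at least one non-empty rank.

```r
# Hypothetical helper (not part of the package): one label per row, taken
# from the lowest rank that holds a non-empty value.
lowest_rank_labels <- function(tax,
                               empty.fields = c(NA, "", " ", "\t", "-", "_"),
                               with.rank = FALSE,
                               make.unique = TRUE) {
    values <- as.matrix(tax)
    ranks <- colnames(tax)
    # Logical matrix: TRUE where a rank holds usable information
    filled <- matrix(!(values %in% empty.fields), nrow = nrow(values))
    # Assumption of this sketch: no row is completely empty
    stopifnot(all(rowSums(filled) > 0))
    # Column index of the lowest (right-most) filled rank per row
    lowest <- apply(filled, 1, function(f) max(which(f)))
    labels <- values[cbind(seq_len(nrow(values)), lowest)]
    # Prepend the rank if requested or if labels come from different ranks
    if (with.rank || length(unique(lowest)) > 1L) {
        labels <- paste0(ranks[lowest], ":", labels)
    }
    if (make.unique) {
        labels <- base::make.unique(labels, sep = "_")
    }
    labels
}

# Toy taxonomy table, invented for this example
tax <- data.frame(
    Kingdom = c("Archaea", "Archaea", "Archaea", "Bacteria"),
    Phylum  = c("Crenarchaeota", "Crenarchaeota", "Crenarchaeota", "SR1"),
    Class   = c("Thermoprotei", "Thermoprotei", "Thermoprotei", NA),
    Order   = c(NA, NA, "Sulfolobales", NA),
    stringsAsFactors = FALSE
)
labels <- lowest_rank_labels(tax)
print(labels)
```

The duplicated "Class:Thermoprotei" label receives a "_1" suffix from base::make.unique, which mirrors the shape of the labels shown in the examples.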
IdTaxaToDataFrame extracts taxonomic results from results of IdTaxa.
mapTaxonomy maps the given features (taxonomic groups; taxa) to the specified taxonomic level (to argument) in rowData of the SummarizedExperiment data object (i.e. rowData(x)[,taxonomyRanks(x)]). If the argument to is not provided, then all matching taxonomy rows in rowData will be returned. This function allows handy conversions between different taxonomic levels.
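The lookup described for mapTaxonomy can be sketched in base R on a plain data.frame: find the rows in which the taxon occurs in any rank, either literally or through grepl, and then return either the whole rows or the value at the requested rank. The helper map_taxon and the toy table are hypothetical, and searching across all ranks is an assumption of this sketch rather than a statement about the package's internals.

```r
# Hypothetical helper (not part of the package): look up a taxon in a
# taxonomy table and optionally map it to another rank.
map_taxon <- function(tax, taxon, to = NULL, use.grepl = FALSE) {
    m <- as.matrix(tax)
    hit <- if (use.grepl) {
        # Pattern matching; grepl() returns FALSE for NA entries
        matrix(grepl(taxon, m), nrow = nrow(m))
    } else {
        # Literal matching; NA entries never match
        !is.na(m) & m == taxon
    }
    rows <- rowSums(hit) > 0
    if (!any(rows)) {
        return(NULL)
    }
    res <- unique(tax[rows, , drop = FALSE])
    if (is.null(to)) res else unique(res[[to]])
}

# Toy taxonomy table, invented for this example
tax <- data.frame(
    Family = c("Enterobacteriaceae", "Enterobacteriaceae", "Sulfolobaceae"),
    Genus  = c("Escherichia", "Salmonella", "Sulfolobus"),
    stringsAsFactors = FALSE
)

family  <- map_taxon(tax, "Escherichia", to = "Family")   # literal match
partial <- map_taxon(tax, "^Sulfolob", use.grepl = TRUE)  # pattern match
missing <- map_taxon(tax, "Sulfolob")                     # no literal match
print(family)
print(partial)
```

With literal matching, the partial name "Sulfolob" finds nothing and NULL is returned, while the same prefix used as a pattern matches the row containing "Sulfolobaceae" and "Sulfolobus".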
Taxonomic information from the IdTaxa function of the DECIPHER package is returned as a special class. With as(taxa,"DataFrame") the information can be easily converted to a DataFrame compatible with storing the taxonomic information as rowData. Please note that the assigned confidence information is returned as metadata and can be accessed using metadata(df)$confidence.
data(GlobalPatterns)
GlobalPatterns
#> class: TreeSummarizedExperiment
#> dim: 19216 26
#> metadata(0):
#> assays(1): counts
#> rownames(19216): 549322 522457 ... 200359 271582
#> rowData names(7): Kingdom Phylum ... Genus Species
#> colnames(26): CL3 CC1 ... Even2 Even3
#> colData names(7): X.SampleID Primer ... SampleType Description
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> rowLinks: a LinkDataFrame (19216 rows)
#> rowTree: 1 phylo tree(s) (19216 leaves)
#> colLinks: NULL
#> colTree: NULL
taxonomyRanks(GlobalPatterns)
#> [1] "Kingdom" "Phylum" "Class" "Order" "Family" "Genus" "Species"
checkTaxonomy(GlobalPatterns)
#> [1] TRUE
table(taxonomyRankEmpty(GlobalPatterns,"Kingdom"))
#>
#> FALSE
#> 19216
table(taxonomyRankEmpty(GlobalPatterns,"Species"))
#>
#> FALSE TRUE
#> 1413 17803
getTaxonomyLabels(GlobalPatterns[1:20,])
#> [1] "Class:Thermoprotei" "Class:Thermoprotei_1"
#> [3] "Species:Sulfolobusacidocaldarius" "Class:Sd-NA"
#> [5] "Class:Sd-NA_1" "Class:Sd-NA_2"
#> [7] "Order:NRP-J" "Order:NRP-J_1"
#> [9] "Order:NRP-J_2" "Order:NRP-J_3"
#> [11] "Order:NRP-J_4" "Family:SAGMA-X"
#> [13] "Family:SAGMA-X_1" "Family:SAGMA-X_2"
#> [15] "Family:Cenarchaeaceae" "Family:Cenarchaeaceae_1"
#> [17] "Family:Cenarchaeaceae_2" "Family:Cenarchaeaceae_3"
#> [19] "Family:Cenarchaeaceae_4" "Family:Cenarchaeaceae_5"
# mapTaxonomy
## returns the unique taxonomic information
mapTaxonomy(GlobalPatterns)
#> DataFrame with 2307 rows and 7 columns
#> Kingdom Phylum Class Order
#> <character> <character> <character> <character>
#> 549322 Archaea Crenarchaeota Thermoprotei NA
#> 951 Archaea Crenarchaeota Thermoprotei Sulfolobales
#> 244423 Archaea Crenarchaeota Sd-NA NA
#> 143239 Archaea Crenarchaeota Sd-NA NRP-J
#> 215972 Archaea Crenarchaeota Thaumarchaeota Cenarchaeales
#> ... ... ... ... ...
#> 246195 Bacteria Synergistetes Synergistia Synergistales
#> 484439 Bacteria Synergistetes Synergistia Synergistales
#> 579616 Bacteria Synergistetes Synergistia Synergistales
#> 546622 Bacteria Firmicutes Clostridia Thermoanaerobacterales
#> 278222 Bacteria SR1 NA NA
#> Family Genus Species
#> <character> <character> <character>
#> 549322 NA NA NA
#> 951 Sulfolobaceae Sulfolobus Sulfolobusacidocalda..
#> 244423 NA NA NA
#> 143239 NA NA NA
#> 215972 SAGMA-X NA NA
#> ... ... ... ...
#> 246195 Dethiosulfovibrionac.. TG5 NA
#> 484439 Dethiosulfovibrionac.. Jonquetella Jonquetellaanthropi
#> 579616 Dethiosulfovibrionac.. Pyramidobacter NA
#> 546622 Thermodesulfobiaceae Coprothermobacter NA
#> 278222 NA NA NA
# returns specific unique taxonomic information
mapTaxonomy(GlobalPatterns, taxa = "Escherichia")
#> $Escherichia
#> DataFrame with 1 row and 7 columns
#> Kingdom Phylum Class Order
#> <character> <character> <character> <character>
#> 249227 Bacteria Proteobacteria Gammaproteobacteria Enterobacteriales
#> Family Genus Species
#> <character> <character> <character>
#> 249227 Enterobacteriaceae Escherichia NA
#>
# returns information on a single output
mapTaxonomy(GlobalPatterns, taxa = "Escherichia",to="Family")
#> Escherichia
#> "Enterobacteriaceae"
# setTaxonomyRanks
tse <- GlobalPatterns
colnames(rowData(tse))[1] <- "TAXA1"
setTaxonomyRanks(colnames(rowData(tse)))
# Taxonomy ranks set to: taxa1 phylum class order family genus species
# getTaxonomyRanks can be used to check that the taxonomic ranks now include "taxa1"
getTaxonomyRanks()
#> [1] "taxa1" "phylum" "class" "order" "family" "genus" "species"