Get dominant taxa — getDominant • mia

These functions return information about the most dominant taxa in a SummarizedExperiment object.

Usage

getDominant(
  x,
  assay.type = assay_name,
  assay_name = "counts",
  group = rank,
  rank = NULL,
  other.name = "Other",
  n = NULL,
  complete = TRUE,
  ...
)

addDominant(x, name = "dominant_taxa", other.name = "Other", n = NULL, ...)

# S4 method for class 'SummarizedExperiment'
getDominant(
  x,
  assay.type = assay_name,
  assay_name = "counts",
  group = rank,
  rank = NULL,
  other.name = "Other",
  n = NULL,
  complete = TRUE,
  ...
)

# S4 method for class 'SummarizedExperiment'
addDominant(
  x,
  name = "dominant_taxa",
  other.name = "Other",
  n = NULL,
  complete = FALSE,
  ...
)

Arguments

x: A SummarizedExperiment object, such as a TreeSummarizedExperiment.
assay.type: Character scalar. Specifies the name of the assay used in the calculation. (Default: "counts")
assay_name: Deprecated. Use assay.type instead.
group: Character scalar. Defines a group. Must be one of the columns from rowData(x). (Default: NULL)
rank: Deprecated. Use group instead.
other.name: Character scalar. A name for features that are not among the n most frequent dominant features in the data. (Default: "Other")
n: Numeric scalar. The number of most frequent dominant features to keep. If NULL, each sample is assigned its dominant taxon. (Default: NULL)
complete: Logical scalar. Controls how samples with multiple equally dominant taxa are handled. If TRUE, all equally dominant taxa are included for each sample; if FALSE, one taxon is sampled for each sample that has several. For addDominant, FALSE adds a column with a single dominant taxon per sample to colData, whereas TRUE adds a list that includes all dominant taxa for each sample. (Default: TRUE for getDominant, FALSE for addDominant)
...: Additional arguments passed on to agglomerateByRank() when group (formerly rank) is specified.
name: Character scalar. A name for the column of the colData where results will be stored. (Default: "dominant_taxa")
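
The interplay of complete, n and other.name can be illustrated with a small base R sketch. It is not mia's implementation: the toy matrix and the helper lump_dominant() are hypothetical and only mimic the behaviour described in the argument list above.

```r
# Toy count matrix (made-up values): rows are taxa, columns are samples.
counts <- matrix(
  c(10, 2, 5, 7,
     3, 8, 5, 1,
     1, 1, 2, 9),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("TaxonA", "TaxonB", "TaxonC"),
                  c("S1", "S2", "S3", "S4"))
)

# Analogue of complete = TRUE: keep every taxon tied for the maximum.
# Sample S3 has two equally dominant taxa (TaxonA and TaxonB).
dominant_all <- sapply(
  colnames(counts),
  function(s) rownames(counts)[counts[, s] == max(counts[, s])],
  simplify = FALSE
)

# Analogue of complete = FALSE: sample one taxon where there are ties.
set.seed(1)
dominant_one <- vapply(
  dominant_all,
  function(taxa) taxa[sample.int(length(taxa), 1)],
  character(1)
)

# Analogue of n and other.name (hypothetical helper): keep the n most
# frequent dominant taxa and relabel the rest.
lump_dominant <- function(dominant, n, other.name = "Other") {
  freq <- sort(table(dominant), decreasing = TRUE)
  keep <- names(freq)[seq_len(min(n, length(freq)))]
  dominant[!dominant %in% keep] <- other.name
  dominant
}

lumped <- lump_dominant(dominant_one, n = 1)
```

Here dominant_all is a named list because a sample may have several dominant taxa, while dominant_one and lumped are named character vectors with one entry per sample.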

Value

getDominant returns a named character vector, while addDominant returns a SummarizedExperiment with an additional column in colData whose name is given by the name argument.

Details

addDominant extracts the most abundant taxa in a SummarizedExperiment object, and stores the information in the colData. It is a wrapper for getDominant.
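
The wrapper relationship can be pictured with a rough base R sketch. A plain data.frame stands in for colData, and add_dominant_sketch() is a hypothetical helper, not part of mia; the real function operates on the SummarizedExperiment object itself.

```r
# Stand-in for colData: one row per sample (made-up values).
col_data <- data.frame(
  SampleType = c("Soil", "Feces"),
  row.names = c("S1", "S2")
)

# Hypothetical helper: store a per-sample result under the column `name`.
# A character vector gives an ordinary column (one taxon per sample);
# a list gives a list column (all equally dominant taxa per sample).
add_dominant_sketch <- function(col_data, dominant, name = "dominant_taxa") {
  stopifnot(identical(names(dominant), rownames(col_data)))
  col_data[[name]] <- unname(dominant)
  col_data
}

# One dominant taxon per sample.
single <- add_dominant_sketch(col_data, c(S1 = "TaxonA", S2 = "TaxonB"))

# All equally dominant taxa per sample, stored as a list column.
multi <- add_dominant_sketch(
  col_data,
  list(S1 = "TaxonA", S2 = c("TaxonA", "TaxonB")),
  name = "dominant_all"
)
```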

The group parameter makes it possible to agglomerate rows by group. If the value is one of the ranks returned by taxonomyRanks(), agglomerateByRank() is applied; otherwise, agglomerateByVariable() is used. For example, if the 'Genus' rank is used, all abundances of the same genus are summed, and the agglomerated features are returned. See the corresponding functions for additional arguments that handle missing values or special characters.
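
The effect of agglomeration on the result can be seen in a minimal base R sketch. It uses rowsum() on a made-up matrix rather than agglomerateByRank(), so it only illustrates the summing step under that assumption; the row names, genus labels and values are hypothetical.

```r
# Toy counts (made-up values): four features in two samples.
counts <- matrix(
  c(4, 1,
    3, 2,
    5, 9,
    0, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("OTU1", "OTU2", "OTU3", "OTU4"), c("S1", "S2"))
)

# Genus assignment for each row, standing in for a rowData column.
genus <- c("GenusX", "GenusX", "GenusY", "GenusY")

# Dominant feature per sample without agglomeration.
# which.max() returns the first maximum, so ties are not reported here.
dominant_otu <- rownames(counts)[apply(counts, 2, which.max)]
names(dominant_otu) <- colnames(counts)

# Sum the abundances of rows that share a genus, then find the
# dominant genus per sample.
genus_counts <- rowsum(counts, group = genus)
dominant_genus <- rownames(genus_counts)[apply(genus_counts, 2, which.max)]
names(dominant_genus) <- colnames(genus_counts)
```

In sample S1 the single most abundant row is OTU3, which belongs to GenusY, yet GenusX becomes dominant once its two rows are summed. This is why the choice of group can change which taxon is reported.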

Examples

library(mia)
data(GlobalPatterns)
x <- GlobalPatterns

# Find the dominant genus in each sample.
sim.dom <- getDominant(x, group = "Genus")

# Add the dominant genera to colData.
x <- addDominant(x, group = "Genus", name = "dominant_genera")
colData(x)
#> DataFrame with 26 rows and 8 columns
#>         X.SampleID   Primer Final_Barcode Barcode_truncated_plus_T
#>           <factor> <factor>      <factor>                 <factor>
#> CL3        CL3      ILBC_01        AACGCA                   TGCGTT
#> CC1        CC1      ILBC_02        AACTCG                   CGAGTT
#> SV1        SV1      ILBC_03        AACTGT                   ACAGTT
#> M31Fcsw    M31Fcsw  ILBC_04        AAGAGA                   TCTCTT
#> M11Fcsw    M11Fcsw  ILBC_05        AAGCTG                   CAGCTT
#> ...            ...      ...           ...                      ...
#> TS28         TS28   ILBC_25        ACCAGA                   TCTGGT
#> TS29         TS29   ILBC_26        ACCAGC                   GCTGGT
#> Even1        Even1  ILBC_27        ACCGCA                   TGCGGT
#> Even2        Even2  ILBC_28        ACCTCG                   CGAGGT
#> Even3        Even3  ILBC_29        ACCTGT                   ACAGGT
#>         Barcode_full_length SampleType
#>                    <factor>   <factor>
#> CL3             CTAGCGTGCGT      Soil 
#> CC1             CATCGACGAGT      Soil 
#> SV1             GTACGCACAGT      Soil 
#> M31Fcsw         TCGACATCTCT      Feces
#> M11Fcsw         CGACTGCAGCT      Feces
#> ...                     ...        ...
#> TS28            GCATCGTCTGG      Feces
#> TS29            CTAGTCGCTGG      Feces
#> Even1           TGACTCTGCGG      Mock 
#> Even2           TCTGATCGAGG      Mock 
#> Even3           AGAGAGACAGG      Mock 
#>                                        Description        dominant_genera
#>                                           <factor>            <character>
#> CL3     Calhoun South Carolina Pine soil, pH 4.9     CandidatusSolibacter
#> CC1     Cedar Creek Minnesota, grassland, pH 6.1                     MC18
#> SV1     Sevilleta new Mexico, desert scrub, pH 8.3 CandidatusNitrososph..
#> M31Fcsw M3, Day 1, fecal swab, whole body study               Bacteroides
#> M11Fcsw M1, Day 1, fecal swab, whole body study               Bacteroides
#> ...                                            ...                    ...
#> TS28                                       Twin #1       Faecalibacterium
#> TS29                                       Twin #2       Faecalibacterium
#> Even1                                      Even1              Bacteroides
#> Even2                                      Even2              Bacteroides
#> Even3                                      Even3              Bacteroides