Get dominant taxa — perSampleDominantTaxa • mia

These functions return information about the most dominant taxa in a SummarizedExperiment object.

perSampleDominantFeatures(
  x,
  assay.type = assay_name,
  assay_name = "counts",
  rank = NULL,
  other.name = "Other",
  n = NULL,
  complete = TRUE,
  ...
)

# S4 method for SummarizedExperiment
perSampleDominantFeatures(
  x,
  assay.type = assay_name,
  assay_name = "counts",
  rank = NULL,
  other.name = "Other",
  n = NULL,
  complete = TRUE,
  ...
)

perSampleDominantTaxa(x, ...)

# S4 method for SummarizedExperiment
perSampleDominantTaxa(x, ...)

addPerSampleDominantFeatures(
  x,
  name = "dominant_taxa",
  other.name = "Other",
  n = NULL,
  ...
)

# S4 method for SummarizedExperiment
addPerSampleDominantFeatures(
  x,
  name = "dominant_taxa",
  other.name = "Other",
  n = NULL,
  complete = FALSE,
  ...
)

addPerSampleDominantTaxa(x, ...)

# S4 method for SummarizedExperiment
addPerSampleDominantTaxa(x, ...)

Arguments

x: A SummarizedExperiment object.
assay.type: A single character value for selecting the assay to use for identifying dominant taxa.
assay_name: a single character value for specifying which assay to use for calculation. (Please use assay.type instead. At some point assay_name will be disabled.)
rank: A single character defining a taxonomic rank. Must be a value of the output of taxonomyRanks().
other.name: A name for features that are not among the n most frequent dominant features in the data. Default is "Other".
n: The number of most frequent dominant features to keep. Default is NULL, meaning that each sample is assigned its own dominant taxon.
complete: A boolean value that controls how ties are handled when a sample has several equally dominant taxa. For perSampleDominantTaxa the default is TRUE, which returns all equally dominant taxa for each sample; complete = FALSE samples one taxon for each sample that has several. For addPerSampleDominantTaxa the default is FALSE, which adds a column to colData with a single dominant taxon per sample; complete = TRUE adds a list containing all dominant taxa for each sample.
...: Additional arguments passed on to agglomerateByRank() when rank is specified.
name: A name for the column of the colData where the dominant taxa will be stored when using addPerSampleDominantFeatures.
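
The effect of complete can be pictured with a small base R sketch. It is only an illustration of the tie-handling idea described above, not the implementation used by mia; the count matrix and the taxon and sample names are hypothetical.

```r
# Toy count matrix: rows are taxa, columns are samples (hypothetical data).
counts <- matrix(
  c(10, 5, 5,
     2, 7, 5,
     1, 7, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("TaxonA", "TaxonB", "TaxonC"), c("S1", "S2", "S3"))
)

# Analogue of complete = TRUE: keep every taxon that reaches the
# per-sample maximum, so tied samples get more than one entry.
dominant_all <- setNames(
  lapply(colnames(counts), function(s) {
    rownames(counts)[counts[, s] == max(counts[, s])]
  }),
  colnames(counts)
)

# Analogue of complete = FALSE: draw one taxon at random among the ties,
# giving exactly one value per sample.
set.seed(1)
dominant_one <- vapply(
  dominant_all,
  function(taxa) taxa[sample.int(length(taxa), 1)],
  character(1)
)
```

Here S1 has a single dominant taxon, while S2 and S3 each have a two-way tie, so dominant_all is a list with entries of varying length and dominant_one is a named character vector with one entry per sample.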

Value

perSampleDominantFeatures returns a named character vector, while addPerSampleDominantFeatures returns a SummarizedExperiment with an additional column in colData named *name*.

Details

addPerSampleDominantFeatures extracts the most abundant taxa in a SummarizedExperiment object, and stores the information in the colData. It is a wrapper for perSampleDominantFeatures.

With the rank parameter, it is possible to agglomerate taxa based on taxonomic ranks. For example, if the 'Genus' rank is used, all abundances belonging to the same genus are added together, and those genera are returned. See agglomerateByRank() for additional arguments to deal with missing values or special characters.
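
As a rough illustration of why the rank matters, the base R sketch below sums counts within each genus before picking the dominant row. It does not call agglomerateByRank() and is not how mia implements the step; the matrix, the species labels and the genus mapping are hypothetical.

```r
# Toy species-level counts: rows are species, columns are samples.
counts <- matrix(
  c(4, 1,
    3, 2,
    5, 6),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("sp1", "sp2", "sp3"), c("S1", "S2"))
)

# Hypothetical species-to-genus mapping.
genus <- c(sp1 = "GenusX", sp2 = "GenusX", sp3 = "GenusY")

# Add together the abundances of all species that share a genus.
genus_counts <- rowsum(counts, group = genus[rownames(counts)])

# Dominant row per sample, before and after agglomeration.
dominant_species <- rownames(counts)[apply(counts, 2, which.max)]
dominant_genus <- rownames(genus_counts)[apply(genus_counts, 2, which.max)]
names(dominant_species) <- colnames(counts)
names(dominant_genus) <- colnames(counts)
```

In sample S1 the single most abundant species is sp3, but after summing sp1 and sp2 into GenusX the dominant genus is GenusX, so the result depends on the rank at which dominance is assessed.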

Author

Leo Lahti, Tuomas Borman and Sudarshan A. Shetty.

Examples

data(GlobalPatterns)
x <- GlobalPatterns

# Finds the dominant taxa.
sim.dom <- perSampleDominantFeatures(x, rank="Genus")

# Add information to colData
x <- addPerSampleDominantFeatures(x, rank = "Genus", name="dominant_genera")
colData(x)
#> DataFrame with 26 rows and 8 columns
#>         X.SampleID   Primer Final_Barcode Barcode_truncated_plus_T
#>           <factor> <factor>      <factor>                 <factor>
#> CL3        CL3      ILBC_01        AACGCA                   TGCGTT
#> CC1        CC1      ILBC_02        AACTCG                   CGAGTT
#> SV1        SV1      ILBC_03        AACTGT                   ACAGTT
#> M31Fcsw    M31Fcsw  ILBC_04        AAGAGA                   TCTCTT
#> M11Fcsw    M11Fcsw  ILBC_05        AAGCTG                   CAGCTT
#> ...            ...      ...           ...                      ...
#> TS28         TS28   ILBC_25        ACCAGA                   TCTGGT
#> TS29         TS29   ILBC_26        ACCAGC                   GCTGGT
#> Even1        Even1  ILBC_27        ACCGCA                   TGCGGT
#> Even2        Even2  ILBC_28        ACCTCG                   CGAGGT
#> Even3        Even3  ILBC_29        ACCTGT                   ACAGGT
#>         Barcode_full_length SampleType
#>                    <factor>   <factor>
#> CL3             CTAGCGTGCGT      Soil 
#> CC1             CATCGACGAGT      Soil 
#> SV1             GTACGCACAGT      Soil 
#> M31Fcsw         TCGACATCTCT      Feces
#> M11Fcsw         CGACTGCAGCT      Feces
#> ...                     ...        ...
#> TS28            GCATCGTCTGG      Feces
#> TS29            CTAGTCGCTGG      Feces
#> Even1           TGACTCTGCGG      Mock 
#> Even2           TCTGATCGAGG      Mock 
#> Even3           AGAGAGACAGG      Mock 
#>                                        Description        dominant_genera
#>                                           <factor>            <character>
#> CL3     Calhoun South Carolina Pine soil, pH 4.9   Genus:CandidatusSoli..
#> CC1     Cedar Creek Minnesota, grassland, pH 6.1               Genus:MC18
#> SV1     Sevilleta new Mexico, desert scrub, pH 8.3 Class:Chloracidobact..
#> M31Fcsw M3, Day 1, fecal swab, whole body study         Genus:Bacteroides
#> M11Fcsw M1, Day 1, fecal swab, whole body study         Genus:Bacteroides
#> ...                                            ...                    ...
#> TS28                                       Twin #1 Genus:Faecalibacterium
#> TS29                                       Twin #2 Family:Ruminococcaceae
#> Even1                                      Even1        Genus:Bacteroides
#> Even2                                      Even2        Genus:Bacteroides
#> Even3                                      Even3        Genus:Bacteroides
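
The n and other.name arguments described under Arguments can be pictured with a short base R sketch that keeps the n most frequent dominant taxa and relabels the rest. This is an assumption-laden illustration on a hypothetical vector, not output from mia and not its implementation; in particular, how mia orders taxa with equal frequency is not covered here.

```r
# Hypothetical per-sample dominant taxa, shaped like a named character vector.
dominant <- c(S1 = "Bacteroides", S2 = "Bacteroides", S3 = "Prevotella",
              S4 = "Prevotella", S5 = "Bacteroides", S6 = "Faecalibacterium")

n <- 2                 # number of most frequent dominant taxa to keep
other_name <- "Other"  # label for everything else

# Count how many samples each taxon dominates and keep the top n.
freq <- sort(table(dominant), decreasing = TRUE)
top <- names(freq)[seq_len(min(n, length(freq)))]

# Relabel taxa outside the top n; ifelse() drops names, so restore them.
lumped <- ifelse(dominant %in% top, dominant, other_name)
names(lumped) <- names(dominant)
```

With these toy values, Bacteroides and Prevotella are kept and the single sample dominated by Faecalibacterium is relabelled as "Other".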