Morning: data wrangling
Afternoon: data visualizations
Task: visualize the abundance of a specific microbial Species against the measurement Site
Use the available tools to assess and visualize alpha diversity, and augment colData
Healthy & normal obese subjects.
How many types?
Distribution of types?
Dominance of types?
Richness
number of types
Estimates of true richness based on finite sample sizes (Howard Sanders 1968); see e.g. Chao1
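As an illustration of estimating true richness from a finite sample, here is a minimal Python sketch of the bias-corrected Chao1 estimator, \(S_{obs} + \frac{F_1(F_1-1)}{2(F_2+1)}\), where \(F_1\) and \(F_2\) are the numbers of types seen exactly once and twice. The function name `chao1` and the toy counts are hypothetical, chosen only for this example.

```python
def chao1(counts):
    """Bias-corrected Chao1 richness estimate from per-type counts."""
    observed = [c for c in counts if c > 0]
    s_obs = len(observed)                     # observed richness
    f1 = sum(1 for c in observed if c == 1)   # singletons
    f2 = sum(1 for c in observed if c == 2)   # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Toy sample (assumed data): 7 observed types, 3 singletons, 2 doubletons
toy_counts = [10, 5, 2, 2, 1, 1, 1, 0]
print(chao1(toy_counts))  # 7 + 3*2 / (2*3) = 8.0
```

With no singletons the estimate equals the observed richness, which reflects the idea that rare types carry the information about what was missed.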
Evenness
Diversity
Dominance
High-quality reference genomes are required for functional characterization and taxonomic assignment of the human gut microbiota.
Unified Human Gastrointestinal Genome (UHGG):
4,644 gut prokaryotes (>70% lack cultured representatives)
204,938 nonredundant genomes
Encode >170 million protein sequences, collated into the Unified Human Gastrointestinal Protein (UHGP) catalog.
UHGP more than doubles the number of gut proteins in comparison to those present in the Integrated Gene Catalog.
40% of the UHGP lack functional annotations
Intraspecies genomic variation analyses revealed a large reservoir of accessory genes and single-nucleotide variants, many of which are specific to individual human populations.
The UHGG and UHGP collections enable studies linking genotypes to phenotypes in the human gut microbiome.
Phylogenetically neutral diversities:
Phylogeny-aware diversities:
How likely is it to pick two members of the same species at random?
Beware the variants:
Simpson (\(\lambda\))
Gini-Simpson (\(1-\lambda\))
inverse Simpson (\(\frac{1}{\lambda}\))
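To make the difference between the variants concrete, the following Python sketch computes \(\lambda = \sum_i p_i^2\) together with \(1-\lambda\) and \(\frac{1}{\lambda}\) from a vector of counts. The function name and dictionary keys are hypothetical, and the estimator is the simple plug-in form without a finite-sample correction.

```python
def simpson_variants(counts):
    """Return lambda, 1 - lambda and 1 / lambda for a vector of counts."""
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    # Probability that two random draws (with replacement) share a type
    lam = sum(p * p for p in proportions)
    return {
        "lambda": lam,
        "one_minus_lambda": 1 - lam,
        "one_over_lambda": 1 / lam,
    }


# Four equally abundant types (assumed toy data)
print(simpson_variants([25, 25, 25, 25]))
# {'lambda': 0.25, 'one_minus_lambda': 0.75, 'one_over_lambda': 4.0}
```

Note that \(\lambda\) decreases as diversity grows, whereas the other two variants increase, so it matters which one a tool reports.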
Shannon Index:
True Richness:
True diversity, or the effective number of types, refers to the number of equally abundant types needed for the average proportional abundance of the types to equal what is observed in the dataset of interest.
Pielou's evenness: \(H / \ln(S)\), where \(H\) is the Shannon index and \(S\) the number of observed types
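A minimal Python sketch of the Shannon index, \(H = -\sum_i p_i \ln p_i\), and of the ratio \(H / \ln(S)\) above. The function names are hypothetical; returning NaN when fewer than two types are present is a choice made for this example, since \(\ln(1) = 0\) leaves the ratio undefined.

```python
import math


def shannon(counts):
    """Shannon index H using the natural logarithm."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def evenness(counts):
    """H / ln(S), where S is the number of observed types."""
    s = sum(1 for c in counts if c > 0)
    if s < 2:
        return float("nan")  # undefined for a single type
    return shannon(counts) / math.log(s)


even = [5, 5, 5]       # perfectly even community (assumed toy data)
skewed = [98, 1, 1]    # one dominant type
print(evenness(even), evenness(skewed))
```

The ratio is 1 for a perfectly even community and approaches 0 as a single type dominates.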
Hill’s alpha diversities
R: richness (number of distinct types)
\(p_i\): proportion of type \(i\)
Order of diversity:
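The following Python sketch shows how the order of diversity \(q\) ties the indices together through Hill numbers, \({}^qD = \left(\sum_{i=1}^{R} p_i^q\right)^{1/(1-q)}\), with the limit \(q \to 1\) given by \(\exp(H)\). The function name `hill_number` is hypothetical.

```python
import math


def hill_number(counts, q):
    """Effective number of types of order q."""
    total = sum(counts)
    p = [c / total for c in counts if c > 0]
    if q == 1:
        # Limit q -> 1: exponential of the Shannon index
        return math.exp(-sum(x * math.log(x) for x in p))
    return sum(x ** q for x in p) ** (1 / (1 - q))


counts = [25, 25, 25, 25]  # assumed toy data: four equally abundant types
for q in (0, 1, 2):
    print(q, hill_number(counts, q))
```

For \(q = 0\) this gives the richness \(R\), for \(q = 1\) the exponential of the Shannon index, and for \(q = 2\) the inverse Simpson index. For equally abundant types all orders agree with the number of types, matching the definition of true diversity.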
Transform
Subset
Merge
Aggregate
Split
Load example data set:
Check dimension:
Check dimension for a subset:
Task: Alternative assays
Agglomerate microbiota data to higher taxonomic levels:
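Conceptually, agglomeration sums the abundance rows of all features that share the same label at the chosen rank. The sketch below illustrates the idea in plain Python. It is not the API of any particular package, and the function name, taxon labels, and counts are all hypothetical.

```python
def agglomerate(abundance, taxonomy, rank):
    """Sum per-feature abundance rows that share a label at `rank`."""
    merged = {}
    for feature, row in abundance.items():
        label = taxonomy[feature][rank]
        previous = merged.get(label, [0] * len(row))
        merged[label] = [a + b for a, b in zip(previous, row)]
    return merged


# Assumed toy data: three features measured in two samples
abundance = {
    "sp1": [10, 0],
    "sp2": [5, 3],
    "sp3": [1, 7],
}
taxonomy = {
    "sp1": {"Genus": "GenusA", "Phylum": "PhylumX"},
    "sp2": {"Genus": "GenusA", "Phylum": "PhylumX"},
    "sp3": {"Genus": "GenusB", "Phylum": "PhylumX"},
}
print(agglomerate(abundance, taxonomy, "Genus"))
# {'GenusA': [15, 3], 'GenusB': [1, 7]}
```

The number of rows shrinks while the number of samples stays the same.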
Alternative assays vs. alternative experiments?
Splitting by:
The alternative experiments (altExp) mechanism allows us to include multiple abundance tables at different taxonomic levels.
| Option | Rows (features) | Cols (samples) | Recommendation |
|---|---|---|---|
| assays | match | match | Data transformations |
| altExp | free | match | Alternative experiments |
| MultiAssay | free | free (mapping) | Multi-omic experiments |
Huang et al. F1000, 2021