Huang et al. F1000, 2021
phyloseq: current standard for (16S) microbiome bioinformatics in R (P. J. McMurdie, S. Holmes et al.)
The data are available in TreeSummarizedExperiment (TreeSE) format.
We have already loaded HintikkaXOData.
Now, let us look at the data components.
Explore TreeSE components (OMA 17.2); extract and visualize:
- assays, colData, rowData
- rowTree, colTree
- metadata
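The exploration steps above can be sketched as follows. This is a minimal sketch, assuming HintikkaXOData is the MultiAssayExperiment shipped with the mia package and that its microbiota profiles are stored in the experiment named "microbiota" (check names(mae) if yours differs). The accessors come from SummarizedExperiment, TreeSummarizedExperiment and S4Vectors, which mia attaches.

```r
# Explore the components of a TreeSE (sketch; assumes mia is installed)
library(mia)  # attaches TreeSummarizedExperiment and SummarizedExperiment

data("HintikkaXOData", package = "mia")
mae <- HintikkaXOData
names(mae)                   # experiments stored in the MultiAssayExperiment
tse <- mae[["microbiota"]]   # assumption: taxonomic profiles live in "microbiota"

# Assays: feature-by-sample abundance tables
assayNames(tse)
counts <- assay(tse, 1)      # first assay, whatever its name
dim(counts)                  # rows = taxa, columns = samples

# Sample (column) and feature (row) metadata
head(colData(tse))
head(rowData(tse))

# Hierarchical trees; NULL when no tree is stored
rowTree(tse)
colTree(tse)

# Free-form metadata list
metadata(tse)

# Quick visualization: sequencing depth across samples
hist(colSums(counts), main = "Library size per sample", xlab = "Total reads")
```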
Construct a new TreeSE object from scratch, starting from original data files.
Taxonomic profiling of 40 rat cecum samples, covering 12,706 OTUs from 318 species.
Diet comparison: high- vs. low-fat diet and xylo-oligosaccharide supplementation.
OMA, Chapter 18 (18.3.1-3) includes additional tips
Read in the CSV files. You can use the shared example data files on the cloud server; list them in R with dir("shared/data").
Data files include:
- sample data (coldata.csv)
- taxonomic table (rowdata_taxa.csv)
- taxonomic abundance table (assay_taxa.csv)
Load these into your RStudio session with, e.g., read.csv("shared/data/coldata.csv").
Construct TreeSE in R (see OMA Ch. 2)
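The CSV-to-TreeSE steps above can be sketched as a small function. It assumes that each CSV stores identifiers in its first column (sample IDs in coldata.csv, taxon IDs in rowdata_taxa.csv and assay_taxa.csv) and that the abundance table has taxa as rows and samples as columns; the helper name read_tse_from_csv() is hypothetical. To keep the block self-contained it first writes a tiny toy data set (invented values) into a temporary directory; on the cloud server you would call read_tse_from_csv("shared/data") instead.

```r
library(TreeSummarizedExperiment)  # also attaches SummarizedExperiment and S4Vectors

# Hypothetical helper: build a TreeSE from the three CSV files in `dir`
read_tse_from_csv <- function(dir) {
  coldata <- read.csv(file.path(dir, "coldata.csv"), row.names = 1)
  rowdata <- read.csv(file.path(dir, "rowdata_taxa.csv"), row.names = 1)
  # check.names = FALSE keeps sample IDs such as "1A" from becoming "X1A"
  counts <- as.matrix(read.csv(file.path(dir, "assay_taxa.csv"),
                               row.names = 1, check.names = FALSE))

  # Make sure the tables describe the same samples and taxa, then align them
  stopifnot(setequal(colnames(counts), rownames(coldata)),
            setequal(rownames(counts), rownames(rowdata)))
  coldata <- coldata[colnames(counts), , drop = FALSE]
  rowdata <- rowdata[rownames(counts), , drop = FALSE]

  TreeSummarizedExperiment(
    assays  = SimpleList(counts = counts),
    colData = DataFrame(coldata),
    rowData = DataFrame(rowdata)
  )
}

# Toy files (invented values) so that the example runs anywhere
toy_dir <- file.path(tempdir(), "toy_tse")
dir.create(toy_dir, showWarnings = FALSE)
write.csv(data.frame(row.names = c("S1", "S2"), Diet = c("High fat", "Low fat")),
          file.path(toy_dir, "coldata.csv"))
write.csv(data.frame(row.names = c("OTU1", "OTU2", "OTU3"),
                     Phylum = c("Firmicutes", "Firmicutes", "Bacteroidota")),
          file.path(toy_dir, "rowdata_taxa.csv"))
write.csv(data.frame(row.names = c("OTU1", "OTU2", "OTU3"),
                     S1 = c(10, 0, 5), S2 = c(3, 7, 0)),
          file.path(toy_dir, "assay_taxa.csv"))

tse <- read_tse_from_csv(toy_dir)
tse
```

Aligning the metadata by identifier, rather than trusting row order, guards against files that list samples in a different order.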
OMA, Chapter 18 (18.3.4)
Follow the BIOM file example in OMA 2.3.2.1.
Example data files are available on the cloud server; list them in R with dir("shared/data").
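As a fallback to the BIOM example referenced above, the import can also be done by hand. This is a minimal sketch that reads the file with the biomformat package and wraps it into a TreeSE; it is a swapped-in approach, not the mia import function used in OMA 2.3.2.1 (whose name has changed between mia releases). It uses the rich_dense_otu_table.biom example file bundled with biomformat; substitute a file from dir("shared/data").

```r
library(biomformat)
library(TreeSummarizedExperiment)

# Example BIOM file bundled with biomformat; replace with your own path
biom_file <- system.file("extdata", "rich_dense_otu_table.biom",
                         package = "biomformat")
b <- read_biom(biom_file)

# Abundance table: observations (taxa) as rows, samples as columns
counts <- as(biom_data(b), "matrix")

# Sample metadata (may be absent), reordered to match the abundance columns
sdata <- sample_metadata(b)
if (is.null(sdata)) sdata <- data.frame(row.names = colnames(counts))
sdata <- sdata[colnames(counts), , drop = FALSE]

tse_biom <- TreeSummarizedExperiment(
  assays  = SimpleList(counts = counts),
  colData = DataFrame(sdata)
)
tse_biom
```

Taxonomy (row metadata) can be added in the same way from biomformat::observation_metadata(), after checking its structure for your file.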
OMA, Chapter 18 (18.3.5)
TreeSummarizedExperiment and phyloseq are alternative containers for microbiome data in R. It is useful to know how to convert between these two formats.
Convert your TreeSE into phyloseq
Convert the phyloseq back to TreeSE
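The round trip above can be sketched as below, using the GlobalPatterns TreeSE from mia. The conversion functions were renamed between mia releases (convertToPhyloseq()/convertFromPhyloseq() in newer releases, makePhyloseqFromTreeSE()/makeTreeSEFromPhyloseq() in older ones), so the sketch checks which pair is available; treat the version split as an assumption to verify against your installed mia.

```r
library(mia)
library(phyloseq)

data("GlobalPatterns", package = "mia")
tse <- GlobalPatterns

# Pick whichever conversion functions this mia release provides
if (exists("convertToPhyloseq", envir = asNamespace("mia"), inherits = FALSE)) {
  ps   <- mia::convertToPhyloseq(tse)        # newer mia releases
  tse2 <- mia::convertFromPhyloseq(ps)
} else {
  ps   <- mia::makePhyloseqFromTreeSE(tse)   # older mia releases
  tse2 <- mia::makeTreeSEFromPhyloseq(ps)
}

ps     # phyloseq object: otu_table, sample_data, tax_table, phy_tree
tse2   # back in TreeSE format
```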
OMA, Chapter 18 (18.4.3 - 18.4.5)
TreeSE/MAE, exploration & analysis, how to find more
mia & miaViz R packages:
- Intro to miaverse (mia, miaViz, miaTime, miaSim, OMA)
- 2-3 simple examples to summarize & visualize data from the TreeSE container
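Three such summaries can be sketched on the GlobalPatterns TreeSE from mia. The sketch uses mia's agglomerateByRank() for the taxonomic summary and base R graphics in place of the miaViz plotting functions, whose arguments have changed across releases; the choice of the top five phyla is arbitrary.

```r
library(mia)

data("GlobalPatterns", package = "mia")
tse <- GlobalPatterns

# Example 1: number of taxa and samples, and sequencing depth per sample
dim(tse)
summary(colSums(assay(tse, "counts")))

# Example 2: agglomerate to phylum level and compute relative abundances
tse_phylum <- agglomerateByRank(tse, rank = "Phylum")
phylum_counts <- assay(tse_phylum, "counts")
phylum_rel <- sweep(phylum_counts, 2, colSums(phylum_counts), "/")

# Example 3: stacked bar plot of the five most abundant phyla (base graphics)
top5 <- names(sort(rowMeans(phylum_rel), decreasing = TRUE))[1:5]
barplot(phylum_rel[top5, ], legend.text = top5, las = 2,
        cex.names = 0.6, ylab = "Relative abundance")
```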
Bias in compositional data:
Possible solutions:
- Divide by the total number of reads per sample (compositional abundance)
- Rarefy (subsample) to an even sampling depth
→ Problem: abundant taxa may distort the proportions of all other taxa
→ Compositional data analysis (CoDa)
Potentially drastic effect on conclusions!
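A toy illustration in base R (invented counts) of both solutions and of the problem listed above: taxon C blooms in one sample, and the proportions of A and B drop although their absolute counts do not change. The helper rarefy_sample() is hypothetical; established implementations exist, e.g. vegan::rrarefy().

```r
# Toy absolute abundances (invented): taxon C blooms in SampleB,
# while taxa A and B keep the same absolute counts
abs_counts <- cbind(SampleA = c(A = 10, B = 20, C = 70),
                    SampleB = c(A = 10, B = 20, C = 470))

# Solution 1: divide by the total number of reads per sample
rel <- sweep(abs_counts, 2, colSums(abs_counts), "/")
rel   # taxon A falls from 10% to 2% although its count is unchanged

# Solution 2: rarefy each sample to an even depth by drawing reads
# without replacement (hypothetical helper)
rarefy_sample <- function(x, depth) {
  reads <- rep(seq_along(x), times = x)            # one entry per read, labeled by taxon
  picked <- reads[sample.int(length(reads), depth)]
  tabulate(picked, nbins = length(x))
}
set.seed(1)
depth <- min(colSums(abs_counts))
rarefied <- apply(abs_counts, 2, rarefy_sample, depth = depth)
rownames(rarefied) <- rownames(abs_counts)
colSums(rarefied)   # every sample now has `depth` reads

# The ratio between taxa A and B is identical in both samples
rel["A", ] / rel["B", ]
```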
Abundance along the community landscape
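Aitchison (log-ratio) transformations, such as CLR and ILR, are used to reduce compositional bias.
Balances, i.e. (log-)ratios between taxon abundances, are preserved under compositional scaling: \(\frac{x}{y} = \frac{cx}{cy}\) for any constant \(c > 0\), e.g. \(c = 1/\text{total reads}\).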
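The scale invariance can be checked numerically. Below is a minimal base R sketch (toy counts) of the centered log-ratio (CLR), \(\mathrm{clr}(x)_i = \log\frac{x_i}{g(x)}\) with \(g\) the geometric mean, and of one isometric log-ratio (ILR) balance between a group of \(r\) taxa and a group of \(s\) taxa, \(b = \sqrt{\frac{rs}{r+s}}\,\log\frac{g(x_{\mathrm{num}})}{g(x_{\mathrm{den}})}\). The function names are hypothetical; both transformations need strictly positive values, so real count data with zeros requires a pseudocount first.

```r
# Centered log-ratio: log of each part relative to the geometric mean
clr <- function(x) {
  logx <- log(x)       # requires x > 0 (add a pseudocount if there are zeros)
  logx - mean(logx)
}

# ILR balance between taxa `num` and taxa `den` (names or indices)
balance <- function(x, num, den) {
  r <- length(num)
  s <- length(den)
  gm <- function(v) exp(mean(log(v)))   # geometric mean
  sqrt(r * s / (r + s)) * log(gm(x[num]) / gm(x[den]))
}

x <- c(A = 10, B = 20, C = 70)   # toy counts (invented)
clr(x)
clr(x / sum(x))                  # same values: scaling by c = 1/total cancels
balance(x, num = "A", den = c("B", "C"))
```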
Create and add new assays in the data:
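One way to add new assays is sketched below on the GlobalPatterns TreeSE: a relative-abundance assay computed by hand with the SummarizedExperiment assay<- replacement, and a CLR assay computed with mia's transformAssay(). This assumes a mia release that provides transformAssay() (function names have changed between releases); the pseudocount of 1 is an illustrative choice to avoid log(0).

```r
library(mia)

data("GlobalPatterns", package = "mia")
tse <- GlobalPatterns

# Option 1: compute a new assay by hand and store it under a new name
counts <- assay(tse, "counts")
assay(tse, "relabundance") <- sweep(counts, 2, colSums(counts), "/")

# Option 2: let mia compute it; the pseudocount avoids log(0) in CLR
tse <- transformAssay(tse, assay.type = "counts", method = "clr",
                      pseudocount = 1)

assayNames(tse)   # new assays appear alongside "counts"
```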
"...there exists no single taxonomic resolution at which taxonomic variation unambiguously reflects functional variation, and at which environmental selection of certain functions unambiguously translates to a selection of specific taxa" (Louca et al. 2018).
(Based on phILR; Silverman et al. 2017)
Pathways in representative bacterial genomes of Clostridium subclusters IV and XIVa indicated the presence of, e.g., ethanol fermentation pathways → endogenous ethanol producers associated with fatty liver?
In addition to age and sex, the models included differences in 11 microbial groups from class Clostridia, mostly belonging to orders Lachnospirales and Oscillospirales. Members of the Clostridia XIVa group previously associated with NAFLD were detected. Two species in the Clostridia IV group had not previously been associated with fatty liver disease.
Key associations validated in another Finnish cohort (N=258).
GlobalPatterns data set (in the mia R package).
If you complete the task fast, check out other OMA Exercises on data containers.