Preamble

This work - Orchestrating Microbiome Analysis with Bioconductor (Lahti et al. 2021) - contributes novel methods and educational resources for microbiome data science. It aims to teach the grammar of Bioconductor workflows in the context of microbiome data science. We show, through concrete examples, how to use the latest developments and data analytical strategies in R/Bioconductor for the manipulation, analysis, and reproducible reporting of hierarchical, heterogeneous, and multi-modal microbiome profiling data. The data science methodology is tightly integrated with the broader R/Bioconductor ecosystem that focuses on the development of high-quality open research software for life sciences (Gentleman et al. (2004), Huber et al. (2015)). The support for modularity and interoperability is key to efficient resource sharing and collaborative development both within and across research fields. The central data infrastructure, the SummarizedExperiment data container and its derivatives, have already been widely adopted in microbiome research, single cell sequencing, and in other fields, allowing rapid adoption and the extension of emerging data science techniques across application domains.

Lahti, Leo, Sudarshan Shetty, Felix M Ernst, et al. 2021. Orchestrating Microbiome Analysis with Bioconductor [Beta Version]. microbiome.github.io/oma/.

Gentleman, Robert C, Vincent J Carey, Douglas M Bates, Ben Bolstad, Marcel Dettling, Sandrine Dudoit, Byron Ellis, et al. 2004. “Bioconductor: Open Software Development for Computational Biology and Bioinformatics.” Genome Biology 5: R80.

Huber, W., V. J. Carey, R. Gentleman, S. Anders, M. Carlson, B. S. Carvalho, H. C. Bravo, et al. 2015. “Orchestrating High-Throughput Genomic Analysis with Bioconductor.” Nature Methods 12 (2): 115–21. http://www.nature.com/nmeth/journal/v12/n2/full/nmeth.3252.html.

We assume that the reader is already familiar with R programming. For references and tips on introductory material for R and Bioconductor, see 23 Resources. This online resource and its associated ecosystem of microbiome data science tools is a result of a community-driven development process, welcoming new users and contributors. You can find more information on how to find us online and join the developer community through the project’s homepage at microbiome.github.io.

The book is organized into three parts. We start by introducing the material and link to further resources for learning R and Bioconductor. We describe the key data infrastructure, the TreeSummarizedExperiment class that provides a container for microbiome data, and how to get started by loading microbiome data set in the context of this new framework. The second section, Focus Topics, covers the common steps in microbiome data analysis, beginning with the most common steps and progressing to more specialized methods in subsequent sections. Third, Workflows, provides case studies for the various datasets used throughout the book. Finally, Appendix, links to further resources and acknowledgments.