Appendix B — Contributions

Core team

Contributions to this Gitbook from the various developers are coordinated by:

  • Leo Lahti, DSc, professor in Data Science at the Department of Computing, University of Turku, Finland, with a focus on computational microbiome analysis. Lahti obtained his doctoral degree (DSc) from Aalto University in Finland (2010), developing probabilistic machine learning with applications to high-throughput life science data integration. Since then he has focused on microbiome research and developed, for instance, the phyloseq-based microbiome R package before starting to develop the TreeSummarizedExperiment / MultiAssayExperiment framework and the mia family of Bioconductor packages for microbiome data science introduced in this gitbook. Lahti led the development of national policy on open access to research methods in Finland. He is a current member of the Bioconductor Community Advisory Board and runs regular training workshops in microbiome data science.

  • Tuomas Borman, PhD researcher and the lead developer of OMA/mia at the Department of Computing, University of Turku.

Contributors

This work is a remarkably collaborative effort. The full list of contributors is available via GitHub. Some key authors/contributors include:

  • Felix Ernst, PhD, among the first developers of R/Bioc methods for microbiome research based on the SummarizedExperiment class and its derivatives.

  • Giulio Benedetti, scientific programmer at the Department of Computing, University of Turku. His research interest is mostly related to Data Science. He has also helped to expand the SummarizedExperiment-based microbiome analysis framework to the Julia language, implementing MicrobiomeAnalysis.jl.

  • Sudarshan Shetty, PhD, has supported the establishment of the framework and associated tools. He also maintains a list of microbiome R packages.

  • Henrik Eckermann contributed in particular to the development of the differential abundance analyses

  • Chouaib Benchraka provided various contributions to the package ecosystem and the OMA book

  • Yağmur Şimşek converted the miaSim R package to support the Bioconductor framework

  • Basil Courbayre provided various contributions to the package ecosystem and the OMA book, in particular on unsupervised machine learning

  • Matti Ruuskanen, PhD, added machine learning techniques for microbiome analysis

  • Stefanie Peschel has contributed chapters on the construction, analysis, and comparison of microbial association networks.

  • Christian L. Müller, group leader at the Computational Health Center, Helmholtz Zentrum München, Germany and a Professor for Biomedical Statistics and Data Science at LMU Munich. He assisted in writing the chapters on network learning and comparison.

  • Shigdel Rajesh, PhD

  • Artur Sannikov

  • Jeba Akewak

  • Himmi Lindgren

  • Lu Yang

  • Katariina Pärnänen

  • Jacques Serizay converted the OMA book to the BiocBook format. This allows the OMA book to be built and distributed by Bioconductor.

  • Himel Mallick, PhD, FASA, principal investigator and tenure-track faculty at Cornell University’s Department of Population Health Sciences and an adjunct faculty of Statistics and Data Science at Bowers College of Computing and Information Science. He contributed to the chapters on meta-analyses, microbe set enrichment analysis (MSEA) and multi-omics prediction and classification.

  • Yihan Liu assisted Dr. Mallick in writing the chapters on meta-analyses, MSEA and multi-omics prediction and classification.

Acknowledgments

This work would not have been possible without the countless contributions of, and interactions with, other researchers, developers, and users. We express our gratitude to the entire Bioconductor community for developing this high-quality open research software repository for life science analytics, continuously pushing the limits in emerging fields (Gentleman et al. 2004; Huber et al. 2015).

Gentleman, Robert C, Vincent J Carey, Douglas M Bates, Ben Bolstad, Marcel Dettling, Sandrine Dudoit, Byron Ellis, et al. 2004. “Bioconductor: Open Software Development for Computational Biology and Bioinformatics.” Genome Biology 5: R80.
Huber, W., V. J. Carey, R. Gentleman, S. Anders, M. Carlson, B. S. Carvalho, H. C. Bravo, et al. 2015. “Orchestrating High-Throughput Genomic Analysis with Bioconductor.” Nature Methods 12 (2): 115–21. http://www.nature.com/nmeth/journal/v12/n2/full/nmeth.3252.html.
Huang, Ruizhu. 2020. TreeSummarizedExperiment: A S4 Class for Data with Tree Structures.
Ernst, F. G. M., S. A. Shetty, R. Huang, D. J. Braccia, H. C. Bravo, and L. Lahti. 2020. “The Emerging R Ecosystem for Microbiome Research.” F1000Research 9. https://doi.org/10.7490/f1000research.1118445.1.
Ramos, Marcel, Lucas Schiffer, Angela Re, Rimsha Azhar, Azfar Basunia, Carmen Rodriguez Cabrera, Tiffany Chan, et al. 2017. “Software for the Integration of Multiomics Experiments in Bioconductor.” Cancer Research. https://doi.org/10.1158/0008-5472.CAN-17-0344.
Shetty, Sudarshan, and Leo Lahti. 2019. “Microbiome Data Science.” Journal of Biosciences 44: 115. https://doi.org/10.1007/s12038-019-9930-2.

The presented framework for microbiome data science is based on the TreeSummarizedExperiment data container created by Ruizhu Huang and others (Huang 2020; Ernst et al. 2020), and on the MultiAssayExperiment by Marcel Ramos et al. (Ramos et al. 2017). The idea of using these containers as a basis for microbiome data science was initially advanced by the groundwork of Domenick Braccia, Héctor Corrada Bravo and others, and brought together with other microbiome data science developers (Shetty and Lahti 2019). Setting up the base ecosystem of packages and tutorials was subsequently led by Tuomas Borman, Felix Ernst, and Leo Lahti. We would specifically like to thank everyone who contributed to the work supporting the TreeSummarizedExperiment ecosystem for microbiome research, including but not limited to the R packages mia, miaViz, miaTime, miaSim, philr, ANCOMBC, curatedMetagenomicData, scater, scuttle, and other packages, some of which are listed in Section 2.1. A number of other contributors have advanced the ecosystem further, and will be acknowledged in the individual packages, pull requests, issues, and other work.

Ample demonstration data resources supporting this framework have been made available through the curatedMetagenomicData project by Edoardo Pasolli, Lucas Schiffer, Levi Waldron and others (Pasolli et al. 2017).

Pasolli, Edoardo, et al. 2017. “Accessible, Curated Metagenomic Data Through ExperimentHub.” Nature Methods 14 (11): 1023–24. https://www.nature.com/articles/nmeth.4468.
McMurdie, PJ, and S Holmes. 2013. “Phyloseq: An R Package for Reproducible Interactive Analysis and Graphics of Microbiome Census Data.” PLoS ONE 8: e61217. https://doi.org/10.1371/journal.pone.0061217.
Amezquita, Robert A., Aaron T. L. Lun, Etienne Becht, Vince J. Carey, Lindsay N. Carpp, Ludwig Geistlinger, Federico Marini, et al. 2020. “Orchestrating Single-Cell Analysis with Bioconductor.” Nature Methods 17: 137–45. https://doi.org/10.1038/s41592-019-0654-x.

The work has drawn initial inspiration from many sources, most notably from the work on phyloseq by Paul McMurdie and Susan Holmes (McMurdie and Holmes 2013), who pioneered rigorous and reproducible microbiome data science ecosystems in R/Bioconductor. The phyloseq framework continues to provide a vast array of complementary packages and methods for microbiome studies. The book Orchestrating Single-Cell Analysis with Bioconductor (OSCA) by Robert Amezquita, Aaron Lun, Stephanie Hicks, and Raphael Gottardo (Amezquita et al. 2020) implements closely related work on the SummarizedExperiment data container and its derivatives in the field of single-cell sequencing studies, and has also inspired this work.

In the background, several open source books are key references that have advanced reproducible data science training and dissemination: Modern Statistics for Modern Biology by Susan Holmes and Wolfgang Huber (Holmes and Huber 2019), R for Data Science by Garrett Grolemund and Hadley Wickham (Grolemund and Wickham 2017), and Statistical Rethinking by Richard McElreath, together with the associated online resources by Solomon Kurz (McElreath 2020).

Holmes, Susan, and Wolfgang Huber. 2019. Modern Statistics for Modern Biology. New York, NY: Cambridge University Press. https://www.huber.embl.de/msmb/.
Grolemund, Garrett, and Hadley Wickham. 2017. R for Data Science. Vol. 77(21); e39–42. O’Reilly.
McElreath, R. 2020. Statistical Rethinking. Chapman and Hall/CRC.

B.0.1 How to contribute

To contribute to the project, please follow the Git flow procedure introduced below (see instructions to get started with GitHub):

  1. Fork the project
  2. Clone your fork
  3. Modify the material
  4. Check locally that the changes render successfully
  5. Add and commit the changes to your fork
  6. Create a pull request from your fork back to the original repository
  7. Fix and discuss issues in a review process
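
The command-line part of the steps above could be sketched with two small shell helpers. This is a minimal illustration rather than the project's official tooling: the function names `start_contribution` and `submit_contribution`, their arguments, and the use of a separate topic branch are assumptions introduced here. Forking (step 1) and opening and discussing the pull request (steps 6 and 7) happen in the GitHub web interface, and the rendering check (step 4) is left as a manual step because its exact command is documented in the OMA README.

```shell
#!/bin/sh
# Hypothetical helpers; names and arguments are illustrative only.

# Step 2: clone your fork (created beforehand on GitHub, step 1)
# and start a topic branch for the change.
start_contribution() {
    fork_url=$1      # URL of your own fork (supplied by you)
    target_dir=$2    # local directory to clone into
    branch=$3        # name of the topic branch (an assumption of this sketch)
    git clone "$fork_url" "$target_dir" &&
        git -C "$target_dir" checkout -b "$branch"
}

# Steps 3 and 4 are manual: modify the material in the cloned directory
# and check locally that the changes render successfully.

# Step 5: add and commit the changes, then push the branch to your fork.
submit_contribution() {
    target_dir=$1    # directory created by start_contribution
    branch=$2        # same topic branch name as above
    message=$3       # commit message describing the change
    git -C "$target_dir" add --all &&
        git -C "$target_dir" commit -m "$message" &&
        git -C "$target_dir" push --set-upstream origin "$branch"
}

# Steps 6 and 7: open a pull request from the pushed branch on GitHub
# and address review comments there.
```

With these definitions loaded, a contribution would start with `start_contribution <URL-of-your-fork> OMA my-change` and, after editing and rendering, finish with `submit_contribution OMA my-change "Describe the change"`, where the URL, directory, branch name, and message are placeholders to be replaced.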

More detailed instructions for contributing can be found in the OMA README.

Support

This work has been supported by:

Moreno-Indias, Isabel, Leo Lahti, Miroslava Nedyalkova, Ilze Elbere, Gennady V. Roshchupkin, Muhamed Adilovic, Onder Aydemir, et al. 2021. “Statistical and Machine Learning Techniques in Human Microbiome Studies: Contemporary Challenges and Solutions.” Frontiers in Microbiology 12: 277. https://doi.org/10.3389/fmicb.2021.635781.