miaViz.Rmd
miaViz
implements plotting function to work with
TreeSummarizedExperiment
and related objects in a context
of microbiome analysis. For more general plotting function on
SummarizedExperiment
objects the scater
package offers several options, such as plotColData
,
plotExpression
and plotRowData
.
To install miaViz
, install BiocManager
first, if it is not installed. Afterwards use the install
function from BiocManager
and load miaViz
.
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("miaViz")
In contrast to other fields of sequencing based fields of research
for which expression of genes is usually studied, microbiome research
uses the more term Abundance to described the numeric data
measured and analyzed. Technically, especially in context of
SummarizedExperiment
objects, there is no difference.
Therefore plotExpression
can be used to plot
Abundance
data of a particular feature.
plotExpression(GlobalPatterns,
features = "549322", assay.type = "counts")
On the other hand, plotAbundance can be used to plot abundance by
rank
. A bar plot is returned showing the relative abundance
within each sample for a given rank
. At the same time the
features
argument can be set to NULL
(default).
GlobalPatterns <- transformAssay(GlobalPatterns, method = "relabundance")
plotAbundance(GlobalPatterns, rank = "Kingdom", assay.type = "relabundance")
If rank
is set to null however then the bars will be
colored by each individual taxon. Please note that if you’re doing this
make sure to agglomerate your data to a certain taxonomic hand before
plotting.
GlobalPatterns_king <- agglomerateByRank(GlobalPatterns, "Kingdom")
plotAbundance(GlobalPatterns_king, assay.type = "relabundance")
With subsetting to selected features the plot can be fine tuned.
prev_phylum <- getPrevalent(GlobalPatterns, rank = "Phylum",
detection = 0.01, onRankOnly = TRUE)
plotAbundance(GlobalPatterns[rowData(GlobalPatterns)$Phylum %in% prev_phylum],
rank = "Phylum",
assay.type = "relabundance")
#> Warning: Removed 101 rows containing missing values or values outside the scale range
#> (`geom_bar()`).
The features
argument is reused for plotting data along
the different samples. In the next example the SampleType is
plotted along the samples. In this case the result is a list, which can
combined using external tools, for example patchwork
.
library(patchwork)
plots <- plotAbundance(GlobalPatterns[rowData(GlobalPatterns)$Phylum %in% prev_phylum],
features = "SampleType",
rank = "Phylum",
assay.type = "relabundance")
plots$abundance / plots$SampleType +
plot_layout(heights = c(9, 1))
#> Warning: Removed 101 rows containing missing values or values outside the scale range
#> (`geom_bar()`).
Further example about composition barplot can be found at Orchestrating Microbiome Analysis (Lahti, Shetty, and Ernst 2021).
To visualize prevalence within the dataset, two functions are
available, plotFeaturePrevalence
,
plotPrevalenceAbundance
and
plotPrevalence
.
plotFeaturePrevalence
produces a so-called landscape
plot, which visualizes the prevalence of samples across abundance
thresholds.
plotFeaturePrevalence(GlobalPatterns, rank = "Phylum",
detections = c(0, 0.001, 0.01, 0.1, 0.2))
#> Warning in plotFeaturePrevalence(GlobalPatterns, rank = "Phylum", detections =
#> c(0, : The 'plotFeaturePrevalence' function is deprecated. Use
#> 'plotRowPrevalence' instead.
plotPrevalenceAbundance
plot the prevalence depending on
the mean relative abundance on the chosen taxonomic level.
plotPrevalentAbundance(GlobalPatterns, rank = "Family",
colour_by = "Phylum") +
scale_x_log10()
plotPrevalence
plot the number of samples and their
prevalence across different abundance thresholds. Abundance steps can be
adjusted using the detections
argument, whereas the
analyzed prevalence steps is set using the prevalences
argument.
plotPrevalence(GlobalPatterns,
rank = "Phylum",
detections = c(0.01, 0.1, 1, 2, 5, 10, 20)/100,
prevalences = seq(0.1, 1, 0.1))
The information stored in the rowTree
can be directly
plotted. However, sizes of stored trees have to be kept in mind and
plotting of large trees rarely makes sense.
For this example we limit the information plotted to the top 100 taxa as judged by mean abundance on the genus level.
altExp(GlobalPatterns,"Genus") <- agglomerateByRank(GlobalPatterns,"Genus")
altExp(GlobalPatterns,"Genus") <- addPerFeatureQC(altExp(GlobalPatterns,"Genus"))
rowData(altExp(GlobalPatterns,"Genus"))$log_mean <-
log(rowData(altExp(GlobalPatterns,"Genus"))$mean)
rowData(altExp(GlobalPatterns,"Genus"))$detected <-
rowData(altExp(GlobalPatterns,"Genus"))$detected / 100
top_taxa <- getTop(altExp(GlobalPatterns,"Genus"),
method="mean",
top=100L,
assay.type="counts")
Colour, size and shape of tree tips and nodes can be decorated based
on data present in the SE
object or by providing additional
information via the other_fields
argument. Note that
currently information for nodes have to be provided via the
other_fields
arguments.
Data will be matched via the node
or label
argument depending on which was provided. label
takes
precedent.
plotRowTree(altExp(GlobalPatterns,"Genus")[top_taxa,],
tip_colour_by = "log_mean",
tip_size_by = "detected")
Tip and node labels can be shown as well. Setting
show_label = TRUE
shows the tip labels only …
plotRowTree(altExp(GlobalPatterns,"Genus")[top_taxa,],
tip_colour_by = "log_mean",
tip_size_by = "detected",
show_label = TRUE)
… whereas node labels can be selectively shown by providing a named
logical vector to show_label
.
Please note that currently ggtree
can only plot node
labels in a rectangular layout.
labels <- c("Genus:Providencia", "Genus:Morganella", "0.961.60")
plotRowTree(altExp(GlobalPatterns,"Genus")[top_taxa,],
tip_colour_by = "log_mean",
tip_size_by = "detected",
show_label = labels,
layout="rectangular")
Information can also be visualized on the edges of the tree plot.
plotRowTree(altExp(GlobalPatterns,"Genus")[top_taxa,],
edge_colour_by = "Phylum",
tip_colour_by = "log_mean")
Similar to tree data, graph data can also be plotted in conjunction
with SummarizedExperiment
objects. Since the graph data in
itself cannot be stored in a specialized slot, a graph object can be
provided separately or as an element from the metedata
.
Here we load an example graph. As graph data, all objects types
accepted by as_tbl_graph
from the tidygraph
package are supported.
data(col_graph)
In the following examples, the weight
data is
automatically generated from the graph data. The
SummarizedExperiment
provided is required to have
overlapping rownames with the node names of the graph. Using this link
the graph plot can incorporate data from the
SummarizedExperiment
.
plotColGraph(col_graph,
altExp(GlobalPatterns,"Genus"),
colour_by = "SampleType",
edge_colour_by = "weight",
edge_width_by = "weight",
show_label = TRUE)
#> This graph was created by an old(er) igraph version.
#> ℹ Call `igraph::upgrade_graph()` on it to use with the current igraph version.
#> For now we convert it on the fly...
As mentioned the graph data can be provided from the
metadata
of the SummarizedExperiment
.
metadata(altExp(GlobalPatterns,"Genus"))$graph <- col_graph
This produces the same plot as shown above.
if(!requireNamespace("miaTime", quietly = TRUE)){
remotes::install_github("microbiome/miaTime", upgrade = "never")
}
# Load data from miaTime package
library("miaTime")
data(SilvermanAGutData, package="miaTime")
tse <- SilvermanAGutData
tse <- transformAssay(tse, method = "relabundance")
taxa <- getTop(tse, 2)
Data from samples collected along time can be visualized using
plotSeries
. The x
argument is used to
reference data from the colData
to use as descriptor for
ordering the data. The y
argument selects the feature to
show. Since plotting a lot of features is not advised a maximum of 20
features can plotted at the same time.
plotSeries(tse,
x = "DAY_ORDER",
y = taxa,
colour_by = "Family")
If replicated data is present, data is automatically used for
calculation of the mean
and sd
and plotted as
a range. Data from different assays can be used for plotting via the
assay.type
.
plotSeries(tse[taxa,],
x = "DAY_ORDER",
colour_by = "Family",
linetype_by = "Phylum",
assay.type = "relabundance")
Additional variables can be used to modify line type aesthetics.
plotSeries(tse,
x = "DAY_ORDER",
y = getTop(tse, 5),
colour_by = "Family",
linetype_by = "Phylum",
assay.type = "counts")
To visualize the relative relations between two groupings among the
factor data, two functions are available for the purpose;
plotColTile
and plotRowTile
.
data(GlobalPatterns, package="mia")
se <- GlobalPatterns
plotColTile(se,"SampleType","Primer") +
theme(axis.text.x.top = element_text(angle = 45, hjust = 0))
Searching for groups that are similar to each other among the
samples, could be approached with the Dirichlet Multinomial Mixtures
(Holmes, Harris, and Quince 2012). After
using runDMN
from the mia
package, several k
values as a number of clusters are used to observe the best fit (see
also getDMN
and getBestDMNFit
). To visualize
the fit using e.g. “laplace” as a measure of goodness of fit:
data(dmn_se, package = "mia")
names(metadata(dmn_se))
#> [1] "DMN"
# plot the fit
plotDMNFit(dmn_se, type = "laplace")
#> Warning in .local(x, name, type, ...): 'getDMN' is deprecated.
#> Use 'addCluster' instead.
#> See help("Deprecated") and help("Now runDMN and calculateDMN are deprecated. Use addCluster with DMMParam parameter and full parameter set as true instead.-deprecated").
Principal Coordinates Analysis using Bray-Curtis dissimilarity on the
hitchip1006
dataset:
library(miaTime)
data(hitchip1006, package = "miaTime")
tse <- hitchip1006
tse <- transformAssay(tse, method = "relabundance")
## Ordination with PCoA with Bray-Curtis dissimilarity
tse <- runMDS(tse, FUN = vegan::vegdist, method = "bray", name = "PCoA_BC",
assay.type = "relabundance", na.rm = TRUE)
# plot
p <- plotReducedDim(tse, dimred = "PCoA_BC")
p
Retrieving information about all available trajectories:
library(dplyr)
# List subjects with two time points
selected.subjects <- names(which(table(tse$subject)==2))
# Subjects counts per number of time points available in the data
table(table(tse$subject)) %>% as.data.frame() %>%
rename(Timepoints=Var1, Subjects=Freq)
Lets look at all trajectories having two time points in the data:
# plot
p + geom_path(aes(x=X1, y=X2, group=subject),
arrow=arrow(length = unit(0.1, "inches")),
# combining ordination data and metadata then selecting the subjects
# Note, scuttle::makePerCellDF could also be used for the purpose.
data = subset(data.frame(reducedDim(tse), colData(tse)),
subject %in% selected.subjects) %>% arrange(time))+
labs(title = "All trajectories with two time points")+
theme(plot.title = element_text(hjust = 0.5))
Filtering the two time point trajectories by divergence and displaying top 10%:
library(miaTime)
# calculating step wise divergence based on the microbial profiles
tse <- getStepwiseDivergence(tse, group = "subject", time_field = "time")
# retrieving the top 10% divergent subjects having two time points
top.selected.subjects <- subset(data.frame(reducedDim(tse), colData(tse)),
subject %in% selected.subjects) %>%
top_frac(0.1, time_divergence) %>% select(subject) %>% .[[1]]
# plot
p + geom_path(aes(x=X1, y=X2,
color=time_divergence, group=subject),
# the data is sorted in descending order in terms of time
# since geom_path will use the first occurring observation
# to color the corresponding segment. Without the sorting
# geom_path will pick up NA values (corresponding to initial time
# points); breaking the example.
data = subset(data.frame(reducedDim(tse), colData(tse)),
subject %in% top.selected.subjects) %>%
arrange(desc(time)),
# arrow end is reversed, due to the earlier sorting.
arrow=arrow(length = unit(0.1, "inches"), ends = "first"))+
labs(title = "Top 10% divergent trajectories from time point one to two")+
scale_color_gradient2(low="white", high="red")+
theme(plot.title = element_text(hjust = 0.5))
Plotting an example of the trajectory with the maximum total divergence:
# Get subject with the maximum total divergence
selected.subject <- data.frame(reducedDim(tse), colData(tse)) %>%
group_by(subject) %>%
summarise(total_divergence = sum(time_divergence, na.rm = TRUE)) %>%
filter(total_divergence==max(total_divergence)) %>% select(subject) %>% .[[1]]
# plot
p + geom_path(aes(x=X1, y=X2, group=subject),
data = subset(data.frame(reducedDim(tse), colData(tse)),
subject %in% selected.subject) %>% arrange(time),
arrow=arrow(length = unit(0.1, "inches")))+
labs(title = "Longest trajectory by divergence")+
theme(plot.title = element_text(hjust = 0.5))
More examples and materials are available at Orchestrating Microbiome Analysis (Lahti, Shetty, and Ernst 2021).
sessionInfo()
#> R version 4.4.1 (2024-06-14)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 22.04.5 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] patchwork_1.3.0 scater_1.34.0
#> [3] scuttle_1.16.0 miaViz_1.15.2
#> [5] mia_1.14.0 TreeSummarizedExperiment_2.14.0
#> [7] Biostrings_2.74.0 XVector_0.46.0
#> [9] SingleCellExperiment_1.28.0 MultiAssayExperiment_1.32.0
#> [11] SummarizedExperiment_1.36.0 Biobase_2.66.0
#> [13] GenomicRanges_1.58.0 GenomeInfoDb_1.42.0
#> [15] IRanges_2.40.0 S4Vectors_0.44.0
#> [17] BiocGenerics_0.52.0 MatrixGenerics_1.18.0
#> [19] matrixStats_1.4.1 ggraph_2.2.1
#> [21] ggplot2_3.5.1 BiocStyle_2.34.0
#>
#> loaded via a namespace (and not attached):
#> [1] splines_4.4.1 ggplotify_0.1.2
#> [3] tibble_3.2.1 polyclip_1.10-7
#> [5] rpart_4.1.23 DirichletMultinomial_1.48.0
#> [7] lifecycle_1.0.4 lattice_0.22-6
#> [9] MASS_7.3-61 SnowballC_0.7.1
#> [11] backports_1.5.0 magrittr_2.0.3
#> [13] Hmisc_5.2-0 sass_0.4.9
#> [15] rmarkdown_2.28 jquerylib_0.1.4
#> [17] yaml_2.3.10 RColorBrewer_1.1-3
#> [19] DBI_1.2.3 minqa_1.2.8
#> [21] abind_1.4-8 zlibbioc_1.52.0
#> [23] purrr_1.0.2 yulab.utils_0.1.7
#> [25] nnet_7.3-19 tweenr_2.0.3
#> [27] sandwich_3.1-1 GenomeInfoDbData_1.2.13
#> [29] ggrepel_0.9.6 irlba_2.3.5.1
#> [31] tokenizers_0.3.0 tidytree_0.4.6
#> [33] vegan_2.6-8 rbiom_1.0.3
#> [35] pkgdown_2.1.1 permute_0.9-7
#> [37] DelayedMatrixStats_1.28.0 codetools_0.2-20
#> [39] DelayedArray_0.32.0 ggforce_0.4.2
#> [41] tidyselect_1.2.1 aplot_0.2.3
#> [43] UCSC.utils_1.2.0 farver_2.1.2
#> [45] lme4_1.1-35.5 ScaledMatrix_1.14.0
#> [47] viridis_0.6.5 base64enc_0.1-3
#> [49] jsonlite_1.8.9 BiocNeighbors_2.0.0
#> [51] decontam_1.26.0 tidygraph_1.3.1
#> [53] Formula_1.2-5 systemfonts_1.1.0
#> [55] tools_4.4.1 ggnewscale_0.5.0
#> [57] treeio_1.30.0 ragg_1.3.3
#> [59] Rcpp_1.0.13 glue_1.8.0
#> [61] gridExtra_2.3 SparseArray_1.6.0
#> [63] xfun_0.48 mgcv_1.9-1
#> [65] dplyr_1.1.4 withr_3.0.2
#> [67] BiocManager_1.30.25 fastmap_1.2.0
#> [69] boot_1.3-31 bluster_1.16.0
#> [71] fansi_1.0.6 digest_0.6.37
#> [73] rsvd_1.0.5 R6_2.5.1
#> [75] gridGraphics_0.5-1 textshaping_0.4.0
#> [77] colorspace_2.1-1 lpSolve_5.6.21
#> [79] utf8_1.2.4 tidyr_1.3.1
#> [81] generics_0.1.3 data.table_1.16.2
#> [83] DECIPHER_3.2.0 graphlayouts_1.2.0
#> [85] httr_1.4.7 htmlwidgets_1.6.4
#> [87] S4Arrays_1.6.0 pkgconfig_2.0.3
#> [89] gtable_0.3.6 janeaustenr_1.0.0
#> [91] htmltools_0.5.8.1 bookdown_0.41
#> [93] scales_1.3.0 ggfun_0.1.7
#> [95] knitr_1.48 rstudioapi_0.17.1
#> [97] reshape2_1.4.4 checkmate_2.3.2
#> [99] nlme_3.1-166 nloptr_2.1.1
#> [101] cachem_1.1.0 zoo_1.8-12
#> [103] stringr_1.5.1 parallel_4.4.1
#> [105] vipor_0.4.7 foreign_0.8-87
#> [107] desc_1.4.3 pillar_1.9.0
#> [109] grid_4.4.1 vctrs_0.6.5
#> [111] slam_0.1-54 BiocSingular_1.22.0
#> [113] beachmat_2.22.0 cluster_2.1.6
#> [115] beeswarm_0.4.0 htmlTable_2.4.3
#> [117] evaluate_1.0.1 mvtnorm_1.3-1
#> [119] cli_3.6.3 compiler_4.4.1
#> [121] rlang_1.1.4 crayon_1.5.3
#> [123] tidytext_0.4.2 labeling_0.4.3
#> [125] mediation_4.5.0 plyr_1.8.9
#> [127] fs_1.6.5 ggbeeswarm_0.7.2
#> [129] stringi_1.8.4 viridisLite_0.4.2
#> [131] BiocParallel_1.40.0 munsell_0.5.1
#> [133] lazyeval_0.2.2 Matrix_1.7-1
#> [135] sparseMatrixStats_1.18.0 highr_0.11
#> [137] igraph_2.1.1 memoise_2.0.1
#> [139] RcppParallel_5.1.9 bslib_0.8.0
#> [141] ggtree_3.14.0 ape_5.8