miaTime
implements tools for time series manipulation
based on the TreeSummarizedExperiment
(TreeSE
) data container. Much of the functionality is also
applicable to the SummarizedExperiment
data objects. This tutorial shows how to use miaTime
methods as well as the broader R/Bioconductor ecosystem to manipulate
time series data.
Check also the related package TimeSeriesExperiment.
miaTime
is hosted on Bioconductor, and can be installed
using via BiocManager
.
BiocManager::install("miaTime")
Once installed, miaTime
is made available in the usual
way.
library(miaTime)
#> Loading required package: mia
#> Loading required package: MultiAssayExperiment
#> Loading required package: SummarizedExperiment
#> Loading required package: MatrixGenerics
#> Loading required package: matrixStats
#>
#> Attaching package: 'MatrixGenerics'
#> The following objects are masked from 'package:matrixStats':
#>
#> colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
#> colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
#> colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
#> colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
#> colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
#> colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
#> colWeightedMeans, colWeightedMedians, colWeightedSds,
#> colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
#> rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
#> rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
#> rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
#> rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
#> rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
#> rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
#> rowWeightedSds, rowWeightedVars
#> Loading required package: GenomicRanges
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#>
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:stats':
#>
#> IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#>
#> anyDuplicated, aperm, append, as.data.frame, basename, cbind,
#> colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
#> get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
#> match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
#> Position, rank, rbind, Reduce, rownames, sapply, saveRDS, setdiff,
#> table, tapply, union, unique, unsplit, which.max, which.min
#> Loading required package: S4Vectors
#>
#> Attaching package: 'S4Vectors'
#> The following object is masked from 'package:utils':
#>
#> findMatches
#> The following objects are masked from 'package:base':
#>
#> expand.grid, I, unname
#> Loading required package: IRanges
#> Loading required package: GenomeInfoDb
#> Loading required package: Biobase
#> Welcome to Bioconductor
#>
#> Vignettes contain introductory material; view with
#> 'browseVignettes()'. To cite Bioconductor, see
#> 'citation("Biobase")', and for packages 'citation("pkgname")'.
#>
#> Attaching package: 'Biobase'
#> The following object is masked from 'package:MatrixGenerics':
#>
#> rowMedians
#> The following objects are masked from 'package:matrixStats':
#>
#> anyMissing, rowMedians
#> Loading required package: SingleCellExperiment
#> Loading required package: TreeSummarizedExperiment
#> Loading required package: Biostrings
#> Loading required package: XVector
#>
#> Attaching package: 'Biostrings'
#> The following object is masked from 'package:base':
#>
#> strsplit
#> This is mia version 1.15.3
#> - Online documentation and vignettes: https://microbiome.github.io/mia/
#> - Online book 'Orchestrating Microbiome Analysis (OMA)': https://microbiome.github.io/OMA/docs/devel/
To sort data based on subject and time point in base R, you can use
the order()
function.
period
class
miaTime
utilizes the functions available in the package
lubridate
to convert time series field to
period
class object. This gives access to a number of
readily available time
series manipulation tools.
Load example data:
# Load packages
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:Biostrings':
#>
#> intersect, setdiff, union
#> The following objects are masked from 'package:GenomicRanges':
#>
#> intersect, setdiff, union
#> The following object is masked from 'package:GenomeInfoDb':
#>
#> intersect
#> The following objects are masked from 'package:IRanges':
#>
#> %within%, intersect, setdiff, union
#> The following objects are masked from 'package:S4Vectors':
#>
#> intersect, second, second<-, setdiff, union
#> The following objects are masked from 'package:BiocGenerics':
#>
#> intersect, setdiff, union
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
# Time is given in days in the demo data.
# Convert days to seconds
time_in_seconds <- 60*60*24*tse[["time"]]
# Convert the time data to period class
seconds <- as.period(time_in_seconds, unit = "sec")
# Check the output
seconds |> tail()
#> [1] "0S" "0S" "0S" "0S" "0S" "0S"
The time field in days is now shown in seconds. It can then be
converted to many different units using the lubridate
package.
The updated time information can then be added to the
SummarizedExperiment
data object as a new
colData
(sample data) field.
colData(tse)$time_sec <- seconds
colData(tse)
#> DataFrame with 1151 rows and 11 columns
#> age sex nationality DNA_extraction_method project
#> <integer> <factor> <factor> <factor> <factor>
#> Sample-1 28 male US NA 1
#> Sample-2 24 female US NA 1
#> Sample-3 52 male US NA 1
#> Sample-4 22 female US NA 1
#> Sample-5 25 female US NA 1
#> ... ... ... ... ... ...
#> Sample-1002 40 male UKIE r 28
#> Sample-1003 26 female UKIE r 28
#> Sample-1004 45 male UKIE r 28
#> Sample-1005 26 male UKIE r 28
#> Sample-1006 23 female UKIE r 28
#> diversity bmi_group subject time sample time_sec
#> <numeric> <factor> <factor> <numeric> <character> <Period>
#> Sample-1 5.76 severeobese 1 0 Sample-1 0S
#> Sample-2 6.06 obese 2 0 Sample-2 0S
#> Sample-3 5.50 lean 3 0 Sample-3 0S
#> Sample-4 5.87 underweight 4 0 Sample-4 0S
#> Sample-5 5.89 lean 5 0 Sample-5 0S
#> ... ... ... ... ... ... ...
#> Sample-1002 5.87 lean 1002 0 Sample-1002 0S
#> Sample-1003 6.02 severeobese 1003 0 Sample-1003 0S
#> Sample-1004 5.85 lean 1004 0 Sample-1004 0S
#> Sample-1005 5.71 overweight 1005 0 Sample-1005 0S
#> Sample-1006 6.07 lean 1006 0 Sample-1006 0S
The lubridate::as.duration()
function helps to specify
time points as duration.
duration <- as.duration(seconds)
duration |> tail()
#> [1] "0s" "0s" "0s" "0s" "0s" "0s"
The difference between subsequent time points can then be calculated.
The time difference from a selected point to the other time points can be calculated as follows.
Rank of the time points can be calculated by rank
function provided in base R.
tse[["time"]] <- rank(tse[["time"]])
colData(tse)
#> DataFrame with 1151 rows and 11 columns
#> age sex nationality DNA_extraction_method project
#> <integer> <factor> <factor> <factor> <factor>
#> Sample-1 28 male US NA 1
#> Sample-2 24 female US NA 1
#> Sample-3 52 male US NA 1
#> Sample-4 22 female US NA 1
#> Sample-5 25 female US NA 1
#> ... ... ... ... ... ...
#> Sample-1002 40 male UKIE r 28
#> Sample-1003 26 female UKIE r 28
#> Sample-1004 45 male UKIE r 28
#> Sample-1005 26 male UKIE r 28
#> Sample-1006 23 female UKIE r 28
#> diversity bmi_group subject time sample time_sec
#> <numeric> <factor> <factor> <numeric> <character> <Period>
#> Sample-1 5.76 severeobese 1 503.5 Sample-1 0S
#> Sample-2 6.06 obese 2 503.5 Sample-2 0S
#> Sample-3 5.50 lean 3 503.5 Sample-3 0S
#> Sample-4 5.87 underweight 4 503.5 Sample-4 0S
#> Sample-5 5.89 lean 5 503.5 Sample-5 0S
#> ... ... ... ... ... ... ...
#> Sample-1002 5.87 lean 1002 503.5 Sample-1002 0S
#> Sample-1003 6.02 severeobese 1003 503.5 Sample-1003 0S
#> Sample-1004 5.85 lean 1004 503.5 Sample-1004 0S
#> Sample-1005 5.71 overweight 1005 503.5 Sample-1005 0S
#> Sample-1006 6.07 lean 1006 503.5 Sample-1006 0S
Sometimes we need to operate on time series per unit (subject, reaction chamber, sampling location, …).
Add time point rank per subject.
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:mia':
#>
#> full_join, inner_join, left_join, right_join
#> The following objects are masked from 'package:Biostrings':
#>
#> collapse, intersect, setdiff, setequal, union
#> The following object is masked from 'package:XVector':
#>
#> slice
#> The following object is masked from 'package:Biobase':
#>
#> combine
#> The following objects are masked from 'package:GenomicRanges':
#>
#> intersect, setdiff, union
#> The following object is masked from 'package:GenomeInfoDb':
#>
#> intersect
#> The following objects are masked from 'package:IRanges':
#>
#> collapse, desc, intersect, setdiff, slice, union
#> The following objects are masked from 'package:S4Vectors':
#>
#> first, intersect, rename, setdiff, setequal, union
#> The following objects are masked from 'package:BiocGenerics':
#>
#> combine, intersect, setdiff, union
#> The following object is masked from 'package:matrixStats':
#>
#> count
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
colData(tse) <- colData(tse) |>
as.data.frame() |>
group_by(subject) |>
mutate(rank = rank(time, ties.method = "average")) |>
DataFrame()
TreeSE
consists of rows for features and columns for
samples. If we are specifically interested in baseline samples, we can
easily subset the data as follows.
tse <- tse[, tse$time==0]
sessionInfo()
#> R version 4.4.1 (2024-06-14)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 22.04.5 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] dplyr_1.1.4 lubridate_1.9.3
#> [3] miaTime_0.99.0 mia_1.15.3
#> [5] TreeSummarizedExperiment_2.14.0 Biostrings_2.74.0
#> [7] XVector_0.46.0 SingleCellExperiment_1.28.0
#> [9] MultiAssayExperiment_1.32.0 SummarizedExperiment_1.36.0
#> [11] Biobase_2.66.0 GenomicRanges_1.58.0
#> [13] GenomeInfoDb_1.42.0 IRanges_2.40.0
#> [15] S4Vectors_0.44.0 BiocGenerics_0.52.0
#> [17] MatrixGenerics_1.18.0 matrixStats_1.4.1
#> [19] knitr_1.48 BiocStyle_2.34.0
#>
#> loaded via a namespace (and not attached):
#> [1] rstudioapi_0.17.1 jsonlite_1.8.9
#> [3] magrittr_2.0.3 ggbeeswarm_0.7.2
#> [5] nloptr_2.1.1 rmarkdown_2.28
#> [7] fs_1.6.4 zlibbioc_1.52.0
#> [9] ragg_1.3.3 vctrs_0.6.5
#> [11] minqa_1.2.8 DelayedMatrixStats_1.28.0
#> [13] base64enc_0.1-3 htmltools_0.5.8.1
#> [15] S4Arrays_1.6.0 BiocNeighbors_2.0.0
#> [17] SparseArray_1.6.0 Formula_1.2-5
#> [19] sass_0.4.9 bslib_0.8.0
#> [21] htmlwidgets_1.6.4 desc_1.4.3
#> [23] sandwich_3.1-1 plyr_1.8.9
#> [25] DECIPHER_3.2.0 zoo_1.8-12
#> [27] cachem_1.1.0 igraph_2.1.1
#> [29] lifecycle_1.0.4 pkgconfig_2.0.3
#> [31] rsvd_1.0.5 Matrix_1.7-1
#> [33] R6_2.5.1 fastmap_1.2.0
#> [35] GenomeInfoDbData_1.2.13 digest_0.6.37
#> [37] colorspace_2.1-1 scater_1.34.0
#> [39] irlba_2.3.5.1 textshaping_0.4.0
#> [41] Hmisc_5.2-0 vegan_2.6-8
#> [43] beachmat_2.22.0 timechange_0.3.0
#> [45] fansi_1.0.6 mgcv_1.9-1
#> [47] httr_1.4.7 abind_1.4-8
#> [49] compiler_4.4.1 htmlTable_2.4.3
#> [51] backports_1.5.0 BiocParallel_1.40.0
#> [53] viridis_0.6.5 DBI_1.2.3
#> [55] MASS_7.3-61 DelayedArray_0.32.0
#> [57] bluster_1.16.0 permute_0.9-7
#> [59] tools_4.4.1 vipor_0.4.7
#> [61] foreign_0.8-87 beeswarm_0.4.0
#> [63] ape_5.8 nnet_7.3-19
#> [65] glue_1.8.0 nlme_3.1-166
#> [67] grid_4.4.1 checkmate_2.3.2
#> [69] cluster_2.1.6 reshape2_1.4.4
#> [71] lpSolve_5.6.21 generics_0.1.3
#> [73] gtable_0.3.6 mediation_4.5.0
#> [75] tidyr_1.3.1 data.table_1.16.2
#> [77] BiocSingular_1.22.0 ScaledMatrix_1.14.0
#> [79] utf8_1.2.4 ggrepel_0.9.6
#> [81] pillar_1.9.0 stringr_1.5.1
#> [83] yulab.utils_0.1.7 splines_4.4.1
#> [85] treeio_1.30.0 lattice_0.22-6
#> [87] tidyselect_1.2.1 DirichletMultinomial_1.48.0
#> [89] scuttle_1.16.0 gridExtra_2.3
#> [91] bookdown_0.41 xfun_0.48
#> [93] rbiom_1.0.3 stringi_1.8.4
#> [95] UCSC.utils_1.2.0 lazyeval_0.2.2
#> [97] yaml_2.3.10 boot_1.3-31
#> [99] evaluate_1.0.1 codetools_0.2-20
#> [101] tibble_3.2.1 BiocManager_1.30.25
#> [103] cli_3.6.3 RcppParallel_5.1.9
#> [105] rpart_4.1.23 systemfonts_1.1.0
#> [107] munsell_0.5.1 jquerylib_0.1.4
#> [109] Rcpp_1.0.13 parallel_4.4.1
#> [111] pkgdown_2.1.1 ggplot2_3.5.1
#> [113] sparseMatrixStats_1.18.0 lme4_1.1-35.5
#> [115] slam_0.1-54 mvtnorm_1.3-1
#> [117] decontam_1.26.0 viridisLite_0.4.2
#> [119] tidytree_0.4.6 scales_1.3.0
#> [121] purrr_1.0.2 crayon_1.5.3
#> [123] rlang_1.1.4