Introduction

miaTime implements tools for time series manipulation based on the TreeSummarizedExperiment (TreeSE) data container. Much of the functionality is also applicable to the SummarizedExperiment data objects. This tutorial shows how to use miaTime methods as well as the broader R/Bioconductor ecosystem to manipulate time series data.

Check also the related package TimeSeriesExperiment.

Installation

miaTime is hosted on Bioconductor, and can be installed using via BiocManager.

BiocManager::install("miaTime")

Once installed, miaTime is made available in the usual way.

library(miaTime)
#> Loading required package: mia
#> Loading required package: MultiAssayExperiment
#> Loading required package: SummarizedExperiment
#> Loading required package: MatrixGenerics
#> Loading required package: matrixStats
#> 
#> Attaching package: 'MatrixGenerics'
#> The following objects are masked from 'package:matrixStats':
#> 
#>     colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
#>     colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
#>     colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
#>     colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
#>     colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
#>     colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
#>     colWeightedMeans, colWeightedMedians, colWeightedSds,
#>     colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
#>     rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
#>     rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
#>     rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
#>     rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
#>     rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
#>     rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
#>     rowWeightedSds, rowWeightedVars
#> Loading required package: GenomicRanges
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#> 
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:stats':
#> 
#>     IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#> 
#>     anyDuplicated, aperm, append, as.data.frame, basename, cbind,
#>     colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
#>     get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
#>     match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
#>     Position, rank, rbind, Reduce, rownames, sapply, saveRDS, setdiff,
#>     table, tapply, union, unique, unsplit, which.max, which.min
#> Loading required package: S4Vectors
#> 
#> Attaching package: 'S4Vectors'
#> The following object is masked from 'package:utils':
#> 
#>     findMatches
#> The following objects are masked from 'package:base':
#> 
#>     expand.grid, I, unname
#> Loading required package: IRanges
#> Loading required package: GenomeInfoDb
#> Loading required package: Biobase
#> Welcome to Bioconductor
#> 
#>     Vignettes contain introductory material; view with
#>     'browseVignettes()'. To cite Bioconductor, see
#>     'citation("Biobase")', and for packages 'citation("pkgname")'.
#> 
#> Attaching package: 'Biobase'
#> The following object is masked from 'package:MatrixGenerics':
#> 
#>     rowMedians
#> The following objects are masked from 'package:matrixStats':
#> 
#>     anyMissing, rowMedians
#> Loading required package: SingleCellExperiment
#> Loading required package: TreeSummarizedExperiment
#> Loading required package: Biostrings
#> Loading required package: XVector
#> 
#> Attaching package: 'Biostrings'
#> The following object is masked from 'package:base':
#> 
#>     strsplit
#> This is mia version 1.15.3
#> - Online documentation and vignettes: https://microbiome.github.io/mia/
#> - Online book 'Orchestrating Microbiome Analysis (OMA)': https://microbiome.github.io/OMA/docs/devel/

Sorting samples

To sort data based on subject and time point in base R, you can use the order() function.

data(hitchip1006)
tse <- hitchip1006

index <- order(tse[["subject"]], tse[["time"]])
tse <- tse[ , index]

Storing time information with period class

miaTime utilizes the functions available in the package lubridate to convert time series field to period class object. This gives access to a number of readily available time series manipulation tools.

Load example data:

# Load packages
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:Biostrings':
#> 
#>     intersect, setdiff, union
#> The following objects are masked from 'package:GenomicRanges':
#> 
#>     intersect, setdiff, union
#> The following object is masked from 'package:GenomeInfoDb':
#> 
#>     intersect
#> The following objects are masked from 'package:IRanges':
#> 
#>     %within%, intersect, setdiff, union
#> The following objects are masked from 'package:S4Vectors':
#> 
#>     intersect, second, second<-, setdiff, union
#> The following objects are masked from 'package:BiocGenerics':
#> 
#>     intersect, setdiff, union
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union

# Time is given in days in the demo data.
# Convert days to seconds
time_in_seconds <- 60*60*24*tse[["time"]]
# Convert the time data to period class
seconds <- as.period(time_in_seconds, unit = "sec")
# Check the output
seconds |> tail()
#> [1] "0S" "0S" "0S" "0S" "0S" "0S"

Conversion between time units

The time field in days is now shown in seconds. It can then be converted to many different units using the lubridate package.

hours <- as.period(seconds, unit = "hour")
hours |> tail()
#> [1] "0S" "0S" "0S" "0S" "0S" "0S"

The updated time information can then be added to the SummarizedExperiment data object as a new colData (sample data) field.

colData(tse)$time_sec <- seconds
colData(tse)
#> DataFrame with 1151 rows and 11 columns
#>                   age      sex nationality DNA_extraction_method  project
#>             <integer> <factor>    <factor>              <factor> <factor>
#> Sample-1           28   male            US                    NA        1
#> Sample-2           24   female          US                    NA        1
#> Sample-3           52   male            US                    NA        1
#> Sample-4           22   female          US                    NA        1
#> Sample-5           25   female          US                    NA        1
#> ...               ...      ...         ...                   ...      ...
#> Sample-1002        40   male          UKIE                     r       28
#> Sample-1003        26   female        UKIE                     r       28
#> Sample-1004        45   male          UKIE                     r       28
#> Sample-1005        26   male          UKIE                     r       28
#> Sample-1006        23   female        UKIE                     r       28
#>             diversity   bmi_group  subject      time      sample time_sec
#>             <numeric>    <factor> <factor> <numeric> <character> <Period>
#> Sample-1         5.76 severeobese        1         0    Sample-1       0S
#> Sample-2         6.06 obese              2         0    Sample-2       0S
#> Sample-3         5.50 lean               3         0    Sample-3       0S
#> Sample-4         5.87 underweight        4         0    Sample-4       0S
#> Sample-5         5.89 lean               5         0    Sample-5       0S
#> ...               ...         ...      ...       ...         ...      ...
#> Sample-1002      5.87 lean            1002         0 Sample-1002       0S
#> Sample-1003      6.02 severeobese     1003         0 Sample-1003       0S
#> Sample-1004      5.85 lean            1004         0 Sample-1004       0S
#> Sample-1005      5.71 overweight      1005         0 Sample-1005       0S
#> Sample-1006      6.07 lean            1006         0 Sample-1006       0S

Calculating time differences

The lubridate::as.duration() function helps to specify time points as duration.

duration <- as.duration(seconds)
duration |> tail()
#> [1] "0s" "0s" "0s" "0s" "0s" "0s"

The difference between subsequent time points can then be calculated.

time_diff <- diff(duration)
time_diff <- c(NA, time_diff)
time_diff |> tail()
#> [1] 0 0 0 0 0 0

The time difference from a selected point to the other time points can be calculated as follows.

# Difference from second time point
time_diff <- hours - sort(unique(hours))[[2]]
time_diff |> tail()
#> [1] "-1.16415321826935e-10S" "-1.16415321826935e-10S" "-1.16415321826935e-10S"
#> [4] "-1.16415321826935e-10S" "-1.16415321826935e-10S" "-1.16415321826935e-10S"

Time point rank

Rank of the time points can be calculated by rank function provided in base R.

tse[["time"]] <- rank(tse[["time"]])
colData(tse)
#> DataFrame with 1151 rows and 11 columns
#>                   age      sex nationality DNA_extraction_method  project
#>             <integer> <factor>    <factor>              <factor> <factor>
#> Sample-1           28   male            US                    NA        1
#> Sample-2           24   female          US                    NA        1
#> Sample-3           52   male            US                    NA        1
#> Sample-4           22   female          US                    NA        1
#> Sample-5           25   female          US                    NA        1
#> ...               ...      ...         ...                   ...      ...
#> Sample-1002        40   male          UKIE                     r       28
#> Sample-1003        26   female        UKIE                     r       28
#> Sample-1004        45   male          UKIE                     r       28
#> Sample-1005        26   male          UKIE                     r       28
#> Sample-1006        23   female        UKIE                     r       28
#>             diversity   bmi_group  subject      time      sample time_sec
#>             <numeric>    <factor> <factor> <numeric> <character> <Period>
#> Sample-1         5.76 severeobese        1     503.5    Sample-1       0S
#> Sample-2         6.06 obese              2     503.5    Sample-2       0S
#> Sample-3         5.50 lean               3     503.5    Sample-3       0S
#> Sample-4         5.87 underweight        4     503.5    Sample-4       0S
#> Sample-5         5.89 lean               5     503.5    Sample-5       0S
#> ...               ...         ...      ...       ...         ...      ...
#> Sample-1002      5.87 lean            1002     503.5 Sample-1002       0S
#> Sample-1003      6.02 severeobese     1003     503.5 Sample-1003       0S
#> Sample-1004      5.85 lean            1004     503.5 Sample-1004       0S
#> Sample-1005      5.71 overweight      1005     503.5 Sample-1005       0S
#> Sample-1006      6.07 lean            1006     503.5 Sample-1006       0S

Operations per unit

Sometimes we need to operate on time series per unit (subject, reaction chamber, sampling location, …).

Add time point rank per subject.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:mia':
#> 
#>     full_join, inner_join, left_join, right_join
#> The following objects are masked from 'package:Biostrings':
#> 
#>     collapse, intersect, setdiff, setequal, union
#> The following object is masked from 'package:XVector':
#> 
#>     slice
#> The following object is masked from 'package:Biobase':
#> 
#>     combine
#> The following objects are masked from 'package:GenomicRanges':
#> 
#>     intersect, setdiff, union
#> The following object is masked from 'package:GenomeInfoDb':
#> 
#>     intersect
#> The following objects are masked from 'package:IRanges':
#> 
#>     collapse, desc, intersect, setdiff, slice, union
#> The following objects are masked from 'package:S4Vectors':
#> 
#>     first, intersect, rename, setdiff, setequal, union
#> The following objects are masked from 'package:BiocGenerics':
#> 
#>     combine, intersect, setdiff, union
#> The following object is masked from 'package:matrixStats':
#> 
#>     count
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

colData(tse) <- colData(tse) |>
   as.data.frame() |>
   group_by(subject) |>
   mutate(rank = rank(time, ties.method = "average")) |>
   DataFrame()

Subset to baseline samples

TreeSE consists of rows for features and columns for samples. If we are specifically interested in baseline samples, we can easily subset the data as follows.

tse <- tse[, tse$time==0]

Session info

sessionInfo()
#> R version 4.4.1 (2024-06-14)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 22.04.5 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] dplyr_1.1.4                     lubridate_1.9.3                
#>  [3] miaTime_0.99.0                  mia_1.15.3                     
#>  [5] TreeSummarizedExperiment_2.14.0 Biostrings_2.74.0              
#>  [7] XVector_0.46.0                  SingleCellExperiment_1.28.0    
#>  [9] MultiAssayExperiment_1.32.0     SummarizedExperiment_1.36.0    
#> [11] Biobase_2.66.0                  GenomicRanges_1.58.0           
#> [13] GenomeInfoDb_1.42.0             IRanges_2.40.0                 
#> [15] S4Vectors_0.44.0                BiocGenerics_0.52.0            
#> [17] MatrixGenerics_1.18.0           matrixStats_1.4.1              
#> [19] knitr_1.48                      BiocStyle_2.34.0               
#> 
#> loaded via a namespace (and not attached):
#>   [1] rstudioapi_0.17.1           jsonlite_1.8.9             
#>   [3] magrittr_2.0.3              ggbeeswarm_0.7.2           
#>   [5] nloptr_2.1.1                rmarkdown_2.28             
#>   [7] fs_1.6.4                    zlibbioc_1.52.0            
#>   [9] ragg_1.3.3                  vctrs_0.6.5                
#>  [11] minqa_1.2.8                 DelayedMatrixStats_1.28.0  
#>  [13] base64enc_0.1-3             htmltools_0.5.8.1          
#>  [15] S4Arrays_1.6.0              BiocNeighbors_2.0.0        
#>  [17] SparseArray_1.6.0           Formula_1.2-5              
#>  [19] sass_0.4.9                  bslib_0.8.0                
#>  [21] htmlwidgets_1.6.4           desc_1.4.3                 
#>  [23] sandwich_3.1-1              plyr_1.8.9                 
#>  [25] DECIPHER_3.2.0              zoo_1.8-12                 
#>  [27] cachem_1.1.0                igraph_2.1.1               
#>  [29] lifecycle_1.0.4             pkgconfig_2.0.3            
#>  [31] rsvd_1.0.5                  Matrix_1.7-1               
#>  [33] R6_2.5.1                    fastmap_1.2.0              
#>  [35] GenomeInfoDbData_1.2.13     digest_0.6.37              
#>  [37] colorspace_2.1-1            scater_1.34.0              
#>  [39] irlba_2.3.5.1               textshaping_0.4.0          
#>  [41] Hmisc_5.2-0                 vegan_2.6-8                
#>  [43] beachmat_2.22.0             timechange_0.3.0           
#>  [45] fansi_1.0.6                 mgcv_1.9-1                 
#>  [47] httr_1.4.7                  abind_1.4-8                
#>  [49] compiler_4.4.1              htmlTable_2.4.3            
#>  [51] backports_1.5.0             BiocParallel_1.40.0        
#>  [53] viridis_0.6.5               DBI_1.2.3                  
#>  [55] MASS_7.3-61                 DelayedArray_0.32.0        
#>  [57] bluster_1.16.0              permute_0.9-7              
#>  [59] tools_4.4.1                 vipor_0.4.7                
#>  [61] foreign_0.8-87              beeswarm_0.4.0             
#>  [63] ape_5.8                     nnet_7.3-19                
#>  [65] glue_1.8.0                  nlme_3.1-166               
#>  [67] grid_4.4.1                  checkmate_2.3.2            
#>  [69] cluster_2.1.6               reshape2_1.4.4             
#>  [71] lpSolve_5.6.21              generics_0.1.3             
#>  [73] gtable_0.3.6                mediation_4.5.0            
#>  [75] tidyr_1.3.1                 data.table_1.16.2          
#>  [77] BiocSingular_1.22.0         ScaledMatrix_1.14.0        
#>  [79] utf8_1.2.4                  ggrepel_0.9.6              
#>  [81] pillar_1.9.0                stringr_1.5.1              
#>  [83] yulab.utils_0.1.7           splines_4.4.1              
#>  [85] treeio_1.30.0               lattice_0.22-6             
#>  [87] tidyselect_1.2.1            DirichletMultinomial_1.48.0
#>  [89] scuttle_1.16.0              gridExtra_2.3              
#>  [91] bookdown_0.41               xfun_0.48                  
#>  [93] rbiom_1.0.3                 stringi_1.8.4              
#>  [95] UCSC.utils_1.2.0            lazyeval_0.2.2             
#>  [97] yaml_2.3.10                 boot_1.3-31                
#>  [99] evaluate_1.0.1              codetools_0.2-20           
#> [101] tibble_3.2.1                BiocManager_1.30.25        
#> [103] cli_3.6.3                   RcppParallel_5.1.9         
#> [105] rpart_4.1.23                systemfonts_1.1.0          
#> [107] munsell_0.5.1               jquerylib_0.1.4            
#> [109] Rcpp_1.0.13                 parallel_4.4.1             
#> [111] pkgdown_2.1.1               ggplot2_3.5.1              
#> [113] sparseMatrixStats_1.18.0    lme4_1.1-35.5              
#> [115] slam_0.1-54                 mvtnorm_1.3-1              
#> [117] decontam_1.26.0             viridisLite_0.4.2          
#> [119] tidytree_0.4.6              scales_1.3.0               
#> [121] purrr_1.0.2                 crayon_1.5.3               
#> [123] rlang_1.1.4