Introduction

miaTime implements tools for time series manipulation based on the TreeSummarizedExperiment (Huang 2021) data container. Much of the functionality is also applicable to the SummarizedExperiment (Morgan et al. 2020) data objects. This tutorial shows how to use miaTime methods as well as the broader R/Bioconductor ecosystem to manipulate time series data.

Installation

Install the latest development version from GitHub in R:

library(devtools)
devtools::install_github("microbiome/miaTime")

Load the package:

library(miaTime)

Sorting samples

The tidySingleCellExperiment package provides handy functions for (Tree)SE manipulation.

library("tidySingleCellExperiment")
data("hitchip1006")
tse <- hitchip1006
tse2 <- tse %>% tidySingleCellExperiment::arrange(subject, time)
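
The same ordering can be sketched with base R's order(), which returns the permutation that sorts by subject first and by time second. The toy data frame below is hypothetical and only stands in for the sample data; for a (Tree)SE object the analogous call would presumably be tse[, order(tse$subject, tse$time)].

```r
# Hypothetical sample table standing in for colData(tse)
samples <- data.frame(
  sample  = c("S1", "S2", "S3", "S4"),
  subject = c("B", "A", "B", "A"),
  time    = c(2.3, 8.1, 0, 0),
  stringsAsFactors = FALSE
)

# order() sorts by subject first and breaks ties by time
idx <- order(samples$subject, samples$time)
sorted <- samples[idx, ]
sorted$sample
```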

Storing time information with Period class

miaTime uses functions from the lubridate package to convert a time series field to a “Period” class object. This gives access to a range of readily available time series manipulation tools.

Load example data:

# Load packages
library(miaTime)
library(lubridate)
library(SummarizedExperiment)

# Load demo data
data(hitchip1006)
tse <- hitchip1006

# Time is given in days in the demo data.
# Convert days to seconds
time_in_seconds <- 60*60*24*colData(tse)[,"time"]
# Convert the time data to period class
Seconds <- as.period(time_in_seconds, unit="sec")
# Check the output
Seconds[1140:1151]
##  [1] "492480S" "198720S" "673920S" "198720S" "198720S" "708480S" "198720S"
##  [8] "699840S" "198720S" "708480S" "181440S" "682560S"

Conversion between time units

The time field, originally given in days, is now expressed in seconds. It can be converted to other units with the lubridate package.

Hours <- as.period(Seconds, unit = "hour")
Hours[1140:1151]
##  [1] "136H 48M 0S"                "55H 11M 59.9999999999709S" 
##  [3] "187H 12M 0S"                "55H 11M 59.9999999999709S" 
##  [5] "55H 11M 59.9999999999709S"  "196H 47M 59.9999999998836S"
##  [7] "55H 11M 59.9999999999709S"  "194H 24M 0S"               
##  [9] "55H 11M 59.9999999999709S"  "196H 47M 59.9999999998836S"
## [11] "50H 24M 0S"                 "189H 36M 0S"
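
The fractional seconds in the output above (for example 59.9999999999709S) look like floating-point rounding noise from multiplying decimal days by 86400, not genuine sub-second information. A minimal base R sketch, assuming the sampling times are only meaningful to whole seconds, rounds the values before any further conversion. The vector days is a made-up example.

```r
# Made-up sampling times in days
days <- c(5.7, 2.3, 7.8, 8.2)

# Days to seconds: 60 * 60 * 24 = 86400 seconds per day
secs <- days * 86400

# Round to whole seconds to remove floating-point noise
secs_clean <- round(secs)

# Whole hours and remaining minutes, using integer division
hours   <- secs_clean %/% 3600
minutes <- (secs_clean %% 3600) %/% 60
data.frame(days, secs_clean, hours, minutes)
```

The rounded values could then be passed to as.period(..., unit = "sec") in the same way as time_in_seconds.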

The updated time information can then be added to the SummarizedExperiment data object as a new colData (sample data) field.

colData(tse)$timeSec <- Seconds
colData(tse)
## DataFrame with 1151 rows and 11 columns
##                   age      sex nationality DNA_extraction_method  project
##             <integer> <factor>    <factor>              <factor> <factor>
## Sample-1           28   male            US                    NA        1
## Sample-2           24   female          US                    NA        1
## Sample-3           52   male            US                    NA        1
## Sample-4           22   female          US                    NA        1
## Sample-5           25   female          US                    NA        1
## ...               ...      ...         ...                   ...      ...
## Sample-1168        50   female Scandinavia                     r       40
## Sample-1169        31   female Scandinavia                     r       40
## Sample-1170        31   female Scandinavia                     r       40
## Sample-1171        52   male   Scandinavia                     r       40
## Sample-1172        52   male   Scandinavia                     r       40
##             diversity   bmi_group  subject      time      sample  timeSec
##             <numeric>    <factor> <factor> <numeric> <character> <Period>
## Sample-1         5.76 severeobese        1         0    Sample-1       0S
## Sample-2         6.06 obese              2         0    Sample-2       0S
## Sample-3         5.50 lean               3         0    Sample-3       0S
## Sample-4         5.87 underweight        4         0    Sample-4       0S
## Sample-5         5.89 lean               5         0    Sample-5       0S
## ...               ...         ...      ...       ...         ...      ...
## Sample-1168      5.87 severeobese      244       8.1 Sample-1168  699840S
## Sample-1169      5.87 overweight       245       2.3 Sample-1169  198720S
## Sample-1170      5.92 overweight       245       8.2 Sample-1170  708480S
## Sample-1171      6.04 overweight       246       2.1 Sample-1171  181440S
## Sample-1172      5.74 overweight       246       7.9 Sample-1172  682560S

Calculating time differences

The lubridate function as.duration() converts the time points to durations, that is, exact time spans measured in seconds.

Duration <- as.duration(Seconds)
Duration[1140:1151]
##  [1] "492480s (~5.7 days)"   "198720s (~2.3 days)"   "673920s (~1.11 weeks)"
##  [4] "198720s (~2.3 days)"   "198720s (~2.3 days)"   "708480s (~1.17 weeks)"
##  [7] "198720s (~2.3 days)"   "699840s (~1.16 weeks)" "198720s (~2.3 days)"  
## [10] "708480s (~1.17 weeks)" "181440s (~2.1 days)"   "682560s (~1.13 weeks)"
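
Base R offers a comparable representation through the difftime class, which may be handy when only base tools are available. The sketch below is an illustrative alternative and not part of miaTime; the values are taken from the output above.

```r
# Elapsed times in seconds (values from the output above)
secs <- c(492480, 198720, 673920)

# Store them as a difftime object measured in seconds
d <- as.difftime(secs, units = "secs")

# Change the reporting unit to days; the values are rescaled accordingly
units(d) <- "days"
as.numeric(d)
```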

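The difference between consecutive time points (in sample order, expressed in seconds) can then be calculated. The first sample has no predecessor, so its difference is set to NA.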

Timediff <- diff(Duration)
Timediff <- c(NA, Timediff)
Timediff[1140:1151]
##  [1] -216000 -293760  475200 -475200       0  509760 -509760  501120 -501120
## [10]  509760 -527040  501120
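
Because diff() compares consecutive rows, the negative values above most likely arise where one subject's series ends and the next subject's begins. A possible base R sketch for within-subject differences uses ave(). It assumes the samples are already sorted by time within each subject, and the toy vectors are hypothetical.

```r
# Hypothetical samples, sorted by time within each subject
subject <- c("A", "A", "A", "B", "B")
time    <- c(0, 2.5, 8, 0, 3)

# Difference to the previous sample of the same subject;
# the first sample of each subject has no predecessor (NA)
timediff <- ave(time, subject, FUN = function(x) c(NA, diff(x)))
timediff
```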

The time difference from a selected point to the other time points can be calculated as follows.

# Distance from the first sample
base <- Hours - Hours[1]
base[1140:1151]
##  [1] "136H 48M 0S"                "55H 11M 59.9999999999709S" 
##  [3] "187H 12M 0S"                "55H 11M 59.9999999999709S" 
##  [5] "55H 11M 59.9999999999709S"  "196H 47M 59.9999999998836S"
##  [7] "55H 11M 59.9999999999709S"  "194H 24M 0S"               
##  [9] "55H 11M 59.9999999999709S"  "196H 47M 59.9999999998836S"
## [11] "50H 24M 0S"                 "189H 36M 0S"
base_1140 <- Seconds - Seconds[1140]
base_1140[1140:1151]
##  [1] "0S"       "-293760S" "181440S"  "-293760S" "-293760S" "216000S" 
##  [7] "-293760S" "207360S"  "-293760S" "216000S"  "-311040S" "190080S"

Time point rank

The rank of the time points can be calculated with the rank() function provided in base R.

colData(tse)$rank <- rank(colData(tse)$time)
colData(tse)
## DataFrame with 1151 rows and 12 columns
##                   age      sex nationality DNA_extraction_method  project
##             <integer> <factor>    <factor>              <factor> <factor>
## Sample-1           28   male            US                    NA        1
## Sample-2           24   female          US                    NA        1
## Sample-3           52   male            US                    NA        1
## Sample-4           22   female          US                    NA        1
## Sample-5           25   female          US                    NA        1
## ...               ...      ...         ...                   ...      ...
## Sample-1168        50   female Scandinavia                     r       40
## Sample-1169        31   female Scandinavia                     r       40
## Sample-1170        31   female Scandinavia                     r       40
## Sample-1171        52   male   Scandinavia                     r       40
## Sample-1172        52   male   Scandinavia                     r       40
##             diversity   bmi_group  subject      time      sample  timeSec
##             <numeric>    <factor> <factor> <numeric> <character> <Period>
## Sample-1         5.76 severeobese        1         0    Sample-1       0S
## Sample-2         6.06 obese              2         0    Sample-2       0S
## Sample-3         5.50 lean               3         0    Sample-3       0S
## Sample-4         5.87 underweight        4         0    Sample-4       0S
## Sample-5         5.89 lean               5         0    Sample-5       0S
## ...               ...         ...      ...       ...         ...      ...
## Sample-1168      5.87 severeobese      244       8.1 Sample-1168  699840S
## Sample-1169      5.87 overweight       245       2.3 Sample-1169  198720S
## Sample-1170      5.92 overweight       245       8.2 Sample-1170  708480S
## Sample-1171      6.04 overweight       246       2.1 Sample-1171  181440S
## Sample-1172      5.74 overweight       246       7.9 Sample-1172  682560S
##                  rank
##             <numeric>
## Sample-1        503.5
## Sample-2        503.5
## Sample-3        503.5
## Sample-4        503.5
## Sample-5        503.5
## ...               ...
## Sample-1168    1128.5
## Sample-1169    1077.0
## Sample-1170    1132.5
## Sample-1171    1067.0
## Sample-1172    1127.0
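
The fractional rank 503.5 for the baseline samples follows from how rank() handles ties: by default, tied values receive the average of the ranks they span. A rank of 503.5 for the smallest value therefore implies 1006 tied samples, since (1 + 1006) / 2 = 503.5. The small example below illustrates the behavior with made-up values.

```r
# Made-up time points with three tied baseline samples
time <- c(0, 0, 0, 2.3, 8.1)

# Default ties.method = "average": ties share the mean of ranks 1, 2 and 3
r_avg <- rank(time)

# ties.method = "min": ties all receive the lowest rank they span
r_min <- rank(time, ties.method = "min")

rbind(r_avg, r_min)
```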

Operations per unit

Sometimes we need to operate on time series per unit (subject, reaction chamber, sampling location, …).

Add time point rank per subject.

library(dplyr)
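colData(tse) <- colData(tse) %>%
   as.data.frame() %>%
   group_by(subject) %>%
   # Rank time points within each subject; ties get the average rank
   mutate(rank = rank(time, ties.method="average")) %>%
   ungroup() %>%
   # group_by() drops row names, so restore the sample names
   DataFrame(row.names = colnames(tse))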
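
A base R sketch of the same per-subject ranking, for readers who prefer to avoid the round trip through a data frame, uses ave(). The toy vectors are hypothetical; on a real object the result could presumably be assigned directly with tse$rank <- ave(tse$time, tse$subject, FUN = rank).

```r
# Hypothetical samples in arbitrary order
subject <- c("A", "B", "A", "B", "A")
time    <- c(8, 0, 0, 3, 2.5)

# Apply rank() separately within each subject, keeping the original order
rank_in_subject <- ave(time, subject, FUN = rank)
rank_in_subject
```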

Subset to baseline samples

# filter() for (Tree)SE objects is provided by tidySummarizedExperiment
library(tidySummarizedExperiment)

# Pick samples with time point 0
tse <- hitchip1006 |> filter(time == 0)
# Or: tse <- tse[, tse$time==0]

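# Alternatively, find the sample with the smallest time point within
# each subject, starting again from the full data
tse <- hitchip1006
colData(tse) <- colData(tse) %>%
   as.data.frame() %>%
   group_by(subject) %>%
   mutate(rank = rank(time, ties.method="average")) %>%
   ungroup() %>%
   # group_by() drops row names, so restore the sample names
   DataFrame(row.names = colnames(tse))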

# Pick the subset including first time point per subject
tse1 <- tse[, tse$rank == 1]
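
Selecting rank == 1 can miss a subject whose earliest time point is shared by two samples, because the averaged rank is then 1.5 for both. A hedged base R alternative compares each time with the subject's minimum instead. The toy data are hypothetical.

```r
# Hypothetical samples; subject B has two samples at its earliest time
subject <- c("A", "A", "B", "B", "B")
time    <- c(0, 2.5, 1, 1, 4)

# TRUE where a sample lies at its subject's earliest time point
is_first <- time == ave(time, subject, FUN = min)

# For comparison: the averaged rank misses both tied samples of subject B
is_rank1 <- ave(time, subject, FUN = rank) == 1

rbind(is_first, is_rank1)
```

On a (Tree)SE object, such a logical mask could be used for column subsetting in the same way as tse$rank == 1.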

Session info

## R version 4.2.1 (2022-06-23)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.5 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] tidySummarizedExperiment_1.7.0 tidySingleCellExperiment_1.7.0
##  [3] ttservice_0.2.2                lubridate_1.8.0               
##  [5] dplyr_1.0.10                   miaTime_0.1.15                
##  [7] mia_1.5.17                     MultiAssayExperiment_1.23.9   
##  [9] TreeSummarizedExperiment_2.1.4 Biostrings_2.65.6             
## [11] XVector_0.37.1                 SingleCellExperiment_1.19.1   
## [13] SummarizedExperiment_1.27.3    Biobase_2.57.1                
## [15] GenomicRanges_1.49.1           GenomeInfoDb_1.33.8           
## [17] IRanges_2.31.2                 S4Vectors_0.35.4              
## [19] BiocGenerics_0.43.4            MatrixGenerics_1.9.1          
## [21] matrixStats_0.62.0             BiocStyle_2.25.0              
## 
## loaded via a namespace (and not attached):
##   [1] circlize_0.4.15             systemfonts_1.0.4          
##   [3] plyr_1.8.7                  lazyeval_0.2.2             
##   [5] splines_4.2.1               BiocParallel_1.31.13       
##   [7] ggplot2_3.3.6               scater_1.25.7              
##   [9] sva_3.45.0                  digest_0.6.29              
##  [11] foreach_1.5.2               yulab.utils_0.0.5          
##  [13] htmltools_0.5.3             viridis_0.6.2              
##  [15] fansi_1.0.3                 magrittr_2.0.3             
##  [17] memoise_2.0.1               ScaledMatrix_1.5.1         
##  [19] cluster_2.1.4               doParallel_1.0.17          
##  [21] DECIPHER_2.25.3             openxlsx_4.2.5             
##  [23] limma_3.53.10               annotate_1.75.0            
##  [25] ComplexHeatmap_2.13.1       SEtools_1.11.1             
##  [27] pkgdown_2.0.6               colorspace_2.0-3           
##  [29] blob_1.2.3                  ggrepel_0.9.1              
##  [31] textshaping_0.3.6           xfun_0.33                  
##  [33] crayon_1.5.2                RCurl_1.98-1.9             
##  [35] jsonlite_1.8.2              genefilter_1.79.0          
##  [37] survival_3.4-0              iterators_1.0.14           
##  [39] ape_5.6-2                   glue_1.6.2                 
##  [41] registry_0.5-1              gtable_0.3.1               
##  [43] zlibbioc_1.43.0             GetoptLong_1.0.5           
##  [45] DelayedArray_0.23.2         V8_4.2.1                   
##  [47] BiocSingular_1.13.1         shape_1.4.6                
##  [49] scales_1.2.1                pheatmap_1.0.12            
##  [51] edgeR_3.39.6                DBI_1.1.3                  
##  [53] randomcoloR_1.1.0.1         Rcpp_1.0.9                 
##  [55] xtable_1.8-4                viridisLite_0.4.1          
##  [57] decontam_1.17.0             clue_0.3-61                
##  [59] tidytree_0.4.1              bit_4.0.4                  
##  [61] rsvd_1.0.5                  htmlwidgets_1.5.4          
##  [63] httr_1.4.4                  RColorBrewer_1.1-3         
##  [65] ellipsis_0.3.2              XML_3.99-0.11              
##  [67] pkgconfig_2.0.3             scuttle_1.7.4              
##  [69] sass_0.4.2                  locfit_1.5-9.6             
##  [71] utf8_1.2.2                  AnnotationDbi_1.59.1       
##  [73] tidyselect_1.2.0            rlang_1.0.6                
##  [75] reshape2_1.4.4              munsell_0.5.0              
##  [77] tools_4.2.1                 cachem_1.0.6               
##  [79] cli_3.4.1                   DirichletMultinomial_1.39.0
##  [81] generics_0.1.3              RSQLite_2.2.18             
##  [83] evaluate_0.17               stringr_1.4.1              
##  [85] fastmap_1.1.0               yaml_2.3.5                 
##  [87] ragg_1.2.3                  knitr_1.40                 
##  [89] bit64_4.0.5                 fs_1.5.2                   
##  [91] zip_2.2.1                   purrr_0.3.5                
##  [93] KEGGREST_1.37.3             nlme_3.1-159               
##  [95] sparseMatrixStats_1.9.0     compiler_4.2.1             
##  [97] plotly_4.10.0               beeswarm_0.4.0             
##  [99] curl_4.3.3                  png_0.1-7                  
## [101] treeio_1.21.2               geneplotter_1.75.0         
## [103] tibble_3.1.8                bslib_0.4.0                
## [105] stringi_1.7.8               desc_1.4.2                 
## [107] lattice_0.20-45             Matrix_1.5-1               
## [109] sechm_1.5.1                 vegan_2.6-5                
## [111] permute_0.9-7               vctrs_0.4.2                
## [113] pillar_1.8.1                lifecycle_1.0.3            
## [115] BiocManager_1.30.18         jquerylib_0.1.4            
## [117] GlobalOptions_0.1.2         BiocNeighbors_1.15.1       
## [119] data.table_1.14.2           bitops_1.0-7               
## [121] irlba_2.3.5.1               seriation_1.3.6            
## [123] R6_2.5.1                    bookdown_0.29              
## [125] TSP_1.2-1                   gridExtra_2.3              
## [127] vipor_0.4.5                 codetools_0.2-18           
## [129] MASS_7.3-58.1               DESeq2_1.37.6              
## [131] rprojroot_2.0.3             rjson_0.2.21               
## [133] withr_2.5.0                 GenomeInfoDbData_1.2.9     
## [135] mgcv_1.8-40                 parallel_4.2.1             
## [137] grid_4.2.1                  beachmat_2.13.4            
## [139] tidyr_1.2.1                 rmarkdown_2.17             
## [141] DelayedMatrixStats_1.19.2   Rtsne_0.16                 
## [143] ggbeeswarm_0.6.0

References

Huang, Ruizhu. 2021. TreeSummarizedExperiment: TreeSummarizedExperiment: A S4 Class for Data with Tree Structures.

Morgan, Martin, Valerie Obenchain, Jim Hester, and Hervé Pagès. 2020. SummarizedExperiment: SummarizedExperiment Container. https://bioconductor.org/packages/SummarizedExperiment.