miaTime implements tools for time series manipulation based on the TreeSummarizedExperiment (Huang 2021) data container. Much of the functionality is also applicable to SummarizedExperiment (Morgan et al. 2020) data objects. This tutorial shows how to use miaTime methods as well as the broader R/Bioconductor ecosystem to manipulate time series data.
Install the latest development version in R:
library(devtools)
devtools::install_github("microbiome/miaTime")
Load the package:
library(miaTime)
The tidySingleCellExperiment package provides handy functions for (Tree)SE manipulation, such as sorting the samples by subject and time:
library("tidySingleCellExperiment")
data("hitchip1006")
tse <- hitchip1006
tse2 <- tse %>% tidySingleCellExperiment::arrange(subject, time)
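The same sorting can also be sketched without the tidy interface, using base R ordering on the columns of the object. The toy object below (the names `counts`, `samples`, `se` and `se_sorted` and all values are hypothetical) stands in for the demo data.

```r
library(SummarizedExperiment)

# Toy data (hypothetical): 2 features x 4 samples from two subjects
counts <- matrix(1:8, nrow = 2,
                 dimnames = list(c("taxon1", "taxon2"), paste0("S", 1:4)))
samples <- DataFrame(subject = c("B", "A", "B", "A"),
                     time = c(2, 5, 0, 1),
                     row.names = colnames(counts))
se <- SummarizedExperiment(assays = list(counts = counts), colData = samples)

# Reorder the columns (samples) by subject, then by time within each subject
se_sorted <- se[, order(se$subject, se$time)]
colnames(se_sorted)
```

Subsetting the object by column keeps the assay data and the sample data in sync, so no separate reordering of the abundance table is needed.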
miaTime utilizes the functions available in the lubridate package to convert the time field to a “Period” class object. This gives access to a number of readily available time series manipulation tools.
Load example data:
# Load packages
library(miaTime)
library(lubridate)
library(SummarizedExperiment)
# Load demo data
data(hitchip1006)
tse <- hitchip1006
# Time is given in days in the demo data.
# Convert days to seconds
time_in_seconds <- 60*60*24*colData(tse)[,"time"]
# Convert the time data to period class
Seconds <- as.period(time_in_seconds, unit="sec")
# Check the output
Seconds[1140:1151]
## [1] "492480S" "198720S" "673920S" "198720S" "198720S" "708480S" "198720S"
## [8] "699840S" "198720S" "708480S" "181440S" "682560S"
The time field in days is now shown in seconds. It can then be converted to many different units using the lubridate package.
Hours <- as.period(Seconds, unit = "hour")
Hours[1140:1151]
## [1] "136H 48M 0S" "55H 11M 59.9999999999709S"
## [3] "187H 12M 0S" "55H 11M 59.9999999999709S"
## [5] "55H 11M 59.9999999999709S" "196H 47M 59.9999999998836S"
## [7] "55H 11M 59.9999999999709S" "194H 24M 0S"
## [9] "55H 11M 59.9999999999709S" "196H 47M 59.9999999998836S"
## [11] "50H 24M 0S" "189H 36M 0S"
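The fractional seconds in the output above (for example 59.9999999999709S) appear to be floating-point residue from multiplying decimal day counts such as 2.3 by the number of seconds in a day. A minimal sketch of one way to avoid this, assuming whole-second resolution is sufficient, is to round the seconds before the conversion. The values in `days` are hypothetical.

```r
library(lubridate)

# Hypothetical sampling times in days
days <- c(2.3, 5.7, 8.2)

# Round to whole seconds to remove floating-point residue
secs <- round(days * 24 * 60 * 60)

# Build a Period from the seconds and express it in hours
hours <- as.period(seconds(secs), unit = "hour")
hours
```

Whether rounding is appropriate depends on the resolution of the original time stamps; for times recorded in tenths of a day it loses no information.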
The updated time information can then be added to the SummarizedExperiment data object as a new colData (sample data) field.
## DataFrame with 1151 rows and 11 columns
##                   age      sex nationality DNA_extraction_method  project
##             <integer> <factor>    <factor>              <factor> <factor>
## Sample-1           28     male          US                    NA        1
## Sample-2           24   female          US                    NA        1
## Sample-3           52     male          US                    NA        1
## Sample-4           22   female          US                    NA        1
## Sample-5           25   female          US                    NA        1
## ...               ...      ...         ...                   ...      ...
## Sample-1168        50   female Scandinavia                     r       40
## Sample-1169        31   female Scandinavia                     r       40
## Sample-1170        31   female Scandinavia                     r       40
## Sample-1171        52     male Scandinavia                     r       40
## Sample-1172        52     male Scandinavia                     r       40
##             diversity   bmi_group  subject      time      sample  timeSec
##             <numeric>    <factor> <factor> <numeric> <character> <Period>
## Sample-1         5.76 severeobese        1         0    Sample-1       0S
## Sample-2         6.06       obese        2         0    Sample-2       0S
## Sample-3         5.50        lean        3         0    Sample-3       0S
## Sample-4         5.87 underweight        4         0    Sample-4       0S
## Sample-5         5.89        lean        5         0    Sample-5       0S
## ...               ...         ...      ...       ...         ...      ...
## Sample-1168      5.87 severeobese      244       8.1 Sample-1168  699840S
## Sample-1169      5.87  overweight      245       2.3 Sample-1169  198720S
## Sample-1170      5.92  overweight      245       8.2 Sample-1170  708480S
## Sample-1171      6.04  overweight      246       2.1 Sample-1171  181440S
## Sample-1172      5.74  overweight      246       7.9 Sample-1172  682560S
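The code that produced the output above is not shown. A minimal sketch of the step, on a hypothetical toy object rather than the demo data, could look as follows. The column name `timeSec` is taken from the output above; the object `se` and its values are assumptions.

```r
library(SummarizedExperiment)
library(lubridate)

# Toy object (hypothetical): 2 features x 3 samples, time given in days
counts <- matrix(1:6, nrow = 2,
                 dimnames = list(c("taxon1", "taxon2"), paste0("S", 1:3)))
se <- SummarizedExperiment(
  assays = list(counts = counts),
  colData = DataFrame(time = c(0, 2, 8), row.names = colnames(counts))
)

# Convert days to seconds and store the Period as a new colData column
colData(se)$timeSec <- as.period(60 * 60 * 24 * colData(se)$time, unit = "sec")
colData(se)
```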
The as.duration function converts the time points to durations, that is, exact time spans measured in seconds.
Duration <- as.duration(Seconds)
Duration[1140:1151]
## [1] "492480s (~5.7 days)" "198720s (~2.3 days)" "673920s (~1.11 weeks)"
## [4] "198720s (~2.3 days)" "198720s (~2.3 days)" "708480s (~1.17 weeks)"
## [7] "198720s (~2.3 days)" "699840s (~1.16 weeks)" "198720s (~2.3 days)"
## [10] "708480s (~1.17 weeks)" "181440s (~2.1 days)" "682560s (~1.13 weeks)"
The difference between subsequent time points can then be calculated.
## [1] -216000 -293760 475200 -475200 0 509760 -509760 501120 -501120
## [10] 509760 -527040 501120
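The code behind the differences above is not shown. One way to sketch the calculation, assuming the base R `diff` function applied to the durations in seconds, is given below with hypothetical values. Negative differences arise where consecutive rows are not in increasing time order, for instance when the next row belongs to another subject.

```r
library(lubridate)

# Hypothetical time points in seconds, in sample order
secs <- c(0, 198720, 708480, 181440)
dur <- as.duration(secs)

# diff() returns n - 1 differences; prepend NA so the result aligns with the samples
step <- c(NA, diff(as.numeric(dur)))
step
```

Sorting the samples by subject and time first, and computing the differences within each subject, avoids mixing time points across subjects.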
The time difference from a selected point to the other time points can be calculated as follows.
base <- Hours - Hours[1] #distance from starting point
base[1140:1151]
## [1] "136H 48M 0S" "55H 11M 59.9999999999709S"
## [3] "187H 12M 0S" "55H 11M 59.9999999999709S"
## [5] "55H 11M 59.9999999999709S" "196H 47M 59.9999999998836S"
## [7] "55H 11M 59.9999999999709S" "194H 24M 0S"
## [9] "55H 11M 59.9999999999709S" "196H 47M 59.9999999998836S"
## [11] "50H 24M 0S" "189H 36M 0S"
base_1140 <- Seconds - Seconds[1140]
base_1140[1140:1151]
## [1] "0S" "-293760S" "181440S" "-293760S" "-293760S" "216000S"
## [7] "-293760S" "207360S" "-293760S" "216000S" "-311040S" "190080S"
The rank of the time points can be calculated with the rank function provided in base R.
## DataFrame with 1151 rows and 12 columns
##                   age      sex nationality DNA_extraction_method  project
##             <integer> <factor>    <factor>              <factor> <factor>
## Sample-1           28     male          US                    NA        1
## Sample-2           24   female          US                    NA        1
## Sample-3           52     male          US                    NA        1
## Sample-4           22   female          US                    NA        1
## Sample-5           25   female          US                    NA        1
## ...               ...      ...         ...                   ...      ...
## Sample-1168        50   female Scandinavia                     r       40
## Sample-1169        31   female Scandinavia                     r       40
## Sample-1170        31   female Scandinavia                     r       40
## Sample-1171        52     male Scandinavia                     r       40
## Sample-1172        52     male Scandinavia                     r       40
##             diversity   bmi_group  subject      time      sample  timeSec
##             <numeric>    <factor> <factor> <numeric> <character> <Period>
## Sample-1         5.76 severeobese        1         0    Sample-1       0S
## Sample-2         6.06       obese        2         0    Sample-2       0S
## Sample-3         5.50        lean        3         0    Sample-3       0S
## Sample-4         5.87 underweight        4         0    Sample-4       0S
## Sample-5         5.89        lean        5         0    Sample-5       0S
## ...               ...         ...      ...       ...         ...      ...
## Sample-1168      5.87 severeobese      244       8.1 Sample-1168  699840S
## Sample-1169      5.87  overweight      245       2.3 Sample-1169  198720S
## Sample-1170      5.92  overweight      245       8.2 Sample-1170  708480S
## Sample-1171      6.04  overweight      246       2.1 Sample-1171  181440S
## Sample-1172      5.74  overweight      246       7.9 Sample-1172  682560S
##                  rank
##             <numeric>
## Sample-1        503.5
## Sample-2        503.5
## Sample-3        503.5
## Sample-4        503.5
## Sample-5        503.5
## ...               ...
## Sample-1168    1128.5
## Sample-1169    1077.0
## Sample-1170    1132.5
## Sample-1171    1067.0
## Sample-1172    1127.0
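The code for this step is not shown above. As a small illustration of how base R `rank` treats ties, the sketch below uses a hypothetical vector of time points. With the default `ties.method = "average"`, tied values share the mean of the ranks they occupy, which would explain why all samples with time 0 receive the same non-integer rank in the output above.

```r
# Hypothetical time points in days; two samples at time 0 and two at 2.3
time <- c(0, 0, 2.3, 8.1, 2.3)

# Default ties.method is "average": tied values get the mean of their ranks
time_rank <- rank(time)
time_rank
```

The ranks could then be stored in the sample data in the same way as any other colData field.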
Sometimes we need to operate on time series per unit (subject, reaction chamber, sampling location, …).
Add time point rank per subject.
library(dplyr)
library(tidySummarizedExperiment)
# Pick samples with time point 0
tse <- hitchip1006 |> filter(time == 0)
# Or: tse <- tse[, tse$time==0]
# Rank the time points within each subject (rank 1 = smallest time point)
coldata <- colData(tse) %>%
  as.data.frame() %>%
  group_by(subject) %>%
  mutate(rank = rank(time, ties.method="average")) %>%
  ungroup() %>%
  DataFrame()
# Tibbles drop row names, so restore the sample names before assigning
rownames(coldata) <- colnames(tse)
colData(tse) <- coldata
# Pick the subset including first time point per subject
tse1 <- tse[, tse$rank == 1]
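To make the grouped ranking easier to follow, here is a minimal sketch on a plain data frame with hypothetical samples and subjects. It uses the same dplyr verbs as above, without the (Tree)SE container.

```r
library(dplyr)

# Hypothetical sample table: two subjects with unordered time points
samples <- data.frame(
  sample = paste0("S", 1:5),
  subject = c("A", "A", "B", "B", "B"),
  time = c(3, 0, 5, 1, 2)
)

# Rank the time points separately within each subject
ranked <- samples %>%
  group_by(subject) %>%
  mutate(rank = rank(time, ties.method = "average")) %>%
  ungroup()

# Keep the first time point of each subject
first_per_subject <- ranked %>% filter(rank == 1)
first_per_subject
```

Note that with `ties.method = "average"` a subject whose earliest time point is duplicated has no sample with rank exactly 1, so such subjects would be dropped by the final filter.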
Session information:
## R version 4.2.1 (2022-06-23)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.5 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] tidySummarizedExperiment_1.7.0 tidySingleCellExperiment_1.7.0
## [3] ttservice_0.2.2 lubridate_1.8.0
## [5] dplyr_1.0.10 miaTime_0.1.15
## [7] mia_1.5.17 MultiAssayExperiment_1.23.9
## [9] TreeSummarizedExperiment_2.1.4 Biostrings_2.65.6
## [11] XVector_0.37.1 SingleCellExperiment_1.19.1
## [13] SummarizedExperiment_1.27.3 Biobase_2.57.1
## [15] GenomicRanges_1.49.1 GenomeInfoDb_1.33.8
## [17] IRanges_2.31.2 S4Vectors_0.35.4
## [19] BiocGenerics_0.43.4 MatrixGenerics_1.9.1
## [21] matrixStats_0.62.0 BiocStyle_2.25.0
##
## loaded via a namespace (and not attached):
## [1] circlize_0.4.15 systemfonts_1.0.4
## [3] plyr_1.8.7 lazyeval_0.2.2
## [5] splines_4.2.1 BiocParallel_1.31.13
## [7] ggplot2_3.3.6 scater_1.25.7
## [9] sva_3.45.0 digest_0.6.29
## [11] foreach_1.5.2 yulab.utils_0.0.5
## [13] htmltools_0.5.3 viridis_0.6.2
## [15] fansi_1.0.3 magrittr_2.0.3
## [17] memoise_2.0.1 ScaledMatrix_1.5.1
## [19] cluster_2.1.4 doParallel_1.0.17
## [21] DECIPHER_2.25.3 openxlsx_4.2.5
## [23] limma_3.53.10 annotate_1.75.0
## [25] ComplexHeatmap_2.13.1 SEtools_1.11.1
## [27] pkgdown_2.0.6 colorspace_2.0-3
## [29] blob_1.2.3 ggrepel_0.9.1
## [31] textshaping_0.3.6 xfun_0.33
## [33] crayon_1.5.2 RCurl_1.98-1.9
## [35] jsonlite_1.8.2 genefilter_1.79.0
## [37] survival_3.4-0 iterators_1.0.14
## [39] ape_5.6-2 glue_1.6.2
## [41] registry_0.5-1 gtable_0.3.1
## [43] zlibbioc_1.43.0 GetoptLong_1.0.5
## [45] DelayedArray_0.23.2 V8_4.2.1
## [47] BiocSingular_1.13.1 shape_1.4.6
## [49] scales_1.2.1 pheatmap_1.0.12
## [51] edgeR_3.39.6 DBI_1.1.3
## [53] randomcoloR_1.1.0.1 Rcpp_1.0.9
## [55] xtable_1.8-4 viridisLite_0.4.1
## [57] decontam_1.17.0 clue_0.3-61
## [59] tidytree_0.4.1 bit_4.0.4
## [61] rsvd_1.0.5 htmlwidgets_1.5.4
## [63] httr_1.4.4 RColorBrewer_1.1-3
## [65] ellipsis_0.3.2 XML_3.99-0.11
## [67] pkgconfig_2.0.3 scuttle_1.7.4
## [69] sass_0.4.2 locfit_1.5-9.6
## [71] utf8_1.2.2 AnnotationDbi_1.59.1
## [73] tidyselect_1.2.0 rlang_1.0.6
## [75] reshape2_1.4.4 munsell_0.5.0
## [77] tools_4.2.1 cachem_1.0.6
## [79] cli_3.4.1 DirichletMultinomial_1.39.0
## [81] generics_0.1.3 RSQLite_2.2.18
## [83] evaluate_0.17 stringr_1.4.1
## [85] fastmap_1.1.0 yaml_2.3.5
## [87] ragg_1.2.3 knitr_1.40
## [89] bit64_4.0.5 fs_1.5.2
## [91] zip_2.2.1 purrr_0.3.5
## [93] KEGGREST_1.37.3 nlme_3.1-159
## [95] sparseMatrixStats_1.9.0 compiler_4.2.1
## [97] plotly_4.10.0 beeswarm_0.4.0
## [99] curl_4.3.3 png_0.1-7
## [101] treeio_1.21.2 geneplotter_1.75.0
## [103] tibble_3.1.8 bslib_0.4.0
## [105] stringi_1.7.8 desc_1.4.2
## [107] lattice_0.20-45 Matrix_1.5-1
## [109] sechm_1.5.1 vegan_2.6-5
## [111] permute_0.9-7 vctrs_0.4.2
## [113] pillar_1.8.1 lifecycle_1.0.3
## [115] BiocManager_1.30.18 jquerylib_0.1.4
## [117] GlobalOptions_0.1.2 BiocNeighbors_1.15.1
## [119] data.table_1.14.2 bitops_1.0-7
## [121] irlba_2.3.5.1 seriation_1.3.6
## [123] R6_2.5.1 bookdown_0.29
## [125] TSP_1.2-1 gridExtra_2.3
## [127] vipor_0.4.5 codetools_0.2-18
## [129] MASS_7.3-58.1 DESeq2_1.37.6
## [131] rprojroot_2.0.3 rjson_0.2.21
## [133] withr_2.5.0 GenomeInfoDbData_1.2.9
## [135] mgcv_1.8-40 parallel_4.2.1
## [137] grid_4.2.1 beachmat_2.13.4
## [139] tidyr_1.2.1 rmarkdown_2.17
## [141] DelayedMatrixStats_1.19.2 Rtsne_0.16
## [143] ggbeeswarm_0.6.0