Chapter 11 Miscellaneous material

11.1 Shapiro-Wilk test

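If necessary, the normality of the data can be assessed with the Shapiro-Wilk test. The null hypothesis of the test is that the data are normally distributed, so a p-value below 0.05 suggests a deviation from normality.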

# Runs the Shapiro-Wilk test on every column that contains abundances,
# skipping the column that contains the groups.

normality_test_p <- c()

for (column in 
     abundance_analysis_data[, !names(abundance_analysis_data) %in% "patient_status"]){
  # Run the Shapiro-Wilk test on this abundance column
  result <- shapiro.test(column)
  
  # Append the p-value to the result vector
  normality_test_p <- c(normality_test_p, result$p.value)
}

print(paste0("P-values over 0.05: ", sum(normality_test_p>0.05), "/", 
             length(normality_test_p)))
## [1] "P-values over 0.05: 7/54"
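
The same loop can also be written with vapply, which avoids growing the vector inside the loop and keeps the column names attached to the p-values. The following is only a minimal sketch on simulated data: the data frame toy_data and its columns taxon_a and taxon_b are made up for illustration and are not part of the analysis data.

```r
# Simulated stand-in for the abundance table (hypothetical data, not the real set)
set.seed(1)
toy_data <- data.frame(
  taxon_a = rnorm(30),   # drawn from a normal distribution
  taxon_b = rexp(30),    # drawn from a skewed (exponential) distribution
  patient_status = rep(c("A", "B"), each = 15)
)

# Keep only the abundance columns; drop = FALSE keeps a data frame
# even if a single column remains
abundance_columns <- toy_data[, !names(toy_data) %in% "patient_status",
                              drop = FALSE]

# One Shapiro-Wilk p-value per column, returned as a named numeric vector
p_values <- vapply(abundance_columns,
                   function(column) shapiro.test(column)$p.value,
                   numeric(1))

print(p_values)
print(paste0("P-values over 0.05: ", sum(p_values > 0.05), "/",
             length(p_values)))
```

Because the result is a named vector, it is easy to see which taxa deviate from normality, not just how many.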

11.2 DESeq2 details

  1. Raw counts are normalized with sample-wise size factors, which are estimated on a log scale (median-of-ratios method).
  2. Taxon-wise variance is estimated. These values tell how much each taxon varies between samples.
  3. A curve is fitted to the taxon-wise variance estimates from the previous step.
    This model describes how large the variance is expected to be at a given abundance level.
  4. The model is used to shrink the individual variance estimates toward the fitted curve, which dampens the effect of, e.g., small sample size and noisy estimates. This reduces the likelihood of false positives.
  5. The variance estimates are used to compare the groups. The result shows whether the variation in abundance is explained by the groups.
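
The normalization in step 1 can be illustrated in base R. The sketch below assumes the median-of-ratios approach that DESeq2 uses for size factors and applies it to a made-up count matrix. It is a simplified illustration, not the package's implementation: for example, it does not handle taxa with zero counts, which have to be excluded from the calculation.

```r
# Toy count matrix (hypothetical values): rows are taxa, columns are samples.
# Sample 2 has twice, and sample 3 four times, the counts of sample 1.
counts <- matrix(
  c( 10,  20,  40,
    100, 200, 400,
      5,  10,  20),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("taxon", 1:3), paste0("sample", 1:3))
)

# Log of the geometric mean of each taxon across samples
log_geo_means <- rowMeans(log(counts))

# Size factor of a sample: median over taxa of the ratio between the
# sample's count and the taxon's geometric mean (computed on the log scale)
size_factors <- apply(counts, 2, function(sample_counts) {
  exp(median(log(sample_counts) - log_geo_means))
})

# Divide each column (sample) by its size factor
normalized_counts <- sweep(counts, 2, size_factors, "/")

print(size_factors)
print(normalized_counts)
```

In this toy example the samples differ only in sequencing depth, so after normalization every taxon has the same value in all three samples.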