Chapter 11 Miscellaneous material
11.1 Shapiro-Wilk test
If necessary, the normality of the data can be assessed with the Shapiro-Wilk test. A p-value below 0.05 suggests that the values in a column deviate from a normal distribution.
# Runs the Shapiro-Wilk test only on the columns that contain abundances,
# not on the column that contains the groups.
normality_test_p <- c()
for (column in
    abundance_analysis_data[, !names(abundance_analysis_data) %in% "patient_status"]) {
    # Runs the Shapiro-Wilk test on one abundance column
    result <- shapiro.test(column)
    # Appends the p-value to the vector
    normality_test_p <- c(normality_test_p, result$p.value)
}
print(paste0("P-values over 0.05: ", sum(normality_test_p > 0.05), "/",
    length(normality_test_p)))
## [1] "P-values over 0.05: 7/54"
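The same loop can also be written without growing a vector step by step. The following is a minimal sketch under stated assumptions: `toy_data` and its columns are simulated, hypothetical stand-ins for `abundance_analysis_data`, so the printed counts will not match the result above.

```r
# Hypothetical toy data standing in for abundance_analysis_data:
# three simulated abundance columns and one grouping column.
set.seed(42)
toy_data <- data.frame(
    taxon_a = rnorm(30),
    taxon_b = rexp(30),
    taxon_c = rlnorm(30),
    patient_status = rep(c("ADHD", "Control"), each = 15)
)

# Keep only the abundance columns; drop = FALSE keeps a data frame
# even if a single column is left.
abundance_columns <- toy_data[, !names(toy_data) %in% "patient_status",
    drop = FALSE]

# Run the Shapiro-Wilk test on each column and collect the p-values
# into a named numeric vector.
toy_p <- vapply(abundance_columns,
    function(column) shapiro.test(column)$p.value,
    numeric(1))

print(paste0("P-values over 0.05: ", sum(toy_p > 0.05), "/", length(toy_p)))
```

Because `vapply()` keeps the column names, `toy_p` also shows which taxa have p-values over 0.05, which the unnamed vector from the loop does not.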
11.2 DESeq2 details
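- Raw counts are normalized by log-based scaling: each sample gets a size factor that is computed from the ratios of its counts to the taxa-wise geometric means.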
- Taxa-wise variance is estimated. These values tell how much each taxon varies between samples.
- A curve is fitted over all the taxa-wise variance estimates from the previous step. This model tells how big the variance is at a specific abundance level.
- The model is used to shrink the individual variance estimates toward the fitted curve, which dampens the effect of, e.g., small sample size and high variance. This reduces the likelihood of false positives.
- The variance estimates are used when comparing the groups. The result shows whether the variation in a taxon's abundance is explained by the groups.
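The steps above can be followed one at a time with the DESeq2 package. The following is only a sketch, assuming that DESeq2 is installed. It uses a simulated data set from `makeExampleDESeqDataSet()` instead of the data analyzed in this book, so its rows are simulated features rather than real taxa.

```r
library(DESeq2)

# Simulated count data with two groups in the "condition" column
# (an assumption made for illustration; not the data of this book).
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

# Step 1: normalization with sample-wise size factors
dds <- estimateSizeFactors(dds)
sizeFactors(dds)

# Steps 2-4: feature-wise variance (dispersion) estimates, the fitted
# curve, and the shrunken estimates
dds <- estimateDispersions(dds)
head(mcols(dds)$dispGeneEst)  # individual estimates
head(mcols(dds)$dispFit)      # values of the fitted curve
head(dispersions(dds))        # final, shrunken estimates

# Step 5: comparison of the groups with a Wald test
dds <- nbinomWaldTest(dds)
res <- results(dds)
head(res[order(res$padj), ])
```

In practice, a single call to `DESeq()` runs the same three estimation and testing steps in this order. Splitting them up is useful mainly for inspecting the intermediate values.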