Chapter 11 Miscellaneous material
11.1 Shapiro-Wilk test
If necessary, the normality of the data can be assessed with the Shapiro-Wilk test.
# Run the Shapiro-Wilk test on every column that contains abundances,
# skipping the column that contains the groups.
normality_test_p <- c()

for (column in
     abundance_analysis_data[, !names(abundance_analysis_data) %in% "patient_status"]){
  # Run the Shapiro-Wilk test on this column
  result <- shapiro.test(column)

  # Store the p-value in the vector
  normality_test_p <- c(normality_test_p, result$p.value)
}

print(paste0("P-values over 0.05: ", sum(normality_test_p>0.05), "/",
             length(normality_test_p)))
## [1] "P-values over 0.05: 7/54"
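The loop above depends on abundance_analysis_data, which is built elsewhere. As a minimal self-contained sketch of the same check, the following uses a hypothetical toy data frame (toy_data, with made-up columns taxon_a and taxon_b) and vapply() instead of growing a vector inside a loop. The toy values are constructed for illustration and are not real abundances.

```r
# Hypothetical toy data: two abundance columns and one grouping column.
toy_data <- data.frame(
  taxon_a = qnorm(ppoints(30)),               # evenly spaced normal quantiles
  taxon_b = exp(seq(0, 5, length.out = 30)),  # strongly right-skewed values
  patient_status = rep(c("Control", "Case"), each = 15)
)

# Keep only the abundance columns, i.e. everything except the grouping column.
abundance_columns <- setdiff(names(toy_data), "patient_status")

# Run shapiro.test() on each column and collect the p-values
# into a named numeric vector.
normality_p <- vapply(
  toy_data[abundance_columns],
  function(column) shapiro.test(column)$p.value,
  numeric(1)
)

# Count the columns for which normality is not rejected at the 0.05 level.
n_not_rejected <- sum(normality_p > 0.05)
print(paste0("P-values over 0.05: ", n_not_rejected, "/", length(normality_p)))
## [1] "P-values over 0.05: 1/2"
```

A p-value over 0.05 only means that the test did not reject normality for that column. It does not show that the column is normally distributed.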
11.2 DESeq2 details
- Raw counts are normalized by log-based scaling.
- Taxon-wise variance is estimated. These values tell how much each taxon varies between samples.
- A curve is fitted over all the taxon-wise variance estimates from the previous step. This model tells how large the variance is at a given abundance level.
- The model is used to shrink the individual variance estimates to reduce the effect of, e.g., small sample size and high variance. This reduces the likelihood of false positives.
- The variance estimates are used to compare the groups. The result shows whether the variation is explained by the groups.
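The steps above can be sketched by running the stages one at a time, assuming this section refers to the Bioconductor package DESeq2 and that the package is installed. The data are simulated with makeExampleDESeqDataSet(), so the counts and the grouping column (condition) are illustrative only. DESeq2 uses the term "dispersion" for what the list above calls variance.

```r
# Assumption: the Bioconductor package DESeq2 is installed.
library(DESeq2)

set.seed(1)
# Simulated count data: 1000 features, 12 samples, two groups
# stored in the sample metadata column "condition".
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

# Step 1: normalization. One size factor is estimated per sample.
dds <- estimateSizeFactors(dds)

# Steps 2-4: feature-wise dispersion estimates, a fitted trend over
# those estimates, and shrinkage of the estimates towards the trend.
dds <- estimateDispersions(dds)

# Step 5: test whether the groups differ, feature by feature.
dds <- nbinomWaldTest(dds)
res <- results(dds)

# Inspect the intermediate and final quantities.
head(sizeFactors(dds))
head(mcols(dds)[, c("dispGeneEst", "dispFit", "dispersion")])
head(res[, c("log2FoldChange", "padj")])
```

In the dispersion table, dispGeneEst is the feature-wise estimate, dispFit is the value of the fitted curve, and dispersion is the final estimate used in testing. The wrapper function DESeq() runs the same three stages in one call.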