Written by
Seewoo Li
on
on
-Data-Analysis-Investigating-Relationships-Between-Variables
Selecting Model
1. Making Table
## freq S E I
## 1 14 1 1 1
## 2 483 1 0 1
## 3 497 0 1 1
## 4 1008 0 0 1
## 5 1105 1 1 0
## 6 411111 1 0 0
## 7 4624 0 1 0
## 8 157342 0 0 0
2. Fitting All Possible Models
fit1 <- glm(freq~S*E*I, family=poisson(link='log'), data=tab)
fit2 <- glm(freq~S+E+I+S:E+S:I+E:I, family=poisson(link='log'), data=tab)
fit3 <- glm(freq~S+E+I+S:E+S:I, family=poisson(link='log'), data=tab)
fit4 <- glm(freq~S+E+I+S:E+E:I, family=poisson(link='log'), data=tab)
fit5 <- glm(freq~S+E+I+S:I+E:I, family=poisson(link='log'), data=tab)
fit6 <- glm(freq~S+E+I+S:E, family=poisson(link='log'), data=tab)
fit7 <- glm(freq~S+E+I+S:I, family=poisson(link='log'), data=tab)
fit8 <- glm(freq~S+E+I+E:I, family=poisson(link='log'), data=tab)
fit9 <- glm(freq~S+E+I, family=poisson(link='log'), data=tab)
3. Deviance and AIC for Selecting the Best Model
devi <- c()
aic <- c()
for(i in 1:9){
devi[i] <- round(eval(parse(text=paste('fit',i,'$deviance', sep=''))),2)
aic[i] <- round(eval(parse(text=paste('fit',i,'$aic', sep=''))),2)
}
model <- c('saturated','homogeneous', 'SE_SI', 'SE_EI', 'SI_EI', 'SE', 'SI', 'EI','Mutual indep')
model_selec <- cbind(model, devi, aic)
model_selec
## model devi aic
## [1,] "saturated" "0" "93"
## [2,] "homogeneous" "2.85" "93.85"
## [3,] "SE_SI" "1680.41" "1769.41"
## [4,] "SE_EI" "1144.64" "1233.64"
## [5,] "SI_EI" "7133.98" "7222.98"
## [6,] "SE" "3567.72" "3654.72"
## [7,] "SI" "9557.06" "9644.06"
## [8,] "EI" "9021.29" "9108.29"
## [9,] "Mutual indep" "11444.38" "11529.37"
4. Model Selection
- Although saturated model has the least AIC among others, it is possible for the model which has homogeneous association being selected for the convenience in interpretation.
- The increment of deviance by 2.85 can be accepted in return for the decrement of degrees of freedom by 1.
Model:logμijk=λ+λSi+λEj+λIk+λSEij+λSIik+λEIjk
Interpretation
The model has homogeneous association.
logOddsRatioSE(k)=logμ11(k)μ00(k)μ10(k)μ01(k)=λSE11, for all k=0,1
logOddsRatioSI(j)=logμ1(j)1μ0(j)0μ1(j)0μ0(j)1=λSI11, for all j=0,1
logOddsRatioEI(i)=logμ(i)11μ(i)00μ(i)10μ(i)01=λEI11, for all i=0,1
Each pair of variables S, E, I is not independent, but its relation is not affected by the leftover.