Written by
Seewoo Li
on
on
-Data-Analysis-Investigating-Relationships-Between-Variables
Selecting Model
1. Making Table
## freq S E I
## 1 14 1 1 1
## 2 483 1 0 1
## 3 497 0 1 1
## 4 1008 0 0 1
## 5 1105 1 1 0
## 6 411111 1 0 0
## 7 4624 0 1 0
## 8 157342 0 0 0
2. Fitting All Possible Models
fit1 <- glm(freq~S*E*I, family=poisson(link='log'), data=tab)
fit2 <- glm(freq~S+E+I+S:E+S:I+E:I, family=poisson(link='log'), data=tab)
fit3 <- glm(freq~S+E+I+S:E+S:I, family=poisson(link='log'), data=tab)
fit4 <- glm(freq~S+E+I+S:E+E:I, family=poisson(link='log'), data=tab)
fit5 <- glm(freq~S+E+I+S:I+E:I, family=poisson(link='log'), data=tab)
fit6 <- glm(freq~S+E+I+S:E, family=poisson(link='log'), data=tab)
fit7 <- glm(freq~S+E+I+S:I, family=poisson(link='log'), data=tab)
fit8 <- glm(freq~S+E+I+E:I, family=poisson(link='log'), data=tab)
fit9 <- glm(freq~S+E+I, family=poisson(link='log'), data=tab)
3. Deviance and AIC for Selecting the Best Model
devi <- c()
aic <- c()
for(i in 1:9){
devi[i] <- round(eval(parse(text=paste('fit',i,'$deviance', sep=''))),2)
aic[i] <- round(eval(parse(text=paste('fit',i,'$aic', sep=''))),2)
}
model <- c('saturated','homogeneous', 'SE_SI', 'SE_EI', 'SI_EI', 'SE', 'SI', 'EI','Mutual indep')
model_selec <- cbind(model, devi, aic)
model_selec
## model devi aic
## [1,] "saturated" "0" "93"
## [2,] "homogeneous" "2.85" "93.85"
## [3,] "SE_SI" "1680.41" "1769.41"
## [4,] "SE_EI" "1144.64" "1233.64"
## [5,] "SI_EI" "7133.98" "7222.98"
## [6,] "SE" "3567.72" "3654.72"
## [7,] "SI" "9557.06" "9644.06"
## [8,] "EI" "9021.29" "9108.29"
## [9,] "Mutual indep" "11444.38" "11529.37"
4. Model Selection
- Although saturated model has the least AIC among others, it is possible for the model which has homogeneous association being selected for the convenience in interpretation.
- The increment of deviance by 2.85 can be accepted in return for the decrement of degrees of freedom by 1. \[Model : log{\mu_{ijk}}=\lambda+\lambda_i^S+\lambda_j^E+\lambda_k^I+\lambda_{ij}^{SE}+\lambda_{ik}^{SI}+\lambda_{jk}^{EI}\]
Interpretation
The model has homogeneous association.
\(log{OddsRatio_{SE(k)}}=log{\frac{\mu_{11(k)}\mu_{00(k)}}{\mu_{10(k)}\mu_{01(k)}}}=\lambda_{11}^{SE}\), for all k=0,1
\(log{OddsRatio_{SI(j)}}=log{\frac{\mu_{1(j)1}\mu_{0(j)0}}{\mu_{1(j)0}\mu_{0(j)1}}}=\lambda_{11}^{SI}\), for all j=0,1
\(log{OddsRatio_{EI(i)}}=log{\frac{\mu_{(i)11}\mu_{(i)00}}{\mu_{(i)10}\mu_{(i)01}}}=\lambda_{11}^{EI}\), for all i=0,1
Each pair of variables S, E, I is not independent, but its relation is not affected by the leftover.