-Data-Analysis-Investigating-Relationships-Between-Variables

Selecting Model

1. Making Table

##     freq S E I
## 1     14 1 1 1
## 2    483 1 0 1
## 3    497 0 1 1
## 4   1008 0 0 1
## 5   1105 1 1 0
## 6 411111 1 0 0
## 7   4624 0 1 0
## 8 157342 0 0 0

2. Fitting All Possible Models

fit1 <- glm(freq~S*E*I, family=poisson(link='log'), data=tab)
fit2 <- glm(freq~S+E+I+S:E+S:I+E:I, family=poisson(link='log'), data=tab)
fit3 <- glm(freq~S+E+I+S:E+S:I, family=poisson(link='log'), data=tab)
fit4 <- glm(freq~S+E+I+S:E+E:I, family=poisson(link='log'), data=tab)
fit5 <- glm(freq~S+E+I+S:I+E:I, family=poisson(link='log'), data=tab)
fit6 <- glm(freq~S+E+I+S:E, family=poisson(link='log'), data=tab)
fit7 <- glm(freq~S+E+I+S:I, family=poisson(link='log'), data=tab)
fit8 <- glm(freq~S+E+I+E:I, family=poisson(link='log'), data=tab)
fit9 <- glm(freq~S+E+I, family=poisson(link='log'), data=tab)

3. Deviance and AIC for Selecting the Best Model

devi <- c()
aic <- c()
for(i in 1:9){
  devi[i] <- round(eval(parse(text=paste('fit',i,'$deviance', sep=''))),2)
  aic[i] <- round(eval(parse(text=paste('fit',i,'$aic', sep=''))),2)
}
model <- c('saturated','homogeneous', 'SE_SI', 'SE_EI', 'SI_EI', 'SE', 'SI', 'EI','Mutual indep')
model_selec <- cbind(model, devi, aic)
model_selec
##       model          devi       aic       
##  [1,] "saturated"    "0"        "93"      
##  [2,] "homogeneous"  "2.85"     "93.85"   
##  [3,] "SE_SI"        "1680.41"  "1769.41" 
##  [4,] "SE_EI"        "1144.64"  "1233.64" 
##  [5,] "SI_EI"        "7133.98"  "7222.98" 
##  [6,] "SE"           "3567.72"  "3654.72" 
##  [7,] "SI"           "9557.06"  "9644.06" 
##  [8,] "EI"           "9021.29"  "9108.29" 
##  [9,] "Mutual indep" "11444.38" "11529.37"

4. Model Selection

  • Although saturated model has the least AIC among others, it is possible for the model which has homogeneous association being selected for the convenience in interpretation.
  • The increment of deviance by 2.85 can be accepted in return for the decrement of degrees of freedom by 1. Model:logμijk=λ+λSi+λEj+λIk+λSEij+λSIik+λEIjk

Interpretation

The model has homogeneous association.

logOddsRatioSE(k)=logμ11(k)μ00(k)μ10(k)μ01(k)=λSE11, for all k=0,1

logOddsRatioSI(j)=logμ1(j)1μ0(j)0μ1(j)0μ0(j)1=λSI11, for all j=0,1

logOddsRatioEI(i)=logμ(i)11μ(i)00μ(i)10μ(i)01=λEI11, for all i=0,1

Each pair of variables S, E, I is not independent, but its relation is not affected by the leftover.

<