[R-br] glm.fit: fitted probabilities numerically 0 or 1 occurred

Marcelo Luiz de Laia marcelolaia em gmail.com
Quarta Outubro 12 19:39:36 BRT 2011


On Wed, 12 Oct 2011, Benilton Carvalho wrote:

> Isso deve-se ao fato de vc ter
> combinacoes dos fatores para as quais vc tem 20 germinacoes (ou falta
> delas)... Dai', vc tem variabilidade zero para aquele grupo e suas
> predicoes podem ser bastante viciadas. b

Então, Benilton, eu encontrei um caso em que há uma "incerteza de
predicao" da probabilidade, se é que eu entendi direito. Por favor, veja
abixo.

No entanto, eu não tenho nenhum caso de germinacao das 20 sementes. O máximo
foram 10. Por outro lado, tenho casos em que nenhuma germinou. Por ser
por isso tambem? Nenhuma germinar?

O que poderia ser feito?

No final eu colo o sumario da analise.

Obrigado

y<- c(0,0,0,0,1,1,1,1)
x1<-c(1,2,3,3,5,6,10,11)
x2<-c(3,2,-1,-1,2,4,1,0)
m1<- glm(y~ x1+x2, family=binomial)

Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred 

The only warning message that R gives is right after fitting the
logistic model. It  says that "fitted probabilities numerically 0 or 1
occurred". Combining this piece of information with the parameter
estimate for x1 being really large (>15), we suspect that there is a
problem of complete or quasi-complete separation. The standard errors
for the parameter estimates are way too large. This usually indicates
a convergence issue or some degree of data separation.
What is quasi-complete separation and what do some of the most
commonly used software packages do when it happens?

Quasi-complete separation in a logistic/probit regression happens when
the outcome variable separates a predictor variable or a combination
of predictor variables to certain degree. Here is an example.

    Y X1 X2
    0 1  3
    0 2  0
    0 3 -1
    0 3  4
    1 3  1
    1 4  0 
    1 5  2
    1 6  7
    1 10 3
    1 11 4

Notice that the outcome variable Y separates the predictor variable X1
pretty well except for values of X1 equal to 3. In other words, X1
predicts Y perfectly when X1 <3 (Y = 0) or X1 >3 (Y=1), leaving only
when X1 = 3 as cases with uncertainty.  In terms of expected
probabilities, we have Prob(Y=1 | X1<3) = 0 and Prob(Y=1 | X1>3) = 1,
nothing to be estimated, except for Prob(Y = 1 | X1 = 3).

What happens when we try to fit a logistic or a probit regression
model of Y on X1 and X2 using the data above? It turns out that the
maximum likelihood estimate for X1 does not exist. With this example,
the larger the parameter for X1, the larger the likelihood. In
practice, a value of 15 or larger does not make much difference and
they all basically correspond to predicted probability of 1. The
behavior of different statistical software packages differ at how they
deal with the issue of quasi-complete separation. Below is what each
package of SAS, SPSS, Stata and R does with our sample data and the
logistic regression model of Y on X1 and X2. We present these results
here in the hope that some level of understanding of the behavior of
logistic/probit regression within our familiar software package might
help us identify the problem of separation more efficiently.


summary(gm1)

Generalized linear mixed model fit by the Laplace approximation 
Formula: cbind(Germinacao, 20 - Germinacao) ~ Dia + Especie + Matriz +
     (Dia | Repeticao) 
   Data: ka.dat 
   AIC   BIC logLik deviance
 192.6 323.8  -54.3    108.6
Random effects:
 Groups    Name        Variance   Std.Dev.   Corr
    
 Repeticao (Intercept) 1.0810e-09 3.2878e-05
    
           Dia2        3.7027e-01 6.0850e-01 0.000
              
           Dia3        4.7946e-01 6.9243e-01 0.000 1.000
              
           Dia4        4.9575e-02 2.2265e-01 0.000 1.000 1.000
              
           Dia5        9.3664e-03 9.6780e-02 0.000 1.000 1.000 1.000
              
           Dia6        3.1308e-02 1.7694e-01 0.000 1.000 1.000 1.000
           1.000
           Dia7        1.4786e-02 1.2160e-01 0.000 1.000 1.000 1.000
           1.000
       
       
       
       
       
       
       
 1.000 
Number of obs: 168, groups: Repeticao, 3

Fixed effects:
                  Estimate Std. Error z value Pr(>|z|)  
(Intercept)       -22.6318  2854.0451  -0.008   0.9937  
Dia2               16.4579  2854.0452   0.006   0.9954  
Dia3               18.2293  2854.0451   0.006   0.9949  
Dia4               19.1952  2854.0451   0.007   0.9946  
Dia5               19.4628  2854.0451   0.007   0.9946  
Dia6               19.7174  2854.0451   0.007   0.9945  
Dia7               19.8392  2854.0451   0.007   0.9945  
EspecieVinhatico   17.5891  3788.0180   0.005   0.9963  
MatrizM04          -1.4221     0.6517  -2.182   0.0291 *
MatrizM05         -19.0112  3788.0181  -0.005   0.9960  
MatrizM06          -1.8314     0.7690  -2.382   0.0172 *
MatrizM1          -16.8933  3788.0181  -0.004   0.9964  
MatrizM12         -19.0114  3788.0180  -0.005   0.9960  
MatrizM15CV15     -15.2958  3788.0181  -0.004   0.9968  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Correlation of Fixed Effects:
            (Intr) Dia2   Dia3   Dia4   Dia5   Dia6   Dia7   EspcVn
            MtrM04
Dia2        -1.000
   
Dia3        -1.000  1.000
   
Dia4        -1.000  1.000  1.000
   
Dia5        -1.000  1.000  1.000  1.000
   
Dia6        -1.000  1.000  1.000  1.000  1.000
   
Dia7        -1.000  1.000  1.000  1.000  1.000  1.000
   
EspeciVnhtc  0.000  0.000  0.000  0.000  0.000  0.000  0.000
   
MatrizM04    0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
   
MatrizM05    0.000  0.000  0.000  0.000  0.000  0.000  0.000 -1.000
0.000
MatrizM06    0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
0.174
MatrizM1     0.000  0.000  0.000  0.000  0.000  0.000  0.000 -1.000
0.000
MatrizM12    0.000  0.000  0.000  0.000  0.000  0.000  0.000 -1.000
0.000
MtrzM15CV15  0.000  0.000  0.000  0.000  0.000  0.000  0.000 -1.000
0.000
            MtrM05 MtrM06 MtrzM1 MtrM12
Dia2                                   
Dia3                                   
Dia4                                   
Dia5                                   
Dia6                                   
Dia7                                   
EspeciVnhtc                            
MatrizM04                              
MatrizM05                              
MatrizM06    0.000                     
MatrizM1     1.000  0.000              
MatrizM12    1.000  0.000  1.000       
MtrzM15CV15  1.000  0.000  1.000  1.000
> 
-- 
Marcelo
Brazil
Linux user number 487797


Mais detalhes sobre a lista de discussão R-br