
On Wed, 12 Oct 2011, Benilton Carvalho wrote:
Isso deve-se ao fato de vc ter combinacoes dos fatores para as quais vc tem 20 germinacoes (ou falta delas)... Dai', vc tem variabilidade zero para aquele grupo e suas predicoes podem ser bastante viciadas. b
Então, Benilton, eu encontrei um caso em que há uma "incerteza de predicao" da probabilidade, se é que eu entendi direito. Por favor, veja abixo. No entanto, eu não tenho nenhum caso de germinacao das 20 sementes. O máximo foram 10. Por outro lado, tenho casos em que nenhuma germinou. Por ser por isso tambem? Nenhuma germinar? O que poderia ser feito? No final eu colo o sumario da analise. Obrigado y<- c(0,0,0,0,1,1,1,1) x1<-c(1,2,3,3,5,6,10,11) x2<-c(3,2,-1,-1,2,4,1,0) m1<- glm(y~ x1+x2, family=binomial) Warning message: glm.fit: fitted probabilities numerically 0 or 1 occurred The only warning message that R gives is right after fitting the logistic model. It says that "fitted probabilities numerically 0 or 1 occurred". Combining this piece of information with the parameter estimate for x1 being really large (>15), we suspect that there is a problem of complete or quasi-complete separation. The standard errors for the parameter estimates are way too large. This usually indicates a convergence issue or some degree of data separation. What is quasi-complete separation and what do some of the most commonly used software packages do when it happens? Quasi-complete separation in a logistic/probit regression happens when the outcome variable separates a predictor variable or a combination of predictor variables to certain degree. Here is an example. Y X1 X2 0 1 3 0 2 0 0 3 -1 0 3 4 1 3 1 1 4 0 1 5 2 1 6 7 1 10 3 1 11 4 Notice that the outcome variable Y separates the predictor variable X1 pretty well except for values of X1 equal to 3. In other words, X1 predicts Y perfectly when X1 <3 (Y = 0) or X1 >3 (Y=1), leaving only when X1 = 3 as cases with uncertainty. In terms of expected probabilities, we have Prob(Y=1 | X1<3) = 0 and Prob(Y=1 | X1>3) = 1, nothing to be estimated, except for Prob(Y = 1 | X1 = 3). What happens when we try to fit a logistic or a probit regression model of Y on X1 and X2 using the data above? It turns out that the maximum likelihood estimate for X1 does not exist. With this example, the larger the parameter for X1, the larger the likelihood. In practice, a value of 15 or larger does not make much difference and they all basically correspond to predicted probability of 1. The behavior of different statistical software packages differ at how they deal with the issue of quasi-complete separation. Below is what each package of SAS, SPSS, Stata and R does with our sample data and the logistic regression model of Y on X1 and X2. We present these results here in the hope that some level of understanding of the behavior of logistic/probit regression within our familiar software package might help us identify the problem of separation more efficiently. summary(gm1) Generalized linear mixed model fit by the Laplace approximation Formula: cbind(Germinacao, 20 - Germinacao) ~ Dia + Especie + Matriz + (Dia | Repeticao) Data: ka.dat AIC BIC logLik deviance 192.6 323.8 -54.3 108.6 Random effects: Groups Name Variance Std.Dev. Corr Repeticao (Intercept) 1.0810e-09 3.2878e-05 Dia2 3.7027e-01 6.0850e-01 0.000 Dia3 4.7946e-01 6.9243e-01 0.000 1.000 Dia4 4.9575e-02 2.2265e-01 0.000 1.000 1.000 Dia5 9.3664e-03 9.6780e-02 0.000 1.000 1.000 1.000 Dia6 3.1308e-02 1.7694e-01 0.000 1.000 1.000 1.000 1.000 Dia7 1.4786e-02 1.2160e-01 0.000 1.000 1.000 1.000 1.000 1.000 Number of obs: 168, groups: Repeticao, 3 Fixed effects: Estimate Std. Error z value Pr(>|z|) (Intercept) -22.6318 2854.0451 -0.008 0.9937 Dia2 16.4579 2854.0452 0.006 0.9954 Dia3 18.2293 2854.0451 0.006 0.9949 Dia4 19.1952 2854.0451 0.007 0.9946 Dia5 19.4628 2854.0451 0.007 0.9946 Dia6 19.7174 2854.0451 0.007 0.9945 Dia7 19.8392 2854.0451 0.007 0.9945 EspecieVinhatico 17.5891 3788.0180 0.005 0.9963 MatrizM04 -1.4221 0.6517 -2.182 0.0291 * MatrizM05 -19.0112 3788.0181 -0.005 0.9960 MatrizM06 -1.8314 0.7690 -2.382 0.0172 * MatrizM1 -16.8933 3788.0181 -0.004 0.9964 MatrizM12 -19.0114 3788.0180 -0.005 0.9960 MatrizM15CV15 -15.2958 3788.0181 -0.004 0.9968 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Correlation of Fixed Effects: (Intr) Dia2 Dia3 Dia4 Dia5 Dia6 Dia7 EspcVn MtrM04 Dia2 -1.000 Dia3 -1.000 1.000 Dia4 -1.000 1.000 1.000 Dia5 -1.000 1.000 1.000 1.000 Dia6 -1.000 1.000 1.000 1.000 1.000 Dia7 -1.000 1.000 1.000 1.000 1.000 1.000 EspeciVnhtc 0.000 0.000 0.000 0.000 0.000 0.000 0.000 MatrizM04 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 MatrizM05 0.000 0.000 0.000 0.000 0.000 0.000 0.000 -1.000 0.000 MatrizM06 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.174 MatrizM1 0.000 0.000 0.000 0.000 0.000 0.000 0.000 -1.000 0.000 MatrizM12 0.000 0.000 0.000 0.000 0.000 0.000 0.000 -1.000 0.000 MtrzM15CV15 0.000 0.000 0.000 0.000 0.000 0.000 0.000 -1.000 0.000 MtrM05 MtrM06 MtrzM1 MtrM12 Dia2 Dia3 Dia4 Dia5 Dia6 Dia7 EspeciVnhtc MatrizM04 MatrizM05 MatrizM06 0.000 MatrizM1 1.000 0.000 MatrizM12 1.000 0.000 1.000 MtrzM15CV15 1.000 0.000 1.000 1.000
-- Marcelo Brazil Linux user number 487797