David Meyer
david.meyer at wu.ac.at
Wed Feb 8 10:30:16 CET 2012
Confirmed & fixed upstream.
Thanks,
David
On 2012-02-07 18:43, Ali Tofigh wrote:
> Hi,
>
> I'm currently using the R package e1071 to train naive Bayes
> classifiers and came across a bug: when the (unnormalised) class
> likelihoods of a new case are very small for all classes, the results
> from the predict.naiveBayes function become NaN. This is caused by the
> way the log-transformed probabilities are handled inside
> predict.naiveBayes. Here is an example that demonstrates the problem
> (you might need to increase 'nvar' depending on your machine):
>
> -------------------- 8< --------------------
> library(e1071)
>
> N <- 100
> nvar <- 60
> varnames <- paste("v", 1:nvar, sep="")
>
> dat <- sapply(1:nvar, function(dummy) {c(rnorm(N/2, 0, 1), rnorm(N/2, 10, 1))})
> colnames(dat) <- varnames
>
> out <- rep(c("a","b"), each=N/2)
>
> nb <- naiveBayes(x=dat, y=out)
>
> new.dat <- t(rnorm(nvar, 5, 0.1))
> colnames(new.dat) <- varnames
>
> predict(nb, new.dat, type="raw")
> -------------------- 8< --------------------
>
> The result of the last line is usually NaN. As for the solution:
>
> To protect against very small numbers, e1071:::predict.naiveBayes
> works in log space, adding log-probabilities instead of multiplying
> probabilities. However, when the posterior probability of each class
> is computed (type = "raw"), the log-probabilities are exponentiated
> again, which defeats the purpose of the log-space transformation. I
> suggest the following change to the code:
>
> Towards the end of the predict.naiveBayes function, you currently do:
>
> L <- exp(L)
> L / sum(L)  # this is what is returned
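A minimal illustration of why this fails, using made-up log-scores for a single test case and two classes (the values are an assumption chosen to be about as negative as in the example above, not output from e1071):

```r
# Hypothetical per-class log-scores (log prior + summed log densities).
L <- c(a = -1200, b = -1210)

# Current approach: leave log space first, then normalise.
p <- exp(L)          # both entries underflow to 0 in double precision
naive <- p / sum(p)  # 0 / 0 is NaN for every class
print(naive)
```

Any log-score below roughly -745 underflows to 0 when exponentiated, so once every class is that unlikely the normalising sum is 0.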
>
> You can use this instead:
>
> sapply(L, function(lp) {1 / sum(exp(L - lp))})
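A sketch of the proposed expression applied to the same made-up log-scores; `posterior_from_log` is a hypothetical wrapper name used only for illustration:

```r
# Posterior probabilities computed from log-scores without leaving log space
# for the individual terms: each class i gets 1 / sum_j exp(L_j - L_i).
posterior_from_log <- function(L) {
  sapply(L, function(lp) 1 / sum(exp(L - lp)))
}

L <- c(a = -1200, b = -1210)  # hypothetical log-scores, as before
post <- posterior_from_log(L)
print(post)  # finite values that sum to 1
```

Only differences of log-scores are exponentiated, so the terms no longer all underflow together. One caveat: if a class has a log-score of -Inf (zero probability), `L - lp` evaluates `-Inf - (-Inf)`, which is NaN, so such entries would still need separate handling.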
>
> The above follows from this equality:
>
> x / (x + y + z) = 1 / (1 + exp(log(y) - log(x)) + exp(log(z) - log(x)))
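The equality generalises to any number of classes: x_i / sum_j x_j = 1 / sum_j exp(log(x_j) - log(x_i)). A widely used variant of the same idea (shown for comparison only; it is neither the change proposed above nor necessarily what the upstream fix does) subtracts the maximum log-score before exponentiating. `posterior_max_shift` is a hypothetical name:

```r
# Log-sum-exp style normalisation: shift by the maximum log-score so the
# largest term becomes exp(0) = 1 and the denominator is at least 1.
posterior_max_shift <- function(L) {
  p <- exp(L - max(L))  # largest element is 1, the others lie in [0, 1]
  p / sum(p)            # sum(p) >= 1, so no 0 / 0
}

L <- c(a = -1200, b = -1210)  # hypothetical log-scores
post2 <- posterior_max_shift(L)

# The sapply() form from the proposal, for comparison.
post1 <- sapply(L, function(lp) 1 / sum(exp(L - lp)))
print(all.equal(post1, post2))
```

Both give the same posteriors; the max-shift version needs one pass over the k classes instead of k passes.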
>
> Best wishes,
> /Ali Tofigh
