[R] deviance vs entropy
RemoteAPL
remoteapl at obninsk.com
Fri Feb 16 01:24:46 CET 2001
Warren,
Thank you for your answer. It gave me food for thought. Let me ask a few
more questions.
> I'm not quite sure what you have in mind, but I'm inferring from your
> comments that by "deviance" you mean:
>
> -SUM p_i log (p_i/q_i) (or -2 SUM p_i log (p_i/q_i))
I am sorry for my language. I meant in particular the deviance that is
calculated when we select the split of a node while building a
classification tree. As far as I know it should be:
-2 SUM n_i log (p_i)
where n_i is the number of points of class c_i at this node and
p_i = n_i/N, where N is the total number of cases at this node. It
probably relates in some way to what you wrote above. Could you tell me
more about p_i and q_i in your formula?
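To make the node deviance above concrete, here is a minimal sketch in
Python. The function names and the class counts are made up for
illustration and are not taken from any tree package. It assumes natural
logarithms and the convention 0 log 0 = 0 for empty classes. Since
n_i = N p_i, the same quantity equals 2 N times the entropy of the class
proportions at the node, which the last lines check numerically.

```python
import math

def node_deviance(counts):
    """Node deviance: -2 * SUM n_i * log(p_i), with p_i = n_i / N."""
    total = sum(counts)
    dev = 0.0
    for n_i in counts:
        if n_i > 0:  # convention: an empty class contributes 0
            dev -= 2.0 * n_i * math.log(n_i / total)
    return dev

def entropy(counts):
    """Entropy (natural log) of the class proportions at a node."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)

# Hypothetical node with three classes and N = 50 cases.
counts = [30, 10, 10]
dev = node_deviance(counts)
two_n_h = 2 * sum(counts) * entropy(counts)

print("deviance      :", dev)
print("2 * N * H(p)  :", two_n_h)

# A pure node (all cases in one class) has zero deviance.
pure_dev = node_deviance([50, 0, 0])
print("pure node     :", pure_dev)
```

A split would then be scored by how much the deviance of the parent node
exceeds the sum of the deviances of its children.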
> D(p_i||q_i) = - SUM p_i log p_i + SUM p_i log q_i = H(p) - H(p:q)
>
> where H(p) is entropy of p, and H(p:q) is the cross entropy. If q is the
> uniform distribution, then the cross entropy reduces to:
I will probably understand this and the following statements once I
understand the first formula.
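The quantities in the quoted formula can be computed directly. The
following is a minimal Python sketch under stated assumptions: the
function names and the example distribution are hypothetical, natural
logarithms are used, and H(p:q) is taken to mean -SUM p_i log q_i. With
these definitions H(p) - H(p:q) equals -SUM p_i log(p_i/q_i), so it is
never positive, and for a uniform q over k classes the cross entropy is
simply log(k).

```python
import math

def entropy(p):
    """H(p) = -SUM p_i log p_i, skipping zero-probability classes."""
    return -sum(p_i * math.log(p_i) for p_i in p if p_i > 0)

def cross_entropy(p, q):
    """H(p:q) = -SUM p_i log q_i (assumed meaning of the quoted notation)."""
    return -sum(p_i * math.log(q_i) for p_i, q_i in zip(p, q) if p_i > 0)

# Hypothetical class proportions p and a uniform q over the same k classes.
p = [0.6, 0.2, 0.2]
k = len(p)
q = [1.0 / k] * k

h_p = entropy(p)
h_pq = cross_entropy(p, q)

# The expression from the quoted line: H(p) - H(p:q).
d = h_p - h_pq

# The same value computed directly as -SUM p_i log(p_i / q_i).
direct = -sum(p_i * math.log(p_i / q_i) for p_i, q_i in zip(p, q) if p_i > 0)

print("H(p)          :", h_p)
print("H(p:q)        :", h_pq, "(log k =", math.log(k), ")")
print("H(p) - H(p:q) :", d)
print("direct sum    :", direct)
```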
> I'm guessing that in the things you've read, when they are talking about
> deviance, q can (and generally is) something other than the uniform
> distribution. For example, p is often the empirical distribution of a
> data sample, and q is the distribution corresponding to some induced
> model. Then D(p||q) is a measure of how far the model is from the
> observed data.
That sounds interesting. Could you please restate this in terms of
classification trees? I mean, what are the "induced model" and the
"corresponding distribution" when we are speaking of CART?
> entropy (entropy - cross_entropy, or KL-divergence). Statisticians are
> interested in deviance
What does "KL" stand for?
> because (with the factor of 2) it is asymptotically chi-square for many
> modeling families. In
That is probably the most important argument in its favour.
> information theoretic terms it's nice to think of the deviance as the
> number of bits extra that it would take to transmit the data for a
> system assuming the distribution q, relative to a system that had
> assumed p, which is the best system for transmitting that particular
> data set.
Very interesting! I must think this over some more.
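The coding interpretation in the quoted paragraph can be checked on a
small example. The sketch below is in Python and uses base-2 logarithms so
that the units are bits; the distributions p and q are invented for
illustration. It compares the average number of bits per symbol under an
ideal code built for q with an ideal code built for p. The difference is
the extra cost of assuming q when the data actually follow p.

```python
import math

# Hypothetical example: p is the empirical distribution of the data,
# q is the distribution assumed by the transmitting system.
p = [0.5, 0.25, 0.25]
q = [1 / 3, 1 / 3, 1 / 3]

# An ideal code spends about -log2(prob) bits on a symbol of that
# probability, so the average cost is a p-weighted sum of code lengths.
bits_with_p = -sum(p_i * math.log2(p_i) for p_i in p if p_i > 0)
bits_with_q = -sum(p_i * math.log2(q_i) for p_i, q_i in zip(p, q) if p_i > 0)

# Extra bits per symbol paid for assuming q instead of p.
extra_bits = bits_with_q - bits_with_p

n_symbols = 1000  # hypothetical size of the data set
print("bits/symbol with code for p:", bits_with_p)
print("bits/symbol with code for q:", bits_with_q)
print("extra bits per symbol      :", extra_bits)
print("extra bits for the data set:", n_symbols * extra_bits)
```

The extra cost is zero only when q equals p, which matches the remark that
p is the best system for that particular data set.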
> Then again, maybe I've misunderstood you completely. Please set me
> straight if I have.
I see that you sit quite straight. I am afraid that I lie horizontally:-)
Regards,
Alexander.