[R] party with mob - parameter estimates not significant in terminal nodes
Achim Zeileis
Achim.Zeileis at uibk.ac.at
Tue Oct 5 15:45:22 CEST 2010
Tudor:
> I successfully model-based partitioned several datasets through the use
> of mob from the party package (thanks Achim et al. once again !!!). At
> times, however, the partitioning leads to terminal nodes in which the
> parameter estimates of the model are not significant (although the split
> points and in general the proposed segmentation both seem reasonable).
There are two aspects to this:
(1) The algorithm only assesses whether the coefficients differ
significantly between the child nodes. It does not assess whether they are
significantly different from zero within each node, which may or may not
be the case. As an example: You may have a tree with a single split and two
child nodes. In the first child node, you have a highly significant
parameter value, but in the second node, the parameter is not
significantly different from zero.
(2) Due to partitioning, it may be the case that not all parameters of the
model are identified in all child nodes. Currently, within mob(), this is
not systematically checked. In particular, you may have (quasi-)complete
separation in binomial GLMs if a child node is particularly "pure". This
seems to have happened in node 3 of your example below: the estimates and
standard errors are enormous, the residual deviance is essentially zero,
and 25 Fisher scoring iterations were needed. From a machine learning
point of view, this is not a bad thing; you just need to interpret it
correctly.
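Point (1) can be illustrated with a small simulation. This is only a sketch on made-up data, not your data set: the names z, x and y and the hand-picked split at z > 0.5 are assumptions for illustration, and plain glm() fits on the two subsamples stand in for the node models that mob() would report.

```r
set.seed(1)
n <- 400
z <- runif(n)                      # hypothetical partitioning variable
x <- rnorm(n)                      # hypothetical regressor
## slope of x is 2 when z <= 0.5 and exactly 0 when z > 0.5
eta <- ifelse(z <= 0.5, 2 * x, 0)
y <- rbinom(n, size = 1, prob = plogis(eta))
d <- data.frame(y, x, z)

## fit the same logistic model separately in both "nodes"
left  <- glm(y ~ x, data = d, subset = z <= 0.5, family = binomial)
right <- glm(y ~ x, data = d, subset = z > 0.5,  family = binomial)

## Wald tests within each node: is the slope different from zero?
coef(summary(left))
coef(summary(right))

## difference between the nodes: interaction of x with the split
joint <- glm(y ~ x * I(z > 0.5), data = d, family = binomial)
coef(summary(joint))
```

The interaction term in the joint model addresses the question of whether the slopes differ between the two subsamples, while the separate summaries address whether each slope differs from zero. The two questions can have different answers.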
> As I do not seem to be able to come up with an intuitive
> explanation/interpretation for this (other than that the partitioning
> model may be appropriate for parts of the dataset(s)), I wonder if any
> of you could share your thoughts on this topic with me. For your
> convenience I attached a relevant set of results below.
I guess that the variable "P" is binary and that, when you cross-tabulate
it with the response for node 3, there are zeros in the contingency
table. I.e., you may have a perfect split in that one subsample.
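A minimal sketch of such a check, again on made-up data rather than yours: the data frame node3 and its columns y and P are hypothetical, with P taken to be binary as guessed above and the response perfectly determined by it.

```r
## hypothetical stand-in for the observations in one terminal node
node3 <- data.frame(
  P = rep(c(0, 1), times = c(30, 38)),
  y = rep(c(1, 0), times = c(30, 38))   # y is perfectly determined by P
)

## zeros in this table indicate (quasi-)complete separation
tab <- with(node3, table(P, y))
tab

## glm() still returns a fit, but warns that fitted probabilities
## numerically 0 or 1 occurred
fit <- glm(y ~ P, data = node3, family = binomial)
coef(summary(fit))   # very large estimates, even larger standard errors
deviance(fit)        # essentially zero
```

With your own data, the same table() call on the observations falling into node 3 should show whether this is what happened.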
hth,
Z
$`2`

Call:
NULL

Deviance Residuals:
                Min                   1Q               Median                   3Q                  Max
-2.1613499829328759  -0.1182099512510448   0.0000000000000000   0.1199438072333263   1.7963628663418680

Coefficients:
                       Estimate          Std. Error   z value              Pr(>|z|)
(Intercept)  38.6736721222665096  5.1182299436934375   7.55606  0.000000000000041545 ***
P            -3.8195232976021787  0.5042297985419135  -7.57497  0.000000000000035922 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 407.0806101624161  on 293  degrees of freedom
Residual deviance: 132.0087256781199  on 292  degrees of freedom
AIC: 136.0087256781199

Number of Fisher Scoring iterations: 7

$`3`

Call:
NULL

Deviance Residuals:
                    Min                       1Q                   Median                       3Q                      Max
-0.00009134433923085110   0.00000000000000000000   0.00000000000000000000   0.00000000000000000000   0.00009204763394325872

Coefficients:
                      Estimate            Std. Error   z value  Pr(>|z|)
(Intercept) 1755.7555999083327  601505.6700290179579   0.00292   0.99767
P           -181.3394660743267   62127.5207770660636  -0.00292   0.99767

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 94.20918454290385568583588  on 67  degrees of freedom
Residual deviance:  0.00000001683616309495537  on 66  degrees of freedom
AIC: 4.000000016836163

Number of Fisher Scoring iterations: 25