[R] tests for significance on conditional inference trees from party package

Achim Zeileis Achim.Zeileis at uibk.ac.at
Tue Dec 13 21:22:43 CET 2016


Thanks for your interest.

On Tue, 13 Dec 2016, Adrian Johnson wrote:

> Dear group,
> Please allow me to ask a naive question and pardon if it is qualified
> as stupid question.
> I am using party package to classify covariates and predict distribution 
> of survival times for the classified variables. Typically I have a 
> matrix of covariates (columns) including outcome data (overall survival 
> in months, censor status) and other covariates I want to split in tree 
> (such as treatment dose etc.). Rows are patients (~1000 patients).
> Now similarly I have many such matrices (4K) with completely different 
> sets of covariates but identical outcome data and patients (in rows). I 
> cannot combine all data into a giant matrix, because these covariates are 
> totally independent.

If the response variable is the same and the patients are the same, then I 
don't see why, conceptually, you couldn't combine "totally independent" 
variables in the same tree. Or maybe I misunderstand what "totally 
independent" means.

Practically, however, choosing a tree from 4,000 regressor variables will 
be challenging, especially if you want to adjust in some way for the 
multiple testing. So maybe some additional structure would help here.
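
To make the idea of combining covariate sets concrete, here is a minimal 
sketch on simulated data. All variable names (block_a, block_b, dose, 
marker, os_months, ...) are made up for illustration, and the sketch 
assumes that the rows of all blocks refer to the same patients in the same 
order. It uses ctree() from "partykit" with the Bonferroni adjustment, 
which corrects the variable selection for the number of candidate 
covariates.

```r
library("partykit")
library("survival")

set.seed(42)
n <- 400

## two hypothetical covariate blocks for the same patients (same row order)
block_a <- data.frame(dose = runif(n, 0, 10), age = rnorm(n, 60, 10))
block_b <- data.frame(marker = rnorm(n),
                      stage = factor(sample(1:3, n, replace = TRUE)))

## simulated outcome: a high dose shortens survival, everything else is noise
true_time <- rexp(n, rate = ifelse(block_a$dose > 5, 0.2, 0.05))
cens_time <- rexp(n, rate = 0.02)
outcome <- data.frame(os_months = pmin(true_time, cens_time),
                      event = as.integer(true_time <= cens_time))

## one data frame: shared outcome plus all covariate blocks side by side
dat <- cbind(outcome, block_a, block_b)

## a single tree over all covariates, Bonferroni-adjusted at the 5% level
tree <- ctree(Surv(os_months, event) ~ ., data = dat,
              control = ctree_control(testtype = "Bonferroni",
                                      mincriterion = 0.95))
print(tree)
```

With thousands of columns the Bonferroni correction becomes very strict, 
which is one reason why some additional structure may be needed.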

> Currently I am running this model in a loop and storing the tree and
> parsing the tree structure.

Parsing the tree structure is quite cumbersome in the old "party" 
implementation. This was one of the main motivations to establish the 
reimplementation in "partykit". This has a much better and more accessible 
tree infrastructure. See the vignettes in the "partykit" package for more 
details - especially vignette("partykit", package = "partykit") gives a 
good overview of the building blocks.
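
As a rough sketch of what working with the "partykit" infrastructure can 
look like, the following extracts the split variable and the p-value of 
each inner node instead of parsing printed output. The data are simulated 
and the names (dose, marker, os_months, split_table) are purely 
illustrative. It assumes that the tree has at least one split and that 
ctree() stores the p-value in the node info under "p.value", which is how 
current "partykit" versions behave as far as I know.

```r
library("partykit")
library("survival")

set.seed(42)
n <- 400
dat <- data.frame(dose = runif(n, 0, 10), marker = rnorm(n))
true_time <- rexp(n, rate = ifelse(dat$dose > 5, 0.2, 0.05))
cens_time <- rexp(n, rate = 0.02)
dat$os_months <- pmin(true_time, cens_time)
dat$event <- as.integer(true_time <= cens_time)

tree <- ctree(Surv(os_months, event) ~ dose + marker, data = dat)

## basic shape of the tree
width(tree)   # number of terminal nodes
depth(tree)   # depth of the tree

## inner nodes are all node ids that are not terminal
inner <- setdiff(nodeids(tree), nodeids(tree, terminal = TRUE))

## one row per inner node: id, split variable, and p-value of the split
split_list <- nodeapply(tree, ids = inner, FUN = function(node) {
  data.frame(
    node = id_node(node),
    variable = names(tree$data)[varid_split(split_node(node))],
    p_value = unname(info_node(node)$p.value)
  )
})
split_table <- do.call(rbind, split_list)
print(split_table)
```

A table like this can be collected inside a loop over many fitted trees, 
which is usually easier than parsing the tree structure by hand.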

Additionally, over at StackOverflow you can find various bits and pieces 
that may be helpful. Look for the "party" tag.

Finally, there is also a partykit support forum on R-Forge.

> My question is, is there some testing method to choose or rank these 4K 
> trees such that I can select each tree from top to bottom. I know each 
> tree is important in its own way.

It is not clear to me what you want to rank, or how. However, looking at 
the sources of information listed above might take you a few steps 
further.

> If selection based on significance is required, then is there any other 
> way instead of conditional inference tree , that partitions data but 
> will also carry some significance to choose from.

The MOB (model-based recursive partitioning) algorithm is also based on 
significance tests and implemented in the "partykit" package. It uses 
parametric asymptotic inference rather than nonparametric conditional 
inference. Otherwise the two approaches are very similar in many respects.
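
For illustration, a minimal sketch of MOB with a censored response could 
look as follows. The data are simulated, the names (trt, z1, z2, wbreg, 
mob_tree) are made up, and the Weibull model is only one possible choice 
of node model. The fit function follows the interface that mob() expects 
(arguments y, x, start, weights, offset). The sketch assumes that the 
"sandwich" package is installed and that the installed "survival" version 
provides a logLik() method for survreg() fits.

```r
library("partykit")
library("survival")

set.seed(7)
n <- 500
dat <- data.frame(trt = rbinom(n, 1, 0.5), z1 = rnorm(n), z2 = rnorm(n))

## simulated outcome: the treatment effect changes sign with z1
lp <- 1 + ifelse(dat$z1 > 0, 1.5, -0.5) * dat$trt
true_time <- rexp(n, rate = exp(-lp))
cens_time <- rexp(n, rate = 0.02)
dat$time <- pmin(true_time, cens_time)
dat$event <- as.integer(true_time <= cens_time)

## node model: Weibull regression; x already contains the intercept column
wbreg <- function(y, x, start = NULL, weights = NULL, offset = NULL, ...) {
  survreg(y ~ 0 + x, weights = weights, dist = "weibull", ...)
}

## model on the left of "|", partitioning variables on the right
mob_tree <- mob(Surv(time, event) ~ trt | z1 + z2, data = dat,
                fit = wbreg, control = mob_control(minsize = 50))
print(mob_tree)

## parameter instability tests in the root node (statistics and p-values)
root_tests <- info_node(node_party(mob_tree))$test
print(root_tests)
```

The p-values in root_tests play a role similar to the conditional 
inference p-values in ctree(), but come from asymptotic parameter 
instability tests.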

Hope that helps,
