[R] using xval in mvpart to specify cross validation groups

Fri Mar 12 23:05:37 CET 2010

Thank you Dennis, I've got the idea now.

However, I have a follow-up question to make sure I'm not wasting my time.

If I specify the precise CV folds to use, should I not get the same
tree every time?

For example, here I have a hypothetical time sequence observed with
error at 3 sites, 's'.

Suppose I leave out 1 site each time in a 3-fold CV (leaving aside
that 3-fold CV might not be a good idea).

Should I not get the same tree each time?

library(mvpart)
library(lattice)

y <- rep(sin(seq(0.1,6, 0.1)),3)
y1 <- y+rnorm(length(y), sd=0.5)
x <- rep(1:(length(y)/3),3)
s <- rep(1:3, each=(length(y)/3))

dat <- data.frame(x,y1,s)

xyplot(y1~x|s, data=dat)

(mvpart(y1~x, data=dat, xv="1se", xval=s))
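
One thing to rule out when comparing runs: y1 is drawn with rnorm(), so
the data themselves change every time the script is run unless the seed is
fixed. Below is a minimal sketch in base R that fixes the data and lists
the rows each fold would hold out if the site vector is used as the
grouping. The seed value and the name held_out are assumptions made for
illustration. The mvpart call is left commented out, and whether the
resulting tree is then identical on every run is not verified here.

```r
set.seed(42)  # arbitrary seed (an assumption) so y1 is identical on every run

y   <- rep(sin(seq(0.1, 6, 0.1)), 3)
y1  <- y + rnorm(length(y), sd = 0.5)
x   <- rep(1:(length(y) / 3), 3)
s   <- rep(1:3, each = length(y) / 3)
dat <- data.frame(x, y1, s)

# Rows that would be held out in each fold when grouping by site
held_out <- split(seq_len(nrow(dat)), dat$s)
sapply(held_out, length)

# library(mvpart)
# fit <- mvpart(y1 ~ x, data = dat, xv = "1se", xval = s)
```

If the tree still varies with the seed fixed, the variation is coming from
the fitting step rather than from the simulated data.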

Thank you for your help.

andydolman at gmail.com

On 12 March 2010 18:03, Dennis Murphy <djmuser at gmail.com> wrote:
> Hi:
>
> See inline...
>
> On Fri, Mar 12, 2010 at 4:15 AM, Andrew Dolman <andydolman at gmail.com> wrote:
>>
>> Dear R's
>>
>> I'm trying to use specific rather than random cross-validation groups
>> in mvpart.
>>
>> The man page says:
>> xval Number of cross-validations or vector defining cross-validation
>> groups.
>>
>>
>> I also found this reply to the list by Terry Therneau from 2006:
>>
>> The rpart function allows one to give the cross-validation groups
>> explicitly.
>> So if the number of observations was 10, you could use
>>   > rpart( y ~ x1 + x2, data=mydata, xval=c(1,1,2,2,3,3,1,3,2,1))
>> which causes observations 1,2,7, and 10 to be left out of the first xval
>> sample, 3,4, and 9 out of the second, etc.
>>
>>        Terry Therneau
>>
>>
>> I can't see how this string of values, c(1,1,2,2,3,3,1,3,2,1), codes
>> for observations 1,2,7,10 being left out of the 1st and so on.
>
>
>> x <- c(1,1,2,2,3,3,1,3,2,1)
>> which(x == 1)       # elements left out of the first xval sample
> [1]  1  2  7 10
>> which(x == 2)       # elements left out of the second xval sample
> [1] 3 4 9
>> which(x == 3)       # elements left out of the third xval sample
> [1] 5 6 8
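
The three which() calls above can also be collapsed into a single step. A
small sketch using base R's split(); the name folds is just illustrative:

```r
x <- c(1, 1, 2, 2, 3, 3, 1, 3, 2, 1)

# One list element per fold: the positions left out of that fold's fit
folds <- split(seq_along(x), x)
folds
```

Each element of folds holds the same indices that which(x == k) returns for
the corresponding group k.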
>
> This vector is used to index a response vector/model matrix.
>
> To see how this is applied, consider the following. y is a vector of
> length 10, the same as x:
>> y <- rpois(10, 15)
>> y
>  [1] 12 15 17 11 14 14 12 12 16 16
>> y[x != 1]                  # first xval sample (y[1], y[2], y[7], y[10] removed)
> [1] 17 11 14 14 12 16
>> y[x != 2]                  # second xval sample (y[3], y[4], y[9] removed)
> [1] 12 15 14 14 12 12 16
>> y[x != 3]                  # third xval sample (y[5], y[6], y[8] removed)
> [1] 12 15 17 11 12 16 16
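
To show how this kind of indexing drives a cross-validation, here is a
rough sketch of a hand-rolled loop in base R. It illustrates the general
idea only and is not how rpart or mvpart implement it internally. The
predictor x1, the simulated response, the lm() model, and the seed are all
assumptions made for the example. The grouping vector is called g here to
keep it distinct from the predictor.

```r
set.seed(1)                            # arbitrary seed, for a repeatable illustration
g  <- c(1, 1, 2, 2, 3, 3, 1, 3, 2, 1)  # cross-validation groups
x1 <- 1:10                             # hypothetical predictor
y  <- 2 * x1 + rnorm(10)               # hypothetical response

# For each group k: fit without group k, then predict the held-out rows
cv_err <- sapply(sort(unique(g)), function(k) {
  train <- data.frame(x1 = x1[g != k], y = y[g != k])
  test  <- data.frame(x1 = x1[g == k], y = y[g == k])
  fit   <- lm(y ~ x1, data = train)
  mean((test$y - predict(fit, newdata = test))^2)  # held-out mean squared error
})
cv_err
```

Because g is fixed, every run of the loop trains and tests on exactly the
same rows; only the data or the model can change the result.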
>
> Indexing is one of the most important and powerful features of R.
>
> HTH,
> Dennis
>
>> Can anyone fill me in please?
>>
>> Thanks,
>>
>> andydolman at gmail.com