[R] using xval in mvpart to specify cross validation groups
Andrew Dolman
andydolman at gmail.com
Fri Mar 12 23:05:37 CET 2010
Thank you Dennis, I've got the idea now.
However, I have a follow-up question to make sure I'm not wasting my time.
If I specify the exact CV folds to use, should I not get the same
tree every time?
For example, here I have a hypothetical time sequence observed with error
at 3 sites, 's'.
Suppose I leave out 1 site at a time in a 3-fold CV (setting
aside whether 3-fold CV is a good idea).
Should I not then get the same tree on every run?
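library(mvpart)
library(lattice)
# true signal: one sine curve, repeated for each of the 3 sites
y <- rep(sin(seq(0.1, 6, 0.1)), 3)
# observed response: signal plus Gaussian noise (no seed is set, so y1 differs on every run)
y1 <- y + rnorm(length(y), sd = 0.5)
# time index within each site
x <- rep(1:(length(y)/3), 3)
# site label, reused below as the cross-validation group
s <- rep(1:3, each = length(y)/3)
dat <- data.frame(x, y1, s)
xyplot(y1 ~ x | s, data = dat)
# 3-fold CV with one site held out per fold
(mvpart(y1 ~ x, data = dat, xv = "1se", xval = s))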
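One way to narrow this down is to separate the two sources of randomness: the simulated noise and the fold assignment. The sketch below is only a rough check, and it uses rpart (as in Terry Therneau's example quoted below) rather than mvpart, so it says nothing definite about mvpart's own behaviour. The seed value and the names n_per_site, held_out, fit1, fit2 and same_cv are my own choices for illustration.

```r
library(rpart)  # rpart ships with R as a recommended package

set.seed(42)    # arbitrary seed so the simulated noise is the same on every run
n_per_site <- 60
dat <- data.frame(
  x  = rep(seq_len(n_per_site), 3),
  y1 = rep(sin(seq(0.1, 6, 0.1)), 3) + rnorm(3 * n_per_site, sd = 0.5),
  s  = rep(1:3, each = n_per_site)
)

# Rows held out in each fold: fold k leaves out every row with s == k
held_out <- split(seq_len(nrow(dat)), dat$s)

# Fit twice on the same data with the same explicit fold vector
fit1 <- rpart(y1 ~ x, data = dat, xval = dat$s)
fit2 <- rpart(y1 ~ x, data = dat, xval = dat$s)

# Compare the cross-validated cp tables of the two fits
same_cv <- identical(fit1$cptable, fit2$cptable)
```

If same_cv is TRUE, then any run-to-run differences in the mvpart call above would have to come from somewhere other than the fold assignment, for instance from regenerating y1 without a seed or from something specific to mvpart.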
Thank you for your help.
andydolman at gmail.com
On 12 March 2010 18:03, Dennis Murphy <djmuser at gmail.com> wrote:
> Hi:
>
> See inline...
>
> On Fri, Mar 12, 2010 at 4:15 AM, Andrew Dolman <andydolman at gmail.com> wrote:
>>
>> Dear R's
>>
>> I'm trying to use specific rather than random cross-validation groups
>> in mvpart.
>>
>> The man page says:
>> xval Number of cross-validations or vector defining cross-validation
>> groups.
>>
>>
>> And I found this reply to the list by Terry Therneau from 2006
>>
>> The rpart function allows one to give the cross-validation groups
>> explicitly.
>> So if the number of observations was 10, you could use
>> > rpart( y ~ x1 + x2, data=mydata, xval=c(1,1,2,2,3,3,1,3,2,1))
>> which causes observations 1,2,7, and 10 to be left out of the first xval
>> sample, 3,4, and 9 out of the second, etc.
>>
>> Terry Therneau
>>
>>
>> I can't see how this string of values, c(1,1,2,2,3,3,1,3,2,1), codes
>> for observations 1,2,7,10 being left out of the 1st and so on.
>
>
>> x <- c(1,1,2,2,3,3,1,3,2,1)
>> which(x == 1) # elements left out of the first xval sample
> [1] 1 2 7 10
>> which(x == 2) # elements left out of the second xval sample
> [1] 3 4 9
>> which(x == 3) # elements left out of the third xval sample
> [1] 5 6 8
>
> This vector is used to index a response vector/model matrix.
>
> To see how this is applied, consider the following. y is a vector of
> length 10, the same as x:
>> y <- rpois(10, 15)
>> y
> [1] 12 15 17 11 14 14 12 12 16 16
>> y[x != 1] # first xval sample (y[1], y[2], y[7], y[10] removed)
> [1] 17 11 14 14 12 16
>> y[x != 2] # second xval sample (y[3], y[4], y[9] removed)
> [1] 12 15 14 14 12 12 16
>> y[x != 3] # third xval sample (y[5], y[6], y[8] removed)
> [1] 12 15 17 11 12 16 16
>
> Indexing is one of the most important and powerful features of R.
>
> HTH,
> Dennis
>
>> Can anyone fill me in please?
>>
>> Thanks,
>>
>> andydolman at gmail.com