[R] SVM Param Tuning with using SNOW package
raluca
ucagui at hotmail.com
Wed Nov 18 22:21:15 CET 2009
Hi David,
I have no idea what "magic" you did, but running exactly the same code as
you, I still have the same problem as before: the results come out identical
in pairs, whereas I should get a different result for each value of cost1
(which is a vector of 10 values between 0.5 and 30).
This is the result I get:
0.2197162, 0.2197162, 0.1467448, 0.1467448, 0.2247955, 0.2247955,
0.1073280, 0.1073280, 0.2332475, 0.2332475
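
One possible explanation for values that repeat in pairs, stated as an assumption rather than a diagnosis: if both worker processes start from the same random-number state, sample() draws the same train/test split on each of them, and a linear SVM may then give the same fit for two neighbouring cost values. The base-R sketch below only simulates that situation with set.seed(); no cluster is started, draw_split is a hypothetical helper, and the seeds 123 and 124 are arbitrary.

```r
# Simulate two workers that begin with the same RNG state (assumed scenario).
draw_split <- function(seed) {
  set.seed(seed)        # illustrative seed values only
  sample(1:60, 50)      # the same call sv.lin() uses to pick training rows
}

same_a <- draw_split(123)
same_b <- draw_split(123)   # second "worker" with an identical seed
diff_b <- draw_split(124)   # second "worker" with its own seed

identical(same_a, same_b)   # same seed gives the same split
identical(same_a, diff_b)   # a different seed gives a different split
```

In snow, clusterSetupRNG(cl) is intended to give each node an independent random stream (it needs the rlecuyer or rsprng package). Whether that removes the duplicated values here has not been checked.
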
Anyway, thanks a lot for trying.
PS. Probably I should switch to Mac :)
David Winsemius wrote:
>
> I cannot really be sure what you are trying to do, but doing a bit of
> "surgery" on your code lets it run on a multicore Mac:
>
> library(e1071)
> library(snow)
> library(pls)
>
> data(gasoline)
>
> X=gasoline$NIR
> Y=gasoline$octane
>
> NR=10
> cost1=seq(0.5,30, length=NR)
>
> sv.lin<- function(c) {
>
> for (i in 1:NR) {
>
> ind=sample(1:60,50)
> gTest<- data.frame(Y=I(Y[-ind]),X=I(X[-ind,]))
> gTrain<- data.frame(Y=I(Y[ind]),X=I(X[ind,]))
>
> svm.lin <- svm(gTrain$X,gTrain$Y, kernel="linear",cost=c[i],
> cross=5)
> results.lin <- predict(svm.lin, gTest$X)
>
> e.test.lin <- sqrt(sum((results.lin-gTest$Y)^2)/length(gTest$Y))
>
> return(e.test.lin)
> }
> }
>
> cl<- makeCluster(2, type="SOCK" )
>
> clusterEvalQ(cl, library(e1071))
> cost1=seq(0.5,30, length=NR)
>
> clusterExport(cl,c("NR","Y","X", "cost1"))
> # Pretty sure you need a copy of cost1 on each node.
>
>
> RMSEP<-clusterApply(cl, cost1, sv.lin)
> # I thought the second argument was the matrix or vector over which to
> # iterate.
>
> stopCluster(cl)
>
> # Since I don't know what the model meant, I cannot determine whether
> # this result is interpretable:
> > RMSEP
> [[1]]
> [1] 0.1921887
>
> [[2]]
> [1] 0.1924917
>
> [[3]]
> [1] 0.1885066
>
> [[4]]
> [1] 0.1871466
>
> [[5]]
> [1] 0.3550932
>
> [[6]]
> [1] 0.1226460
>
> [[7]]
> [1] 0.2426345
>
> [[8]]
> [1] 0.2126299
>
> [[9]]
> [1] 0.2276286
>
> [[10]]
> [1] 0.2064534
>
> --
> David Winsemius, MD
>
> On Nov 18, 2009, at 7:09 AM, raluca wrote:
>
>>
>> Hi Charlie,
>>
>>
>> Yes, you are perfectly right, when I make the clusters I should put 2,
>> not 10 (it remained 10 from previous trials with 10 slaves).
>>
>> cl<- makeCluster(2, type="SOCK" )
>>
>> To tell the truth, I do not understand very well what the 2nd parameter
>> for clusterApplyLB() has to be.
>>
>> If the function sv.lin has just 1 parameter, sv.lin(c), where c is the
>> cost, how should I call clusterApplyLB?
>>
>>
>> ? clusterApplyLB(cl, ?, sv.lin, c=cost1) ?
>>
>>
>>
>> Below, I am providing a working example, using the gasoline data that
>> comes in the pls package.
>>
>> Thank you for your time!
>>
>>
>> library(e1071)
>> library(snow)
>> library(pls)
>>
>> data(gasoline)
>>
>> X=gasoline$NIR
>> Y=gasoline$octane
>>
>> NR=10
>> cost1=seq(0.5,30, length=NR)
>>
>>
>> sv.lin<- function(c) {
>>
>> for (i in 1:NR) {
>>
>> ind=sample(1:60,50)
>> gTest<- data.frame(Y=I(Y[-ind]),X=I(X[-ind,]))
>> gTrain<- data.frame(Y=I(Y[ind]),X=I(X[ind,]))
>>
>> svm.lin <- svm(gTrain$X,gTrain$Y, kernel="linear",cost=c[i],
>> cross=5)
>> results.lin <- predict(svm.lin, gTest$X)
>>
>> e.test.lin <- sqrt(sum((results.lin-gTest$Y)^2)/length(gTest$Y))
>>
>> return(e.test.lin)
>> }
>> }
>>
>>
>> cl<- makeCluster(2, type="SOCK" )
>>
>>
>> clusterEvalQ(cl,library(e1071))
>>
>>
>> clusterExport(cl,c("NR","Y","X"))
>>
>>
>> RMSEP<-clusterApplyLB(cl,?,sv.lin,c=cost1)
>>
>> stopCluster(cl)
>>
>>
>>
>>
>>
>> cls59 wrote:
>>>
>>>
>>> raluca wrote:
>>>>
>>>> Hello,
>>>>
>>>> This is the first time I am using the SNOW package, and I am trying to
>>>> tune the cost parameter for a linear SVM, where the cost (variable
>>>> cost1) takes 10 values between 0.5 and 30.
>>>>
>>>> I have a large dataset and a PC which is not very powerful, so I need
>>>> to tune the parameters using both CPUs of the PC.
>>>>
>>>> Somehow I cannot manage to do it. It seems that both CPUs are fitting
>>>> the model for the same values of cost1 (I guess the first 5), but not
>>>> for the last 5.
>>>>
>>>> Please, can anyone help me?
>>>>
>>>> Here is the code:
>>>>
>>>> data <- data.frame(Y=I(Y),X=I(X))
>>>> data.X<-data$X
>>>> data.Y<-data$Y
>>>>
>>>>
>>>
>>>
>>> Helping you will be difficult as we're only three lines into your
>>> example and already I have no idea what the data you are using looks
>>> like. Example code needs to be fully reproducible -- that means a small
>>> slice of representative data needs to be provided or faked using an
>>> appropriate random number generator.
>>>
>>> Some things did jump out at me about your approach and I've made some
>>> notes below.
>>>
>>>
>>>
>>> raluca wrote:
>>>>
>>>> NR=10
>>>> cost1=seq(0.5,30, length=NR)
>>>>
>>>> sv.lin<- function(cl,c) {
>>>>
>>>> for (i in 1:NR) {
>>>>
>>>> ind=sample(1:414,276)
>>>>
>>>> hogTest<- data.frame(Y=I(data.Y[-ind]),X=I(data.X[-ind,]))
>>>> hogTrain<- data.frame(Y=I(data.Y[ind]),X=I(data.X[ind,]))
>>>>
>>>> svm.lin <- svm(hogTrain$X,hogTrain$Y,
>>>> kernel="linear",cost=c[i],
>>>> cross=5)
>>>> results.lin <- predict(svm.lin, hogTest$X)
>>>>
>>>> e.test.lin <- sqrt(sum((results.lin-hogTest$Y)^2)/
>>>> length(hogTest$Y))
>>>>
>>>> return(e.test.lin)
>>>> }
>>>> }
>>>>
>>>> cl<- makeCluster(10, type="SOCK" )
>>>>
>>>
>>>
>>> If your machine has two cores, why are you setting up a cluster with
>>> 10 nodes? Usually the number of nodes should equal the number of cores
>>> on your machine in order to keep things efficient.
>>>
>>>
>>>
>>> raluca wrote:
>>>>
>>>>
>>>> clusterEvalQ(cl,library(e1071))
>>>>
>>>> clusterExport(cl,c("data.X","data.Y","NR","cost1"))
>>>>
>>>> RMSEP<-clusterApplyLB(cl,cost1,sv.lin)
>>>>
>>>
>>>
>>> Are you sure this evaluation even produces results? sv.lin() is a
>>> function you defined above that takes two parameters -- "cl" and "c".
>>> clusterApplyLB() will feed values of cost1 into sv.lin() for the
>>> argument "cl", but it has nothing to give for "c". At the very least,
>>> it seems like you would need something like:
>>>
>>> RMSEP <- clusterApplyLB( cl, cost1, sv.lin, c = someVector )
>>>
>>>
>>>
>>> raluca wrote:
>>>>
>>>>
>>>> stopCluster(cl)
>>>>
>>>>
>>>
>>>
>>> Sorry I can't be very helpful, but with no data and no apparent way to
>>> legally call sv.lin() the way you have it set up, I can't investigate
>>> the problem to see if I get the same results you described. If you
>>> could provide a complete working example, then there's a better chance
>>> that someone on this list will be able to help you.
>>>
>>> Good luck!
>>>
>>> -Charlie
>>>
>>
>> --
>> View this message in context:
>> http://old.nabble.com/SVM-Param-Tuning-with-using-SNOW-package-tp26399401p26406709.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
>
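
A note on the quoted sv.lin(): because return() sits inside the for loop, each call leaves on the first pass and only ever uses c[1]. When clusterApply(cl, cost1, sv.lin) hands the function one element of cost1 at a time, c[1] is simply that element, so the loop adds nothing. The base-R sketch below shows this control flow on its own, with lapply() standing in for the cluster call; f_loop is a hypothetical stand-in that returns the cost it would have used instead of fitting an SVM.

```r
# Same control flow as the quoted sv.lin(): return() inside the loop.
f_loop <- function(c, NR = 10) {
  for (i in 1:NR) {
    return(c[i])   # exits on the first pass, so only c[1] is ever used
  }
}

cost1 <- seq(0.5, 30, length.out = 10)

# Emulate the per-element calls with lapply(): one scalar cost per call.
per_element <- lapply(cost1, f_loop)

# Passing the whole vector in a single call still uses only its first value.
whole_vector <- f_loop(cost1)

unlist(per_element)   # the ten cost values, one from each call
whole_vector          # the first cost value only
```

Under that reading, a function that takes a single cost and drops the loop and the c[i] indexing would behave the same way and be easier to follow.
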
--
View this message in context: http://old.nabble.com/SVM-Param-Tuning-with-using-SNOW-package-tp26399401p26415997.html
Sent from the R help mailing list archive at Nabble.com.