[R] SVM Param Tuning with using SNOW package

Wed Nov 18 22:21:15 CET 2009

Hi David,

I have no idea what "magic" you did, but running exactly the same code as
you, I still have the same problem as before: the results come out identical
in pairs, whereas I should get a different result for each value of cost1
(a vector of 10 values running from 0.5 to 30).
This is the result I get:

0.2197162, 0.2197162, 0.1467448, 0.1467448, 0.2247955, 0.2247955,
0.1073280, 0.1073280, 0.2332475, 0.2332475
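
A possible explanation for the pairwise-identical values (an assumption, not something confirmed in this thread): if both workers start with the same random-number state, sample() draws the same 50 training rows on each of them, and if the test error barely depends on the cost for this data, the two workers return the same number. The sketch below reproduces that situation on purpose and then gives each worker its own stream. It uses the parallel package bundled with R, whose cluster functions follow snow's interface; the seed values are arbitrary. With snow itself the corresponding call would be clusterSetupRNG(cl), which needs the rlecuyer (or rsprng) package.

```r
library(parallel)  # bundled with R; cluster functions follow snow's interface

cl <- makeCluster(2, type = "PSOCK")

# Force the suspected situation: both workers start from the same seed.
clusterEvalQ(cl, set.seed(1))
same <- clusterApply(cl, 1:2, function(k) sample(1:60, 50))
print(identical(same[[1]], same[[2]]))    # the two workers drew the same rows

# Give every worker its own L'Ecuyer-CMRG stream instead.
clusterSetRNGStream(cl, iseed = 2009)
indep <- clusterApply(cl, 1:2, function(k) sample(1:60, 50))
print(identical(indep[[1]], indep[[2]]))  # the draws are now expected to differ

stopCluster(cl)
```

If this is indeed the cause, setting up the streams once, right after makeCluster(), should be enough to break the pairs.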

Anyway, thanks a lot for trying. 

PS. Probably I should switch to Mac :)

David Winsemius wrote:
> 
> I cannot really be sure what you are trying to do,  but doing a bit of  
> "surgery" on your code lets it run on a multicore Mac:
> 
> library(e1071)
> library(snow)
> library(pls)
> 
> data(gasoline)
> 
> X=gasoline$NIR
> Y=gasoline$octane
> 
> NR=10
> cost1=seq(0.5,30, length=NR)
> 
> sv.lin<- function(c) {
> 
> for (i in 1:NR) {
> 
> ind=sample(1:60,50)
> gTest<-  data.frame(Y=I(Y[-ind]),X=I(X[-ind,]))
> gTrain<- data.frame(Y=I(Y[ind]),X=I(X[ind,]))
> 
> svm.lin   	  <- svm(gTrain$X,gTrain$Y, kernel="linear",cost=c[i],  
> cross=5)
> results.lin   <- predict(svm.lin, gTest$X)
> 
> e.test.lin     <- sqrt(sum((results.lin-gTest$Y)^2)/length(gTest$Y))
> 
> return(e.test.lin)
> }
> }
> 
> cl<- makeCluster(2, type="SOCK" )
> 
> clusterEvalQ(cl, library(e1071))
> cost1=seq(0.5,30, length=NR)
> 
> clusterExport(cl,c("NR","Y","X",  "cost1"))
> # Pretty sure you need a copy of cost1 on each node.
> 
> 
> RMSEP<-clusterApply(cl, cost1, sv.lin)
> # I thought the second argument was the matrix or vector over which to
> # iterate.
> 
> stopCluster(cl)
> 
> # Since I don't know what the model meant, I cannot determine whether
> # this result is interpretable.
>  > RMSEP
> [[1]]
> [1] 0.1921887
> 
> [[2]]
> [1] 0.1924917
> 
> [[3]]
> [1] 0.1885066
> 
> [[4]]
> [1] 0.1871466
> 
> [[5]]
> [1] 0.3550932
> 
> [[6]]
> [1] 0.1226460
> 
> [[7]]
> [1] 0.2426345
> 
> [[8]]
> [1] 0.2126299
> 
> [[9]]
> [1] 0.2276286
> 
> [[10]]
> [1] 0.2064534
> 
> -- 
> David Winsemius, MD
> 
> On Nov 18, 2009, at 7:09 AM, raluca wrote:
> 
>>
>> Hi Charlie,
>>
>>
>> Yes, you are perfectly right, when I make the clusters I should put  
>> 2, not
>> 10 (it remained 10 from previous trials with 10 slaves).
>>
>> cl<- makeCluster(2, type="SOCK" )
>>
>> To tell the truth I do not understand very well what the 2nd  
>> parameter for
>> clusterApplyLB() has to be.
>>
>> If the function sv.lin has just 1 parameter, sv.lin(c), where c is  
>> the cost,
>> how should I call clusterApplyLB?
>>
>>
>> ? clusterApplyLB(cl, ?, sv.lin, c=cost1)  ?
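
On the question of what the second argument should be: going by the documented interface, clusterApply(cl, x, fun, ...) calls fun once per element of x, passing that element as the first argument and forwarding any extra named arguments unchanged; clusterApplyLB() takes the same arguments. A minimal sketch, assuming the parallel package bundled with R (its cluster functions follow snow's interface) and a hypothetical stand-in show.cost() in place of sv.lin():

```r
library(parallel)  # bundled with R; cluster functions follow snow's interface

cl <- makeCluster(2, type = "PSOCK")
cost1 <- seq(0.5, 30, length.out = 10)

# Hypothetical stand-in for sv.lin(): it receives ONE cost value per call.
# 'label' only illustrates how extra named arguments are forwarded.
show.cost <- function(cost, label) paste(label, cost)

# cost1 is the second argument; each element becomes 'cost' in one call.
out <- clusterApply(cl, cost1, show.cost, label = "cost =")

stopCluster(cl)

print(unlist(out))
```

Under this reading, cost1 itself goes in the second position and sv.lin() would receive a single cost per call, so the inner loop over 1:NR and the c[i] indexing would not be needed.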
>>
>>
>>
>> Below, I am providing a working example, using the gasoline data  
>> that comes
>> in the pls package.
>>
>> Thank you for your time!
>>
>>
>> library(e1071)
>> library(snow)
>> library(pls)
>>
>> data(gasoline)
>>
>> X=gasoline$NIR
>> Y=gasoline$octane
>>
>> NR=10
>> cost1=seq(0.5,30, length=NR)
>>
>>
>> sv.lin<- function(c) {
>>
>> for (i in 1:NR) {
>>
>> ind=sample(1:60,50)
>> gTest<-  data.frame(Y=I(Y[-ind]),X=I(X[-ind,]))
>> gTrain<- data.frame(Y=I(Y[ind]),X=I(X[ind,]))
>>
>> svm.lin   	  <- svm(gTrain$X,gTrain$Y, kernel="linear",cost=c[i],  
>> cross=5)
>> results.lin   <- predict(svm.lin, gTest$X)
>>
>> e.test.lin     <- sqrt(sum((results.lin-gTest$Y)^2)/length(gTest$Y))
>>
>> return(e.test.lin)
>> }
>> }
>>
>>
>> cl<- makeCluster(2, type="SOCK" )
>>
>>
>> clusterEvalQ(cl,library(e1071))
>>
>>
>> clusterExport(cl,c("NR","Y","X"))
>>
>>
>> RMSEP<-clusterApplyLB(cl,?,sv.lin,c=cost1)
>>
>> stopCluster(cl)
>>
>>
>>
>>
>>
>> cls59 wrote:
>>>
>>>
>>> raluca wrote:
>>>>
>>>> Hello,
>>>>
>>>> This is the first time I am using the SNOW package, and I am trying to
>>>> tune the cost parameter for a linear SVM, where the cost (variable cost1)
>>>> takes 10 values between 0.5 and 30.
>>>>
>>>> I have a large dataset and a pc which is not very powerful, so I  
>>>> need to
>>>> tune the parameters using both CPUs of the pc.
>>>>
>>>> Somehow I cannot manage to do it. It seems that both CPUs are  
>>>> fitting the
>>>> model for the same values of cost1, I guess the first 5, but not  
>>>> for the
>>>> last 5.
>>>>
>>>> Please, can anyone help me!
>>>>
>>>> Here is the code:
>>>>
>>>> data <- data.frame(Y=I(Y),X=I(X))
>>>> data.X<-data$X
>>>> data.Y<-data$Y
>>>>
>>>>
>>>
>>>
>>> Helping you will be difficult as we're only three lines into your  
>>> example
>>> and already I have no idea what the data you are using looks like.
>>> Example code needs to be fully reproducible-- that means a small  
>>> slice of
>>> representative data needs to be provided or faked using an  
>>> appropriate
>>> random number generator.
>>>
>>> Some things did jump out at me about your approach and I've made some
>>> notes below.
>>>
>>>
>>>
>>> raluca wrote:
>>>>
>>>> NR=10
>>>> cost1=seq(0.5,30, length=NR)
>>>>
>>>> sv.lin<- function(cl,c) {
>>>>
>>>> for (i in 1:NR) {
>>>>
>>>> ind=sample(1:414,276)
>>>>
>>>> hogTest<-  data.frame(Y=I(data.Y[-ind]),X=I(data.X[-ind,]))
>>>> hogTrain<- data.frame(Y=I(data.Y[ind]),X=I(data.X[ind,]))
>>>>
>>>> svm.lin   	  <- svm(hogTrain$X,hogTrain$Y,  
>>>> kernel="linear",cost=c[i],
>>>> cross=5)
>>>> results.lin   <- predict(svm.lin, hogTest$X)
>>>>
>>>> e.test.lin     <- sqrt(sum((results.lin-hogTest$Y)^2)/ 
>>>> length(hogTest$Y))
>>>>
>>>> return(e.test.lin)
>>>> }
>>>> }
>>>>
>>>> cl<- makeCluster(10, type="SOCK" )
>>>>
>>>
>>>
>>> If your machine has two cores, why are you setting up a cluster  
>>> with 10
>>> nodes?  Usually the number of nodes should equal the number of  
>>> cores on
>>> your machine in order to keep things efficient.
>>>
>>>
>>>
>>> raluca wrote:
>>>>
>>>>
>>>> clusterEvalQ(cl,library(e1071))
>>>>
>>>> clusterExport(cl,c("data.X","data.Y","NR","cost1"))
>>>>
>>>> RMSEP<-clusterApplyLB(cl,cost1,sv.lin)
>>>>
>>>
>>>
>>> Are you sure this evaluation even produces results? sv.lin() is a  
>>> function
>>> you defined above that takes two parameters-- "cl" and "c".
>>> clusterApplyLB() will feed values of cost1 into sv.lin() for the  
>>> argument
>>> "cl", but it has nothing to give for "c".  At the very least, it  
>>> seems
>>> like you would need something like:
>>>
>>>  RMSEP <- clusterApplyLB( cl, cost1, sv.lin, c = someVector )
>>>
>>>
>>>
>>> raluca wrote:
>>>>
>>>>
>>>> stopCluster(cl)
>>>>
>>>>
>>>
>>>
>>> Sorry I can't be very helpful, but with no data and no apparent way  
>>> to
>>> legally call sv.lin() the way you have it set up, I can't  
>>> investigate the
>>> problem to see if I get the same results you described.  If you could
>>> provide a complete working example, then there's a better chance that
>>> someone on this list will be able to help you.
>>>
>>> Good luck!
>>>
>>> -Charlie
>>>
>>
>> -- 
>> View this message in context:
>> http://old.nabble.com/SVM-Param-Tuning-with-using-SNOW-package-tp26399401p26406709.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> 

-- 
View this message in context: http://old.nabble.com/SVM-Param-Tuning-with-using-SNOW-package-tp26399401p26415997.html
Sent from the R help mailing list archive at Nabble.com.