[R] Help with SVM package Kernlab

Steve Lianoglou mailinglist.honeypot at gmail.com
Fri Dec 25 07:27:08 CET 2009


Hi,

On Fri, Dec 25, 2009 at 12:49 AM, Vishal Thapar <vishalthapar at gmail.com> wrote:
> Hi Steve,
>
> Thank you so much for the reply. The responses to your queries are:
> What do these commands return over your data?
>
> 1. is(train500)
> -->"data.frame" "list"       "oldClass"   "mpinput"    "vector"
> 2. is(train500$class)
> --> "NULL"             "OptionalFunction" "output"
> 3. is(train500[1,5])
> -->  "factor"   "integer"  "oldClass" "output"   "numeric"  "vector"
> 4. is(testSeq)
> --> "data.frame" "list"       "oldClass"   "mpinput"    "vector"
> 5. is(testSeq[1,5])
> -->"factor"   "integer"  "oldClass" "output"   "numeric"  "vector"
> 6. is(testSeq$class)
> -->  "NULL"             "OptionalFunction" "output"
>
>
>
>> How similar are we talking -- something is (obviously) off because
>> using the promotergene dataset is quite straightforward:
>>
>> library(kernlab)
>> data(promotergene)
>> tr <- promotergene[1:90,]
>> ts <- promotergene[91:106,]
>> m <- ksvm(Class~., data=tr, kernel="rbfdot", kpar =
>> "automatic", C = 60, cross = 3, prob.model = TRUE)
>> p <- predict(m, ts)
>>
> Right. here is the first line from my training set:
>   Class V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19
> V20 V21 V22 V23 V24 V25 V26 V27 V28
> 1     +  T  A  A  A  C  T  T  A  T   A   A   A   T   A   T   A   A   A   A
> C   T   T   T   T   T   A   A   T
>     V487 V488 V489 V490 V491 V492 V493 V494 V495 V496 V497 V498 V499 V500
> 1    G    A    T    T    T    C    A    T    T    T    T    G    T    T
>
> Here is the first record for the promoter gene set:
>
>   Class V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20
> V21 V22 V23 V24 V25 V26 V27 V28 V29 V30 V31 V32 V33 V34 V35 V36 V37 V38
> 1     +  g  c  c  t  t  c  t  c   c   a   a   a   a   c   g   t   g   t
> t   t   t   t   t   g   t   t   g   t   t   a   a   t   t   c   g   g   t
>   V39 V40 V41 V42 V43 V44 V45 V46 V47 V48 V49 V50 V51 V52 V53 V54 V55 V56
> V57 V58
> 1   g   t   a   g   a   c   t   t   g   t   a   a   a   c   c   t   a   a
> a   t

My guess is that the factor levels aren't comparable: note the lowercase
vs. uppercase letters in the two records above. Can you try to uppercase
your data as you read it in? E.g., you're doing this:

chr4Seq = scan(my.file,list("",""),nlines=2)

while(length(chr4Seq[[1]])>0)
{
    seqId = chr4Seq[[1]];
    testSeq = as.data.frame(t(s2c(chr4Seq[[2]])));
    # this is optional, I added this later to see if having the Class
    # in the record removes the error.
    testSeq=cbind(Class="-",testSeq);
    predictSvm1 <- (predict(modelforSVM, testSeq));
    print(predictSvm1);
    chr4Seq = scan(my.file,list("",""),nlines=2);
}

Call toupper() on the sequence in the second line of the while loop body:

testSeq = as.data.frame(t(s2c(toupper(chr4Seq[[2]]))))
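To see why case matters here, below is a minimal sketch on toy data. It
assumes that s2c() just splits a string into single characters, so it uses
base R's strsplit() as a stand-in; train_col and new_seq are made-up names
for illustration and are not taken from your script.

```r
# Hypothetical toy data: a training column with uppercase factor levels,
# and a new sequence that arrives in lowercase.
train_col <- factor(c("A", "C", "G", "T", "A"))
new_seq   <- "acgt"

# Assumed stand-in for s2c(): split a string into single characters.
chars <- strsplit(new_seq, "")[[1]]

# Without toupper(), none of the characters match the training levels.
lower_ok <- all(chars %in% levels(train_col))

# With toupper(), every character matches.
upper_chars <- strsplit(toupper(new_seq), "")[[1]]
upper_ok <- all(upper_chars %in% levels(train_col))

# Build a one-row data frame and give a column the training levels, so the
# test record uses the same factor coding as the training data.
test_row <- as.data.frame(t(upper_chars), stringsAsFactors = TRUE)
test_row$V1 <- factor(test_row$V1, levels = levels(train_col))

print(c(lower_ok = lower_ok, upper_ok = upper_ok))
```

If lower_ok is FALSE and upper_ok is TRUE on your real data as well, the
case mismatch is the likely cause of the predict() error.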

And 2 questions:
1. What is your "s2c" function doing?
2. Why are you ending your lines with semi-colons?

Hope that helps,
-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



