[R] Help with SVM package Kernlab
Steve Lianoglou
mailinglist.honeypot at gmail.com
Fri Dec 25 07:27:08 CET 2009
Hi,
On Fri, Dec 25, 2009 at 12:49 AM, Vishal Thapar <vishalthapar at gmail.com> wrote:
> Hi Steve,
>
> Thank you so much for the reply. The response to your queries are:
> What do these commands return over your data?
>
> 1. is(train500)
> -->"data.frame" "list" "oldClass" "mpinput" "vector"
> 2. is(train500$class)
> --> "NULL" "OptionalFunction" "output"
> 3. is(train500[1,5])
> --> "factor" "integer" "oldClass" "output" "numeric" "vector"
> 4. is(testSeq)
> --> "data.frame" "list" "oldClass" "mpinput" "vector"
> 5. is(testSeq[1,5])
> -->"factor" "integer" "oldClass" "output" "numeric" "vector"
> 6. is(testSeq$class)
> --> "NULL" "OptionalFunction" "output"
>
>
>
>> How similar are we talking -- something is (obviously) off because
>> using the promotergene dataset is quite straightforward:
>>
>> library(kernlab)
>> data(promotergene)
>> tr <- promotergene[1:90,]
>> ts <- promotergene[91:106,]
>> m <- ksvm(Class~., data=promotergene, kernel="rbfdot", kpar =
>> "automatic", C = 60, cross = 3, prob.model = TRUE)
>> p <- predict(m, ts)
>>
> Right. here is the first line from my training set:
> Class V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19
> V20 V21 V22 V23 V24 V25 V26 V27 V28
> 1 + T A A A C T T A T A A A T A T A A A A
> C T T T T T A A T
> V487 V488 V489 V490 V491 V492 V493 V494 V495 V496 V497 V498 V499 V500
> 1 G A T T T C A T T T T G T T
>
> Here is the first record for the promoter gene set:
>
> Class V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20
> V21 V22 V23 V24 V25 V26 V27 V28 V29 V30 V31 V32 V33 V34 V35 V36 V37 V38
> 1 + g c c t t c t c c a a a a c g t g t
> t t t t t g t t g t t a a t t c g g t
> V39 V40 V41 V42 V43 V44 V45 V46 V47 V48 V49 V50 V51 V52 V53 V54 V55 V56
> V57 V58
> 1 g t a g a c t t g t a a a c c t a a
> a t
I'm guessing the factor levels aren't comparable. See the lowercase vs.
uppercase letters in the two records above? Can you try to uppercase
your data as you read it in? E.g., you're doing this:
  chr4Seq = scan(my.file,list("",""),nlines=2)
  while(length(chr4Seq[[1]])>0)
  {
    seqId = chr4Seq[[1]];
    testSeq = as.data.frame(t(s2c(chr4Seq[[2]])));
    # this is optional, I added this later to see if having the Class
    # in the record removes the error.
    testSeq=cbind(Class="-",testSeq);
    predictSvm1 <- (predict(modelforSVM, testSeq));
    print(predictSvm1);
    chr4Seq = scan(my.file,list("",""),nlines=2);
  }
Call toupper() on the sequence in the second line of the while loop's body:
  testSeq = as.data.frame(t(s2c(toupper(chr4Seq[[2]]))))
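To see why the case matters, here is a minimal base-R sketch on made-up
data. The names train_col and new_seq are hypothetical, and strsplit()
stands in for s2c, on the assumption that s2c splits a string into
single characters. It only illustrates the level mismatch; it does not
call ksvm or predict:

```r
# Hypothetical toy data: one uppercase training column and one
# lowercase sequence read from a file.
train_col <- factor(c("A", "C", "G", "T", "A"))
new_seq <- "acgt"

# Split into single characters, with and without uppercasing.
# strsplit() is a stand-in for s2c (assumed behaviour).
chars_raw <- strsplit(new_seq, "")[[1]]
chars_up <- strsplit(toupper(new_seq), "")[[1]]

# Check which values correspond to a level seen in training.
matches_raw <- chars_raw %in% levels(train_col)
matches_up <- chars_up %in% levels(train_col)

# Rebuild the test column with exactly the training levels, so that
# both factors share the same level set.
test_col <- factor(chars_up, levels = levels(train_col))

print(matches_raw)
print(matches_up)
print(levels(test_col))
```

If every entry of matches_raw is FALSE while matches_up is all TRUE,
the case difference alone would explain levels that don't line up
between your training and test data.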
And 2 questions:
1. What is your "s2c" function doing?
2. Why are you ending your lines with semi-colons?
Hope that helps,
-steve
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact