[BioC] Re: KNN, SVM, and randomForest - How to predict testing set without known categories (affy data)

Wed Jul 28 17:04:21 CEST 2004

I do not know much about exprSet (please correct me if I am wrong), but I
think of and treat an exprSet as a matrix. Indeed, in my previous message I
was writing in the context of a matrix.

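library(affy)                  # provides rma() and the affybatch.example data
data(affybatch.example)
a <- rma(affybatch.example)    # RMA-processed expression set
m <- exprs(a)                  # matrix: probe sets (rows) x samples (columns)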

Then I work with 'm', which may or may not be what you want.

If you want to coerce a matrix into an exprSet, the examples in
help("exprSet") might be helpful.
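A minimal sketch of the point above, assuming the class package (for knn()) and a simulated matrix standing in for exprs(a): the classifier itself only needs a samples x genes matrix plus class labels for the training samples, so unlabelled samples never have to go into an exprSet. The matrix, sample names and the "tumour"/"normal" labels are hypothetical.

```r
library(class)  # provides knn(); ships with R as a recommended package

set.seed(1)
# Hypothetical stand-in for m <- exprs(a): 50 genes (rows) x 12 samples (columns).
# Samples 1-8 have known classes; samples 9-12 are the ones we want to predict.
m <- matrix(rnorm(50 * 12), nrow = 50,
            dimnames = list(paste0("gene", 1:50), paste0("s", 1:12)))
# Give the hypothetical "tumour"-like samples (1-4, 9-10) higher values in 10 genes.
m[1:10, c(1:4, 9:10)] <- m[1:10, c(1:4, 9:10)] + 3

known   <- 1:8
unknown <- 9:12
labels  <- factor(rep(c("tumour", "normal"), each = 4))  # labels for samples 1-8 only

# Classifiers in R expect samples in rows, so transpose the genes x samples matrix.
x <- t(m)

# Only the training samples need labels; the unknown samples are passed as plain rows.
pred <- knn(train = x[known, ], test = x[unknown, ], cl = labels, k = 3)
names(pred) <- rownames(x)[unknown]
pred
```

The labels vector has one entry per training sample; nothing about the test samples' classes is supplied anywhere.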

Regards, Adai.

On Wed, 2004-07-28 at 14:09, Liu, Xin wrote:
> Thanks Tom, Sean, Xavier for the reply, and especially Adai!
> However I still have a problem. To put the microarray data into these supervised clustering, the expreSet need to be built. To build expreSet, you need to give the class of every sample. So when I predict samples with unknown classes, how to put them into the expreSet? Thank you!
> 
> Xin
> 
> 
> 
> -----Original Message-----
> From: Adaikalavan Ramasamy [mailto:ramasamy at cancer.org.uk]
> Sent: 28 July 2004 13:00
> To: Liu, Xin
> Cc: Tom R. Fahland; BioConductor mailing list
> Subject: Re: [BioC] KNN, SVM, and randomForest - How to predict
> test without known categories
> 
> 
> If algorithm 1 predicts "Yes", "Yes", "No", "No" for 4 samples and
> algorithm 2 predicts "Yes", "No", "Yes", "No", how do you know which one
> is the better algorithm? You use a test set with known classes to decide.
> You can get one by splitting your learning set (samples with known
> classes) into a training set and a test set. Look up "cross-validation".
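A sketch of the train/test split described above, on simulated data. The learning set, the "yes"/"no" labels, the one-third hold-out and k = 3 are illustrative assumptions. Because the held-out samples have known classes, their predictions can be scored.

```r
library(class)  # knn()

set.seed(2)
# Hypothetical learning set: 30 samples (rows) x 20 genes, all with known classes.
n <- 30
y <- factor(rep(c("no", "yes"), length.out = n))
x <- matrix(rnorm(n * 20), nrow = n)
x[y == "yes", 1:5] <- x[y == "yes", 1:5] + 2   # class signal in the first 5 genes

# Hold out one third of the labelled samples as a test set.
test_idx  <- sample(n, size = n %/% 3)
train_idx <- setdiff(seq_len(n), test_idx)

pred <- knn(x[train_idx, ], x[test_idx, ], cl = y[train_idx], k = 3)

# The held-out samples have known classes, so the predictions can be scored.
accuracy <- mean(pred == y[test_idx])
accuracy
```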
> 
> Some examples of built-in cross-validation:
> * knn.cv() in library(class) performs leave-one-out cross-validation of knn()
> * svm() in library(e1071) has an argument named 'cross' for k-fold
> cross-validation
> In practice, I prefer to write my own wrapper for cross-validation to
> ensure that the sampling method is the same across all algorithms.
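A sketch of both approaches on simulated data: knn.cv() for leave-one-out, and a home-made wrapper in the spirit described above. make_folds() and cv_accuracy() are hypothetical names invented here; the wrapper draws the fold assignment once so every candidate algorithm is scored on the same splits. The two "algorithms" are simply knn() with different k, standing in for knn/svm/randomForest.

```r
library(class)  # knn(), knn.cv()

set.seed(3)
n <- 30
y <- factor(rep(c("no", "yes"), length.out = n))
x <- matrix(rnorm(n * 20), nrow = n)
x[y == "yes", 1:5] <- x[y == "yes", 1:5] + 2

# Built-in leave-one-out cross-validation for knn().
loo_acc <- mean(knn.cv(x, y, k = 3) == y)

# Hypothetical helper: assign each sample to one of nfolds folds, at random.
make_folds <- function(n, nfolds) sample(rep(seq_len(nfolds), length.out = n))

# Hypothetical wrapper: fit_predict(xtrain, ytrain, xtest) must return predicted classes.
cv_accuracy <- function(fit_predict, x, y, folds) {
  correct <- logical(length(y))
  for (f in unique(folds)) {
    test <- folds == f
    pred <- fit_predict(x[!test, , drop = FALSE], y[!test], x[test, , drop = FALSE])
    correct[test] <- as.character(pred) == as.character(y[test])
  }
  mean(correct)
}

# Folds are created once and reused, so every algorithm sees identical splits.
folds <- make_folds(n, nfolds = 5)

algorithms <- list(
  knn1 = function(xtr, ytr, xte) knn(xtr, xte, cl = ytr, k = 1),
  knn5 = function(xtr, ytr, xte) knn(xtr, xte, cl = ytr, k = 5)
)
cv_results <- sapply(algorithms, cv_accuracy, x = x, y = y, folds = folds)
cv_results
loo_acc
```

Any other classifier can be plugged in by adding a function with the same (xtr, ytr, xte) signature to the list.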
> 
> Once you have determined the best algorithm and features, you then use
> predict() to predict samples with unknown classes.
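A sketch of that final step, assuming the chosen method is linear discriminant analysis (lda() from the MASS package, which has a predict() method) on five already-selected genes. The data are simulated and lda is only a stand-in for whichever method wins. Note that knn() in library(class) builds no model object, so with knn the unknown samples go straight into its test argument instead of through predict().

```r
library(MASS)  # lda() and its predict() method; ships with R as a recommended package

set.seed(4)
# Hypothetical data: 30 labelled samples and 5 unlabelled ones, 5 selected genes each.
n <- 30
y <- factor(rep(c("no", "yes"), length.out = n))
x_known <- matrix(rnorm(n * 5), nrow = n,
                  dimnames = list(NULL, paste0("gene", 1:5)))
x_known[y == "yes", ] <- x_known[y == "yes", ] + 2

x_unknown <- matrix(rnorm(5 * 5), nrow = 5,
                    dimnames = list(paste0("new", 1:5), paste0("gene", 1:5)))

# Refit the chosen method on ALL labelled samples...
fit <- lda(x_known, grouping = y)

# ...then predict() the samples whose classes are unknown; no labels are needed for them.
pred <- predict(fit, newdata = x_unknown)$class
pred
```

The labelled samples are used only to fit; the unknown samples appear only in newdata.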
> 
> Regards, Adai.
> 
> 
> 
> On Wed, 2004-07-28 at 09:18, Liu, Xin wrote:
> > In R, before using KNN, SVM and randomForest, an exprSet needs to be built, which requires both the training set and the test set to have known categories. However, by definition, in supervised learning you always train (with known
> > categories) and then predict the test set WITHOUT known categories. I wonder how to implement this. Thank you!
> > 
> > Xin
> > 
> > 
> > 
> > 
> > 
> > -----Original Message-----
> > From: Tom R. Fahland [mailto:tfahland at genomatica.com]
> > Sent: 27 July 2004 18:48
> > To: Liu, Xin; bioconductor at stat.math.ethz.ch
> > Subject: RE: [BioC] KNN, SVM, and randomForest - How to predict samples
> > without category
> > 
> > 
> > By definition, in supervised learning you always train (with known
> > categories), then run your new, unlabelled data through the trained
> > model for prediction. Both cross-validation (CV) and train/test
> > partitions are good for choosing parameters and optimizing the
> > algorithms. I have just completed a study predicting dose exposure,
> > with good results using different algorithms.
> > Tom
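A sketch of using cross-validation to choose a parameter, here the number of neighbours k in knn(), scored by leave-one-out knn.cv() on simulated data. The "low"/"high" dose labels and the grid of k values are assumptions for illustration.

```r
library(class)  # knn.cv()

set.seed(5)
# Hypothetical data: 40 samples x 10 genes with a "low"/"high" dose label.
n <- 40
y <- factor(rep(c("low", "high"), length.out = n))
x <- matrix(rnorm(n * 10), nrow = n)
x[y == "high", 1:3] <- x[y == "high", 1:3] + 1.5

# Score each candidate k by leave-one-out cross-validated accuracy.
ks <- c(1, 3, 5, 7, 9)
loo_acc <- sapply(ks, function(k) mean(knn.cv(x, y, k = k) == y))
names(loo_acc) <- paste0("k=", ks)

best_k <- ks[which.max(loo_acc)]
loo_acc
best_k
```

The best cross-validated accuracy is an optimistic performance estimate, because the same numbers were used to pick k; an untouched test set or nested cross-validation gives a fairer figure.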
> > 
> > -----Original Message-----
> > From: bioconductor-bounces at stat.math.ethz.ch
> > [mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Liu, Xin
> > Sent: Tuesday, July 27, 2004 07:39
> > To: bioconductor at stat.math.ethz.ch
> > Subject: [BioC] KNN, SVM, and randomForest - How to predict samples
> > without category
> > 
> > 
> > Dear all,
> > 
> > Supervised classifiers (KNN, SVM and randomForest) use a training
> > sample set and a test sample set to do prediction. To create the
> > exprSet, the category is needed for each sample. However, sometimes we
> > need to predict samples whose category is unknown. Does anybody have a
> > clue how to do this? Thank you very much!
> > 
> > Best regards,
> > Xin LIU
> > 
> > 
> > 
> > 
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at stat.math.ethz.ch
> > https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
> > 
> 
> 
> 
> 
> 
> This e-mail is from ArraGen Ltd