[R] Recursive Feature Elimination with SVM
Priyanka Purkayastha
ppurkayastha2010 at gmail.com
Wed Jan 2 08:13:36 CET 2019
This is the code I tried:
library(e1071)
library(caret)
library(ROCR)
data <- read.csv("data.csv", header = TRUE)
set.seed(998)
inTraining <- createDataPartition(data$Class, p = .70, list = FALSE)
training <- data[ inTraining,]
testing <- data[-inTraining,]
while(length(data)>0){
## Building the model ####
svm.model <- svm(Class ~ ., data = training,
                 cross = 10, metric = "ROC", type = "eps-regression",
                 kernel = "linear", na.action = na.omit, probability = TRUE)
print(svm.model)
###### auc measure #######
#prediction and ROC
svm.model$index
svm.pred <- predict(svm.model, testing, probability = TRUE)
#calculating auc
c <- as.numeric(svm.pred)
c = c - 1
pred <- prediction(c, testing$Class)
perf <- performance(pred,"tpr","fpr")
plot(perf,fpr.stop=0.1)
auc <- performance(pred, measure = "auc")
auc <- auc@y.values[[1]]
print(length(data))
print(auc)
#compute the weight vector
w = t(svm.model$coefs)%*%svm.model$SV
#compute ranking criteria
weight_matrix = w * w
#rank the features
w_transpose <- t(weight_matrix)
w2 <- as.matrix(w_transpose[order(w_transpose[,1], decreasing = FALSE),])
a <- as.matrix(w2[which(w2 == max(w2)),]) #to get the rows with minimum values
row.names(a) -> remove
training<- data[,setdiff(colnames(data),remove)]
}
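
For comparison, here is a minimal sketch of how an elimination loop of this kind could be structured so that it terminates. It is not a definitive fix of the code above. It assumes a two-class problem fitted with type = "C-classification", keeps the label out of the feature matrix, shrinks the working feature set on every pass, and drops the feature with the smallest squared weight. The synthetic data and the names x, Class, surviving, eliminated and ranking are illustrative assumptions, not part of the original data set.

```r
library(e1071)

# Synthetic stand-in for the real data (assumption): 60 samples, 5 features,
# and only feature "f1" is shifted between the two classes.
set.seed(998)
n <- 60
x <- matrix(rnorm(n * 5), nrow = n, dimnames = list(NULL, paste0("f", 1:5)))
Class <- factor(rep(c("a", "b"), each = n / 2))
x[Class == "b", "f1"] <- x[Class == "b", "f1"] + 3

surviving  <- colnames(x)     # features still in the model
eliminated <- character(0)    # features in the order they were dropped

while (length(surviving) > 0) {
  # Fit a linear SVM classifier on the surviving features only; the label
  # is passed separately, so it can never be ranked as a feature.
  fit <- svm(x[, surviving, drop = FALSE], Class,
             type = "C-classification", kernel = "linear",
             cost = 10, scale = FALSE)

  # Weight vector of the linear SVM and the squared-weight criterion
  w <- t(fit$coefs) %*% fit$SV
  criterion <- as.vector(w * w)

  # Drop the feature with the SMALLEST criterion and shrink the working set
  worst <- which.min(criterion)
  eliminated <- c(eliminated, surviving[worst])
  surviving <- surviving[-worst]
}

ranking <- rev(eliminated)    # last feature eliminated comes first
print(ranking)
```

With data built this way one would expect f1 to end up near the top of ranking, but the exact order depends on the random draw. Evaluating each subset on held-out data (for example with an AUC) could be added inside the loop once the bookkeeping works.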
On Wed, Jan 2, 2019 at 11:18 AM David Winsemius <dwinsemius at comcast.net> wrote:
>
> On 1/1/19 5:31 PM, Priyanka Purkayastha wrote:
> > Thank you, David. I tried the same: I passed x as the data matrix and y
> > as the class label, but it returned an empty "featureRankedList". I
> > get no output when I run the code.
>
>
> If you want people to spend time on this you should post a reproducible
> example. See the Posting Guide ... and learn to post in plain text.
>
>
> --
>
> David
>
> >
> > On Tue, 1 Jan 2019 at 11:42 PM, David Winsemius
> > <dwinsemius at comcast.net> wrote:
> >
> >
> > On 1/1/19 4:40 AM, Priyanka Purkayastha wrote:
> > > I have a dataset (data) with 700 rows and 7000 columns. I am trying to do
> > > recursive feature selection with an SVM model. A quick Google search
> > > turned up code for a recursive search with SVM. However, I do not
> > > understand the first part of the code: how do I pass my dataset to it?
> >
> >
> > Generally the "labels" are given to such a machine learning device as the
> > y argument, while the "features" are passed as a matrix to the x argument.
> >
> >
> > --
> >
> > David.
> >
> > >
> > > Suppose the dataset is a matrix named data. Please give me an example of
> > > recursive feature selection with SVM. Below is the code I found for
> > > recursive feature search.
> > >
> > > svmrfeFeatureRanking = function(x,y){
> > >
> > > #Checking for the variables
> > > stopifnot(!is.null(x) == TRUE, !is.null(y) == TRUE)
> > >
> > > n = ncol(x)
> > > survivingFeaturesIndexes = seq_len(n)
> > > featureRankedList = vector(length=n)
> > > rankedFeatureIndex = n
> > >
> > > while(length(survivingFeaturesIndexes)>0){
> > > #train the support vector machine
> > > svmModel = svm(x[, survivingFeaturesIndexes], y, cost = 10,
> > > cachesize=500, scale=FALSE, type="C-classification",
> > > kernel="linear" )
> > >
> > > #compute the weight vector
> > > w = t(svmModel$coefs)%*%svmModel$SV
> > >
> > > #compute ranking criteria
> > > rankingCriteria = w * w
> > >
> > > #rank the features
> > > ranking = sort(rankingCriteria, index.return = TRUE)$ix
> > >
> > > #update feature ranked list
> > > featureRankedList[rankedFeatureIndex] =
> > > survivingFeaturesIndexes[ranking[1]]
> > > rankedFeatureIndex = rankedFeatureIndex - 1
> > >
> > > #eliminate the feature with smallest ranking criterion
> > > (survivingFeaturesIndexes = survivingFeaturesIndexes[-ranking[1]])}
> > > return (featureRankedList)}
> > >
> > >
> > >
> > > I tried to adapt the idea from the code above in my own code, as shown
> > > below:
> > >
> > > library(e1071)
> > > library(caret)
> > >
> > > data<- read.csv("matrix.csv", header = TRUE)
> > >
> > > x <- data
> > > y <- as.factor(data$Class)
> > >
> > > svmrfeFeatureRanking = function(x,y){
> > >
> > > #Checking for the variables
> > > stopifnot(!is.null(x) == TRUE, !is.null(y) == TRUE)
> > >
> > > n = ncol(x)
> > > survivingFeaturesIndexes = seq_len(n)
> > > featureRankedList = vector(length=n)
> > > rankedFeatureIndex = n
> > >
> > > while(length(survivingFeaturesIndexes)>0){
> > > #train the support vector machine
> > > svmModel = svm(x[, survivingFeaturesIndexes], y, cross=10, cost = 10,
> > > type="C-classification", kernel="linear" )
> > >
> > > #compute the weight vector
> > > w = t(svmModel$coefs)%*%svmModel$SV
> > >
> > > #compute ranking criteria
> > > rankingCriteria = w * w
> > >
> > > #rank the features
> > > ranking = sort(rankingCriteria, index.return = TRUE)$ix
> > >
> > > #update feature ranked list
> > > featureRankedList[rankedFeatureIndex] =
> > > survivingFeaturesIndexes[ranking[1]]
> > > rankedFeatureIndex = rankedFeatureIndex - 1
> > >
> > > #eliminate the feature with smallest ranking criterion
> > > (survivingFeaturesIndexes = survivingFeaturesIndexes[-ranking[1]])}
> > >
> > > return (featureRankedList)}
> > >
> > > But I could not get past the "update feature ranked list" stage.
> > > Please advise.
> > >
> >
> > --
> > Regards,
> >
> > Priyanka Purkayastha, M.Tech, Ph.D.,
> > SERB National Postdoctoral Researcher
> > Genomics and Systems Biology Lab,
> > Department of Chemical Engineering,
> > Indian Institute of Technology Bombay (IITB),
> > Powai, Mumbai- 400076
> >
> >
> >
>
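
The "update feature ranked list" step in the quoted function can be traced by hand in base R, without fitting any model. The sketch below is only an illustration of the bookkeeping: the squared weights in criteria are made-up numbers and are held fixed, whereas in real SVM-RFE they would be recomputed from a refitted model on every iteration. The names criteria, surviving, ranked and slot are hypothetical.

```r
# Hypothetical squared weights for four features (made-up numbers; in real
# SVM-RFE these are recomputed from a refitted model at every iteration).
criteria <- c(f1 = 0.40, f2 = 0.05, f3 = 0.90, f4 = 0.20)

n <- length(criteria)
surviving <- seq_len(n)   # column indexes of the features still in play
ranked <- integer(n)      # filled from the last slot backwards
slot <- n

while (length(surviving) > 0) {
  # Position, within 'surviving', of the smallest criterion
  worst <- unname(which.min(criteria[surviving]))

  # Store the ORIGINAL column index of that feature in the current slot,
  # then move the slot one step towards the front
  ranked[slot] <- surviving[worst]
  slot <- slot - 1

  # Remove the feature from the working set
  surviving <- surviving[-worst]
}

print(ranked)                    # original column indexes, best first
print(names(criteria)[ranked])   # the same ranking as feature names
```

The first feature eliminated lands in the last slot, so the finished vector reads from most to least important. Note also that the quoted script only defines svmrfeFeatureRanking and never calls it, and that x <- data still contains the Class column, so x would need that column removed before the function is called.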
--
Regards,
Priyanka Purkayastha, M.Tech, Ph.D.,
SERB National Postdoctoral Researcher
Genomics and Systems Biology Lab,
Department of Chemical Engineering,
Indian Institute of Technology Bombay (IITB),
Powai, Mumbai- 400076