[R] Best and worst values for each date
arun
smartpink111 at yahoo.com
Wed Sep 25 22:24:57 CEST 2013
Hi,
Maybe you can try this:
obj_name<- load("arun.RData")
Pred1<- get(obj_name[1])
Actual1<- get(obj_name[2])
library(reshape2)
dat<-cbind(melt(Pred1,id.vars="S1"),value2=melt(Actual1,id.vars="S1")[,3]) # to reshape to long form
colnames(dat)[3:4]<- c("Predict","Actual")
dat$variable<- as.character(dat$variable) # not strictly needed
dat1<- dat[!(is.na(dat$Predict)|is.na(dat$Actual)),] # removes the NA values in columns "Predict" and "Actual"
res<- do.call(rbind,lapply(split(dat1,dat1$S1),function(x){
  x1<- x[order(x$Predict),] # sort each date's rows by prediction, ascending
  xlow<- if(sum(x1$Predict<0) <5){ # in cases where you don't have 5 negative numbers
    x1[x1$Predict<0,]
  } else {
    x1[x1$Predict<0,][1:5,] # select first five rows (the five lowest)
  }
  xhigh<- if(sum(x1$Predict>0) <5){ # not having 5 positive numbers
    x1[x1$Predict>0,]
  } else {
    tail(x1[x1$Predict>0,],5) # select last five rows (the five highest)
  }
  rbind(xhigh[rev(order(xhigh$Predict)),],xlow) # reverse the order of high values
}))
dim(res)
#[1] 480 4
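Since arun.RData is not included here, the same selection can be tried on toy data. The sketch below is only an illustration: the two small data frames, the stock names A to K, the helper name pick() and the n = 2 cutoff are all made up for the example, and the long form is built with base R instead of melt(). It relies on head() and tail() returning all rows when fewer than n are available, which stands in for the explicit if/else checks above.

```r
# Toy stand-ins for the two data frames (hypothetical values, not the attached data).
# Column S1 holds the date; every other column is one stock.
Pred1 <- data.frame(S1 = c("1/3/2006", "1/4/2006"),
                    A = c(3, -2), B = c(NA, 4), C = c(-1, 1),
                    D = c(2, -3), E = c(0, NA), K = c(-4, 5),
                    stringsAsFactors = FALSE)
Actual1 <- data.frame(S1 = c("1/3/2006", "1/4/2006"),
                      A = c(-1.9, 0.5), B = c(1.2, 2.0), C = c(NA, -0.3),
                      D = c(8.6, -1.1), E = c(1.6, 0.7), K = c(-4.7, 3.3),
                      stringsAsFactors = FALSE)

# Long form in base R: one row per date/stock pair, same row order as melt().
stocks <- setdiff(names(Pred1), "S1")
dat <- data.frame(S1 = rep(Pred1$S1, times = length(stocks)),
                  variable = rep(stocks, each = nrow(Pred1)),
                  Predict = unlist(Pred1[stocks], use.names = FALSE),
                  Actual = unlist(Actual1[stocks], use.names = FALSE),
                  stringsAsFactors = FALSE)

# Drop rows with NA in either "Predict" or "Actual".
dat1 <- dat[complete.cases(dat[, c("Predict", "Actual")]), ]

# pick() is a hypothetical helper: up to n highest positive and up to n lowest
# negative predictions for one date. Zero predictions are never selected.
pick <- function(x, n = 5) {
  x1 <- x[order(x$Predict), ]
  xlow <- head(x1[x1$Predict < 0, ], n)   # all rows if fewer than n negatives
  xhigh <- tail(x1[x1$Predict > 0, ], n)  # all rows if fewer than n positives
  rbind(xhigh[order(xhigh$Predict, decreasing = TRUE), ], xlow)
}

res <- do.call(rbind, lapply(split(dat1, dat1$S1), pick, n = 2))
res
```

With n = 2 the first date keeps only three rows (two positives, one negative), because B and C are dropped for NA values and E has a prediction of 0.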
A.K.
________________________________
From: Ira Sharenow <irasharenow100 at yahoo.com>
To: arun <smartpink111 at yahoo.com>
Sent: Wednesday, September 25, 2013 12:55 PM
Subject: Best and worst values for each date
Arun,
I hope you have been doing well.
I have a new problem.
I have two data frames, one for predictions and one for the actual returns.
Each day I act on the returns that have the five highest values and the five lowest values. I then want to compare them to the actual values. So I need to subset my two original data frames so that, for each day, the stocks and their prices that remain are the ones I want. At the end of filtering there will be one data frame for predictions and one data frame for actual values.
Now for an enhancement. NA values cannot be part of the reduced data frames but will occur in great proportion in the original data frames. Each day I need to check that the top five are positive; otherwise I need to reduce that number as needed. Similarly, I need the bottom five to be negative. At the end of 50 days each reduced data frame would have 5 * 2 * 50 = 500 rows, but this step may reduce that number.
I attached a smallish file with the two data frames. The real ones have hundreds of columns and over 1,000 rows.
Please aim for simplicity. If the solution is complex, please explain.
Do you want me to use a different email address?
Thanks.
Ira
Example. But the stocks are not set up this way.
The highlighted stocks are in the first data frames.
Date      Stock  Predict  Actual
1/3/2006  S1     3        -1.943
1/3/2006  S20    4        10.376
1/3/2006  S3     2        8.611
1/3/2006  S4     1        7.465
1/3/2006  S5     0        1.648
1/3/2006  S6     -1       5.36
1/3/2006  S7     -2       4.36
1/3/2006  S8     -3       3.574
1/3/2006  S9     -4       2.748
1/3/2006  S10    -5       1.933
1/3/2006  S11    -6       0.548
1/3/2006  S12    -7       -0.66
1/3/2006  S13    -8       -1.793
1/3/2006  S14    -9       -2.163
1/3/2006  S15    -10      -3.077
1/3/2006  S16    -11      -4.723
1/3/2006  S17    -12      -5.919
1/3/2006  S18    -13      -6.529
1/3/2006  S19    -14      -7.979
1/3/2006  S20    -15      -8.064
After making sure only positives are in for the top 5 predictions and only negatives for the bottom 5 predictions:
Date      Stock  Predict  Actual
1/3/2006  S1     3        -1.943
1/3/2006  S20    4        10.376
1/3/2006  S3     2        8.611
1/3/2006  S4     1        7.465
1/3/2006  S16    -11      -4.723
1/3/2006  S17    -12      -5.919
1/3/2006  S18    -13      -6.529
1/3/2006  S19    -14      -7.979
1/3/2006  S20    -15      -8.064
Note that the next day different stocks may be selected. Also, there cannot be any NA in either the Predict or Actual columns.