[R] R dataframe and looping help
arun
smartpink111 at yahoo.com
Mon Sep 2 23:01:36 CEST 2013
Hi,
You may try this:
dat1<- read.table(text="
CustID TripDate Store Bread Butter Milk Eggs
1 2-Jan-12 a 2 0 2 1
1 6-Jan-12 c 0 3 3 0
1 9-Jan-12 a 3 3 0 0
1 31-Mar-13 a 3 0 0 0
2 31-Aug-12 a 0 3 3 0
2 24-Sep-12 a 3 3 0 0
2 25-Sep-12 b 3 0 0 0
",sep="",header=TRUE,stringsAsFactors=FALSE)
dat2<- dat1[,-c(1:3)]
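res<- lapply(seq_len(ncol(dat2)),function(i) {
  # columns 1-3 plus the i-th item column, keeping the item's name
  x1<- cbind(dat1[,c(1:3)],dat2[,i])
  colnames(x1)[4]<- colnames(dat2)[i]
  # keep only trips where the item was bought
  x2<- x1[x1[,4]!=0,]
  within(x2, {
    # days since the customer's previous trip (NA for the first trip)
    daysbetweentrips<- unlist(tapply(as.Date(x2$TripDate,"%d-%b-%y"),list(x2$CustID),
                                     function(x) c(NA,as.numeric(diff(x)))))
    # store visited on the customer's previous and next trips
    previoustripstore<- ave(x2$Store,x2$CustID,FUN=function(x) c(NA,x[-length(x)]))
    Nexttripstore<- ave(x2$Store,x2$CustID,FUN=function(x) c(x[-1],NA))
  })
})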
res
#[[1]]
# CustID TripDate Store Bread Nexttripstore previoustripstore daysbetweentrips
#1 1 2-Jan-12 a 2 a <NA> NA
#3 1 9-Jan-12 a 3 a a 7
#4 1 31-Mar-13 a 3 <NA> a 447
#6 2 24-Sep-12 a 3 b <NA> NA
#7 2 25-Sep-12 b 3 <NA> a 1
#[[2]]
# CustID TripDate Store Butter Nexttripstore previoustripstore daysbetweentrips
#2 1 6-Jan-12 c 3 a <NA> NA
#3 1 9-Jan-12 a 3 <NA> c 3
#5 2 31-Aug-12 a 3 a <NA> NA
#6 2 24-Sep-12 a 3 <NA> a 24
#[[3]]
# CustID TripDate Store Milk Nexttripstore previoustripstore daysbetweentrips
#1 1 2-Jan-12 a 2 c <NA> NA
#2 1 6-Jan-12 c 3 <NA> a 4
#5 2 31-Aug-12 a 3 <NA> <NA> NA
#[[4]]
# CustID TripDate Store Eggs Nexttripstore previoustripstore daysbetweentrips
#1 1 2-Jan-12 a 1 <NA> <NA> NA
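
The list above is unnamed, and unlist(tapply(...)) only lines up with the rows when the data are already ordered by CustID. A possible variant, offered as a sketch rather than a definitive solution: sort first (as the question asks), use ave() for every derived column, and name each list element after its item. The helper name make_item_df and the temporary Date column are assumptions made for this example, not part of the original code.

```r
dat1 <- read.table(text = "
CustID TripDate Store Bread Butter Milk Eggs
1 2-Jan-12 a 2 0 2 1
1 6-Jan-12 c 0 3 3 0
1 9-Jan-12 a 3 3 0 0
1 31-Mar-13 a 3 0 0 0
2 31-Aug-12 a 0 3 3 0
2 24-Sep-12 a 3 3 0 0
2 25-Sep-12 b 3 0 0 0
", header = TRUE, stringsAsFactors = FALSE)

# Parse the dates once (temporary helper column, assumed name).
# Note: %b expects month abbreviations in the current locale.
dat1$Date <- as.Date(dat1$TripDate, "%d-%b-%y")

# Sort by customer, trip date and store before deriving anything
dat1 <- dat1[order(dat1$CustID, dat1$Date, dat1$Store), ]

# Every column that is not an identifier is treated as an item
items <- setdiff(names(dat1), c("CustID", "TripDate", "Store", "Date"))

# Hypothetical helper: build the data frame for a single item
make_item_df <- function(item) {
  # keep only trips where the item was bought
  x <- dat1[dat1[[item]] != 0, c("CustID", "TripDate", "Store", item, "Date")]
  # store on the customer's next / previous trip
  x$NextTripStore <- ave(x$Store, x$CustID,
                         FUN = function(s) c(s[-1], NA))
  x$PreviousTripStore <- ave(x$Store, x$CustID,
                             FUN = function(s) c(NA, s[-length(s)]))
  # days since the customer's previous trip
  x$DaysBetweenTrips <- ave(as.numeric(x$Date), x$CustID,
                            FUN = function(d) c(NA, diff(d)))
  x$Date <- NULL
  x
}

# One data frame per item, accessible by name
res <- setNames(lapply(items, make_item_df), items)
```

With this, res$Butter or res[["Milk"]] picks out a single item's data frame, which may be convenient when looping over items later.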
A.K.
Hi, I have a quick question. I have data on sales per
category per trip for each customer at different store locations, like
below (dataset1 from the Excel attachment):

CustID TripDate Store Bread Butter Milk Eggs
1 2-Jan-12 a 2 0 2 1
1 6-Jan-12 c 0 3 3 0
1 9-Jan-12 a 3 3 0 0
1 31-Mar-13 a 3 0 0 0
2 31-Aug-12 a 0 3 3 0
2 24-Sep-12 a 3 3 0 0
2 25-Sep-12 b 3 0 0 0

Here I have shown 4 items and their sales per customer per trip at each
store. However, my data contains around 100 columns with item names.
All I need to do is the following:

1. Create a separate dataframe for each item, that is, 100
dataframes, one per item. The dataframe for Butter, for
example, will contain columns 1-3 and the Butter column,
filtered to rows where Butter > 0 in sales (so rows 1, 4 and 7 will be
dropped from this dataframe). Likewise for all items. Sample output
for Butter (dataset2):

CustID TripDate Store Butter
1 6-Jan-12 c 3
1 9-Jan-12 a 3
2 31-Aug-12 a 3
2 24-Sep-12 a 3

2. In the same loop, create new derived variables within each dataframe for
each item: a lag variable for TripDate, the store name in the next trip,
the store name in the previous trip, etc., and also the number of days
between trips to each store for each customer. The dataset needs to be
sorted by CustID, TripDate, Store before creating the derived variables.
An example for the Butter dataframe with the new derived variables (dataset3):

CustID TripDate Store Butter NextTripstore previoustripstore daysbetweentrips
1 6-Jan-12 c 3 a - -
1 9-Jan-12 a 3 - c -
2 31-Aug-12 a 3 a - -
2 24-Sep-12 a 3 - a 24

The point of creating multiple item-level dataframes is that I will use them
iteratively to run some regression on these datasets, using the
same set of variables each time.
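
In dataset3, daysbetweentrips is filled only where the previous trip was to the same store, so one reading of "days between trips to each store for each customer" is that the gap should be computed within each customer and store combination. A minimal sketch of that reading, using the Butter rows from dataset2; the combined grouping key grp is an assumption made for this example:

```r
butter <- read.table(text = "
CustID TripDate Store Butter
1 6-Jan-12 c 3
1 9-Jan-12 a 3
2 31-Aug-12 a 3
2 24-Sep-12 a 3
", header = TRUE, stringsAsFactors = FALSE)

# Trip dates as day counts so that diff() returns plain numbers
d <- as.numeric(as.Date(butter$TripDate, "%d-%b-%y"))

# One group per customer/store pair (assumed key for this sketch)
grp <- paste(butter$CustID, butter$Store)

# Days since the previous trip to the same store by the same customer
butter$daysbetweentrips <- ave(d, grp, FUN = function(x) c(NA, diff(x)))
```

Here only the last row gets a value (24), since it is the only trip preceded by a trip to the same store by the same customer. Grouping by CustID alone would instead give the gaps between successive trips regardless of store.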