[R] what is the effective method to apply the below logic for ~1.2 million records in R
David Winsemius
dwinsemius at comcast.net
Sun Sep 20 04:25:12 CEST 2015
On Sep 19, 2015, at 2:09 PM, Ravi Teja wrote:
> Hi,
>
> I am trying to apply the below logic to generate flag_1 column on a data
> set consisting of ~1.2 million records in R.
>
> Code :
>
> for(i in 1: nrows)
> {
>   if(A$customer[i]==A$customer[i+1])
>   {
>
>     if(is.na(A$Time_Diff[i]))
>       A$flag_1[i] <- 1
>     else if (A$Time_Diff[i] > 12)
>       A$flag_1[i] <- 1
>     else
>       A$flag_1[i] <- A$flag_1[i-1]+1
>
>   }
>
>   else
>   {
>
>     if(is.na(A$Time_Diff[i]))
>       A$flag_1[i] <- 1
>     else if (A$Time_Diff[i] > 12)
>       A$flag_1[i] <- 1
>     else
>       A$flag_1[i] <- A$flag_1[i-1]+1
>
>   }
> }
The inner logic of the consequent and the alternative appears identical, so the test on customer has no effect as written. Vectorized approaches would surely be faster. You should post some code that matches the data: in R, customer is not the same as Customer, and Time_diff is not Time_Diff. My patience for this code review has expired.
Post the output from the following, and do include the code that creates `nrows`:
dput(head(A, 20))
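A vectorized version along the lines suggested above could look like the following sketch. It is not taken from the original post. It assumes the column names shown in the expected output (Customer, Time_diff), that the rows are already sorted by customer, and that the intended rule is: restart flag_1 at 1 whenever Time_diff is NA or greater than 12, or when the customer changes, and otherwise add 1 to the previous row's value. The customer-change reset is an assumption about intent, since the posted loop does the same thing in both branches.

```r
# Example data copied from the expected result in the question
A <- data.frame(
  Customer  = c(1, 1, 1, 1, 1, 1, 2, 2, 2),
  Time_diff = c(NA, 10, 8, 15, 9, 10, NA, 2, 5)
)

# TRUE on the first row and wherever the customer differs from the previous row
# (assumed intent; the posted loop never actually uses this comparison)
new_customer <- c(TRUE, A$Customer[-1] != A$Customer[-nrow(A)])

# TRUE wherever the counter should restart at 1.
# is.na(x) | x > 12 never yields NA, because TRUE | NA is TRUE.
reset <- new_customer | is.na(A$Time_diff) | A$Time_diff > 12

# cumsum(reset) labels each run of rows between resets;
# rle() gives the run lengths and sequence() counts 1..n within each run
A$flag_1 <- sequence(rle(cumsum(reset))$lengths)

print(A)
```

Nothing here loops over rows in R code, so on roughly 1.2 million rows it should take seconds rather than hours, although that has not been measured on the poster's data.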
>
> Resultant dataset should look like
>
> Customer  Time_diff  flag_1
>        1         NA       1
>        1         10       2
>        1          8       3
>        1         15       1
>        1          9       2
>        1         10       3
>        2         NA       1
>        2          2       2
>        2          5       3
>
> The above logic will take approximately 60 hours to generate the flag_1
> column on a dataset consisting of ~1.2 million records. Is there an
> efficient way to implement this logic in R?
>
> Appreciate your help.
>
> Thanks,
> Ravi
>
> [[alternative HTML version deleted]]
And R-help is a plain-text-only mailing list.
David Winsemius
Alameda, CA, USA