[R] stalled loop
jim holtman
jholtman at gmail.com
Mon Sep 17 01:04:09 CEST 2007
If I understand correctly, you are trying to find duplicated values among
rearrangements of words (anagrams). If that is the case, this is probably
faster, since your final loop is replaced by a single call to "duplicated".
Most of the time is spent in the sapply function.
> a <- c("superman", "xman", "spiderman", "wolfman", "mansuper", "manspider")
>
> ## the line below replicates 'a' to test how the approach scales
> a <- rep(a, 150000)
>
> system.time(
+ b <- sapply(a, function(.srt) {paste(sort(strsplit(.srt, '')[[1]]),
+ collapse="")})
+ )
user system elapsed
142.31 0.12 204.49
>
> system.time(
+ print(unique(b[duplicated(b)])) # find duplicated values
+ )
[1] "aemnprsu" "adeimnprs" "amnx" "aflmnow"
user system elapsed
0.49 0.03 0.62
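A minimal sketch of the same idea on the six original words, without the rep() replication. The helper name anagram_key() is hypothetical and not part of the original post. Note that on the unreplicated data only superman/mansuper and spiderman/manspider share a key; "amnx" and "aflmnow" appear in the output above only because rep() copies every word.

```r
# Hypothetical helper: anagram_key() is not from the original post.
# It maps each word to its letters in sorted order, so anagrams share a key.
anagram_key <- function(words) {
  sapply(words, function(w) paste(sort(strsplit(w, "")[[1]]), collapse = ""),
         USE.NAMES = FALSE)
}

a <- c("superman", "xman", "spiderman", "wolfman", "mansuper", "manspider")
b <- anagram_key(a)

# duplicated() flags every occurrence after the first, so one vectorized
# pass replaces the for loop; unique() keeps one copy of each repeated key.
dups <- unique(b[duplicated(b)])
# dups is c("aemnprsu", "adeimnprs")
```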
Another alternative that saves a little more time is to apply 'strsplit'
to the whole 'a' vector once and then use that output as
input to 'sapply':
> system.time({
+ a.split <- strsplit(a, '')
+ })
user system elapsed
16.24 0.02 19.03
>
> system.time({
+ b <- sapply(a.split, function(.srt) paste(sort(.srt), collapse=''))
+ })
user system elapsed
83.88 0.11 121.16
>
> system.time(
+ print(unique(b[duplicated(b)])) # find duplicated values
+ )
[1] "aemnprsu" "adeimnprs" "amnx" "aflmnow"
user system elapsed
0.62 0.04 0.75
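The split-once variant can also be written with vapply() instead of sapply(); this is a swapped-in alternative, not what the timings above used, and no speed difference is claimed here. Declaring FUN.VALUE = character(1) makes R check that every word yields exactly one string. The sketch also confirms it gives the same keys as the one-step version.

```r
a <- c("superman", "xman", "spiderman", "wolfman", "mansuper", "manspider")

# Split every word into single characters in one vectorized call.
a.split <- strsplit(a, "")

# vapply() with FUN.VALUE = character(1) guarantees a plain character vector.
b <- vapply(a.split, function(chars) paste(sort(chars), collapse = ""),
            FUN.VALUE = character(1))

# The one-step sapply() version names its result by word, so drop the names
# before comparing.
b.onestep <- sapply(a, function(.srt) paste(sort(strsplit(.srt, "")[[1]]), collapse = ""))
stopifnot(identical(b, unname(b.onestep)))
```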
So if your major slowdown was the 'for' loop, and you are looking for
duplicates in terms of contained letters, then using 'duplicated'
should improve it. R is not necessarily the best language for
string manipulation, but if this timing is fine for you, then go with
it.
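One difference worth noting: unique(b[duplicated(b)]) returns each repeated key once, while the original loop keeps every element whose key occurs more than once. If that fuller result is what is wanted, a sketch along these lines reproduces the loop output without the loop, assuming the small example vector. The variable name 'cc' stands in for the original 'c' so as not to mask the c() function; the fromLast argument of duplicated() is available in current R versions, and the %in% form is an equivalent alternative.

```r
a <- c("superman", "xman", "spiderman", "wolfman", "mansuper", "manspider")
b <- vapply(strsplit(a, ""), function(chars) paste(sort(chars), collapse = ""),
            character(1))

# The original loop: each which(b == b[i]) scans all of b, so the work grows
# quadratically with length(b).
cc <- NA
for (i in seq_along(b)) {
  if (length(which(b == b[i])) > 1) cc[i] <- b[i]
}
cc <- cc[!is.na(cc)]

# Vectorized equivalents: keep every element whose key occurs more than once.
via_in <- b[b %in% b[duplicated(b)]]
via_both_ends <- b[duplicated(b) | duplicated(b, fromLast = TRUE)]

stopifnot(identical(cc, via_in), identical(cc, via_both_ends))
```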
On 9/16/07, kevinchang <shukai at seas.upenn.edu> wrote:
>
> Hey everyone,
>
> The code I wrote executes correctly but runs very slowly. Is there a
> way to speed up execution without coming up with a brand new algorithm?
> Please help. Thanks a lot for your time.
>
>
> #a simplified version of the code
>
> a<-c("superman" , "xman" , "spiderman" ,"wolfman" ,"mansuper","manspider" )
> b<-sapply(a,function(.srt){paste(sort(strsplit(.srt,'')[[1]]),
> collapse="")})
> c<-NA
> for(i in 1:length(b)) {
> if(length(which(b==b[i]))>1)
> c[i]<-b[i]
> }
> c<-c[!is.na(c)]
> # But if I get the volume of "a" up to about 150000 words, the loop will
> # work incredibly slowly.
>
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem you are trying to solve?
More information about the R-help
mailing list