[R] stalled loop

Mon Sep 17 01:04:09 CEST 2007

If I understand correctly, you are trying to find words that are
rearrangements of one another, i.e. duplicated values among the
letter-sorted strings.  If that is the case, this is probably faster,
since "duplicated" replaces your final loop.  Most of the remaining
time is spent in the sapply call.

> a <- c("superman", "xman", "spiderman", "wolfman", "mansuper", "manspider")
>
> ## replicate the vector to test how it scales (comment out for the small example)
> a <- rep(a, 150000)
>
> system.time(
+     b <- sapply(a, function(.srt) {paste(sort(strsplit(.srt, '')[[1]]),
+                collapse="")})
+ )
   user  system elapsed
 142.31    0.12  204.49
>
> system.time(
+     print(unique(b[duplicated(b)]))  # find duplicated values
+ )
[1] "aemnprsu"  "adeimnprs" "amnx"      "aflmnow"
   user  system elapsed
   0.49    0.03    0.62
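
Note that unique(b[duplicated(b)]) returns each duplicated key once, whereas the original loop keeps every occurrence. If the loop's exact output is wanted, one possible vectorised sketch (run on the small six-word vector; the names loop_result, vec_result and dup_keys are only illustrative) is:

```r
a <- c("superman", "xman", "spiderman", "wolfman", "mansuper", "manspider")

# Sorted-letter key for each word; rearrangements share the same key.
b <- sapply(a, function(w) paste(sort(strsplit(w, "")[[1]]), collapse = ""))

# Original approach: keep b[i] whenever its key occurs more than once.
loop_result <- NA
for (i in seq_along(b)) {
  if (sum(b == b[i]) > 1) loop_result[i] <- b[i]
}
loop_result <- loop_result[!is.na(loop_result)]

# Vectorised equivalent: flag every member of a duplicated group,
# scanning from both ends so the first occurrence is included too.
in_group <- duplicated(b) | duplicated(b, fromLast = TRUE)
vec_result <- unname(b[in_group])

# One entry per duplicated key, as in the transcript above.
dup_keys <- unique(unname(b[duplicated(b)]))
```

Both forms avoid the repeated b == b[i] comparison, which is what makes the loop scale quadratically with the number of words.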

Another alternative that saves some more time is to call 'strsplit'
once on the whole 'a' vector and then pass that output to 'sapply'
(about 100 seconds of user time in total, versus 142 above):

> system.time({
+ a.split <- strsplit(a, '')
+ })
   user  system elapsed
  16.24    0.02   19.03
>
> system.time({
+ b <- sapply(a.split, function(.srt) paste(sort(.srt), collapse=''))
+ })
   user  system elapsed
  83.88    0.11  121.16
>
> system.time(
+     print(unique(b[duplicated(b)]))  # find duplicated values
+ )
[1] "aemnprsu"  "adeimnprs" "amnx"      "aflmnow"
   user  system elapsed
   0.62    0.04    0.75
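
If many words occur more than once, as they do in the replicated test vector, a further possible saving is to compute the key only for the distinct words and map the result back with match(). This is a sketch under that assumption, not something timed here; with mostly distinct words it would gain little. The helper sort_letters and the smaller replication factor are illustrative choices.

```r
a <- rep(c("superman", "xman", "spiderman", "wolfman", "mansuper", "manspider"),
         1000)

# Hypothetical helper: collapse a vector of single characters into a sorted key.
sort_letters <- function(chars) paste(sort(chars), collapse = "")

# Compute the key once per distinct word ...
u <- unique(a)
u_keys <- vapply(strsplit(u, ""), sort_letters, character(1))

# ... then map the keys back onto the full vector.
b <- u_keys[match(a, u)]

print(unique(b[duplicated(b)]))  # find duplicated values
```

vapply is used instead of sapply only so that the result is guaranteed to be a character vector; the logic is otherwise the same as above.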

So if your major slowdown was the 'for' loop, and you are looking for
duplicates in terms of contained letters, then using 'duplicated'
should improve it.  R is not necessarily the best language for doing
string manipulation, but if this timing is fine by you, then go with
it.
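
If you also want to see which original words share a set of letters, rather than only the sorted keys, one way is to split the words by key. A minimal sketch on the small vector, assuming a version of R that provides lengths() (3.2.0 or later); the names keys, groups and anagram_groups are illustrative:

```r
a <- c("superman", "xman", "spiderman", "wolfman", "mansuper", "manspider")

# Sorted-letter key for each word.
keys <- vapply(strsplit(a, ""),
               function(chars) paste(sort(chars), collapse = ""),
               character(1))

# Group the original words by key and keep groups with more than one member.
groups <- split(a, keys)
anagram_groups <- groups[lengths(groups) > 1]

print(anagram_groups)
```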

On 9/16/07, kevinchang <shukai at seas.upenn.edu> wrote:
>
> Hey everyone,
>
> The code I wrote executes correctly but is seriously slow. Is there a
> way to speed up execution without coming up with a brand new algorithm?
> Please help. Thanks a lot for your time.
>
>
> #a simplified version of the code
>
> a<-c("superman" , "xman" , "spiderman" ,"wolfman" ,"mansuper","manspider" )
> b<-sapply(a,function(.srt){paste(sort(strsplit(.srt,'')[[1]]),
> collapse="")})
> c<-NA
> for(i in 1:length(b)) {
> if(length(which(b==b[i]))>1)
> c[i]<-b[i]
> }
> c<-c[!is.na(c)]
> # But if I increase the size of "a" to about 150000 words, the loop
> # runs incredibly slowly.
>

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?