[R] split strings
Wacek Kusnierczyk
Waclaw.Marcin.Kusnierczyk at idi.ntnu.no
Wed May 27 10:07:42 CEST 2009
Allan Engelhardt wrote:
> Immaterial, yes, but it is always good to test :) and your solution
> *is* faster and it is even faster if you can assume byte strings:
:)
indeed; though if the speed is immaterial (and in this case it
supposedly was), it's probably not worth the risk that fixed=TRUE removes
a '.tif' from the middle of the name rather than from the end (a fixed
pattern cannot be anchored, and sub replaces only the first match),
however unlikely this might be (cf murphy's laws).
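a minimal sketch of that risk, assuming a made-up file name that happens to contain '.tif' twice (the name and the variable names are hypothetical, chosen only for illustration):

```r
# hypothetical file name with '.tif' both in the middle and at the end
name = 'my.tif.copy.tif'

# fixed=TRUE treats '.tif' as a literal string and removes its first occurrence
fixed_result = sub('.tif', '', name, fixed=TRUE)

# an anchored regex removes only the trailing extension
anchored_result = sub('[.]tif$', '', name)

print(fixed_result)     # "my.copy.tif"
print(anchored_result)  # "my.tif.copy"
```

the fixed variant silently returns a name that still carries the extension, which is the failure mode to weigh against the speed gain.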
but if you can assume that each string ends with a '.tif' (or any other
\..{3} substring), then substr is marginally faster than sub, even as a
three-pass approach, while avoiding the risk of removing '.tif' from the
middle:
strings = sprintf('f:/foo/bar//%s.tif',
    replicate(1000, paste(sample(letters, 10), collapse='')))
library(rbenchmark)
benchmark(columns=c('test', 'elapsed'), replications=1000, order=NULL,
    substr={
        basenames = basename(strings)
        substr(basenames, 1, nchar(basenames)-4) },
    sub=sub('.tif', '', basename(strings), fixed=TRUE, useBytes=TRUE))
#     test elapsed
# 1 substr   3.176
# 2    sub   3.296
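the substr variant is only correct under the stated assumption that every name ends with a dot plus three characters. a small sketch of how that assumption could be checked before stripping; strip_tif is a hypothetical helper (not part of base R or rbenchmark) and the paths are invented:

```r
# strip_tif is a hypothetical helper, shown only to make the assumption explicit
strip_tif = function(paths) {
    basenames = basename(paths)
    # the substr shortcut is valid only if every name ends with '.tif'
    stopifnot(all(grepl('[.]tif$', basenames)))
    # drop the last four characters, i.e. the dot and the three-letter extension
    substr(basenames, 1, nchar(basenames) - 4)
}

paths = c('f:/foo/bar//abc.tif', 'f:/foo/bar//my.tif.copy.tif')
stripped = strip_tif(paths)
print(stripped)  # "abc" and "my.tif.copy"
```

the extra grepl pass costs time, so it would eat into the marginal speed advantage; it is meant as a safety check, not as part of the benchmarked code.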
vQ
>
> > strings = sprintf('f:/foo/bar//%s.tif', replicate(1000,
> paste(sample(letters, 10), collapse='')))
> > library(rbenchmark)
> > benchmark(columns=c('test', 'elapsed'), replications=1000, order=NULL,
> 'one-pass, perl'=sub('.*//(.*)[.]tif$', '\\1', strings, perl=TRUE),
> 'two-pass, perl'=sub('.tif$', '', basename(strings), perl=TRUE),
> 'one-pass, no perl'=sub('.*//(.*)[.]tif$', '\\1', strings, perl=FALSE),
> 'two-pass, no perl'=sub('.tif$', '', basename(strings), perl=FALSE),
> 'fixed'=sub(".tif", "", basename(strings), fixed=TRUE),
> 'fixed, bytes'=sub(".tif", "", basename(strings), fixed=TRUE,
> useBytes=TRUE))
>
>                test elapsed
> 1    one-pass, perl   2.946
> 2    two-pass, perl   3.858
> 3 one-pass, no perl  15.884
> 4 two-pass, no perl   3.788
> 5             fixed   2.264
> 6      fixed, bytes   1.813
>