[R] split strings
Gabor Grothendieck
ggrothendieck at gmail.com
Tue May 26 22:40:21 CEST 2009
Although speed is really immaterial here, this is likely
to be faster than all the solutions shown so far:
sub(".tif", "", basename(metr_list), fixed = TRUE)
It does not handle file names with .tif in the middle
of them correctly, since it deletes the first occurrence rather
than the last, but such a situation is highly unlikely.
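
A minimal sketch of that caveat, assuming a made-up second file name
(a.tif.b.tif) that is not in the original list; the anchored pattern
is only one possible way to strip the last occurrence instead:

```r
# Hypothetical inputs: the second name contains ".tif" in the middle.
paths <- c("F:/Naval_Live_Oaks/2005/data//BE.tif",
           "F:/Naval_Live_Oaks/2005/data//a.tif.b.tif")

# Fixed-string match: deletes the first literal ".tif" it finds.
first_hit <- sub(".tif", "", basename(paths), fixed = TRUE)

# Anchored regular expression: deletes ".tif" only at the end of the name.
last_hit <- sub("[.]tif$", "", basename(paths))

print(first_hit)  # "BE" "a.b.tif"
print(last_hit)   # "BE" "a.tif.b"
```

For ordinary names such as BE.tif the two calls give the same result.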
On Tue, May 26, 2009 at 4:24 PM, Wacek Kusnierczyk
<Waclaw.Marcin.Kusnierczyk at idi.ntnu.no> wrote:
> Monica Pisica wrote:
>> Hi everybody,
>>
>> Thank you for the suggestions and especially the explanation Waclaw provided for his code. Maybe one day i will be able to wrap my head around this.
>>
>> Thanks again,
>>
>
> you're welcome. note that if efficiency is an issue, you'd better have
> perl=TRUE there:
>
> output = sub('.*//(.*)[.]tif$', '\\1', input, perl=TRUE)
>
> with perl=TRUE, the one-pass solution is somewhat faster than the
> two-pass solution of gabor's -- which, however, is probably easier to
> understand; with perl=FALSE (the default), the performance drops:
>
>    strings = sprintf(
>        'f:/foo/bar//%s.tif',
>        replicate(1000, paste(sample(letters, 10), collapse='')))
>    library(rbenchmark)
>    benchmark(columns=c('test', 'elapsed'), replications=1000, order=NULL,
>        'one-pass, perl'=sub('.*//(.*)[.]tif$', '\\1', strings, perl=TRUE),
>        'two-pass, perl'=sub('.tif$', '', basename(strings), perl=TRUE),
>        'one-pass, no perl'=sub('.*//(.*)[.]tif$', '\\1', strings,
>            perl=FALSE),
>        'two-pass, no perl'=sub('.tif$', '', basename(strings), perl=FALSE))
>    # 1    one-pass, perl   3.391
>    # 2    two-pass, perl   4.944
>    # 3 one-pass, no perl  18.836
>    # 4 two-pass, no perl   5.191
>
> vQ
>
>
>>
>> Monica
>>
>> ----------------------------------------
>>
>>> Date: Tue, 26 May 2009 15:46:21 +0200
>>> From: Waclaw.Marcin.Kusnierczyk at idi.ntnu.no
>>> To: pisicandru at hotmail.com
>>> CC: r-help at r-project.org
>>> Subject: Re: [R] split strings
>>>
>>> Monica Pisica wrote:
>>>
>>>> Hi everybody,
>>>>
>>>> I have a vector of characters and i would like to extract certain parts. My vector is named metr_list:
>>>>
>>>> [1] "F:/Naval_Live_Oaks/2005/data//BE.tif"
>>>> [2] "F:/Naval_Live_Oaks/2005/data//CH.tif"
>>>> [3] "F:/Naval_Live_Oaks/2005/data//CRR.tif"
>>>> [4] "F:/Naval_Live_Oaks/2005/data//HOME.tif"
>>>>
>>>> And i would like to extract BE, CH, CRR, and HOME in a different vector named "names.id"
>>>>
>>> one way that seems reasonable is to use sub:
>>>
>>> output = sub('.*//(.*)[.]tif$', '\\1', input)
>>>
>>> which says 'from each string remember the substring between the
>>> rightmost two slashes and a .tif extension, exclusive, and replace the
>>> whole thing with the captured part'. if the pattern does not match, you
>>> get the original input:
>>>
>>> sub('.*//(.*)[.]tif$', '\\1', 'f:/foo/bar//buz.tif')
>>> # buz
>>>
>>> vQ
>>>