[R] splitting strings efficiently

drflxms drflxms at googlemail.com
Mon Jan 9 00:56:39 CET 2012


Hi Andrew,

I am aware that this is an R mailing list, but for such tasks (I deal a
lot with huge genomic datasets) I tend to use awk and sed to preprocess
the data whenever I run into performance problems. Otherwise, for
handling strings in R I recommend the stringr package, but I don't know
about its performance...
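
If you would rather stay in base R, one thing that might help is dropping
the loop altogether: strsplit() is already vectorised, so it can take the
whole column in one call. Below is only a minimal sketch. The three sample
addresses are made up for illustration, it assumes every entry has exactly
four dot-separated parts, and I have not timed it on 4.8 million rows.

```r
# Hypothetical stand-in for the real data frame; only the column name
# ComputerName comes from the original post, the addresses are invented.
data <- data.frame(
  ComputerName = c("192.168.1.10", "10.0.0.1", "172.16.254.3"),
  stringsAsFactors = FALSE
)

# One vectorised call splits every address at once.
# fixed = TRUE treats "." as a literal dot rather than a regular expression.
parts <- strsplit(data$ComputerName, ".", fixed = TRUE)

# Assumption: every address has exactly four components. Stop early if not.
stopifnot(all(lengths(parts) == 4L))

# Flatten the list and reshape it into an n x 4 character matrix,
# one row per address (byrow = TRUE keeps each address on its own row).
octets <- matrix(unlist(parts), ncol = 4, byrow = TRUE)

# Assign whole columns in one step instead of element by element.
data$IPA <- octets[, 1]
data$IPB <- octets[, 2]
data$IPC <- octets[, 3]
data$IPD <- octets[, 4]
```

The columns stay character here, as in the original loop; wrapping them in
as.integer() might be more convenient if the site ranges are compared
numerically.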

Felix

> Folks,
> 
> I have a data frame with 4861469 rows that contains an ip address 
> xxx.xxx.xxx.xxx as one of the columns. I want to assign a site to each 
> row based on IP ranges. To do this I have a function to split the ip 
> address as character into class A,B,C and D components. It works but is 
> horribly inefficient in terms of speed. I can't quite see how one of the 
> l/s/m/t/apply functions could be brought to bear on the problem. Does 
> anyone have any thoughts?
> 
> for(i in 1:4861469)
>    {
>    lst <-unlist(strsplit(data$ComputerName[i], "\\."))
>    data$IPA[i] <-lst[[1]]
>    data$IPB[i] <-lst[[2]]
>    data$IPC[i] <-lst[[3]]
>    data$IPD[i] <-lst[[4]]
>    rm(lst)
>    }
> 
> Andrew
> 
> Andrew Roberts
> Children's Orthopaedic Surgeon
> RJAH, Oswestry, UK