[R] Split Strings

Jim Lemon drjimlemon at gmail.com
Mon Jan 18 09:19:24 CET 2016


Hi Miluji,
While the other answers are correct in general, I noticed that your request
was for the elements of an incomplete string to be placed in the same
positions as in the complete strings. Perhaps this will help:

strings<-list("pc_m2_45_ssp3_wheat","pc_m2_45_ssp3_wheat",
 "ssp3_maize","m2_wheat","pc_m2_45_ssp3_maize")
split_strings<-strsplit(unlist(strings),"_")
max_length <- max(sapply(split_strings,length))
complete_sets<-split_strings[sapply(split_strings,length)==max_length]
element_sets<-list()

# build a list with the unique elements of each complete string
for(i in 1:max_length)
 element_sets[[i]]<-unique(sapply(complete_sets,"[",i))

# function to guess the position of the elements in a partial string
# and return them in the hopefully correct positions
# call as: sapply(split_strings,fill_strings,max_length,element_sets)
fill_strings<-function(split_string,max_length,element_sets) {
 if(length(split_string) < max_length) {
  new_split_string<-rep(NA_character_,max_length)
  for(i in seq_along(split_string)) {
   # check every position of the complete strings
   for(j in seq_len(max_length)) {
    # element_sets[[j]] holds the values seen at position j
    if(split_string[i] %in% element_sets[[j]])
     new_split_string[j]<-split_string[i]
   }
  }
  return(new_split_string)
 }
 return(split_string)
}
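
As a self-contained check of the position-guessing idea, here is a minimal
sketch. The helper name place_by_position is made up for this illustration,
and it assumes that each element of a partial string occurs at only one
position in the complete strings; if it occurs at several, the last matching
position wins.

```r
strings<-c("pc_m2_45_ssp3_wheat","ssp3_maize","m2_wheat","pc_m2_45_ssp3_maize")
split_strings<-strsplit(strings,"_")
n_parts<-lengths(split_strings)
max_length<-max(n_parts)
complete_sets<-split_strings[n_parts==max_length]
# unique elements seen at each position of the complete strings
element_sets<-lapply(seq_len(max_length),
 function(i) unique(sapply(complete_sets,"[",i)))

# hypothetical helper: put each element where the complete strings have it
place_by_position<-function(split_string,element_sets) {
 out<-rep(NA_character_,length(element_sets))
 for(element in split_string) {
  for(j in seq_along(element_sets)) {
   if(element %in% element_sets[[j]]) out[j]<-element
  }
 }
 out
}

# one row per input string, one column per position
result<-t(sapply(split_strings,place_by_position,element_sets))
print(result)
```

With these inputs "m2_wheat" should come out with "m2" in the second column
and "wheat" in the fifth, rather than both being pushed to the right.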

# however, if you know that the incomplete strings will always
# be composed of the last elements in the complete strings,
# this simpler version (which replaces the one above) will do
fill_strings<-function(split_string,max_length) {
 lenstring<-length(split_string)
 if(lenstring < max_length)
  split_string<-c(rep(NA,max_length-lenstring),split_string)
 return(split_string)
}

# each row of the result is one string, padded with NA to max_length
t(sapply(split_strings,fill_strings,max_length))
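
On the "non-character argument" error in the original question: strsplit()
accepts only a character vector, so a list has to go through unlist() first.
A minimal sketch of the whole task, assuming the partial strings always
consist of the trailing elements; the helper name pad_left and the column
names part1, part2, ... are made up for the example.

```r
x<-list("pc_m2_45_ssp3_wheat","ssp3_maize","m2_wheat")
# strsplit(x,"_") stops with "non-character argument" because x is a list
parts<-strsplit(unlist(x),"_")
max_length<-max(lengths(parts))
# hypothetical helper: pad with NA on the left up to n elements
pad_left<-function(p,n) c(rep(NA_character_,n-length(p)),p)
# one row per input string
padded<-t(sapply(parts,pad_left,n=max_length))
padded_df<-as.data.frame(padded,stringsAsFactors=FALSE)
names(padded_df)<-paste0("part",seq_len(max_length))
print(padded_df)
```

A data frame is used here only because the question asked for columns; the
matrix in padded holds the same values.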

Jim

On Mon, Jan 18, 2016 at 7:56 AM, Miluji Sb <milujisb at gmail.com> wrote:

> I have a list of strings of different lengths and would like to split each
> string by underscore "_"
>
> pc_m2_45_ssp3_wheat
> pc_m2_45_ssp3_wheat
> ssp3_maize
> m2_wheat
>
> I would like to separate each part of the string into different columns
> such as
>
> pc m2 45 ssp3 wheat
>
> But because of the different lengths, I would like NA in the columns for
> the variables that have fewer parts, such as
>
> NA NA NA m2 wheat
>
> I have tried unlist(strsplit(x, "_")) to split, it works for one variable
> but not for the list - gives me "non-character argument" error. I would
> highly appreciate any help. Thank you!