[R] splitting first 10 words in a string

Tue Nov 2 15:46:42 CET 2010

On Nov 2, 2010, at 6:24 AM, Gaj Vidmar wrote:

> Though <forbidden> in this list, in Excel it's just (literally!) five
> clicks away! (with the column in question selected)
> Data -> Text to Columns -> Delimited -> tick Space -> Finish
> Pa je! (~Voila in Slovenian)
> (then import back to R, keeping only the first 10 columns if so desired)

You could do the same thing without leaving R. Just use
read.table(textConnection(..), header=FALSE, fill=TRUE):

> read.table(textConnection(words), fill=TRUE)
   V1    V2    V3      V4    V5    V6    V7      V8       V9     V10      V11   V12 V13 V14
1   I  have     a columnn  with  text  that     has    quite       a      few words  in it.
2   I would  like      to split these words      in separate columns
3 but  just first     ten words    in   the string.       Is    that possible    in  R?
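
For the data frame described in the quoted message, something along these
lines might work. It is only a sketch on a toy stand-in: the names dat, txt,
parts, first10 and word1 ... word10 are made up for illustration, and only
VrtinaID and Opis come from the posted str() output. Since read.table()
looks at just the first five lines to decide how many columns to create,
the sketch counts the fields first with count.fields() and supplies enough
col.names up front.

```r
# Toy stand-in for the poster's data frame (hypothetical data).
# Opis is a factor, as in the posted str() output.
dat <- data.frame(
  VrtinaID = c(1L, 1L, 2L),
  Opis = factor(c(
    "I have a columnn with text that has quite a few words in it.",
    "I would like to split these words in separate columns",
    "but just first ten words in the string. Is that possible in R?"
  ))
)

# textConnection() needs a character vector, not a factor.
txt <- as.character(dat$Opis)

# Count whitespace-separated fields on every line. quote = "" and
# comment.char = "" stop apostrophes and '#' in the text from being
# interpreted as quoting or comment characters.
con <- textConnection(txt)
n_fields <- count.fields(con, quote = "", comment.char = "")
close(con)

# Read with one column per possible word; short rows are padded by fill = TRUE.
# colClasses = "character" avoids any type conversion of the words.
# blank.lines.skip = FALSE is meant to keep empty descriptions as rows so the
# result stays aligned with dat (worth verifying on the real data).
con <- textConnection(txt)
parts <- read.table(con, header = FALSE, fill = TRUE,
                    col.names = paste0("V", seq_len(max(n_fields))),
                    colClasses = "character",
                    quote = "", comment.char = "",
                    blank.lines.skip = FALSE)
close(con)

# Keep at most the first ten word columns and attach them to the data frame.
first10 <- parts[, seq_len(min(10, ncol(parts))), drop = FALSE]
names(first10) <- paste0("word", seq_along(first10))
result <- cbind(dat, first10)
```

Rows with fewer than ten words end up with empty strings in the trailing
word columns.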

>
> Regards,
> Assist. Prof. Gaj Vidmar, PhD
> University Rehabilitation Institute, Republic of Slovenia
>
> Irrelevant P.S. Long ago, before embarking on what eventually ended
> mainly in statistics, I did two years of geology, so (and also because
> of knowing what the poster's institute does) I even kinda imagine what
> these data are.
>
> "Matevž Pavlič" <matevz.pavlic at gi-zrmk.si> wrote in message
> news:AD5CA6183570B54F92AA45CE2619F9B9D96994 at gi-zrmk.si...
>> Hi,
>>
>> I am sorry, will try to be more exact from now on...
>>
>> I have a data.frame with a field called Opis. It contains sentences
>> that I would like to split into words, i.e. fields in the data.frame;
>> when I say columns I mean as in an Excel table. I would like to split
>> "Opis" into ten fields holding the first ten words of the Opis field.
>> Here is an example of my data.frame.
>>
>> 'data.frame':   22928 obs. of  12 variables:
>> $ VrtinaID        : int  1 1 1 1 2 2 2 2 2 2 ...
>> $ ZapStev         : int  1 2 3 4 1 2 3 4 5 6 ...
>> $ GlobinaOd       : num  0 0.8 9.2 10.1 0 0.9 2.6 4.9 6.8 7.3 ...
>> $ GlobinaDo       : num  0.8 9.2 10.1 11 0.9 2.6 4.9 6.8 7.3 8.2 ...
>> $ Opis            : Factor w/ 12754 levels "","(MIVKA) DROBEN MELJAST PESEK, GOST, SIVORJAV",..: 2060 11588 2477 11660 7539 3182 7884 9123 2500 4756 ...
>> $ ACklasifikacija : Factor w/ 290 levels "","(CL)","(CL)/(SC)",..: 154 125 101 101 NA 106 125 80 106 101 ...
>> $ GeolNastOd      : num  0 0.8 9.2 10.1 0 0.9 2.6 4.9 6.8 7.3 ...
>> $ GeolNastDo      : num  0.8 9.2 10.1 11 0.9 2.6 4.9 6.8 7.3 8.2 ...
>> $ GeolNastOpis    : Factor w/ 113 levels "","B. M. S.",..: 56 53 53 53 56 53 53 53 53 53 ...
>> $ NacinVrtanjaOd  : num  0e+00 1e+09 1e+09 1e+09 0e+00 ...
>> $ NacinVrtanjaDo  : num  1.1e+01 1.0e+09 1.0e+09 1.0e+09 1.0e+01 ...
>> $ NacinVrtanjaOpis: Factor w/ 43 levels "","H. N.","IZKOP",..: 26 1 1 1 26 1 1 1 1 1 ...
>>
>> Hope that explains better...
>> Thank you, m
>>
>> -----Original Message-----
>> From: David Winsemius [mailto:dwinsemius at comcast.net]
>> Sent: Monday, November 01, 2010 10:13 PM
>> To: Matevž Pavlič
>> Cc: r-help at r-project.org
>> Subject: Re: [R] splitting first 10 words in a string
>>
>>
>> On Nov 1, 2010, at 4:39 PM, Matevž Pavlič wrote:
>>
>>> Hi all,
>>>
>>>
>>>
>>> I have a columnn with text that has quite a few words in it. I would
>>> like to split these words in separate columns, but just first ten
>>> words in the string. Is that possible in R?
>>>
>>>
>>
>> Not sure what a column means to you. It's not a precisely defined R
>> type or class. (And you are requested to offer a concrete example
>> rather than making us guess.)
>>
>> > words <- "I have a columnn with text that has quite a few words in it. I would like to split these words in separate columns, but just first ten words in the string. Is that possible in R?"
>>
>> > strsplit(words, " ")[[1]][1:10]
>> [1] "I"       "have"    "a"       "columnn" "with"    "text"    "that"    "has"     "quite"   "a"
>>
>>
>> Or if in a dataframe:
>>
>> > words <- c("I have a columnn with text that has quite a few words in it.",
>> +            "I would like to split these words in separate columns",
>> +            "but just first ten words in the string. Is that possible in R?")
>> > worddf <- data.frame(words=words)
>>
>> > # as.character() guards against the column having been stored as a factor
>> > t(sapply(strsplit(as.character(worddf$words), " "), "[", 1:10))
>>      [,1]  [,2]    [,3]    [,4]      [,5]    [,6]    [,7]    [,8]      [,9]       [,10]
>> [1,] "I"   "have"  "a"     "columnn" "with"  "text"  "that"  "has"     "quite"    "a"
>> [2,] "I"   "would" "like"  "to"      "split" "these" "words" "in"      "separate" "columns"
>> [3,] "but" "just"  "first" "ten"     "words" "in"    "the"   "string." "Is"       "that"
>>
>>
>> -- 
>> David Winsemius, MD
>> West Hartford, CT
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>

David Winsemius, MD
West Hartford, CT