[R] Search and extract string function
Marc Schwartz
marc_schwartz at me.com
Thu Jul 15 23:17:04 CEST 2010
On Jul 15, 2010, at 11:27 AM, AndrewPage wrote:
>
> Actually I have one more question that's somewhat related-- I'm starting out
> by importing a .txt file that isn't divided into vectors and is at times
> inconsistent with regards to spacing, indents, etc., so I can't rely on
> those. It looks something like this:
>
>
> "Drink=Coffee:Location=Office:Time=Morning:Market=Flat
>
> Drink=Water:Location=Office:Time=Afternoon:Market=Up
>
> Drink=Water:Location=Gym:Time=Evening:Market=Closed
> Drink=Wine:Location=Restaurant:Time=LateEvening:Market=Closed
> Drink=Coffee:Location=Office:Time=Morning:Market=Flat
> Drink=Water:Location=Office:Time=Afternoon:Market=Up
>
> Drink=Water:Location=Gym:Time=Evening:Market=Closed
> Drink=Wine:Location=Restaurant:Time=LateEvening:Market=Closed
> Drink=Coffee:Location=Office:Time=Morning:Market=Flat
>
> Drink=Water:Location=Office:Time=Afternoon:Market=Up
>
> Drink=Water:Location=Gym:Time=Evening:Market=Closed
>
> Drink=Wine:Location=Restaurant:Time=LateEvening:Market=Closed"
>
>
>
> How can I take a single string like this and divide it into twelve vectors,
> like this:
>
> FixedData
> [1] "Drink=Coffee:Location=Office:Time=Morning:Market=Flat"
> [2] "Drink=Water:Location=Office:Time=Afternoon:Market=Up"
> [3] "Drink=Water:Location=Gym:Time=Evening:Market=Closed"
> [4] "Drink=Wine:Location=Restaurant:Time=LateEvening:Market=Closed"
> [5] "Drink=Coffee:Location=Office:Time=Morning:Market=Flat"
> [6] "Drink=Water:Location=Office:Time=Afternoon:Market=Up"
> [7] "Drink=Water:Location=Gym:Time=Evening:Market=Closed"
> [8] "Drink=Wine:Location=Restaurant:Time=LateEvening:Market=Closed"
> [9] "Drink=Coffee:Location=Office:Time=Morning:Market=Flat"
> [10] "Drink=Water:Location=Office:Time=Afternoon:Market=Up"
> [11] "Drink=Water:Location=Gym:Time=Evening:Market=Closed"
> [12] "Drink=Wine:Location=Restaurant:Time=LateEvening:Market=Closed"
>
> Thanks again for all of the help!
If each record in the file is in fact on its own line, then the records are separated by line terminators (LF, or CR/LF on Windows) and can be read by R line by line using readLines().
Having done so by copying the text above from the clipboard, and presuming that the quotes are not part of the file input, I get the following:
> Lines
[1] "Drink=Coffee:Location=Office:Time=Morning:Market=Flat "
[2] ""
[3] "Drink=Water:Location=Office:Time=Afternoon:Market=Up "
[4] ""
[5] "Drink=Water:Location=Gym:Time=Evening:Market=Closed "
[6] "Drink=Wine:Location=Restaurant:Time=LateEvening:Market=Closed "
[7] " Drink=Coffee:Location=Office:Time=Morning:Market=Flat "
[8] "Drink=Water:Location=Office:Time=Afternoon:Market=Up "
[9] ""
[10] " Drink=Water:Location=Gym:Time=Evening:Market=Closed "
[11] "Drink=Wine:Location=Restaurant:Time=LateEvening:Market=Closed"
[12] "Drink=Coffee:Location=Office:Time=Morning:Market=Flat "
[13] ""
[14] "Drink=Water:Location=Office:Time=Afternoon:Market=Up "
[15] ""
[16] "Drink=Water:Location=Gym:Time=Evening:Market=Closed "
[17] ""
[18] "Drink=Wine:Location=Restaurant:Time=LateEvening:Market=Closed"
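For reference, a minimal sketch of how a vector like Lines might be produced with readLines(). The temporary file and the three shortened sample lines are stand-ins I made up for illustration; with real data you would pass your own file path instead.

```r
# Write a small stand-in file: stray spaces and a blank line,
# mimicking the irregular input shown above (hypothetical sample).
tmp <- tempfile(fileext = ".txt")
writeLines(c("Drink=Coffee:Location=Office:Time=Morning:Market=Flat ",
             "",
             " Drink=Water:Location=Gym:Time=Evening:Market=Closed "),
           tmp)

# readLines() returns one character element per line of the file,
# including empty strings for blank lines.
Lines <- readLines(tmp)
unlink(tmp)

Lines
```

Blank lines come back as empty strings, which is why they show up as "" in the output above.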
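Even with this irregular structure, you can still use gsub() to replace each line with the text captured between "Location=" and ":Time="; lines that do not match the pattern (here, the blank ones) are returned unchanged: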
Res1 <- gsub(".*Location=(.+):Time=.*", "\\1", Lines)
> Res1
[1] "Office" "" "Office" "" "Gym"
[6] "Restaurant" "Office" "Office" "" "Gym"
[11] "Restaurant" "Office" "" "Office" ""
[16] "Gym" "" "Restaurant"
I can get rid of the blanks by using:
> Res1[Res1 != ""]
[1] "Office" "Office" "Gym" "Restaurant" "Office"
[6] "Office" "Gym" "Restaurant" "Office" "Office"
[11] "Gym" "Restaurant"
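The same idea can be extended to any of the fields. The following is only a sketch: get_field() is a hypothetical helper (not part of R), and it assumes that field values never contain a colon or a space. It also drops non-matching lines up front with grepl() rather than filtering out "" afterwards.

```r
# Shortened stand-in for the Lines vector shown earlier.
Lines <- c("Drink=Coffee:Location=Office:Time=Morning:Market=Flat ",
           "",
           " Drink=Water:Location=Gym:Time=Evening:Market=Closed ")

# Hypothetical helper: return the value of one "name=value" field
# from every line that contains it.
get_field <- function(x, field) {
  # Capture everything after "field=" up to the next colon or space
  pattern <- paste0(".*", field, "=([^: ]+).*")
  hits <- grepl(pattern, x)          # keep only lines that match
  sub(pattern, "\\1", x[hits])       # replace the line with the capture
}

get_field(Lines, "Location")
get_field(Lines, "Market")
```

Since each line is replaced at most once, sub() is sufficient here; gsub() gives the same result.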
If you do want to get just the fixed data as you have above:
# Get rid of all spaces
Res2 <- gsub(" +", "", Lines)
# Get rid of blank lines
> Res2[Res2 != ""]
[1] "Drink=Coffee:Location=Office:Time=Morning:Market=Flat"
[2] "Drink=Water:Location=Office:Time=Afternoon:Market=Up"
[3] "Drink=Water:Location=Gym:Time=Evening:Market=Closed"
[4] "Drink=Wine:Location=Restaurant:Time=LateEvening:Market=Closed"
[5] "Drink=Coffee:Location=Office:Time=Morning:Market=Flat"
[6] "Drink=Water:Location=Office:Time=Afternoon:Market=Up"
[7] "Drink=Water:Location=Gym:Time=Evening:Market=Closed"
[8] "Drink=Wine:Location=Restaurant:Time=LateEvening:Market=Closed"
[9] "Drink=Coffee:Location=Office:Time=Morning:Market=Flat"
[10] "Drink=Water:Location=Office:Time=Afternoon:Market=Up"
[11] "Drink=Water:Location=Gym:Time=Evening:Market=Closed"
[12] "Drink=Wine:Location=Restaurant:Time=LateEvening:Market=Closed"
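Note that gsub(" +", "", Lines) removes every space, including any inside a value. If that could matter for your data, a variant that strips only leading and trailing whitespace might look like the sketch below. It assumes a version of R that provides trimws() (R 3.2.0 or later), and the three sample lines are again a shortened stand-in.

```r
# Shortened stand-in for the Lines vector shown earlier.
Lines <- c("Drink=Coffee:Location=Office:Time=Morning:Market=Flat ",
           "",
           " Drink=Water:Location=Gym:Time=Evening:Market=Closed ")

# Strip leading and trailing whitespace only
Trimmed <- trimws(Lines)

# Keep the non-empty strings; nzchar() is TRUE for length > 0
FixedData <- Trimmed[nzchar(Trimmed)]

FixedData
```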
HTH,
Marc