[R] Help with splitting up values in a data set

Shanae Clarke gayonclarke at yahoo.com
Wed Jul 30 00:08:40 CEST 2014


Thank you for that link, it was very helpful.

My attempt:

This is what my data set looks like:

dput(head(des,2))
structure(list(description = structure(5:6, .Label = c("\"Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.\"",
"76 cases of box juice had remained in the depot beyond there expiration date. No expiration date was visible, however the case code reflects that these cases had been packaged and had remained beyond the 90days expiry date." 
"59 cases of bottle juice were found in the depot to be beyond there expiration date. The case code reflects that these cases had been packaged and had remained beyond the 90days expiry date.The codes can be found in the Support File.  folder. NB. A customer had order 50 bags which was mistaken to be 150 cases as such they remained in the depot and efforts were being made to sell these cases, however, these 59 cases were not sold."
), class = "factor")), .Names = "description", row.names = 1:2, class = "data.frame") 

Aim:

I want to split each description into its individual words so that I can determine how frequently each word occurs in each complaint case.

example: [1]  "cases" "of" "box" "juice" "had" "remained" "in" "the" "depot" "beyond" "there" "expiration" "date"
	[2]  "cases" "of" "bottle" "juice" "were" "found" "in" "the" "depot" "to" "be" "beyond" "there" "expiration" "date"

Frequent words would be: "cases", "juice", "depot", "expiration", "date"
It should also tell me the case numbers in which these words are most frequent:
number : 453032, 823041, 041812, 490322
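
One way to approach this aim, as a minimal sketch in base R and not a definitive solution. The data frame `cases`, the pairing of case numbers with descriptions, and the helpers `split_words` and `cases_with` are assumptions made up for illustration; only the (shortened) description texts and the case numbers come from the data above.

```r
# Hypothetical stand-in for the real table: two shortened descriptions
# paired with two of the case numbers listed above (the pairing is assumed).
cases <- data.frame(
  number = c("453032", "823041"),
  description = c(
    "76 cases of box juice had remained in the depot beyond there expiration date.",
    "59 cases of bottle juice were found in the depot to be beyond there expiration date."
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: lower-case the text, split on anything that is not a
# letter, and drop the empty strings that the split can leave behind.
split_words <- function(text) {
  words <- strsplit(tolower(text), "[^a-z]+")[[1]]
  words[nzchar(words)]
}

# One character vector of words per case, named by case number.
words_by_case <- lapply(setNames(cases$description, cases$number), split_words)

# Long table with one row per (case, word) occurrence.
long <- data.frame(
  number = rep(names(words_by_case), lengths(words_by_case)),
  word = unlist(words_by_case, use.names = FALSE),
  stringsAsFactors = FALSE
)

# Word counts within each case (rows are words, columns are case numbers).
per_case <- table(long$word, long$number)

# Overall frequency across all cases, most frequent first.
overall <- sort(table(long$word), decreasing = TRUE)

# Hypothetical helper: case numbers in which a given word occurs at least once.
cases_with <- function(word) colnames(per_case)[per_case[word, ] > 0]
```

Calling `head(overall)` would list the most frequent words, and `cases_with("juice")` would list the case numbers whose description contains that word. Common words such as "the" and "of" would probably need to be filtered out before the counts become informative.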

My code thus far:


library (RODBC) 
channel <- odbcConnect(dsn=dsn.name,uid=user.name,pwd = pwd) 
res <- sqlFetch(channel , "casing") 
des <- sqlQuery(channel , "Select description from casing") 
num <- sqlQuery(channel , "Select number from casing") 
cas.des<- as.character(des$num) 
do.call(rbind, strsplit(cas.des, " "))

Results

> library (RODBC) 
> channel <- odbcConnect(dsn=dsn.name,uid=user.name,pwd = pwd) 
> res <- sqlFetch(channel , "casing") 
Warning message: 
closing unused RODBC handle 1 
> des <- sqlQuery(channel , "Select description from casing") 
> num <- sqlQuery(channel , "Select id from casing")
> cas.des<- as.character(des$num) 
> do.call(rbind, strsplit(cas.des, " ")) 
NULL

I don't understand what this means.
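
A possible explanation, sketched with a made-up data frame instead of the real query result: `des` was fetched with only a `description` column, so `des$num` refers to a column that does not exist and evaluates to NULL. The remaining steps then run on empty input. The contents of `des` below are invented for illustration, and the names `missing_col`, `empty_result`, and `words` are hypothetical.

```r
# Hypothetical stand-in for the result of
# sqlQuery(channel, "Select description from casing").
des <- data.frame(description = c("76 cases of box juice",
                                  "59 cases of bottle juice"),
                  stringsAsFactors = FALSE)

# There is no column called "num", so `$` quietly returns NULL.
missing_col <- des$num

# as.character(NULL) is an empty character vector, strsplit() of that is an
# empty list, and do.call(rbind, <empty list>) returns NULL.
empty_result <- do.call(rbind, strsplit(as.character(missing_col), " "))

# Using the column that does exist gives one vector of words per row.
# A list is kept here because rows usually contain different numbers of
# words, which rbind() would pad by recycling.
words <- strsplit(as.character(des$description), " ")
```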

My apologies, I only started using R a month ago and I have not yet grasped all of the concepts.

Could someone please help me? I would greatly appreciate it.

Gayon.


On Tuesday, July 29, 2014 4:05 PM, Sarah Goslee <sarah.goslee at gmail.com> wrote:
Hi,

My telepathy is not working today.

Have you already imported your data into R?

If so, what does it look like?

dput(head(yourdata, 20))

is an effective way to provide data on the list. Or if your question
refers to only one column, that's all we need.

But we certainly need to know what the column looks like, and what you
expect to have as the final answer.
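
As a rough illustration of why dput() output is convenient on a mailing list: the text it prints is itself R code that rebuilds the object. The data frame `yourdata` below is invented for the example, and `posted` and `rebuilt` are hypothetical names.

```r
# Invented example data; any small data frame would do.
yourdata <- data.frame(description = c("first complaint", "second complaint",
                                       "third complaint"),
                       stringsAsFactors = FALSE)

# dput() prints R code that recreates the object; capture.output() stores that
# text instead of printing it, as if it had been copied into an email.
posted <- capture.output(dput(head(yourdata, 2)))

# A reader pastes the text back into R; parse() and eval() stand in for that.
rebuilt <- eval(parse(text = posted))
```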

This may also be of use:
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

Sarah




On Tue, Jul 29, 2014 at 2:04 PM, Gayon Clarke <gayonclarke at yahoo.com> wrote:
> Good day,
>
> I have a data set from a MySQL database with a description field whose values I want to split up in order to compare the description of one record with the others. This will help me to identify any patterns in the data being recorded.
>
> Your help will be greatly appreciated.
>
> Gayon
>

-- 
Sarah Goslee
http://www.functionaldiversity.org



