[R] Data transformation & cleaning

Daniel Malter daniel at umd.edu
Wed Sep 28 07:38:50 CEST 2011


On a methodological level, if the choices are not measured on a cardinal or
at least ordinal scale, you don't want to use correlations. Instead, you
should probably use Cramér's V (a chi-squared-based measure of association
for nominal variables), in particular if the choices are multinomial.
Whether the wide format is necessary will depend on the input format that
the function you are using expects.
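
As a rough sketch of what that could look like in base R: the toy data, the
cramers_v() helper, and the min_n threshold below are all made up for
illustration, and table() stands in for dcast() only to keep the example
free of package dependencies. It assumes each retained choice is picked by
some but not all users, otherwise V is undefined.

```r
# Toy long-format data shaped like the example in the question (values are invented).
data <- data.frame(
  ID   = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 6),
  Page = c("Choice A", "Choice B", "Choice A", "Choice B", "Choice A",
           "Choice B", "Choice A", "Choice D", "Choice C", "Choice C")
)

# One row per ID, one column per choice; TRUE if the user picked it at least once.
picked <- unclass(table(data$ID, data$Page)) > 0

# Keep only choices picked by at least min_n users (hypothetical threshold).
min_n   <- 2
choices <- colnames(picked)[colSums(picked) >= min_n]

# Cramer's V computed from the Pearson chi-squared statistic of a two-way table.
cramers_v <- function(x, y) {
  tab      <- table(x, y)
  n        <- sum(tab)
  expected <- outer(rowSums(tab), colSums(tab)) / n
  chi2     <- sum((tab - expected)^2 / expected)
  sqrt(chi2 / (n * (min(dim(tab)) - 1)))
}

# Pairwise V between all retained choices.
v <- sapply(choices, function(a)
  sapply(choices, function(b) cramers_v(picked[, a], picked[, b])))
round(v, 2)
```

Note that V only measures the strength of an association, not its
direction: in this toy data Choice A and Choice C get V = 1 even though no
user picked both, so you would still want to look at the underlying 2x2
tables before reading a high V as "people who pick A also pick C".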

HTH,
Daniel


pde3p wrote:
> 
> Hi,
> 
> I have a few methodological and implementation questions for y'all. Thank
> you in advance for your help. I have a dataset that reflects people's
> preference choices. I want to see if there's any kind of clustering effect
> among certain preference choices (e.g. do people who pick choice A also
> pick choice D). 
> 
> I have a data set that has one record per user ID, per preference choice.
> It's a "long" form of a data set that looks like this: 
> 
> ID | Page
> 123 | Choice A
> 123 | Choice B
> 456 | Choice A
> 456 | Choice B
> ...
> 
> I thought that I should do the following
> 
> 1. Make the data set "wide", counting the observations so the data looks
> like this:
> ID | Count of Preference A | Count of Preference B
> 123 | 1 | 1
> ...
> 
> Using 
> table1 <- dcast(data, ID ~ Page, fun.aggregate = length, value.var = 'Page')
> 
> 2. Create a correlation matrix of preferences
> cor(table1[,-1])
> 
> How would I restrict my correlation to show only preferences that meet a
> minimum sample threshold? Can you confirm whether the following two
> commands do the same thing? What would I do from here (or am I taking the
> wrong approach)?
> table1 <- dcast(data, Page ~ Page, fun.aggregate = length, value.var = 'Page')
> table2 <- with(data, table(Page, Page))
> 
> 
> many thanks,
> Peter
> 
