[R] merge a list of data frames

David Winsemius dwinsemius at comcast.net
Thu Sep 6 19:30:16 CEST 2012


On Sep 6, 2012, at 6:42 AM, Sam Steingold wrote:

>> * David Winsemius <qjvafrzvhf at pbzpnfg.arg> [2012-09-05 21:02:16 -0700]:
>> 
>> On Sep 5, 2012, at 8:51 PM, Sam Steingold wrote:
>> 
>>> I have a list of data frames:
>>> 
>>>> str(data)
>>> List of 4
>>> $ :'data.frame':	700773 obs. of  3 variables:
>>> ..$ V1: chr [1:700773] "200130446465779" "200070050127778"
>>> "200030633708779" "200010587002779" ...
>>> ..$ V2: int [1:700773] 0 0 0 0 0 0 0 0 0 0 ...
>>> ..$ V3: num [1:700773] 1 1 1 1 1 ...
>>> $ :'data.frame':	700773 obs. of  3 variables:
>>> ..$ V1: chr [1:700773] "200130446465779" "200070050127778"
>>> "200030633708779" "200010587002779" ...
>>> ..$ V2: int [1:700773] 0 0 0 0 0 0 0 0 0 0 ...
>>> ..$ V3: num [1:700773] 1 1 1 1 1 ...
>>> $ :'data.frame':	700773 obs. of  3 variables:
>>> ..$ V1: chr [1:700773] "200130446465779" "200070050127778"
>>> "200030633708779" "200010587002779" ...
>>> ..$ V2: int [1:700773] 0 0 0 0 0 0 0 0 0 0 ...
>>> ..$ V3: num [1:700773] 1 1 1 1 1 ...
>>> $ :'data.frame':	700773 obs. of  3 variables:
>>> ..$ V1: chr [1:700773] "200160325893778" "200130647544079"
>>> "200130446465779" "200120186959078" ...
>>> ..$ V2: int [1:700773] 0 0 0 0 0 0 0 0 0 0 ...
>>> ..$ V3: num [1:700773] 1 1 1 1 1 1 1 1 1 1 ...
>>> 
>>> I want to merge them.
>> 
>> Why? What are you expecting?
> 
> these are the results of applying a model to the test data.
> the first column is the ID

In which case you should be using the 'by' argument to `merge` and specifying that only the first column is to be used. In the current situation, `merge` will attempt to match on all of the columns, because all three columns have the same names in every data frame.

Notice the difference in these results:

> merge( data.frame(a=1:3, b=5:7), data.frame(a=1:3, b=10:12) )
[1] a b
<0 rows> (or 0-length row.names)

> merge( data.frame(a=1:3, b=5:7), data.frame(a=1:3, b=10:12) , by=1)
  a b.x b.y
1 1   5  10
2 2   6  11
3 3   7  12

(`merge` "by" arguments can be column numbers.)
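For a whole list of frames, the same idea can be applied repeatedly with `Reduce`. The following is only a minimal sketch under stated assumptions: `frames` is a small made-up stand-in for the real list, and the per-frame suffixes (`V2.1`, `V3.1`, ...) are an illustrative naming choice, added programmatically so that repeated merges do not produce clashing column names.

```r
# Hypothetical toy data standing in for the real list of score frames
frames <- list(
  data.frame(V1 = c("a", "b", "c"), V2 = c(0L, 1L, 0L), V3 = c(0.1, 0.9, 0.3),
             stringsAsFactors = FALSE),
  data.frame(V1 = c("c", "a", "b"), V2 = c(0L, 0L, 1L), V3 = c(0.2, 0.3, 0.7),
             stringsAsFactors = FALSE),
  data.frame(V1 = c("b", "c", "a"), V2 = c(1L, 0L, 0L), V3 = c(0.8, 0.1, 0.2),
             stringsAsFactors = FALSE)
)

# Append the frame's position in the list to every non-key column name,
# so no manual renaming is needed however many frames there are
renamed <- lapply(seq_along(frames), function(i) {
  df <- frames[[i]]
  names(df)[-1] <- paste(names(df)[-1], i, sep = ".")
  df
})

# Merge pairwise, left to right, matching on the ID column only
merged <- Reduce(function(x, y) merge(x, y, by = "V1"), renamed)
```

Here `merged` has one row per ID and one `V2.i`/`V3.i` pair per input frame.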


> the second column is the actual value
> the third column is the model score
> 
> after I will merge the frames, I will
> 1. check that all the V2 columns are identical and drop all but one
> (I guess I could just merge on c("V1","V2") instead, right?)

Depends what you want. I already suggested that you only want to merge on the ID column.

> 
> 2. compute the sum (or the mean, whatever is easier) of all the V3
> columns

`aggregate` should do that without difficulty.
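One possible reading of that suggestion, sketched under assumptions: stack the frames with `rbind` instead of merging them, then let `aggregate` compute the mean score per ID. The `frames` list is made-up toy data, and grouping on both `V1` and `V2` is an illustrative choice that doubles as a consistency check on the actual values.

```r
# Hypothetical toy data standing in for the real list of score frames
frames <- list(
  data.frame(V1 = c("a", "b", "c"), V2 = c(0L, 1L, 0L), V3 = c(0.1, 0.9, 0.3),
             stringsAsFactors = FALSE),
  data.frame(V1 = c("c", "a", "b"), V2 = c(0L, 0L, 1L), V3 = c(0.2, 0.3, 0.7),
             stringsAsFactors = FALSE),
  data.frame(V1 = c("b", "c", "a"), V2 = c(1L, 0L, 0L), V3 = c(0.8, 0.1, 0.2),
             stringsAsFactors = FALSE)
)

# Stack all frames into one long data frame (same column names throughout)
stacked <- do.call(rbind, frames)

# Mean model score per (ID, actual value) pair
avg <- aggregate(V3 ~ V1 + V2, data = stacked, FUN = mean)

# If every frame agrees on V2 for each ID, there is exactly one row per ID
v2_consistent <- nrow(avg) == length(unique(stacked$V1))
```

If `v2_consistent` is `FALSE`, at least one ID carries different `V2` values in different frames.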
> 
> 3. sort by the sum/mean of the V3 columns and evaluate the combined
> model using the lift quality metric
> (http://dl.acm.org/citation.cfm?id=380995.381018)

That's going to require more background (or more money, since they want $15.00 for a PDF).


> 
> I have many more score files (not just 4), so it is not practical for me
> to rename the column to something unique.

Which column?

> 
> 
> 
> -- 
> Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
> http://www.childpsy.net/ http://www.memritv.org http://truepeace.org
> http://jihadwatch.org http://mideasttruth.com http://americancensorship.org
> To be popular with ladies one has to be smart, handsome & rich. Or to be a cat.

David Winsemius, MD
Alameda, CA, USA



