[R] Data Frame housekeeping

Scott Hatcher scott.v.hatcher at gmail.com
Wed May 25 19:16:48 CEST 2011


Hello Dr. Winsemius,

First of all, thank you for your prompt and helpful reply, and for 
providing something I hoped would come from joining this mailing 
list: a means of discovering incredibly useful packages such as the 
"reshape2" one you have introduced me to.

I have a follow up question to your solution (which should produce 
exactly what I need):

When I run the cast function to reassemble the data frame I get:

Error in names(data) <- array_names(res$labels[[2]]) :
   'names' attribute [7] must be the same length as the vector [1]

This signaled to me that the function was returning 7 values where it 
expected only 1. To test this I supplied the summary function "mean" to 
the cast, and it ran (although it produced only NAs because my values 
were factors). What I don't understand is where these multiple values 
are coming from; there should be only a single value corresponding to 
the 4 id.vars given in the cast function (STN_ID, YEAR, MM, variable).
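
One way to look for the source of the extra values is to count how many rows of the melted frame fall into each output cell. The following is only a sketch in base R: the toy `mtst` below is a hypothetical stand-in with one deliberately duplicated row, and the names `keys`, `cell_counts`, `dups` and `value_num` are made up for illustration. It also shows the usual way to turn factor values back into numbers so that a summary such as mean does not return NA.

```r
# Toy stand-in for the melted data frame; the real one would come from melt().
# The third row deliberately repeats the first to simulate a duplicate.
mtst <- data.frame(
  STN_ID   = 2402594,
  YEAR     = 1997,
  MM       = c(9, 9, 9, 10),
  ELEM     = c(1, 2, 1, 1),
  variable = factor(c("X1", "X1", "X1", "X1")),
  value    = factor(c("-00233", "-00339", "-00233", "-00003"))
)

# Each output cell is one combination of the row ids plus ELEM.
keys <- mtst[, c("STN_ID", "YEAR", "MM", "variable", "ELEM")]
cell_counts <- aggregate(n ~ ., data = cbind(keys, n = 1), FUN = sum)

# Any n > 1 means several values are competing for a single cell.
dups <- cell_counts[cell_counts$n > 1, ]
print(dups)

# Go through character first; as.numeric() on the factor itself
# would return the internal level codes, not the printed numbers.
mtst$value_num <- as.numeric(as.character(mtst$value))
```

If `dups` comes back empty on the real data, duplicated id combinations are probably not the cause and the problem lies elsewhere.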

Thanks again for your help,

Scott Hatcher

On 24/05/2011 5:16 PM, David Winsemius wrote:
>
> On May 24, 2011, at 3:03 PM, Scott Hatcher wrote:
>
>> Hello,
>>
>> I have a large data frame that is organized by date in a peculiar way. I
>> am seeking advice on how to transform the data into a format that is of
>> more use to me.
>>
>> The data is organized as follows:
>>
>>     STN_ID YEAR MM ELEM       X1       X2       X3     X4     X5     X6     X7
>> 1  2402594 1997  9    1 *-00233* *-00204* *-00119* -00190 -00251 -00243 -00249
>> 2  2402594 1997 10    1   -00003   -00005   -00001 -00039 -00031 -00036 -00033
>> 3  2402594 1997 11    1   000025   000065   000070 000069 000115 000072 000093
>>
>> Where "MM" is the month of the year, and ELEM is the variable that the
>> values in the X* columns describe (in the actual data there are 31 X
>> columns, one for each day of the month). The values in bold are the
>> values that are transferred into the small chart below (which is the
>> result I hope to get). This is to give a sense of how the data is picked
>> out of the original data frame.
>
> assuming this dataframe is named 'tst':
>
> require(reshape2)
> mtst <- melt(tst[, 1:7], id.vars=1:4)  # Only select id vars and X1:X3
>  str(mtst)
> #----------
> 'data.frame':    54 obs. of  6 variables:
>  $ STN_ID  : num  2402594 2402594 2402594 2402594 2402594 ...
>  $ YEAR    : num  1997 1997 1997 1997 1998 ...
>  $ MM      : num  9 10 11 12 1 2 3 4 5 9 ...
>  $ ELEM    : num  1 1 1 1 1 1 1 1 1 2 ...
>  $ variable: Factor w/ 3 levels "X1","X2","X3": 1 1 1 1 1 1 1 1 1 1 ...
>  $ value   : chr  "-00233" "-00003" "000025" "000160" ...
>
> dcast(mtst, STN_ID +YEAR+ MM  + variable ~ ELEM)
> #---------
>     STN_ID YEAR MM variable      1      2
> 1  2402594 1997  9       X1 -00233 -00339
> 2  2402594 1997  9       X2 -00204 -00339
> 3  2402594 1997  9       X3 -00119 -00343
> 4  2402594 1997 10       X1 -00003 -00207
> 5  2402594 1997 10       X2 -00005 -00289
> 6  2402594 1997 10       X3 -00001 -00278
> 7  2402594 1997 11       X1 000025 -00242
> snipped output
>
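
For anyone wanting to run the quoted melt/dcast steps without the original data, here is a small self-contained sketch. The toy `tst` is an assumption: it reuses a handful of values visible in the output above (months 9 and 10, ELEM 1 and 2, days X1 to X3) and is not the poster's real data frame. The conversion to numeric before casting is an addition for convenience, not part of the quoted reply.

```r
library(reshape2)

# Toy input in the same wide layout: one row per station/year/month/ELEM,
# one X column per day. Values are kept as character, as in the str() output.
tst <- data.frame(
  STN_ID = 2402594,
  YEAR   = 1997,
  MM     = c(9, 10, 9, 10),
  ELEM   = c(1, 1, 2, 2),
  X1     = c("-00233", "-00003", "-00339", "-00207"),
  X2     = c("-00204", "-00005", "-00339", "-00289"),
  X3     = c("-00119", "-00001", "-00343", "-00278"),
  stringsAsFactors = FALSE
)

# Long format: one row per (ids, day column) pair.
mtst <- melt(tst, id.vars = c("STN_ID", "YEAR", "MM", "ELEM"))

# Convert the zero-padded strings to numbers before casting.
mtst$value <- as.numeric(mtst$value)

# Wide again, with one column per ELEM code.
wide <- dcast(mtst, STN_ID + YEAR + MM + variable ~ ELEM, value.var = "value")
print(wide)
```

Naming `value.var` explicitly avoids relying on dcast guessing which column holds the values.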
>>
>> I would like to organize the data so it looks like this:
>>
>>     STN_ID YEAR MM DAY  ELEM1  ELEM2
>> 1  2402594 1997  9  X1 -00233 -00339
>> 2  2402594 1997  9  X2 -00204 000077
>> 3  2402594 1997  9  X3 -00119 000030
>
> Where is that second column coming from? I don't see it in the data 
> example.
>>
>> Such that I create a new column named "DAY" that is made up of the
>> numbers following "X" in the original data.frame columns. Also, the ELEM
>> values are converted to columns and parsed with the ELEM code (in this
>> case 1 and 2).
>>
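
The DAY column and the ELEM column names described above could be derived from a cast result roughly as follows. This is a sketch in base R on a hypothetical three-row `wide` frame shaped like the dcast output quoted earlier; the helper name `elem_cols` is made up for illustration, and DAY is stored as an integer (1, 2, 3) rather than as "X1", "X2", "X3".

```r
# Hypothetical cast result: one row per day column, one column per ELEM code.
wide <- data.frame(
  STN_ID   = 2402594,
  YEAR     = 1997,
  MM       = 9,
  variable = factor(c("X1", "X2", "X3")),
  `1`      = c(-233, -204, -119),
  `2`      = c(-339, -339, -343),
  check.names = FALSE   # keep "1" and "2" as column names
)

# DAY is the number that follows "X" in the original column name.
wide$DAY <- as.integer(sub("^X", "", as.character(wide$variable)))

# Prefix the ELEM code columns so they read ELEM1, ELEM2.
elem_cols <- names(wide) %in% c("1", "2")
names(wide)[elem_cols] <- paste0("ELEM", names(wide)[elem_cols])

# Drop the helper column and put the columns in the requested order.
wide$variable <- NULL
wide <- wide[, c("STN_ID", "YEAR", "MM", "DAY", "ELEM1", "ELEM2")]
print(wide)
```

With 31 day columns the same `sub()` call applies unchanged, since it only strips the leading "X".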
>> I have tried to split apart the columns, transform them, and bind them
>> back together, but my ability to do so just isn't there yet. I am still
>> fairly new to R, and would really appreciate some help in working
>> towards organizing this data frame.
>>
>> Thanks in advance,
>> Scott Hatcher
>>
>>     [[alternative HTML version deleted]]
>
> David Winsemius, MD
> West Hartford, CT
>


