[R] Summary information by groups programming assistance

hadley wickham h.wickham at gmail.com
Mon Dec 22 22:59:15 CET 2008


On Mon, Dec 22, 2008 at 3:51 PM, Ranney, Steven
<steven.ranney at montana.edu> wrote:
> All -
>
> I have data that looks like
>
>          psd Species  Lake Length Weight St.weight        Wr      Wr.1    vol
> 432 substock     SMB Clear    150  41.00      0.01  95.12438  95.10118 0.0105
> 433 substock     SMB Clear    152  39.00      0.01  86.72916  86.70692 0.0105
> 434 substock     SMB Clear    152  40.00      3.11  88.95298  82.03689 3.2655
> 435 substock     SMB Clear    159  48.00      0.04  92.42095  92.34393 0.0420
> 436 substock     SMB Clear    159  48.00      0.01  92.42095  92.40170 0.0105
> 437 substock     SMB Clear    165  47.00      0.03  80.38023  80.32892 0.0315
> 438 substock     SMB Clear    171  62.00      0.21  94.58105  94.26070 0.2205
> 439 substock     SMB Clear    178  70.00      0.01  93.91912  93.90571 0.0105
> 440 substock     SMB Clear    179  76.00      1.38 100.15760  98.33895 1.4490
> 441      S-Q     SMB Clear    180  75.00      0.01  97.09330  97.08035 0.0105
> 442      S-Q     SMB Clear    180  92.00      0.02 119.10111 119.07522 0.0210
> ...
> [truncated]
>
> where psd and lake are categorical variables, with five and four
> categories, respectively.  I'd like to find the maximum vol and the
> lengths associated with each maximum vol by each category by each lake.
> In other words, I'd like to have a data frame that looks something like
>
> Lake            Category        Length  vol
> Clear           substock        152             3.2655
> Clear           S-Q             266             11.73
> Clear           Q-P             330             14.89
> ...
> Pickerel        substock        170             3.4965
> Pickerel        S-Q             248             10.69
> Pickerel        Q-P             335             25.62
> Pickerel        P-M             415             32.62
> Pickerel        M-T             442             17.25
>
>
> In order to originally get this, I used
>
> with(smb[Lake=="Clear",], tapply(vol, list(Length, psd),max))
> with(smb[Lake=="Enemy.Swim",], tapply(vol, list(Length, psd),max))
> with(smb[Lake=="Pickerel",], tapply(vol, list(Length, psd),max))
> with(smb[Lake=="Roy",], tapply(vol, list(Length, psd),max))
>
> and pulled the values I needed out by hand and put them into a .csv.
> Unfortunately, I've got a number of other data sets upon which I'll need
> to do the same analysis.  Finding a programmable alternative would
> provide a much easier (and likely less error prone) method to achieve
> the same results.  Ideally, the "Length" and "vol" data would be in a
> data frame such that I could then analyze with nls.
>
> Does anyone have any thoughts as to how I might accomplish this?

You might want to have a look at the plyr package,
http://had.co.nz/plyr, which provides a set of tools to make tasks
like this easy.  There are a number of similar examples in the
introductory pdf that should get you started.
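
As a rough sketch of the kind of call involved (not one of the examples
from the pdf), something like the following might work.  It assumes plyr
is installed and that the data frame is called smb, as in your code.  The
small smb built here is only a stand-in using five of the rows you
posted, and max_vol is just an illustrative name:

```r
library(plyr)

# Stand-in data: rows 432-434 and 441-442 from the posted sample,
# keeping only the columns needed for this summary.
smb <- data.frame(
  psd    = c("substock", "substock", "substock", "S-Q", "S-Q"),
  Lake   = "Clear",
  Length = c(150, 152, 152, 180, 180),
  vol    = c(0.0105, 0.0105, 3.2655, 0.0105, 0.0210)
)

# Split by Lake and psd, then keep the Length and vol of the row with
# the largest vol in each piece.  which.max() returns the first row if
# several rows tie for the maximum.
max_vol <- ddply(smb, .(Lake, psd), function(d) {
  d[which.max(d$vol), c("Length", "vol")]
})

print(max_vol)
```

With the full data this should give one row per Lake and psd combination,
in a data frame that can be passed on to nls.  With only the five rows
above, the S-Q row will of course not match the value in your hand-built
table.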

Regards,

Hadley

-- 
http://had.co.nz/


