[R] Summary information by groups programming assistance
hadley wickham
h.wickham at gmail.com
Mon Dec 22 22:59:15 CET 2008
On Mon, Dec 22, 2008 at 3:51 PM, Ranney, Steven
<steven.ranney at montana.edu> wrote:
> All -
>
> I have data that looks like
>
>     psd      Species Lake  Length Weight St.weight Wr        Wr.1      vol
> 432 substock SMB     Clear 150    41.00  0.01      95.12438  95.10118  0.0105
> 433 substock SMB     Clear 152    39.00  0.01      86.72916  86.70692  0.0105
> 434 substock SMB     Clear 152    40.00  3.11      88.95298  82.03689  3.2655
> 435 substock SMB     Clear 159    48.00  0.04      92.42095  92.34393  0.0420
> 436 substock SMB     Clear 159    48.00  0.01      92.42095  92.40170  0.0105
> 437 substock SMB     Clear 165    47.00  0.03      80.38023  80.32892  0.0315
> 438 substock SMB     Clear 171    62.00  0.21      94.58105  94.26070  0.2205
> 439 substock SMB     Clear 178    70.00  0.01      93.91912  93.90571  0.0105
> 440 substock SMB     Clear 179    76.00  1.38      100.15760 98.33895  1.4490
> 441 S-Q      SMB     Clear 180    75.00  0.01      97.09330  97.08035  0.0105
> 442 S-Q      SMB     Clear 180    92.00  0.02      119.10111 119.07522 0.0210
> ...
> [truncated]
>
> where psd and lake are categorical variables, with five and four
> categories, respectively. I'd like to find the maximum vol and the
> lengths associated with each maximum vol by each category by each lake.
> In other words, I'd like to have a data frame that looks something like
>
> Lake Category Length vol
> Clear substock 152 3.2655
> Clear S-Q 266 11.73
> Clear Q-P 330 14.89
> ...
> Pickerel substock 170 3.4965
> Pickerel S-Q 248 10.69
> Pickerel Q-P 335 25.62
> Pickerel P-M 415 32.62
> Pickerel M-T 442 17.25
>
>
> To get this originally, I used
>
> with(smb[smb$Lake == "Clear", ], tapply(vol, list(Length, psd), max))
> with(smb[smb$Lake == "Enemy.Swim", ], tapply(vol, list(Length, psd), max))
> with(smb[smb$Lake == "Pickerel", ], tapply(vol, list(Length, psd), max))
> with(smb[smb$Lake == "Roy", ], tapply(vol, list(Length, psd), max))
>
> and pulled the values I needed out by hand and put them into a .csv.
> Unfortunately, I've got a number of other data sets upon which I'll need
> to do the same analysis. Finding a programmable alternative would
> provide a much easier (and likely less error prone) method to achieve
> the same results. Ideally, the "Length" and "vol" data would be in a
> data frame such that I could then analyze with nls.
>
> Does anyone have any thoughts as to how I might accomplish this?
You might want to have a look at the plyr package,
http://had.co.nz/plyr, which provides a set of tools to make tasks
like this easy. There are a number of similar examples in the
introductory pdf that should get you started.
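
For instance, here is a minimal sketch of the split-apply-combine idea
using ddply() from plyr. It assumes the data frame is called smb, as in
your code, and rebuilds only a small subset of the rows you posted; the
name max_vol and the anonymous function are made up for illustration,
not taken from the pdf.

```r
library(plyr)

# Toy subset of the rows quoted above (columns not needed here are omitted).
smb <- data.frame(
  psd    = c(rep("substock", 9), "S-Q", "S-Q"),
  Lake   = "Clear",
  Length = c(150, 152, 152, 159, 159, 165, 171, 178, 179, 180, 180),
  vol    = c(0.0105, 0.0105, 3.2655, 0.0420, 0.0105, 0.0315,
             0.2205, 0.0105, 1.4490, 0.0105, 0.0210),
  stringsAsFactors = FALSE
)

# Split by Lake and psd, keep the row with the largest vol in each piece,
# and bind the pieces back into one data frame.
max_vol <- ddply(smb, c("Lake", "psd"), function(piece) {
  piece[which.max(piece$vol), c("Length", "vol")]
})

max_vol
```

On this subset the substock row for Clear comes out as Length 152 and
vol 3.2655, matching the first line of the table you want. Note that
which.max() keeps only the first row when several rows tie for the
maximum, so you would need something different if ties matter.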
Regards,
Hadley
--
http://had.co.nz/