[R] working with summarized data

Greg Snow Greg.Snow at intermountainmail.org
Wed Aug 30 18:28:18 CEST 2006

There are functions to do weighted summary statistics in the Hmisc
package (wtd.quantile, ...).
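
As a rough illustration of the Hmisc functions mentioned above, here is a minimal sketch. The vectors x and w are made-up data, and the weights are assumed to be frequency weights. The Hmisc calls are guarded so the snippet still runs if the package is not installed.

```r
# Made-up example data: x and w are hypothetical names.
x <- c(1, 2, 3, 4, 10)
w <- c(1, 1, 1, 1, 6)   # assumed frequency weights: the value 10 stands for 6 observations

# Base R already has a weighted mean.
print(weighted.mean(x, w))

# Weighted mean and weighted median from Hmisc, if it is available.
if (requireNamespace("Hmisc", quietly = TRUE)) {
  print(Hmisc::wtd.mean(x, weights = w))
  print(Hmisc::wtd.quantile(x, weights = w, probs = 0.5))
}
```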

For more complicated analyses (but not plots yet), the biglm package has
a bigglm function that accepts the data in chunks. You could write a
function that expands one part of the dataset at a time.
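
One way the chunk-wise expansion could look is sketched below. The helper make_chunk_reader and the column names x, y and w are hypothetical, and the weights are assumed to be positive integer frequencies. The returned function follows the convention bigglm documents for a data function: it takes a single reset argument and returns the next chunk as a data frame, or NULL when the data are exhausted.

```r
# Hypothetical helper: returns a function that hands out the data a few
# rows at a time, with each row replicated according to its weight.
make_chunk_reader <- function(df, weight_col = "w", chunk_rows = 1000) {
  pos <- 0
  function(reset = FALSE) {
    if (reset) {                            # start again from the first row
      pos <<- 0
      return(NULL)
    }
    if (pos >= nrow(df)) return(NULL)       # no more data
    idx <- seq(pos + 1, min(pos + chunk_rows, nrow(df)))
    pos <<- max(idx)
    chunk <- df[idx, , drop = FALSE]
    # Expand only this chunk: repeat each row 'weight' times.
    reps <- rep(seq_len(nrow(chunk)), times = chunk[[weight_col]])
    expanded <- chunk[reps, , drop = FALSE]
    expanded[[weight_col]] <- NULL          # weights are now implicit in the rows
    expanded
  }
}

# Made-up summarized data with frequency weights in column w.
d <- data.frame(x = 1:6,
                y = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2),
                w = c(1, 3, 2, 1, 4, 2))
reader <- make_chunk_reader(d, chunk_rows = 2)

# Fit the model chunk by chunk if biglm is installed.
if (requireNamespace("biglm", quietly = TRUE)) {
  fit <- biglm::bigglm(y ~ x, data = reader, family = gaussian())
  print(coef(fit))
}
```

Only one expanded chunk is held in memory at a time, so chunk_rows can be tuned to the size of the weights.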

Hope this helps, 

Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at intermountainmail.org
(801) 408-8111

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Rick Bischoff
Sent: Wednesday, August 30, 2006 8:28 AM
To: r-help at stat.math.ethz.ch
Subject: [R] working with summarized data

The data sets I am working with all have a weight variable, i.e., each
row does not represent exactly one observation.

With that in mind, nearly all of the graphs and summary statistics are
incorrect for my data, because they don't take into account the weight.

For example "median" is incorrect, as the quantiles aren't calculated
with weights:

sum( weights[X < median(X)] ) / sum(weights)

This should be 0.5... of course it's not.
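
To make the check above concrete, here is a minimal base-R sketch of a weighted median. The helper weighted_quantile is hypothetical (it is not from any package), it uses one simple definition of a weighted quantile among several, and the data are made up. With discrete data the share strictly below the weighted median will generally not be exactly 0.5, but 0.5 should fall between the share strictly below and the share at or below.

```r
# Hypothetical helper: weighted quantile via the weighted empirical CDF.
weighted_quantile <- function(x, w, p = 0.5) {
  stopifnot(length(x) == length(w), all(w >= 0), sum(w) > 0)
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]
  cum <- cumsum(w) / sum(w)   # weighted ECDF at each sorted value
  x[which(cum >= p)[1]]       # smallest value whose cumulative share reaches p
}

# Made-up data: the value 10 carries most of the weight.
X <- c(1, 2, 3, 4, 10)
weights <- c(1, 1, 1, 1, 6)

# Unweighted median ignores the weights.
print(sum(weights[X < median(X)]) / sum(weights))

# Weighted median: 0.5 lies between the two shares printed here.
m <- weighted_quantile(X, weights)
print(sum(weights[X < m]) / sum(weights))
print(sum(weights[X <= m]) / sum(weights))
```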

Unfortunately, it seems that most (all?) of R's graphics and summary
statistic functions don't take a weight or frequency argument.
(Fortunately the models do...)

Am I completely missing how to do this?  One way would be to replicate
each row in proportion to its weight (e.g. if the weight was 4, we would
add 3 additional copies), but this will get prohibitive pretty quickly as
the dataset grows.

Thanks in advance!

