[R] data summerization etc...

ajay ohri ohri2007 at gmail.com
Sat Jul 12 00:56:35 CEST 2008


Have you tried using the GUI Rattle from www.rattle.togaware.com . It
works pretty well for summarization.



On Sat, Jul 12, 2008 at 4:14 AM, sj <ssj1364 at gmail.com> wrote:
> Hello,
> I am trying to do some fairly straightforward data summarization, i.e., the
> kind you would do with a pivot table in excel or by using SQL queires. I
> have a moderately sized data set of ~70,000 records and I am trying to
> compute some group averages and sum values within groups. the code example
> below shows how I am trying to go about doing this
> pti <-rnorm(70000,10)
> fid <- rnorm(70000,100)
> finc <- rnorm(70000,1000)
> ### compute the sums of pti within fid groups
> sum_pinc <-aggregate(cbind(fid,pti),list(fid),FUN=sum)
> #### compute mean finc within fid groups
> tot_finc <- aggregate(cbind(fid,finc),list(fid),FUN=mean)
> when I try to do it this way I get an error message telling me that enough
> memory cannot be allocated ( I am using R 2.7.1 on Windows XP with 2 GB of
> Memory). I figure that there must be a more efficent way to go about doing
> this. Please suggest.
> I would typically do this kind of task in a database and use SQL to push the
> data around. I know RODBC allows you to write SQL to query external DBs. Is
> there any mechanisim that allows you to write SQL queies against datasets
> internal to R e.g. in the case above
> I could do something like
> set <- cbind(fid,pti,finc)
> select fid, sum(pti)
> from set
> group by fid
> that would be handy!
> Thanks,
> Spencer
>        [[alternative HTML version deleted]]
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

More information about the R-help mailing list