[R] Re : Large database help

roger koenker rkoenker at uiuc.edu
Tue May 16 23:26:06 CEST 2006


In ancient times, 1999 or so, Alvaro Novo and I experimented with an
interface to MySQL that brought chunks of data into R and accumulated
results. This is still described and available on the web in its
original form at

	http://www.econ.uiuc.edu/~roger/research/rq/LM.html

Despite claims of "future developments", nothing emerged, so anyone
considering further explorations with it may need training in
Rchaeology.
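
The chunk-and-accumulate idea described above can be sketched for the
simplest case, the mean of 1e9 numbers asked about in the quoted message
below. This is a minimal base-R sketch, not the original MySQL interface:
`make_chunk_source()` is a hypothetical stand-in for a database cursor
(here it just simulates data with `runif`), and only one chunk is ever
held in memory.

```r
# Minimal sketch: keep a running sum and count across chunks, so memory
# use depends on the chunk size, not on the total number of values.
# make_chunk_source() is HYPOTHETICAL: it simulates a cursor that hands
# back successive blocks of data and NULL when the data are exhausted.
make_chunk_source <- function(n_total, chunk_size, seed = 1) {
  set.seed(seed)
  remaining <- n_total
  function() {
    if (remaining <= 0) return(NULL)        # no more data
    k <- min(chunk_size, remaining)
    remaining <<- remaining - k             # update the closure's state
    runif(k, 0, 1e9)
  }
}

chunked_mean <- function(next_chunk) {
  total <- 0
  count <- 0
  while (!is.null(x <- next_chunk())) {
    total <- total + sum(x)                 # accumulate the sum
    count <- count + length(x)              # and the number of values
  }
  total / count
}

src <- make_chunk_source(n_total = 1e7, chunk_size = 1e6)
chunked_mean(src)   # close to 5e8 for uniform values on [0, 1e9]
```

With a real database the same loop would fetch a bounded number of rows
per call instead of generating them.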

The toy problem we were solving was a large least squares problem,
which was a stalking horse for large quantile regression problems.
Around the same time I discovered sparse linear algebra and realized
that virtually all the large problems I was interested in were better
handled from that perspective.
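
A large least squares problem can be accumulated chunk by chunk because
the normal equations only need X'X and X'y, which are small (p x p and
p x 1) no matter how many rows there are. The sketch below is one
standard way to do this in base R, not necessarily what the original
interface did; `make_lm_source()` is a hypothetical stand-in for a
database cursor returning blocks of rows.

```r
# Minimal sketch: least squares via accumulated normal equations
# X'X b = X'y. Only a p x p matrix and a p-vector survive between chunks.
chunked_lm <- function(next_chunk) {
  XtX <- NULL
  Xty <- NULL
  while (!is.null(ch <- next_chunk())) {
    if (is.null(XtX)) {
      XtX <- crossprod(ch$X)               # t(X) %*% X for first chunk
      Xty <- crossprod(ch$X, ch$y)         # t(X) %*% y for first chunk
    } else {
      XtX <- XtX + crossprod(ch$X)
      Xty <- Xty + crossprod(ch$X, ch$y)
    }
  }
  drop(solve(XtX, Xty))                    # coefficient vector
}

# HYPOTHETICAL chunk source: simulated y = 2 + 3 x + noise, delivered
# in blocks of chunk_size rows, then NULL when exhausted.
make_lm_source <- function(n_total, chunk_size, seed = 1) {
  set.seed(seed)
  remaining <- n_total
  function() {
    if (remaining <= 0) return(NULL)
    k <- min(chunk_size, remaining)
    remaining <<- remaining - k
    x <- runif(k)
    list(X = cbind(1, x), y = 2 + 3 * x + rnorm(k, sd = 0.1))
  }
}

beta <- chunked_lm(make_lm_source(n_total = 1e6, chunk_size = 1e5))
beta   # close to c(2, 3)
```

Forming X'X squares the condition number of X; an updating QR
decomposition, as used by the CRAN package biglm, is the numerically
safer version of the same bounded-memory idea.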

url:    www.econ.uiuc.edu/~roger            Roger Koenker
email    rkoenker at uiuc.edu            Department of Economics
vox:     217-333-4558                University of Illinois
fax:       217-244-6678                Champaign, IL 61820


On May 16, 2006, at 3:57 PM, Robert Citek wrote:

>
> On May 16, 2006, at 11:19 AM, Prof Brian Ripley wrote:
>> Well, there *is* a manual about R Data Import/Export, and this does
>> discuss using R with DBMSs with examples.  How about reading it?
>
> Thanks for the pointer:
>
>    http://cran.r-project.org/doc/manuals/R-data.html#Relational-databases
>
> Unfortunately, that manual doesn't really answer my question.  My
> question is not how to make R interact with a database, but rather
> how to make R interact with a database containing large data sets.
>
>> The point being made is that you can import just the columns you
>> need, and indeed summaries of those columns.
>
> That sounds great in theory.  Now I want to reduce it to practice.
> In the toy problem from the previous post, how can one compute the
> mean of a set of 1e9 numbers?  R has some difficulty generating a
> set of a billion (1e9) numbers, let alone taking its mean.  To wit:
>
>    bigset <- runif(1e9,0,1e9)
>
> runs out of memory on my system.  I realize that I can do some fancy
> data shuffling and hand-waving to calculate the mean.  But I was
> wondering if R has a module that already abstracts out that magic,
> perhaps using a database.
>
> Any pointers to more detailed reading are greatly appreciated.
>
> Regards,
> - Robert
> http://www.cwelug.org/downloads
> Help others get OpenSource software.  Distribute FLOSS
> for Windows, Linux, *BSD, and MacOS X with BitTorrent
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
