[R] Re: Large database help
Robert Citek
rwcitek at alum.calberkeley.org
Tue May 16 18:07:08 CEST 2006

On May 16, 2006, at 8:15 AM, justin bem wrote:
> Try to open your db with MySQL and use RMySQL

I've seen this suggested a few times, but with little detail.  In my
experience, even when using SQL to pull data from a MySQL DB, R needs
to load the entire result set into RAM before doing any calculations.
But perhaps I'm using RMySQL incorrectly[1].

As a toy problem, let's imagine a data set (foo) with a single
numerical field (bar) and 1 billion records (1e9).  In MySQL one
would do the following to calculate the mean:

   select avg(bar) from foo;

For a smaller data set I would issue a select statement and then  
fetch the entire set into a data frame before calculating the mean.   
Given such a large data set, how would one calculate the mean using R  
connected to this MySQL database?  How would one calculate the median  
using R connected to this MySQL database?
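
One idea I could imagine trying, though I have not verified it against
a table of this size, is to fetch the result set in chunks and keep
only running totals in R.  Below is a minimal sketch under that
assumption.  The names chunked_mean and make_chunker are hypothetical
helpers made up for illustration, and the commented RMySQL lines
assume a database called "test" that holds the toy table foo.

```r
# Running mean over data that arrives in chunks.
# `next_chunk` is a hypothetical callback that returns a numeric vector,
# or a zero-length vector once no rows remain.
chunked_mean <- function(next_chunk) {
  total <- 0
  n <- 0
  repeat {
    x <- next_chunk()
    if (length(x) == 0) break      # result set exhausted
    total <- total + sum(x)        # only the running sum and count stay in RAM
    n <- n + length(x)
  }
  if (n == 0) NA_real_ else total / n
}

# Stand-in for a database cursor so the sketch runs without MySQL:
# hands out `x` in pieces of at most `size` elements.
make_chunker <- function(x, size) {
  pos <- 0
  function() {
    if (pos >= length(x)) return(numeric(0))
    idx <- (pos + 1):min(pos + size, length(x))
    pos <<- pos + length(idx)
    x[idx]
  }
}

chunked_mean(make_chunker(c(1, 2, 3, 4, 5), size = 2))   # 3

# With RMySQL the callback might wrap fetch() (untested; connection
# details are placeholders):
#   library(RMySQL)
#   con <- dbConnect(MySQL(), dbname = "test")
#   res <- dbSendQuery(con, "select bar from foo")
#   m   <- chunked_mean(function() fetch(res, n = 100000)$bar)
#   dbClearResult(res)
#   dbDisconnect(con)
```

This works for the mean because sums and counts combine across
chunks.  The median does not decompose that way, so the same trick
would not answer the second question.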
Pointers to references appreciated.
[1] http://www.sourcekeg.co.uk/cran/src/contrib/Descriptions/RMySQL.html
Regards,
- Robert
http://www.cwelug.org/downloads
Help others get OpenSource software.  Distribute FLOSS
for Windows, Linux, *BSD, and MacOS X with BitTorrent