[R] Sampling from a database
F. Tusell
etptupaf at bs.ehu.es
Fri Feb 1 12:31:59 CET 2002
I use RODBC and RpgSQL quite a lot to access files stored in another
machine under PostgreSQL. Since I am now using files which do not fit
into R's memory, I would like to take random samples. What I would
like is to issue a query such as
SELECT * FROM file WHERE runif > 0.9
with "runif" being a uniformly distributed random number, generated on
the fly; but I cannot find any way to do that in PostgreSQL. So far,
my only idea is to create a new table with two fields, "IDnumber" and
"Random", fill "IDnumber" with the similarly named field from the
table to sample, populate "Random" with random numbers generated in R
and then issue a query for records where "IDnumber" of the two tables
match AND Random > 0.9.
Seems to me clumsy, and surely there has to be a better way and
someone knowing SQL can suggest it.
ft.
--
Fernando TUSELL e-mail:
Departamento de Econometría y Estadística etptupaf at bs.ehu.es
Facultad de CC.EE. y Empresariales Tel: (+34)94.601.3733
Avenida Lendakari Aguirre, 83 Fax: (+34)94.601.3754
E-48015 BILBAO (Spain) Secr: (+34)94.601.3740
PGP: finger etptupaf at bsdx01.bs.ehu.es http://etdx01.bs.ehu.es
----------------------------------------------------------------------
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
More information about the R-help
mailing list