[R] the computational complexity of sample()

Thu Nov 15 20:00:16 CET 2001

Hi all,

	I am studying R's sample() function's behaviour when we are sampling
from a large vector with weights. For instance,
the following code measures the time it takes to do a weighted Bootstrap
sampling. 

>logfile <-  file("/home/jonqli/nortel/logfile.txt", "at")
>cat("\n", file=logfile)
>cat(date(), file=logfile)
>UPPER <- 1e5
>tmp <- sample(UPPER, replace=T, prob=1:UPPER)
>cat(date(), file=logfile)

	Here is a table for the time it takes in a Pentium III with 750 M
memory running Debian Linux.

UPPER = 1e6	TIME = 24600 seconds.
UPPER = 1e5 	TIME = 218 seconds
UPPER = 1e4 	TIME = 1~2 seconds

Clearly this is an algorithm with complexity of O(n^2). When I use
sample(UPPER) instead (no weights),
the complexity seems to go up in O(n).

UPPER = 1e6	TIME < 1 second
UPPER = 1e7	TIME = 7 seconds
UPPER = 1e8	TIME = 95 seconds

(UPPER = 1e9 didn't run because of memory allocation error).

	Am I right in these observations? If I am right, are there ways to
speed up the weighted bootstrap
sampling to O(nlogn), for instance? 	

Thanks in advance!

-- 
Jonathan Q. Li, PhD
Agilent Technologies Laboratory
Palo Alto, California, USA
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._