[R] Median of streaming data

Rolf Turner r.turner at auckland.ac.nz
Wed Sep 24 12:14:47 CEST 2014


On 24/09/14 20:16, Martin Maechler wrote:
>>>>>> Rolf Turner <r.turner at auckland.ac.nz>
>>>>>>      on Wed, 24 Sep 2014 18:43:34 +1200 writes:
>
>      > On 24/09/14 17:31, Mohan Radhakrishnan wrote:
>      >> Hi,
>      >>
>      >> I have streaming data(1 TB) that can't fit in memory. Is
>      >> there a way for me to find the median of these streaming
>      >> integers assuming I can fit only a small part in memory ?
>      >> This is about the statistical approach to find the median
>      >> of a large number of values when I can inspect only a
>      >> part of them due to memory constraints.
>
>      > You cannot, I'm pretty sure, calculate the median
>      > recursively.  However there are "approximate" recursive
>      > median algorithms which provide an estimate of location
>      > that has the same asymptotic properties as the median.
>
>      > See:
>
>      > * U. Holst, Recursive estimators of location.
>      > Commun. Statist. Theory Meth., vol. 16, 1987,
>      > pp. 2201--2226.
>
>      > and
>
>      > * Murray A. Cameron and T. Rolf Turner, Recursive location
>      > and scale estimators, Commun. Statist. Theory Meth.,
>      > vol. 22, 1993, pp. 2503--2515.
>
> This is really interesting to me, thank you, Rolf!
>
> OTOH,
>
> 1) has your proposal ever been provided in R?
>     I'd be happy to add it to the robustX
>     (http://cran.ch.r-project.org/web/packages/robustX) or even
>     robustbase (http://cran.ch.r-project.org/web/packages/robustbase) package.

I coded the stuff up when Murray and I wrote the paper referred to, but 
I'm not sure if I'll be able to find the code now.  I think I was 
probably using Splus r.t. R at the time.

I'll have a bit of a search in my ancient archives and see what I can find.

> 2) Would anybody know of more recent research on the subject?
>     (I quickly "googled around" and found research more geared
>      for the time series situation which is more involved anyway)
>
>     --> Hence CC'ing the experts' list  R-SIG-robust

I don't know of anything more recent.  Some sort of citation search 
might help to turn up things that are out there.

cheers,

Rolf


-- 
Rolf Turner
Technical Editor ANZJS



More information about the R-help mailing list