[R] Median of streaming data

Martin Maechler maechler at stat.math.ethz.ch
Fri Sep 26 11:48:49 CEST 2014

>>>>> Rolf Turner <r.turner at auckland.ac.nz>
>>>>>     on Thu, 25 Sep 2014 11:44:38 +1200 writes:

    > On 24/09/14 20:16, Martin Maechler wrote: <SNIP>

    >> 1) has your proposal ever been provided in R?  I'd be
    >> happy to add it to the robustX
    >> (http://cran.ch.r-project.org/web/packages/robustX) or
    >> even robustbase
    >> (http://cran.ch.r-project.org/web/packages/robustbase)
    >> package.

    > <SNIP>

    > I have coded up the algorithm from the Cameron and Turner
    > paper.  Dunno if it gives exactly the same results as my
    > (Splus?) code from lo these many years ago (the code that
    > is lost in the mists of time), but it *seems* to work.

excellent, thank you, Rolf!

    > It is not designed to work with actual "streaming" data
    > --- I don't know how to do that.  It takes a complete data
    > vector as input.  Someone who knows about streaming data
    > should be able to adapt it pretty easily.  Said he, the
    > proverbial optimist.

I agree; that should not be hard.
One way is to replace 'y[ind]' by 'getY(ind)' everywhere in the code
and let 'getY' be a user-supplied function argument to rlas().
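
To make the accessor idea concrete, here is a minimal sketch. It is NOT
rlas(): sa_median() and getY_stream() are hypothetical names, and the update
rule is a plain stochastic-approximation median step that I use only so that
there is something to run. The point is just that the function never sees a
data vector, only 'getY'.

```r
## Hypothetical illustration of the 'getY' accessor idea; this is NOT rlas().
## The update is a simple stochastic-approximation median step (an assumption
## made for this sketch, not the Cameron and Turner algorithm).
sa_median <- function(n, getY, step = 1) {
    m <- getY(1L)                         # start at the first observation
    for (i in seq_len(n)[-1L]) {
        yi <- getY(i)                     # was: y[i]
        m <- m + (step / i) * sign(yi - m)
    }
    m
}

## In-memory data: the accessor simply indexes a vector.
y <- c(2, 9, 4, 7, 5, 1, 8)
r_mem <- sa_median(length(y), getY = function(ind) y[ind])

## "Streaming" data: the accessor reads the next value(s) from a connection.
## It ignores the values in 'ind' and assumes it is called in increasing order.
con <- textConnection(as.character(y))
getY_stream <- function(ind) as.numeric(readLines(con, n = length(ind)))
r_stream <- sa_median(length(y), getY = getY_stream)
close(con)
```

Both calls walk through the same observations in the same order, so they
should agree; only the accessor knows where the data actually live.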

    > The function code and a help file are attached.  These
    > files have had their names changed to end in ".txt" so
    > that they will get through the mailing list processor
    > without being stripped.  With a bit of luck.

It did work indeed.
I've added them to 'robustX' -- on R-forge,
including a plot() method and a little more flexibility.

  --> https://r-forge.r-project.org/R/?group_id=59

Thank you for all the other pointers to the literature (but none to
software), some of which are quite recent.

One old idea that, I think, was not directly mentioned is the
"Remedian" of Rousseeuw and Bassett:

  Peter J. Rousseeuw and Gilbert W. Bassett, Jr. (1990)
  The Remedian: A Robust Averaging Method for Large Data Sets
  Journal of the American Statistical Association, Vol. 85, No. 409, pp. 97-104
  [URL: http://www.jstor.org/stable/2289530]

which is also easy to implement and I plan to add to robustbase
(as I'd want to use the C code already in robustbase) as a
"reference" estimator.

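For readers who do not know it: the remedian keeps k buffers of size b, and
each time a buffer fills up, its median is pushed into the next buffer and the
buffer is emptied, so about b*k numbers summarise up to b^k observations.
A minimal R sketch follows. The names make_remedian(), $add() and $estimate()
are hypothetical (nothing like this is in robustbase yet), and reading off an
incomplete stream by a lower weighted median, with weight b^(j-1) for entries
of buffer j, is a choice made for this sketch.

```r
## Sketch of a remedian accumulator with base b and k buffers.
## All names here are hypothetical, not part of robustX or robustbase.
make_remedian <- function(b = 11L, k = 3L) {
    buf <- matrix(NA_real_, nrow = b, ncol = k)  # column j = buffer at level j
    cnt <- integer(k)                            # filled slots per level
    n <- 0                                       # observations seen so far
    add1 <- function(x) {
        if (n >= b^k) stop("capacity b^k exhausted")
        n <<- n + 1
        for (j in seq_len(k)) {
            cnt[j] <<- cnt[j] + 1L
            buf[cnt[j], j] <<- x
            if (cnt[j] < b || j == k) break      # buffer not full, or top level
            x <- median(buf[, j])                # push the median one level up
            cnt[j] <<- 0L                        # and empty this buffer
        }
    }
    estimate <- function() {
        vals <- unlist(lapply(seq_len(k), function(j) buf[seq_len(cnt[j]), j]))
        if (!length(vals)) return(NA_real_)
        ## an entry at level j stands for b^(j-1) raw observations
        wts <- rep(b^(seq_len(k) - 1), times = cnt)
        o <- order(vals)
        cw <- cumsum(wts[o])
        vals[o][which(cw >= sum(wts) / 2)[1]]    # lower weighted median
    }
    list(add = function(x) for (xi in x) add1(xi), estimate = estimate)
}

## Usage: feed the data in chunks and compare with the exact median.
set.seed(1)
x <- rnorm(11^3)
rem <- make_remedian(b = 11L, k = 3L)
for (chunk in split(x, ceiling(seq_along(x) / 100))) rem$add(chunk)
c(remedian = rem$estimate(), median = median(x))
```

When exactly b^k observations have been added, only the top buffer is
non-empty and, for odd b, the estimate is simply its median.
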
Personally, I think there is quite some room for research and
implementation, not least because the literature seems to
always be a bit incomplete {one "school" not knowing about, or
at least not citing, works of the other "school", etc...}


Martin Maechler, ETH Zurich

    > If they *don't* get through, anyone who is interested
    > should contact me and I will send them to you "privately".

    > cheers,
    > Rolf

    > -- 
    > Rolf Turner Technical Editor ANZJS 
    > [external: rlas.R, plain text]
    > [external: rlas.Rd, plain text]
