[R] ref card for data manipulation?

hadley wickham h.wickham at gmail.com
Thu Dec 11 15:19:03 CET 2008

>> You (as many before you) have overlooked the ave() function, which can
>> replace the ordering as well as the do.call(c,tapply(....))
> Majority of questions on this list concern data manipulation. Many are
> repetitive. "Overlooking" like that will always happen unless some
> comprehensive data manipulation documentation is made.
> I think many people would benefit if a specialized data.manip ref.card were
> conceived.

I like the idea, but is a reference card really enough?  To me, what
most people need to tackle data manipulation problems is a broad
strategy, not a list of useful functions.  plyr is a codification of
my most recent ideas on one such strategy: splitting a big data
structure into smaller pieces, applying a function to each piece and
then joining them back together.  Just recognising that your problem
can be solved with this strategy is a big step forward; the functions
in plyr just save you some typing and a bit of thought compared to
doing it in base R.
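
As a minimal sketch of that split/apply/combine strategy in base R
(the data frame and its column names are made up for illustration,
and this is only one of several ways to write it):

```r
# Toy data; the column names are hypothetical.
df <- data.frame(
  group = c("a", "b", "a", "b", "a"),
  value = c(1, 10, 2, 20, 3)
)

# 1. Split: one data frame per group.
pieces <- split(df, df$group)

# 2. Apply: add each value's deviation from its group mean.
centred <- lapply(pieces, function(piece) {
  piece$centred <- piece$value - mean(piece$value)
  piece
})

# 3. Combine: stack the pieces back together (rows come out ordered by group).
result <- do.call(rbind, centred)

# ave() does all three steps in one call and keeps the original row order.
df$centred <- ave(df$value, df$group, FUN = function(v) v - mean(v))
```

With plyr the same thing should be roughly
ddply(df, "group", transform, centred = value - mean(value)),
though I have not run that line against this toy data.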

Recognising this strategy has helped me in my own data manipulation
problems - many tasks with which I used to struggle are now easy to
solve, not just because of plyr, but because I have a framework in
which to think about the problem.  But this is just one strategy and
there must be many more common strategies waiting to be identified.  I
think working on this would be time better spent - describing a
strategy gives people the tools to help themselves.  (Of course, this
doesn't help the people who just want canned answers, but I'm less
interested in helping them.)


