[R] Beyond reshape: automatically streamlining data
Marshall Feldman
marsh at uri.edu
Fri Apr 9 14:59:02 CEST 2010
Hello,
I've been very impressed by the reshape package and how easy it makes
reorganizing statistical data structures. This makes me wonder if
there's another package out there that addresses a different set of tasks
that one often performs when preparing data for analysis.
For any particular set of analyses, one typically recodes variables and
deletes cases and variables. It would be really nice to have a package
that, for example, after one selects cases from a larger data set based on
the values of certain variables, would inspect the resulting data and
drop any variables that have the same value for all cases. Similarly, if
any cases are entirely zero or NA, the package could (under user
control) drop these cases. Finally, it could take a set of data
transformations and keep them as an object, so that the same
selection/reshape/streamlining can easily be applied to similar data sets.
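
To make the request concrete, here is a minimal base-R sketch of the two streamlining steps described above: dropping cases that are entirely zero or NA, then dropping variables that have the same value for all remaining cases. The function name streamline, its arguments, and the toy data frame are hypothetical and only illustrate the idea; they do not come from reshape or any other package.

```r
# Hypothetical helper (not from any package): drops "empty" cases and
# constant variables from a data frame, each step under user control.
streamline <- function(df, drop_empty = TRUE, drop_constant = TRUE) {
  # A value counts as blank if it is NA, or zero in a numeric column.
  is_blank <- function(x) if (is.numeric(x)) is.na(x) | x == 0 else is.na(x)

  if (drop_empty && ncol(df) > 0) {
    # A case is empty only if every one of its values is blank.
    empty_row <- Reduce("&", lapply(df, is_blank))
    df <- df[!empty_row, , drop = FALSE]
  }

  if (drop_constant && ncol(df) > 0) {
    # A variable is constant if it takes at most one distinct value
    # (NA counts as a distinct value here, which is an assumption).
    constant <- vapply(df, function(x) length(unique(x)) <= 1, logical(1))
    df <- df[, !constant, drop = FALSE]
  }

  df
}

# Toy data, invented for illustration only.
emp <- data.frame(
  state  = c("RI", "RI", "RI", NA),
  year   = c(2008, 2008, 2008, NA),
  sector = c("A", "B", "C", NA),
  jobs   = c(120, 0, 45, 0),
  stringsAsFactors = FALSE
)

slim <- streamline(emp)
```

In this toy case the fourth row is removed because all of its values are NA or zero, after which state and year are constant and are dropped as well. Running the row step before the column step is a design choice; the opposite order could give a different result.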
My motivation for this came from working with employment data this
morning. I started out with 11 variables and 35569 cases for Rhode
Island; a few selections later I had only 420 cases and 3 variables. It
struck me that the process I went through, which included not only
making selections but also inspecting the results and deleting
unnecessary cases/variables, could be automated at least to eliminate
the inspection step. Also, since I want to do the same thing with data
for other states, automation would be very nice indeed.
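
One way to keep a set of transformations as an object, sketched here in base R, is to store the steps as an ordered list of functions and replay them on each state's data. The names recipe and apply_recipe, the sector values, and the two small data frames are all hypothetical and stand in for the real employment data.

```r
# Hypothetical "recipe": an ordered list of steps, each a function that
# takes a data frame and returns a data frame.
recipe <- list(
  select_sector = function(df) {
    # which() drops rows where the comparison is NA.
    df[which(df$sector == "manufacturing"), , drop = FALSE]
  },
  drop_constant = function(df) {
    varies <- vapply(df, function(x) length(unique(x)) > 1, logical(1))
    df[, varies, drop = FALSE]
  }
)

# Apply the steps in order, feeding each result into the next step.
apply_recipe <- function(df, steps) {
  Reduce(function(d, step) step(d), steps, df)
}

# Toy per-state data, invented for illustration only.
states <- list(
  RI = data.frame(state = "RI",
                  sector = c("manufacturing", "retail", "manufacturing"),
                  year = c(2008, 2008, 2009),
                  jobs = c(10, 20, 30),
                  stringsAsFactors = FALSE),
  MA = data.frame(state = "MA",
                  sector = c("manufacturing", "manufacturing", "retail"),
                  year = c(2008, 2009, 2009),
                  jobs = c(50, 60, 70),
                  stringsAsFactors = FALSE)
)

results <- lapply(states, apply_recipe, steps = recipe)
```

Because the constant-variable step depends on the data, different states could end up with different sets of variables; a stricter version might record which variables were dropped for the first state and reuse that list for the others.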
I realize that programming this kind of stuff in R is relatively easy,
but the reshape package makes me wonder if someone has already done it.
Thanks
Marsh Feldman