[R] Do you use R for data manipulation?

Paul Emberson lists at calidasoft.co.uk
Wed May 6 12:44:29 CEST 2009


I also use the approach Philipp describes below.  I use Python and shell 
scripts for processing thousands of input files and getting all the data 
into one tidy csv table.  From that point onwards it's R all the way 
(often with the reshape package).
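
A minimal sketch of that collation step, assuming each input file holds
simple "key = value" lines (a made-up format for illustration; real files
will need their own parser).  The names parse_result_file and collate, the
"*.txt" pattern and the source_file column are all hypothetical:

```python
import csv
from pathlib import Path


def parse_result_file(path):
    """Parse one file of 'key = value' lines (assumed format) into a dict."""
    record = {"source_file": path.name}
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if not sep:
            continue  # skip lines that do not follow the assumed format
        record[key.strip()] = value.strip()
    return record


def collate(input_dir, output_csv, pattern="*.txt"):
    """Write one CSV row per input file and return the number of rows."""
    paths = sorted(Path(input_dir).glob(pattern))
    records = [parse_result_file(p) for p in paths]

    # Column set is the union of all keys, in first-seen order.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    with open(output_csv, "w", newline="") as handle:
        # Missing values are written as NA so that R reads them as missing.
        writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="NA")
        writer.writeheader()
        writer.writerows(records)
    return len(records)
```

Calling collate("results", "all_results.csv") would then leave a single
table that read.csv() can load into a data frame.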

Paul

Philipp Pagel wrote:
> On Wed, May 06, 2009 at 12:22:45AM -0400, Farrel Buchinsky wrote:
>> Is R an appropriate tool for data manipulation and data reshaping and data
>> organizing? I think so but someone who recently joined our group thinks not.
>> The new recruit believes that python or another language is a far better
>> tool for developing data manipulation scripts that can be then used by
>> several members of our research group.
> 
> 
> I happily use both approaches depending on the original format the
> data come in:
> 
> For data that are not in a "well behaved" format and require actual
> parsing, I tend to use Python scripts for transmogrifying the data
> into nice and tidy tables (and maybe some very basic filtering). For
> everything after that I prefer R. I also use Python if the relevant
> data need to be harvested and assembled from many different sources
> (e.g. data files + web + databases).
> 
> Once the data files are easy to read (csv, tab separated, database,
> ...) and the task is to reshape, filter and clean the data, I usually
> do it in R. R has true advantages here: 
> 
>  - After reading a table into a data frame I can immediately tell if all
>    measurements are what they are supposed to be (integer, numeric,
>    factor, boolean) and functions like read.table even do quite some
>    error checking for me (equal number of columns etc.)
> 
>  - Finding out if factors have the right (or plausible) number of levels is easy
>  
>  - Filtering by logical indexing
> 
>  - Powerful and reliable reshaping (reshape package)
> 
>  - Very convenient diagnostics: str(), dim(), table(), summary(),
>    plotting the data in various ways, ...
> 
> cu
> 	Philipp
>



