[R] First time r user

Steve Lianoglou lianoglou.steve at gene.com
Sun Aug 18 08:38:22 CEST 2013


Hi,

In addition to Rainer's suggestions (which are to give a small example
of what your input data look like and an example of what you want to
output), given the size of your input data, you might want to try
the data.table package instead of plyr::ddply -- especially while
you are exploring different combinations/calculations over your data.

The data.table equivalent of a ddply call is usually much faster
(often by orders of magnitude) and more memory efficient.
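
As a rough sketch of what the two approaches look like side by side
(I haven't seen your data, so the column names grp and x below are made
up for illustration, with grp standing in for your column 9, and the
per-group mean is just a placeholder for whatever function you want to
apply):

```r
library(plyr)
library(data.table)

# Hypothetical toy data: 'grp' plays the role of your column 9 and 'x'
# is a column to summarize. Your real column names will differ.
set.seed(1)
df <- data.frame(grp = sample(letters[1:5], 1000, replace = TRUE),
                 x   = rnorm(1000))

# plyr: split df by 'grp', summarise each piece, recombine the results
res_plyr <- ddply(df, "grp", summarise, mean_x = mean(x), n = length(x))

# data.table: the same per-group calculation via the by= argument;
# .N is data.table's built-in count of rows in each group
dt <- as.data.table(df)
res_dt <- dt[, list(mean_x = mean(x), n = .N), by = grp]
```

Both give one row per value of grp. Note that ddply returns the groups
sorted, while data.table returns them in order of first appearance
unless you sort or set a key.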

When the size of my data is small, I often use both (I think the
plyr/ddply "language" is rather beautiful), but once my data grows past
a thousand or so rows, I always switch to data.table.

HTH,
-steve


On Sat, Aug 17, 2013 at 4:33 PM, Dylan Doyle <ddoyle.dub at gmail.com> wrote:
>
> Hello R users,
>
>
> I have recently begun a project to analyze a large data set of approximately 1.5 million rows and 9 columns. My objective is to locate particular subsets within this data, i.e. take all rows with the same value in column 9 and perform a function on that subset. It was suggested to me that I use the ddply() function from the plyr package. Any advice would be greatly appreciated.
>
>
> Thanks much,
>
> Dylan



-- 
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech


