[R] why is nrow() so slow?
jim holtman
jholtman at gmail.com
Tue Sep 15 23:17:29 CEST 2009
'by' is designed to work on data frames. Look at what happens when you
pass it something that is not a data frame:
> by.default
function (data, INDICES, FUN, ..., simplify = TRUE)
{
    dd <- as.data.frame(data)
    if (length(dim(data)))
        by(dd, INDICES, FUN, ..., simplify = simplify)
    else {
        if (!is.list(INDICES)) {
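The 'as.data.frame' call converts the matrix to a data frame first, so
the matrix version pays for that conversion on top of the same data
frame computation. Matrices are a lot faster in many cases, but only
when you are doing 'matrix-like' operations on them directly.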
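
A minimal sketch of the point above, on synthetic data. The data frame
'rets' is made up for illustration and only stands in for rets.subset; it
is not the original poster's data. The tapply() and rowsum() calls are one
possible way to get per-group means straight from the matrix. They are
offered as an assumption about what would help here, not as a measured
result.

```r
# Synthetic stand-in for rets.subset (hypothetical data).
set.seed(1)
n <- 20000
rets <- data.frame(PERMNO = sample(1:500, n, replace = TRUE),
                   RET    = rnorm(n))
m <- as.matrix(rets)

# There is no 'by' method for matrices, so by() dispatches to by.default,
# which converts to a data frame. Each chunk handed to FUN is therefore a
# data frame, even though a matrix was passed in.
chunk_is_df <- by(m, m[, "PERMNO"], is.data.frame)
all(unlist(chunk_is_df))

# The conversion itself can be timed separately from the grouping work.
system.time(dd <- as.data.frame(m))

# Per-group mean of one numeric column, skipping the data frame machinery.
means_tapply <- tapply(m[, "RET"], m[, "PERMNO"], mean)

# rowsum() computes group sums; dividing by the group sizes gives means.
# Both rowsum() and table() order groups by sorted unique PERMNO.
sums   <- rowsum(m[, "RET"], group = m[, "PERMNO"])
counts <- table(m[, "PERMNO"])
means_rowsum <- sums[, 1] / as.vector(counts)

# The two approaches should agree up to floating point tolerance.
all.equal(as.vector(means_tapply), as.vector(means_rowsum))
```

Wrapping each of the calls in system.time() on your own data would show
whether avoiding 'by' is actually worth it in your case.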
On Tue, Sep 15, 2009 at 5:12 PM, ivo welch <ivo_welch at brown.edu> wrote:
> Interestingly, in my case the opposite seems to be true: data frames
> seem faster than matrices when it comes to "by" computations (which is
> where most of my calculations are):
>
> ### here is my data frame and some information about it
>> dim(rets.subset)
> [1] 132508 3
>> names(rets.subset)
> [1] "PERMNO" "RET" "mdate"
>> length(unique(as.factor(rets.subset$PERMNO)))
> [1] 6832
>> length((as.factor(rets.subset$PERMNO)))
> [1] 132508
>
> ### calculation using data frame
>> system.time( { by( rets.subset, as.factor(rets.subset$PERMNO), mean) } )
> user system elapsed
> 3.295 2.798 6.095
>
> ### same as matrix
>> m=as.matrix(rets.subset)
>> system.time( { a=by( m, as.factor(m[,1]), mean) } )
> user system elapsed
> 5.371 5.557 10.928
>
> PS: Any speed suggestions are appreciated. This is "experimenting time" for
> me.
>
>
>> One note: if you're worried about speed, it almost always makes sense to
>> use matrices rather than dataframes. If you've got mixed types this is
>> tedious and error-prone (each type needs to be in a separate matrix), but if
>> your data is all numeric, it's very simple, and will make things a lot
>> faster.
>
>
>
>
>>
>> Duncan Murdoch
>>
>
>
>
> --
> Ivo Welch (ivo.welch at brown.edu, ivo.welch at gmail.com)
> CV Starr Professor of Economics (Finance), Brown University
> http://welch.econ.brown.edu/
>
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem that you are trying to solve?