[R] Alternative to apply in base R
Doran, Harold
HDoran at air.org
Wed Nov 9 13:30:30 CET 2016
The speed enhancement using your f2() is remarkable when compared to the apply() method I implemented. In the larger context of my actual problem, this essentially now solves the big computational hog and I can do some real work in a meaningful timeframe as a result.
Thank you to everyone on this thread for all the suggestions.
From: William Dunlap [mailto:wdunlap at tibco.com]
Sent: Tuesday, November 08, 2016 5:14 PM
To: Doran, Harold <HDoran at air.org>
Cc: peter dalgaard <pdalgd at gmail.com>; r-help at r-project.org; Fox, John <jfox at mcmaster.ca>
Subject: Re: [R] Alternative to apply in base R
The version which allows any number of columns does not take
much more time than the one that requires exactly 7 columns.
If you have a zillion columns then these are not so good.
> f1 <- function(x) x[,1]*x[,2]*x[,3]*x[,4]*x[,5]*x[,6]*x[,7]
> f2 <- function(x) {
+ val <- rep(1, nrow(x))
+ for(i in seq_len(ncol(x))) {
+ val <- val * x[,i]
+ }
+ val
+ }
> z <- matrix(runif(10e6 * 7), ncol=7)
> system.time(v1 <- f1(z))
user system elapsed
0.686 0.140 0.826
> system.time(v2 <- f2(z))
user system elapsed
0.663 0.196 0.860
> all.equal(v1,v2,tolerance=0)
[1] TRUE
You might speed up f2 a tad by special-casing the ncol==0,
ncol==1, and ncol>1 cases.
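A minimal sketch of that special-casing, assuming a numeric matrix as input. The name f3 is hypothetical and does not appear elsewhere in this thread; whether it is measurably faster than f2 will depend on the machine and data.

```r
# Hypothetical variant of f2 that treats ncol == 0, ncol == 1 and ncol > 1
# separately, so the initial multiplication by a vector of ones is skipped.
f3 <- function(x) {
  nc <- ncol(x)
  if (nc == 0L) {
    return(rep(1, nrow(x)))      # empty product is 1 for every row
  }
  val <- x[, 1]                  # start from the first column
  if (nc > 1L) {
    for (i in 2:nc) {
      val <- val * x[, i]        # one vectorized multiply per remaining column
    }
  }
  val
}

# Small usage example
m <- matrix(c(1, 2, 3, 4, 5, 6), ncol = 3)
f3(m)
```

The only saving over f2 is one allocation of rep(1, nrow(x)) and one vector multiplication, so any gain is likely to be small when there are several columns.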
The versions that call prod() nrow(x) times take about 25 seconds
on this machine and dataset.
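For anyone who wants to reproduce the comparison on a smaller scale, here is a rough sketch. The names f_apply and z_small are made up for this illustration, the matrix is 100 times smaller than the one timed above, and the timings will differ from machine to machine.

```r
# Column-wise loop from this thread: ncol(x) vectorized multiplications.
f2 <- function(x) {
  val <- rep(1, nrow(x))
  for (i in seq_len(ncol(x))) {
    val <- val * x[, i]
  }
  val
}

# Row-wise alternative (hypothetical name): calls prod() once per row.
f_apply <- function(x) apply(x, 1, prod)

set.seed(42)
z_small <- matrix(runif(1e5 * 7), ncol = 7)   # assumed smaller test matrix

t_loop  <- system.time(v_loop  <- f2(z_small))
t_apply <- system.time(v_apply <- f_apply(z_small))

# Show user, system and elapsed times side by side
print(rbind(loop = t_loop[1:3], apply = t_apply[1:3]))

# The two results should agree up to floating-point rounding
all.equal(v_loop, v_apply)
```

The loop does its work in a handful of vectorized operations, while apply() makes one R-level function call per row, which is where the difference comes from as nrow(x) grows.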
Bill Dunlap
TIBCO Software
wdunlap tibco.com
On Tue, Nov 8, 2016 at 1:58 PM, Doran, Harold <HDoran at air.org> wrote:
Well, I wish R-help had a “like” button as I would most certainly like
this reply :)
As usual, you’re right. I should have added a disclaimer that “in this
instance” there are 7 columns as the function I wrote evaluates an
N-dimensional integral and so as the dimensions change, so does the number
of columns in this matrix (plus another factor). But the number of columns
is never all that large.
On 11/8/16, 4:37 PM, "peter dalgaard" <pdalgd at gmail.com> wrote:
>
>> On 08 Nov 2016, at 21:23 , Doran, Harold <HDoran at air.org> wrote:
>>
>> It's a good suggestion. Multiplication in this case is over 7 columns in
>> the data, but the number of rows is millions. Unfortunately, the values
>> are negative as these are actually gauss-quad nodes used to evaluate a
>> multidimensional integral.
>
>If there really are only 7 cols, then there's also the blindingly obvious
>
>mm[,1]*mm[,2]*mm[,3]*mm[,4]*mm[,5]*mm[,6]*mm[,7]
>
>-pd
>
>
>>
>> colSums is better than something like apply(dat, 2, sum); I was hoping
>> there was something similar to colSums/rowSums using prod().
>>
>> On 11/8/16, 3:00 PM, "Fox, John" <jfox at mcmaster.ca> wrote:
>>
>>> Dear Harold,
>>>
>>> If the actual data with which you're dealing are non-negative, you could
>>> log all the values, and use colSums() on the logs. That might also have
>>> the advantage of greater numerical accuracy than multiplying millions of
>>> numbers. Depending on the numbers, the products may be too large or small
>>> to be represented. Of course, logs won't work with your toy example,
>>> where rnorm() will generate values that are both negative and positive.
>>>
>>> I hope this helps,
>>> John
>>> -----------------------------
>>> John Fox, Professor
>>> McMaster University
>>> Hamilton, Ontario
>>> Canada L8S 4M4
>>> web: http://socserv.mcmaster.ca/jfox
>>>
>>>
>>> ________________________________________
>>> From: R-help [r-help-bounces at r-project.org] on behalf of Doran, Harold
>>> [HDoran at air.org]
>>> Sent: November 8, 2016 10:57 AM
>>> To: r-help at r-project.org
>>> Subject: [R] Alternative to apply in base R
>>>
>>> Without reaching out to another package in R, I wonder what the best way
>>> is to speed up the following toy example? Over the years I have
>>> become very comfortable with the family of apply functions and am
>>> generally not good at finding an improvement for speed.
>>>
>>> This toy example is small, but my real data has many millions of rows and
>>> the same operation is repeated many times, so finding a less
>>> expensive alternative would be helpful.
>>>
>>> mm <- matrix(rnorm(100), ncol = 10)
>>> rn <- apply(mm, 1, prod)
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>--
>Peter Dalgaard, Professor,
>Center for Statistics, Copenhagen Business School
>Solbjerg Plads 3, 2000 Frederiksberg, Denmark
>Phone: (+45)38153501
>Office: A 4.23
>Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com