[R] R runtime performance and memory usage
Martin Maechler
maechler at stat.math.ethz.ch
Tue Nov 17 18:49:41 CET 2015
>>>>> William Dunlap <wdunlap at tibco.com>
>>>>> on Mon, 16 Nov 2015 16:01:42 -0800 writes:
> If a quick running time is important and your models involve only
> numeric data with no missing values and you are willing to spend more
> programming time setting things up, the lsfit() function may work
> better for you.
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
or even faster is the extra-simple but fast .lm.fit() function
(in R >= 3.1.0).
I've written a small demo about it and published it here,
http://rpubs.com/maechler/fast_lm
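
For readers who just want to try this locally, here is a rough sketch (not the code from the demo linked above) comparing lm(), lsfit() and .lm.fit() on simulated data. The sample size of 250K comes from this thread; the data-generating model, the seed and all variable names are assumptions made up for illustration, and timings will differ from machine to machine.

```r
# Simulated stand-in for the poster's data: 250K rows, one numeric predictor.
# (Assumption: the real data are not shown anywhere in the thread.)
set.seed(1)
n <- 250000
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)

# .lm.fit() takes a design matrix, not a formula, and does not add an
# intercept, so the constant column is bound on by hand.
X <- cbind(Intercept = 1, x = x)

t_lm    <- system.time(fit_lm    <- lm(y ~ x))
t_lsfit <- system.time(fit_lsfit <- lsfit(x, y))   # adds an intercept by default
t_lmfit <- system.time(fit_lmfit <- .lm.fit(X, y))

# All three should agree on the coefficients up to rounding error.
coefs <- rbind("lm"      = unname(coef(fit_lm)),
               "lsfit"   = unname(fit_lsfit$coefficients),
               ".lm.fit" = unname(fit_lmfit$coefficients))
print(coefs)

# Elapsed seconds for each approach.
print(c("lm"      = t_lm[["elapsed"]],
        "lsfit"   = t_lsfit[["elapsed"]],
        ".lm.fit" = t_lmfit[["elapsed"]]))
```

Note that lsfit() and .lm.fit() return plain lists rather than "lm" objects, so summary(), predict() and friends are not available on their results; that is the price of the speed.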
Martin Maechler, ETH Zurich (and R Core)
> On Mon, Nov 16, 2015 at 3:25 PM, Sasikumar Kandhasamy <ckmsasi at gmail.com> wrote:
>> Thanks a lot Bill & Bert.
>>
>> Hi Bill,
>>
>> Sorry, I was wrong about the number of records; actually, I am using
>> two-dimensional data of 250K records each. And regarding CPU usage, it was
>> the elapsed time. In fact, I have pinned one core to run R.
>>
>> Thanks & Regards
>> Sasi
>>
>> On Mon, Nov 16, 2015 at 2:04 PM, William Dunlap <wdunlap at tibco.com> wrote:
>>>
>>> You cannot do a linear regression with one column of data - there must
>>> be at least one response column and one predictor. By default, lm
>>> throws in a constant term which gives you a second predictor. If your
>>> predictor is categorical, you get a new column for all but the first
>>> unique value in it.
>>>
>>> lm() deals only with double precision data, at 8 bytes/number. Thus
>>> 250k numbers occupy 2 million bytes. Your three columns (in the
>>> non-categorical-predictor case) take up 6 million bytes.
>>>
>>> lm()'s output contains several columns the size of the response
>>> variable: residuals, effects, and fitted.values. It also contains the
>>> QR decomposition of the design matrix (the size of all the predictor
>>> columns together).
>>>
>>> There are also some temporary variables generated in the course of the
>>> computation.
>>>
>>> So your observed 40 MB memory usage seems reasonable.
>>>
>>> Use the object.size() function to see how big objects are and str() to
>>> look at their structure.
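
A small sketch of that size check, on simulated data of the size discussed here (the data and variable names are assumptions for illustration, not the poster's actual data):

```r
# Simulated data: 250K doubles per column, as in the thread.
set.seed(1)
n <- 250000
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)

# A double vector needs 8 bytes per element plus a small header.
print(object.size(x))

fit <- lm(y ~ x)

# Total size of the fitted object.
print(object.size(fit), units = "MB")

# Size in bytes of each component, largest first.
sizes <- sapply(fit, object.size)
print(sort(sizes, decreasing = TRUE))

# Top-level structure only, to keep the output short.
str(fit, max.level = 1)
```

Besides residuals, effects, fitted.values and qr, the object returned by lm() also keeps a copy of the model frame by default (argument model = TRUE), so the fit is considerably larger than the input data.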
>>>
>>> My laptop with a 2.5 GHz Intel i7 processor takes a quarter second to
>>> fit a simple linear model with one numeric predictor and a constant
>>> term. 6 seconds sounds slow. Is that cpu or elapsed time (use
>>> system.time() to see)?
>>>
>>>
>>>
>>> Bill Dunlap
>>> TIBCO Software
>>> wdunlap tibco.com
>>>
>>>
>>> On Mon, Nov 16, 2015 at 12:25 PM, Sasikumar Kandhasamy
>>> <ckmsasi at gmail.com> wrote:
>>> > Hi All,
>>> >
>>> > I have a couple of questions about R run-time performance. I have an
>>> > R-3.2.2 package compiled for MIPS64 and am running it on my Linux
>>> > machine with a mips64 processor (core speed 1.5GHz), and I am observing
>>> > the following behaviors:
>>> >
>>> > 1. Applying a "linear regression model" (lm) to 1MB of data (1
>>> > column of 250K records) takes ~6 seconds to complete. Is this
>>> > expected behavior? If not, can you please share any suggestions or
>>> > options to improve it?
>>> >
>>> > 2. Also, the R process's runtime virtual memory increases by 40MB after
>>> > applying the linear model to the 1MB of data. Is this also expected
>>> > behavior? If it is, can you please share some insight into the memory
>>> > usage?
>>> >
>>> > Thanks in advance.
>>> >
>>> > Regards
>>> > Sasi
>>> >
>>> > [[alternative HTML version deleted]]
>>> >
>>> > ______________________________________________
>>> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> > https://stat.ethz.ch/mailman/listinfo/r-help
>>> > PLEASE do read the posting guide
>>> > http://www.R-project.org/posting-guide.html
>>> > and provide commented, minimal, self-contained, reproducible code.
>>
>>