[R] big speed difference in source btw. R 2.15.2 and R 3.0.2 ?
Heinz Tuechler
tuechler at gmx.at
Thu Oct 31 11:25:18 CET 2013
On 31.10.2013 09:12, Prof Brian Ripley wrote:
> On 30/10/2013 21:15, William Dunlap wrote:
>> I have to defer to others for policy declarations like how long
>> the current format used by load and save should be readable.
>
> You could also ask how long R will last ....
>
> R can still read (but not write) save() formats used in the 1990's. We
> would expect R to be able to read saves since R 1.0.0 for as long as R
> exists. And as R is Open Source, you would be able to compile it and
> dump the objects you want for as long as suitable compilers and OSes
> exist .... And of course R is not the only application which will read
> the format.
>
> There is no guarantee that source() will be able to parse dumps from
> earlier versions of R, and that has not always been true.
>
> People commenting on parse() speed should note the NEWS for R-devel:
>
> • The parser has been modified to use less memory.
>
>
Thank you for the hint.
It appears to me that source() in R-devel performs at about the same
speed as in R 2.15.2.
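
On the storage question raised further down in the thread, here is a
minimal sketch of a binary round trip, assuming saveRDS()/readRDS() are
acceptable in place of dump()/source(). The object name df0 follows the
example in the thread; the short word list and the "label" attribute are
made up for illustration. It only shows that data and attributes survive
the round trip within one R version and says nothing about long-term
format guarantees.

```r
# Build a small data frame with a per-variable attribute (the "label"
# attribute is hypothetical, for illustration only).
set.seed(37)
df0 <- data.frame(x = sample(c("source", "file", "URL"), 1000, replace = TRUE))
attr(df0$x, "label") <- "sampled words"

# Binary round trip through a temporary file.
tf <- tempfile(fileext = ".rds")
saveRDS(df0, file = tf)
df1 <- readRDS(tf)
unlink(tf)

# Check that data and attributes survived unchanged.
round_trip_ok <- identical(df0, df1)
label_kept <- identical(attr(df1$x, "label"), "sampled words")
print(c(round_trip_ok = round_trip_ok, label_kept = label_kept))
```

Unlike dump()/source(), this path does not go through the parser at all,
so the parse() timings discussed below should not affect it.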
>>
>> Bill Dunlap
>> Spotfire, TIBCO Software
>> wdunlap tibco.com
>>
>>
>>> -----Original Message-----
>>> From: Heinz Tuechler [mailto:tuechler at gmx.at]
>>> Sent: Wednesday, October 30, 2013 1:43 PM
>>> To: William Dunlap
>>> Cc: Carl Witthoft; r-help at r-project.org
>>> Subject: Re: [R] big speed difference in source btw. R 2.15.2 and R
>>> 3.0.2 ?
>>>
>>> Many thanks for confirming my impression. I use dump for storing large
>>> data.frames with a number of attributes for each variable. save/load is
>>> much faster, but I am unsure whether such files will be readable by R
>>> versions years later.
>>> What format/functions would you suggest for data storage/transfer
>>> between different (future) R versions?
>>>
>>> best regards,
>>> Heinz
>>>
>>> On 30.10.2013 20:11, William Dunlap wrote:
>>>> I see a big 2.15.2/3.0.2 speed difference in parse() (which is used by
>>>> source()) when it is parsing long vectors of numeric data. dump/source
>>>> has never been an efficient way of transferring data between different
>>>> R sessions, but it is much worse now for long vectors. In 2.15.2,
>>>> doubling the size of the vector (for lengths in the range 10^4 to 10^7)
>>>> makes the time to parse go up by a factor of c. 2.1. In 3.0.2 that
>>>> factor is more like 4.4.
>>>>
>>>>        n elapsed-2.15.2 elapsed-3.0.2
>>>>     2048          0.003         0.018
>>>>     4096          0.006         0.065
>>>>     8192          0.013         0.254
>>>>    16384          0.025         1.067
>>>>    32768          0.050         4.114
>>>>    65536          0.100        16.236
>>>>   131072          0.219        66.013
>>>>   262144          0.808       291.883
>>>>   524288          2.022      1285.265
>>>>  1048576          4.918            NA
>>>>  2097152          9.857            NA
>>>>  4194304         22.916            NA
>>>>  8388608         49.671            NA
>>>> 16777216        101.042            NA
>>>> 33554432        512.719            NA
>>>>
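
The growth factors quoted above can be checked directly from the table.
A small sketch, using only the timings listed in the table (the variable
and function names are chosen here for illustration). A ratio near 2 per
doubling of n is consistent with roughly linear time, a ratio near 4 with
roughly quadratic time.

```r
# Elapsed times copied from the table above (seconds); n doubles each row.
n <- 2^(11:25)
elapsed_2152 <- c(0.003, 0.006, 0.013, 0.025, 0.050, 0.100, 0.219, 0.808,
                  2.022, 4.918, 9.857, 22.916, 49.671, 101.042, 512.719)
elapsed_302 <- c(0.018, 0.065, 0.254, 1.067, 4.114, 16.236, 66.013,
                 291.883, 1285.265, rep(NA, 6))
stopifnot(length(n) == length(elapsed_2152),
          length(n) == length(elapsed_302))

# Ratio of each timing to the previous one, i.e. the cost of doubling n.
growth <- function(x) x[-1] / x[-length(x)]
ratio_2152 <- growth(elapsed_2152)
ratio_302 <- growth(elapsed_302)

# Median growth factor per doubling, ignoring the runs that did not finish.
med_2152 <- median(ratio_2152)
med_302 <- median(ratio_302, na.rm = TRUE)
print(round(c(R_2.15.2 = med_2152, R_3.0.2 = med_302), 2))
```

In these figures the 3.0.2 ratios sit close to 4 throughout, which is what
quadratic growth in n would produce, while the 2.15.2 ratios sit close to
2 for most rows.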
>>>> I tried this with 64-bit R on a Linux box. The NAs represent sizes that
>>>> did not finish while I was at a 1 1/2 hour dentist's appointment. The
>>>> timing function was:
>>>> test <- function(n = 2^(11:25))
>>>> {
>>>>     tf <- tempfile()
>>>>     on.exit(unlink(tf))
>>>>     t(sapply(n, function(n) {
>>>>         dput(log(seq_len(n)), file = tf)
>>>>         print(c(n = n, system.time(parse(file = tf))[1:3]))
>>>>     }))
>>>> }
>>>>
>>>> Bill Dunlap
>>>> Spotfire, TIBCO Software
>>>> wdunlap tibco.com
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: r-help-bounces at r-project.org
>>>>> [mailto:r-help-bounces at r-project.org] On Behalf Of Carl Witthoft
>>>>> Sent: Wednesday, October 30, 2013 5:29 AM
>>>>> To: r-help at r-project.org
>>>>> Subject: Re: [R] big speed difference in source btw. R 2.15.2 and R
>>>>> 3.0.2 ?
>>>>>
>>>>> Did you run the identical code on the identical machine, and did you
>>>>> verify there were no other tasks running which might have limited the
>>>>> RAM available to R? And equally important, did you run these tests in
>>>>> the reverse order (in case R was storing large objects from the first
>>>>> run, thus chewing up RAM)?
>>>>>
>>>>>
>>>>>
>>>>> Dear All,
>>>>>
>>>>> Is it known that source works much faster in R 2.15.2 than in R 3.0.2?
>>>>> In the example below I observe, e.g. for a data.frame with 10^7 rows,
>>>>> the following timings:
>>>>>
>>>>> R version 2.15.2 Patched (2012-11-29 r61184)
>>>>> length: 1e+07
>>>>> user system elapsed
>>>>> 62.04 0.22 62.26
>>>>>
>>>>> R version 3.0.2 Patched (2013-10-27 r64116)
>>>>> length: 1e+07
>>>>> user system elapsed
>>>>> 388.63 176.42 566.41
>>>>>
>>>>> Is there a way to speed R version 3.0.2 up to the performance of R
>>>>> version 2.15.2?
>>>>>
>>>>> best regards,
>>>>>
>>>>> Heinz Tüchler
>>>>>
>>>>>
>>>>> example:
>>>>> sessionInfo()
>>>>> sample.vec <-
>>>>>     c('source', 'causes', 'R', 'to', 'accept', 'its', 'input', 'from',
>>>>>       'the', 'named', 'file', 'or', 'URL', 'or', 'connection')
>>>>> dmp.size <- c(10^(1:7))
>>>>> set.seed(37)
>>>>>
>>>>> for(i in dmp.size) {
>>>>>     df0 <- data.frame(x=sample(sample.vec, i, replace=TRUE))
>>>>>     dump('df0', file='testdump')
>>>>>     cat('length:', i, '\n')
>>>>>     print(system.time(source('testdump', keep.source = FALSE,
>>>>>                              encoding='')))
>>>>> }
>>>>>
>>>>> output for R version 2.15.2 Patched (2012-11-29 r61184):
>>>>>> sessionInfo()
>>>>> R version 2.15.2 Patched (2012-11-29 r61184)
>>>>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>>>>
>>>>> locale:
>>>>> [1] LC_COLLATE=German_Switzerland.1252
>>>>> LC_CTYPE=German_Switzerland.1252
>>>>> [3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
>>>>> [5] LC_TIME=German_Switzerland.1252
>>>>>
>>>>> attached base packages:
>>>>> [1] stats graphics grDevices utils datasets methods base
>>>>>> sample.vec <-
>>>>> + c('source', 'causes', 'R', 'to', 'accept', 'its', 'input', 'from',
>>>>> 'the',
>>>>> + 'named', 'file', 'or', 'URL', 'or', 'connection')
>>>>>> dmp.size <- c(10^(1:7))
>>>>>> set.seed(37)
>>>>>>
>>>>>> for(i in dmp.size) {
>>>>> + df0 <- data.frame(x=sample(sample.vec, i, replace=TRUE))
>>>>> + dump('df0', file='testdump')
>>>>> + cat('length:', i, '\n')
>>>>> + print(system.time(source('testdump', keep.source = FALSE,
>>>>> + encoding='')))
>>>>> + }
>>>>> length: 10
>>>>> user system elapsed
>>>>> 0 0 0
>>>>> length: 100
>>>>> user system elapsed
>>>>> 0 0 0
>>>>> length: 1000
>>>>> user system elapsed
>>>>> 0 0 0
>>>>> length: 10000
>>>>> user system elapsed
>>>>> 0.02 0.00 0.01
>>>>> length: 1e+05
>>>>> user system elapsed
>>>>> 0.21 0.00 0.20
>>>>> length: 1e+06
>>>>> user system elapsed
>>>>> 4.47 0.04 4.51
>>>>> length: 1e+07
>>>>> user system elapsed
>>>>> 62.04 0.22 62.26
>>>>>>
>>>>>
>>>>>
>>>>> output for R version 3.0.2 Patched (2013-10-27 r64116):
>>>>>> sessionInfo()
>>>>> R version 3.0.2 Patched (2013-10-27 r64116)
>>>>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>>>>
>>>>> locale:
>>>>> [1] LC_COLLATE=German_Switzerland.1252
>>>>> LC_CTYPE=German_Switzerland.1252
>>>>> [3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
>>>>> [5] LC_TIME=German_Switzerland.1252
>>>>>
>>>>> attached base packages:
>>>>> [1] stats graphics grDevices utils datasets methods base
>>>>>> sample.vec <-
>>>>> + c('source', 'causes', 'R', 'to', 'accept', 'its', 'input', 'from',
>>>>> 'the',
>>>>> + 'named', 'file', 'or', 'URL', 'or', 'connection')
>>>>>> dmp.size <- c(10^(1:7))
>>>>>> set.seed(37)
>>>>>>
>>>>>> for(i in dmp.size) {
>>>>> + df0 <- data.frame(x=sample(sample.vec, i, replace=TRUE))
>>>>> + dump('df0', file='testdump')
>>>>> + cat('length:', i, '\n')
>>>>> + print(system.time(source('testdump', keep.source = FALSE,
>>>>> + encoding='')))
>>>>> + }
>>>>> length: 10
>>>>> user system elapsed
>>>>> 0 0 0
>>>>> length: 100
>>>>> user system elapsed
>>>>> 0 0 0
>>>>> length: 1000
>>>>> user system elapsed
>>>>> 0 0 0
>>>>> length: 10000
>>>>> user system elapsed
>>>>> 0.01 0.00 0.01
>>>>> length: 1e+05
>>>>> user system elapsed
>>>>> 0.36 0.06 0.42
>>>>> length: 1e+06
>>>>> user system elapsed
>>>>> 6.02 1.86 7.88
>>>>> length: 1e+07
>>>>> user system elapsed
>>>>> 388.63 176.42 566.41
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://r.789695.n4.nabble.com/big-speed-difference-in-source-btw-R-2-15-2-and-R-3-0-2-tp4679314p4679346.html
>>>>> Sent from the R help mailing list archive at Nabble.com.
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>
>>
>
>