[R] Memory filling up while looping
jim holtman
jholtman at gmail.com
Fri Dec 21 21:23:18 CET 2012
I ran your code and did not see any growth:
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  463828 24.8     818163 43.7   818163 43.7
Vcells  546318  4.2    1031040  7.9   909905  7.0
1 (1) - eval : <33.6 376.6> 376.6 : 48.9MB
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  471049 25.2     818163 43.7   818163 43.7
Vcells  544105  4.2    1031040  7.9   909905  7.0
2 (1) - eval : <35.9 379.2> 379.2 : 48.7MB
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  479520 25.7     818163 43.7   818163 43.7
Vcells  543882  4.2    1031040  7.9   909905  7.0
3 (1) - eval : <38.0 381.4> 381.4 : 48.7MB
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  488376 26.1     818163 43.7   818163 43.7
Vcells  544191  4.2    1031040  7.9   909905  7.0
4 (1) - eval : <40.0 383.4> 383.4 : 48.8MB
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  496695 26.6     818163 43.7   818163 43.7
Vcells  543971  4.2    1031040  7.9   909905  7.0
5 (1) - eval : <42.0 385.4> 385.4 : 48.7MB
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  505562 27.0     899071 48.1   818163 43.7
Vcells  544034  4.2    1031040  7.9   909905  7.0
6 (1) - eval : <44.1 387.5> 387.5 : 48.8MB
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  513896 27.5     899071 48.1   899071 48.1
Vcells  543973  4.2    1031040  7.9   909905  7.0
7 (1) - eval : <46.2 389.8> 389.8 : 52.5MB
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  523203 28.0     899071 48.1   899071 48.1
Vcells  544751  4.2    1031040  7.9   909905  7.0
8 (1) - eval : <48.5 392.2> 392.2 : 46.7MB
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  531519 28.4     899071 48.1   899071 48.1
Vcells  544418  4.2    1031040  7.9   909905  7.0
9 (1) - eval : <50.6 394.5> 394.5 : 47.3MB
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  539556 28.9     899071 48.1   899071 48.1
Vcells  544057  4.2    1031040  7.9   909905  7.0
10 (1) - eval : <52.6 396.6> 396.6 : 47.8MB
It started out at about 48M and ended at 47M. This is with:
R version 2.15.2 (2012-10-26) -- "Trick or Treat"
Copyright (C) 2012 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-w64-mingw32/x64 (64-bit)
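
For reference, a minimal sketch of how such a trace can be produced (not
my exact harness -- the numbered timing lines above come from a separate
wrapper function I use) is to print the result of gc() at the top of each
chunk:

for(k in 1:length(chunky)){
    print(gc())   # force a collection and report memory use before each chunk
    # ... process chunky[[k]] as in your example below ...
}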
On Fri, Dec 21, 2012 at 10:27 AM, Peter Meißner
<peter.meissner at uni-konstanz.de> wrote:
> Here is a working example that reproduces the behaviour by creating 1000
> XML files and then parsing them.
>
> On my PC, R starts out using about 90 MB of RAM; with every cycle another
> 10-12 MB gets added, so I end up with about 200 MB of RAM usage.
>
> In the real code one chunk-cycle eats about 800 MB of RAM, which was one
> of the reasons I decided to split up the process into separate chunks in
> the first place.
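>
> One way to put a number on the growth per cycle -- a minimal sketch, and
> Windows-only since I'm on Win7 -- is to log memory.size() at the top of
> the chunk loop:
>
> for(k in 1:length(chunky)){
>   cat("chunk", k, ": memory.size =", memory.size(), "MB\n")
>   # ... parse the chunk as in the example below ...
> }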
>
> ----------------
> 'Minimal' Example - START
> ----------------
>
> # the general problem
> require(XML)
>
> chunk <- function(x, chunksize){
>   # source: http://stackoverflow.com/a/3321659/1144966
>   x2 <- seq_along(x)
>   split(x, ceiling(x2/chunksize))
> }
>
>
>
> chunky <- chunk(paste("test",1:1000,".xml",sep=""),100)
>
> for(i in 1:1000){
>   writeLines(
>     c(paste('<?xml version="1.0"?>\n <note>\n <to>Tove</to>\n <nr>', i,
>             '</nr>\n <from>Jani</from>\n <heading>Reminder</heading>\n ',
>             sep=""),
>       paste(rep('<body>Do not forget me this weekend!</body>\n',
>                 sample(1:10, 1)), sep=""),
>       ' </note>'),
>     paste("test", i, ".xml", sep=""))
> }
>
> for(k in 1:length(chunky)){
>   gc()
>   print(chunky[[k]])
>   xmlCatcher <- NULL
>
>   for(i in 1:length(chunky[[k]])){
>     filename <- chunky[[k]][i]
>     xml      <- xmlTreeParse(filename)
>     xml      <- xmlRoot(xml)
>     result   <- sapply(getNodeSet(xml, "//body"), xmlValue)
>     id       <- sapply(getNodeSet(xml, "//nr"), xmlValue)
>     dummy    <- cbind(id, result)
>     xmlCatcher <- rbind(xmlCatcher, dummy)
>   }
>   # sep="" so the files are named xmlCatcher1.RData etc.
>   save(xmlCatcher, file=paste("xmlCatcher", k, ".RData", sep=""))
> }
>
> ----------------
> 'Minimal' Example - END
> ----------------
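>
> A side note on the pattern itself: growing xmlCatcher with rbind() inside
> the inner loop copies the whole object on every iteration. A common
> alternative -- just a sketch of the same loop, untested here -- is to
> collect the per-file results in a list and bind them once per chunk:
>
> for(k in 1:length(chunky)){
>   pieces <- vector("list", length(chunky[[k]]))
>   for(i in 1:length(chunky[[k]])){
>     xml <- xmlRoot(xmlTreeParse(chunky[[k]][i]))
>     pieces[[i]] <- cbind(id     = sapply(getNodeSet(xml, "//nr"), xmlValue),
>                          result = sapply(getNodeSet(xml, "//body"), xmlValue))
>   }
>   xmlCatcher <- do.call(rbind, pieces)   # a single rbind per chunk
>   save(xmlCatcher, file=paste("xmlCatcher", k, ".RData", sep=""))
> }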
>
>
>
> On 21.12.2012 15:14, jim holtman wrote:
>
>> Can you send either your actual script or the console output so I can
>> get an idea of how fast memory is growing? Also, at the end, can you
>> list the sizes of the objects in the workspace? Here is a function I
>> use to get the space used:
>>
>> my.ls <- function(pos = 1, sorted = FALSE, envir = as.environment(pos))
>> {
>>     # size in bytes of every object in the environment
>>     .result <- sapply(ls(envir = envir, all.names = TRUE),
>>         function(..x) object.size(eval(as.symbol(..x), envir = envir)))
>>     if (length(.result) == 0)
>>         return("No objects to list")
>>     if (sorted) {
>>         .result <- rev(sort(.result))
>>     }
>>     # add a grand total and format as a data frame
>>     .ls <- as.data.frame(rbind(as.matrix(.result), `**Total` = sum(.result)))
>>     names(.ls) <- "Size"
>>     .ls$Size <- formatC(.ls$Size, big.mark = ",", digits = 0,
>>         format = "f")
>>     .ls$Class <- c(unlist(lapply(rownames(.ls)[-nrow(.ls)],
>>         function(x) class(eval(as.symbol(x), envir = envir))[1L])),
>>         "-------")
>>     .ls$Length <- c(unlist(lapply(rownames(.ls)[-nrow(.ls)],
>>         function(x) length(eval(as.symbol(x), envir = envir)))),
>>         "-------")
>>     .ls$Dim <- c(unlist(lapply(rownames(.ls)[-nrow(.ls)],
>>         function(x) paste(dim(eval(as.symbol(x), envir = envir)),
>>             collapse = " x "))), "-------")
>>     .ls
>> }
>>
>>
>> which gives output like this:
>>
>>> my.ls()
>>
>>                  Size       Class  Length     Dim
>> .Last             736    function       1
>> .my.env.jph        28 environment      39
>> x                 424     integer     100
>> y              40,024     integer   10000
>> z           4,000,024     integer 1000000
>> **Total     4,041,236     -------  ------- -------
>>
>>
>> On Fri, Dec 21, 2012 at 8:03 AM, Peter Meißner
>> <peter.meissner at uni-konstanz.de> wrote:
>>>
>>> Thanks for your answer,
>>>
>>> yes, I tried 'gc()'; it did not change the behavior.
>>>
>>> best, Peter
>>>
>>>
>>> On 21.12.2012 13:37, jim holtman wrote:
>>>>
>>>>
>>>> Have you tried putting calls to 'gc' at the top of the first loop to
>>>> make sure memory is reclaimed? You can print the result of calling
>>>> 'gc' to see how fast memory use is growing.
>>>>
>>>> On Thu, Dec 20, 2012 at 6:26 PM, Peter Meissner
>>>> <peter.meissner at uni-konstanz.de> wrote:
>>>>>
>>>>>
>>>>> Hey,
>>>>>
>>>>> I have a double loop like this:
>>>>>
>>>>>
>>>>> chunk <- list(1:10, 11:20, 21:30)
>>>>> for(k in 1:length(chunk)){
>>>>>     print(chunk[[k]])
>>>>>     DummyCatcher <- NULL
>>>>>     for(i in chunk[[k]]){    # [[k]] so i steps through the elements
>>>>>         print("i load something")
>>>>>         dummy <- 1
>>>>>         print("i do something")
>>>>>         dummy <- dummy + 1
>>>>>         print("i do put it together")
>>>>>         DummyCatcher <- rbind(DummyCatcher, dummy)
>>>>>     }
>>>>>     print("i save a chunk and restart with another chunk of data")
>>>>> }
>>>>>
>>>>> The problem now is that with each 'chunk' cycle the memory used by R
>>>>> grows and grows until it exceeds my RAM, even though any single
>>>>> chunk cycle alone needs only about a fifth of what I have overall.
>>>>>
>>>>> Does somebody have an idea why this behaviour might occur? Note that
>>>>> all the objects (like 'DummyCatcher') are reused in every cycle, so
>>>>> I would assume that the RAM used should stay about the same after
>>>>> the first 'chunk' cycle.
>>>>>
>>>>>
>>>>> Best, Peter
>>>>>
>>>>>
>>>>> SystemInfo:
>>>>>
>>>>> R version 2.15.2 (2012-10-26)
>>>>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>>>> Win7 Enterprise, 8 GB RAM
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>> --
>>> Peter Meißner
>>> Workgroup 'Comparative Parliamentary Politics'
>>> Department of Politics and Administration
>>> University of Konstanz
>>> Box 216
>>> 78457 Konstanz
>>> Germany
>>>
>>> +49 7531 88 5665
>>> http://www.polver.uni-konstanz.de/sieberer/home/
>>
>
> --
> Peter Meißner
> Workgroup 'Comparative Parliamentary Politics'
> Department of Politics and Administration
> University of Konstanz
> Box 216
> 78457 Konstanz
> Germany
>
> +49 7531 88 5665
> http://www.polver.uni-konstanz.de/sieberer/home/
--
Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.