[R] Memory filling up while looping
Patrick Burns
pburns at pburns.seanet.com
Fri Dec 21 18:33:09 CET 2012
Circle 2 of 'The R Inferno' may help you.
http://www.burns-stat.com/pages/Tutor/R_inferno.pdf
In particular, it has an example of how to do what
Duncan suggested.
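For instance, Circle 2 is largely about growing objects. A minimal
sketch of the pre-allocation pattern it recommends (the object names
here are purely illustrative, not taken from the thread):

n   <- 100
res <- vector("list", n)                 # create the container once, at full size
for (i in seq_len(n)) {
    res[[i]] <- c(id = i, result = i^2)  # stand-in for one file's result
}
out <- do.call(rbind, res)               # combine once, after the loop

The point is that the result object is built once at the end rather
than being rebuilt by rbind() on every iteration.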
Pat
On 21/12/2012 15:27, Peter Meißner wrote:
> Here is a working example that reproduces the behavior by creating 1000
> XML files and then parsing them.
>
> On my PC, R starts with about 90 MB of RAM; with every cycle another
> 10-12 MB are added to the RAM usage, so I end up at about 200 MB of RAM
> usage.
>
> In the real code one chunk cycle eats about 800 MB of RAM, which was one
> of the reasons I decided to split up the process into separate chunks in
> the first place.
>
> ----------------
> 'Minimal'Example - START
> ----------------
>
> # the general problem
> require(XML)
>
> chunk <- function(x, chunksize){
>   # source: http://stackoverflow.com/a/3321659/1144966
>   x2 <- seq_along(x)
>   split(x, ceiling(x2/chunksize))
> }
>
>
>
> chunky <- chunk(paste("test",1:1000,".xml",sep=""),100)
>
> for(i in 1:1000){
>   writeLines(c(paste('<?xml version="1.0"?>\n <note>\n  <to>Tove</to>\n  <nr>',
>                      i, '</nr>\n  <from>Jani</from>\n  <heading>Reminder</heading>\n ',
>                      sep=""),
>                paste(rep('<body>Do not forget me this weekend!</body>\n',
>                          sample(1:10, 1)), sep=""),
>                ' </note>'),
>              paste("test", i, ".xml", sep=""))
> }
>
> for(k in 1:length(chunky)){
>   gc()
>   print(chunky[[k]])
>   xmlCatcher <- NULL
>
>   for(i in 1:length(chunky[[k]])){
>     filename   <- chunky[[k]][i]
>     xml        <- xmlTreeParse(filename)
>     xml        <- xmlRoot(xml)
>     result     <- sapply(getNodeSet(xml, "//body"), xmlValue)
>     id         <- sapply(getNodeSet(xml, "//nr"), xmlValue)
>     dummy      <- cbind(id, result)
>     xmlCatcher <- rbind(xmlCatcher, dummy)
>   }
>   save(xmlCatcher, file = paste("xmlCatcher", k, ".RData", sep = ""))
> }
>
> ----------------
> 'Minimal'Example - END
> ----------------
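>
> As a rough sketch (not the code used above), the body of the for(k ...)
> loop could instead parse to internal nodes, free each document explicitly,
> collect the rows in a pre-sized list, and bind them once per chunk:
>
> rows <- vector("list", length(chunky[[k]]))
> for(i in seq_along(chunky[[k]])){
>   doc       <- xmlParse(chunky[[k]][i])
>   rows[[i]] <- cbind(id     = xpathSApply(doc, "//nr",   xmlValue),
>                      result = xpathSApply(doc, "//body", xmlValue))
>   free(doc)                            # release the C-level document
> }
> xmlCatcher <- do.call(rbind, rows)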
>
>
>
> On 21.12.2012 15:14, jim holtman wrote:
>> Can you send either your actual script or the console output so I can
>> get an idea of how fast memory is growing? Also, at the end, can you
>> list the sizes of the objects in the workspace? Here is a function I
>> use to get the space used:
>>
>> my.ls <-
>> function (pos = 1, sorted = FALSE, envir = as.environment(pos))
>> {
>>     .result <- sapply(ls(envir = envir, all.names = TRUE),
>>         function(..x) object.size(eval(as.symbol(..x), envir = envir)))
>>     if (length(.result) == 0)
>>         return("No objects to list")
>>     if (sorted) {
>>         .result <- rev(sort(.result))
>>     }
>>     .ls <- as.data.frame(rbind(as.matrix(.result), `**Total` = sum(.result)))
>>     names(.ls) <- "Size"
>>     .ls$Size <- formatC(.ls$Size, big.mark = ",", digits = 0, format = "f")
>>     .ls$Class <- c(unlist(lapply(rownames(.ls)[-nrow(.ls)],
>>         function(x) class(eval(as.symbol(x), envir = envir))[1L])), "-------")
>>     .ls$Length <- c(unlist(lapply(rownames(.ls)[-nrow(.ls)],
>>         function(x) length(eval(as.symbol(x), envir = envir)))), "-------")
>>     .ls$Dim <- c(unlist(lapply(rownames(.ls)[-nrow(.ls)],
>>         function(x) paste(dim(eval(as.symbol(x), envir = envir)),
>>             collapse = " x "))), "-------")
>>     .ls
>> }
>>
>>
>> which gives output like this:
>>
>>> my.ls()
>>                  Size       Class  Length     Dim
>> .Last             736    function       1
>> .my.env.jph        28 environment      39
>> x                 424     integer     100
>> y              40,024     integer   10000
>> z           4,000,024     integer 1000000
>> **Total     4,041,236     -------  ------- -------
>>
>>
>> On Fri, Dec 21, 2012 at 8:03 AM, Peter Meißner
>> <peter.meissner at uni-konstanz.de> wrote:
>>> Thanks for your answer,
>>>
>>> Yes, I tried 'gc()'; it did not change the behavior.
>>>
>>> best, Peter
>>>
>>>
>>> On 21.12.2012 13:37, jim holtman wrote:
>>>>
>>>> Have you tried putting calls to 'gc' at the top of the first loop to
>>>> make sure memory is reclaimed? You can print the result of the call to
>>>> 'gc' to see how fast memory use is growing.
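>>>>
>>>> For example (a minimal sketch, using the toy 'chunk' list from your post):
>>>>
>>>> chunk <- list(1:10, 11:20, 21:30)
>>>> for (k in seq_along(chunk)) {
>>>>     print(gc())   # triggers a collection and reports current memory use
>>>>     ## ... rest of the chunk processing ...
>>>> }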
>>>>
>>>> On Thu, Dec 20, 2012 at 6:26 PM, Peter Meissner
>>>> <peter.meissner at uni-konstanz.de> wrote:
>>>>>
>>>>> Hey,
>>>>>
>>>>> I have a double loop like this:
>>>>>
>>>>>
>>>>> chunk <- list(1:10, 11:20, 21:30)
>>>>> for(k in 1:length(chunk)){
>>>>>     print(chunk[k])
>>>>>     DummyCatcher <- NULL
>>>>>     for(i in chunk[k]){
>>>>>         print("i load something")
>>>>>         dummy <- 1
>>>>>         print("i do something")
>>>>>         dummy <- dummy + 1
>>>>>         print("i do put it together")
>>>>>         DummyCatcher = rbind(DummyCatcher, dummy)
>>>>>     }
>>>>>     print("i save a chunk and restart with another chunk of data")
>>>>> }
>>>>>
>>>>> The problem now is that with each 'chunk' cycle the memory used by R
>>>>> becomes bigger and bigger until it exceeds my RAM, but the RAM needed
>>>>> for any single chunk cycle is only about a fifth of what I have
>>>>> overall.
>>>>>
>>>>> Does somebody have an idea why this behaviour might occur? Note that
>>>>> all the objects (like 'DummyCatcher') are reused in every cycle, so I
>>>>> would assume that the RAM used should stay about the same after the
>>>>> first 'chunk' cycle.
>>>>>
>>>>>
>>>>> Best, Peter
>>>>>
>>>>>
>>>>> SystemInfo:
>>>>>
>>>>> R version 2.15.2 (2012-10-26)
>>>>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>>>> Win7 Enterprise, 8 GB RAM
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>>
>>>>
>>>>
>>>
>>> --
>>> Peter Meißner
>>> Workgroup 'Comparative Parliamentary Politics'
>>> Department of Politics and Administration
>>> University of Konstanz
>>> Box 216
>>> 78457 Konstanz
>>> Germany
>>>
>>> +49 7531 88 5665
>>> http://www.polver.uni-konstanz.de/sieberer/home/
>>
>>
>>
>
--
Patrick Burns
pburns at pburns.seanet.com
twitter: @portfolioprobe
http://www.portfolioprobe.com/blog
http://www.burns-stat.com
(home of 'Some hints for the R beginner'
and 'The R Inferno')