[R] Memory filling up while looping
Patrick Burns
pburns at pburns.seanet.com
Fri Dec 21 18:33:09 CET 2012
Circle 2 of 'The R Inferno' may help you.
http://www.burns-stat.com/pages/Tutor/R_inferno.pdf
In particular, it has an example of how to do what
Duncan suggested.
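For instance, Circle 2 is largely about growing objects. A minimal
sketch of the pre-allocation pattern it recommends (the object names
here are purely illustrative, not taken from the thread):

n   <- 100
res <- vector("list", n)                 # create the container once, at full size
for (i in seq_len(n)) {
    res[[i]] <- c(id = i, result = i^2)  # stand-in for one file's result
}
out <- do.call(rbind, res)               # combine once, after the loop

The point is that the result object is built once at the end rather
than being rebuilt by rbind() on every iteration.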
Pat
On 21/12/2012 15:27, Peter Meißner wrote:
> Here is a working example that reproduces the behavior by creating 1000
> XML files and then parsing them.
>
> On my PC, R starts with about 90 MB of RAM; with every cycle another
> 10-12 MB are added to the RAM usage, so I end up at about 200 MB of RAM
> usage.
>
> In the real code one chunk cycle eats about 800 MB of RAM, which was one
> of the reasons I decided to split up the process into separate chunks in
> the first place.
>
> ----------------
> 'Minimal'Example - START
> ----------------
>
> # the general problem
> require(XML)
>
> chunk <- function(x, chunksize){
>   # source: http://stackoverflow.com/a/3321659/1144966
>   x2 <- seq_along(x)
>   split(x, ceiling(x2/chunksize))
> }
>
>
>
> chunky <- chunk(paste("test",1:1000,".xml",sep=""),100)
>
> for(i in 1:1000){
>   writeLines(c(paste('<?xml version="1.0"?>\n <note>\n  <to>Tove</to>\n  <nr>',
>                      i, '</nr>\n  <from>Jani</from>\n  <heading>Reminder</heading>\n ',
>                      sep=""),
>                paste(rep('<body>Do not forget me this weekend!</body>\n',
>                          sample(1:10, 1)), sep=""),
>                ' </note>'),
>              paste("test", i, ".xml", sep=""))
> }
>
> for(k in 1:length(chunky)){
>   gc()
>   print(chunky[[k]])
>   xmlCatcher <- NULL
>
>   for(i in 1:length(chunky[[k]])){
>     filename   <- chunky[[k]][i]
>     xml        <- xmlTreeParse(filename)
>     xml        <- xmlRoot(xml)
>     result     <- sapply(getNodeSet(xml, "//body"), xmlValue)
>     id         <- sapply(getNodeSet(xml, "//nr"), xmlValue)
>     dummy      <- cbind(id, result)
>     xmlCatcher <- rbind(xmlCatcher, dummy)
>   }
>   save(xmlCatcher, file = paste("xmlCatcher", k, ".RData", sep = ""))
> }
>
> ----------------
> 'Minimal'Example - END
> ----------------
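>
> As a rough sketch (not the code used above), the body of the for(k ...)
> loop could instead parse to internal nodes, free each document explicitly,
> collect the rows in a pre-sized list, and bind them once per chunk:
>
> rows <- vector("list", length(chunky[[k]]))
> for(i in seq_along(chunky[[k]])){
>   doc       <- xmlParse(chunky[[k]][i])
>   rows[[i]] <- cbind(id     = xpathSApply(doc, "//nr",   xmlValue),
>                      result = xpathSApply(doc, "//body", xmlValue))
>   free(doc)                            # release the C-level document
> }
> xmlCatcher <- do.call(rbind, rows)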
>
>
>
> On 21.12.2012 15:14, jim holtman wrote:
>> Can you send either your actual script or the console output so I can
>> get an idea of how fast memory is growing? Also, at the end, can you
>> list the sizes of the objects in the workspace? Here is a function I
>> use to get the space used:
>>
>> my.ls <-
>> function (pos = 1, sorted = FALSE, envir = as.environment(pos))
>> {
>>     .result <- sapply(ls(envir = envir, all.names = TRUE),
>>         function(..x) object.size(eval(as.symbol(..x), envir = envir)))
>>     if (length(.result) == 0)
>>         return("No objects to list")
>>     if (sorted) {
>>         .result <- rev(sort(.result))
>>     }
>>     .ls <- as.data.frame(rbind(as.matrix(.result), `**Total` = sum(.result)))
>>     names(.ls) <- "Size"
>>     .ls$Size <- formatC(.ls$Size, big.mark = ",", digits = 0, format = "f")
>>     .ls$Class <- c(unlist(lapply(rownames(.ls)[-nrow(.ls)],
>>         function(x) class(eval(as.symbol(x), envir = envir))[1L])), "-------")
>>     .ls$Length <- c(unlist(lapply(rownames(.ls)[-nrow(.ls)],
>>         function(x) length(eval(as.symbol(x), envir = envir)))), "-------")
>>     .ls$Dim <- c(unlist(lapply(rownames(.ls)[-nrow(.ls)],
>>         function(x) paste(dim(eval(as.symbol(x), envir = envir)),
>>             collapse = " x "))), "-------")
>>     .ls
>> }
>>
>>
>> which gives output like this:
>>
>>> my.ls()
>>                  Size       Class  Length     Dim
>> .Last             736    function       1
>> .my.env.jph        28 environment      39
>> x                 424     integer     100
>> y              40,024     integer   10000
>> z           4,000,024     integer 1000000
>> **Total     4,041,236     -------  ------- -------
>>
>>
>> On Fri, Dec 21, 2012 at 8:03 AM, Peter Meißner
>> <peter.meissner at uni-konstanz.de> wrote:
>>> Thanks for your answer,
>>>
>>> Yes, I tried 'gc()'; it did not change the behavior.
>>>
>>> best, Peter
>>>
>>>
>>> On 21.12.2012 13:37, jim holtman wrote:
>>>>
>>>> Have you tried putting calls to 'gc' at the top of the first loop to
>>>> make sure memory is reclaimed? You can print the result of the call to
>>>> 'gc' to see how fast memory use is growing.
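>>>>
>>>> For example (a minimal sketch, using the toy 'chunk' list from your post):
>>>>
>>>> chunk <- list(1:10, 11:20, 21:30)
>>>> for (k in seq_along(chunk)) {
>>>>     print(gc())   # triggers a collection and reports current memory use
>>>>     ## ... rest of the chunk processing ...
>>>> }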
>>>>
>>>> On Thu, Dec 20, 2012 at 6:26 PM, Peter Meissner
>>>> <peter.meissner at uni-konstanz.de> wrote:
>>>>>
>>>>> Hey,
>>>>>
>>>>> I have a double loop like this:
>>>>>
>>>>>
>>>>> chunk <- list(1:10, 11:20, 21:30)
>>>>> for(k in 1:length(chunk)){
>>>>>     print(chunk[k])
>>>>>     DummyCatcher <- NULL
>>>>>     for(i in chunk[k]){
>>>>>         print("i load something")
>>>>>         dummy <- 1
>>>>>         print("i do something")
>>>>>         dummy <- dummy + 1
>>>>>         print("i do put it together")
>>>>>         DummyCatcher = rbind(DummyCatcher, dummy)
>>>>>     }
>>>>>     print("i save a chunk and restart with another chunk of data")
>>>>> }
>>>>>
>>>>> The problem now is that with each 'chunk' cycle the memory used by R
>>>>> becomes bigger and bigger until it exceeds my RAM, but the RAM needed
>>>>> for any single chunk cycle is only about a fifth of what I have
>>>>> overall.
>>>>>
>>>>> Does somebody have an idea why this behaviour might occur? Note that
>>>>> all the objects (like 'DummyCatcher') are reused in every cycle, so I
>>>>> would assume that the RAM used should stay about the same after the
>>>>> first 'chunk' cycle.
>>>>>
>>>>>
>>>>> Best, Peter
>>>>>
>>>>>
>>>>> SystemInfo:
>>>>>
>>>>> R version 2.15.2 (2012-10-26)
>>>>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>>>> Win7 Enterprise, 8 GB RAM
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>>
>>>>
>>>>
>>>
>>> --
>>> Peter Meißner
>>> Workgroup 'Comparative Parliamentary Politics'
>>> Department of Politics and Administration
>>> University of Konstanz
>>> Box 216
>>> 78457 Konstanz
>>> Germany
>>>
>>> +49 7531 88 5665
>>> http://www.polver.uni-konstanz.de/sieberer/home/
>>
>>
>>
>
--
Patrick Burns
pburns at pburns.seanet.com
twitter: @portfolioprobe
http://www.portfolioprobe.com/blog
http://www.burns-stat.com
(home of 'Some hints for the R beginner'
and 'The R Inferno')