[R] Memory filling up while looping
Peter Meißner
peter.meissner at uni-konstanz.de
Fri Dec 21 18:41:22 CET 2012
Yeah, thanks,
I know: !DO NOT USE RBIND!
But even using a predefined list to store the results, as suggested
there, does not help.
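For reference, the preallocation pattern suggested there looks roughly like this (a sketch with dummy per-item work, not my actual code):

```r
n <- 5
results <- vector("list", n)      # preallocate the list once, outside the loop
for (i in seq_len(n)) {
  # stand-in for the real per-file work
  results[[i]] <- data.frame(id = i, result = i^2)
}
out <- do.call(rbind, results)    # bind everything together exactly once
```

This avoids growing an object inside the loop, but as said, it does not change the behaviour here.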
The problem seems to stem from the XML package and not from the way I
store the data until it is saved.
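As far as I understand, documents parsed by the XML package into internal (C-level) nodes are not reclaimed by R's garbage collector alone and have to be released explicitly. A minimal sketch of what I mean (assuming xmlParse() and free() from the XML package):

```r
library(XML)

f <- tempfile(fileext = ".xml")
writeLines('<?xml version="1.0"?><note><nr>1</nr><body>Do not forget me!</body></note>', f)

doc <- xmlParse(f)                               # internal (C-level) document
nr  <- sapply(getNodeSet(doc, "//nr"), xmlValue) # extract what is needed first
free(doc)                                        # then release the C-level tree explicitly
rm(doc)
invisible(gc())                                  # and let R clean up the handle
```
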
Best, Peter
Am 21.12.2012 18:33, schrieb Patrick Burns:
> Circle 2 of 'The R Inferno' may help you.
>
> http://www.burns-stat.com/pages/Tutor/R_inferno.pdf
>
> In particular, it has an example of how to do what
> Duncan suggested.
>
> Pat
>
>
> On 21/12/2012 15:27, Peter Meißner wrote:
>> Here is a working example that reproduces the behaviour by creating 1000
>> xml-files and afterwards parsing them.
>>
>> On my PC, R starts with about 90 MB of RAM; with every cycle another
>> 10-12 MB are added to the RAM usage, so I end up with about 200 MB of
>> RAM used.
>>
>> In the real code one chunk-cycle eats about 800 MB of RAM, which was one
>> of the reasons I decided to split up the process into separate chunks in
>> the first place.
>>
>> ----------------
>> 'Minimal' Example - START
>> ----------------
>>
>> # the general problem
>> require(XML)
>>
>> chunk <- function(x, chunksize){
>>   # source: http://stackoverflow.com/a/3321659/1144966
>>   x2 <- seq_along(x)
>>   split(x, ceiling(x2/chunksize))
>> }
>>
>>
>>
>> chunky <- chunk(paste("test",1:1000,".xml",sep=""),100)
>>
>> for(i in 1:1000){
>>   writeLines(c(paste('<?xml version="1.0"?>\n <note>\n <to>Tove</to>\n <nr>',
>>                      i, '</nr>\n <from>Jani</from>\n <heading>Reminder</heading>\n ',
>>                      sep=""),
>>                paste(rep('<body>Do not forget me this weekend!</body>\n',
>>                          sample(1:10, 1)), sep=""),
>>                ' </note>'),
>>              paste("test", i, ".xml", sep=""))
>> }
>>
>> for(k in 1:length(chunky)){
>>   gc()
>>   print(chunky[[k]])
>>   xmlCatcher <- NULL
>>
>>   for(i in 1:length(chunky[[k]])){
>>     filename <- chunky[[k]][i]
>>     xml    <- xmlTreeParse(filename)
>>     xml    <- xmlRoot(xml)
>>     result <- sapply(getNodeSet(xml, "//body"), xmlValue)
>>     id     <- sapply(getNodeSet(xml, "//nr"), xmlValue)
>>     dummy  <- cbind(id, result)
>>     xmlCatcher <- rbind(xmlCatcher, dummy)
>>   }
>>   save(xmlCatcher, file=paste("xmlCatcher", k, ".RData", sep=""))
>> }
>>
>> ----------------
>> 'Minimal' Example - END
>> ----------------
>>
>>
>>
>> Am 21.12.2012 15:14, schrieb jim holtman:
>>> Can you send either your actual script or the console output so I can
>>> get an idea of how fast memory is growing? Also, at the end, can you
>>> list the sizes of the objects in the workspace? Here is a function I
>>> use to get the sizes:
>>>
>>> my.ls <- function (pos = 1, sorted = FALSE, envir = as.environment(pos))
>>> {
>>>   .result <- sapply(ls(envir = envir, all.names = TRUE),
>>>                     function(..x) object.size(eval(as.symbol(..x),
>>>                                                    envir = envir)))
>>>   if (length(.result) == 0)
>>>     return("No objects to list")
>>>   if (sorted) {
>>>     .result <- rev(sort(.result))
>>>   }
>>>   .ls <- as.data.frame(rbind(as.matrix(.result), `**Total` = sum(.result)))
>>>   names(.ls) <- "Size"
>>>   .ls$Size <- formatC(.ls$Size, big.mark = ",", digits = 0, format = "f")
>>>   .ls$Class <- c(unlist(lapply(rownames(.ls)[-nrow(.ls)],
>>>                                function(x) class(eval(as.symbol(x),
>>>                                                       envir = envir))[1L])),
>>>                  "-------")
>>>   .ls$Length <- c(unlist(lapply(rownames(.ls)[-nrow(.ls)],
>>>                                 function(x) length(eval(as.symbol(x),
>>>                                                         envir = envir)))),
>>>                   "-------")
>>>   .ls$Dim <- c(unlist(lapply(rownames(.ls)[-nrow(.ls)],
>>>                              function(x) paste(dim(eval(as.symbol(x),
>>>                                                         envir = envir)),
>>>                                                collapse = " x "))),
>>>                "-------")
>>>   .ls
>>> }
>>>
>>>
>>> which gives output like this:
>>>
>>>> my.ls()
>>>                  Size       Class  Length     Dim
>>> .Last             736    function       1
>>> .my.env.jph        28 environment      39
>>> x                 424     integer     100
>>> y              40,024     integer   10000
>>> z           4,000,024     integer 1000000
>>> **Total     4,041,236     ------- ------- -------
>>>
>>>
>>> On Fri, Dec 21, 2012 at 8:03 AM, Peter Meißner
>>> <peter.meissner at uni-konstanz.de> wrote:
>>>> Thanks for your answer,
>>>>
>>>> yes, I tried 'gc()'; it did not change the behaviour.
>>>>
>>>> best, Peter
>>>>
>>>>
>>>> Am 21.12.2012 13:37, schrieb jim holtman:
>>>>>
>>>>> Have you tried putting calls to 'gc' at the top of the first loop to
>>>>> make sure memory is reclaimed? You can print the result of the call
>>>>> to 'gc' to see how fast memory is growing.
>>>>>
>>>>> On Thu, Dec 20, 2012 at 6:26 PM, Peter Meissner
>>>>> <peter.meissner at uni-konstanz.de> wrote:
>>>>>>
>>>>>> Hey,
>>>>>>
>>>>>> I have a double loop like this:
>>>>>>
>>>>>>
>>>>>> chunk <- list(1:10, 11:20, 21:30)
>>>>>> for(k in 1:length(chunk)){
>>>>>>   print(chunk[[k]])
>>>>>>   DummyCatcher <- NULL
>>>>>>   for(i in chunk[[k]]){
>>>>>>     print("i load something")
>>>>>>     dummy <- 1
>>>>>>     print("i do something")
>>>>>>     dummy <- dummy + 1
>>>>>>     print("i put it together")
>>>>>>     DummyCatcher <- rbind(DummyCatcher, dummy)
>>>>>>   }
>>>>>>   print("i save a chunk and restart with another chunk of data")
>>>>>> }
>>>>>>
>>>>>> The problem now is that with each 'chunk' cycle the memory used by R
>>>>>> becomes bigger and bigger until it exceeds my RAM, although the RAM
>>>>>> needed for any single chunk-cycle is only about a fifth of what I
>>>>>> have overall.
>>>>>>
>>>>>> Does somebody have an idea why this behaviour might occur? Note that
>>>>>> all the objects (like 'DummyCatcher') are reused in every cycle, so I
>>>>>> would assume that the RAM used should stay about the same after the
>>>>>> first 'chunk' cycle.
>>>>>>
>>>>>>
>>>>>> Best, Peter
>>>>>>
>>>>>>
>>>>>> SystemInfo:
>>>>>>
>>>>>> R version 2.15.2 (2012-10-26)
>>>>>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>>>>> Win7 Enterprise, 8 GB RAM
>>>>>>
>>>>>> ______________________________________________
>>>>>> R-help at r-project.org mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide
>>>>>> http://www.R-project.org/posting-guide.html
>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>> --
>>>> Peter Meißner
>>>> Workgroup 'Comparative Parliamentary Politics'
>>>> Department of Politics and Administration
>>>> University of Konstanz
>>>> Box 216
>>>> 78457 Konstanz
>>>> Germany
>>>>
>>>> +49 7531 88 5665
>>>> http://www.polver.uni-konstanz.de/sieberer/home/
>>>
>>>
>>>
>>
>
--
Peter Meißner
Workgroup 'Comparative Parliamentary Politics'
Department of Politics and Administration
University of Konstanz
Box 216
78457 Konstanz
Germany
+49 7531 88 5665
http://www.polver.uni-konstanz.de/sieberer/home/