[R] Memory filling up while looping
Peter Meißner
peter.meissner at uni-konstanz.de
Fri Dec 21 16:27:41 CET 2012
Here is a working example that reproduces the behavior by creating 1000
XML files and then parsing them.
On my PC, R starts out using about 90 MB of RAM; with every chunk cycle
another 10-12 MB are added, so by the end I am at about 200 MB of RAM
usage.
In the real code one chunk cycle eats about 800 MB of RAM, which was one
of the reasons I decided to split up the process into separate chunks in
the first place.
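A quick way to watch the per-cycle growth from within R is to print the
gc() summary at the top of the outer loop; a minimal sketch (chunk_ids is
just a hypothetical stand-in for the real chunks used in the example
below):

chunk_ids <- 1:10                  # stand-in for the chunks built below
for(k in chunk_ids){
  cat("starting chunk", k, "\n")
  print(gc())                      # Ncells/Vcells still in use after collection
  # ... parse and collect the files of chunk k here ...
}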
----------------
'Minimal'Example - START
----------------
# the general problem
require(XML)
chunk <- function(x, chunksize){
  # source: http://stackoverflow.com/a/3321659/1144966
  x2 <- seq_along(x)
  split(x, ceiling(x2/chunksize))
}
chunky <- chunk(paste("test",1:1000,".xml",sep=""),100)
for(i in 1:1000){
  writeLines(c(paste('<?xml version="1.0"?>\n <note>\n <to>Tove</to>\n <nr>',
                     i,
                     '</nr>\n <from>Jani</from>\n <heading>Reminder</heading>\n ',
                     sep=""),
               paste(rep('<body>Do not forget me this weekend!</body>\n',
                         sample(1:10, 1)), sep=""),
               ' </note>'),
             paste("test", i, ".xml", sep=""))
}
for(k in 1:length(chunky)){
  gc()
  print(chunky[[k]])
  xmlCatcher <- NULL
  for(i in 1:length(chunky[[k]])){
    filename   <- chunky[[k]][i]
    xml        <- xmlTreeParse(filename)
    xml        <- xmlRoot(xml)
    result     <- sapply(getNodeSet(xml, "//body"), xmlValue)
    id         <- sapply(getNodeSet(xml, "//nr"), xmlValue)
    dummy      <- cbind(id, result)
    xmlCatcher <- rbind(xmlCatcher, dummy)
  }
  save(xmlCatcher, file=paste("xmlCatcher", k, ".RData", sep=""))
}
----------------
'Minimal'Example - END
----------------
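For reference, here is a variant of the parsing loop that might keep the
memory flatter: it collects the rows of each chunk in a pre-allocated list
(instead of growing xmlCatcher with rbind() on every file) and parses with
xmlParse(), i.e. into an internal C-level document, which is then released
with free(). This is only a sketch under the assumption that free()
actually returns the libxml memory; I have not verified that it removes
the growth:

require(XML)

for(k in 1:length(chunky)){
  rows <- vector("list", length(chunky[[k]]))     # pre-allocate one slot per file
  for(i in 1:length(chunky[[k]])){
    filename  <- chunky[[k]][i]
    doc       <- xmlParse(filename)               # internal (C-level) document
    result    <- sapply(getNodeSet(doc, "//body"), xmlValue)
    id        <- sapply(getNodeSet(doc, "//nr"), xmlValue)
    rows[[i]] <- cbind(id, result)
    free(doc)                                     # release the C-level tree
    rm(doc)
  }
  xmlCatcher <- do.call(rbind, rows)              # bind once per chunk
  save(xmlCatcher, file=paste("xmlCatcher", k, ".RData", sep=""))
  rm(xmlCatcher, rows)
  gc()
}

Binding once per chunk with do.call(rbind, rows) also avoids the repeated
copying that rbind() inside the loop causes, although that is a speed
rather than a leak issue.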
On 21.12.2012 15:14, jim holtman wrote:
> Can you send either your actual script or the console output so I can
> get an idea of how fast memory is growing? Also, at the end, can you
> list the sizes of the objects in the workspace? Here is a function I
> use to get the space:
>
> my.ls <-
> function (pos = 1, sorted = FALSE, envir = as.environment(pos))
> {
>     .result <- sapply(ls(envir = envir, all.names = TRUE),
>         function(..x) object.size(eval(as.symbol(..x), envir = envir)))
>     if (length(.result) == 0)
>         return("No objects to list")
>     if (sorted) {
>         .result <- rev(sort(.result))
>     }
>     .ls <- as.data.frame(rbind(as.matrix(.result), `**Total` = sum(.result)))
>     names(.ls) <- "Size"
>     .ls$Size <- formatC(.ls$Size, big.mark = ",", digits = 0,
>         format = "f")
>     .ls$Class <- c(unlist(lapply(rownames(.ls)[-nrow(.ls)],
>         function(x) class(eval(as.symbol(x), envir = envir))[1L])), "-------")
>     .ls$Length <- c(unlist(lapply(rownames(.ls)[-nrow(.ls)],
>         function(x) length(eval(as.symbol(x), envir = envir)))), "-------")
>     .ls$Dim <- c(unlist(lapply(rownames(.ls)[-nrow(.ls)],
>         function(x) paste(dim(eval(as.symbol(x), envir = envir)),
>             collapse = " x "))), "-------")
>     .ls
> }
>
>
> which gives output like this:
>
>> my.ls()
>                   Size       Class  Length     Dim
> .Last              736    function       1
> .my.env.jph         28 environment      39
> x                  424     integer     100
> y               40,024     integer   10000
> z            4,000,024     integer 1000000
> **Total      4,041,236     -------  ------- -------
>
>
> On Fri, Dec 21, 2012 at 8:03 AM, Peter Meißner
> <peter.meissner at uni-konstanz.de> wrote:
>> Thanks for your answer,
>>
>> yes, I tried 'gc()'; it did not change the behavior.
>>
>> best, Peter
>>
>>
>> On 21.12.2012 13:37, jim holtman wrote:
>>>
>>> have you tried putting calls to 'gc' at the top of the first loop to
>>> make sure memory is reclaimed? You can print the call to 'gc' to see
>>> how fast it is growing.
>>>
>>> On Thu, Dec 20, 2012 at 6:26 PM, Peter Meissner
>>> <peter.meissner at uni-konstanz.de> wrote:
>>>>
>>>> Hey,
>>>>
>>>> I have a double loop like this:
>>>>
>>>>
>>>> chunk <- list(1:10, 11:20, 21:30)
>>>> for(k in 1:length(chunk)){
>>>>     print(chunk[[k]])
>>>>     DummyCatcher <- NULL
>>>>     for(i in chunk[[k]]){
>>>>         print("i load something")
>>>>         dummy <- 1
>>>>         print("i do something")
>>>>         dummy <- dummy + 1
>>>>         print("i do put it together")
>>>>         DummyCatcher <- rbind(DummyCatcher, dummy)
>>>>     }
>>>>     print("i save a chunk and restart with another chunk of data")
>>>> }
>>>>
>>>> The problem now is that with each 'chunk' cycle the memory used by R
>>>> grows and grows until it exceeds my RAM, although any single chunk
>>>> cycle on its own needs only about a fifth of what I have overall.
>>>>
>>>> Does somebody have an idea why this behaviour might occur? Note that
>>>> all the objects (like 'DummyCatcher') are reused in every cycle, so I
>>>> would assume that the RAM used should stay about the same after the
>>>> first 'chunk' cycle.
>>>>
>>>>
>>>> Best, Peter
>>>>
>>>>
>>>> SystemInfo:
>>>>
>>>> R version 2.15.2 (2012-10-26)
>>>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>>> Win7 Enterprise, 8 GB RAM
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>>
>>>
>>
>> --
>> Peter Meißner
>> Workgroup 'Comparative Parliamentary Politics'
>> Department of Politics and Administration
>> University of Konstanz
>> Box 216
>> 78457 Konstanz
>> Germany
>>
>> +49 7531 88 5665
>> http://www.polver.uni-konstanz.de/sieberer/home/
>
>
>
--
Peter Meißner
Workgroup 'Comparative Parliamentary Politics'
Department of Politics and Administration
University of Konstanz
Box 216
78457 Konstanz
Germany
+49 7531 88 5665
http://www.polver.uni-konstanz.de/sieberer/home/