[R] Very slow using S4 classes
    Martin Morgan 
    mtmorgan at fhcrc.org
       
    Sat Sep 10 19:18:11 CEST 2011
    
    
  
On 09/10/2011 08:08 AM, André Rossi wrote:
> Hi everybody!
>
> I'm creating an object of an S4 class that has two slots: ListExamples, which
> is a list, and idx, which is an integer (see the code below).
>
> Then I read a data.frame from a file with 10000 (ten thousand) rows and 10
> columns, do some pre-processing and, basically, store each row as an
> element of a list in the slot ListExamples of the S4 object. However, many
> operations after this take a considerable time.
>
> Can anyone explain why this happens? Is it possible to speed up a
> script that deals with a large amount of data (it might be a data.frame or
> a list)?
>
> Thank you,
>
> André Rossi
>
> setClass("Buffer",
>      representation=representation(
>          Listexamples = "list",
>          idx = "integer"
>      )
> )
Hi André,
Can you provide a simpler and more reproducible example? For instance:
 > setClass("Buf", representation=representation(lst="list"))
[1] "Buf"
 > b=new("Buf", lst=replicate(10000, list(10), simplify=FALSE))
 > system.time({ b@lst[[1]][[1]] = 2 })
    user  system elapsed
   0.005   0.000   0.005
Generally it sounds like you're modeling the rows as elements of 
ListExamples, but you're better served by modeling the columns (lst = 
replicate(10, integer(10000), simplify=FALSE), if all of your 10 columns 
were integer-valued, for instance). Also, S4 is providing some measure of 
type safety, and you're undermining that by having your class contain a 
'list'. I'd go for something like
setClass("Buffer",
          representation=representation(
            col1="integer",
            col2="character",
            col3="numeric"
            ## etc.
            ),
          validity=function(object) {
              nms <- slotNames(object)
              len <- sapply(nms, function(nm) length(slot(object, nm)))
              if (1L != length(unique(len)))
                  "slots must all be of same length"
              else TRUE
          })
Buffer <-
     function(col1, col2, col3, ...)
{
     new("Buffer", col1=col1, col2=col2, col3=col3, ...)
}
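As a rough sketch of how the column-oriented class might be used (the data.frame `df` and its contents are hypothetical stand-ins for the file being read, and the class is repeated only so the snippet runs on its own):

```r
library(methods)  # attached by default in interactive R; harmless if repeated

# Same class and constructor as above, repeated so this runs stand-alone.
setClass("Buffer",
         representation=representation(
           col1="integer",
           col2="character",
           col3="numeric"),
         validity=function(object) {
             len <- sapply(slotNames(object),
                           function(nm) length(slot(object, nm)))
             if (1L != length(unique(len)))
                 "slots must all be of same length"
             else TRUE
         })

Buffer <- function(col1, col2, col3, ...)
    new("Buffer", col1=col1, col2=col2, col3=col3, ...)

# Hypothetical data standing in for the 10000-row file.
df <- data.frame(col1=1:10000,
                 col2=rep(c("a", "b"), 5000),
                 col3=as.numeric(1:10000) / 2,
                 stringsAsFactors=FALSE)

# One slot per column instead of one list element per row.
buf <- Buffer(df$col1, df$col2, df$col3)

# Updating a single value modifies one atomic vector.
buf@col3[1] <- 2

# Columns of unequal length are rejected by the validity method;
# the error message is captured here rather than stopping the script.
bad <- tryCatch(Buffer(1:3, c("a", "b"), c(1, 2, 3)),
                error=function(e) conditionMessage(e))
```

Here `bad` ends up holding the error message from the failed construction, which includes the string returned by the validity function.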
Let's see where the inefficiencies are before deciding that this is an 
S4 issue.
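One way to start looking, sketched under the assumption that the slow operations are element-by-element updates (the two-column layout and the names `rows` and `cols` are made up for illustration), is to time the same update on a row-wise list, on a column-wise vector, and in vectorized form, with no S4 involved:

```r
n <- 10000L

# Hypothetical row-wise layout: one small list per row.
rows <- replicate(n, list(id=0L, value=0), simplify=FALSE)

# Column-wise layout: one atomic vector per column.
cols <- list(id=integer(n), value=numeric(n))

# Time the same logical update in three ways.
t_rows <- system.time(for (i in seq_len(n)) rows[[i]]$value <- i / 2)
t_cols <- system.time(for (i in seq_len(n)) cols$value[i] <- i / 2)
t_vec  <- system.time(cols$value <- seq_len(n) / 2)

# Elapsed seconds for each variant; the numbers depend on machine and R version.
elapsed <- sapply(list(rows=t_rows, cols=t_cols, vectorized=t_vec),
                  function(t) t[["elapsed"]])
print(elapsed)
```

If the plain-list version is already slow, the class system is probably not the culprit. Rprof() from the utils package can give a finer breakdown of where the time goes in the real script.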
Martin
-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793
    
    