[R] rbind and data.frame
Göran Broström
gb at stat.umu.se
Fri Dec 7 17:30:35 CET 2001
On Fri, 7 Dec 2001, Liaw, Andy wrote:
> Are you sure that the time difference is *only* in creating the data frame,
> rather than other computations in the loop?
Of course it depends on all the calculations in the loop, and there are a
lot of them (a lot of code). Here it is. Suggestions for improvements are
most welcome!
Göran
-----------------------------------------------------------------------
[...]
    ## We now have 'nn.out'. We next create an empty data frame 'dat.out':
    xx <- cbind(dat[1, , drop = FALSE], com.dat[1, , drop = FALSE])
    dat.out <- matrix(NA, ncol = ncol(xx), nrow = nn.out)
    dat.out <- data.frame(dat.out)
    names(dat.out) <- names(xx)
    dat.out <- rbind(xx, dat.out)[-1, ]
    ## And so we fill it!
    cur.row <- 0
    for (j in 1:nn){
        start.ind <- dat$bdate[j] + dat$enter[j]
        stopp.ind <- dat$bdate[j] + dat$exit[j]
        if ((start.ind < end.per) &&
            (stopp.ind > beg.per)){ ## We have a case!
            fixed.rec <- dat[j, , drop = FALSE]
            out.rec <- fixed.rec
            if (start.ind < beg.per){ ## start.ind < beg.per (A)
                if (stopp.ind > end.per){ ## stopp.ind > end.per (A1)
                    ##nn.out <- nn.out + n.years
                    out.rec$event <- 0
                    for (iv in 1:n.years){
                        cur.row <- cur.row + 1
                        out.rec$enter <- cuts[iv] - fixed.rec$bdate
                        out.rec$exit <- cuts[iv + 1] - fixed.rec$bdate
                        dat.out[cur.row, ] <-
                            cbind(out.rec, com.dat[iv, , drop = FALSE])
                    }
                }else{ ## stopp.ind <= end.per (A2)
                    last.iv <- 1
                    while ((last.iv <= n.years) &&
                           (stopp.ind > cuts[last.iv + 1])){
                        last.iv <- last.iv + 1
                    }
                    ##nn.out <- nn.out + last.iv
                    if (last.iv == 1){
                        cur.row <- cur.row + 1
                        out.rec$enter <- beg.per - fixed.rec$bdate
                        out.rec$exit <- fixed.rec$exit
                        out.rec$event <- fixed.rec$event
                        dat.out[cur.row, ] <-
                            cbind(out.rec, com.dat[1, , drop = FALSE])
                    }else{
                        out.rec$event <- 0
                        for (iv in 1:(last.iv - 1)){
                            cur.row <- cur.row + 1
                            out.rec$enter <- cuts[iv] - fixed.rec$bdate
                            out.rec$exit <- cuts[iv + 1] - fixed.rec$bdate
                            dat.out[cur.row, ] <-
                                cbind(out.rec, com.dat[iv, , drop = FALSE])
                        }
                        cur.row <- cur.row + 1
                        out.rec$event <- fixed.rec$event
                        out.rec$enter <- cuts[last.iv] - fixed.rec$bdate
                        out.rec$exit <- fixed.rec$exit
                        dat.out[cur.row, ] <-
                            cbind(out.rec, com.dat[last.iv, , drop = FALSE])
                    }
                }
            }else{ ## start.ind >= beg.per (B)
                first.iv <- 1
                while ((first.iv <= n.years) &&
                       (start.ind >= cuts[first.iv + 1])){
                    first.iv <- first.iv + 1
                }
                if (stopp.ind > end.per){ ## stopp.ind > end.per (B1)
                    ##nn.out <- nn.out + n.years - first.iv + 1
                    cur.row <- cur.row + 1
                    out.rec$event <- 0
                    out.rec$enter <- fixed.rec$enter
                    out.rec$exit <- cuts[first.iv + 1] - fixed.rec$bdate
                    dat.out[cur.row, ] <-
                        cbind(out.rec, com.dat[first.iv, , drop = FALSE])
                    if (first.iv < n.years){
                        for (iv in (first.iv + 1):n.years){
                            cur.row <- cur.row + 1
                            out.rec$enter <- cuts[iv] - fixed.rec$bdate
                            out.rec$exit <- cuts[iv + 1] - fixed.rec$bdate
                            dat.out[cur.row, ] <-
                                cbind(out.rec, com.dat[iv, , drop = FALSE])
                        }
                    }
                }else{ ## stopp.ind <= end.per (B2)
                    last.iv <- first.iv
                    while ((last.iv <= n.years) &&
                           (stopp.ind > cuts[last.iv + 1])){
                        last.iv <- last.iv + 1
                    }
                    ##nn.out <- nn.out + last.iv - first.iv + 1
                    if (last.iv == first.iv){
                        cur.row <- cur.row + 1
                        dat.out[cur.row, ] <-
                            cbind(out.rec, com.dat[first.iv, , drop = FALSE])
                    }else{
                        cur.row <- cur.row + 1
                        out.rec$event <- 0
                        out.rec$exit <- cuts[first.iv + 1] - fixed.rec$bdate
                        dat.out[cur.row, ] <-
                            cbind(out.rec, com.dat[first.iv, , drop = FALSE])
                        if (last.iv > (first.iv + 1)){
                            for (iv in (first.iv + 1):(last.iv - 1)){
                                cur.row <- cur.row + 1
                                out.rec$enter <- cuts[iv] - fixed.rec$bdate
                                out.rec$exit <- cuts[iv + 1] - fixed.rec$bdate
                                dat.out[cur.row, ] <-
                                    cbind(out.rec, com.dat[iv, , drop = FALSE])
                            }
                        }
                        cur.row <- cur.row + 1
                        out.rec$event <- fixed.rec$event
                        out.rec$enter <- cuts[last.iv] - fixed.rec$bdate
                        out.rec$exit <- fixed.rec$exit
                        dat.out[cur.row, ] <-
                            cbind(out.rec, com.dat[last.iv, , drop = FALSE])
                    }
                }
            }
        }
        cat("j = ", j, "cur.row = ", cur.row, "\n")
    }
    dat.out
}
-------------------------------------------------------------------------
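One possible direction, offered only as a minimal sketch and not as a
drop-in replacement for the function above: every
'dat.out[cur.row, ] <- cbind(...)' is a row assignment into a data frame,
which is a comparatively expensive operation in R, so the sketch fills one
preallocated atomic vector per column and builds the data frame once at the
end. It also replaces the A1/A2/B1/B2 case analysis with a direct overlap
test, which should give the same pieces but has not been checked against the
full function. The toy 'dat', 'com.dat' and 'cuts', the 'id' and 'year'
columns, and the helper vectors 'src' and 'iv.no' are assumptions made up
for illustration.

```r
## Hypothetical toy data; the column names follow the code above.
dat <- data.frame(id    = 1:3,
                  bdate = c(1800, 1801.5, 1803),
                  enter = c(0, 0, 0),
                  exit  = c(5, 1.25, 0.5),
                  event = c(1, 1, 0))
cuts    <- c(1801, 1802, 1803, 1804)  # beg.per = 1801, end.per = 1804
n.years <- length(cuts) - 1
com.dat <- data.frame(year = 1801:1803)  # one row per interval (assumed)

## Preallocate one atomic vector per output column (upper bound on rows).
max.rows <- nrow(dat) * n.years
src   <- integer(max.rows)  # row of 'dat' that each output row comes from
iv.no <- integer(max.rows)  # interval number of each output row
enter <- numeric(max.rows)
exit  <- numeric(max.rows)
event <- numeric(max.rows)

cur.row <- 0
for (j in seq_len(nrow(dat))) {
    start.ind <- dat$bdate[j] + dat$enter[j]
    stopp.ind <- dat$bdate[j] + dat$exit[j]
    for (iv in seq_len(n.years)) {
        lo <- max(start.ind, cuts[iv])      # start of the overlap
        hi <- min(stopp.ind, cuts[iv + 1])  # end of the overlap
        if (lo < hi) {                      # record j is at risk in interval iv
            cur.row <- cur.row + 1
            src[cur.row]   <- j
            iv.no[cur.row] <- iv
            enter[cur.row] <- lo - dat$bdate[j]
            exit[cur.row]  <- hi - dat$bdate[j]
            ## The event stays with the piece containing the original exit.
            event[cur.row] <-
                if (stopp.ind <= cuts[iv + 1]) dat$event[j] else 0
        }
    }
}

## Build the data frame once, from the filled part of the vectors.
keep <- seq_len(cur.row)
dat.out <- data.frame(dat[src[keep], c("id", "bdate")],
                      enter = enter[keep],
                      exit  = exit[keep],
                      event = event[keep],
                      year  = com.dat$year[iv.no[keep]])
rownames(dat.out) <- NULL
```

The loop structure is kept on purpose; only the storage changes. Whether
this alone closes the gap to the compiled version would have to be measured
on the real data.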
>
> Andy
>
> > -----Original Message-----
> > From: Göran Broström [mailto:gb at stat.umu.se]
> > Sent: Friday, December 07, 2001 7:25 AM
> > To: Prof Brian Ripley
> > Cc: r-help at stat.math.ethz.ch
> > Subject: Re: [R] rbind and data.frame
> >
> >
> > On Fri, 7 Dec 2001, Prof Brian Ripley wrote:
> >
> > > On Fri, 7 Dec 2001, Göran Broström wrote:
> > >
> > > > On Wed, 5 Dec 2001, Göran Broström wrote:
> > > >
> > > > [...]
> > > >
> > > > > My real problem is how to create a data frame in a sequentially
> > > > > growing manner, when I know the final size (number of cases). I
> > > > > want to avoid calling 'rbind' many times, and instead create an
> > > > > 'empty' data frame in one call, and then fill it. Are there better
> > > > > ways of doing this?
> > > >
> > > > Got no answer to this one, so I provide one myself:
> > >
> > > The usual answer is to create a data frame of the desired size and
> > > populate it via indexing. That's in some books I know!
> >
> > I know that book too (thanks!). I did what you suggest, and that took 7
> > hours to run. Definitely.
> >
> > Göran
> >
> > > >
> > > > The answer is: Yes, definitely. I did this, with pure R code, and
> > > > created a new data frame with around 58000 records. It took 7 hours
> > > > to run. I then did it with compiled code (Fortran), and that made a
> > > > slight difference: It took 4.8 seconds(!).
> > > >
> > > > Göran
--
Göran Broström tel: +46 90 786 5223
professor fax: +46 90 786 6614
Department of Statistics http://www.stat.umu.se/egna/gb/
Umeå University
SE-90187 Umeå, Sweden e-mail: gb at stat.umu.se
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._