Wed Mar 28 19:43:28 CEST 2001

Dear R listers --

The program below does the following tasks:

1.  It creates a file (wintemp4) that is a subset of alldata4 consisting of
"winner" records in 50 industry groups (about 5400 obs);

2.  It defines a function (myppr1) that runs the ppr function in modreg
once to generate goodness-of-fit measures (sum of squared errors) for each
number of terms included in the model, and then reruns ppr using the number
of terms with the lowest sum of squared errors;

3.  It grinds through a loop, subsetting wintemp4 by group and running
myppr1 for each group subset; and

4.  It puts the ppr output into a separate vector element for each group
(in an attempt to avoid "growing" the vector).
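
To make step 2 concrete, here is a minimal sketch of the two-pass idea on
R's built-in rock data rather than alldata4.  The formula, the rescaled
columns (area1, peri1) and the cap max_terms = 5 are assumptions chosen for
illustration only; they are not the model used in my program.  In current
versions of R, ppr is found in the stats package rather than modreg.

```r
# Illustrative only: built-in 'rock' data, not the alldata4 data from this post.
rock1 <- within(rock, {
  area1 <- area / 10000   # rescale so the predictors are on similar scales
  peri1 <- peri / 10000
})
max_terms <- 5            # assumed cap, suitable for this small data set

# Pass 1: start from nterms = 1 so that gofn holds the residual sum of
# squares for every model size from 1 to max_terms.
fit1 <- ppr(log(perm) ~ area1 + peri1 + shape, data = rock1,
            nterms = 1, max.terms = max_terms, optlevel = 3)

# Pick the number of terms with the smallest residual sum of squares.
best <- unname(which.min(fit1$gofn[fit1$gofn > 0]))

# Pass 2: refit using that number of terms.
fit2 <- ppr(log(perm) ~ area1 + peri1 + shape, data = rock1,
            nterms = best, max.terms = max_terms, optlevel = 3)
print(best)
```

Because pass 1 uses nterms = 1, every entry of gofn is positive, so the
position returned by which.min is the same as the number of terms.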

I am using R version 1.2.2 in Emacs/ESS on Win98 with 256mb RAM.

I have two questions; I would be most grateful for any help the list can offer.

A.  This program *seems* to take a long time.  I have been careful to free
as much memory as I can, and the gc()'s seem to help avoid using the
swapfile and to keep available system resources above 90%.  Is there
anything else I can do to make the program more efficient?
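
One rough way to put a number on "a long time" would be to time a single
group's fit with system.time before running the whole loop.  The sketch
below uses made-up data and a hypothetical stand-in function (slow_fit) in
place of myppr1, so it shows only the timing pattern, not my actual model.

```r
# Hypothetical stand-in for one call to myppr1(); swap in the real call.
slow_fit <- function(d) lm(y ~ x, data = d)

# Made-up data, for illustration only.
set.seed(1)
d <- data.frame(x = rnorm(1000))
d$y <- 2 * d$x + rnorm(1000)

# system.time() reports user, system and elapsed seconds for the expression.
timing <- system.time(fit <- slow_fit(d))
print(timing[["elapsed"]])
```

Multiplying the elapsed time for one typical group by the number of groups
gives a crude estimate of how long the full loop should take.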

B.  I say "seems" because after running the program for an hour, I typed
ctl-G to quit.  The *R* session seemed to be terminated, with about 40 or
so groups processed, so I opened up another R session to try to see what
had happened.  After I quit the second session, suddenly the first session
seemed to come back to life and spit out the printed output for the rest of
the groups!  So I wonder if there is something I need to add to my program
to "force" it to finish processing?  (I apologize for the inarticulate way
I am posing this question!)

Thanks in advance.

#Here is the program
for(i in 1:4) gc()
wintemp4 <- subset(alldata4, 1 <= group & group <= 50 & winner==1)
for(i in 1:4) gc()

myppr1 <- function(x) {
    # Run ppr once to get the sum of squared errors for each number of terms.
    # NOTE: the model formula (first argument to ppr) is not shown here.
    pprfile.ppr <- ppr(
        data=x, nterms=1, max.terms=min(nrow(x),40), optlevel=3)
    # Pick the number of terms giving the best fit.
    numterm <- which.min(pprfile.ppr$gofn[pprfile.ppr$gofn>0])
    # Rerun ppr with that number of terms.
    pprfile.ppr <- ppr(
        data=x, nterms=numterm, max.terms=min(nrow(x),40), optlevel=3)
    cat("group =", x$group[1],"\n")
    cat("NAIC =", x$naic4[1],"\n")
    cat("cendiv =", as.character(x$cendiv[1]),"\n")
    cat("number of obs used =", nrow(x),"\n")
    # Return the fit so the caller can store it.
    pprfile.ppr
}

grouparr <- levels(as.factor(wintemp4$group))
pprest <- vector(mode="list",length=length(grouparr))

for(i in seq(along=grouparr)) {
    subi <- subset(wintemp4, group==grouparr[i])
    # Only fit groups with more than 40 observations.
    if(nrow(subi) > 40) pprest[[i]] <- myppr1(subi)
}
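
Steps 3 and 4 (subset by group, then store each result in its own
preallocated slot) could also be sketched with split, which builds all the
group subsets in one pass instead of calling subset inside the loop.  The
data frame toy and the function fit_one below are made up for illustration
and stand in for wintemp4 and myppr1; whether this is actually faster on my
data is something I have not measured.

```r
# Made-up data standing in for wintemp4; 'group' and 'y' are illustrative names.
set.seed(42)
toy <- data.frame(group = rep(1:3, times = c(50, 10, 60)), y = rnorm(120))

# Hypothetical stand-in for myppr1(): returns a simple per-group summary.
fit_one <- function(d) mean(d$y)

# split() returns a named list with one data frame per group.
by_group <- split(toy, toy$group)

# Preallocate one list slot per group, then fill each slot with [[i]].
results <- vector(mode = "list", length = length(by_group))
names(results) <- names(by_group)
for (i in seq_along(by_group)) {
  # Groups with 40 or fewer rows are skipped and keep their NULL slot.
  if (nrow(by_group[[i]]) > 40) results[[i]] <- fit_one(by_group[[i]])
}
```

One thing to watch with this pattern: assigning NULL to results[[i]] removes
that element from the list, so the function being called must always return
a non-NULL value.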


