[R] Group averages

David Kling klingd at reed.edu
Mon Jun 12 23:19:39 CEST 2006


I hope none of you will mind helping a newbie.  I'm a student research 
assistant working with a large data set in which observations are 
categorized according to two factors.  For each observation, I'm trying 
to calculate the group mean and variance of a variable (called 'hsgpa' 
in the example data presented below), excluding that observation itself.  
For example, if there are 20 observations with the same values of the 
two factors, for each of the 20 I'd like to generate the mean and 
variance of the 'hsgpa' values of the other 19 group members.  This must 
be done for every observation in the data set.

I've searched the R mail archives, read the manuals, and read 
documentation for tapply() and by() as well as summaryBy() in the 'doBy' 
package and with() from 'Hmisc.'  It may be that since I'm new to 
writing functions and R is the first language I've ever worked with I'm 
less able to come up with a solution than some other new R users.  None 
of the functions I have tried have been successful, and it doesn't seem 
worth it to reproduce and explain my best effort.  I hope someone has 
some ideas!  Looking at what an experienced user would try should help 
me with my present task as well as future problems.

Below I've included some lines that will generate a sample data set 
similar to the one I'm working with:

#Example data:
# 5000 unique case IDs, a GPA-like variable, and the two grouping factors
case <- sample(seq(1,10000,1),5000,replace=FALSE)
hsgpa <- rbeta(5000,7,1.5)*4.25
yr <- sample(seq(1993,2005,1),5000,replace=TRUE)
conf <- sample(letters[1:5],5000,replace=TRUE)
data <- data.frame(case=case,hsgpa=hsgpa,yr=yr,conf=conf)
data$conf <- as.character(data$conf)
# Set hsgpa to NA in 500 randomly chosen rows
s1 <- sample(seq(1,5000,1),500,replace=FALSE)
k <- data$hsgpa
k[row.names(data) %in% s1] <- NA
data$hsgpa <- k
# Set both grouping factors (yr and conf) to NA in 100 randomly chosen rows
s2 <- sample(seq(1,5000,1),100,replace=FALSE)
k <- data$yr
k[row.names(data) %in% s2] <- NA
data$yr <- k
k <- data$conf
k[row.names(data) %in% s2] <- NA
data$conf <- k
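
To make the target calculation concrete, here is a minimal sketch of one 
possible brute-force approach on a small toy data frame.  The names 
loo_stats() and 'toy' are hypothetical and not part of any package.  The 
sketch assumes that missing 'hsgpa' values among the other group members 
should simply be skipped, and that rows with a missing grouping factor 
should get NA for both results; either assumption may need changing.

```r
# Leave-one-out group mean and variance (hypothetical helper, base R only).
# x: numeric variable; g1, g2: the two grouping factors.
loo_stats <- function(x, g1, g2) {
  n <- length(x)
  loo_mean <- rep(NA_real_, n)
  loo_var  <- rep(NA_real_, n)
  # interaction() yields NA wherever either factor is NA, and split()
  # drops those rows, so they keep NA results (an assumption of this sketch).
  grp <- interaction(g1, g2, drop = TRUE)
  idx_by_group <- split(seq_len(n), grp)
  for (idx in idx_by_group) {
    for (i in idx) {
      others <- x[setdiff(idx, i)]        # all other members of the group
      others <- others[!is.na(others)]    # skip missing values
      if (length(others) >= 1) loo_mean[i] <- mean(others)
      if (length(others) >= 2) loo_var[i]  <- var(others)
    }
  }
  data.frame(loo_mean = loo_mean, loo_var = loo_var)
}

# Toy data: one group of three, one group of two (one value missing),
# and one row with a missing grouping factor.
toy <- data.frame(
  hsgpa = c(3.0, 3.5, 4.0, 2.0, NA, 3.0),
  yr    = c(2000, 2000, 2000, 2001, 2001, NA),
  conf  = c("a", "a", "a", "b", "b", "a"),
  stringsAsFactors = FALSE
)
res <- loo_stats(toy$hsgpa, toy$yr, toy$conf)
print(cbind(toy, res))
```

On the example data above, the same helper could be applied as 
cbind(data, loo_stats(data$hsgpa, data$yr, data$conf)).  The double loop 
is slow but easy to check by hand; a vectorized version based on group 
sums and counts would be faster for very large data sets.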
