[R] aggregate data.frame by one column

Guo Wei-Wei wwguocn at gmail.com
Fri Jun 30 04:54:47 CEST 2006


Hi, everyone,

I have a data.frame named "eva" like this:

IND PARTNO VC1 EO1 EO2 EO3 EO4 EO5
114 114001   2   5   4   4   5   4
114 114001   2   4   4   4   4   4
114 114001   2   4  NA  NA  NA  NA
112 112002   2   3   3   6   2   6
112 112002   2   1   1   3   4   4
112 112003   2   6   6   6   5   6
112 112003   2   5   7   6   6   6
112 112003   2   6   6   6   4   5
114 114004   2   2   3   3   2   4
114 114004   2   5   3   4   4   2
114 114004   2  NA  NA  NA  NA  NA
113 113005   2   5   5   6   6   5
113 113005   2   7   7   4   7   6
111 111006   2   5   7   7   7   7
112 112007   2   7   7   7   2   2
112 112007   2   6   6   6   1   2
112 112007   2   7   6   6   2   2
111 111008   2   4   1   3   1   4
111 111008   2   3   1   5   3   2

This is only a small part of the whole data. "PARTNO" is a numeric variable,
and I want to use it as a grouping variable to aggregate the other variables.
What I want to get looks like this:

IND PARTNO NUM VC1 EO1 EO2 EO3 EO4 EO5
114 114001   3   2 4.3   4   4 4.5   4
112 112002   2   2   2   2 4.5   3   5
112 112003   3   2 5.7 6.3   6   5 5.7
114 114004   3   2 3.5   3 3.5   3   3
113 113005   2   2   6   6   5 6.5 5.5
111 111006   1   2   5   7   7   7   7
112 112007   3   2 6.7 6.3 6.3 1.7   2
111 111008   2   2 3.5   1   4   2   3

"NUM" is a newly added variable giving the number of cases (rows)
in each group defined by "PARTNO".

I have two questions about this manipulation.

The first is how to create the new variable "NUM". I have no idea
how to do this.
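A minimal sketch for the count, assuming the sample above is representative of the full data: `aggregate()` with `FUN = length` counts the rows in each PARTNO group, including rows whose EO values are all NA, so 114004 gets 3 as in the desired output. The `read.table(text = ...)` call only rebuilds the sample shown in the post; with the real data you would use the existing `eva` instead.

```r
# Rebuild the sample of "eva" shown above (assumption: only this sample;
# with the real data, skip this and use the existing eva)
eva <- read.table(header = TRUE, text = "
IND PARTNO VC1 EO1 EO2 EO3 EO4 EO5
114 114001   2   5   4   4   5   4
114 114001   2   4   4   4   4   4
114 114001   2   4  NA  NA  NA  NA
112 112002   2   3   3   6   2   6
112 112002   2   1   1   3   4   4
112 112003   2   6   6   6   5   6
112 112003   2   5   7   6   6   6
112 112003   2   6   6   6   4   5
114 114004   2   2   3   3   2   4
114 114004   2   5   3   4   4   2
114 114004   2  NA  NA  NA  NA  NA
113 113005   2   5   5   6   6   5
113 113005   2   7   7   4   7   6
111 111006   2   5   7   7   7   7
112 112007   2   7   7   7   2   2
112 112007   2   6   6   6   1   2
112 112007   2   7   6   6   2   2
111 111008   2   4   1   3   1   4
111 111008   2   3   1   5   3   2
")

# Count rows per PARTNO. PARTNO itself is never NA, so length() counts
# every row of the group, even rows whose EO columns are all NA.
# Wrapping x in list(NUM = ...) names the result column "NUM".
num <- aggregate(list(NUM = eva$PARTNO),
                 by = list(PARTNO = eva$PARTNO), FUN = length)
print(num)  # one row per PARTNO, sorted by PARTNO
```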

The second is how to average the other variables by group. Where there
are NAs, I want the average computed over the remaining (non-missing)
cases. For example, the variable "EO1" has the values 2, 5, and NA for
PARTNO 114004, so its average should be 3.5. What I have tried is

> aggregate(eva[,-2], by=eva[,-2], mean)

But it seems that, because of the NAs, "aggregate" cannot process the
data. Because the NAs are not a small fraction of the data, I cannot use
imputation methods. I'm not sure whether my approach is right.
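A likely cause, offered as a sketch rather than a definitive diagnosis: in the call above, `by = eva[,-2]` groups by every column except PARTNO instead of by PARTNO itself (and `aggregate()` omits rows with NA in any `by` variable), while `mean()` returns NA for any group that contains an NA unless `na.rm = TRUE` is given. The version below groups only by PARTNO and passes `na.rm = TRUE` through `aggregate()`'s `...` argument to `mean()`. The `vars` vector is a name introduced here for illustration.

```r
# Rebuild the sample of "eva" shown above (assumption: only this sample)
eva <- read.table(header = TRUE, text = "
IND PARTNO VC1 EO1 EO2 EO3 EO4 EO5
114 114001   2   5   4   4   5   4
114 114001   2   4   4   4   4   4
114 114001   2   4  NA  NA  NA  NA
112 112002   2   3   3   6   2   6
112 112002   2   1   1   3   4   4
112 112003   2   6   6   6   5   6
112 112003   2   5   7   6   6   6
112 112003   2   6   6   6   4   5
114 114004   2   2   3   3   2   4
114 114004   2   5   3   4   4   2
114 114004   2  NA  NA  NA  NA  NA
113 113005   2   5   5   6   6   5
113 113005   2   7   7   4   7   6
111 111006   2   5   7   7   7   7
112 112007   2   7   7   7   2   2
112 112007   2   6   6   6   1   2
112 112007   2   7   6   6   2   2
111 111008   2   4   1   3   1   4
111 111008   2   3   1   5   3   2
")

# Hypothetical helper vector: the columns to average
vars <- c("VC1", "EO1", "EO2", "EO3", "EO4", "EO5")

# Group by PARTNO only; na.rm = TRUE is forwarded to mean(), so NAs are
# dropped within each group instead of turning the whole mean into NA
means <- aggregate(eva[, vars], by = list(PARTNO = eva$PARTNO),
                   FUN = mean, na.rm = TRUE)
print(means)
```

If every value of a variable is NA within a group, `mean(x, na.rm = TRUE)` gives NaN for that cell; that does not happen in this sample.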

Does anyone have any suggestions for these two problems? Thanks in advance!
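Putting the two pieces together, a sketch that reproduces the layout of the desired table. It assumes IND is constant within each PARTNO (true in the sample, where IND is the first three digits of PARTNO), so grouping by both keeps IND without splitting any group. Ordering the rows by first appearance and rounding to one decimal are choices made here to match the target table, not something `aggregate()` does on its own.

```r
# Rebuild the sample of "eva" shown above (assumption: only this sample)
eva <- read.table(header = TRUE, text = "
IND PARTNO VC1 EO1 EO2 EO3 EO4 EO5
114 114001   2   5   4   4   5   4
114 114001   2   4   4   4   4   4
114 114001   2   4  NA  NA  NA  NA
112 112002   2   3   3   6   2   6
112 112002   2   1   1   3   4   4
112 112003   2   6   6   6   5   6
112 112003   2   5   7   6   6   6
112 112003   2   6   6   6   4   5
114 114004   2   2   3   3   2   4
114 114004   2   5   3   4   4   2
114 114004   2  NA  NA  NA  NA  NA
113 113005   2   5   5   6   6   5
113 113005   2   7   7   4   7   6
111 111006   2   5   7   7   7   7
112 112007   2   7   7   7   2   2
112 112007   2   6   6   6   1   2
112 112007   2   7   6   6   2   2
111 111008   2   4   1   3   1   4
111 111008   2   3   1   5   3   2
")

vars <- c("VC1", "EO1", "EO2", "EO3", "EO4", "EO5")
# Assumption: IND never varies within a PARTNO, so adding it to the
# grouping keeps it in the output without creating extra groups
grp <- list(IND = eva$IND, PARTNO = eva$PARTNO)

num   <- aggregate(list(NUM = eva$PARTNO), by = grp, FUN = length)
means <- aggregate(eva[, vars], by = grp, FUN = mean, na.rm = TRUE)

# Join counts and means; columns come out as IND, PARTNO, NUM, VC1, EO1..EO5
res <- merge(num, means, by = c("IND", "PARTNO"))

# Restore the order in which each PARTNO first appears in eva,
# then round the averages to one decimal place as in the target table
res <- res[order(match(res$PARTNO, unique(eva$PARTNO))), ]
res[vars] <- round(res[vars], 1)
rownames(res) <- NULL
print(res)
```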


