[R] data management question
Deepankar Basu
basu.15 at osu.edu
Fri May 9 06:29:52 CEST 2008
Hi all,
I have a data management question. I am using a panel dataset read into
R as a data frame, call it "ex". The variables in "ex" are id, year, and x:
id: a character string which identifies the unit
year: identifies the time period
x: the variable of interest (which might contain NAs).
Here is an example:
> id <- rep(c("A","B","C"),2)
> year <- c(rep(1970,3),rep(1980,3))
> x <- c(20,30,40,25,35,45)
> ex <- data.frame(id=id,year=year,x=x)
> ex
  id year  x
1  A 1970 20
2  B 1970 30
3  C 1970 40
4  A 1980 25
5  B 1980 35
6  C 1980 45
I want to draw a subset of "ex" by selecting only the A and B units:
> ex1 <- subset(ex[which(ex$id=="A"|ex$id=="B"),])
Now I want to do some computations on x for each unit:
> tapply(ex1$x, ex1$id, mean)
   A    B    C
22.5 32.5   NA
But this gives me an NA value for the unit C, which I thought I had
already left out. How do I ensure that the computation (in the last
step) is limited to only the units I have selected in the first step?
Deepankar
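
A possible explanation and workaround, offered as a sketch rather than a definitive answer: the NA for C suggests that "id" is stored as a factor (the default for data.frame() in R versions before 4.0.0), so the subset keeps "C" as an unused level and tapply() still reports it. Assuming that is the cause, re-creating the factor after subsetting drops the unused level. The stringsAsFactors = TRUE argument below is an assumption added only to reproduce that behaviour on newer R versions.

```r
# Rebuild the example data. stringsAsFactors = TRUE is an assumption here:
# it reproduces the old default in which character columns become factors.
id <- rep(c("A", "B", "C"), 2)
year <- c(rep(1970, 3), rep(1980, 3))
x <- c(20, 30, 40, 25, 35, 45)
ex <- data.frame(id = id, year = year, x = x, stringsAsFactors = TRUE)

# Keep only units A and B. The rows for C are gone,
# but "C" is still listed among the factor levels.
ex1 <- subset(ex, id %in% c("A", "B"))
levels(ex1$id)

# Re-create the factor so that only the levels actually present remain.
ex1$id <- factor(ex1$id)

# The per-unit means now cover A and B only.
means <- tapply(ex1$x, ex1$id, mean)
means
```

In later R versions, droplevels(ex1) should do the same for every factor column of the data frame at once.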