[R] data management question
Philipp Pagel
p.pagel at wzw.tum.de
Fri May 9 09:01:31 CEST 2008
> I want to draw a subset of "ex" by selecting only the A and B units:
>
> > ex1 <- subset(ex[which(ex$id=="A"|ex$id=="B"),])
or a bit simpler:
ex1 <- subset(ex, ex$id %in% c('A','B'))
In your expression you don't need the subset function, as you are already
using indexing to extract the desired subset. Furthermore, there is no
need to use which() because R will happily use a logical vector for
indexing. Finally, I prefer the solution using %in% because it scales
nicely for longer lists where using '|' becomes cumbersome. So another
way to put it would have been:
ex1 <- ex[ex$id %in% c('A','B'), ]
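To check that these forms select the same rows, here is a small sketch. The data frame is a reconstruction: the A and B rows match the str() output shown in this thread, but the x values for unit C are made up for illustration, and the names by_which, by_subset and by_logical are just placeholders.

```r
# Reconstructed example data; the x values for unit "C" are hypothetical.
ex <- data.frame(
  id   = factor(c("A", "B", "C", "A", "B", "C")),
  year = c(1970, 1970, 1970, 1980, 1980, 1980),
  x    = c(20, 30, 40, 25, 35, 45)
)

# Three ways to select the A and B units
by_which   <- ex[which(ex$id == "A" | ex$id == "B"), ]  # indexing with which()
by_subset  <- subset(ex, id %in% c("A", "B"))           # subset() with %in%
by_logical <- ex[ex$id %in% c("A", "B"), ]              # plain logical indexing

# Compare the results
all.equal(by_which, by_subset)
all.equal(by_subset, by_logical)
```

Inside subset() the column can be written as plain id, because the condition is evaluated within the data frame.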
> > tapply(ex1$x, ex1$id, mean)
> A B C
> 22.5 32.5 NA
>
> But this gives me an NA value for the unit C, which I thought I had
> already left out.
id is a factor, and extracting a subset does not change its set of
levels, even when no rows with a given level remain:
> str(ex1)
'data.frame': 4 obs. of 3 variables:
$ id : Factor w/ 3 levels "A","B","C": 1 2 1 2
$ year: num 1970 1970 1980 1980
$ x : num 20 30 25 35
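The leftover level can also be seen directly with levels() and table(). This is only a sketch on reconstructed data: the rows for A and B follow the str() output above, while the x values for C are invented.

```r
# Reconstructed example data; the x values for unit "C" are hypothetical.
ex <- data.frame(
  id   = factor(c("A", "B", "C", "A", "B", "C")),
  year = c(1970, 1970, 1970, 1980, 1980, 1980),
  x    = c(20, 30, 40, 25, 35, 45)
)
ex1 <- ex[ex$id %in% c("A", "B"), ]

levels(ex1$id)    # the level "C" is still listed
table(ex1$id)     # "C" appears with a count of zero

# tapply() makes one cell per level, so the empty "C" cell comes back as NA
res <- tapply(ex1$x, ex1$id, mean)
res
```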
If you want to get rid of the unused levels you can "re-build" the
factor like this:
> ex1$id <- factor(ex1$id)
> str(ex1)
'data.frame': 4 obs. of 3 variables:
$ id : Factor w/ 2 levels "A","B": 1 2 1 2
$ year: num 1970 1970 1980 1980
$ x : num 20 30 25 35
> tapply(ex1$x, ex1$id, mean)
A B
22.5 32.5
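In R versions newer than this post (2.12.0 and later) there is also droplevels(), which drops unused levels from every factor in a data frame in one call. A minimal sketch, again on reconstructed data where the x values for C are hypothetical:

```r
# Reconstructed example data; the x values for unit "C" are hypothetical.
ex <- data.frame(
  id   = factor(c("A", "B", "C", "A", "B", "C")),
  year = c(1970, 1970, 1970, 1980, 1980, 1980),
  x    = c(20, 30, 40, 25, 35, 45)
)

# Subset, then drop unused levels from all factor columns
ex1 <- droplevels(ex[ex$id %in% c("A", "B"), ])

levels(ex1$id)
res <- tapply(ex1$x, ex1$id, mean)
res
```

Rebuilding a single column with factor() as shown above gives the same levels; droplevels() is mainly a convenience when several factor columns are involved.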
cu
Philipp
--
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
85350 Freising, Germany
and
Institut für Bioinformatik und Systembiologie / MIPS
Helmholtz Zentrum München -
Deutsches Forschungszentrum für Gesundheit und Umwelt
Ingolstädter Landstrasse 1
85764 Neuherberg, Germany
http://mips.gsf.de/staff/pagel