[R] Tapply.
Petr PIKAL
petr.pikal at precheza.cz
Tue Apr 27 11:40:19 CEST 2010
Hi
r-help-bounces at r-project.org wrote on 26.04.2010 17:05:54:
> I guess my problem was seeing a bunch of examples where they pulled a
> variable from a dataframe..
>
> tapply(df$data, index=list(..
df$data returns a vector, and so does e.g. df[,5], unless you use the
drop=FALSE option.
>
> and I assumed that df$data was just generalizable to a collection of vectors,
> a vector of vectors being a vector
df[,1:15] is not a vector of vectors. R can sometimes give you a nasty
surprise with object types and modes, but changing the type of an object merely
by selecting some part of it would be quite problematic.
See:
str(df$data)
str(df[, 1])
str(df[,1, drop=FALSE])
str(df[,1:15])
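
The same distinction can be checked with predicates rather than str(). This is only a small sketch: the data frame below is made up for illustration and is not the poster's data.

```r
# Hypothetical two-column data frame standing in for the poster's df
df <- data.frame(data = c(1.5, 2.5, 3.5), other = c(10, 20, 30))

one_by_dollar <- df$data               # single column via $
one_by_index  <- df[, 1]               # single column via [ , j], dropped to a vector
one_kept      <- df[, 1, drop = FALSE] # drop = FALSE keeps a one-column data frame
several       <- df[, 1:2]             # several columns stay a data frame

is.vector(one_by_dollar)   # TRUE
is.vector(one_by_index)    # TRUE
is.data.frame(one_kept)    # TRUE
is.data.frame(several)     # TRUE
is.atomic(several)         # FALSE: a data frame is a list of columns
```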
Regards
Petr
>
> Thanks.
>
> On Mon, Apr 26, 2010 at 2:43 AM, Petr PIKAL <petr.pikal at precheza.cz> wrote:
>
> > Hi
> >
> >
> > > steven mosher <moshersteven at gmail.com> wrote on 26.04.2010 10:21:37:
> >
> > > That fails:
> > >
> > > The manual says:
> > >
> > > tapply(X, INDEX, FUN = NULL, ..., simplify = TRUE)
> >
> > > Arguments
> > >
> > > X
> > >
> > > an atomic object, typically a vector.
> > >
> > > INDEX
> > >
> > > list of factors, each of same length as X. The elements are coerced to
> > > factors by as.factor.
> > >
> > > my error says:
> >
> > >
> > > Error in tapply(DF[, 1:15], DF$Year, mean, na.rm = T) :
> > >
> > > arguments must have same length
> > >
> > > The issue I have is that I don't understand what the requirements for the
> > > list of factors are. In my example DF$Year is a sequence of years
> > > (1979, 1980, 1982, 1983, 1987, ...) with some years missing. So when the
> > > manual says "list of factors, each of same length as X", what does that
> > > mean? I could have a DF with 20 rows and only two different years, or 20
> > > rows and 20 different years.
> > >
> > > Suppose:
> > >
> > > a<- c(1,2,3,4)
> > > > b<-c(2,3,4,5)
> > > > df=data.frame(a,b)
> > > > length(df)
> >
> > A data frame is neither a vector nor atomic; it is a list, hence length(df)
> > gives you the number of columns. It is similar to the length of a list:
> >
> > > lll<-list(a=1, b=2, c=3)
> > > length(lll)
> > [1] 3
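
For the two-column example from the question, the counts can be compared directly. A minimal sketch using the same a and b as in the question:

```r
a <- c(1, 2, 3, 4)
b <- c(2, 3, 4, 5)
df <- data.frame(a, b)

n_cols <- length(df)    # 2: number of columns (list elements)
n_rows <- nrow(df)      # 4: number of rows
len_a  <- length(df$a)  # 4: length of a single column, i.e. of a vector
```

It is the length of a single column, not length(df), that the grouping factors have to match.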
> > >
> >
> > If you accept that the first argument of tapply has to be a vector, you
> > cannot put a data frame there.
> >
> > Next, the second argument has to be a list of factors, so you can put
> > several factors there, each of the same length as the first argument (a vector).
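
To make "each of the same length as X" concrete, here is a small sketch with invented data (the values and the reduced set of columns are assumptions, not the poster's data). X is one column, and every grouping factor has one entry per element of X:

```r
# Hypothetical data: six observations
DF <- data.frame(
  Year = c(1980, 1980, 1981, 1981, 1983, 1983),
  D    = c(0, 1, 0, 1, 0, 1),
  Jan  = c(10, 20, NA, 40, 50, 70)
)

# X is one column (a vector of length 6); INDEX also has length 6,
# even though it contains only three distinct years.
by_year <- tapply(DF$Jan, DF$Year, mean, na.rm = TRUE)

# Several grouping factors go in a list, each again of length 6.
# The result is a Year-by-D table of means.
by_year_d <- tapply(DF$Jan, list(DF$Year, DF$D), mean, na.rm = TRUE)
```

The number of distinct years does not matter; only the number of elements does.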
> >
> > If you want to perform an aggregating operation on a whole data frame, you
> > should consider
> >
> > ?by or ?aggregate
> >
> > Other options are plyr or doBy packages.
> >
> > The syntax for aggregate is quite similar to tapply, only the first argument
> > can be a data frame.
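
A sketch of what the aggregate call could look like for column means by year. The data frame is invented and has only two month columns; the real DF would pass all month columns in the same way.

```r
# Hypothetical cut-down version of the poster's data
DF <- data.frame(
  Year = c(1980, 1981, 1981, 1982),
  Jan  = c(NA, 236, 240, 10),
  Feb  = c(212, NA, 250, NA)
)

# First argument: the columns to summarise (a data frame is allowed here).
# by: a list of grouping variables, one entry per row of DF.
# na.rm = TRUE is passed on to mean().
means <- aggregate(DF[, c("Jan", "Feb")],
                   by = list(Year = DF$Year),
                   FUN = mean, na.rm = TRUE)
```

The result is a data frame with one row per year. A group whose values are all NA comes out as NaN, since mean() of nothing is NaN.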
> >
> > Regards
> > Petr
> >
> >
> > >
> > > The length of DF is 2.
> > > Does that mean the "list of factors, each of same length as X" would have to
> > > be of length 2? That doesn't seem to make sense.
> > >
> > >
> > >
> > > On Mon, Apr 26, 2010 at 12:26 AM, Petr PIKAL <petr.pikal at precheza.cz> wrote:
> > > Hi
> > >
> > > r-help-bounces at r-project.org wrote on 26.04.2010 06:52:55:
> > >
> > > > Having some difficulties with understanding how tapply works and getting
> > > > the return values I expect
> > > >
> > > > Data: dataframe. DF DF$Id $D $Year.......
> > > >
> > > > Id          D Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
> > > > 11264402000 1 1980  NA  NA  NA  NA  NA 212 203 209 228 237  NA  NA
> > > > 11264402000 0 1981  NA  NA 243 244  NA  NA  NA  NA 225  NA 231  NA
> > > > 11264402000 1 1981  NA 251  NA 248 241  NA  NA  NA 235  NA  NA 245
> > > > 11264402000 0 1982 236 237 242 240 242 205 199  NA  NA  NA  NA  NA
> > > > 11264402000 1 1982 236  NA  NA 240 242  NA  NA  NA  NA  NA  NA  NA
> > > > 11264402000 0 1983  NA 247  NA  NA  NA  NA  NA 205  NA  NA  NA  NA
> > > > 11264402000 1 1983  NA 247  NA  NA  NA  NA  NA  NA  NA 225  NA  NA
> > > > 11264402000 0 1986  NA  NA  NA 240  NA  NA  NA 213  NA  NA  NA  NA
> > > > 11264402000 0 1987 241  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
> > > > 11264402000 1 1987  NA  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
> > > > 11264402000 3 1987  NA  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
> > > > 11264402000 0 1988 238 246 249  NA 244 213 212 224 232 238 232 230
> > > > 11264402000 1 1988 238 246 249 246 244 213 212 224 232  NA  NA 230
> > > > 11264402000 3 1988 238 246 249 246 244 213 212 224 232  NA  NA 230
> > > > 11264402000 0 1989 232 233 238 239 231  NA 215  NA  NA  NA  NA 238
> > > > 11264402000 1 1989 232 233 238 239 231  NA  NA  NA  NA  NA  NA 238
> > > > 11264402000 3 1989 232 233 238 239 231  NA  NA  NA  NA  NA  NA 238
> > > >
> > > > and the result should be a dataframe of column means by year with the
> > > > variable D dropped (or kept, doesn't matter)
> > > >
> > > > 11264402000    1 1980  NA  NA  NA  NA  NA 212 203 209 228 237  NA  NA
> > > > 11264402000   .5 1981  NA  NA 243 244  NA  NA  NA  NA 225  NA 231  NA
> > > > 11264402000   .5 1982 236 237 242 240 242 205 199  NA  NA  NA  NA  NA
> > > > 11264402000   .5 1983  NA 247  NA  NA  NA  NA  NA 205  NA 225  NA  NA
> > > > 11264402000    1 1986  NA  NA  NA 240  NA  NA  NA 213  NA  NA  NA  NA
> > > > 11264402000    2 1987 241  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
> > > > 11264402000 1.33 1988 238 246 249 246 244 213 212 224 232 238 232 230
> > > > 11264402000 1.33 1989 232 233 238 239 231  NA 215  NA  NA  NA  NA 238
> > > >
> > > > It would seem that Tapply should work
> > > > result<-tapply( DF[,1:15], DF$Year, colMeans,na.rm=T)
> >
> > > Why colMeans? It is a function used instead of apply(...,.. ,mean).
> > >
> > > Maybe you want
> > >
> > > result<-tapply( DF[,1:15], DF$Year, mean,na.rm=T)
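
As the error message quoted elsewhere in this thread shows, this call was rejected with "arguments must have same length", because X was a data frame rather than a vector. One possible workaround, sketched here with invented data and only two month columns, is to run tapply on each column separately with sapply:

```r
# Hypothetical cut-down version of the poster's data
DF <- data.frame(
  Year = c(1980, 1981, 1981, 1982),
  Jan  = c(5, 236, 240, 10),
  Feb  = c(212, NA, 250, 30)
)

# Each column is a vector of the same length as DF$Year, so tapply accepts it.
# sapply binds the per-column results into a matrix: one row per year,
# one column per month.
res <- sapply(DF[, c("Jan", "Feb")],
              function(col) tapply(col, DF$Year, mean, na.rm = TRUE))
```

This returns a matrix rather than a data frame, with the years as row names.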
> > >
> > > Regards
> > > Petr
> > >
> > > >
> > > > but i get errors about the length of arguments, which
> > > >
> > > > [[alternative HTML version deleted]]
> > > >
> > > > ______________________________________________
> > > > R-help at r-project.org mailing list
> > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > > and provide commented, minimal, self-contained, reproducible code.
> >
> >
>