[R] Tapply.
Petr PIKAL
petr.pikal at precheza.cz
Tue Apr 27 14:21:02 CEST 2010
Hi
If you are not satisfied with the R introductory docs that are distributed
with the R installation, you could consider Introductory Statistics with R
by P. Dalgaard for beginners, and maybe Modern Applied Statistics with S by
W. N. Venables and B. D. Ripley, which is a bit outdated and applies perhaps
a little more to S but is still worth reading.
Regards
Petr
r-help-bounces at r-project.org wrote on 27.04.2010 10:05:25:
> Thanks, Dennis.
>
> Is there a book on R you could recommend?
>
>
>
> On Mon, Apr 26, 2010 at 7:12 PM, Dennis Murphy <djmuser at gmail.com> wrote:
>
> > Hi:
> >
> >
> > > On Mon, Apr 26, 2010 at 8:01 AM, steven mosher <moshersteven at gmail.com> wrote:
> > > Thanks,
> >
> > > I was trying to stick with the base package and figure out how the base
> > > routines worked.
> >
> > If you want to use base functions, then here's a solution with aggregate()
> > (the Id column was removed first):
> >
> > > with(DF, aggregate(DF[, -2], list(Year = Year), FUN = mean, na.rm = TRUE))
> >   Year        D Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
> > 1 1980 1.000000 NaN NaN NaN NaN NaN 212 203 209 228 237 NaN NaN
> > 2 1981 0.500000 NaN 251 243 246 241 NaN NaN NaN 230 NaN 231 245
> > 3 1982 0.500000 236 237 242 240 242 205 199 NaN NaN NaN NaN NaN
> > 4 1983 0.500000 NaN 247 NaN NaN NaN NaN NaN 205 NaN 225 NaN NaN
> > 5 1986 0.000000 NaN NaN NaN 240 NaN NaN NaN 213 NaN NaN NaN NaN
> > 6 1987 1.333333 241 NaN NaN NaN NaN 218 NaN NaN 235 243 240 NaN
> > 7 1988 1.333333 238 246 249 246 244 213 212 224 232 238 232 230
> > 8 1989 1.333333 232 233 238 239 231 NaN 215 NaN NaN NaN NaN 238
> >
> > The problem with tapply() is that the function has to be called separately
> > on each column you want to summarize. You could do it in a loop:
> > > res <- matrix(NA, 8, 14)
> > > res[, 1] <- unique(DF$Year)
> > > res[, 2] <- with(DF, tapply(D, Year, mean, na.rm = TRUE))
> > > for(j in 3:14) res[, j] <- tapply(DF[, j], DF$Year, mean, na.rm = TRUE)
> > > res
> >      [,1]     [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
> > [1,] 1980 1.000000  NaN  NaN  NaN  NaN  NaN  212  203   209   228   237   NaN   NaN
> > [2,] 1981 0.500000  NaN  251  243  246  241  NaN  NaN   NaN   230   NaN   231   245
> > [3,] 1982 0.500000  236  237  242  240  242  205  199   NaN   NaN   NaN   NaN   NaN
> > [4,] 1983 0.500000  NaN  247  NaN  NaN  NaN  NaN  NaN   205   NaN   225   NaN   NaN
> > [5,] 1986 0.000000  NaN  NaN  NaN  240  NaN  NaN  NaN   213   NaN   NaN   NaN   NaN
> > [6,] 1987 1.333333  241  NaN  NaN  NaN  NaN  218  NaN   NaN   235   243   240   NaN
> > [7,] 1988 1.333333  238  246  249  246  244  213  212   224   232   238   232   230
> > [8,] 1989 1.333333  232  233  238  239  231  NaN  215   NaN   NaN   NaN   NaN   238
> >
> > but it's not the most efficient way to do things.
> >
> > Essentially, this approach conforms to the 'split-apply-combine' strategy,
> > which is more efficiently implemented in functions like aggregate() or in
> > packages such as doBy, plyr, reshape and data.table, some of which were
> > mentioned earlier by Petr Pikal.
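
For readers following the thread, the 'split-apply-combine' idea mentioned
above can be sketched in base R with split(), lapply() and rbind(). This is
only an illustration on a small made-up data frame; the name `toy` and its
values are assumptions, not the poster's data.

```r
# Toy data (hypothetical values, not the data from this thread)
toy <- data.frame(
  Year = c(1980, 1981, 1981, 1982),
  Jan  = c(NA, 10, 20, NA),
  Feb  = c(5, NA, 15, NA)
)

# split: one data frame per year, keeping only the columns to summarize
pieces <- split(toy[, c("Jan", "Feb")], toy$Year)

# apply: column means within each piece, ignoring NAs
means <- lapply(pieces, colMeans, na.rm = TRUE)

# combine: stack the per-year vectors into one matrix, one row per year
res <- do.call(rbind, means)
res
```

Groups in which a column has no non-missing values come back as NaN, which
matches the NaN entries in the outputs shown in this thread.
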
> >
> > HTH,
> > Dennis
> >
> >
> > On Mon, Apr 26, 2010 at 8:01 AM, steven mosher <moshersteven at gmail.com> wrote:
> >
> >> Thanks,
> >>
> >> I was trying to stick with the base package and figure out how the base
> >> routines worked. I looked at plyr and it was very appealing. I guess I'll
> >> give in and use it.
> >>
> >> On Mon, Apr 26, 2010 at 2:33 AM, Dennis Murphy <djmuser at gmail.com> wrote:
> >>
> >>> Hi:
> >>>
> >>> Use of ddply() in the plyr package appears to work.
> >>>
> >>> library(plyr)
> >>> ddply(DF[, -1], .(Year), colwise(mean), na.rm = TRUE)
> >>>
> >>>          D Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
> >>> 1 1.000000 1980 NaN NaN NaN NaN NaN 212 203 209 228 237 NaN NaN
> >>> 2 0.500000 1981 NaN 251 243 246 241 NaN NaN NaN 230 NaN 231 245
> >>> 3 0.500000 1982 236 237 242 240 242 205 199 NaN NaN NaN NaN NaN
> >>> 4 0.500000 1983 NaN 247 NaN NaN NaN NaN NaN 205 NaN 225 NaN NaN
> >>> 5 0.000000 1986 NaN NaN NaN 240 NaN NaN NaN 213 NaN NaN NaN NaN
> >>> 6 1.333333 1987 241 NaN NaN NaN NaN 218 NaN NaN 235 243 240 NaN
> >>> 7 1.333333 1988 238 246 249 246 244 213 212 224 232 238 232 230
> >>> 8 1.333333 1989 232 233 238 239 231 NaN 215 NaN NaN NaN NaN 238
> >>>
> >>> Replace the NaNs with NAs and that should do it....
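
One possible way to do the NaN-to-NA replacement suggested above, sketched
on a small made-up summary (the data frame `out` and its values are
hypothetical, used only for illustration):

```r
# Toy summary (hypothetical values) with NaN where a group had no data
out <- data.frame(Year = c(1980, 1981), Jan = c(NaN, 236), Feb = c(251, NaN))

# is.nan() has no data frame method, so apply it column by column;
# assigning into out[] keeps the data frame structure
out[] <- lapply(out, function(x) replace(x, is.nan(x), NA))
out
```
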
> >>>
> >>> HTH,
> >>> Dennis
> >>>
> >>> On Sun, Apr 25, 2010 at 9:52 PM, steven mosher <moshersteven at gmail.com> wrote:
> >>>
> >>>> Having some difficulties with understanding how tapply works and getting
> >>>> the return values I expect.
> >>>>
> >>>> Data: a data frame DF with columns DF$Id, DF$D, DF$Year, ...
> >>>>
> >>>> Id          D Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
> >>>> 11264402000 1 1980  NA  NA  NA  NA  NA 212 203 209 228 237  NA  NA
> >>>> 11264402000 0 1981  NA  NA 243 244  NA  NA  NA  NA 225  NA 231  NA
> >>>> 11264402000 1 1981  NA 251  NA 248 241  NA  NA  NA 235  NA  NA 245
> >>>> 11264402000 0 1982 236 237 242 240 242 205 199  NA  NA  NA  NA  NA
> >>>> 11264402000 1 1982 236  NA  NA 240 242  NA  NA  NA  NA  NA  NA  NA
> >>>> 11264402000 0 1983  NA 247  NA  NA  NA  NA  NA 205  NA  NA  NA  NA
> >>>> 11264402000 1 1983  NA 247  NA  NA  NA  NA  NA  NA  NA 225  NA  NA
> >>>> 11264402000 0 1986  NA  NA  NA 240  NA  NA  NA 213  NA  NA  NA  NA
> >>>> 11264402000 0 1987 241  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
> >>>> 11264402000 1 1987  NA  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
> >>>> 11264402000 3 1987  NA  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
> >>>> 11264402000 0 1988 238 246 249  NA 244 213 212 224 232 238 232 230
> >>>> 11264402000 1 1988 238 246 249 246 244 213 212 224 232  NA  NA 230
> >>>> 11264402000 3 1988 238 246 249 246 244 213 212 224 232  NA  NA 230
> >>>> 11264402000 0 1989 232 233 238 239 231  NA 215  NA  NA  NA  NA 238
> >>>> 11264402000 1 1989 232 233 238 239 231  NA  NA  NA  NA  NA  NA 238
> >>>> 11264402000 3 1989 232 233 238 239 231  NA  NA  NA  NA  NA  NA 238
> >>>>
> >>>> and the result should be a data frame of column means by year, with the
> >>>> variable D dropped (or kept, it doesn't matter):
> >>>>
> >>>> 11264402000 1    1980  NA  NA  NA  NA  NA 212 203 209 228 237  NA  NA
> >>>> 11264402000 .5   1981  NA  NA 243 244  NA  NA  NA  NA 225  NA 231  NA
> >>>> 11264402000 .5   1982 236 237 242 240 242 205 199  NA  NA  NA  NA  NA
> >>>> 11264402000 .5   1983  NA 247  NA  NA  NA  NA  NA 205  NA 225  NA  NA
> >>>> 11264402000 1    1986  NA  NA  NA 240  NA  NA  NA 213  NA  NA  NA  NA
> >>>> 11264402000 2    1987 241  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
> >>>> 11264402000 1.33 1988 238 246 249 246 244 213 212 224 232 238 232 230
> >>>> 11264402000 1.33 1989 232 233 238 239 231  NA 215  NA  NA  NA  NA 238
> >>>>
> >>>> It would seem that tapply() should work:
> >>>> result<-tapply( DF[,1:15], DF$Year, colMeans,na.rm=T)
> >>>>
> >>>> but i get errors about the length of arguments, which
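
A likely reason for the length error, as far as can be told from the call
above: tapply() expects a vector with one element per element of the
grouping, whereas the length of a data frame is its number of columns. The
sketch below shows the mismatch and one base R workaround that calls
tapply() per column. The data frame `toy` and its values are made up for
illustration and are not the poster's data.

```r
# Toy data (hypothetical values, not the data from this thread)
toy <- data.frame(
  Year = c(1980, 1981, 1981, 1982),
  Jan  = c(NA, 10, 20, 30),
  Feb  = c(5, NA, 15, 25)
)

# length() of a data frame counts columns, not rows,
# so it does not match the length of the grouping vector
n_cols <- length(toy)       # number of columns
n_rows <- length(toy$Year)  # number of rows

# tapply() on one column at a time; sapply() binds the results column-wise
res <- sapply(toy[, c("Jan", "Feb")],
              function(col) tapply(col, toy$Year, mean, na.rm = TRUE))
res
```

The result is a matrix with one row per year and one column per summarized
variable, which is essentially what the loop over columns elsewhere in this
thread builds by hand.
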
> >>>>
> >>>> [[alternative HTML version deleted]]
> >>>>
> >>>> ______________________________________________
> >>>> R-help at r-project.org mailing list
> >>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>> PLEASE do read the posting guide
> >>>> http://www.R-project.org/posting-guide.html
> >>>> and provide commented, minimal, self-contained, reproducible code.
> >>>>
> >>>
> >>>
> >>
> >
>