[R] Tapply.

Tue Apr 27 14:21:02 CEST 2010

Hi

If you are not satisfied with the R introductory documents that are distributed
with the R installation, you can consider "Introductory Statistics with R" by
P. Dalgaard for beginners, and maybe "Modern Applied Statistics with S" by
W. N. Venables and B. D. Ripley, which is a bit outdated and applies perhaps a
little more to S, but is still worth reading.

Regards
Petr
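
On the tapply() question quoted further down: tapply() is built around a
single vector X whose length matches the grouping factor, while a data frame
has length equal to its number of columns, which would explain the
"length of arguments" error. A minimal base-R sketch, assuming a small
made-up data frame (the values below are illustrative, not the poster's full
data, and the names vars, means, pieces, means2 and res are hypothetical):

```r
# Small made-up data frame with the same layout as the thread's DF
# (Id column already dropped; values are illustrative only).
DF <- data.frame(
  D    = c(1, 0, 1, 0, 1),
  Year = c(1980, 1981, 1981, 1982, 1982),
  Jan  = c(NA, NA, NA, 236, 236),
  Feb  = c(NA, NA, 251, 237, NA)
)

# Columns to summarise: everything except the grouping column.
vars <- setdiff(names(DF), "Year")

# tapply() handles one vector at a time, so apply it to each column.
# sapply() binds the per-column results into a matrix (rows = years).
means <- sapply(DF[vars], function(col) tapply(col, DF$Year, mean, na.rm = TRUE))

# The same thing as explicit split-apply-combine: split the rows by year,
# take colMeans() of each piece, and bind the pieces back together.
pieces <- lapply(split(DF[vars], DF$Year), colMeans, na.rm = TRUE)
means2 <- do.call(rbind, pieces)

# Turn the matrix into a data frame with Year as an ordinary column.
res <- data.frame(Year = as.numeric(rownames(means)), means, row.names = NULL)
print(res)
```

Both variants give one row per year; groups with no non-missing values come
out as NaN, as in the outputs quoted below.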

r-help-bounces at r-project.org wrote on 27.04.2010 10:05:25:

> Thanks, Dennis.
> 
>     Is there a book on R you could recommend?
> 
> 
> 
> On Mon, Apr 26, 2010 at 7:12 PM, Dennis Murphy <djmuser at gmail.com> wrote:
> 
> > Hi:
> >
> >
> > > On Mon, Apr 26, 2010 at 8:01 AM, steven mosher <moshersteven at gmail.com> wrote:
> > > Thanks,
> > >
> > > I was trying to stick with the base package and figure out how the base
> > > routines worked.
> >
> > If you want to use base functions, then here's a solution with aggregate()
> > (the Id column was removed first):
> >
> > > with(DF, aggregate(DF[, -2], list(Year = Year), FUN = mean, na.rm = TRUE))
> >   Year        D Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
> > 1 1980 1.000000 NaN NaN NaN NaN NaN 212 203 209 228 237 NaN NaN
> > 2 1981 0.500000 NaN 251 243 246 241 NaN NaN NaN 230 NaN 231 245
> > 3 1982 0.500000 236 237 242 240 242 205 199 NaN NaN NaN NaN NaN
> > 4 1983 0.500000 NaN 247 NaN NaN NaN NaN NaN 205 NaN 225 NaN NaN
> > 5 1986 0.000000 NaN NaN NaN 240 NaN NaN NaN 213 NaN NaN NaN NaN
> > 6 1987 1.333333 241 NaN NaN NaN NaN 218 NaN NaN 235 243 240 NaN
> > 7 1988 1.333333 238 246 249 246 244 213 212 224 232 238 232 230
> > 8 1989 1.333333 232 233 238 239 231 NaN 215 NaN NaN NaN NaN 238
> >
> > The problem with tapply() is that it works on a single vector, so it has
> > to be called separately for each column you want to summarize. You could
> > do it in a loop:
> > > res <- matrix(NA, 8, 14)
> > > res[, 1] <- unique(DF$Year)
> > > res[, 2] <- with(DF, tapply(D, Year, mean, na.rm = TRUE))
> > > for(j in 3:14) res[, j] <- tapply(DF[, j], DF$Year, mean, na.rm = TRUE)
> > > res
> >      [,1]     [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
> > [1,] 1980 1.000000  NaN  NaN  NaN  NaN  NaN  212  203   209   228   237   NaN
> > [2,] 1981 0.500000  NaN  251  243  246  241  NaN  NaN   NaN   230   NaN   231
> > [3,] 1982 0.500000  236  237  242  240  242  205  199   NaN   NaN   NaN   NaN
> > [4,] 1983 0.500000  NaN  247  NaN  NaN  NaN  NaN  NaN   205   NaN   225   NaN
> > [5,] 1986 0.000000  NaN  NaN  NaN  240  NaN  NaN  NaN   213   NaN   NaN   NaN
> > [6,] 1987 1.333333  241  NaN  NaN  NaN  NaN  218  NaN   NaN   235   243   240
> > [7,] 1988 1.333333  238  246  249  246  244  213  212   224   232   238   232
> > [8,] 1989 1.333333  232  233  238  239  231  NaN  215   NaN   NaN   NaN   NaN
> >      [,14]
> > [1,]   NaN
> > [2,]   245
> > [3,]   NaN
> > [4,]   NaN
> > [5,]   NaN
> > [6,]   NaN
> > [7,]   230
> > [8,]   238
> >
> > but it's not the most efficient way to do things.
> >
> > Essentially, this approach conforms to the 'split-apply-combine' strategy,
> > which is more efficiently implemented in functions like aggregate() or in
> > packages such as doBy, plyr, reshape and data.table, some of which were
> > mentioned earlier by Petr Pikal.
> >
> > HTH,
> > Dennis
> >
> >
> > On Mon, Apr 26, 2010 at 8:01 AM, steven mosher <moshersteven at gmail.com> wrote:
> >
> >> Thanks,
> >>
> >>   I was trying to stick with the base package and figure out how the base
> >> routines worked. I looked at plyr and it was very appealing. I guess I'll
> >> give in and use it.
> >>
> >> On Mon, Apr 26, 2010 at 2:33 AM, Dennis Murphy <djmuser at gmail.com> wrote:
> >>
> >>> Hi:
> >>>
> >>> Use of ddply() in the plyr package appears to work.
> >>>
> >>> library(plyr)
> >>> ddply(DF[, -1], .(Year), colwise(mean), na.rm = TRUE)
> >>>
> >>>          D Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
> >>> 1 1.000000 1980 NaN NaN NaN NaN NaN 212 203 209 228 237 NaN NaN
> >>> 2 0.500000 1981 NaN 251 243 246 241 NaN NaN NaN 230 NaN 231 245
> >>> 3 0.500000 1982 236 237 242 240 242 205 199 NaN NaN NaN NaN NaN
> >>> 4 0.500000 1983 NaN 247 NaN NaN NaN NaN NaN 205 NaN 225 NaN NaN
> >>> 5 0.000000 1986 NaN NaN NaN 240 NaN NaN NaN 213 NaN NaN NaN NaN
> >>> 6 1.333333 1987 241 NaN NaN NaN NaN 218 NaN NaN 235 243 240 NaN
> >>> 7 1.333333 1988 238 246 249 246 244 213 212 224 232 238 232 230
> >>> 8 1.333333 1989 232 233 238 239 231 NaN 215 NaN NaN NaN NaN 238
> >>>
> >>> Replace the NaNs with NAs and that should do it....
> >>>
> >>> HTH,
> >>> Dennis
> >>>
> >>> On Sun, Apr 25, 2010 at 9:52 PM, steven mosher <moshersteven at gmail.com> wrote:
> >>>
> >>>> Having some difficulties with understanding how tapply works and getting
> >>>> the return values I expect
> >>>>
> >>>> Data: dataframe. DF  DF$Id $D $Year.......
> >>>>
> >>>>          Id D Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
> >>>> 11264402000 1 1980  NA  NA  NA  NA  NA 212 203 209 228 237  NA  NA
> >>>> 11264402000 0 1981  NA  NA 243 244  NA  NA  NA  NA 225  NA 231  NA
> >>>> 11264402000 1 1981  NA 251  NA 248 241  NA  NA  NA 235  NA  NA 245
> >>>> 11264402000 0 1982 236 237 242 240 242 205 199  NA  NA  NA  NA  NA
> >>>> 11264402000 1 1982 236  NA  NA 240 242  NA  NA  NA  NA  NA  NA  NA
> >>>> 11264402000 0 1983  NA 247  NA  NA  NA  NA  NA 205  NA  NA  NA  NA
> >>>> 11264402000 1 1983  NA 247  NA  NA  NA  NA  NA  NA  NA 225  NA  NA
> >>>> 11264402000 0 1986  NA  NA  NA 240  NA  NA  NA 213  NA  NA  NA  NA
> >>>> 11264402000 0 1987 241  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
> >>>> 11264402000 1 1987  NA  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
> >>>> 11264402000 3 1987  NA  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
> >>>> 11264402000 0 1988 238 246 249  NA 244 213 212 224 232 238 232 230
> >>>> 11264402000 1 1988 238 246 249 246 244 213 212 224 232  NA  NA 230
> >>>> 11264402000 3 1988 238 246 249 246 244 213 212 224 232  NA  NA 230
> >>>> 11264402000 0 1989 232 233 238 239 231  NA 215  NA  NA  NA  NA 238
> >>>> 11264402000 1 1989 232 233 238 239 231  NA  NA  NA  NA  NA  NA 238
> >>>> 11264402000 3 1989 232 233 238 239 231  NA  NA  NA  NA  NA  NA 238
> >>>>
> >>>> and the result should be a dataframe of column means by year, with the
> >>>> variable D dropped (or kept, it doesn't matter)
> >>>>
> >>>> 11264402000    1 1980  NA  NA  NA  NA  NA 212 203 209 228 237  NA  NA
> >>>> 11264402000   .5 1981  NA  NA 243 244  NA  NA  NA  NA 225  NA 231  NA
> >>>> 11264402000   .5 1982 236 237 242 240 242 205 199  NA  NA  NA  NA  NA
> >>>> 11264402000   .5 1983  NA 247  NA  NA  NA  NA  NA 205  NA 225  NA  NA
> >>>> 11264402000    1 1986  NA  NA  NA 240  NA  NA  NA 213  NA  NA  NA  NA
> >>>> 11264402000    2 1987 241  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
> >>>> 11264402000 1.33 1988 238 246 249 246 244 213 212 224 232 238 232 230
> >>>> 11264402000 1.33 1989 232 233 238 239 231  NA 215  NA  NA  NA  NA 238
> >>>>
> >>>>  It would seem that tapply() should work:
> >>>>  result<-tapply( DF[,1:15], DF$Year, colMeans,na.rm=T)
> >>>>
> >>>>  but I get errors about the length of arguments, which
> >>>>
> >>>>        [[alternative HTML version deleted]]
> >>>>
> >>>> ______________________________________________
> >>>> R-help at r-project.org mailing list
> >>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>> PLEASE do read the posting guide
> >>>> http://www.R-project.org/posting-guide.html
> >>>> and provide commented, minimal, self-contained, reproducible code.
> >>>>
> >>>
> >>>
> >>
> >
> 
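
Regarding the remark in the thread about replacing the NaNs with NAs: a
group mean taken over all-missing values with na.rm = TRUE yields NaN rather
than NA. A minimal sketch of the replacement, assuming a hypothetical summary
table named res (made-up values; nan_cells is also just an illustrative
name):

```r
# Hypothetical summary table like the ones in the thread: a mean over
# all-NA values with na.rm = TRUE yields NaN, not NA.
res <- data.frame(
  Year = c(1980, 1981),
  Jan  = c(mean(c(NA_real_, NA_real_), na.rm = TRUE), 236),
  Feb  = c(251, mean(NA_real_, na.rm = TRUE))
)

# is.nan() works on vectors, so build a logical matrix column by column.
nan_cells <- sapply(res, is.nan)

# Matrix indexing replaces only the NaN cells; other values are untouched.
res[nan_cells] <- NA
print(res)
```

Note that is.na() is TRUE for both NA and NaN, so is.nan() is the test to use
when only the NaN cells should be touched.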