[R] how to get the group mean deviation data ?

Prof Brian Ripley ripley at stats.ox.ac.uk
Mon Jul 25 08:57:47 CEST 2005

> If n is quite large, say n=1000 and t=3, it requires too much time, so I 
> want to know whether there is a more efficient way to do it.

Why is about 0.4 second (which is what it takes on my system) too long?
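
A timing of this kind can be reproduced with system.time(). The sketch below is only illustrative: myfun is a lightly tidied copy of the function in the quoted message further down, the data are simulated the same way for n=1000 and t=3, and the seed is an arbitrary choice made for this example. The elapsed time depends on the machine.

```r
# myfun, lightly tidied from the quoted message below
myfun <- function(x, id) {
  x <- as.matrix(x)
  id <- as.factor(id)
  # group means: one tapply() per column
  xm <- apply(x, 2, function(y, z) tapply(y, z, mean), z = id)
  # deviations: expand the means back to one row per observation
  xdm <- x - xm[id, ]
  list(xm = xm, xdm = xdm)
}

set.seed(1)                      # arbitrary seed, for reproducibility only
n <- 1000; t <- 3
d <- cbind(id = rep(1:n, each = t), y = rnorm(n * t),
           x = rnorm(n * t), z = rnorm(n * t))

timing <- system.time(res <- myfun(d[, -1], d[, "id"]))
timing["elapsed"]                # seconds; machine dependent
```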

Given that you want to operate on 3000 cells, a second does not look 

This is a toy problem, and it is unclear what the real problem is (if 
any).  Since you have the same number of replications for each cell 
(group-variable combination), I would treat this as an n x 3 x t array (a 
simple call to dim and aperm).  Then rowMeans will find the group means, 
and you can just subtract those to get the deviations from the means, 
making use of recycling.


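D <- as.matrix(d[,-1])        # drop id; also works if d is a data frame
dim(D) <- c(t,n,3)            # rows are ordered by id, t rows per id
D <- aperm(D, c(2,3,1))       # n x 3 x t: id x variable x replicate
gmeans <- rowMeans(D, dims=2) # n x 3 matrix of group means
d[,-1] - rep(gmeans, each=t)  # each mean repeated t times (t=3 here)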

That takes under 10ms for n=1000.
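
As a check, the following self-contained sketch compares the array approach with the ave() computation from the quoted message, using simulated data. The seed, the names dev_array and dev_ave, and the use of each = t rather than a literal 3 are choices made for this example and are not part of the original code.

```r
set.seed(42)                     # arbitrary seed
n <- 1000; t <- 3
d <- cbind(id = rep(1:n, each = t), y = rnorm(n * t),
           x = rnorm(n * t), z = rnorm(n * t))

# Array approach: reshape to n x 3 x t and average over the last dimension
D <- d[, -1]
dim(D) <- c(t, n, 3)             # rows of d are ordered by id, t rows per id
D <- aperm(D, c(2, 3, 1))        # now id x variable x replicate
gmeans <- rowMeans(D, dims = 2)  # n x 3 matrix of group means
dev_array <- d[, -1] - rep(gmeans, each = t)

# Reference: ave() applied column by column
dev_ave <- apply(d[, -1], 2, function(x) x - ave(x, d[, "id"]))

# Compare the two results up to floating-point tolerance
all.equal(dev_array, dev_ave, check.attributes = FALSE)
```

Note that the reshaping relies on the rows being sorted by id with exactly t rows per id; for unbalanced or unsorted data the ave() form is the safer choice.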

On Mon, 25 Jul 2005, ronggui wrote:

>> n=10;t=3
>> d<-cbind(id=rep(1:n,each=t),y=rnorm(n*t),x=rnorm(n*t),z=rnorm(n*t))
>> head(d)
>     id          y           x          z
> [1,]  1 -2.1725379  0.07629954 -0.3985258
> [2,]  1 -1.2383038 -2.49667038  0.6966127
> [3,]  1 -1.2642401 -0.50613307  0.4895856
> [4,]  2  0.2171246  0.86711864 -0.6660036
> [5,]  2  2.2765760 -0.48547142 -1.4496664
> [6,]  2  0.5985345 -1.06427035  2.1761071
> First, I want to get the group mean of each variable, which I can do with
>> d<-data.frame(d)
>> aggregate(d,list(d$id),mean)[,-1]
>   id           y          x           z
> 1   1 -1.55836060 -0.9755013  0.26255754
> 2   2  1.03074502 -0.2275410  0.02014565
> 3   3  0.20700121 -0.7159450  1.35890176
> 4   4  0.17839650  1.2575891  0.04135165
> 5   5 -0.20012508  0.4310221  0.55458899
> 6   6 -0.13084185 -0.2953392  0.28229068
> 7   7  0.20737288 -0.8863761 -0.50793880
> 8   8  0.07512612 -0.6591304 -0.21656533
> 9   9  0.94727796 -0.6108891  0.13529884
> 10 10 -0.04434875  0.1332086 -0.88229808
> Then I want the group mean deviation data, like
>> head(sapply(d[,2:4],function(x) x-ave(x,d$id)))
>              y          x          z
> [1,] -0.6141773  1.0518008 -0.6610833
> [2,]  0.3200568 -1.5211691  0.4340552
> [3,]  0.2941205  0.4693682  0.2270281
> [4,] -0.8136205  1.0946597 -0.6861493
> [5,]  1.2458310 -0.2579304 -1.4698121
> [6,] -0.4322105 -0.8367293  2.1559614
> Both of the above are what I want. I can do it with the function below, but if n is quite large, say n=1000 and t=3, it takes too much time, so I want to know whether there is a more efficient way to do it.
> myfun<-function(x,id)
> {
> x<-as.matrix(x)
> id<-as.factor(id)
> xm<- apply(x,2,function(y,z) tapply(y,z, mean), z=id)
> xdm<- x[] <- x-xm[id,]
> re<-list(xm=xm, xdm=xdm)
> re
> }

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
