[R] Problem with aggregating data across time points
Allan Engelhardt
allane at cybaea.com
Fri Jul 2 18:09:53 CEST 2010
On 02/07/10 16:21, Chris Beeley wrote:
> Hello-
>
> I have a dataset which basically looks like this:
>
> Location Sex Date     Time Verbal Self harm Violence_objects Violence
> A        1   1-4-2007 1800 3      0         1                3
> A        1   1-4-2007 1230 2      1         2                4
> D        2   2-4-2007 1100 0      4         0                0
> ...
>
> I've put a dput of the first section of the data at the end of this
> email. [...]
>
> What I want to do is:
>
> A) sum each of the dependent variables for each of the dates (so e.g.
> in the example above for 1-4-2007 it would be 3+2=5, 0+1=1, 1+2=3, and
> 3+4=7 for each of the variables)
>
If 'data' is the data at the end of your email, then
> aggregate(cbind(verbal,self.harm,violence_objects,violence) ~ Date, data = data, FUN = sum)
      Date verbal self.harm violence_objects violence
1 01/04/07     25        15                3        9
2 02/04/07     24         6                8       13
3 03/04/07     17        13                0       10
is one approach. Read help("aggregate") and don't forget the na.action=
argument.
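As a minimal sketch of why the na.action= argument matters: with the formula interface, the default na.action = na.omit drops every row that has an NA in any of the variables before FUN is applied. The data frame 'toy' and its values below are made up for illustration and are not the data from the original email.

```r
# Hypothetical toy data, not the data from the original email
toy <- data.frame(
  Date      = c("01/04/07", "01/04/07", "02/04/07"),
  verbal    = c(3, 2, 1),
  self.harm = c(0, NA, 4)
)

# Default (na.action = na.omit): the whole second row is dropped,
# so its verbal value of 2 is lost from the sum as well
dropped <- aggregate(cbind(verbal, self.harm) ~ Date, data = toy, FUN = sum)

# Keep rows containing NA and let sum() skip the missing values instead
kept <- aggregate(cbind(verbal, self.harm) ~ Date, data = toy, FUN = sum,
                  na.action = na.pass, na.rm = TRUE)

print(dropped)
print(kept)
```

Which behaviour is right depends on what a missing value means in the data, so this is only one possible choice.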
> B) do this sum, but only in each location this time (location is the
> first variable)- so the sum for 1-4-2007 in location A, sum for
> 1-4-2007 in location B, and so on and so on. Because this is divided
>
The basic approach could be
> aggregate(cbind(verbal,self.harm,violence_objects,violence) ~ Date + Location, data = data, FUN = sum)
       Date Location verbal self.harm violence_objects violence
1  01/04/07        A      7         1                0        3
2  02/04/07        A      8         2                0        1
3  03/04/07        A      0         0                0        2
4  01/04/07        B      3         2                0        1
5  02/04/07        B      4         2                0        0
6  03/04/07        B      4         0                0        3
7  01/04/07        C      4         2                3        2
8  02/04/07        C      0         0                4        2
9  03/04/07        C      1         1                0        5
10 01/04/07        D      7         6                0        3
11 02/04/07        D      0         0                0        9
12 03/04/07        D      4        11                0        0
13 01/04/07        E      4         3                0        0
14 02/04/07        E      4         0                4        0
15 03/04/07        E      8         1                0        0
16 01/04/07        F      0         1                0        0
17 02/04/07        F      8         2                0        1
> across locations, some dates will have no data going into them and
> will return 0 sums. Crucially I still want these dates to appear- so
> e.g. 21-5-2008 would appear as 0 0 0 0, then 22-5-2008 might have 1 2
> 0 0, then 23-5-2008 0 0 0 0 again, and etc.
>
Why?
But variations on
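> data2 <- data[!(as.numeric(data$Date) == 3 & data$Location == "B"), ]  # For example: drop location B on the third level of the Date factor
> z <- with(data2, tapply(verbal, list(Date, Location), FUN = sum))  # sums by Date x Location; empty cells come back as NA
> z[is.na(z)] <- 0  # report empty cells as 0
> print(z)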
           A B C D E F
         0 0 0 0 0 0 0
01/04/07 0 7 3 4 7 4 0
02/04/07 0 8 0 0 0 4 8
03/04/07 0 0 4 1 4 8 0
will perhaps work for you.
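If the aim is a row for every calendar date, including dates with no records anywhere, one possible sketch is to make the date a factor whose levels cover the full range, since tapply() reports every combination of levels. The toy data and the names 'd', 'all_days' and 'Day' are made up for illustration, and the dd/mm/yy date format is an assumption based on the output above.

```r
# Hypothetical toy data; 02/04/07 has no records at all
toy <- data.frame(
  Date     = c("01/04/07", "01/04/07", "03/04/07"),
  Location = c("A", "B", "A"),
  verbal   = c(3, 2, 4)
)

# Parse the dates (assumed dd/mm/yy) and list every day in the range
d        <- as.Date(toy$Date, format = "%d/%m/%y")
all_days <- seq(min(d), max(d), by = "day")

# A factor with one level per day, so days without data remain as levels
toy$Day <- factor(format(d, "%d/%m/%y"),
                  levels = format(all_days, "%d/%m/%y"))

# tapply() returns NA for empty cells; replace those with 0
z <- with(toy, tapply(verbal, list(Day, Location), FUN = sum))
z[is.na(z)] <- 0
print(z)
```

The same factor trick should carry over to the other three variables by repeating the tapply() call for each one.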
Hope this helps
Allan