[R] merging with aggregating
Dubravko Dolic
Dubravko.Dolic at komdat.com
Tue Dec 6 15:19:20 CET 2005
Hi all,
the moment you hit the 'send' button you know the answer...
I approached a solution similar to this one given by Marc. But maybe there is a better one? Even because this operation is done in a for-loop during which R gets new data from a database. So I sum up 16 data.frames eventually.
Dubro
-----Ursprüngliche Nachricht-----
Von: Marc Schwartz [mailto:MSchwartz at mn.rr.com]
Gesendet: Dienstag, 6. Dezember 2005 15:11
An: Dubravko Dolic
Cc: r-help at stat.math.ethz.ch
Betreff: Re: [R] merging with aggregating
On Tue, 2005-12-06 at 14:22 +0100, Dubravko Dolic wrote:
> Dear List,
>
> I have two data.frame of the following form:
>
> A:
>
> n V1 V2
> 1 12 0
> 2 10 8
> 3 3 8
> 4 8 4
> 6 7 3
> 7 12 0
> 8 1 0
> 9 18 0
> 10 1 0
> 13 2 0
>
> B:
>
> n V1 V2
> 1 0 2
> 2 0 3
> 3 1 9
> 4 12 8
> 5 2 9
> 6 2 9
> 8 2 0
> 10 4 1
> 11 7 1
> 12 0 1
>
>
> Now I want to merge those frame to one data.frame with summing up the
> columns V1 and V2 but not the column n. So the result in this example
> would be:
>
> AB:
>
> n V1 V2
> 1 12 2
> 2 10 11
> 3 4 17
> 4 20 12
> 5 2 9
> 6 9 12
> 7 12 0
> 8 3 0
> 9 18 0
> 10 5 1
> 11 7 1
> 12 0 1
> 13 2 0
>
>
> So Columns V1 and V2 are the sum of A und B while n has its old value.
> Notice that there are different rows in n of A and B.
>
> I don't have a clue how to start here. Any hint is welcome.
>
> Thanks
There might be a somewhat easier way, but here is one approach:
# Use merge() to join A and B on 'n'
# Set all = TRUE to include non-matched rows
> C <- merge(A, B, by = "n", all = TRUE)
> C
n V1.x V2.x V1.y V2.y
1 1 12 0 0 2
2 2 10 8 0 3
3 3 3 8 1 9
4 4 8 4 12 8
5 5 NA NA 2 9
6 6 7 3 2 9
7 7 12 0 NA NA
8 8 1 0 2 0
9 9 18 0 NA NA
10 10 1 0 4 1
11 11 NA NA 7 1
12 12 NA NA 0 1
13 13 2 0 NA NA
# Now get the rowSums() for the V1/V2 column pairs
# and create a new dataframe from the
# results
> AB <- data.frame(n = C$n,
V1 = rowSums(C[, c(2, 4)], na.rm = TRUE),
V2 = rowSums(C[, c(3, 5)], na.rm = TRUE))
> AB
n V1 V2
1 1 12 2
2 2 10 11
3 3 4 17
4 4 20 12
5 5 2 9
6 6 9 12
7 7 12 0
8 8 3 0
9 9 18 0
10 10 5 1
11 11 7 1
12 12 0 1
13 13 2 0
See ?merge and ?rowSums for more information.
HTH,
Marc Schwartz
More information about the R-help
mailing list