[R] Two Problems while trying to aggregate a dataframe
Delcour Libertus
delcour.libertus at gmail.com
Sat Mar 24 18:35:59 CET 2007
Hello!
I have an Excel sheet with currently 11,000 rows and 9 columns. I want
to work with the data in R. The contents are similar to the following
example.
I have a list with an ID number, a personal name and two kinds of
loan values. I want to aggregate the list so that only one row remains
for each person, with the loan values summed.
First I tried some commands with tapply but had no success at all. Then
I found a hint about aggregate on this mailing list (though I did not
understand most of that mail).
So I experimented with aggregate(), and it seems to lead in the right direction:
[code]
> atest <- read.csv2 ("aggregatetest.csv")
> str(atest)
`data.frame': 10 obs. of 4 variables:
$ PrsNr : int 1 2 2 3 4 5 6 6 6 7
$ Namen : Factor w/ 7 levels "Holla","Mabba",..: 1 2 2 4 5 6 7 7 7 3
$ Betrag1: num 1.99 2.34 5.23 4.23 2.23 2.77 3.83 2.76 6.32 2.88
$ Betrag2: num 3.44 5.32 5.21 9.12 7.32 8.32 6.99 4.45 5.34 3.81
> atest
PrsNr Namen Betrag1 Betrag2
1 1 Holla 1.99 3.44
2 2 Mabba 2.34 5.32
3 2 Mabba 5.23 5.21
4 3 Pisa 4.23 9.12
5 4 Pulla 2.23 7.32
6 5 Raba 2.77 8.32
7 6 Saba 3.83 6.99
8 6 Saba 2.76 4.45
9 6 Saba 6.32 5.34
10 7 Mulla 2.88 3.81
> aggregate(list(Betrag1=atest$Betrag1), by=list(PsrNr=atest$PrsNr,
Namen=atest$Namen), sum)
PsrNr Namen Betrag1
1 1 Holla 1.99
2 2 Mabba 7.57
3 7 Mulla 2.88
4 3 Pisa 4.23
5 4 Pulla 2.23
6 5 Raba 2.77
7 6 Saba 12.91
[/code]
The result is nearly what I want.
First problem:
How do I get all columns in my result? "Betrag2" is missing.
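A minimal sketch of one way to keep both value columns, assuming the example data shown in the printout above: aggregate() accepts a data frame with several columns as its first argument and applies the function to each of them. The result name `res` is chosen here for illustration only.

```r
# Rebuild the example data from the post (values copied from the printout).
atest <- data.frame(
  PrsNr   = c(1, 2, 2, 3, 4, 5, 6, 6, 6, 7),
  Namen   = c("Holla", "Mabba", "Mabba", "Pisa", "Pulla", "Raba",
              "Saba", "Saba", "Saba", "Mulla"),
  Betrag1 = c(1.99, 2.34, 5.23, 4.23, 2.23, 2.77, 3.83, 2.76, 6.32, 2.88),
  Betrag2 = c(3.44, 5.32, 5.21, 9.12, 7.32, 8.32, 6.99, 4.45, 5.34, 3.81)
)

# Pass both value columns at once; sum() is applied to each column per group.
res <- aggregate(atest[, c("Betrag1", "Betrag2")],
                 by = list(PrsNr = atest$PrsNr, Namen = atest$Namen),
                 FUN = sum)
print(res)
```

Selecting the value columns by name keeps the call short even when the real sheet has more than two columns to sum.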
Second problem:
If I use the aggregate command on the real data, it is impossible for
me to use more than one by-grouping variable (my example above
has two). Impossible because 1 GB RAM and 1.5 GB swap are not enough to
process my command. My computer (Ubuntu Linux, GNOME) freezes. So I
doubt whether I am using the appropriate method to reach my goal.
What is the best way to aggregate data frames as I want? Are there any
better functions/commands, or do I have to learn programming for this?
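A possible workaround, sketched under the assumption that each PrsNr belongs to exactly one Namen (as in the example data): group by the ID alone and attach the names afterwards with merge(). Whether this avoids the memory problem on the real 11,000-row data is not verified here. The names `sums`, `names_by_id` and `res2` are illustrative.

```r
# Rebuild the example data from the post (values copied from the printout).
atest <- data.frame(
  PrsNr   = c(1, 2, 2, 3, 4, 5, 6, 6, 6, 7),
  Namen   = c("Holla", "Mabba", "Mabba", "Pisa", "Pulla", "Raba",
              "Saba", "Saba", "Saba", "Mulla"),
  Betrag1 = c(1.99, 2.34, 5.23, 4.23, 2.23, 2.77, 3.83, 2.76, 6.32, 2.88),
  Betrag2 = c(3.44, 5.32, 5.21, 9.12, 7.32, 8.32, 6.99, 4.45, 5.34, 3.81)
)

# Step 1: sum the value columns using a single grouping variable (the ID).
sums <- aggregate(atest[, c("Betrag1", "Betrag2")],
                  by = list(PrsNr = atest$PrsNr),
                  FUN = sum)

# Step 2: build a lookup table with one row per ID/name pair.
# Assumption: an ID never appears with two different names.
names_by_id <- unique(atest[, c("PrsNr", "Namen")])

# Step 3: join the names back onto the summed values.
res2 <- merge(names_by_id, sums, by = "PrsNr")
print(res2)
```

If an ID did occur with several names, the lookup table would contain one row per pair and merge() would repeat the sums for each of them, so that assumption is worth checking on the real data first.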
Greetings
Delcour