[R] data frame manipulation
arnaud Gaboury
arnaud.gaboury at gmail.com
Fri Apr 16 13:21:30 CEST 2010
When I run your command, here is what I get:
>op=ddply(df,c("DESCRIPTION"),summarise,POSITION=sum(QUANITY),
   DATE=max(CREATED.DATE),
   SETTLEMENT=CLOSING.PRICE[CREATED.DATE=max(CREATED.DATE)])
> op
     DESCRIPTION POSITION       DATE SETTLEMENT
1 PRIMARY NICKEL        0 2010-03-10       <NA>
2 PRM HGH GD ALU        0 2010-04-09       <NA>
3 SPCL HIGH GRAD        2 2010-04-09       <NA>
4 STANDARD LEAD         0 2010-04-06       <NA>
That is exactly what I want, except for the NAs: the SETTLEMENT column
should show the CLOSING.PRICE corresponding to the CREATED.DATE shown in
the DATE column.
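A note on the NAs, offered as a likely explanation rather than a confirmed diagnosis: the command above uses a single `=` inside the square brackets, where the suggested command used `==`. Inside `[`, a single `=` is read as a named argument and the name is ignored, so the vector is indexed by the number behind the date instead of by a comparison. The sketch below uses two made-up vectors (`price` and `created` are illustrative names, not columns from the original session) to show the difference.

```r
# Toy vectors (hypothetical stand-ins for one DESCRIPTION group).
price   <- c("2,355.9600", "2,357.1200")
created <- as.Date(c("2010-04-01", "2010-04-06"))

# `=` inside `[` is treated as a named argument, not a comparison.
# The vector is then indexed by the numeric value behind the date
# (days since 1970-01-01), which is far past its end, giving NA.
wrong <- price[created = max(created)]

# `==` builds a logical mask that keeps the entries at the latest date.
right <- price[created == max(created)]

print(wrong)
print(right)
```

If this is the cause, replacing `=` with `==` in the SETTLEMENT expression should return the prices instead of NA.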
From: Ista Zahn [mailto:istazahn at gmail.com]
Sent: Friday, April 16, 2010 1:05 PM
To: arnaud Gaboury
Cc: r-help at r-project.org
Subject: Re: [R] data frame manipulation
Hi,
I'm not sure I understand what you want exactly. My best guess is that you
want something like
op=ddply(df, c("DESCRIPTION"), summarise,
   POSITION = sum(QUANITY),
   DATE = max(CREATED.DATE),
   CLOSING.PRICE = CLOSING.PRICE[CREATED.DATE == max(CREATED.DATE)])
op <- unique(op)
Does that do it?
-Ista
On Fri, Apr 16, 2010 at 4:16 AM, arnaud Gaboury <arnaud.gaboury at gmail.com>
wrote:
Dear group,
Here is my data.frame :
df <-
structure(list(DESCRIPTION = c("PRM HGH GD ALU", "PRM HGH GD ALU",
"PRIMARY NICKEL", "PRIMARY NICKEL", "PRIMARY NICKEL", "PRIMARY NICKEL",
"STANDARD LEAD ", "STANDARD LEAD ", "STANDARD LEAD ", "STANDARD LEAD ",
"STANDARD LEAD ", "STANDARD LEAD ", "STANDARD LEAD ", "STANDARD LEAD ",
"SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD",
"SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD",
"SPCL HIGH GRAD", "SPCL HIGH GRAD"), CREATED.DATE = structure(c(14708,
14708, 14672, 14673, 14678, 14678, 14700, 14700, 14700, 14700,
14700, 14700, 14700, 14705, 14707, 14707, 14707, 14708, 14708,
14708, 14708, 14708, 14622, 14634), class = "Date"), QUANITY = c(-1,
1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1, -1, 1, 1, 1, -1, -1, 1,
-1, 1, 1, 1, -1), CLOSING.PRICE = c("2,415.9000", "2,415.9000",
"25,755.7100", "25,755.7100", "25,760.8600", "25,760.8600", "2,355.9600",
"2,355.9600", "2,355.9600", "2,355.9600", "2,355.9600", "2,355.9600",
"2,355.9600", "2,357.1200", "2,420.7300", "2,420.7300", "2,420.7300",
"2,421.0500", "2,421.0500", "2,421.0500", "2,421.0500", "2,421.0500",
"2,388.4300", "2,388.4300")), .Names = c("DESCRIPTION", "CREATED.DATE",
"QUANITY", "CLOSING.PRICE"), row.names = 26:49, class = "data.frame")
I would like to summarize it into something like this:
> op
     DESCRIPTION POSITION       DATE
1 PRIMARY NICKEL        0 2010-03-10
2 PRM HGH GD ALU        0 2010-04-09
3 SPCL HIGH GRAD        2 2010-04-09
4 STANDARD LEAD         0 2010-04-06
To obtain "op", I wrote the following line:
> op=ddply(df, c("DESCRIPTION"), summarise, POSITION=
sum(QUANITY),DATE=max(CREATED.DATE))
So far, so good. But I need one more column, "CLOSING.PRICE". If I
write this line:
> op1=ddply(df, c("DESCRIPTION","CLOSING.PRICE"), summarise, POSITION=
sum(QUANITY),DATE=max(CREATED.DATE))
Here is what I get:
> op1
     DESCRIPTION CLOSING.PRICE POSITION       DATE
1 PRIMARY NICKEL   25,755.7100        0 2010-03-05
2 PRIMARY NICKEL   25,760.8600        0 2010-03-10
3 PRM HGH GD ALU    2,415.9000        0 2010-04-09
4 SPCL HIGH GRAD    2,388.4300        0 2010-01-25
5 SPCL HIGH GRAD    2,420.7300        1 2010-04-08
6 SPCL HIGH GRAD    2,421.0500        1 2010-04-09
7 STANDARD LEAD     2,355.9600       -1 2010-04-01
8 STANDARD LEAD     2,357.1200        1 2010-04-06
Not exactly what I want. Can anyone help?
TY
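For comparison, here is a minimal base R sketch of the summary asked for above: one row per DESCRIPTION with the net position, the latest date, and the closing price on that date. It deliberately swaps ddply for split/lapply so that it runs without plyr, and it uses a small made-up data frame with the same column names instead of the full data. The helper name `summarise_group` is hypothetical, and taking the first price when several rows share the latest date is an assumption about what is wanted.

```r
# Small stand-in for the posted data (same columns, fewer rows).
df <- data.frame(
  DESCRIPTION   = c("PRIMARY NICKEL", "PRIMARY NICKEL",
                    "STANDARD LEAD", "STANDARD LEAD"),
  CREATED.DATE  = as.Date(c("2010-03-05", "2010-03-10",
                            "2010-04-01", "2010-04-06")),
  QUANITY       = c(-1, 1, -1, 1),
  CLOSING.PRICE = c("25,755.7100", "25,760.8600",
                    "2,355.9600", "2,357.1200"),
  stringsAsFactors = FALSE
)

# Summarise one group: net position, latest date, price on that date.
summarise_group <- function(g) {
  latest <- max(g$CREATED.DATE)
  data.frame(
    DESCRIPTION = g$DESCRIPTION[1],
    POSITION    = sum(g$QUANITY),
    DATE        = latest,
    # Several rows may share the latest date; keep the first price.
    SETTLEMENT  = g$CLOSING.PRICE[g$CREATED.DATE == latest][1],
    stringsAsFactors = FALSE
  )
}

# Split by DESCRIPTION, summarise each piece, and stack the results.
op <- do.call(rbind, lapply(split(df, df$DESCRIPTION), summarise_group))
rownames(op) <- NULL
print(op)
```

Keeping only the first matching price plays the same role as calling unique() on the ddply result: it collapses groups where several trades share the latest date into a single row.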
--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org