[R] data frame manipulation and regex
David Winsemius
dwinsemius at comcast.net
Wed Apr 28 14:25:09 CEST 2010
On Apr 28, 2010, at 5:14 AM, arnaud Gaboury wrote:
> Dear group,
>
> Here is my data.frame :
>
> avprix <-
> structure(list(DESCRIPTION = c("CORN Jul/10", "CORN May/10",
> "ROBUSTA COFFEE (10) Jul/10", "SOYBEANS Jul/10",
> "SPCL HIGH GRADE ZINC USD Jul/10", "STANDARD LEAD USD Jul/10"),
> prix = c(-1.5, -1082, 11084, 1983.5, -2464, -118),
> quantity = c(0, -3, 8, 2, -1, 0)),
> .Names = c("DESCRIPTION", "prix", "quantity"),
> row.names = c(NA, -6L), class = "data.frame")
>
>> avprix
>                       DESCRIPTION    prix quantity
> 1                     CORN Jul/10    -1.5        0
> 2                     CORN May/10 -1082.0       -3
> 3      ROBUSTA COFFEE (10) Jul/10 11084.0        8
> 4                 SOYBEANS Jul/10  1983.5        2
> 5 SPCL HIGH GRADE ZINC USD Jul/10 -2464.0       -1
> 6        STANDARD LEAD USD Jul/10  -118.0        0
>
> I need to remove the date (i.e. Jul/10 in this example) for each element of
> the DESCRIPTION column that contains the USD symbol. I am trying to do this
> using regular expressions, but must admit I am going nowhere.
> My elements in the DESCRIPTION column and the dates can change every day.
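This matches the text from USD through a trailing date (any three
characters, a forward slash, any two characters) and replaces the whole
match with an empty string, so USD is removed along with the date. Note
that the + applies only to the D, and the parentheses are not needed here: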
> sub("USD+.*(.../..)", "", avprix$DESCRIPTION)
[1] "CORN Jul/10"                "CORN May/10"                "ROBUSTA COFFEE (10) Jul/10"
[4] "SOYBEANS Jul/10"            "SPCL HIGH GRADE ZINC "      "STANDARD LEAD "
This tightens up the matching by requiring that the characters
after the slash be digits:
> sub("USD+.*(.../\\d{2})", "", avprix$DESCRIPTION)
[1] "CORN Jul/10"                "CORN May/10"                "ROBUSTA COFFEE (10) Jul/10"
[4] "SOYBEANS Jul/10"            "SPCL HIGH GRADE ZINC "      "STANDARD LEAD "
-- David.
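A follow-up sketch, in case the intent was to keep USD and drop only the date. Both calls above delete USD as well and leave a trailing space. The variant below is an assumption about what was wanted, not part of the reply above. The names desc, matched, kept, has_usd and stripped are made up for illustration, and regmatches() needs a version of R that provides it.

```r
# Sample values copied from the DESCRIPTION column in the question
desc <- c("CORN Jul/10", "ROBUSTA COFFEE (10) Jul/10",
          "SPCL HIGH GRADE ZINC USD Jul/10", "STANDARD LEAD USD Jul/10")

# What the pattern above consumes: the match starts at USD, not at the date
m <- regexpr("USD+.*(.../\\d{2})", desc)
matched <- regmatches(desc, m)   # only the elements that matched are returned

# Variant (assumed intent): capture USD and put it back with \\1,
# so only the trailing " Mon/YY" date is dropped
kept <- sub("(USD) +[A-Za-z]{3}/[0-9]{2}$", "\\1", desc)

# Alternative: pick the USD elements first, then strip the date from those only
has_usd <- grepl("USD", desc, fixed = TRUE)
stripped <- desc
stripped[has_usd] <- sub(" +[A-Za-z]{3}/[0-9]{2}$", "", desc[has_usd])

print(matched)
print(kept)
```

To change the data frame itself, the result would have to be assigned back, for example avprix$DESCRIPTION <- sub(...), since sub() returns a new vector and does not modify its input.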
>
>
> TY for any help.
>
David Winsemius, MD
West Hartford, CT