[R] read.csv data frame thousands separator
Marc Schwartz
marc_schwartz at me.com
Fri Apr 23 15:10:26 CEST 2010
On Apr 23, 2010, at 8:00 AM, arnaud Gaboury wrote:
> Dear group,
>
>
>
> Here is my df, trades1 :
>
>
>
> trades1 <-
>
> structure(list(Instrument.Long.Name = c("CORN", "CORN", "CORN",
> "CORN", "CORN", "SOYBEANS", "SOYBEANS", "SOYBEANS", "SOYBEANS",
> "SOYBEANS", "SOYBEANS", "STANDARD LEAD USD", "STANDARD LEAD USD",
> "SPCL HIGH GRADE ZINC USD", "SPCL HIGH GRADE ZINC USD", "SPCL HIGH GRADE
> ZINC USD",
> "ROBUSTA COFFEE (10)", "ROBUSTA COFFEE (10)", "ROBUSTA COFFEE (10)",
> "ROBUSTA COFFEE (10)", "ROBUSTA COFFEE (10)", "ROBUSTA COFFEE (10)",
> "ROBUSTA COFFEE (10)", "ROBUSTA COFFEE (10)"), Buy.Sell..Cleared. = c("Buy",
> "Buy", "Sell", "Buy", "Sell", "Buy", "Buy", "Buy", "Buy", "Sell",
> "Sell", "Buy", "Sell", "Buy", "Sell", "Sell", "Buy", "Buy", "Buy",
> "Buy", "Buy", "Buy", "Buy", "Buy"), Volume = c(1L, 1L, 5L, 1L,
> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
> 1L, 1L, 1L, 1L), Price = c("361.5000", "361.5000", "361.0000",
> "372.5000", "374.0000", "990.0000", "992.2500", "995.2500", "995.7500",
> "992.2500", "997.5000", "2,251.5000", "2,369.5000", "2,406.0000",
> "2,421.5000", "2,448.5000", "1,380.0000", "1,383.0000", "1,383.0000",
> "1,386.0000", "1,386.0000", "1,388.0000", "1,389.0000", "1,389.0000"
> ), Net.Charges..sum. = c(-4.32, -4.32, -21.6, -4.32, -4.32, -4.32,
> -4.32, -4.32, -4.32, -4.32, -4.32, -12.5, -12.5, -12.5, -12.5,
> -12.5, -3.3, -3.3, -3.3, -3.3, -3.3, -3.3, -3.3, -3.3)), .Names =
> c("Instrument.Long.Name",
> "Buy.Sell..Cleared.", "Volume", "Price", "Net.Charges..sum."), class =
> "data.frame", row.names = c(NA,
> -24L))
>
>
>
> I got it from this command line :
>
>
>
>> trades1=read.csv2("LSCTrades.csv",dec=".",sep=",",as.is=T,h=T,skip=1)
>
>
>
> The csv file has some numbers with thousands separator as a ".", so the
> class for my "Price" column is character, when I want it numeric.
>
> I can't change the column class, even when playing with the read.csv or
> as.numeric commands and all their arguments.
>
>
>
> Any help would be appreciate.
Use gsub() to replace the commas with "" and then coerce to numeric:
> as.numeric(gsub(",", "", trades1$Price))
[1] 361.50 361.50 361.00 372.50 374.00 990.00 992.25 995.25
[9] 995.75 992.25 997.50 2251.50 2369.50 2406.00 2421.50 2448.50
[17] 1380.00 1383.00 1383.00 1386.00 1386.00 1388.00 1389.00 1389.00
If you want to assign the result back to the column:
> str(trades1)
'data.frame': 24 obs. of 5 variables:
$ Instrument.Long.Name: chr "CORN" "CORN" "CORN" "CORN" ...
$ Buy.Sell..Cleared. : chr "Buy" "Buy" "Sell" "Buy" ...
$ Volume : int 1 1 5 1 1 1 1 1 1 1 ...
$ Price : chr "361.5000" "361.5000" "361.0000" "372.5000" ...
$ Net.Charges..sum. : num -4.32 -4.32 -21.6 -4.32 -4.32 -4.32 -4.32 -4.32 -4.32 -4.32 ...
trades1$Price <- as.numeric(gsub(",", "", trades1$Price))
> str(trades1)
'data.frame': 24 obs. of 5 variables:
$ Instrument.Long.Name: chr "CORN" "CORN" "CORN" "CORN" ...
$ Buy.Sell..Cleared. : chr "Buy" "Buy" "Sell" "Buy" ...
$ Volume : int 1 1 5 1 1 1 1 1 1 1 ...
$ Price : num 362 362 361 372 374 ...
$ Net.Charges..sum. : num -4.32 -4.32 -21.6 -4.32 -4.32 -4.32 -4.32 -4.32 -4.32 -4.32 ...
> trades1$Price
[1] 361.50 361.50 361.00 372.50 374.00 990.00 992.25 995.25
[9] 995.75 992.25 997.50 2251.50 2369.50 2406.00 2421.50 2448.50
[17] 1380.00 1383.00 1383.00 1386.00 1386.00 1388.00 1389.00 1389.00
See ?gsub
HTH,
Marc Schwartz
More information about the R-help
mailing list