[R] read.csv data frame thousands separator

Marc Schwartz marc_schwartz at me.com
Fri Apr 23 15:10:26 CEST 2010


On Apr 23, 2010, at 8:00 AM, arnaud Gaboury wrote:

> Dear group,
> 
> 
> 
> Here is my df, trades1 :
> 
> 
> 
> trades1 <-
> 
> structure(list(Instrument.Long.Name = c("CORN", "CORN", "CORN", 
> "CORN", "CORN", "SOYBEANS", "SOYBEANS", "SOYBEANS", "SOYBEANS", 
> "SOYBEANS", "SOYBEANS", "STANDARD LEAD USD", "STANDARD LEAD USD", 
> "SPCL HIGH GRADE ZINC USD", "SPCL HIGH GRADE ZINC USD", "SPCL HIGH GRADE
> ZINC USD", 
> "ROBUSTA COFFEE (10)", "ROBUSTA COFFEE (10)", "ROBUSTA COFFEE (10)", 
> "ROBUSTA COFFEE (10)", "ROBUSTA COFFEE (10)", "ROBUSTA COFFEE (10)", 
> "ROBUSTA COFFEE (10)", "ROBUSTA COFFEE (10)"), Buy.Sell..Cleared. = c("Buy",
> "Buy", "Sell", "Buy", "Sell", "Buy", "Buy", "Buy", "Buy", "Sell", 
> "Sell", "Buy", "Sell", "Buy", "Sell", "Sell", "Buy", "Buy", "Buy", 
> "Buy", "Buy", "Buy", "Buy", "Buy"), Volume = c(1L, 1L, 5L, 1L, 
> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
> 1L, 1L, 1L, 1L), Price = c("361.5000", "361.5000", "361.0000", 
> "372.5000", "374.0000", "990.0000", "992.2500", "995.2500", "995.7500", 
> "992.2500", "997.5000", "2,251.5000", "2,369.5000", "2,406.0000", 
> "2,421.5000", "2,448.5000", "1,380.0000", "1,383.0000", "1,383.0000", 
> "1,386.0000", "1,386.0000", "1,388.0000", "1,389.0000", "1,389.0000"
> ), Net.Charges..sum. = c(-4.32, -4.32, -21.6, -4.32, -4.32, -4.32, 
> -4.32, -4.32, -4.32, -4.32, -4.32, -12.5, -12.5, -12.5, -12.5, 
> -12.5, -3.3, -3.3, -3.3, -3.3, -3.3, -3.3, -3.3, -3.3)), .Names =
> c("Instrument.Long.Name", 
> "Buy.Sell..Cleared.", "Volume", "Price", "Net.Charges..sum."), class =
> "data.frame", row.names = c(NA, 
> -24L))
> 
> 
> 
> I got it from this command line :
> 
> 
> 
>> trades1=read.csv2("LSCTrades.csv",dec=".",sep=",",as.is=T,h=T,skip=1)
> 
> 
> 
> The csv file has some numbers with thousands separator as a ".", so the
> class for my "Price" column is character, when I want it numeric.
> 
> I can't change the column class, even when playing with the read.csv or
> as.numeric commands and all their arguments.
> 
> 
> 
> Any help would be appreciate.


Use gsub() to replace the commas with "" and then coerce to numeric:

> as.numeric(gsub(",", "", trades1$Price))
 [1]  361.50  361.50  361.00  372.50  374.00  990.00  992.25  995.25
 [9]  995.75  992.25  997.50 2251.50 2369.50 2406.00 2421.50 2448.50
[17] 1380.00 1383.00 1383.00 1386.00 1386.00 1388.00 1389.00 1389.00


If you want to assign the result back to the column:

> str(trades1)
'data.frame':	24 obs. of  5 variables:
 $ Instrument.Long.Name: chr  "CORN" "CORN" "CORN" "CORN" ...
 $ Buy.Sell..Cleared.  : chr  "Buy" "Buy" "Sell" "Buy" ...
 $ Volume              : int  1 1 5 1 1 1 1 1 1 1 ...
 $ Price               : chr  "361.5000" "361.5000" "361.0000" "372.5000" ...
 $ Net.Charges..sum.   : num  -4.32 -4.32 -21.6 -4.32 -4.32 -4.32 -4.32 -4.32 -4.32 -4.32 ...


trades1$Price <- as.numeric(gsub(",", "", trades1$Price))


> str(trades1)
'data.frame':	24 obs. of  5 variables:
 $ Instrument.Long.Name: chr  "CORN" "CORN" "CORN" "CORN" ...
 $ Buy.Sell..Cleared.  : chr  "Buy" "Buy" "Sell" "Buy" ...
 $ Volume              : int  1 1 5 1 1 1 1 1 1 1 ...
 $ Price               : num  362 362 361 372 374 ...
 $ Net.Charges..sum.   : num  -4.32 -4.32 -21.6 -4.32 -4.32 -4.32 -4.32 -4.32 -4.32 -4.32 ...


> trades1$Price
 [1]  361.50  361.50  361.00  372.50  374.00  990.00  992.25  995.25
 [9]  995.75  992.25  997.50 2251.50 2369.50 2406.00 2421.50 2448.50
[17] 1380.00 1383.00 1383.00 1386.00 1386.00 1388.00 1389.00 1389.00


See ?gsub

HTH,

Marc Schwartz



More information about the R-help mailing list