[R] SQldf with sqlite and H2

Mandans mandans_p at yahoo.com
Thu Jul 14 16:33:06 CEST 2011


SQldf with sqlite and H2

I have a large csv file (about 2GB) and wanted to import the file into R and do some filtering and analysis. Came across sqldf ( a great idea and product) and was trying to play around to see what would be the best method of doing this. csv file is comma delimited with some columns having comma inside the quoation like this "John, Doe". 

I tried this first 

#######
library(sqldf)
sqldf("attach testdb as new")
In.File <- "C:/JP/Temp/2008.csv"
read.csv.sql(In.File, sql = "create table table1 as select * from file", 
  dbname = "testdb")

It errored out with message

NULL
Warning message:
closing unused connection 3 (C:/JP/Temp/2008.csv)   

When this failed, I converted this file from comma delimited to tab delimited and used this command  

#########  
read.csv.sql(In.File, sql = "create table table1 as select * from file", 
  dbname = "testdb", sep = "\t")

and this worked, it created testdb sqlite file with the size of 3GB

now my question is in 3 parts.

1. Is it possible to create a dataframe with appropriate column classes and use that column classes when I use the read.csv.sql command to create the table. Something like may be create the table from that DF and then update with read.csv.sql.?

Any example code will be really helpful.

2. If we use the H2 database instead of default sqlite and use the readcsv option, will that be faster and is there a way we can specify the above thought of applying a DF class to table column properties and update with CSVREAD

library(RH2)
something like SELECT * FROM CSVREAD('C:/JP/Temp/2008.csv')

Any example code will be really helpful.

3. How do we specify where the H2 file is saved. Saw something like this, when I ran this example from RH2 package, couldn't find the file in the working directory.

con <- dbConnect(H2(), "jdbc:h2:~/test", "sa", "")

Sorry for the long mail. Appreciate all for building a great community and for the wonderful software in R.
Thanks for Gabor Grothendieck for bring sqldf to this great community.

Any help or direction you can provide in this is highly appreciated.

Thanks all.



More information about the R-help mailing list