[R] SQldf with sqlite and H2

Mandans mandans_p at yahoo.com
Fri Jul 15 01:14:47 CEST 2011


Thanks a lot Gabor. It helped a lot. Appreciate your time and effort.

Thanks

--- On Thu, 7/14/11, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:

> From: Gabor Grothendieck <ggrothendieck at gmail.com>
> Subject: Re: [R] SQldf with sqlite and H2
> To: "Mandans" <mandans_p at yahoo.com>
> Cc: r-help at r-project.org
> Date: Thursday, July 14, 2011, 2:22 PM
> On Thu, Jul 14, 2011 at 10:33 AM,
> Mandans <mandans_p at yahoo.com>
> wrote:
> > SQldf with sqlite and H2
> >
> > I have a large csv file (about 2GB) and wanted to
> import the file into R and do some filtering and analysis.
> Came across sqldf ( a great idea and product) and was trying
> to play around to see what would be the best method of doing
> this. csv file is comma delimited with some columns having
> comma inside the quoation like this "John, Doe".
> >
> > I tried this first
> >
> > #######
> > library(sqldf)
> > sqldf("attach testdb as new")
> > In.File <- "C:/JP/Temp/2008.csv"
> > read.csv.sql(In.File, sql = "create table table1 as
> select * from file",
> >  dbname = "testdb")
> >
> > It errored out with message
> >
> > NULL
> > Warning message:
> > closing unused connection 3 (C:/JP/Temp/2008.csv)
> >
> > When this failed, I converted this file from comma
> delimited to tab delimited and used this command
> >
> > #########
> > read.csv.sql(In.File, sql = "create table table1 as
> select * from file",
> >  dbname = "testdb", sep = "\t")
> >
> > and this worked, it created testdb sqlite file with
> the size of 3GB
> >
> > now my question is in 3 parts.
> >
> > 1. Is it possible to create a dataframe with
> appropriate column classes and use that column classes when
> I use the read.csv.sql command to create the table.
> Something like may be create the table from that DF and then
> update with read.csv.sql.?
> >
> > Any example code will be really helpful.
> 
> Here is an example of using method = "name__class". 
> Note there are
> two underscores in a row.  It appears I neglected to
> document that
> Date2 means convert from character representation whereas
> Date means
> convert from numeric representation.  It would also be
> possible to use
> method = "raw" and then coerce the columns yourself
> afterwards.
> 
> # create test file
> Lines <- 'A__Date2|B
> 2000-01-01|x,y
> 2000-01-02|c,d
> '
> tf <- tempfile()
> cat(Lines, file = tf)
> 
> 
> library(sqldf)
> DF <- read.csv.sql(tf, sep = "|", method =
> "name__class")
> str(DF)
> 
> >
> > 2. If we use the H2 database instead of default sqlite
> and use the readcsv option, will that be faster and is there
> a way we can specify the above thought of applying a DF
> class to table column properties and update with CSVREAD
> >
> > library(RH2)
> > something like SELECT * FROM
> CSVREAD('C:/JP/Temp/2008.csv')
> >
> > Any example code will be really helpful.
> 
> Sorry, I haven't tested the speed of this.  postgresql
> and mysql, both
> supported by sqldf, also have builtin methods to read
> files. If I had
> to guess I would guess that mysql would be fastest but this
> would have
> to be tested.
> 
> >
> > 3. How do we specify where the H2 file is saved. Saw
> something like this, when I ran this example from RH2
> package, couldn't find the file in the working directory.
> >
> > con <- dbConnect(H2(), "jdbc:h2:~/test", "sa", "")
> 
> ~ means your home directory so ~/test means test is in the
> home directory.
> 
> Try
> 
> normalizePath("~")
> normalizePath("~/test")
> etc.
> 
> to see what they refer to.
> 
> Regards.
> 
> >
> > Sorry for the long mail. Appreciate all for building a
> great community and for the wonderful software in R.
> > Thanks for Gabor Grothendieck for bring sqldf to this
> great community.
> >
> > Any help or direction you can provide in this is
> highly appreciated.
> >
> > Thanks all.
> >
> > ______________________________________________
> > R-help at r-project.org
> mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained,
> reproducible code.
> >
> 
> 
> 
> -- 
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
>



More information about the R-help mailing list