[R] Lahman Baseball Data Using R DBI Package

Rui Barradas ru|pb@rr@d@@ @end|ng |rom @@po@pt
Thu Oct 8 20:16:43 CEST 2020


Hello,


Às 17:26 de 08/10/20, Bill Dunlap escreveu:
> This is really a feature of SQL, not R.  SQL requires that you double quote
> column names that start with numbers, include spaces, etc., or that are SQL
> key words.  


Right, but there's no need to escape the double quotes, just put the SQL 
statement between single quotes and it becomes more readable.

sqldf::sqldf('select "Order","Where","From" from d WHERE "From"="me"')


Hope this helps,

Rui Barradas

E.g.,
> 
>> d <- data.frame(Order=c("sit","stay","heel"),
> Where=c("here","there","there"), From=c("me","me","you"))
>> sqldf::sqldf("select Order,Where,From from d WHERE From=\"me\"")
> Error: near "Order": syntax error
>> sqldf::sqldf("select \"Order\",\"Where\",\"From\" from d Where
> \"From\"=\"me\"")
>    Order Where From
> 1   sit  here   me
> 2  stay there   me
> 
> You may as well double quote all column names in SQL queries.
> 
> -Bill
> 
> On Wed, Oct 7, 2020 at 9:57 PM William Michels <wjm1 using caa.columbia.edu>
> wrote:
> 
>> Hi Philip,
>>
>> You've probably realized by now that R doesn't like column names that
>> start with a number. If you try to access an R-dataframe column named
>> 2B or 3B with the familiar "$" notation, you'll get an error:
>>
>>> library(DBI)
>>> library(RSQLite)
>>> con2 <- dbConnect(SQLite(), "~/R_Dir/lahmansbaseballdb.sqlite")
>>> Hack12Batting <- dbGetQuery(con2,"SELECT * FROM batting WHERE yearID =
>> 2018 AND AB >600 ORDER BY AB DESC")
>>> Hack12Batting$AB
>>   [1] 664 661 639 632 632 632 626 623 620 618 617 613 606 605 602
>>> Hack12Batting$3B
>> Error: unexpected numeric constant in "Hack12Batting$3"
>>
>> How to handle? You can rename columns on-the-fly by piping. See
>> reference [1] and use either library(magrittr) or library(dplyr) or a
>> combination thereof:
>>
>> library(magrittr)
>> dbGetQuery(con2,"SELECT * FROM batting WHERE yearID = 2018 AND AB >600
>> ORDER BY AB DESC") %>% set_colnames(make.names(colnames(.)))
>>
>> #OR one of the following:
>>
>> library(dplyr)
>> dbGetQuery(con2,"SELECT * FROM batting WHERE yearID = 2018 AND AB >600
>> ORDER BY AB DESC") %>% rename(X2B = `2B`, X3B = `3B`)
>>
>> library(dplyr)
>> dbGetQuery(con2,"SELECT * FROM batting WHERE yearID = 2018 AND AB >600
>> ORDER BY AB DESC") %>% `colnames<-`(make.names(colnames(.)))
>>
>> library(dplyr)
>> dbGetQuery(con2,"SELECT * FROM batting WHERE yearID = 2018 AND AB >600
>> ORDER BY AB DESC") %>% magrittr::set_colnames(make.names(colnames(.)))
>>
>> Best, Bill.
>>
>> W. Michels, Ph.D.
>>
>> [1]
>> https://stackoverflow.com/questions/28100780/use-with-replacement-functions-like-colnames
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On Fri, Oct 2, 2020 at 7:34 PM Bill Dunlap <williamwdunlap using gmail.com>
>> wrote:
>>>
>>> The double quotes are required by SQL if a name is not of the form
>>> letter-followed-by-any-number-of-letters-or-numbers or if the name is a
>> SQL
>>> keyword like 'where' or 'select'.  If you are doing this from a function,
>>> you may as well quote all the names.
>>>
>>> -Bill
>>>
>>> On Fri, Oct 2, 2020 at 6:18 PM Philip <herd_dog using cox.net> wrote:
>>>
>>>> The \”2B\” worked.  Have no idea why.  Can you point me somewhere that
>> can
>>>> explain this to me.
>>>>
>>>> Thanks,
>>>> Philip
>>>>
>>>> *From:* Bill Dunlap
>>>> *Sent:* Friday, October 2, 2020 3:54 PM
>>>> *To:* Philip
>>>> *Cc:* r-help
>>>> *Subject:* Re: [R] Lahman Baseball Data Using R DBI Package
>>>>
>>>> Have you tried putting double quotes around 2B and 3B:  "...2B, 3B,
>> ..."
>>>> -> "...\"2B\",\"3B\",..."?
>>>>
>>>> -Bill
>>>>
>>>> On Fri, Oct 2, 2020 at 3:49 PM Philip <herd_dog using cox.net> wrote:
>>>>
>>>>> I’m trying to pull data from one table (batting) in the Lahman
>> Baseball
>>>>> database.  Notice X2B for doubles and X3B for triples – fourth and
>> fifth
>>>>> from the right.
>>>>>
>>>>> The dbGetQuery function runs fine when I leave there two out but I get
>>>>> error messages (in red) when I include 2B/3B or X2B/X3B.
>>>>>
>>>>> Can anyone give me some direction?
>>>>>
>>>>> Thanks,
>>>>> Philip Heinrich
>>>>>
>>>>>
>> ***************************************************************************************************************************************************
>>>>> tail(dbReadTable(Lahman,"batting"))
>>>>>
>>>>> ID                       playerID      yearID    stint teamID team_ID
>>>>> lgID   G        G_batting   AB     R     H       X2B    X3B   HR
>>   RBI   SB
>>>>> 107414 107414 yastrmi01      2019       1       SFN       2920
>>>>> NL     107        NA          371   64  101      22       3     21
>>>>> 55    2
>>>>> 107416 107416 yelicch01      2019        1       MIL       2911
>>>>> NL     130        NA          489 100  161      29       3     44
>> 97   30
>>>>> 107419 107419 youngal01     2019       1       ARI        2896
>>>>> NL       17        NA            25     1      1        0       0
>>>>> 0      0     0
>>>>> 107420 107420 zagunma01   2019        1      CHN       2901      NL
>>>>> 30        NA             36     2      9        3       0      0
>>>>> 5     0
>>>>> 107422 107422 zavalse01      2019        1      CHA       2900
>>>>> AL        5        NA             12     1      1        0       0
>>>>> 0       0     0
>>>>> 107427 107427 zimmery01     2019        1      WAS      2925      NL
>>>>> 52        NA           171   20    44        9      0      6      27
>>     0
>>>>> 107428 107428 zobribe01       2019        1      CHN      2901
>>>>> NL      47        NA           150   24   39         5      0      1
>>>>> 17     0
>>>>> 107429 107429 zuninmi01       2019        1      TBA       2922
>>>>> AL      90        NA           266    30  44       10      1      9
>>>>> 32     0
>>>>>
>>>>>
>>>>> Hack11Batting <- dbGetQuery(Lahman,"SELECT
>>>>> playerID,yearID,AB,R,H,2B,3B,HR,
>>>>>                              RBI,SB,CS,BB,SO,IBB,HBP,SH,SF,GIDP FROM
>>>>> batting
>>>>>                              WHERE yearID = 2018 AND AB >99")
>>>>> Error: unrecognized token: "2B"
>>>>>
>>>>> Hack11Batting <- dbGetQuery(Lahman,"SELECT
>>>>> playerID,yearID,AB,R,H,X2B,X3B,HR,
>>>>>                              RBI,SB,CS,BB,SO,IBB,HBP,SH,SF,GIDP FROM
>>>>> batting
>>>>>                              WHERE yearID = 2018 AND AB >99")
>>>>> Error: no such column: X2B
>>>>>
>>>>>          [[alternative HTML version deleted]]
>>>>>
>>>>> ______________________________________________
>>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>>>
>>>
>>>          [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
> 
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



More information about the R-help mailing list