[R] Help with read.csv.sql()
Ulrik Stervbo
U|r|k@Stervbo @end|ng |rom ruhr-un|-bochum@de
Wed Jul 29 17:14:09 CEST 2020
You might achieve this using readr:
```
library(readr)
lines <- "Id, Date, Time, Quality, Lat, Long
STM05-1, 2005/02/28, 17:35, Good, -35.562, 177.158
STM05-1, 2005/02/28, 19:44, Good, -35.487, 177.129
STM05-1, 2005/02/28, 23:01, Unknown, -35.399, 177.064
STM05-1, 2005/03/01, 07:28, Unknown, -34.978, 177.268
STM05-1, 2005/03/01, 18:06, Poor, -34.799, 177.027
STM05-1, 2005/03/01, 18:47, Poor, -34.85, 177.059
STM05-2, 2005/02/28, 12:49, Good, -35.928, 177.328
STM05-2, 2005/02/28, 21:23, Poor, -35.926, 177.314"
read_csv(lines)
read_csv(
lines,
skip = 1, # Ignore the header row
col_names = c("myId", "myDate", "myTime", "myQuality", "myLat",
"myLong"),
col_types = cols(
myDate = col_date(format = ""),
myTime = col_time(format = ""),
myLat = col_number(),
myLong = col_number(),
.default = col_character()
)
)
read_csv(
lines,
col_types = cols_only(
Id = col_character(),
Date = col_date(format = ""),
Time = col_time(format = "")
)
)
read_csv(
lines,
skip = 1, # Ignore the header row
col_names = c("myId", "myDate", "myTime", "myQuality", "myLat",
"myLong"),
col_types = cols_only(
myId = col_character(),
myDate = col_date(format = ""),
myTime = col_time(format = "")
)
)
```
HTH
Ulrik
On 2020-07-20 02:07, H wrote:
> On 07/18/2020 01:38 PM, William Michels wrote:
>> Do either of the postings/threads below help?
>>
>> https://r.789695.n4.nabble.com/read-csv-sql-to-select-from-a-large-csv-file-td4650565.html#a4651534
>> https://r.789695.n4.nabble.com/using-sqldf-s-read-csv-sql-to-read-a-file-with-quot-NA-quot-for-missing-td4642327.html
>>
>> Otherwise you can try reading through the FAQ on Github:
>>
>> https://github.com/ggrothendieck/sqldf
>>
>> HTH, Bill.
>>
>> W. Michels, Ph.D.
>>
>>
>>
>> On Sat, Jul 18, 2020 at 9:59 AM H <agents using meddatainc.com> wrote:
>>> On 07/18/2020 11:54 AM, Rui Barradas wrote:
>>>> Hello,
>>>>
>>>> I don't believe that what you are asking for is possible but like
>>>> Bert suggested, you can do it after reading in the data.
>>>> You could write a convenience function to read the data, then change
>>>> what you need to change.
>>>> Then the function would return this final object.
>>>>
>>>> Rui Barradas
>>>>
>>>> Às 16:43 de 18/07/2020, H escreveu:
>>>>
>>>>> On 07/17/2020 09:49 PM, Bert Gunter wrote:
>>>>>> Is there some reason that you can't make the changes to the data
>>>>>> frame (column names, as.date(), ...) *after* you have read all
>>>>>> your data in?
>>>>>>
>>>>>> Do all your csv files use the same names and date formats?
>>>>>>
>>>>>>
>>>>>> Bert Gunter
>>>>>>
>>>>>> "The trouble with having an open mind is that people keep coming
>>>>>> along and sticking things into it."
>>>>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>>>>>
>>>>>>
>>>>>> On Fri, Jul 17, 2020 at 6:28 PM H <agents using meddatainc.com
>>>>>> <mailto:agents using meddatainc.com>> wrote:
>>>>>>
>>>>>> I have created a dataframe with columns that are characters,
>>>>>> integers and numeric and with column names assigned by me. I am
>>>>>> using read.csv.sql() to read portions of a number of large csv
>>>>>> files into this dataframe, each csv file having a header row with
>>>>>> columb names.
>>>>>>
>>>>>> The problem I am having is that the csv files have header
>>>>>> rows with column names that are slightly different from the column
>>>>>> names I have assigned in the dataframe and it seems that when I
>>>>>> read the csv data into the dataframe, the column names from the
>>>>>> csv file replace the column names I chose when creating the
>>>>>> dataframe.
>>>>>>
>>>>>> I have been unable to figure out if it is possible to assign
>>>>>> column names of my choosing in the read.csv.sql() function? I have
>>>>>> tried various variations but none seem to work. I tried colClasses
>>>>>> = c(....) but that did not work, I tried field.types = c(...) but
>>>>>> could not get that to work either.
>>>>>>
>>>>>> It seems that the above should be feasible but I am missing
>>>>>> something? Does anyone know?
>>>>>>
>>>>>> A secondary issue is that the csv files have a column with a
>>>>>> date in mm/dd/yyyy format that I would like to make into a Date
>>>>>> type column in my dataframe. Again, I have been unable to find a
>>>>>> way - if at all possible - to force a conversion into a Date
>>>>>> format when importing into the dataframe. The best I have so far
>>>>>> is to import is a character column and then use as.Date() to later
>>>>>> force the conversion of the dataframe column.
>>>>>>
>>>>>> Is it possible to do this when importing using
>>>>>> read.csv.sql()?
>>>>>>
>>>>>> ______________________________________________
>>>>>> R-help using r-project.org <mailto:R-help using r-project.org> mailing
>>>>>> list -- To UNSUBSCRIBE and more, see
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide
>>>>>> http://www.R-project.org/posting-guide.html
>>>>>> and provide commented, minimal, self-contained, reproducible
>>>>>> code.
>>>>>>
>>>>> Yes, the files use the same column names and date format (at least
>>>>> as far as I know now.) I agree I could do it as you suggest above
>>>>> but from a purist perspective I would rather do it when importing
>>>>> the data using read.csv.sql(), particularly if column names and/or
>>>>> date format might change, or be different between different files.
>>>>> I am indeed selecting rows from a large number of csv files so this
>>>>> is entirely plausible.
>>>>>
>>>>> Has anyone been able to name columns in the read.csv.sql() call
>>>>> and/or force date format conversion in the call itself? The first
>>>>> refers to naming columns differently from what a header in the csv
>>>>> file may have.
>>>>>
>>>>>
>>>>> [[alternative HTML version deleted]]
>>>>>
>>>>> ______________________________________________
>>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>> The documentation for read.csv.sql() suggests that colClasses()
>>> and/or field.types() should work but I may well have misunderstood
>>> the documentation, hence my question in this group.
>>>
>>> ______________________________________________
>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>
> I had read the sqldf() documentation but was left with the impression
> that what I want to do is not easily doable.
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
More information about the R-help
mailing list