[R] Can scan() detect end-of-file?
Sarah Goslee
sarah.goslee at gmail.com
Thu Oct 15 23:06:17 CEST 2015
Thus the post-processing, which I assume you'd have to do with scan() as well.
> tcon <- file(tfile, "r") # or tcon <- textConnection(t)
> allfile <- readLines(tcon, n=10000)
> strsplit(paste(allfile, collapse="\n"), "\"")
[[1]]
[1] "A " "Two line\nentry" "\n\n" "Three\nline\nentry"
[5] " D E"
Or similar, depending on exactly what you want the result to look like.
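
If you want the same per-line fields that scan(nlines=1) returns, one
possible sketch is to let readLines() handle end-of-file, glue physical
lines back into "virtual" lines by counting quotes, and only then call
scan(text=) on each piece. This assumes fields are delimited by double
quotes only, with no escaped quotes inside a field and at least one line
in the file; the names phys, nquote, open_after, rec_id, records and
fields are made up for illustration.

```r
t <- 'A "Two line\nentry"\n\n"Three\nline\nentry" D E\n'
tfile <- tempfile()
cat(t, file = tfile)

# Read every physical line; readLines() stops at end-of-file by itself.
phys <- readLines(tfile)

# Count the double quotes on each physical line.
nquote <- lengths(regmatches(phys, gregexpr('"', phys, fixed = TRUE)))

# An odd running total means the line ends inside a quoted field,
# so the next physical line belongs to the same virtual line.
open_after <- cumsum(nquote) %% 2 == 1
rec_id <- cumsum(c(TRUE, !open_after[-length(open_after)]))

# Paste the physical lines of each virtual line back together.
records <- unname(vapply(split(phys, rec_id), paste, character(1),
                         collapse = "\n"))

# Tokenize each virtual line; only double quotes are treated as quotes
# (an assumption, so that apostrophes in the data are left alone).
fields <- lapply(records, function(r)
  scan(text = r, what = "", quote = "\"", quiet = TRUE))

unlink(tfile)
```

length(fields) then gives the number of virtual lines, and an empty line
shows up as a zero-length element rather than being confused with
end-of-file.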
On Thu, Oct 15, 2015 at 4:56 PM, William Dunlap <wdunlap at tibco.com> wrote:
> readLines() does not work for me since it breaks up
> multiline fields that are enclosed in quotes. E.g., the
> text file line
> A "Two line\nentry"
> should be imported as 2 strings, the second being
> "Two line\nentry", not "\"Two line" with the next call to
> readLines bringing in "entry\"".
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
>
> On Thu, Oct 15, 2015 at 1:44 PM, Sarah Goslee <sarah.goslee at gmail.com> wrote:
>> I've always used system("wc -l myfile") to get the number of lines in
>> advance. But here are two other R-only options, both using readLines
>> instead of scan. There's probably something more efficient, too.
>>
>> Your setup:
>> t <- 'A "Two line\nentry"\n\n"Three\nline\nentry" D E\n'
>> tfile <- tempfile()
>> cat(t, file=tfile)
>> tcon <- file(tfile, "r") # or tcon <- textConnection(t)
>>
>> readLines() produces character(0) for nonexistent lines and "" for empty lines.
>>
>> > readLines(tcon, n=1)
>> [1] "A \"Two line"
>> > readLines(tcon, n=1)
>> [1] "entry\""
>> > readLines(tcon, n=1)
>> [1] ""
>> > readLines(tcon, n=1)
>> [1] "\"Three"
>> > readLines(tcon, n=1)
>> [1] "line"
>> > readLines(tcon, n=1)
>> [1] "entry\" D E"
>> > readLines(tcon, n=1)
>> character(0)
>> > readLines(tcon, n=1)
>> character(0)
>>
>> Or if the file isn't too large for memory, you can read the whole
>> thing in then process it line by line:
>>
>> tcon <- file(tfile, "r") # or tcon <- textConnection(t)
>> allfile <- readLines(tcon, n=10000)
>>
>> > length(allfile)
>> [1] 6
>>
>> On Thu, Oct 15, 2015 at 4:16 PM, William Dunlap <wdunlap at tibco.com> wrote:
>>> I would like to read a connection line by line with scan but
>>> don't know how to tell when to quit trying. Is there any
>>> way that you can ask the connection object if it is at the end?
>>>
>>> E.g.,
>>>
>>> t <- 'A "Two line\nentry"\n\n"Three\nline\nentry" D E\n'
>>> tfile <- tempfile()
>>> cat(t, file=tfile)
>>> tcon <- file(tfile, "r") # or tcon <- textConnection(t)
>>> scan(tcon, what="", nlines=1)
>>> #Read 2 items
>>> #[1] "A" "Two line\nentry"
>>> scan(tcon, what="", nlines=1) # empty line
>>> #Read 0 items
>>> #character(0)
>>> scan(tcon, what="", nlines=1)
>>> #Read 3 items
>>> #[1] "Three\nline\nentry" "D" "E"
>>> scan(tcon, what="", nlines=1) # end of file
>>> #Read 0 items
>>> #character(0)
>>> scan(tcon, what="", nlines=1) # end of file
>>> #Read 0 items
>>> #character(0)
>>>
>>> I am reading virtual line by virtual line because the lines
>>> may have different numbers of fields.
>>>
>>> Bill Dunlap
>>> TIBCO Software
>>> wdunlap tibco.com
>> --
>> Sarah Goslee
>> http://www.functionaldiversity.org