[R] [External] Re: read.csv fails in R console in Ubuntu terminal but works in RStudio after R 3.6.3 upgrade to R 4.0.2?

Rasmus Liland
Fri Jul 17 15:23:43 CEST 2020


On 2020-07-17 07:54 -0400, Sam H wrote:
| On 2020-07-17 09:30 +0100, ruipbarradas wrote:
| | On 2020-07-16 20:59 -0500, luke-tierney using uiowa.edu wrote:
| | | At 08:45 on 15/07/20, Sam H wrote:
| | | | Hi,
| | | | 
| | | | I am trying to download some 
| | | | data using read.csv and it works 
| | | | perfectly in RStudio and fails 
| | | | in the R console in the terminal 
| | | | in Ubuntu 18.04 after upgrading 
| | | | from R 3.6.3 to 4.0.2. 
| | | 
| | | On my Ubuntu system the download 
| | | with read.csv succeeds in an R 
| | | console if I set the HTTPUserAgent 
| | | and download.file.method options to 
| | | match the ones used by RStudio.
| | | 
| | | Given how picky the server is being 
| | | I would worry about whether this use 
| | | is in line with the site's terms of 
| | | service.
| |
| | Yes, I thought it was a site policy
| | issue too. But the file can be
| | accessed and read/downloaded from
| | RStudio and Firefox, so apparently
| | there's no reason why the R console
| | shouldn't.
| 
| Hello,
| 
| Thank you all very much for looking into this.
| 
| I came across this problem when I was using TTR::stockSymbols()
| (https://github.com/joshuaulrich/TTR/blob/e6609b9f7621f3a4b1a204c159af61aebc89997e/R/WebData.R).
| 
| As a workaround, I added this function
| to my private R package and, instead of
| read.csv, I now use data.table::fread(),
| which downloads and reads the file
| without failing.

Dear Sam,

Good thing you solved this.  

As Luke said, to use read.csv you need
to set the HTTPUserAgent option:

	options(HTTPUserAgent = "RStudio Desktop (1.3.959)")

... or with cURL directly:

	curl -H 'User-Agent: RStudio Desktop (1.3.959)' 'https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=1&render=download'
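
To keep the changed user agent from leaking into the rest of the session, the override can be scoped to a single call. This is a minimal sketch, assuming you want to set both HTTPUserAgent and download.file.method as Luke described; read_csv_with_agent() is a hypothetical helper, not part of base R or any package, and the default agent string is simply the one used in this thread:

```r
# Hypothetical helper (not part of base R or any package): call read.csv()
# with HTTPUserAgent (and optionally download.file.method) temporarily
# overridden. The caller's previous options are restored on exit, even if
# read.csv() throws an error.
read_csv_with_agent <- function(file,
                                agent = "RStudio Desktop (1.3.959)",
                                method = getOption("download.file.method"),
                                ...) {
  # options() returns the previous values, so they can be restored later;
  # a NULL method simply leaves download.file.method unset.
  old <- options(HTTPUserAgent = agent, download.file.method = method)
  on.exit(options(old), add = TRUE)
  utils::read.csv(file, ...)
}
```

With x set to the NASDAQ URL used in the transcripts below, the call would be read_csv_with_agent(x, as.is = TRUE, na = "n/a").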

At 08:45 on 15/07/20, Sam H wrote:
| Before upgrading this worked in the R 
| console in the terminal also without 
| any issues.

In R 3.6.3, however, these lines fail
for me as well:

	> R.Version()$version.string
	[1] "R version 3.6.3 (2020-02-29)"
	> options()[c("download.file.method", "HTTPUserAgent")]
	$<NA>
	NULL
	
	$HTTPUserAgent
	[1] "R (3.6.3 x86_64-pc-linux-gnu x86_64 linux-gnu)"
	
	> x<-"https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=1&render=download"
	> read.csv(x, as.is=TRUE, na="n/a")
	Error in file(file, "rt") :
	  cannot open the connection to 'https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=1&render=download'
	In addition: Warning message:
	In file(file, "rt") :
	  cannot open URL 'https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=1&render=download': HTTP status was '403 Forbidden'
	>

Running data.table::fread in R 4.0.2, on the other hand, succeeds with the default options:

	> options()[c("download.file.method", "HTTPUserAgent")]
	$<NA>
	NULL
	
	$HTTPUserAgent
	[1] "R (4.0.2 x86_64-pc-linux-gnu x86_64 linux-gnu)"
	> x <- "https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=1&render=download"
	> data.table::fread(x, header=TRUE)[1:2,]
	   Symbol               Name LastSale
	1:    TXG 10x Genomics, Inc.    89.19
	2:     YI          111, Inc.     6.53
	   MarketCap IPOyear        Sector
	1:    $8.77B    2019 Capital Goods
	2:  $537.81M    2018   Health Care
	                                           industry
	1: Biotechnology: Laboratory Analytical Instruments
	2:                         Medical/Nursing Services
	                       Summary Quote V9
	1: https://old.nasdaq.com/symbol/txg NA
	2:  https://old.nasdaq.com/symbol/yi NA
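
Both option listings above show download.file.method unset and HTTPUserAgent at R's own default. To compare these settings with an RStudio session, where the same read.csv call succeeds, a small helper can collect them in one vector. A sketch; download_settings() is a hypothetical name, and url.method is included because read.csv() opens URLs through file(), which for URLs behaves like url(), rather than through download.file():

```r
# Hypothetical diagnostic helper: collect the settings that may differ
# between an R console session and RStudio. Unset options are reported
# as NA so the result stays a flat named character vector.
download_settings <- function() {
  opt <- function(name) {
    value <- getOption(name)
    if (is.null(value)) NA_character_ else as.character(value)[1L]
  }
  c(
    R_version            = R.version.string,
    HTTPUserAgent        = opt("HTTPUserAgent"),
    download.file.method = opt("download.file.method"),
    url.method           = opt("url.method"),
    libcurl              = as.character(libcurlVersion())
  )
}
```

Running download_settings() in the R console and in RStudio and comparing the two results shows which of these settings differ.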

Does anyone know what data.table::fread
does differently from read.csv here, so
that setting HTTPUserAgent is not needed?

Without setting HTTPUserAgent, I think
data.table::fread just sends something
like "libcurl/7.71.1" as the user agent,
as read.csv would have done ...
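
One way to test whether the download route is what matters: fetch the file with download.file() into a temporary file, which appears to be how fread handles URLs (an assumption, not checked against data.table's source here), and then run read.csv on the local copy. read_csv_via_download() is a hypothetical helper:

```r
# Hypothetical helper: fetch the URL with download.file() into a temporary
# file, then parse the local copy with read.csv(). Plain read.csv() on a URL
# instead opens a connection via file(), which for URLs behaves like url().
read_csv_via_download <- function(url, ...,
                                  method = getOption("download.file.method", "auto")) {
  tmp <- tempfile(fileext = ".csv")
  on.exit(unlink(tmp), add = TRUE)  # remove the temporary copy afterwards
  status <- utils::download.file(url, tmp, method = method,
                                 quiet = TRUE, mode = "wb")
  if (status != 0L) stop("download.file() returned status ", status)
  utils::read.csv(tmp, ...)
}
```

If read_csv_via_download(x, as.is = TRUE, na = "n/a") succeeds where read.csv(x, as.is = TRUE, na = "n/a") gets a 403, that would suggest the difference lies in how the connection is opened rather than in the HTTPUserAgent option alone.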

Best,
Rasmus


