[R] getting data from a webpage

Bos, Roger roger.bos at rothschild.com
Mon Dec 19 17:48:53 CET 2016


RStudio did a webinar on web scraping with the rvest package that made it look really easy. I haven't gotten around to using it yet, but the video should be on their website somewhere. The link below is the PDF of the slides. It should be educational and will probably cover what you need to know to get your data:


-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Glenn Schultz
Sent: Monday, December 19, 2016 10:02 AM
To: R Help R
Subject: [R] getting data from a webpage


I was getting swap rate data from the St. Louis Fed FRED database via the FRED API. ICE stopped reporting to FRED, so now I must get the data from the ICE website. I would like to use httr to get the data, but I really don't know much about website design. I think the form redirects, but I am not sure that is the case, much less how to identify which website the form redirects to. I used the browser's developer tools and Inspect Element to come up with the code below, which failed miserably. In addition, I purchased the book Automated Data Collection with R, which has not been too useful in helping me understand how to navigate pages using forms and redirects.

Can anyone provide a good reference for understanding how to get data from websites that use forms and redirects? Specifically:

How to find the actual webpage to which one must submit the POST request.
How to find the redirected page that actually has the data.
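One way to approach the first question, offered as a sketch under stated assumptions: in plain HTML, the URL a form submits to is the action attribute of its <form> tag (resolved against the page's own URL when it is relative), the HTTP verb is its method attribute, and the keys of the POST body are the name attributes of its input, select and textarea elements. The R code below reads these with the xml2 package. The HTML string, the action path and every field name except reportDate (which appears in the original post) are hypothetical stand-ins, not the real ICE page; against the live site you would pass the downloaded page to read_html() instead.

```r
library(xml2)

# HYPOTHETICAL stand-in for the report page's HTML; not the real ICE markup.
# Only "reportDate" comes from the original post; other names are made up.
page_html <- '
<html><body>
  <form id="report" method="post" action="/marketdata/reports/180">
    <input type="text" name="reportDate" value="15-Dec-2016">
    <select name="seriesName">
      <option value="USD Rates 1100">USD Rates 1100</option>
    </select>
    <input type="hidden" name="token" value="abc">
  </form>
</body></html>'

# For every <form> in the document, report where it submits (action,
# made absolute against the page URL), how (method), and which named
# fields it sends in the request body.
describe_forms <- function(doc, base_url) {
  form_nodes <- xml_find_all(doc, "//form")
  lapply(form_nodes, function(f) {
    fields <- xml_find_all(f, ".//input | .//select | .//textarea")
    list(
      method = toupper(xml_attr(f, "method", default = "get")),
      action = url_absolute(xml_attr(f, "action", default = ""), base_url),
      fields = xml_attr(fields, "name")
    )
  })
}

doc <- read_html(page_html)
# Assumed page URL, taken from the GET in the original post.
page_url <- "https://www.theice.com/marketdata/reports/icebenchmarkadmin/ISDAFIXHistoricalRates.shtml"
forms <- describe_forms(doc, page_url)
str(forms)
```

For the second question, httr follows redirects by default: for a response resp, resp$url holds the final URL and resp$all_headers holds the status and headers of each hop, so the redirect chain can be read off directly. If the page builds or submits the form with JavaScript, neither the HTML nor the redirect chain may show the real request, and the Network tab of the browser's developer tools is the more reliable place to look.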


library(httr)
# get initial cookies
h <- handle("https://www.theice.com/")
GET(handle = h)
# submit the report form
POST(url = "https://www.theice.com/marketdata/reports/180",
     body = list(reportDate = "15-Dec-2016", SeriesNameAnRunCode_chosen = "USD Rates 1100"),
     encode = "form", handle = h)
# fetch the page that should hold the data
page <- GET(url = "https://www.theice.com/marketdata/reports/icebenchmarkadmin/ISDAFIXHistoricalRates.shtml",
            handle = h)

R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.