[R] help with web scraping
William Michels
wjm1 @end|ng |rom c@@@co|umb|@@edu
Fri Jul 24 00:46:08 CEST 2020
Hi Spencer,
I tried the code below on an older R installation, and it fetches the page without error.
It's not a full solution, but it's a start:
> library(RCurl)
Loading required package: bitops
> url <- "https://s1.sos.mo.gov/CandidatesOnWeb/DisplayCandidatesPlacement.aspx?ElectionCode=750004975"
> M_sos <- getURL(url)
> print(M_sos)
[1] "\r\n<!DOCTYPE html>\r\n\r\n<html
lang=\"en-us\">\r\n<head><title>\r\n\tSOS, Missouri - Elections:
Offices Filed in Candidate Filing\r\n</title><meta name=\"viewport\"
content=\"width=device-width, initial-scale=1.0\" [...remainder
truncated].
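A possible next step, offered only as a sketch that I have not run against the SOS site: once the page is held in a string, XML::htmlParse() with asText = TRUE followed by XML::readHTMLTable() should return one data frame per <table> element. The toy_html string below is made up for illustration and stands in for M_sos. Whether the real page keeps the candidates in plain HTML tables is an assumption on my part.

```r
library(XML)

# Made-up stand-in for M_sos (the string returned by getURL above);
# the real page's layout is not shown here and may differ.
toy_html <- paste0(
  "<html><body><table>",
  "<tr><th>Office</th><th>Candidate</th></tr>",
  "<tr><td>Governor</td><td>A. Example</td></tr>",
  "<tr><td>Auditor</td><td>B. Example</td></tr>",
  "</table></body></html>"
)

# asText = TRUE tells htmlParse the argument is HTML content, not a file or URL
doc <- htmlParse(toy_html, asText = TRUE)

# One data frame per <table> element found in the document
tables <- readHTMLTable(doc, stringsAsFactors = FALSE)
candidates <- tables[[1]]
print(candidates)
```

With the real page, replacing toy_html by M_sos would be the thing to try. If the list comes back empty, the table is probably built by JavaScript after the page loads and this approach will not see it.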
HTH, Bill.
W. Michels, Ph.D.
On Thu, Jul 23, 2020 at 2:55 PM Spencer Graves
<spencer.graves using effectivedefense.org> wrote:
>
> Hello, All:
>
>
> I've failed with multiple attempts to scrape the table of
> candidates from the website of the Missouri Secretary of State:
>
>
> https://s1.sos.mo.gov/CandidatesOnWeb/DisplayCandidatesPlacement.aspx?ElectionCode=750004975
>
>
> I've tried base::url, base::readLines, xml2::read_html, and
> XML::readHTMLTable; see summary below.
>
>
> Suggestions?
> Thanks,
> Spencer Graves
>
>
> sosURL <-
> "https://s1.sos.mo.gov/CandidatesOnWeb/DisplayCandidatesPlacement.aspx?ElectionCode=750004975"
>
> str(baseURL <- base::url(sosURL))
> # this might give me something, but I don't know what
>
> sosRead <- base::readLines(sosURL) # 404 Not Found
> sosRb <- base::readLines(baseURL) # 404 Not Found
>
> sosXml2 <- xml2::read_html(sosURL) # HTTP error 404.
>
> sosXML <- XML::readHTMLTable(sosURL)
> # List of 0; does not seem to be XML
>
> sessionInfo()
>
> R version 4.0.2 (2020-06-22)
> Platform: x86_64-apple-darwin17.0 (64-bit)
> Running under: macOS Catalina 10.15.5
>
> Matrix products: default
> BLAS:
> /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
> LAPACK:
> /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats graphics grDevices utils datasets
> [6] methods base
>
> loaded via a namespace (and not attached):
> [1] compiler_4.0.2 tools_4.0.2 curl_4.3
> [4] xml2_1.3.2 XML_3.99-0.3