[R] Scraping info from a web site?
Spencer Graves
spencer.graves at effectivedefense.org
Wed Jan 31 11:36:04 CET 2018
Hi, All:
What would you suggest for reading the data on members of the
US Congress and their positions on net neutrality from
"https://www.battleforthenet.com/scoreboard" into R?
I found recommendations for the "rvest" package ("Easily
Harvest (Scrape) Web Pages") and tried the following:
URL <- 'https://www.battleforthenet.com/scoreboard/'
library(rvest)
Bftn <- read_html(URL)
str(Bftn)
List of 2
$ node:<externalptr>
$ doc :<externalptr>
- attr(*, "class")= chr [1:2] "xml_document" "xml_node"
However, I don't know what to do with <externalptr>.
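For context, the <externalptr> entries are handles to the parsed document held outside R's own data structures, so str() cannot display their contents. The accessor functions exported by rvest appear to be the intended way to look inside. A minimal sketch, using a small made-up HTML string in place of the live page (the markup and the names `demo` and `page` are illustrative assumptions, not the actual structure of the scoreboard):

```r
library(rvest)

# Hypothetical stand-in for the downloaded page (not the real scoreboard markup)
demo <- '<html><body>
  <div id="senate"><p>Senator A</p><p>Senator B</p></div>
  <div id="house"><p>Representative C</p></div>
</body></html>'

# read_html() also accepts a literal HTML string
page <- read_html(demo)

# Printing shows the parsed tree, which str() does not
print(page)

# Tag name of the root node
root_name <- html_name(page)

# All text in the document, with the tags stripped
all_text <- html_text(page)
```

The same calls should apply to the object returned by read_html(URL), although what they return depends on what the server actually sends.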
The "Selectorgadget" vignette in rvest suggested selecting what
I wanted on the web page and pasting the resulting CSS selector as an
argument to "html_nodes". This led me to try the following:
Bftn_nodes <- html_nodes(Bftn,
'.psb-unknown , #house, #senate, #senate p')
str(Bftn_nodes)
List of 4
$ :List of 2
..$ node:<externalptr>
..$ doc :<externalptr>
..- attr(*, "class")= chr "xml_node"
$ :List of 2
..$ node:<externalptr>
..$ doc :<externalptr>
..- attr(*, "class")= chr "xml_node"
$ :List of 2
..$ node:<externalptr>
..$ doc :<externalptr>
..- attr(*, "class")= chr "xml_node"
$ :List of 2
..$ node:<externalptr>
..$ doc :<externalptr>
..- attr(*, "class")= chr "xml_node"
- attr(*, "class")= chr "xml_nodeset"
This seems like progress, but I'm still unsure what to do next.
Should I be using a different package, or posting this question
somewhere else, such as StackOverflow.com?
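One possible next step with an "xml_nodeset" is to pull text and attributes out of each node and collect them in a data frame. The sketch below assumes a made-up HTML fragment in which each member is a <p> element inside "#senate"; the real page may be organized differently, and the names `demo`, `senate_p`, and `members` are hypothetical:

```r
library(rvest)

# Hypothetical markup; the real scoreboard may use different tags and classes
demo <- '<html><body>
  <div id="senate"><p class="psb-unknown">Senator A</p><p>Senator B</p></div>
  <div id="house"><p>Representative C</p></div>
</body></html>'
page <- read_html(demo)

# One node per matching element, as in the str() output above
senate_p <- html_nodes(page, "#senate p")

# html_text() and html_attr() are vectorized over the nodeset;
# html_attr() returns NA where the attribute is absent
members <- data.frame(
  name      = html_text(senate_p, trim = TRUE),
  css_class = html_attr(senate_p, "class"),
  stringsAsFactors = FALSE
)
print(members)
```

Selecting one kind of element per call, rather than a combined selector such as '.psb-unknown , #house, #senate, #senate p', keeps each row of the result tied to a single member.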
Thanks,
Spencer Graves