[R] reading tables from multiple HTML pages
s1oliver
s1oliver at ucsd.edu
Mon Aug 29 18:04:43 CEST 2011
Hi, I am a beginner to R and am having some problems scraping data from
HTML tables using the XML package. I have included some code below.
I am trying to loop through a series of HTML pages, each of which contains a
single table from which I want to scrape data. However, some of the pages
are blank, so htmlParse() throws an error when the loop reaches them. The
loop then stops and I get the error message below:
Error in htmlParse(url) :
  error in creating parser for http://www.szrd.gov.cn/viewcommondbfc.do?id=728
What is the best way to keep the loop running so that I can parse the rest
of the pages?
****************************************************
library(XML)
url_root <- "http://www.szrd.gov.cn/viewcommondbfc.do?id="
for (i in 700:750) {
  url <- paste(url_root, i, sep = "")
  doc <- htmlParse(url)
  tableNodes <- getNodeSet(doc, "//table")
  tbl <- readHTMLTable(tableNodes[[3]])
}
****************************************************
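One possible way to keep the loop going, sketched here rather than tested against the actual site, is to wrap the parsing step in tryCatch() so that a page which fails to parse yields NULL instead of stopping the loop. The helper names read_third_table() and scrape_tables() are hypothetical and not part of the XML package; the sketch also assumes that a page with fewer than three tables should be skipped, and it stores each result in a list instead of overwriting tbl on every pass.

```r
library(XML)

url_root <- "http://www.szrd.gov.cn/viewcommondbfc.do?id="

# Hypothetical helper: return the third table on a page as a data frame,
# or NULL if the page cannot be parsed or has fewer than three tables.
read_third_table <- function(url) {
  tryCatch({
    doc <- htmlParse(url)
    tableNodes <- getNodeSet(doc, "//table")
    if (length(tableNodes) >= 3) {
      readHTMLTable(tableNodes[[3]])
    } else {
      NULL
    }
  }, error = function(e) {
    # Report the failure and carry on with the next page
    message("Skipping ", url, ": ", conditionMessage(e))
    NULL
  })
}

# Hypothetical wrapper: fetch every id, then drop the pages that gave NULL.
scrape_tables <- function(ids, root = url_root) {
  tables <- lapply(ids, function(i) read_third_table(paste(root, i, sep = "")))
  names(tables) <- ids
  Filter(Negate(is.null), tables)
}

# Usage (performs network requests):
# tables <- scrape_tables(700:750)
```

Using lapply() rather than assigning into a pre-allocated list avoids a common pitfall: assigning NULL with tables[[k]] <- NULL deletes the element instead of storing a placeholder. The names of the returned list show which ids produced a table.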
Steve Oliver
Department of Political Science
University of California at San Diego
9500 Gilman Dr.
La Jolla, CA 92092