[R] htmlParse hangs or crashes
Simon Kiss
sjkiss at gmail.com
Mon Sep 5 23:48:57 CEST 2011
Dear colleagues,
Each time I use htmlParse, R crashes or hangs. The URL I'd like to parse is included below, as are the results of a series of basic commands that show what I'm experiencing. The output of sessionInfo() is at the bottom of the message.
The thing is, htmlTreeParse appears to work just fine, although its result doesn't appear to contain the information I need (the URLs of the articles linked from this search page). Regardless, I'd still like to understand why htmlParse doesn't work.
Thank you for any insight.
Yours,
Simon Kiss
myurl<-c("http://timesofindia.indiatimes.com/searchresult.cms?sortorder=score&searchtype=2&maxrow=10&startdate=2001-01-01&enddate=2011-08-25&article=2&pagenumber=1&isphrase=no&query=IIM&searchfield=&section=&kdaterange=30&date1mm=01&date1dd=01&date1yyyy=2001&date2mm=08&date2dd=25&date2yyyy=2011")
.x<-htmlParse(myurl)
class(.x)
#returns "HTMLInternalDocument" "XMLInternalDocument"
.x
#returns
*** caught segfault ***
address 0x1398754, cause 'memory not mapped'
Traceback:
1: .Call("RS_XML_dumpHTMLDoc", doc, as.integer(indent), as.character(encoding), as.logical(indent), PACKAGE = "XML")
2: saveXML(from)
3: saveXML(from)
4: asMethod(object)
5: as(x, "character")
6: cat(as(x, "character"), "\n")
7: print.XMLInternalDocument(<pointer: 0x11656d3e0>)
8: print(<pointer: 0x11656d3e0>)
Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
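
The traceback shows the segfault occurring inside the print method (print.XMLInternalDocument calling saveXML), not in the parse itself. A minimal sketch of the underlying goal, collecting link URLs without ever printing the document, might look like the following. It assumes the XML package is installed; the inline HTML string and the article paths in it are hypothetical stand-ins for the real search page, and whether this sidesteps the crash on the actual page is untested.

```r
library(XML)  # assumes the XML package is installed

# Hypothetical stand-in for the search results page; the real page would be
# fetched from myurl instead.
html <- '<html><body>
  <a href="/article1.cms">First</a>
  <a href="/article2.cms">Second</a>
  <a name="no-href">Anchor without a link</a>
</body></html>'

# Parse into an internal (C-level) document. Assigning the result does not
# print it, so the print method and saveXML() are not invoked.
doc <- htmlParse(html, asText = TRUE)

# Select only <a> elements that carry an href and read that attribute
# with XPath, instead of printing the whole document.
links <- unname(xpathSApply(doc, "//a[@href]", xmlGetAttr, "href"))

# Release the C-level document once the links are extracted.
free(doc)

print(links)
```

The XPath predicate [@href] skips anchors that have no link target, so the result is a plain character vector of URLs.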
sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] en_CA.UTF-8/en_CA.UTF-8/C/C/en_CA.UTF-8/en_CA.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] XML_3.4-0 RCurl_1.5-0 bitops_1.0-4.1
*********************************
Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada
N3T 2C9
Cell: +1 905 746 7606