[R] Remove superscripts from HTML objects

Chris Stubben stubben at lanl.gov
Thu Apr 12 03:56:14 CEST 2012


Is there some way to remove superscripts from objects returned by
htmlParse or xmlParse (XML package)?

library(XML)
h <- "<html><p>Cat<sup>a</sup></p><p>Dog</p></html>"
doc <- htmlParse(h)
xpathSApply(doc, "//p", xmlValue)
[1] "Cata" "Dog"

I could probably strip the <sup> tags from the string "h" above, but I'd
rather just work with the parsed document from htmlParse if possible (and
not use readLines to load the raw HTML first).
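
A minimal sketch of one way this might be done on the parsed document
itself, assuming it is acceptable to modify "doc" in place: select the
<sup> nodes with getNodeSet and drop them with removeNodes (both from the
XML package) before extracting the text. The variable names "sups" and
"res" are only illustrative.

```r
library(XML)

h <- "<html><p>Cat<sup>a</sup></p><p>Dog</p></html>"
doc <- htmlParse(h, asText = TRUE)

# Collect every <sup> node in the parsed tree
sups <- getNodeSet(doc, "//sup")

# Remove those nodes from the document (this changes doc itself)
removeNodes(sups)

# Extract the paragraph text, now without the superscript content
res <- xpathSApply(doc, "//p", xmlValue)
print(res)
```

Because removeNodes alters the tree in place, the document would need to
be parsed again if the superscripts are wanted later.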

Thanks,
Chris Stubben
 

