[R] retrieve certain part from html
Romain Francois
romain.francois at dbmail.com
Wed Sep 23 14:39:54 CEST 2009
Hi,
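The R4X package can help you. (I have wrapped your td's into one tr so that
the fragment has a single root element and parses as XML.) In the paths
below, "#" selects an element's text and "@href" the value of its href
attribute.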
> x <- xml( "<tr><td><a href='2005-01.html'>2005-01</a></td><td><a
+ href='2006-01.html'>2006-01</a></td><td><a
+ href='2007-01.html'>2007-01</a></td><td><a
+ href='2008-01.html'>2008-01</a></td><td><a
+ href='2009-01.html'>2009-01</a></td></tr>" )
> x["td/a/#"]
       td        td        td        td        td
"2005-01" "2006-01" "2007-01" "2008-01" "2009-01"
> x["td/a/@href"]
            td             td             td             td             td
"2005-01.html" "2006-01.html" "2007-01.html" "2008-01.html" "2009-01.html"
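
Since you mention gsub: if you would rather stay in base R, something along
these lines might also work. This is only a sketch. It assumes every href
value is single-quoted, as in your example, and the names html and hrefs are
just for illustration. Regular expressions are fragile on real-world HTML, so
a parser remains the safer choice for anything more complicated.

```r
# The original fragment from the question, as one string.
html <- paste0(
  "<td><a href='2005-01.html'>2005-01</a></td>",
  "<td><a href='2006-01.html'>2006-01</a></td>",
  "<td><a href='2007-01.html'>2007-01</a></td>",
  "<td><a href='2008-01.html'>2008-01</a></td>",
  "<td><a href='2009-01.html'>2009-01</a></td>"
)

# Find every href='...' attribute (assumes single-quoted values).
matches <- regmatches(html, gregexpr("href='[^']*'", html))[[1]]

# Strip the leading href=' and the trailing quote from each match.
hrefs <- gsub("^href='|'$", "", matches)

# The link text in this example is the file name without ".html".
labels <- sub("\\.html$", "", hrefs)

print(hrefs)
print(labels)
```

gsub on its own replaces text within the whole string, which is why it is
awkward here; gregexpr plus regmatches pulls out each match as a separate
element first, and gsub then only has to trim each one.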
Romain
On 09/23/2009 02:29 PM, Rene wrote:
>
> Dear All,
>
> Can someone please show me how to extract a certain part of a long HTML
> string?
>
> e.g.
>
> "<td><a href='2005-01.html'>2005-01</a></td><td><a
> href='2006-01.html'>2006-01</a></td><td><a
> href='2007-01.html'>2007-01</a></td><td><a
> href='2008-01.html'>2008-01</a></td><td><a
> href='2009-01.html'>2009-01</a></td>"
>
> How can I get only "2005-01.html", "2006-01.html", "2007-01.html",
> "2008-01.html", "2009-01.html" from the above HTML code? I have tried the
> gsub function, but it did not work.
>
> Please guide me on this.
>
> Thanks a lot.
>
> Rene.
--
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr