[R] Developing a web crawler
antujsrv
antujsrv at gmail.com
Thu Mar 3 10:22:44 CET 2011
Hi,
I wish to develop a web crawler in R. I have been using the functions
available in the RCurl package.
I am able to extract the HTML content of the site, but I don't know how to go
about analyzing the HTML-formatted document.
I wish to know the frequency of a word in the document. I am only acquainted
with analyzing data sets.
So how should I go about analyzing data that is not available in table
format?
A few chunks of code that I wrote:
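library(RCurl)
# Download the page; getURL() returns the HTML as a single character string.
w <- getURL("http://www.amazon.com/Kindle-Wireless-Reader-Wifi-Graphite/dp/B003DZ1Y8Q/ref=dp_reviewsanchor#FullQuotes")
# Save the raw HTML to a file.
writeLines(w, "test.txt")
# readLines() expects a file path or connection, not the HTML string itself.
t <- readLines("test.txt")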
readLines also didn't prove to be of any help.
Any help would be highly appreciated. Thanks in advance.
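
For reference, the word count described above could be approached roughly as follows. This is only a minimal sketch in base R, not a robust HTML parser: word_freq is a hypothetical helper name, the sample_html string is made up and stands in for the text returned by getURL(), and the regular expressions assume plain English text (HTML entities such as &amp; are not decoded).

```r
# Hypothetical helper: count how often each word occurs in an HTML string.
word_freq <- function(html) {
  # Drop <script> and <style> blocks, whose contents are not visible text.
  text <- gsub("(?is)<(script|style)\\b.*?</\\1>", " ", html, perl = TRUE)
  # Replace the remaining tags with spaces (a crude regex, not a real parser).
  text <- gsub("<[^>]*>", " ", text)
  # Lower-case, then split on anything that is not a letter, digit or apostrophe.
  words <- strsplit(tolower(text), "[^a-z0-9']+")[[1]]
  words <- words[nzchar(words)]
  # Tabulate the words and put the most frequent ones first.
  sort(table(words), decreasing = TRUE)
}

# Made-up sample standing in for the string returned by getURL().
sample_html <- paste0(
  "<html><body><h1>Kindle review</h1>",
  "<p>The Kindle is light. I like the Kindle.</p>",
  "<script>var kindle = 1;</script></body></html>"
)

freq <- word_freq(sample_html)
freq["kindle"]   # count for one word of interest
head(freq, 5)    # the most frequent words
```

The result is an ordinary table object, so it can be converted with as.data.frame(freq) and then handled like any other data set.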
--
View this message in context: http://r.789695.n4.nabble.com/Developing-a-web-crawler-tp3332993p3332993.html
Sent from the R help mailing list archive at Nabble.com.