[R] Download CSV Files from EUROSTAT Website

David Winsemius dwinsemius at comcast.net
Mon Nov 4 20:26:46 CET 2013


On Nov 4, 2013, at 11:03 AM, Lorenzo Isella wrote:

> Thanks.
> I had already introduced these minor adjustments in the code, but the real problem (to me) is the information that gets lost: the informative column names, the indicator type and the units.

Maybe you should use their "download" facility rather than trying to deparse a complex webpage with lots of special user interaction "features":

http://appsso.eurostat.ec.europa.eu/nui/setupDownloads.do
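
A minimal sketch of how a file obtained that way might be read so that the column names survive. The sample text below is made up for illustration (the real layout of a downloaded file may differ), and strip_flags is a hypothetical helper, not part of any package:

```r
# Made-up sample standing in for a downloaded CSV; the real layout may differ.
csv_text <- 'GEO,2011,2012,2013
Belgium,"1,234.5","1,250.0(p)",:
Italy,"2,000.1(b)","2,010.3","2,020.7(p)"'

# check.names = FALSE keeps the year headers as "2011" rather than "X2011";
# ":" is assumed here to mark a missing value.
raw <- read.csv(text = csv_text, check.names = FALSE,
                stringsAsFactors = FALSE, na.strings = ":")

# Hypothetical helper: drop flags such as "(p)" and thousands separators,
# then convert what is left to numeric.
strip_flags <- function(x) as.numeric(gsub("\\([^)]*\\)|,", "", x))

# Clean every column except the first (the country labels).
clean <- raw
clean[-1] <- lapply(raw[-1], strip_flags)
str(clean)
```

For a real file, the text = argument would be replaced by the path of the downloaded CSV.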

-- 
David.

> Cheers
> 
> Lorenzo
> 
> On Mon, 04 Nov 2013 19:52:51 +0100, Rui Barradas <ruipbarradas at sapo.pt> wrote:
> 
>> Hello,
>> 
>> If you want to get rid of the (bp) stuff, you can use lapply/gsub. Here is Jean's code, slightly changed:
>> 
>> library(XML)
>> 
>> mylines <- readLines(url("http://bit.ly/1coCohq"))
>> closeAllConnections()
>> mytable <- readHTMLTable(mylines, which = 2, asText=TRUE, stringsAsFactors = FALSE)
>> 
>> str(mytable)
>> 
>> mytable[] <- lapply(mytable, function(x) gsub("\\(.*\\)", "", x))
>> mytable[] <- lapply(mytable, function(x) gsub(",", "", x))
>> mytable[] <- lapply(mytable, as.numeric)
>> 
>> colnames(mytable) <- 2000:2013
>> 
>> 
>> Hope this helps,
>> 
>> Rui Barradas
>> 
>> Em 04-11-2013 09:53, Lorenzo Isella escreveu:
>>> Hello,
>>> And thanks a lot.
>>> This is indeed very close to what I need.
>>> I am trying to figure out how not to "lose" the headers and how to avoid
>>> downloading labels like "(p)" together with the numerical data I am
>>> interested in.
>>> If anyone on the list knows how to make these minor modifications, s/he
>>> will make my life much easier.
>>> Cheers
>>> 
>>> Lorenzo
>>> 
>>> 
>>> On Fri, 01 Nov 2013 14:25:49 +0100, Adams, Jean <jvadams at usgs.gov> wrote:
>>> 
>>>> Lorenzo,
>>>> 
>>>> I may be able to help you get started.  You can use the XML package to
>>>> grab the information off the internet.
>>>> 
>>>> library(XML)
>>>> 
>>>> mylines <- readLines(url("http://bit.ly/1coCohq"))
>>>> closeAllConnections()
>>>> mylist <- readHTMLTable(mylines, asText=TRUE)
>>>> mytable <- mylist$xTable
>>>> 
>>>> However, when I look at the resulting object, mytable, it doesn't have
>>>> informative row or column headings.  Perhaps someone else can figure
>>>> out how to get that information.
>>>> 
>>>> Jean
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On Thu, Oct 31, 2013 at 10:38 AM, Lorenzo Isella
>>>> <lorenzo.isella at gmail.com> wrote:
>>>>> Dear All,
>>>>> I often need to do some work on some data which is publicly available
>>>>> on the EUROSTAT website.
>>>>> I saw several ways to automatically download mainly the bulk data
>>>>> from EUROSTAT to later postprocess it with R, for instance
>>>>> 
>>>>> http://bit.ly/HrDICj
>>>>> http://bit.ly/HrDL10
>>>>> http://bit.ly/HrDTgT
>>>>> 
>>>>> However, what I would like to do is to be able to download directly
>>>>> the csv file corresponding to a properly formatted dataset
>>>>> (typically a dynamic dataset) from EUROSTAT.
>>>>> To fix the ideas, please consider the dataset at the following link
>>>>> 
>>>>> http://bit.ly/1coCohq
>>>>> 
>>>>> What I would like to do is to automatically read its content into R,
>>>>> or at least to automatically download it as a csv file (full
>>>>> extraction, single file, no flags and footnotes) which I can then
>>>>> manipulate easily.
>>>>> Any suggestion is appreciated.
>>>>> Cheers
>>>>> 
>>>>> Lorenzo
>>>>> 
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA


