[R] How to load data from Statistics Canada

Wacek Kusnierczyk Waclaw.Marcin.Kusnierczyk at idi.ntnu.no
Wed May 20 22:49:12 CEST 2009


Peter Dalgaard wrote:
> guox at ucalgary.ca wrote:
>> We would like to load data from Statistics Canada
>> (http://www.statcan.gc.ca/) using R,
>> for example, Employment and unemployment rates.
>> It seems to me that the tables are displayed in HTML.
>> I was wondering if you know how to load these tables. Thanks,
>
> I suspect the answer is "with some difficulty". You can do stuff like
> this, based on using the clipboard. Go to

or maybe

    library(XML)
    # parse the page and select every body row of its tables
    document = htmlParse('http://www.statcan.gc.ca/daily-quotidien/090520/t090520b1-eng.htm')
    rows = xpathSApply(document, '//table/tbody/tr')

and then use further xpaths to extract the content of interest.
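A minimal sketch of that extra XPath step. It assumes the XML package is installed and uses a tiny made-up table parsed from a string (asText = TRUE) instead of the live StatCan page, whose real markup may differ. Each row's header and data cells are pulled out with a relative XPath and xmlValue().

```r
# Sketch only: the HTML below is a made-up stand-in, not real StatCan data.
if (requireNamespace("XML", quietly = TRUE)) {
  library(XML)

  html <- '<table><tbody>
    <tr><th>Employment</th><td>1,234.5</td><td>1,240.0</td></tr>
    <tr><th>Rate</th><td>8.0</td><td>..</td></tr>
  </tbody></table>'

  document <- htmlParse(html, asText = TRUE)

  # Same row selection as above, but keep the row nodes themselves
  rows <- getNodeSet(document, '//table/tbody/tr')

  # For each row, evaluate a path relative to that row and take the cell text
  cells <- lapply(rows, function(row) xpathSApply(row, './th | ./td', xmlValue))

  # One matrix row per <tr>; assumes every row has the same number of cells
  tab <- do.call(rbind, cells)
  print(tab)
}
```

The result is still a character matrix, so thousands separators and placeholders such as `..` need the same clean-up as in the clipboard approach.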

vQ

>
> http://www.statcan.gc.ca/daily-quotidien/090520/t090520b1-eng.htm
>
> mark the contents of the table, then
>
> > dd <- t(read.delim("clipboard", colClasses="character"))
> > dd1 <- dd[-1,] # 1st row are labels
> > dd2 <- as.numeric(gsub(",","",dd1)) # strip thousands separators
> Warning message:
> NAs introduced by coercion
> > dim(dd2) <- dim(dd1)
> > dd2
>      [,1]  [,2]  [,3]   [,4]    [,5]     [,6]  [,7] [,8] [,9]   [,10]  [,11]
> [1,]   NA 226.8 123.1 2948.0 11630.0 178768.0 122.5   NA 37.6 27822.0  1.760
> [2,]   NA 224.6 117.7 2945.0 10709.0 181862.0 121.7   NA 37.1 28822.0  1.750
> [3,]   NA 222.0 109.5 2932.0  9694.0 185068.0 121.1   NA 36.9 27801.0  1.730
> [4,]   NA 218.8 101.2 2924.0  8968.0 187636.0 120.6   NA 36.7 26560.0  1.690
> [5,]   NA 215.6  97.2 2920.0  8759.0 189702.0 120.1   NA 36.4 23762.0  1.640
> [6,]   NA 213.3  96.0 2918.0  8770.0 191343.0 119.7   NA 36.2 22029.0  1.600
> [7,]   NA  -1.1  -1.2   -0.1     0.1      0.9  -0.3   NA -0.5    -7.3 -0.045
>      [,12]  [,13]  [,14] [,15]
> [1,]    NA 2959.0 9637.0 221.8
> [2,]    NA 2963.0 9635.0 218.4
> [3,]    NA 2966.0 9587.0 217.1
> [4,]    NA 2939.0 9368.0 211.2
> [5,]    NA 2915.0 9325.0 209.4
> [6,]    NA 2879.0 9199.0 210.5
> [7,]    NA   -1.2   -1.4   0.5
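Peter's clipboard pipeline can be tried without a browser by passing read.delim() a string through its `text` argument. This is a minimal sketch under stated assumptions: the table contents are made up (not StatCan figures), and `..` stands in for a cell that cannot be parsed as a number.

```r
# Made-up tab-separated text standing in for a table copied from the browser
# (Peter's version reads it with read.delim("clipboard", ...) instead).
clip <- "Series\tApr 2009\tMay 2009\tChange
Employment\t1,234.5\t1,240.0\t0.4
Rate\t8.0\t..\t-0.1"

# Read every cell as text, then transpose so each series becomes a column
dd <- t(read.delim(text = clip, colClasses = "character"))

dd1 <- dd[-1, ]   # first row holds the series labels

# Strip thousands separators; cells such as ".." cannot be parsed and become NA,
# so the expected coercion warning is silenced deliberately
dd2 <- suppressWarnings(as.numeric(gsub(",", "", dd1)))

dim(dd2) <- dim(dd1)       # as.numeric() drops the matrix dimensions; restore them
colnames(dd2) <- dd[1, ]   # optional: label the columns with the series names
dd2
```

Reading everything as character first keeps read.delim() from guessing column types, so the comma stripping and numeric conversion happen in one controlled step.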



