[R] How to load data from Statistics Canada
Wacek Kusnierczyk
Waclaw.Marcin.Kusnierczyk at idi.ntnu.no
Wed May 20 22:49:12 CEST 2009
Peter Dalgaard wrote:
> guox at ucalgary.ca wrote:
>> We would like to load data from Statistics Canada
>> (http://www.statcan.gc.ca/) using R,
>> for example, Employment and unemployment rates.
>> It seems to me that the tables are displayed in HTML.
>> I was wondering if you know how to load these tables. Thanks,
>
> I suspect the answer is "with some difficulty". You can do stuff like
> this, based on using the clipboard. Go to
or maybe
library(XML)
document = htmlParse('http://www.statcan.gc.ca/daily-quotidien/090520/t090520b1-eng.htm')
rows = xpathSApply(document, '//table/tbody/tr')
and then use further xpaths to extract the content of interest.
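
For illustration, the "further xpaths" step might look like the sketch below. It is only a sketch under assumptions: the inline HTML string is a made-up stand-in for the downloaded page (so it runs without network access), the labels and figures in it are invented, and the names html, cells, tab and values are hypothetical. The real page may need different xpaths, for instance if its tables have no tbody element.

```r
library(XML)

# Hypothetical stand-in for the StatCan page; labels and numbers are invented.
html <- '<html><body><table>
  <thead><tr><th>Month</th><th>Claims</th><th>Rate</th></tr></thead>
  <tbody>
    <tr><td>A</td><td>1,234.5</td><td>6.1</td></tr>
    <tr><td>B</td><td>2,345.6</td><td>6.3</td></tr>
  </tbody></table></body></html>'
document <- htmlParse(html, asText = TRUE)

# One node per table row, as in the snippet above.
rows <- getNodeSet(document, '//table/tbody/tr')

# A relative xpath, evaluated against each row node, returns the cell text.
cells <- lapply(rows, function(row) xpathSApply(row, './td', xmlValue))
tab <- do.call(rbind, cells)   # character matrix, one row per <tr>

# First column holds the labels; strip thousands separators from the rest.
values <- matrix(as.numeric(gsub(',', '', tab[, -1])), nrow = nrow(tab))
rownames(values) <- tab[, 1]
```

Working row by row keeps the labels attached to the numbers, and the gsub() call does the same job as the separator stripping in the clipboard approach.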
vQ
>
> http://www.statcan.gc.ca/daily-quotidien/090520/t090520b1-eng.htm
>
> mark the contents of the table, then
>
> > dd <- t(read.delim("clipboard", colClasses="character"))
> > dd1 <- dd[-1,] # 1st row are labels
> > dd2 <- as.numeric(gsub(",","",dd1)) # strip thousands separators
> Warning message:
> NAs introduced by coercion
> > dim(dd2) <- dim(dd1)
> > dd2
>      [,1]  [,2]  [,3]   [,4]    [,5]     [,6]  [,7] [,8] [,9]   [,10]  [,11]
> [1,]   NA 226.8 123.1 2948.0 11630.0 178768.0 122.5   NA 37.6 27822.0  1.760
> [2,]   NA 224.6 117.7 2945.0 10709.0 181862.0 121.7   NA 37.1 28822.0  1.750
> [3,]   NA 222.0 109.5 2932.0  9694.0 185068.0 121.1   NA 36.9 27801.0  1.730
> [4,]   NA 218.8 101.2 2924.0  8968.0 187636.0 120.6   NA 36.7 26560.0  1.690
> [5,]   NA 215.6  97.2 2920.0  8759.0 189702.0 120.1   NA 36.4 23762.0  1.640
> [6,]   NA 213.3  96.0 2918.0  8770.0 191343.0 119.7   NA 36.2 22029.0  1.600
> [7,]   NA  -1.1  -1.2   -0.1     0.1      0.9  -0.3   NA -0.5    -7.3 -0.045
>      [,12]  [,13]  [,14] [,15]
> [1,]    NA 2959.0 9637.0 221.8
> [2,]    NA 2963.0 9635.0 218.4
> [3,]    NA 2966.0 9587.0 217.1
> [4,]    NA 2939.0 9368.0 211.2
> [5,]    NA 2915.0 9325.0 209.4
> [6,]    NA 2879.0 9199.0 210.5
> [7,]    NA   -1.2   -1.4   0.5