[R] forecast: bias in sampling from seasonal ARIMA model?
Nicolas Chapados
nicolas.chapados at gmail.com
Wed Jul 6 16:49:36 CEST 2011
[Reposting, since there was problems with encodings the first time around.]
Dear all,
I stumbled upon what appears to be a troublesome issue when sampling from an
ARIMA model (from Rob Hyndman's excellent 'forecast' package) that contains
a seasonal AR component.
Here's how to reproduce the issue. (I'm using R 2.9.2 with forecast 2.19;
see sessionInfo() below).
First some data:
> x <- c(
0.132475, 0.143119, 0.108104, 0.247291, 0.029510, -0.119591, -0.133313,
-0.098128, 0.192698, 0.110328, 0.163671, -0.004925, -0.239209, -0.055122,
-0.051121, 0.154108, 0.008665, -0.074702, 0.066534, -0.098728, -0.068668,
0.150935, -0.022547, 0.028625, 0.107092, -0.065396, -0.253247, -0.115240,
-0.113535, -0.064191, -0.006032, 0.039233, 0.129013, -0.068462, 0.022398,
-0.052427, -0.005586, 0.011447, -0.022667, -0.120536, -0.234398, -0.164087,
-0.177160, -0.120624, -0.025104, 0.001144, -0.193424, -0.260674, -0.036976,
-0.009590, -0.004920, 0.130545, 0.120527, 0.041121, -0.123321, 0.023836,
-0.188418, 0.015807, -0.056012, 0.000496, 0.051806, -0.067574, 0.012775,
0.244083, 0.148857, 0.013874, 0.235252, 0.151935, 0.036986, 0.134482,
-0.003359, -0.019422, 0.086195, 0.206569, 0.123565, 0.070835, -0.183189,
-0.046513, 0.071920, -0.038360, 0.135293, 0.054746, -0.280340, 0.110638,
0.009729, 0.115541, 0.021397, 0.097835, -0.028434, -0.218416, 0.044552,
0.442563, 0.084317, 0.044149, 0.201100, 0.076112, -0.134955, 0.023870,
0.077111, 0.085490, 0.023154, 0.099757, -0.026509, -0.189839, 0.026614,
0.184916, -0.007266, 0.081276, 0.312526, 0.051199, -0.104707, -0.004206,
0.062440, 0.126385, -0.018100, 0.092513, 0.186459, -0.170184, -0.126168,
0.122739, 0.097495, 0.008633, -0.034519, 0.187264, -0.153409, 0.009440,
0.150561, 0.067744, 0.045129, 0.230831, -0.079700, -0.162694, -0.044251,
-0.007663, 0.048986, 0.065724, 0.159706, 0.040067, -0.059949, 0.024810,
-0.154852, 0.018080, 0.165935, 0.203050, 0.011035, -0.232585, -0.162248,
-0.104872, -0.062516, -0.089766, 0.100304, 0.142170, -0.144969, -0.032500,
-0.002131, 0.165890, 0.107629, 0.075752, 0.119003, 0.095955, 0.039842,
0.081208, 0.348529, 0.145694, -0.210700, 0.384966, -0.054503, 0.293329,
0.184295, 0.368986, 0.135270, 0.124917, 0.185286, -0.252088, -0.169708,
-0.010204, 0.021934, 0.003572, 0.180148, 0.075836, -0.232065, -0.127255,
-0.147122, 0.056163, 0.067004, 0.217810, 0.074513, -0.167389, 0.172578,
-0.148127, 0.057025, 0.042623, 0.094214, 0.047004, -0.345453, -0.265104,
-0.082897, 0.052705, -0.067002, 0.191941, 0.010989, -0.298567, -0.162841,
0.043773, 0.185459, 0.126305, 0.383101, 0.092747, -0.368453, -0.325097,
0.029564, -0.015390, 0.013807, 0.152062, -0.047015, -0.429245, -0.097742,
0.104502, -0.007547, -0.000245, 0.062830, 0.030093, -0.381043, -0.267704,
-0.125930, -0.032264, -0.041657, 0.040073, 0.084431, -0.276316, -0.305253,
-0.019942, 0.045390, 0.046090, 0.145700, 0.069920, -0.210079, 0.050967,
0.042283, 0.248840, 0.007883, 0.203171, 0.050722, -0.109773, -0.110301,
-0.095433, 0.071133, 0.023793, 0.192476, 0.057746)
First, a CORRECT model, containing a seasonal MA component but no seasonal
AR component. After estimation, I forecast for 1 time-step, and I take the
mean of sampling 10000 times from the same model:
> my.arima1 <- Arima(x, order=c(3,0,0), seasonal=list(order=c(0,0,2), period=7), include.mean=FALSE)
> forecast(my.arima1, 1)
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
251 -0.03143283 -0.1882245 0.1253589 -0.271225 0.2083594
> set.seed(1827) ; mean(sapply(seq_len(10000), function(i) as.numeric(simulate(my.arima1, 1)) ))
[1] -0.03258454
The results ("Point Forecast" versus the output of mean()) are identical to
some sampling error.
Now the INCORRECT model arises from adding one seasonal AR component:
> my.arima2 <- Arima(x, order=c(3,0,0), seasonal=list(order=c(1,0,2), period=7), include.mean=FALSE)
> forecast(my.arima2, 1)
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
251 -0.1848579 -0.322421 -0.04729492 -0.3952424 0.02552655
> set.seed(1827) ; mean(sapply(seq_len(10000), function(i) as.numeric(simulate(my.arima2, 1)) ))
[1] -0.05416299
For the results are substantially different (-0.18 versus -0.05), and the
latter does not change much if I take a much bigger sample.
Did anybody encounter this in the past? Is this a bug?
For reference, here are the results of sessionInfo():
> sessionInfo()
R version 2.9.2 (2009-08-24)
x86_64-pc-linux-gnu
locale:
C
attached base packages:
[1] splines stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] KernSmooth_2.23-3 digest_0.4.2 forecast_2.19 fracdiff_1.3-2
[5] tseries_0.10-12 zoo_1.4-0 quadprog_1.4-11 glmnet_1.7
[9] Matrix_0.999375-30 biglm_0.7 DBI_0.2-4 inline_0.3.7
[13] XML_2.6-0 timeSeries_2110.86 timeDate_2110.87 RODBC_1.3-2
[17] reshape_0.8.3 plyr_0.1.9 MASS_7.2-48 nnet_7.2-48
[21] latticeExtra_0.5-1 RColorBrewer_1.0-2 lattice_0.17-25 gsubfn_0.5-7
[25] proto_0.3-8 multicore_0.1-3 data.table_1.4.1
loaded via a namespace (and not attached):
[1] grid_2.9.2 tools_2.9.2
Many thanks for any help or pointers!
+ Nicolas Chapados
More information about the R-help
mailing list