[R] predict() works with the design matrix but throws error with some rows of that matrix
peter dalgaard
pdalgd at gmail.com
Fri Nov 20 10:57:15 CET 2015
On 20 Nov 2015, at 10:07 , peter dalgaard <pdalgd at gmail.com> wrote:
>>
>> On 20 Nov 2015, at 04:53 , Damir Cosic <damir.cosic at gmail.com> wrote:
>>
>> Hello,
>>
>> I am having problems with predict() after a multinomial logit regression by
>> multinom(). I generate a design matrix with model.matrix() and use it to
>> estimate the model. Then, if I pass the entire design matrix to predict(),
>> it returns the same output as fitted(), which is expected. But if I pass
>> only a few rows of the design matrix, it throws this error:
>>
>> Error in model.frame.default(Terms, newdata, na.action = na.omit, xlev
>> = object$xlevels) : variable lengths differ (found for 'z') In addition:
>>
>> Warning message: 'newdata' had 6 rows but variables found have 15 rows
>
> Offhand (sorry, no time for testing things this morning) I suspect that you are mixing paradigms. You can _either_ multiply coefficients with a design matrix _or_ look up variables in a data frame, and I think you are trying to look up variables in a matrix. In particular, I don't expect mm to have a column called "z".
A little further thought later: The crux is that matrices will double as data frames in the newdata argument, but it will only work for numerical variables. Your model contains a numeric and a factor variable so it won't work, for two reasons:
> head(mm)
x ztall
1 -2.4963581 0
2 0.9895450 1
3 1.8755237 1
4 0.8911458 0
5 -2.1458457 1
6 0.6294571 1
I.e., there is no "z" column, but even if there were, it would mismatch with the model
> colnames(mm)[2] <- "z"
> p2<-predict(m,head(mm),"probs")
Error: variable 'z' was fitted with type "factor" but type "numeric" was supplied
In addition: Warning message:
In model.frame.default(Terms, newdata, na.action = na.omit, xlev = object$xlevels) :
variable 'z' is not a factor
At this point, newdata=mm will fail too, as I predicted. (Pun almost unintended.)
Notice that this works fine:
> p2<-predict(m,head(df),"probs")
> p2
good bad ugly
1 9.998923e-01 8.171733e-05 2.598999e-05
2 2.804424e-05 4.170214e-01 5.829506e-01
3 2.892377e-05 3.255878e-01 6.743832e-01
4 9.999315e-01 2.818836e-05 4.031584e-05
5 1.863116e-05 7.420142e-01 2.579672e-01
6 2.740359e-05 4.563101e-01 5.436625e-01
>
-pd
>
> Accordingly, neither of your examples actually work, both cases find z (and x?) in the global environment, it is just only in the latter example that the inconsistency is discovered. I think you want either to use model.frame or an explicit mm %*% coef(model) (or thereabouts).
>
>
>>
>> This is a minimal example:
>>
>> require(nnet)
>>
>> y<-factor(rep(c(1,2,3),5), levels=1:3, labels=c("good","bad","ugly"))
>> x<-rnorm(15)+.2*rep(1:3,5)
>> z<-factor(rep(c(1,2,2),5), levels=1:2, labels=c("short","tall"))
>>
>> df<-data.frame(y=y, x=x, z=z)
>> mm<-model.matrix(~x+z, data=df)[,2:3]
>> m<-multinom(y ~ x+z, data=df)
>>
>> p1<-predict(m,mm,"probs")
>>
>> p2<-predict(m,head(mm),"probs")
>>
>> My actual goal is out-of-sample prediction, but I could not make it work
>> and, while debugging it, I reduced it to this problem.
>>
>> Best,
>>
>> Damir
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
More information about the R-help
mailing list