[R] predict: remove columns with new levels automatically

David Winsemius dwinsemius at comcast.net
Wed Nov 25 15:04:51 CET 2009


On Nov 25, 2009, at 1:48 AM, Andreas Wittmann wrote:

> Sorry for my poor description. I am not asking for a finished
> algorithm without doing any work myself; I only hoped for some advice
> on how to do this. I do not want to predict arbitrary data. I am
> referring only to newdata whose variables are the same as in the
> model data. But if there are factors in the data, it is possible that
> the newdata has a level which does not exist in the original data.
> So I have to compare each factor in the data and in the newdata, and
> if the newdata has a level which is not in the original data, drop
> that variable and compute the model and the prediction again.
> I thought this problem was quite common and that I could use an
> algorithm somebody has already implemented.
>
> best regards
>
> Andreas
>
If you use str to look at the lm1 object you will find near the bottom a
list called "xlevels" (lm1$x reaches it only through partial matching
of the name):

lm1$xlevels will show you the factor levels that were present in the
variables at the time of the model creation
 > lm1$xlevels
$z
[1] "A" "B" "C"

New testing scenario with one good level and one bad level:

test <- data.frame(x = t <- rnorm(2), y = t + rnorm(2), z = c("B", "D"))
  lm1 <- lm(x ~ ., data = training)
  # get prediction for the good level only
  predict(lm1, subset(test, z %in% lm1$xlevels$z))
         1
0.4225204
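
If you would rather drop the offending variable and refit, as you
described, something along these lines might serve as a starting point.
It is only a sketch: vars_with_new_levels is a made-up helper name, not
a function in base R, and it assumes the factors enter the model as
plain variables (no interactions or transformations of them).

```r
set.seed(0)
x <- rnorm(9)
y <- x + rnorm(9)
training <- data.frame(x = x, y = y, z = rep(c("A", "B", "C"), each = 3))
test <- data.frame(x = rnorm(1), y = rnorm(1), z = "D")

lm1 <- lm(x ~ ., data = training)

# Hypothetical helper: names of factor predictors whose values in newdata
# include a level that the fitted model never saw.
vars_with_new_levels <- function(fit, newdata) {
  xlev <- fit$xlevels                      # levels recorded at fit time
  unseen <- vapply(names(xlev), function(v) {
    !all(as.character(newdata[[v]]) %in% xlev[[v]])
  }, logical(1))
  names(xlev)[unseen]
}

bad <- vars_with_new_levels(lm1, test)
fit <- lm1
if (length(bad) > 0) {
  # Remove the offending variables from the formula and refit.
  fit <- update(lm1, as.formula(paste(". ~ . -", paste(bad, collapse = " - "))))
}
predict(fit, test)                         # prediction without z
```

Whether throwing away a whole predictor because of one unseen level is
wise is a separate question; dropping the affected rows, as above,
keeps the model unchanged.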

>
>
>
> -------- Original Message --------
>> Date: Wed, 25 Nov 2009 00:48:59 -0500
>> From: David Winsemius <dwinsemius at comcast.net>
>> To: Andreas Wittmann <andreas_wittmann at gmx.de>
>> CC: r-help at r-project.org
>> Subject: Re: [R] predict: remove columns with new levels  
>> automatically
>
>>
>> On Nov 24, 2009, at 2:24 PM, Andreas Wittmann wrote:
>>
>>> Dear R-users,
>>>
>>> in the following thread
>>>
>>> http://tolstoy.newcastle.edu.au/R/help/03b/3322.html
>>>
>>> the problem of how to remove rows for predict that contain levels
>>> which are not in the model is discussed.
>>>
>>> Now I am trying to do this the other way round and want to remove
>>> columns (variables) from the model which would later be problematic
>>> because of new levels at prediction time.
>>>
>>> ## example:
>>> set.seed(0)
>>> x <- rnorm(9)
>>> y <- x + rnorm(9)
>>>
>>> training <- data.frame(x=x, y=y, z=c(rep("A", 3), rep("B", 3),
>>> rep("C", 3)))
>>> test <- data.frame(x=t<-rnorm(1), y=t+rnorm(1), z="D")
>>>
>>> lm1 <- lm(x ~ ., data=training)
>>> ## prediction does not work because the variable z has the new
>>> ## level "D"
>>> predict(lm1, test)
>>>
>>> ## solution: the variable z is removed from the model
>>> ## the prediction happens without using the information of
>>> ## variable z
>>> lm2 <- lm(x ~ y, data=training)
>>> predict(lm2, test)
>>>
>>> How can I automatically recognize this and compute the model and
>>> prediction accordingly?
>>
>> Let me get this straight. You want us to predict in advance (or more
>> accurately design an algorithm that can see into the future and work
>> around) any sort of newdata you might later construct????
>>
>> --
>>
>> David Winsemius, MD
>> Heritage Laboratories
>> West Hartford, CT
>

David Winsemius, MD
Heritage Laboratories
West Hartford, CT



