[R] User defined function within a formula

Fri Jul 17 17:29:06 CEST 2015

Thank you very much. It worked. I think I need to digest this further
later. Thanks again for the help.

On Thu, Jul 16, 2015 at 4:51 PM, William Dunlap <wdunlap at tibco.com> wrote:

> This might do what you want:
>
> OPoly <- function(x, degree=1, weight=1, coefs=NULL, rangeX=NULL){
>   weight <- round(weight,0)# weight need to be integer
>   if(length(weight)!=length(x)) {
>     weight <- rep(1,length(x))
>   }
>   if (is.null(rangeX)) {
>       rangeX <- range(x)
>   }
>   p <- poly(4*(rep(x,weight)-mean(rangeX))/diff(rangeX), degree=degree,
> coefs=coefs)
>   # why t(t(...))?  That strips the attributes.
>   Z <- t( t(p[cumsum(weight),]) * sqrt(attr(p,"coefs")$norm2[-seq(2)]) )[,
> degree, drop=FALSE]
>   class(Z) <- "OPoly"
>   attr(Z, "coefs") <- attr(p, "coefs")
>   attr(Z, "rangeX") <- rangeX
>   Z
> }
>
> makepredictcall.OPoly<-function(var,call)
> {
>   if (is.call(call)) {
>     if (identical(call[[1]], quote(OPoly))) {
>       if (!is.null(tmp <- attr(var, "coefs"))) {
>         call$coefs <- tmp
>       }
>       if (!is.null(tmp <- attr(var, "rangeX"))) {
>         call$rangeX <- tmp
>       }
>       call$weight <- 1 # weight not relevant in predictions
>     }
>   }
>   call
> }
>
> d <- data.frame(Y=1:8, X=log(1:8), Weight=1:8)
> fit <- lm(data=d, Y ~ OPoly(X, degree=2, weight=Weight))
> predict(fit)[c(3,8)]
> predict(fit, newdata=data.frame(X=d$X[c(3,8)])) # same result
>
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Thu, Jul 16, 2015 at 4:39 PM, William Dunlap <wdunlap at tibco.com> wrote:
>
>> OPoly<-function(x,degree=1,weight=1){
>>   weight=round(weight,0)# weight need to be integer
>>   if(length(weight)!=length(x))weight=rep(1,length(x))
>>   p=poly(4*(rep(x,weight)-mean(range(x)))/diff(range(x)),degree)
>>   Z<-(t(t(p[cumsum(weight),])*sqrt(attr(p,"coefs")$norm2[-
>> seq(2)]))[,degree])
>>   class(Z)<-"OPoly";Z
>> }
>>
>> You need to make OPoly to have optional argument(s) that give
>> the original-regressor-dependent information to OPoly and then
>> have it return, as attributes, the value of those arguments.
>>  makepredictcall
>> will take the attributes and attach them to the call in predvars so
>> predict uses values derived from the original regressors, not value
>> derived
>> from the data to be predicted from.
>>
>> Take a look at a pair like makepredictcall.scale() and scale() for an
>> example:
>> scale has optional arguments 'center' and 'scale' that it returns as
>> attributes
>> and makepredictcall.scale adds those to the call to scale that it is
>> given.
>> Thus when you predict, the scale and center arguments come from the
>> original data, not from the data you are predicting from.
>>
>>
>>
>>
>>
>>
>> Bill Dunlap
>> TIBCO Software
>> wdunlap tibco.com
>>
>> On Thu, Jul 16, 2015 at 3:43 PM, Kunshan Yin <yinkunshan at gmail.com>
>> wrote:
>>
>>> Thanks Bill for your quick reply.
>>>
>>> I tried your solution and it did work for the simple user defined
>>> function xploly. But when I try with other function, it gave me error again:
>>>
>>> OPoly<-function(x,degree=1,weight=1){
>>>   weight=round(weight,0)# weight need to be integer
>>>   if(length(weight)!=length(x))weight=rep(1,length(x))
>>>   p=poly(4*(rep(x,weight)-mean(range(x)))/diff(range(x)),degree)
>>>
>>> Z<-(t(t(p[cumsum(weight),])*sqrt(attr(p,"coefs")$norm2[-seq(2)]))[,degree])
>>>   class(Z)<-"OPoly";Z
>>> }
>>>
>>> ##this OPoly is an FORTRAN orthogonal polynomial routine, it first maps
>>> the x to range[-2,2] then do QR, then return the results with sqrt(norm2).
>>> Comparing with poly, this transformation will make the model coefficients
>>> within a similar range as other variables, the R poly routine will usually
>>> give you a very large coefficients. I did not find such routine in R, so I
>>> have to define this as user defined function.
>>> #######
>>>
>>> I  also have following function as you suggested:
>>>
>>> makepredictcall.OPoly<-function(var,call)
>>> {
>>>   if (is.call(call)) {
>>>     if (identical(call[[1]], quote(OPoly))) {
>>>       if (!is.null(tmp <- attr(var, "coefs"))) {
>>>         call$coefs <- tmp
>>>       }
>>>     }
>>>   }
>>>   call
>>> }
>>>
>>>
>>> But I still got error for following:
>>>
>>> > g3=glm(lot1 ~ log(u) + OPoly(u,1), data = clotting, family = Gamma)
>>>
>>> > predict(g3,dc)Error in poly(4 * (rep(x, weight) - mean(range(x)))/diff(range(x)), degree) :
>>>   missing values are not allowed in 'poly'
>>>
>>> I thought it might be due to the /diff(range(x) in the function.  But
>>> even I remove that part, it will still give me error. Any idea?
>>>
>>> Many thanks in advance.
>>>
>>> Alex
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Jul 16, 2015 at 2:09 PM, William Dunlap <wdunlap at tibco.com>
>>> wrote:
>>>
>>>> Read about the 'makepredictcall' generic function.  There is a method,
>>>> makepredictcall.poly(), for poly() that attaches the polynomial
>>>> coefficients
>>>> used during the fitting procedure to the call to poly() that predict()
>>>> makes.
>>>> You ought to supply a similar method for your xpoly(), and xpoly()
>>>> needs to return an object of a a new class that will cause that method to
>>>> be used.
>>>>
>>>> E.g.,
>>>>
>>>> xpoly <- function(x,degree=1,...){ ret <- poly(x,degree=degree,...);
>>>> class(ret) <- "xpoly" ; ret }
>>>> makepredictcall.xpoly <- function (var, call)
>>>> {
>>>>     if (is.call(call)) {
>>>>         if (identical(call[[1]], quote(xpoly))) {
>>>>             if (!is.null(tmp <- attr(var, "coefs"))) {
>>>>                 call$coefs <- tmp
>>>>             }
>>>>         }
>>>>     }
>>>>     call
>>>> }
>>>>
>>>> g2 <- glm(lot1 ~ log(u) + xpoly(u,1), data = clotting, family = Gamma)
>>>> predict(g2,dc)
>>>> #             1              2              3              4
>>>>    5
>>>> #-0.01398928608 -0.01398928608 -0.01398928608 -0.01398928608
>>>> #-0.01398928608
>>>> #             6              7              8              9
>>>> #-0.01398928608 -0.01398928608 -0.01398928608 -0.01398928608
>>>>
>>>> You can see the effects of makepredictcall() in the 'terms' component
>>>> of glm's output.  The 'variables' attribute of it gives the original
>>>> function
>>>> calls and the 'predvars' attribute gives the calls to be used for
>>>> prediction:
>>>>    > attr(g2$terms, "variables")
>>>>    list(lot1, log(u), xpoly(u, 1))
>>>>   > attr(g2$terms, "predvars")
>>>>   list(lot1, log(u), xpoly(u, 1, coefs = list(alpha = 40, norm2 = c(1,
>>>>   9, 8850))))
>>>>
>>>>
>>>>
>>>> Bill Dunlap
>>>> TIBCO Software
>>>> wdunlap tibco.com
>>>>
>>>> On Thu, Jul 16, 2015 at 12:35 PM, Kunshan Yin <yinkunshan at gmail.com>
>>>> wrote:
>>>>
>>>>> Hello, I have a question about the formula and the user defined
>>>>> function:
>>>>>
>>>>> I can do following:
>>>>> ###Case 1:
>>>>> > clotting <- data.frame(
>>>>> +     u = c(5,10,15,20,30,40,60,80,100),
>>>>> +     lot1 = c(118,58,42,35,27,25,21,19,18),
>>>>> +     lot2 = c(69,35,26,21,18,16,13,12,12))
>>>>> > g1=glm(lot1 ~ log(u) + poly(u,1), data = clotting, family = Gamma)
>>>>> > dc=clotting
>>>>> > dc$u=1
>>>>> > predict(g1,dc)
>>>>>           1           2           3           4           5
>>>>> 6           7           8           9
>>>>> -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929
>>>>> -0.01398929 -0.01398929 -0.01398929
>>>>>
>>>>> However, if I just simply wrap the poly as a user defined function ( in
>>>>> reality I would have my own more complex function)  then I will get
>>>>> error:
>>>>> ###Case 2:
>>>>> > xpoly<-function(x,degree=1){poly(x,degree)}
>>>>> > g2=glm(lot1 ~ log(u) + xpoly(u,1), data = clotting, family = Gamma)
>>>>> > predict(g2,dc)
>>>>> Error in poly(x, degree) :
>>>>>   'degree' must be less than number of unique points
>>>>>
>>>>> It seems that the predict always treat the user defined function in the
>>>>> formula with I().  My question is how can I get the  results for Case2
>>>>> same
>>>>> as case1?
>>>>>
>>>>> Anyone can have any idea about this?
>>>>>
>>>>> Thank you very much.
>>>>>
>>>>> Alex
>>>>>
>>>>>         [[alternative HTML version deleted]]
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>>>
>>>>
>>>
>>
>

	[[alternative HTML version deleted]]