[R] User defined function within a formula
Kunshan Yin
yinkunshan at gmail.com
Fri Jul 17 17:29:06 CEST 2015
Thank you very much. It worked. I think I need to digest this further
later. Thanks again for the help.
On Thu, Jul 16, 2015 at 4:51 PM, William Dunlap <wdunlap at tibco.com> wrote:
> This might do what you want:
>
> OPoly <- function(x, degree=1, weight=1, coefs=NULL, rangeX=NULL){
> weight <- round(weight,0)# weight need to be integer
> if(length(weight)!=length(x)) {
> weight <- rep(1,length(x))
> }
> if (is.null(rangeX)) {
> rangeX <- range(x)
> }
> p <- poly(4*(rep(x,weight)-mean(rangeX))/diff(rangeX), degree=degree,
> coefs=coefs)
> # why t(t(...))? That strips the attributes.
> Z <- t( t(p[cumsum(weight),]) * sqrt(attr(p,"coefs")$norm2[-seq(2)]) )[,
> degree, drop=FALSE]
> class(Z) <- "OPoly"
> attr(Z, "coefs") <- attr(p, "coefs")
> attr(Z, "rangeX") <- rangeX
> Z
> }
>
> makepredictcall.OPoly<-function(var,call)
> {
> if (is.call(call)) {
> if (identical(call[[1]], quote(OPoly))) {
> if (!is.null(tmp <- attr(var, "coefs"))) {
> call$coefs <- tmp
> }
> if (!is.null(tmp <- attr(var, "rangeX"))) {
> call$rangeX <- tmp
> }
> call$weight <- 1 # weight not relevant in predictions
> }
> }
> call
> }
>
> d <- data.frame(Y=1:8, X=log(1:8), Weight=1:8)
> fit <- lm(data=d, Y ~ OPoly(X, degree=2, weight=Weight))
> predict(fit)[c(3,8)]
> predict(fit, newdata=data.frame(X=d$X[c(3,8)])) # same result
>
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Thu, Jul 16, 2015 at 4:39 PM, William Dunlap <wdunlap at tibco.com> wrote:
>
>> OPoly<-function(x,degree=1,weight=1){
>> weight=round(weight,0)# weight need to be integer
>> if(length(weight)!=length(x))weight=rep(1,length(x))
>> p=poly(4*(rep(x,weight)-mean(range(x)))/diff(range(x)),degree)
>> Z<-(t(t(p[cumsum(weight),])*sqrt(attr(p,"coefs")$norm2[-
>> seq(2)]))[,degree])
>> class(Z)<-"OPoly";Z
>> }
>>
>> You need to make OPoly to have optional argument(s) that give
>> the original-regressor-dependent information to OPoly and then
>> have it return, as attributes, the value of those arguments.
>> makepredictcall
>> will take the attributes and attach them to the call in predvars so
>> predict uses values derived from the original regressors, not value
>> derived
>> from the data to be predicted from.
>>
>> Take a look at a pair like makepredictcall.scale() and scale() for an
>> example:
>> scale has optional arguments 'center' and 'scale' that it returns as
>> attributes
>> and makepredictcall.scale adds those to the call to scale that it is
>> given.
>> Thus when you predict, the scale and center arguments come from the
>> original data, not from the data you are predicting from.
>>
>>
>>
>>
>>
>>
>> Bill Dunlap
>> TIBCO Software
>> wdunlap tibco.com
>>
>> On Thu, Jul 16, 2015 at 3:43 PM, Kunshan Yin <yinkunshan at gmail.com>
>> wrote:
>>
>>> Thanks Bill for your quick reply.
>>>
>>> I tried your solution and it did work for the simple user defined
>>> function xploly. But when I try with other function, it gave me error again:
>>>
>>> OPoly<-function(x,degree=1,weight=1){
>>> weight=round(weight,0)# weight need to be integer
>>> if(length(weight)!=length(x))weight=rep(1,length(x))
>>> p=poly(4*(rep(x,weight)-mean(range(x)))/diff(range(x)),degree)
>>>
>>> Z<-(t(t(p[cumsum(weight),])*sqrt(attr(p,"coefs")$norm2[-seq(2)]))[,degree])
>>> class(Z)<-"OPoly";Z
>>> }
>>>
>>> ##this OPoly is an FORTRAN orthogonal polynomial routine, it first maps
>>> the x to range[-2,2] then do QR, then return the results with sqrt(norm2).
>>> Comparing with poly, this transformation will make the model coefficients
>>> within a similar range as other variables, the R poly routine will usually
>>> give you a very large coefficients. I did not find such routine in R, so I
>>> have to define this as user defined function.
>>> #######
>>>
>>> I also have following function as you suggested:
>>>
>>> makepredictcall.OPoly<-function(var,call)
>>> {
>>> if (is.call(call)) {
>>> if (identical(call[[1]], quote(OPoly))) {
>>> if (!is.null(tmp <- attr(var, "coefs"))) {
>>> call$coefs <- tmp
>>> }
>>> }
>>> }
>>> call
>>> }
>>>
>>>
>>> But I still got error for following:
>>>
>>> > g3=glm(lot1 ~ log(u) + OPoly(u,1), data = clotting, family = Gamma)
>>>
>>> > predict(g3,dc)Error in poly(4 * (rep(x, weight) - mean(range(x)))/diff(range(x)), degree) :
>>> missing values are not allowed in 'poly'
>>>
>>> I thought it might be due to the /diff(range(x) in the function. But
>>> even I remove that part, it will still give me error. Any idea?
>>>
>>> Many thanks in advance.
>>>
>>> Alex
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Jul 16, 2015 at 2:09 PM, William Dunlap <wdunlap at tibco.com>
>>> wrote:
>>>
>>>> Read about the 'makepredictcall' generic function. There is a method,
>>>> makepredictcall.poly(), for poly() that attaches the polynomial
>>>> coefficients
>>>> used during the fitting procedure to the call to poly() that predict()
>>>> makes.
>>>> You ought to supply a similar method for your xpoly(), and xpoly()
>>>> needs to return an object of a a new class that will cause that method to
>>>> be used.
>>>>
>>>> E.g.,
>>>>
>>>> xpoly <- function(x,degree=1,...){ ret <- poly(x,degree=degree,...);
>>>> class(ret) <- "xpoly" ; ret }
>>>> makepredictcall.xpoly <- function (var, call)
>>>> {
>>>> if (is.call(call)) {
>>>> if (identical(call[[1]], quote(xpoly))) {
>>>> if (!is.null(tmp <- attr(var, "coefs"))) {
>>>> call$coefs <- tmp
>>>> }
>>>> }
>>>> }
>>>> call
>>>> }
>>>>
>>>> g2 <- glm(lot1 ~ log(u) + xpoly(u,1), data = clotting, family = Gamma)
>>>> predict(g2,dc)
>>>> # 1 2 3 4
>>>> 5
>>>> #-0.01398928608 -0.01398928608 -0.01398928608 -0.01398928608
>>>> #-0.01398928608
>>>> # 6 7 8 9
>>>> #-0.01398928608 -0.01398928608 -0.01398928608 -0.01398928608
>>>>
>>>> You can see the effects of makepredictcall() in the 'terms' component
>>>> of glm's output. The 'variables' attribute of it gives the original
>>>> function
>>>> calls and the 'predvars' attribute gives the calls to be used for
>>>> prediction:
>>>> > attr(g2$terms, "variables")
>>>> list(lot1, log(u), xpoly(u, 1))
>>>> > attr(g2$terms, "predvars")
>>>> list(lot1, log(u), xpoly(u, 1, coefs = list(alpha = 40, norm2 = c(1,
>>>> 9, 8850))))
>>>>
>>>>
>>>>
>>>> Bill Dunlap
>>>> TIBCO Software
>>>> wdunlap tibco.com
>>>>
>>>> On Thu, Jul 16, 2015 at 12:35 PM, Kunshan Yin <yinkunshan at gmail.com>
>>>> wrote:
>>>>
>>>>> Hello, I have a question about the formula and the user defined
>>>>> function:
>>>>>
>>>>> I can do following:
>>>>> ###Case 1:
>>>>> > clotting <- data.frame(
>>>>> + u = c(5,10,15,20,30,40,60,80,100),
>>>>> + lot1 = c(118,58,42,35,27,25,21,19,18),
>>>>> + lot2 = c(69,35,26,21,18,16,13,12,12))
>>>>> > g1=glm(lot1 ~ log(u) + poly(u,1), data = clotting, family = Gamma)
>>>>> > dc=clotting
>>>>> > dc$u=1
>>>>> > predict(g1,dc)
>>>>> 1 2 3 4 5
>>>>> 6 7 8 9
>>>>> -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929
>>>>> -0.01398929 -0.01398929 -0.01398929
>>>>>
>>>>> However, if I just simply wrap the poly as a user defined function ( in
>>>>> reality I would have my own more complex function) then I will get
>>>>> error:
>>>>> ###Case 2:
>>>>> > xpoly<-function(x,degree=1){poly(x,degree)}
>>>>> > g2=glm(lot1 ~ log(u) + xpoly(u,1), data = clotting, family = Gamma)
>>>>> > predict(g2,dc)
>>>>> Error in poly(x, degree) :
>>>>> 'degree' must be less than number of unique points
>>>>>
>>>>> It seems that the predict always treat the user defined function in the
>>>>> formula with I(). My question is how can I get the results for Case2
>>>>> same
>>>>> as case1?
>>>>>
>>>>> Anyone can have any idea about this?
>>>>>
>>>>> Thank you very much.
>>>>>
>>>>> Alex
>>>>>
>>>>> [[alternative HTML version deleted]]
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>>>
>>>>
>>>
>>
>
[[alternative HTML version deleted]]
More information about the R-help
mailing list