[R] generating an expression for a formula automatically

Marc Schwartz (via MN) mschwartz at mn.rr.com
Thu Aug 24 23:39:19 CEST 2006


On Thu, 2006-08-24 at 14:01 -0700, Maria Montez wrote:
> Hi!
> 
> I would like to be able to create formulas automatically. For example, I 
> want to be able to create a function that takes on two values: resp and 
> x, and then creates the proper formula to regress resp on x.
> 
> My code:
> 
> fit.main <- function(resp,x) {
>  form <- expression(paste(resp," ~ ",paste(x,sep="",collapse=" + "),sep=""))
>   z <- lm(eval(form))
>  z
> }
> main <- fit.main("y",c("x1","x2","x3","x4"))
> 
> and I get this error:
> Error in terms.default(formula, data = data) :
>         no terms component
> 
> Any suggestions?
> 
> Thanks, Maria

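See the last example in ?as.formula, which builds a formula for a model
with many variables by paste()ing the pieces into a character string and
converting the result with as.formula().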
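Applied to your function, that approach might look like the sketch below.
This is my adaptation rather than the help page's code: the 'data'
argument and the simulated data frame DF are assumptions added so that
the example runs on its own.

```r
# Build the formula text with paste(), then convert it with as.formula().
fit.main <- function(resp, x, data) {
  form <- as.formula(paste(resp, "~", paste(x, collapse = " + ")))
  lm(form, data = data)
}

# Hypothetical data, simulated purely for illustration
set.seed(1)
DF <- data.frame(y  = rnorm(20), x1 = rnorm(20), x2 = rnorm(20),
                 x3 = rnorm(20), x4 = rnorm(20))

main <- fit.main("y", c("x1", "x2", "x3", "x4"), data = DF)
main

# reformulate() assembles the same formula without manual pasting
reformulate(c("x1", "x2", "x3", "x4"), response = "y")
```

Passing the data frame explicitly avoids relying on 'y' and 'x1' ... 'x4'
being found in the calling environment.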

BTW, note that you can pass a subset() of a data frame to the model
functions. For example, let's say that your data frame above is
called DF and contains columns 'y' and then 'x1' through 'x50' in
sequence, but you only want to use the columns you have indicated
in your code above.  You can then do:

  lm(y ~ ., data = subset(DF, select = y:x4))

The '.' on the RHS of the formula means "all columns in 'data' other
than the response column".  In the subset() function, the 'select'
argument accepts a contiguous run of columns written with the ':'
operator.
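To check what the '.' will expand to before fitting anything, you can
inspect the terms of the formula. A small sketch, assuming a made-up DF
with columns 'y' and 'x1' through 'x6' (the data are simulated and the
name 'sub' is just for illustration):

```r
# Hypothetical data frame: y followed by x1 ... x6
set.seed(42)
DF <- as.data.frame(matrix(rnorm(20 * 7), ncol = 7))
names(DF) <- c("y", paste("x", 1:6, sep = ""))

# Keep the contiguous run of columns from y through x4
sub <- subset(DF, select = y:x4)
names(sub)

# Term labels that '.' expands to, given the columns in 'sub'
attr(terms(y ~ ., data = sub), "term.labels")
```

The term labels should be 'x1' through 'x4', i.e. every column of 'sub'
except the response.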

For a specific example, let's use the iris data set, which has columns:

> names(iris)
[1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"
[5] "Species"

We want to use 'Sepal.Length' as the response variable and then all
columns, other than 'Species', as terms:

> lm(Sepal.Length ~ ., data = subset(iris, select = -Species))

Call:
lm(formula = Sepal.Length ~ ., data = subset(iris, select = -Species))

Coefficients:
 (Intercept)   Sepal.Width  Petal.Length   Petal.Width
      1.8560        0.6508        0.7091       -0.5565


In this case, I excluded the Species column by using the '-' before the
column name.  However, I could just as easily have used:

> lm(Sepal.Length ~ ., 
     data = subset(iris, select = Sepal.Length:Petal.Width))

Call:
lm(formula = Sepal.Length ~ ., data = subset(iris, select =
Sepal.Length:Petal.Width))

Coefficients:
 (Intercept)   Sepal.Width  Petal.Length   Petal.Width
      1.8560        0.6508        0.7091       -0.5565



See ?subset for additional information.
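If the column names arrive as character strings, as in your fit.main(),
the same '.' idea can be combined with ordinary indexing instead of
subset(). A rough sketch; the function name fit.dot and its 'data'
argument are my own inventions for illustration:

```r
# Select the wanted columns by name, then let '.' pick up everything
# except the response.
fit.dot <- function(resp, x, data) {
  form <- as.formula(paste(resp, "~ ."))
  lm(form, data = data[, c(resp, x), drop = FALSE])
}

fit <- fit.dot("Sepal.Length",
               c("Sepal.Width", "Petal.Length", "Petal.Width"),
               data = iris)
coef(fit)
```

This should reproduce the coefficients from the iris fits shown earlier,
since the same columns end up in the model.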

HTH,

Marc Schwartz


