[R] generating an expression for a formula automatically
Marc Schwartz (via MN)
mschwartz at mn.rr.com
Thu Aug 24 23:39:19 CEST 2006
On Thu, 2006-08-24 at 14:01 -0700, Maria Montez wrote:
> Hi!
>
> I would like to be able to create formulas automatically. For example, I
> want to be able to create a function that takes on two values: resp and
> x, and then creates the proper formula to regress resp on x.
>
> My code:
>
> fit.main <- function(resp,x) {
> form <- expression(paste(resp," ~ ",paste(x,sep="",collapse=" + "),sep=""))
> z <- lm(eval(form))
> z
> }
> main <- fit.main("y",c("x1","x2","x3","x4"))
>
> and I get this error:
> Error in terms.default(formula, data = data) :
> no terms component
>
> Any suggestions?
>
> Thanks, Maria
See the last example in ?as.formula, which builds a formula from a
pasted character string.
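A minimal sketch of how that idea might be applied to the function in the question. The 'data' argument and the simulated data frame DF are my own additions for illustration; they are not part of the original code.

```r
# Sketch: paste the variable names into a string, convert the string to
# a formula with as.formula(), then pass the formula to lm().
# NOTE: the 'data' argument is an assumption added for this example.
fit.main <- function(resp, x, data) {
  form <- as.formula(paste(resp, "~", paste(x, collapse = " + ")))
  lm(form, data = data)
}

# Hypothetical data, simulated only so that the example runs
set.seed(1)
DF <- data.frame(y  = rnorm(20), x1 = rnorm(20), x2 = rnorm(20),
                 x3 = rnorm(20), x4 = rnorm(20))

main <- fit.main("y", c("x1", "x2", "x3", "x4"), DF)
formula(main)
```

The key difference from the original is that lm() receives a formula object rather than an unevaluated expression wrapping a character string.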
BTW, I would take note of the ability to use subset() on data frames in
model functions. For example, let's say that your data frame above is
called DF and contains columns 'y' and then 'x1' through 'x50' in
sequence. However, you only want to use the columns you have indicated
in your code above. You can then do:
lm(y ~ ., data = subset(DF, select = y:x4))
The '.' on the RHS of the formula means that all columns in the data
frame, other than the response column, are used as terms. In the
subset() function, you can specify a sequential group of columns using
the ':' operator.
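To make the DF case concrete, here is a small sketch. The data are simulated purely for illustration, with the column layout described above ('y' followed by 'x1' through 'x50').

```r
# Hypothetical data frame: 'y' followed by 'x1' ... 'x50', random values
set.seed(42)
n <- 30
DF <- as.data.frame(matrix(rnorm(n * 51), nrow = n))
names(DF) <- c("y", paste("x", 1:50, sep = ""))

# Keep only columns y through x4, then regress y on everything that is left
fit <- lm(y ~ ., data = subset(DF, select = y:x4))
names(coef(fit))
```

Only x1 through x4 should appear as terms, since the other columns never reach lm().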
For a specific example, let's use the iris data set, which has columns:
> names(iris)
[1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
[5] "Species"
We want to use 'Sepal.Length' as the response variable and then all
remaining columns, other than 'Species', as terms:
> lm(Sepal.Length ~ ., data = subset(iris, select = -Species))
Call:
lm(formula = Sepal.Length ~ ., data = subset(iris, select = -Species))
Coefficients:
 (Intercept)   Sepal.Width  Petal.Length   Petal.Width
      1.8560        0.6508        0.7091       -0.5565
In this case, I excluded the Species column by using the '-' before the
column name. However, I could just as easily have used:
> lm(Sepal.Length ~ .,
data = subset(iris, select = Sepal.Length:Petal.Width))
Call:
lm(formula = Sepal.Length ~ ., data = subset(iris, select =
Sepal.Length:Petal.Width))
Coefficients:
 (Intercept)   Sepal.Width  Petal.Length   Petal.Width
      1.8560        0.6508        0.7091       -0.5565
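As a quick check that the two ways of selecting columns are equivalent, something like the following sketch could be used. The object names are mine, and the last model (picking non-adjacent columns with c()) is an extra illustration beyond the two calls above.

```r
# Drop 'Species' by name versus keep a contiguous range of columns
fit_drop  <- lm(Sepal.Length ~ ., data = subset(iris, select = -Species))
fit_range <- lm(Sepal.Length ~ .,
                data = subset(iris, select = Sepal.Length:Petal.Width))

# Both models see the same four columns, so the coefficients should agree
all.equal(coef(fit_drop), coef(fit_range))

# Non-adjacent columns can be picked by listing them inside c()
fit_pick <- lm(Sepal.Length ~ .,
               data = subset(iris, select = c(Sepal.Length, Petal.Width)))
names(coef(fit_pick))
```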
See ?subset for additional information.
HTH,
Marc Schwartz