[R] Is this a valid syntax for lm()

Wed Nov 12 18:32:07 CET 2025

At 17:12 on 12/11/2025, Rui Barradas wrote:
> At 16:30 on 12/11/2025, Brian Smith wrote:
>> Hi,
>>
>> I have the code below:
>>
>> ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
>> trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
>> group <- gl(2, 10, 20, labels = c("Ctl","Trt"))
>> group1 <- head(gl(2, 10, 22, labels = c("Ctl1","Trt1")), 20)
>> weight <- c(ctl, trt)
>> dat = as.data.frame(cbind(weight, group, group1))
>> lm.D9 <- lm(weight ~ group * group1 - 1 - group1, dat)
>>
>> I want to include the interaction between the two variables group and
>> group1, but I do not want to include the level-0 term for group1, nor the
>> intercept.
>>
>> Therefore I used (-1 - group1) in the formula.
>>
>> I would like to know whether the above is valid syntax for the stated model.
>>
>> Thanks and regards,
>>
>> ______________________________________________
>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide https://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> Hello,
> 
> Yes, that syntax is valid. But isn't the following more readable?
> 
> lm.D9b <- lm(weight ~ 0 + group + group:group1, dat)
> 
> You can check that the two models are the same with
> 
> summary(lm.D9)
> summary(lm.D9b)
> 
> The following shows where the objects returned by those two calls to lm()
> differ, which gives further reasons to prefer model lm.D9b:
> 
> all.equal(lm.D9, lm.D9b, check.attributes = FALSE)
> 
> Hope this helps,
> 
> Rui Barradas
> 
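The claim that the two formulas describe the same model can also be checked without comparing fitted objects, by looking at the terms R expands each formula into. This is a sketch for illustration, not part of the original thread: it assumes the variables from the original post, builds the data with data.frame() so that group and group1 stay factors, and the names f1, f2, t1, t2, fit1, fit2 and right_dat are chosen only for the example.

```r
# Data from the original post, kept as factors by using data.frame()
ctl <- c(4.17, 5.58, 5.18, 6.11, 4.50, 4.61, 5.17, 4.53, 5.33, 5.14)
trt <- c(4.81, 4.17, 4.41, 3.59, 5.87, 3.83, 6.03, 4.89, 4.32, 4.69)
group <- gl(2, 10, 20, labels = c("Ctl", "Trt"))
group1 <- head(gl(2, 10, 22, labels = c("Ctl1", "Trt1")), 20)
weight <- c(ctl, trt)
right_dat <- data.frame(weight, group, group1)

# The two ways of writing the model discussed in the thread
f1 <- weight ~ group * group1 - 1 - group1
f2 <- weight ~ 0 + group + group:group1

# terms() expands each formula; compare the resulting term labels
# and the intercept flag (0 means no intercept)
t1 <- terms(f1)
t2 <- terms(f2)
attr(t1, "term.labels")  # "group" "group:group1"
attr(t2, "term.labels")  # "group" "group:group1"
c(attr(t1, "intercept"), attr(t2, "intercept"))  # 0 0

# Fitting both formulas to the same data gives the same coefficients
fit1 <- lm(f1, data = right_dat)
fit2 <- lm(f2, data = right_dat)
all.equal(coef(fit1), coef(fit2))  # TRUE
```

Matching term labels show that the two formulas specify the same model; they say nothing about whether every coefficient of that model is estimable with a given data set.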
Hello,

Sorry for my hasty post; there is another problem with your code.
The code that creates dat is wrong:

dat = as.data.frame(cbind(weight, group, group1))

first creates a matrix with cbind() and then coerces that matrix to a data 
frame. The error is in creating the matrix. A matrix can hold only one data 
type, so all variables become numeric: the factors group and group1 are 
replaced by their integer codes and are no longer factors.
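A minimal sketch of the matrix step on its own, using weight and group from the original post: cbind() returns a single numeric matrix in which the factor is replaced by its integer codes, and converting that matrix back to a data frame cannot restore the factor. The object names m and df_ok are only illustrative.

```r
weight <- c(4.17, 5.58, 5.18, 6.11, 4.50, 4.61, 5.17, 4.53, 5.33, 5.14,
            4.81, 4.17, 4.41, 3.59, 5.87, 3.83, 6.03, 4.89, 4.32, 4.69)
group <- gl(2, 10, 20, labels = c("Ctl", "Trt"))

# cbind() builds a matrix, and a matrix has a single storage type
m <- cbind(weight, group)
is.matrix(m)                 # TRUE
typeof(m)                    # "double": the factor became numbers
sort(unique(m[, "group"]))   # 1 2: the integer codes, labels are lost

# Coercing that matrix to a data frame does not bring the factor back
is.factor(as.data.frame(m)$group)  # FALSE

# data.frame() keeps each column's own class
df_ok <- data.frame(weight, group)
is.factor(df_ok$group)       # TRUE
```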

This error affects everything that follows.

The correct way is to use data.frame(weight, group, group1), which keeps 
group and group1 as factors. See the code below. The models' coefficients, 
standard errors, etc. are different, and so are the predictions from the 
models.

ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
group <- gl(2, 10, 20, labels = c("Ctl","Trt"))
group1 <- head(gl(2, 10, 22, labels = c("Ctl1","Trt1")), 20)
weight <- c(ctl, trt)

wrong_dat <- as.data.frame(cbind(weight, group, group1))
right_dat <- data.frame(weight, group, group1)
str(wrong_dat)
#> 'data.frame':    20 obs. of  3 variables:
#>  $ weight: num  4.17 5.58 5.18 6.11 4.5 4.61 5.17 4.53 5.33 5.14 ...
#>  $ group : num  1 1 1 1 1 1 1 1 1 1 ...
#>  $ group1: num  1 1 1 1 1 1 1 1 1 1 ...
str(right_dat)
#> 'data.frame':    20 obs. of  3 variables:
#>  $ weight: num  4.17 5.58 5.18 6.11 4.5 4.61 5.17 4.53 5.33 5.14 ...
#>  $ group : Factor w/ 2 levels "Ctl","Trt": 1 1 1 1 1 1 1 1 1 1 ...
#>  $ group1: Factor w/ 2 levels "Ctl1","Trt1": 1 1 1 1 1 1 1 1 1 1 ...

wrong_lm.D9 <- lm(weight ~ group * group1 - 1 - group1, wrong_dat)
right_lm.D9 <- lm(weight ~ group * group1 - 1 - group1, right_dat)
summary(wrong_lm.D9)
#>
#> Call:
#> lm(formula = weight ~ group * group1 - 1 - group1, data = wrong_dat)
#>
#> Residuals:
#>     Min      1Q  Median      3Q     Max
#> -1.0710 -0.4938  0.0685  0.2462  1.3690
#>
#> Coefficients:
#>              Estimate Std. Error t value Pr(>|t|)
#> group          7.7335     0.4540   17.04 1.51e-12 ***
#> group:group1  -2.7015     0.2462  -10.97 2.10e-09 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.6964 on 18 degrees of freedom
#> Multiple R-squared:  0.9818, Adjusted R-squared:  0.9798
#> F-statistic: 485.1 on 2 and 18 DF,  p-value: < 2.2e-16
summary(right_lm.D9)
#>
#> Call:
#> lm(formula = weight ~ group * group1 - 1 - group1, data = right_dat)
#>
#> Residuals:
#>     Min      1Q  Median      3Q     Max
#> -1.0710 -0.4938  0.0685  0.2462  1.3690
#>
#> Coefficients: (2 not defined because of singularities)
#>                     Estimate Std. Error t value Pr(>|t|)
#> groupCtl              5.0320     0.2202   22.85 9.55e-15 ***
#> groupTrt              4.6610     0.2202   21.16 3.62e-14 ***
#> groupCtl:group1Trt1       NA         NA      NA       NA
#> groupTrt:group1Trt1       NA         NA      NA       NA
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.6964 on 18 degrees of freedom
#> Multiple R-squared:  0.9818, Adjusted R-squared:  0.9798
#> F-statistic: 485.1 on 2 and 18 DF,  p-value: < 2.2e-16

# generate data for predict()
g <- gl(2, 1, labels = c("Ctl","Trt"))
g1 <- gl(2, 1, labels = c("Ctl1","Trt1"))
# wrong_new must be coerced to numeric to match the columns of wrong_dat
wrong_new <- expand.grid(group = g, group1 = g1)
wrong_new[] <- lapply(wrong_new, as.numeric)
# keep right_new as factors
right_new <- expand.grid(group = g, group1 = g1)

predict(wrong_lm.D9, newdata = wrong_new)
#>       1       2       3       4
#>  5.0320 10.0640  2.3305  4.6610
predict(right_lm.D9, newdata = right_new)
#> Warning in predict.lm(right_lm.D9, newdata = right_new): prediction from
#> rank-deficient fit; attr(*, "non-estim") has doubtful cases
#>     1     2     3     4
#> 5.032 4.661 5.032 4.661
#> attr(,"non-estim")
#> 2 3
#> 2 3
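One point the output above does not spell out: the two NA interaction coefficients ("2 not defined because of singularities") and the "non-estim" warning from predict() come from the data, not from the formula syntax. In this example group1 has exactly the same pattern as group (every Ctl observation is Ctl1 and every Trt observation is Trt1), so the interaction columns carry no information beyond group. A quick check, rebuilding the two factors from the original post (the name X is only illustrative):

```r
group <- gl(2, 10, 20, labels = c("Ctl", "Trt"))
group1 <- head(gl(2, 10, 22, labels = c("Ctl1", "Trt1")), 20)

# Cross-tabulation: the off-diagonal cells are empty, so group1
# encodes nothing that group does not already encode
table(group, group1)

# Model matrix for the same right-hand side as right_lm.D9
X <- model.matrix(~ group * group1 - 1 - group1)
colnames(X)
ncol(X)      # 4 columns ...
qr(X)$rank   # ... but rank 2, hence the two NA coefficients
```

If group1 took both of its levels within each level of group, the interaction columns would no longer duplicate (or be all zero next to) the group columns, and the rank would match the number of columns.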

Hope this helps,

Rui Barradas