[R] Post-hoc tests in MASS using glm.nb

Timothy Bates timothy.c.bates at gmail.com
Tue May 17 11:22:51 CEST 2011


Dear Bryony: the suggestion was not to change the name of the data object, but to explicitly tell glm.nb what dataset it should look in to find the variables you mention in the formula.

so the salient difference is:

m1 <- glm.nb(Cells ~ Cryogel*Day, data = side)

instead of

attach(side)
m1 <- glm.nb(Cells ~ Cryogel*Day)

This works for other functions also, but not uniformly as yet (how I wish it did and I could say 
   hist(x, data=side)
Instead of  
   hist(side$x)

this inconsistency encourages the need for attach()

best, tim

On 17 May 2011, at 9:09 AM, Bryony Tolhurst wrote:

> Dear Bill
> 
> Many thanks. I will try this.
> 
> One question: why is the attach()function problematic? I have always done it that way (well in my very limited R-using capacity!) as dictated by 'Statistics, an Introduction using R' by Michael Crawley. My dataset is called Side ('Side.txt') so how do I import this data without using attach(data)? I have tried:
> 
> side<-read.table('Side.txt',T)
> attach(side)
> 
> instead of:
> 
> data<-read.table('Side.txt',T) 
> attach(data)
> 
> But obviously I am still using the attach function, if not with 'data'!!
> 
> Thanks again
> 
> Bryony Tolhurst
> 
> -----Original Message-----
> From: Bill.Venables at csiro.au [mailto:Bill.Venables at csiro.au] 
> Sent: 17 May 2011 03:21
> To: Bryony Tolhurst; r-help at r-project.org
> Subject: RE: [R] Post-hoc tests in MASS using glm.nb
> 
> ?relevel
> 
> Also, you might want to fit the models as follows
> 
> Model1 <- glm.nb(Cells ~ Cryogel*Day, data = myData)
> 
> myData2 <- within(myData, Cryogel <- relevel(Cryogel, ref = "2"))
> Model2 <- update(Model1, data = myData1) 
> 
> &c
> 
> You should always spedify the data set when you fit a model if at all possible.  I would recommend you NEVER use attach() to put it on the search path, (under all but the most exceptional circumstances).
> 
> You could fit your model as 
> 
> Model0 <- glm.nv(Cells ~ interaction(Cryogel, Day) - 1, data = myData)
> 
> This will give you the subclass means as the regression coefficients.  You can then use vcov(Model0) to get the variance matrix and compare any two you like using directly calculated t-statistics.  This is pretty straightforward as well.
> 
> Bill Venables.
> 
> 
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of bryony
> Sent: Tuesday, 17 May 2011 3:46 AM
> To: r-help at r-project.org
> Subject: [R] Post-hoc tests in MASS using glm.nb
> 
> I am struggling to generate p values for comparisons of levels (post-hoc
> tests) in a glm with a negative binomial distribution
> 
> I am trying to compare cell counts on different days as grown on different media (e.g. types of cryogel) so I have 2 explanatory variables (Day and Cryogel), which are both factors, and an over-dispersed count variable (number of cells) as the response. I know that both variables are significant, and that there is a significant interaction between them.
> However, I seem unable to generate multiple comparisons between the days and cryogels. 
> 
> So my model is 
> 
> Model1<-glm.nb(Cells~Cryogel+Day+Day:Cryogel)
> 
> The output gives me comparisons between levels of the factors relative to a reference level (Day 0 and Cryogel 1) as follows:
> 
> Coefficients:
>               Estimate Std. Error z value Pr(>|z|)    
> (Intercept)      1.2040     0.2743   4.389 1.14e-05 ***
> Day14            3.3226     0.3440   9.658  < 2e-16 ***
> Day28            3.3546     0.3440   9.752  < 2e-16 ***
> Day7             3.3638     0.3440   9.779  < 2e-16 ***
> Cryogel2         0.7097     0.3655   1.942  0.05215 .  
> Cryogel3         0.7259     0.3651   1.988  0.04677 *  
> Cryogel4         1.4191     0.3539   4.010 6.07e-05 ***
> Day14:Cryogel2  -0.7910     0.4689  -1.687  0.09162 .  
> Day28:Cryogel2  -0.5272     0.4685  -1.125  0.26053    
> Day7:Cryogel2   -1.1794     0.4694  -2.512  0.01199 *  
> Day14:Cryogel3  -1.0833     0.4691  -2.309  0.02092 *  
> Day28:Cryogel3   0.1735     0.4733   0.367  0.71395    
> Day7:Cryogel3   -1.0907     0.4690  -2.326  0.02003 *  
> Day14:Cryogel4  -1.2834     0.4655  -2.757  0.00583 ** 
> Day28:Cryogel4  -0.6300     0.4591  -1.372  0.16997    
> Day7:Cryogel4   -1.3436     0.4596  -2.923  0.00347 ** 
> 
> 
> HOWEVER I want ALL the comparisons e.g. Cryogel 2 versus 4, 3 versus 2 etc on each of the days. I realise that such multiple comparsions need to be approached with care to avoid Type 1 error, however it is easy to do this in other programmes (e.g. SPSS, Genstat) and I'm frustrated that it appears to be difficult in R. I have tried the glht (multcomp) function but it gives me the same results. I assume that there is some way of entering the data differently so as to tell R to use a different reference level each time and re-run the analysis for each level, but don't know what this is.
> Please help!
> 
> Many thanks for your input
> 
> Bryony
> 
> --
> View this message in context: http://r.789695.n4.nabble.com/Post-hoc-tests-in-MASS-using-glm-nb-tp3526934p3526934.html
> Sent from the R help mailing list archive at Nabble.com.
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list