[R] vglm(), t values and p values

Wed Nov 4 19:40:55 CET 2009

On Wed, 2009-11-04 at 18:19 +0000, Federico Calboli wrote:
> On 4 Nov 2009, at 18:11, Gavin Simpson wrote:
> > Is there a particular reason for choosing a VGLM here? My reading of
> > your post suggests the response is an univariate, ordered factor and
> > VGLMs are especially for multivariate responses. In which case, can  
> > you
> > not use polr() in package MASS that comes with R or the lms() function
> > in the rms package (available from CRAN).
> 
> I have used polr() but:
> 
> 1) seems to be even more reluctant to give me at least the t value for  
> my marker: for the very same data I used to understand what was going  
> on vglm does give a t value for x (my marker), polr does not. As much  
> as my data is *dire* I still need to use it and get as many p-values  
> as I can (ideally one for each marker). The goodness of the model is  
> going to be assessed, perversely, on the whole sheabang of p-values  
> and  their aderence to a null distribution.
> 
> Additionally, while extracting the t value is a piece of cake with  
> polr(), the p-value I get a nowhere close to a null distribution.

Yes - I see that polr() also doesn't produce p-values in the output from
summary. You can use it to get "a" p-value for x if you use a LRT; fit
the model using polr() with and without 'x' and then supply both fitted
models to the anova() method for polr objects.

> 
> I will try lms() and hope for the best.

Sorry, I meant the lrm() function in package rms

G

> >
> > I haven't really used either of these functions in earnest, but one or
> > both may provide the p-values you desire, out of the box.
> 
> I hope so!
> 
> Thanks,
> 
> F
> 
> 
> >
> > HTH
> >
> > G
> >
> >>
> >> My response variable is the severity of diseases, going from 0 to 5  
> >> (the
> >> severity is actually an ordered factor).
> >>
> >> The independent variables are: 1 genetic marker, time of medical  
> >> observation,
> >> age, sex. What I *need* is a p-value for the genetic marker.  
> >> Because I have ~1.5
> >> million markers I'd rather not faffing around too much.
> >>
> >> My model is:
> >>
> >>> mod.vglm = vglm(disease.status ~ x + time + age + sex, family =
> >> cumulative(par = T))
> >>
> >> where x is my genetic marker, coded as 0/1/2, time is days of  
> >> medical observation.
> >>
> >>> summary(mod.vglm) works:
> >>
> >> Call:
> >> vglm(formula = disease.status ~ x + time + age + sex, family =  
> >> cumulative(par = T))
> >>
> >> Pearson Residuals:
> >>                    Min       1Q   Median       3Q     Max
> >> logit(P[Y<=1]) -0.6642 -0.28704 -0.18329 -0.11681  3.8919
> >> logit(P[Y<=2]) -2.5580 -0.48080 -0.23315  0.47388  2.5983
> >> logit(P[Y<=3]) -2.1565 -0.56961  0.22089  0.44349 10.7964
> >> logit(P[Y<=4]) -3.3175  0.13064  0.20117  0.43176 12.5233
> >>
> >> Coefficients:
> >>                     Value Std. Error  t value
> >> (Intercept):1 -2.4460e+00 4.2791e-01  -5.7162
> >> (Intercept):2 -7.1078e-01 4.1628e-01  -1.7074
> >> (Intercept):3  3.7619e-01 4.1545e-01   0.9055
> >> (Intercept):4  1.7467e+00 4.2092e-01   4.1496
> >> x              4.1421e-01 1.9762e-01   2.0959
> >> time          -3.6021e-04 3.0387e-05 -11.8540
> >> age           -2.6115e-05 9.2504e-06  -2.8232
> >> sexM           1.0188e-01 1.2491e-01   0.8156
> >>
> >> Number of linear predictors:  4
> >>
> >> Names of linear predictors:
> >> logit(P[Y<=1]), logit(P[Y<=2]), logit(P[Y<=3]), logit(P[Y<=4])
> >>
> >> Dispersion Parameter for cumulative family:   1
> >>
> >> Residual Deviance: 2475.937 on 3460 degrees of freedom
> >>
> >> Log-likelihood: -1237.969 on 3460 degrees of freedom
> >>
> >> #######################
> >>
> >> So here are my questions:
> >>
> >> 1) I need to get the t value for x, so I can use "1 - pt(tvalue,1)"  
> >> to find some
> >> sort of probability value for x. That's not trivial. Additionally,  
> >> I assume df
> >> for x is 1, hence I plan to use  "1 - pt(tvalue,1)", though I might  
> >> well be
> >> wrong. In any case getting the darned t value seems impossible
> >>
> >> 2) because of the difficulty of getting (1), it there a way of  
> >> getting vglm() to
> >> spit out a p-value for x please?
> >>
> >> I do recon many people might scoff at my crass desire for a p- 
> >> value, but I'm
> >> dealing with some dire phenotype in a whole genome analysis where  
> >> the *only*
> >> thing that matters are p-values. I have to be quite unsophysticated  
> >> I'm afraid.
> >>
> >> Best,
> >>
> >> Federico
> >>
> >>
> > -- 
> > %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> > Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
> > ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
> > Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
> > Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
> > UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
> > %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> >
> 
> --
> Federico C. F. Calboli
> Department of Epidemiology and Public Health
> Imperial College, St. Mary's Campus
> Norfolk Place, London W2 1PG
> 
> Tel +44 (0)20 75941602   Fax +44 (0)20 75943193
> 
> f.calboli [.a.t] imperial.ac.uk
> f.calboli [.a.t] gmail.com
> 
> 
> 
> 
> 
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%