[R] Simple lm/regression question
peter dalgaard
pdalgd at gmail.com
Mon Feb 6 11:36:50 CET 2012
On Feb 6, 2012, at 10:57, Achim Zeileis wrote:
> On Mon, 6 Feb 2012, James Annan wrote:
>
>
> The summary() shows under "Residuals" the contributions to the objective function, i.e. sqrt(1/w) (y - x'b) in the notation above.
>
> However, by using the residuals extractor function you can get the unweighted residuals:
>
> residuals(lm(y ~ x, weights = c(.01, .01, .01, .01)))
>
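(To make that concrete: with made-up x and y standing in for the ones from earlier in the thread, something like

x <- 1:4                 # hypothetical stand-in data
y <- c(1, 4, 11, 14)     # hypothetical stand-in data
fit <- lm(y ~ x, weights = c(.01, .01, .01, .01))
residuals(fit)           # unweighted residuals
weighted.residuals(fit)  # residuals scaled by sqrt(weights); what summary(fit) tabulates under "Residuals"

shows the two sets differing only by the square root of the weights.)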
>> The uncertainties on the parameter estimates, however, have *not* changed, which seems very odd to me.
>
> lm() interprets the weights as precision weights, not as case weights.
>
> Thus, the scaling in the variances is done by the number of (non-zero) weights, not by the sum of weights.
>
>> The behaviour of IDL is rather different, and more intuitive to me:
>>
>> IDL> vec=linfit(x,y,sigma=sig,measure_errors=[1,1,1,1])
>> IDL> print,vec,sig
>> -5.00000 5.00000
>> 1.22474 0.447214
>>
>> IDL> vec=linfit(x,y,sigma=sig,measure_errors=[10,10,10,10])
>> IDL> print,vec,sig
>> -5.00000 5.00000
>> 12.2474 4.47214
>
> This appears to use sandwich standard errors.
Actually, I think the issue is slightly different: IDL assumes that the errors _are_ the values given (notice that setting measure_errors to 1 is not equivalent to omitting them), whereas R assumes only that the error variances are _proportional_ to the inverse weights, and proportionality to c(.01,.01,.01,.01) is no different from proportionality to c(1,1,1,1)...
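For instance (again with made-up x and y in place of the real ones), the two weightings below should give identical standard errors:

x <- 1:4                 # hypothetical stand-in data
y <- c(1, 4, 11, 14)     # hypothetical stand-in data
summary(lm(y ~ x, weights = rep(.01, 4)))$coefficients
summary(lm(y ~ x, weights = rep(1, 4)))$coefficients
# same coefficient table both times: only the relative weights matter to lm()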
There are a couple of ways to avoid using the estimated multiplicative dispersion parameter in R: one is to extract cov.unscaled from the summary, another is to use summary.glm with dispersion = 1. I'm not quite sure how they interact with weights, though (and I don't have the time to check just now).
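Roughly along these lines (a sketch only, again with made-up data, and taking the weights to be the exact inverse error variances):

x <- 1:4                                  # hypothetical stand-in data
y <- c(1, 4, 11, 14)                      # hypothetical stand-in data
w <- rep(.01, 4)                          # 1/w = error variance 100, i.e. sd 10
fit <- lm(y ~ x, weights = w)
sqrt(diag(summary(fit)$cov.unscaled))     # SEs without the estimated dispersion
summary(glm(y ~ x, weights = w), dispersion = 1)  # same idea via summary.glm

If that is the right reading, the standard errors for weights of .01 should come out 10 times those for unit weights, which is the kind of scaling the IDL output above shows.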
--
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com