[R] Simple lm/regression question
peter dalgaard
pdalgd at gmail.com
Mon Feb 6 11:36:50 CET 2012
On Feb 6, 2012, at 10:57, Achim Zeileis wrote:
> On Mon, 6 Feb 2012, James Annan wrote:
>
>
> The summary() shows under "Residuals" the contributions to the objective function, i.e. sqrt(1/w) (y - x'b) in the notation above.
>
> However, by using the residuals extractor function you can get the unweighted residuals:
>
> residuals(lm(y ~ x, weights = c(.01, .01, .01, .01)))
>
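(To make that concrete: with made-up x and y standing in for the ones from earlier in the thread, something like

x <- 1:4                 # hypothetical stand-in data
y <- c(1, 4, 11, 14)     # hypothetical stand-in data
fit <- lm(y ~ x, weights = c(.01, .01, .01, .01))
residuals(fit)           # unweighted residuals
weighted.residuals(fit)  # residuals scaled by sqrt(weights); what summary(fit) tabulates under "Residuals"

shows the two sets differing only by the square root of the weights.)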
>> The uncertainties on the parameter estimates, however, have *not* changed, which seems very odd to me.
>
> lm() interprets the weights as precision weights, not as case weights.
>
> Thus, the scaling in the variances is done by the number of (non-zero) weights, not by the sum of weights.
>
>> The behaviour of IDL is rather different, and more intuitive to me:
>>
>> IDL> vec=linfit(x,y,sigma=sig,measure_errors=[1,1,1,1])
>> IDL> print,vec,sig
>> -5.00000 5.00000
>> 1.22474 0.447214
>>
>> IDL> vec=linfit(x,y,sigma=sig,measure_errors=[10,10,10,10])
>> IDL> print,vec,sig
>> -5.00000 5.00000
>> 12.2474 4.47214
>
> This appears to use sandwich standard errors.
Actually, I think the issue is slightly different: IDL assumes that the errors _are_ the values given (notice that setting measure_errors to 1 is not equivalent to omitting them), whereas R assumes only that the error variances are _proportional_ to the inverse weights, and proportionality to c(.01,.01,.01,.01) is no different from proportionality to c(1,1,1,1)...
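For instance (again with made-up x and y in place of the real ones), the two weightings below should give identical standard errors:

x <- 1:4                 # hypothetical stand-in data
y <- c(1, 4, 11, 14)     # hypothetical stand-in data
summary(lm(y ~ x, weights = rep(.01, 4)))$coefficients
summary(lm(y ~ x, weights = rep(1, 4)))$coefficients
# same coefficient table both times: only the relative weights matter to lm()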
There are a couple of ways to avoid using the estimated multiplicative dispersion parameter in R: one is to extract cov.unscaled from the summary, another is to use summary.glm with dispersion = 1. I'm not quite sure how they interact with weights, though (and I don't have the time to check just now).
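Roughly along these lines (a sketch only, again with made-up data, and taking the weights to be the exact inverse error variances):

x <- 1:4                                  # hypothetical stand-in data
y <- c(1, 4, 11, 14)                      # hypothetical stand-in data
w <- rep(.01, 4)                          # 1/w = error variance 100, i.e. sd 10
fit <- lm(y ~ x, weights = w)
sqrt(diag(summary(fit)$cov.unscaled))     # SEs without the estimated dispersion
summary(glm(y ~ x, weights = w), dispersion = 1)  # same idea via summary.glm

If that is the right reading, the standard errors for weights of .01 should come out 10 times those for unit weights, which is the kind of scaling the IDL output above shows.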
--
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com