[R] weight in lm

peter dalgaard pdalgd at gmail.com
Mon Aug 14 14:17:24 CEST 2017


> On 14 Aug 2017, at 13:43 , Spencer Graves <spencer.graves at effectivedefense.org> wrote:
> 
> 
> 
> On 2017-08-14 5:53 AM, peter dalgaard wrote:
>>> On 14 Aug 2017, at 10:13 , Troels Ring <tring at gvdnet.dk> wrote:
>>> 
>>> Dear friends - I hope you will accept a naive question on lm: R version 3.4.1, Windows 10
>>> 
>>> I have 204 "baskets" of three types corresponding to factor F, each of size from 2 to 33 containing measurements, and need to know if the standard deviation of the measurements in each basket, sdd, differs across types, F. Plotting the observed sdd versus the sizes from 2 to 33, called "k", does show a decreasing spread as k increases towards 33.
>>> 
>>> I tried lm(sdd ~ F,weight=k) and got different results than when omitting the weight argument, but would it be correct to use sqrt(k) as the weight instead?
>>> 
>> I doubt that there is a "correct" way, but theory says that if the baskets have the same SD and data are normally distributed, then the variance of the sample VARIANCE is proportional to 1/f = 1/(k-1). Weights in lm are inverse-variance, so the "natural" thing to do would seem to be to regress the square of sdd with weights (k-1).
>> 
>> (If the distribution is not normal, the variance of the sample variance is complicated by a term that involves both the sample size n (here k) and the excess kurtosis, whereas the variance of the sample SD is complicated in any case. All according to the gospel of St. Google.)
> 
> 
>      The Wikipedia article on "standard deviation" gives the more general formula.  (That article does NOT give a citation for that formula.  If you know one, please add it -- or post it here, to make it easier for someone else to add it.)
> 

Er, I don't see that (i.e. var(S) etc.) in there? 

My sources were

https://math.stackexchange.com/questions/72975/variance-of-sample-variance
https://stats.stackexchange.com/questions/631/standard-deviation-of-standard-deviation

which contain further links, but no references to publications. I suspect that this stuff is easy enough to do ab initio that people don't bother to fire up a literature search.
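
As a rough sketch of the suggestion quoted above (regress the square of sdd with weights k-1), something like the following could be tried. The data are simulated under the assumption of normal measurements with a common SD of 1, and the name `type` is only a stand-in for the factor F in the original post; none of the numbers come from the real data. The last lines check by simulation that the variance of the sample variance is close to 2*sigma^4/(k-1) in the normal case.

```r
## Simulated stand-in for the data in the thread (all values hypothetical)
set.seed(1)
n_baskets <- 204
k    <- sample(2:33, n_baskets, replace = TRUE)   # basket sizes
type <- factor(sample(c("a", "b", "c"), n_baskets, replace = TRUE))  # "F" in the thread
## sample SD of k normal measurements, common SD 1 in every basket
sdd  <- vapply(k, function(n) sd(rnorm(n)), numeric(1))

## Regress the sample variance on type, weighting by degrees of freedom k - 1
fit <- lm(I(sdd^2) ~ type, weights = k - 1)
anova(fit)   # F test for a difference in variance between types

## Simulation check for one basket size: Var(S^2) should be near 2/(k0 - 1)
## when sigma = 1 and the data are normal
k0 <- 10
s2 <- replicate(20000, var(rnorm(k0)))
c(simulated = var(s2), theory = 2 / (k0 - 1))
```

With real data one would replace the simulated `k`, `type` and `sdd` by the observed columns; the weights argument is the only part that encodes the 1/(k-1) variance structure.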

-pd


> 
>      Thanks, Peter.
>      Spencer Graves
>> 
>> -pd
>> 
>> 
>>> Best wishes
>>> 
>>> Troels Ring
>>> Aalborg, Denmark
>>> 
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
> 

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com


