[R] Mimicking SPSS weighted least squares

Tue Mar 11 03:26:46 CET 2008

On 11 Mar 2008 at 14:09, Rolf Turner wrote:

> 
> It would appear that the SPSS procedure would then give exactly the same
> point estimates of the parameters, and change the inference structure by
> changing the ``denominator degrees of freedom'' from n-p to sum(w) - p.
> 

Well, if that IS what SPSS does, then it sounds like what Stata calls frequency weights, the 
general idea being that each "observation" in fact represents some non-negative number (w) of 
actual observations that have identical values.  Not much more than a glorified version of a 
frequency distribution table.
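
As a rough illustration of the frequency-weight reading, the sketch below fits the same model twice in R: once with weights=, and once on data in which each row is repeated w times. The data frame, the weights, and the names fit_w and fit_f are made up for the example, and whether SPSS's global weight switch really behaves this way remains an assumption drawn from the discussion quoted below.

```r
# Hypothetical toy data: each row is taken to stand for w identical observations.
mydata <- data.frame(x = 1:6, y = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2))
w <- c(1, 3, 2, 5, 1, 4)   # assumed positive integer frequency weights

# Weighted least squares as lm() does it: minimize sum(w * e^2)
fit_w <- lm(y ~ x, data = mydata, weights = w)

# Frequency-weight reading: repeat row i w[i] times, then fit unweighted
expanded <- mydata[rep(seq_len(nrow(mydata)), w), ]
fit_f <- lm(y ~ x, data = expanded)

all.equal(coef(fit_w), coef(fit_f))  # point estimates agree
df.residual(fit_w)                   # n - p = 6 - 2 = 4
df.residual(fit_f)                   # sum(w) - p = 16 - 2 = 14
```

The coefficients match, and only the residual degrees of freedom (and hence the standard errors and tests) differ between the two fits.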

I don't see anything fundamentally wrong with frequency weights, given an appropriate situation.
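
If the weights really are frequencies, the inference can probably be recovered without expanding the data. Both fits share the same weighted residual sum of squares, so the standard errors from lm(..., weights=w) only need rescaling by sqrt((n-p)/(sum(w)-p)), with the t tests then referred to sum(w)-p degrees of freedom. A minimal sketch, assuming strictly positive weights and using made-up data and names (se_freq, t_freq, p_freq are illustrative, not from any package):

```r
# Hypothetical toy data and assumed positive frequency weights
mydata <- data.frame(x = 1:6, y = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2))
w <- c(1, 3, 2, 5, 1, 4)

fit_w <- lm(y ~ x, data = mydata, weights = w)

n <- nrow(mydata)           # number of rows actually stored
p <- length(coef(fit_w))    # number of estimated coefficients
df_freq <- sum(w) - p       # denominator df under the frequency reading

# Rescale the weighted-fit standard errors to the frequency-weight scale
se_freq <- sqrt(diag(vcov(fit_w))) * sqrt((n - p) / df_freq)
t_freq  <- coef(fit_w) / se_freq
p_freq  <- 2 * pt(abs(t_freq), df = df_freq, lower.tail = FALSE)

cbind(Estimate = coef(fit_w), SE = se_freq, t = t_freq, p = p_freq)
```

This avoids building the replicated data frame, which matters when sum(w) is large. It does not make non-integer weights any more meaningful as frequencies.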

---JRG

John R. Gleason

> This seems to me to make little sense ...  But then, it ***is*** SPSS. :-)
> 
> 	cheers,
> 
> 		Rolf
> 
> On 11/03/2008, at 11:35 AM, Peter Dalgaard wrote:
> 
> > Rolf Turner wrote:
> >> On 11/03/2008, at 4:04 AM, Ben Domingue wrote:
> >>
> >>
> >>> Howdy,
> >>> In SPSS, there are 2 ways to weight a least squares regression:
> >>> 1. You can do it from the regression menu.
> >>> 2. You can set a global weight switch from the data menu.
> >>> These two options have not, in my experience, been equivalent.
> >>> Now, when I run lm in R with the weights= switch set accordingly, I
> >>> get the same set of results you would see with option #1 in SPSS.
> >>> Does anybody know how to duplicate option #2 from SPSS in R?
> >>>
> >>
> >> I think it's up to you to find out what ``option #2 from SPSS'' actually
> >> *does*.  If you know that, then you can (with a modicum of effort)
> >> duplicate that option in R.  The help file for lm() tells you that
> >> R uses the weights by minimizing sum(w*e^2) where w = weights and
> >> e = ``errors'' or residuals.
> >>
> >>
> >>
> > I believe case weighting in SPSS effectively replicates the  
> > relevant row (not sure if anything sensible comes out if weights  
> > are non-integer).  So
> >
> > lm(...., data=mydata[rep(1:nrow(mydata),w),])
> >
> > or thereabouts should do it. Might not be too efficient though.
> >
> > -- 
> >   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
> >  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
> > (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
> > ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
> >
> >
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.