[R] Logistic regression on aggregate data

Mon Aug 11 10:33:55 CEST 2008

Vibhanshu Abhishek wrote:
> Hi,
>
> I want to run a logistic regression on summary data and I was not able to
> find the adequate R function to do so. The data is summarized daily and
> instead of the binary y_t I have n_d and m_d, where n_d is the number of
> instances in which a choice could have been made and m_d is the number of
> instances in which the choice was made. The other characteristics have been
> averaged throughout the day.
>
> sample file:
> impression clicks pos
> 5049 1 16.68251
> 4983 1 16.75457
> 4908 1 16.76956
> 5093 0 16.70803
> 5049 254 2.299663
> 5023 307 2.300418
> 4946 252 2.315609
> 4932 247 2.303933
>
> Thanks for the help.
> Vibhanshu
>   
There are two ways:

d <- read.table("clipboard", header=TRUE)

glm(clicks/impression ~ pos, data=d, family=binomial, weights=impression)
glm(cbind(clicks, noclicks=impression - clicks) ~ pos, data=d, family=binomial)

I.e. the LHS can be a proportion, in which case you need weights (the
number of trials) to tell the difference between 5/10 and 500/1000, or it
can be a matrix with two columns, events and non-events (NB: the second
column is the non-events, not the number of trials!)
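
As a quick check that the two forms describe the same model, here is a small self-contained sketch. It types in the sample rows from the question as a data frame rather than reading the clipboard; the object names fit_prop and fit_cbind are made up for illustration.

```r
# Sample rows from the question, entered inline (no clipboard needed)
d <- data.frame(
  impression = c(5049, 4983, 4908, 5093, 5049, 5023, 4946, 4932),
  clicks     = c(1, 1, 1, 0, 254, 307, 252, 247),
  pos        = c(16.68251, 16.75457, 16.76956, 16.70803,
                 2.299663, 2.300418, 2.315609, 2.303933)
)

# Form 1: proportion on the LHS, number of trials supplied as weights
fit_prop <- glm(clicks / impression ~ pos, data = d,
                family = binomial, weights = impression)

# Form 2: two-column matrix of (events, non-events) on the LHS
fit_cbind <- glm(cbind(clicks, impression - clicks) ~ pos, data = d,
                 family = binomial)

# Compare the fitted coefficients of the two forms
same_coef <- all.equal(coef(fit_prop), coef(fit_cbind))
print(same_coef)
print(coef(summary(fit_cbind)))
```

The two fits should agree on the coefficients up to numerical tolerance, since both carry the same counts of events and trials into the binomial likelihood.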

I think you'll find this in most textbooks on R...

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907