[R] Logistic regression on aggregate data
Peter Dalgaard
P.Dalgaard at biostat.ku.dk
Mon Aug 11 10:33:55 CEST 2008
Vibhanshu Abhishek wrote:
> Hi,
>
> I want to run a logistic regression on summary data and I was not able to
> find an appropriate R function to do so. The data is summarized daily and
> instead of the binary y_t I have n_d and m_d, where n_d is the number of
> instances in which a choice could have been made and m_d is the number of
> instances in which the choice was made. The other characteristics have been
> averaged over the day.
>
> sample file:
> impression clicks       pos
>       5049      1 16.68251
>       4983      1 16.75457
>       4908      1 16.76956
>       5093      0 16.70803
>       5049    254  2.299663
>       5023    307  2.300418
>       4946    252  2.315609
>       4932    247  2.303933
>
> Thanks for the help.
> Vibhanshu
>
There are two ways:
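d <- read.table("clipboard", header=TRUE)   # sample table copied to the clipboard
glm(clicks/impression ~ pos, data=d, family=binomial, weights=impression)         # proportion, trials as weights
glm(cbind(clicks, noclicks=impression - clicks) ~ pos, data=d, family=binomial)   # events, non-events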
I.e. the LHS can be a proportion, in which case you need the number of
trials as weights to tell the difference between 5/10 and 500/1000, or it
can be a matrix with two columns, events and non-events (NB: not the number
of trials!).
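A minimal sketch to check that the two forms agree, assuming the eight sample rows from the question are typed inline with read.table(text = ...) instead of being read from the clipboard. As a cross-check it also fits the same model on the data expanded to one 0/1 row per impression. The names fit_rate, fit_cbind, fit_binary and long are illustrative only.

```r
# Sample data from the question, typed inline so the example does not
# depend on the clipboard.
d <- read.table(header = TRUE, text = "
impression clicks       pos
      5049      1 16.68251
      4983      1 16.75457
      4908      1 16.76956
      5093      0 16.70803
      5049    254  2.299663
      5023    307  2.300418
      4946    252  2.315609
      4932    247  2.303933
")

# Form 1: proportion on the LHS, number of trials as prior weights.
fit_rate <- glm(clicks / impression ~ pos, data = d,
                family = binomial, weights = impression)

# Form 2: two-column matrix of events and non-events.
fit_cbind <- glm(cbind(clicks, impression - clicks) ~ pos, data = d,
                 family = binomial)

# Cross-check: expand each day into one 0/1 row per impression
# (clicks ones followed by impression - clicks zeros).
long <- data.frame(
  y   = rep(rep(c(1, 0), nrow(d)),
            as.vector(rbind(d$clicks, d$impression - d$clicks))),
  pos = rep(d$pos, d$impression)
)
fit_binary <- glm(y ~ pos, data = long, family = binomial)

# The three coefficient vectors should agree up to convergence tolerance.
print(cbind(rate = coef(fit_rate), matrix = coef(fit_cbind),
            binary = coef(fit_binary)))
```

The coefficients and their standard errors agree across all three fits; the residual deviances do not, because the saturated model differs between the aggregated and the expanded data. Dropping weights = impression from the first form would treat each day as a single trial, and glm() then warns about non-integer successes.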
I think you'll find this in most textbooks on R...
--
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907