[R] Problem with binomial gam{mgcv}

Erin Conlisk erin.conlisk at gmail.com
Sat Oct 10 02:00:55 CEST 2015


Hello,

I am having trouble testing for significance with a binomial model in
gam{mgcv}.  Have I stumbled on a bug?  I doubt I would be so lucky, so
could someone tell me what I am doing wrong?

Please see the following code:
________________________________

# PROBLEM USING cbind

library(mgcv)  # provides gam()

# Create 500 random values to use as my explanatory variable
x1 <- runif(500, 0, 100)

# Create 500 random counts to serve as binomial "successes"
y1 <- floor(runif(500, 0, 100))

# Create 500 binomial "failures", assuming a total of 100 trials and the
# successes recorded in y1
y2 <- 100 - y1

Model <- gam(cbind(y1, y2) ~ 1 + s(x1), family = binomial)
summary(Model)
________________________________

The result is that my random variable, x1, is highly significant.  This
can't be right...
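
One possible explanation, offered as an assumption rather than a confirmed diagnosis: counts drawn with floor(runif(500, 0, 100)) are spread far more widely than counts from 100 binomial trials with a common success probability would be. A minimal sketch of that comparison in base R follows. The seed and the names n_trials, p_hat, binom_var, obs_var and dispersion_ratio are illustrative and not part of the original code.

```r
set.seed(42)  # assumption: a seed, added only so the sketch is reproducible

n_trials <- 100
y1 <- floor(runif(500, 0, 100))  # same construction as in the post

# Variance a binomial(100, p) count would have at the observed mean proportion
p_hat <- mean(y1) / n_trials
binom_var <- n_trials * p_hat * (1 - p_hat)

# Variance the simulated counts actually have
obs_var <- var(y1)

# Rough dispersion ratio; values far above 1 suggest overdispersion
dispersion_ratio <- obs_var / binom_var
print(dispersion_ratio)
```

If the ratio is well above 1, the standard errors from family=binomial are likely too small, which could make a pure-noise smooth look significant. Refitting with family=quasibinomial, which estimates a scale parameter, is one way to check whether the significance survives.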

So what happens when I change the observations from a "batch" of 100 trials
to individual successes and failures?
________________________________

# NOW MAKE ALL THESE DATA 0 and 1

r01<-rep(0,500)
data01<-cbind(x1, y1, y2, r01)
rownames(data01)<-seq(1,500, 1)
colnames(data01)<-c('x1', 'y1', 'y2', 'r01')
data01<-data.frame(data01)

# Creates a replicate of the explanatory variables for each individual "success"
expanded0 <- data01[rep(row.names(data01), data01$y1), 1:4]

r01<-rep(1,500)
data01<-cbind(x1, y1, y2, r01)
rownames(data01)<-seq(1,500, 1)
colnames(data01)<-c('x1', 'y1', 'y2', 'r01')
data01<-data.frame(data01)

# Creates a replicate of the explanatory variables for each individual "failure"
expanded1 <- data01[rep(row.names(data01), data01$y2), 1:4]

data01<-rbind(expanded0,expanded1)

Model2 <- gam(r01 ~ 1 + s(x1), family=binomial)
summary(Model2)
___________________________________

The result is what I expect.  Now my random variable, x1, is NOT
significant.
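
Something worth checking before comparing the two fits: the Model2 call has no data argument, so R would resolve r01 and x1 from the workspace rather than from the expanded data frame. The sketch below, in base R only, rebuilds the expanded data in a simplified way and compares the sizes of the two candidates. The seed and the names succ and fail are illustrative, and the 0/1 coding follows the post (0 for rows replicated by y1, 1 for rows replicated by y2).

```r
set.seed(1)  # assumption: a seed, added only so the sketch is reproducible

x1 <- runif(500, 0, 100)
y1 <- floor(runif(500, 0, 100))
y2 <- 100 - y1

# One row per "success" (coded 0 in the post) and one per "failure" (coded 1)
succ <- data.frame(x1 = rep(x1, times = y1), r01 = 0)
fail <- data.frame(x1 = rep(x1, times = y2), r01 = 1)
data01 <- rbind(succ, fail)

# The workspace copy of r01, as it stands after the second block in the post
r01 <- rep(1, 500)

# What a call without data= would see, versus the expanded data frame
print(length(r01))    # length of the workspace vector
print(unique(r01))    # distinct values in the workspace vector
print(nrow(data01))   # number of rows in the expanded data
```

If the workspace r01 turns out to be a constant vector of length 500, Model2 was not fitted to the expanded observations, and its non-significant smooth would not be comparable to the cbind fit. Passing data=data01 to gam() would be one way to make the comparison like for like.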

What is going on here?

I should say that I didn't just make this up.  My question arose while
working with my real data, where I was getting high significance but a very
low proportion of deviance explained.

My apologies if this is explained elsewhere, but I have spent hours
searching for an answer online.

Thank you kindly,
Erin Conlisk

-- 
Postdoctoral Researcher
UC Berkeley
Energy and Resources Group
310 Barrows Hall
Berkeley, CA 94720

cell: 858-776-2939
