[R] before-after control-impact analysis with R

Mon Aug 18 01:11:55 CEST 2008

Hello everybody,

In am trying to analyse a BACI experiment and I really want to do it 
with R (which I find really exciting).  So, before moving on I though it 
would be a good idea to repeat some known experiments which are quite 
similar to my own. I tried to reproduce 2 published examples but without 
much success. The first one in particular is a published dataset 
analysed with SAS by McDonald et al. (2000). They also provide the 
original data as well as the SAS code. I don't know much about SAS and i 
really want to stick to R. So here follow 3 questions based on these 2 
papers.

Paper 1
(McDonald et al. 2000. Analysis of count data from before-after 
control-impact studies. J. Agric. Biol. Envir Stat., 5:262-279)

Data were collected from 1984, 1989, 1990, 1991, 1993, and 1996. Samples 
from 1984 are considered as before an impact and the remaining years as 
  after the impact. Each year, 96 transects were sampled (36 in the 
oiled area and 60 in the non-oiled area; "0" is for oiled and "1" for 
non-oiled). The authors compare 3 different ways of analysing the data 
including glmm. The data can be reproduced with the following commands 
(density is fake numbers but I can provide the original data since I've 
typed them in anyway):

 >x<-c(rep("0",36),rep("1",60))
 >oiled<-rep(x,6)
 >year<-c(rep(1984,96), rep(1989,96), rep(1990,96), rep(1991,96), 
rep(1993,96), rep(1996,96))
 >density<-runif(576, min=0, max=10)
 >transect<-c(rep(1:96,6))
 >oil<-data.frame("oiled"=oiled, "transect"=transect, "year"=year, 
"density"=density)

Question 1:
I can reproduce the results of the repeated measures anova with:
 >oil.aov1<-aov(density~factor(year)*factor(oiled)+Error(factor(transect))

But why is the following command not working?
 >oil.aov2<-aov(density~oiled*year + Error(oiled/transect), data=oil)

After reading the R-help archive, as well as Chambers and Hasties 
(Statistical Models in S) and Pinheiro's and Bates (Mixed effects models 
in S and S-plus) I would expect that the correct model is the oil.aov2. 
As you might see from the data, transect is nested within oiled and I 
still get the wrong results when I try +Error(transect) or 
+Error(oiled:transect) and many other variants.

Question 2 (on the same paper):
The authors conclude that it is better to analyse the data with a 
Generalized Linear (Mixed) Model Technique. I tried lme and after 
reading Douglas Bates article (May 2005, vol. 5/1) also lmer as follows:
 >oil.lme<-lme(density~year*oiled, random=~1|oiled/transect)
or
 >harvest.lmer<-lmer(abund~H*BA+BA/year+BA/year:H+site:BA+(1|H/site))
but again no luck. I will get always some error messages or some 
interactions missing.

Paper 2
(Keough & Quinn, 2000. Legislative vs. practical protection of an 
intertidal shoreline in southeastern Australia. Ecol. Appl. 10: 871-881)

Data were collected from 1989, 1990, 1991, 1993, 1994, 1995, 1996, 
1997). the 1989-1991 are the before impact years and the other 4 years 
are the after the impact years. Eight sites were samples (2 protected 
and 6 impacted). In this dataset, site is nested within harvest (H) and 
year is nested within before-after (BA). Also, site is considered as 
random by the authors. The data (fake again) can be produced with the 
following commands:

 >site<-c(rep(c("A1","A2", "RR1", "RR2", "WT1", "WT2", "WT3", "WT4"),8))
 >H<-c(rep(c("exp", "exp", "prot", "pro", "exp", "exp", "exp", "exp"), 8))
 >year<-c(rep(1989,8), rep(1990,8), rep(1991,8), rep(1993,8), 
rep(1994,8), rep(1995,8), rep(1996,8), rep(1997,8))
 >BA<-c(rep("bef",24), rep("after",40))
 >abund<-runif(64, min=0, max=10)
 >harvest<-data.frame(abund, BA, H, site, year)

Question 3.
The authors use a complex anova design and here is part of their anova 
table which shows the design and the df.

source.of.var      df
Harvesting(H)     1, 6
before-after(BA)  1, 6
H x BA            1, 6
Site(H)           6, 31
Year(BA)          6, 31
Site x BA         6, 31
Year x H          6, 31
Resid.            31

My question again is: Why can't I reproduce the results? When I try a 
simple anova without any random factors:
 >harvest.lm<-lm(abund~H*BA+H/site+BA/year+site:BA+year:H)
I get close enought but the results are different (at least I think they 
are different because the nomin. df are different).

So obviously I need to assign sites as a random factor somehow. So when 
I try to include site (which is nested in H) as a random factor and also 
year nested in BA (as the authors did) the best I can come up with is:
 >harvest.lme<-lme(abund~H*BA+BA/year+BA/year:H, random=~1|H/site)
But I get a warning message (Warning message:In pf(q, df1, df2, 
lower.tail, log.p) : NaNs produced) and also I don't know where to put 
the site x BA term (whatever I try I get error messages).

I really apologise for the long post but after a week of studying and 
trying as many ideas and examples I could find and think of I felt that 
I really need advice from some more experienced users if I really want 
to use this magnificent tool correctly.

Thanks in advance

-- 
-------------------------------------------------------
Nikolaos Lampadariou
Hellenic Centre for Marine Research
P.O. Box 2214
710 03 Heraklion Crete
GREECE

e-mail: nlamp at her.hcmr.gr
Ph. +30 281 0337849, +30 281 0337806
FAX +30 281 0337822
Web site: http://www.hcmr.gr/