[R] before-after control-impact analysis with R
Nikolaos Lampadariou
nlamp at her.hcmr.gr
Mon Aug 18 01:11:55 CEST 2008
Hello everybody,
In am trying to analyse a BACI experiment and I really want to do it
with R (which I find really exciting). So, before moving on I though it
would be a good idea to repeat some known experiments which are quite
similar to my own. I tried to reproduce 2 published examples but without
much success. The first one in particular is a published dataset
analysed with SAS by McDonald et al. (2000). They also provide the
original data as well as the SAS code. I don't know much about SAS and i
really want to stick to R. So here follow 3 questions based on these 2
papers.
Paper 1
(McDonald et al. 2000. Analysis of count data from before-after
control-impact studies. J. Agric. Biol. Envir Stat., 5:262-279)
Data were collected from 1984, 1989, 1990, 1991, 1993, and 1996. Samples
from 1984 are considered as before an impact and the remaining years as
after the impact. Each year, 96 transects were sampled (36 in the
oiled area and 60 in the non-oiled area; "0" is for oiled and "1" for
non-oiled). The authors compare 3 different ways of analysing the data
including glmm. The data can be reproduced with the following commands
(density is fake numbers but I can provide the original data since I've
typed them in anyway):
>x<-c(rep("0",36),rep("1",60))
>oiled<-rep(x,6)
>year<-c(rep(1984,96), rep(1989,96), rep(1990,96), rep(1991,96),
rep(1993,96), rep(1996,96))
>density<-runif(576, min=0, max=10)
>transect<-c(rep(1:96,6))
>oil<-data.frame("oiled"=oiled, "transect"=transect, "year"=year,
"density"=density)
Question 1:
I can reproduce the results of the repeated measures anova with:
>oil.aov1<-aov(density~factor(year)*factor(oiled)+Error(factor(transect))
But why is the following command not working?
>oil.aov2<-aov(density~oiled*year + Error(oiled/transect), data=oil)
After reading the R-help archive, as well as Chambers and Hasties
(Statistical Models in S) and Pinheiro's and Bates (Mixed effects models
in S and S-plus) I would expect that the correct model is the oil.aov2.
As you might see from the data, transect is nested within oiled and I
still get the wrong results when I try +Error(transect) or
+Error(oiled:transect) and many other variants.
Question 2 (on the same paper):
The authors conclude that it is better to analyse the data with a
Generalized Linear (Mixed) Model Technique. I tried lme and after
reading Douglas Bates article (May 2005, vol. 5/1) also lmer as follows:
>oil.lme<-lme(density~year*oiled, random=~1|oiled/transect)
or
>harvest.lmer<-lmer(abund~H*BA+BA/year+BA/year:H+site:BA+(1|H/site))
but again no luck. I will get always some error messages or some
interactions missing.
Paper 2
(Keough & Quinn, 2000. Legislative vs. practical protection of an
intertidal shoreline in southeastern Australia. Ecol. Appl. 10: 871-881)
Data were collected from 1989, 1990, 1991, 1993, 1994, 1995, 1996,
1997). the 1989-1991 are the before impact years and the other 4 years
are the after the impact years. Eight sites were samples (2 protected
and 6 impacted). In this dataset, site is nested within harvest (H) and
year is nested within before-after (BA). Also, site is considered as
random by the authors. The data (fake again) can be produced with the
following commands:
>site<-c(rep(c("A1","A2", "RR1", "RR2", "WT1", "WT2", "WT3", "WT4"),8))
>H<-c(rep(c("exp", "exp", "prot", "pro", "exp", "exp", "exp", "exp"), 8))
>year<-c(rep(1989,8), rep(1990,8), rep(1991,8), rep(1993,8),
rep(1994,8), rep(1995,8), rep(1996,8), rep(1997,8))
>BA<-c(rep("bef",24), rep("after",40))
>abund<-runif(64, min=0, max=10)
>harvest<-data.frame(abund, BA, H, site, year)
Question 3.
The authors use a complex anova design and here is part of their anova
table which shows the design and the df.
source.of.var df
Harvesting(H) 1, 6
before-after(BA) 1, 6
H x BA 1, 6
Site(H) 6, 31
Year(BA) 6, 31
Site x BA 6, 31
Year x H 6, 31
Resid. 31
My question again is: Why can't I reproduce the results? When I try a
simple anova without any random factors:
>harvest.lm<-lm(abund~H*BA+H/site+BA/year+site:BA+year:H)
I get close enought but the results are different (at least I think they
are different because the nomin. df are different).
So obviously I need to assign sites as a random factor somehow. So when
I try to include site (which is nested in H) as a random factor and also
year nested in BA (as the authors did) the best I can come up with is:
>harvest.lme<-lme(abund~H*BA+BA/year+BA/year:H, random=~1|H/site)
But I get a warning message (Warning message:In pf(q, df1, df2,
lower.tail, log.p) : NaNs produced) and also I don't know where to put
the site x BA term (whatever I try I get error messages).
I really apologise for the long post but after a week of studying and
trying as many ideas and examples I could find and think of I felt that
I really need advice from some more experienced users if I really want
to use this magnificent tool correctly.
Thanks in advance
--
-------------------------------------------------------
Nikolaos Lampadariou
Hellenic Centre for Marine Research
P.O. Box 2214
710 03 Heraklion Crete
GREECE
e-mail: nlamp at her.hcmr.gr
Ph. +30 281 0337849, +30 281 0337806
FAX +30 281 0337822
Web site: http://www.hcmr.gr/
More information about the R-help
mailing list