[R] Loop overwrite and data output problems
RCulloch
ross.culloch at dur.ac.uk
Fri Feb 26 14:31:06 CET 2010
Hello R users,
I have been using R for a while now for basic stats but I'm now trying to
get my head around looping scripts and in some places I am failing!
I have a data set with c. 1200 data points on 98 individual animals with
data on each row representing a daily measure and I am asking the question
"what variables affect the animal's behaviour?"
the dataset includes these variables for analyses:
presence of behaviour, absence of behaviour, site, year, rain, air temp, ID,
Day
Listed below as they appear in the data set:
BEH_T, BEH_F, SITE, YEAR, PRECIP_MM_DAY, PUP_AGE_EST, MO_AIR_TEMP, ID2,
DAY
with BEH_T & BEH_F = the response variable for a binomial GLM
here is the head of the dataset
(NB there are only two years and two sites)
BEH_T BEH_F SITE YEAR PRECIP_MM_DAY PUP_AGE_EST MO_AIR_TEMP ID2 DAY
[1,] 14 10 1 2007 0 12 10.98750 1 1
[2,] 37 23 1 2007 0 13 11.47333 1 2
[3,] 56 22 1 2007 0 14 12.16667 1 3
[4,] 43 23 1 2007 0 16 10.91515 1 5
[5,] 62 16 1 2007 0 17 12.81026 1 6
[6,] 30 20 1 2007 0 19 8.67037 1 8
(Sorry the headings are skewed)
Because I don't want to do too complex a model to start with (just wanting
to learn first with a 'simple' model) I have issues with independence of the
data as there are repeats of individuals - i.e. data taken on the same IDs
on different days. So in order to account for that I have decided to random
sample one data point for each ID then run the GLM on that data for x number
of simulations to see if the explanatory variables are the same/similar
across all models. (This will reduce my data set to 98 data points, but it
is the best way I can see of doing this without doing mixed-effects models,
since not all IDs are seen at both sites in both years).
I am also using the MuMIn package for running all subsets of your model
the code I'm using is:
for (S in 1:2){
Sample.dat<-ALL.R[1,]
for (I in 1:98) {
tmp<-ALL.R[ALL.R$ID2==I,]
max<-dim(tmp)[1]
if (I==1) Sample.dat<-tmp[sample(1:max,1),] else {
Sample.dat<-rbind(Sample.dat,tmp[sample(1:max,1),])
m1.R<-glm(cbind(Sample.dat$BEH_T, Sample.dat$BEH_F) ~ Sample.dat$SITE +
Sample.dat$YEAR + Sample.dat$PRECIP_MM_DAY + Sample.dat$PUP_AGE_EST +
Sample.dat$MO_AIR_TEMP, family="binomial")
mod<-dredge(m1.R)}}}
At this point I have two issues if I do it manually then it seems to work
i.e. gives me one output (e.g shown at bottom of post) where I then want to
take the first line, the model with the best AIC using mod[1,] - no problem!
However, letting the code run and for example using print ((mod[1,])) at the
end it prints out the first line of 98 outputs - so I'm not too sure what
I've done wrong here, but it appears to be running a model for each ID -
something basic no doubt!
Ideally, what I want to do is take a random sample of the data then run the
model get one output for that take the top line (i.e. the best AIC) and save
this, then run this routine say 100 times, saving that top line every time,
then having a look at the results and take a model average. Anytime I've got
close to this I have issues with overwriting the previous first line of the
model selection and I can't seem to identify how to set this loop up
properly.
Any advice or guidance would be most appreciated, I have tried to explain my
issues clearly but if more info is required please just ask,
Many thanks in advance to those of you that took the time to read this!
Ross
Ross Culloch
Ph.D. Student
Durham University
UK
Here is an example of the model selection table from usingMuMIn:
Model selection table
(Intr) S.$MO_ S.$PRE S.$PUP S.$SIT S.$YEA k Dev. AIC AICc
delta weight
30 645.8000 0.03841 -0.02148 0.2882 -0.3212 5 304.0 687.1 687.7
0.000 0.707
32 648.8000 0.03811 0.0009399 -0.02172 0.2857 -0.3227 6 304.0 689.0 690.0
2.249 0.230
26 785.1000 -0.02543 0.4678 -0.3905 4 312.8 693.9 694.3
6.630 0.026
31 794.2000 0.0037260 -0.02627 0.4519 -0.3950 5 312.5 695.5 696.2
8.493 0.010
22 582.7000 0.04703 0.2641 -0.2899 4 314.7 695.8 696.2
8.529 0.010
21 582.8000 0.06893 -0.01967 -0.2899 4 314.9 696.0 696.4
8.717 0.009
29 573.1000 0.04787 -0.0039980 0.2762 -0.2851 5 314.3 697.4 698.0
10.330 0.004
28 600.1000 0.06612 0.0046710 -0.02092 -0.2985 5 314.4 697.4 698.1
10.370 0.004
20 0.7526 0.05509 -0.01808 0.2450 4 321.0 702.0 702.5
14.770 0.000
10 530.4000 0.07447 -0.2639 3 324.0 703.1 703.3
15.640 0.000
27 0.7493 0.05556 -0.0022820 -0.01753 0.2519 5 320.8 703.9 704.6
16.850 0.000
19 530.0000 0.07455 -0.0001489 -0.2637 4 324.0 705.1 705.5
17.820 0.000
16 743.4000 0.4875 -0.3698 3 328.7 707.8 708.0
20.310 0.000
9 0.5512 0.06094 0.2286 3 328.8 707.9 708.1
20.430 0.000
8 0.6828 0.08019 -0.01688 3 328.9 708.0 708.2
20.540 0.000
18 0.5584 0.06173 -0.0059840 0.2481 4 327.8 708.9 709.3
21.620 0.000
25 739.9000 -0.0016930 0.4944 -0.3681 4 328.6 709.7 710.1
22.410 0.000
17 0.6856 0.07953 0.0012680 -0.01720 4 328.9 709.9 710.4
22.670 0.000
2 0.4985 0.08406 2 335.8 712.8 713.0
25.270 0.000
7 0.4996 0.08516 -0.0023780 3 335.6 714.7 714.9
27.240 0.000
14 1.0760 -0.02288 0.5151 3 340.8 719.9 720.1
32.420 0.000
23 1.0760 0.0003492 -0.02296 0.5136 4 340.8 721.9 722.3
34.590 0.000
5 0.8587 0.5304 2 354.0 731.0 731.1
43.440 0.000
12 0.8663 -0.0042170 0.5473 3 353.5 732.5 732.8
45.070 0.000
24 967.8000 0.0198500 -0.03274 -0.4813 4 358.3 739.4 739.8
52.140 0.000
15 942.8000 -0.02909 -0.4689 3 370.5 749.5 749.8
62.090 0.000
13 915.8000 0.0151200 -0.4556 3 384.7 763.7 764.0
76.290 0.000
6 900.3000 -0.4478 2 391.8 768.9 769.0
81.320 0.000
11 1.3530 0.0176300 -0.02957 3 402.3 781.3 781.6
93.890 0.000
4 1.3940 -0.02630 2 412.3 789.4 789.5
101.800 0.000
3 1.1010 0.0134300 2 424.4 801.4 801.6
113.800 0.000
1 1.1550 1 430.3 805.4 805.4
117.700 0.000
>
--
View this message in context: http://n4.nabble.com/Loop-overwrite-and-data-output-problems-tp1570593p1570593.html
Sent from the R help mailing list archive at Nabble.com.
More information about the R-help
mailing list