[R] When lack of data is data and not n/a

Wayne Gray wgray.999 at gmail.com
Tue Apr 3 16:04:22 CEST 2012


Greetings.

Here is a problem I don't know how to handle, even by brute force. We have an 800k-line data file that includes eye fixations for subjects in a 3 x 2 factorial design. There are several screen locations where information is available while Ss do their task. These locations vary by condition, so there is no reason for people in some conditions (i.e., levels of the 3-level factor) to look at some locations. So I am analyzing these conditions two pairs at a time (i.e., by the 2-level factor).

So the problem: my ANOVA of these data messed up the within and between factors, the way aov does when different numbers of Ss contribute data to some of the variables than to others. A bit of sleuthing with plyr revealed that one of our Ss in one of our conditions never looked at one of our locations.
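For reference, the same kind of sleuthing can be sketched in base R. This is only an illustration under assumptions: the data frame `fix_demo` and its columns `subj` and `location` are made-up names, with one row per fixation; the real file will differ.

```r
# Hypothetical fixation-level data: one row per fixation.
# Column names (subj, location) are assumptions, not the real file's.
fix_demo <- data.frame(
  subj     = c("s1", "s1", "s1", "s2", "s2"),
  location = c("obj-A", "obj-B", "obj-A", "obj-A", "obj-A"),
  stringsAsFactors = FALSE
)

# Cross-tabulate subjects by locations; cells with no rows show up as 0.
tab <- table(fix_demo$subj, fix_demo$location, dnn = c("subj", "location"))

# Convert to long form and keep only the subject/location pairs with no data.
empty <- as.data.frame(tab, stringsAsFactors = FALSE)
empty <- empty[empty$Freq == 0, ]
print(empty)
```

In this toy data the only empty cell is subject s2 at obj-B, which is the kind of gap that unbalances the design.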

Given the nature of the DV, zero is a fine number. Although a little unexpected in this condition, it is reasonable and cannot be ignored. However, R recognizes, rightly, that lack of data is not the same as zero. 

I guess I could "subset" and "aggregate" the dataset to pull out a data.frame that contains the data aggregated at the right level for this aov and then add one record for this one S (well, actually I would add 8 records, as "block" is a within-Ss factor that we have been looking at). But there must be a more elegant way of doing this, especially as we are still in the exploratory phase and will be pulling out measures such as mean dwell time, total dwell time (dwell time per fix, summed over all fix), and others (e.g., tallies of sequential fixation chains -- e.g., obj-A, obj-B vs. obj-A, obj-C, etc.).
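To make the brute-force idea concrete, here is a minimal base-R sketch of it. Everything named here is an assumption for illustration: the data frame `fix`, its columns (`subj`, `cond`, `block`, `location`, `dwell`), and the toy values. It also assumes every listed location is available in the conditions being compared. The idea is to aggregate the observed cells, build the full subject x block x location grid, and fill the cells that have no rows with zero.

```r
# Hypothetical fixation-level data: one row per fixation (names are made up).
fix <- data.frame(
  subj     = c("s1", "s1", "s1", "s1", "s2", "s2"),
  cond     = c("A",  "A",  "A",  "A",  "B",  "B"),
  block    = c(1, 1, 1, 2, 1, 2),
  location = c("obj-A", "obj-A", "obj-B", "obj-A", "obj-A", "obj-A"),
  dwell    = c(210, 190, 180, 250, 300, 190),
  stringsAsFactors = FALSE
)
fix$n <- 1  # one fixation per row, so summing n gives a fixation count

# Aggregate observed cells: fixation count and total dwell time per cell.
obs <- aggregate(cbind(n, dwell) ~ subj + cond + block + location,
                 data = fix, FUN = sum)

# Build the full grid. cond is between-Ss, so each subject keeps its own
# condition and is crossed only with the within-Ss cells (block x location).
subjects <- unique(fix[c("subj", "cond")])
cells <- expand.grid(block = unique(fix$block),
                     location = unique(fix$location),
                     stringsAsFactors = FALSE)
grid <- merge(subjects, cells, by = NULL)  # by = NULL gives a cross join

# Left-join the observed cells onto the grid; unobserved cells come back NA.
full <- merge(grid, obs, all.x = TRUE)

# For counts and totals, "never looked" really is zero.
full$n[is.na(full$n)] <- 0
full$dwell[is.na(full$dwell)] <- 0
print(full)
```

One caveat on the design: zero is a sensible fill for counts and total dwell time, but a mean dwell time for a cell with no fixations is undefined, so that measure would presumably stay NA rather than become 0.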

As always, your thoughts, comments, and code would be appreciated.

Thanks,

Wayne Gray


