[R] Getting number of students with zeroes in long format

Douglas Bates bates at stat.wisc.edu
Wed Apr 6 23:03:24 CEST 2011


On Wed, Apr 6, 2011 at 3:44 PM, Christopher Desjardins
<cddesjardins at gmail.com> wrote:
> Hi,
> I have longitudinal school suspension data on students. I would like to
> figure out how many students (id_r) have no suspensions (sus), i.e. have a
> code of '0'. My data is in long format and the first 20 records look like
> the following:
>
>> suslm[1:20,c(1,7)]
>   id_r sus
>   11   0
>   15  10
>   16   0
>   18   0
>   19   0
>   19   0
>   20   0
>   21   0
>   21   0
>   22   0
>   24   0
>   24   0
>   25   3
>   26   0
>   26   0
>   30   0
>   30   0
>   31   0
>   32   0
>   33   0
>
> Each id_r is unique and I'd like to know the number of id_r that have a 0
> for sus not the total number of 0. Does that make sense?

You say you have longitudinal data, so may we assume that a particular
id_r can occur multiple times in the data set?  It is not clear to me
what you want the result to be for students who have no suspensions at
one time but may have a suspension at another time.  Are you
interested in the number of students who have only zeros in the sus
column?

One way to approach this task is to use tapply.  I would create a data
frame and convert id_r to a factor.

df <- within(as.data.frame(suslm), id_r <- factor(id_r))
counts <- with(df, tapply(sus, id_r, function(sus) all(sus == 0)))

The tapply function will split the vector sus according to the levels
of id_r and apply the function to the subvectors.
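
As a rough sketch of how this plays out, here is the same idea applied to
the 20 records quoted above. The data frame name toy and the result names
all_zero and n_all_zero are made up for illustration; only the id_r and sus
columns come from the original post.

```r
# Toy data: the 20 records quoted in the question (id_r and sus only)
toy <- data.frame(
  id_r = c(11, 15, 16, 18, 19, 19, 20, 21, 21, 22,
           24, 24, 25, 26, 26, 30, 30, 31, 32, 33),
  sus  = c(0, 10, 0, 0, 0, 0, 0, 0, 0, 0,
           0, 0, 3, 0, 0, 0, 0, 0, 0, 0)
)
toy <- within(toy, id_r <- factor(id_r))

# One logical value per student: TRUE when every sus for that id_r is 0
all_zero <- with(toy, tapply(sus, id_r, function(s) all(s == 0)))

# TRUE counts as 1, so the sum is the number of students with only zeros
n_all_zero <- sum(all_zero)
```

In these 20 rows there are 15 distinct students, and n_all_zero is 13,
since only id_r 15 and 25 have a nonzero sus.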

I just saw Jorge's response and he uses the same tactic, but he is
looking for students who had any value of sus == 0.
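
To make the difference between the two readings concrete, here is a small
sketch on invented data (the data frame ex and its values are hypothetical,
not taken from suslm). Student 2 has one zero and one nonzero record, so it
is counted under any() but not under all().

```r
# Hypothetical data: student 1 has only zeros, student 2 is mixed,
# student 3 has no zeros at all
ex <- data.frame(
  id_r = factor(c(1, 1, 2, 2, 3)),
  sus  = c(0, 0, 0, 3, 5)
)

# Students whose sus values are all zero
only_zeros <- with(ex, tapply(sus, id_r, function(s) all(s == 0)))

# Students with at least one zero
some_zero <- with(ex, tapply(sus, id_r, function(s) any(s == 0)))

n_only_zeros <- sum(only_zeros)  # student 1 only
n_some_zero  <- sum(some_zero)   # students 1 and 2
```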


