[R] query about counting rows of a dataframe

David Winsemius dwinsemius at comcast.net
Thu Nov 3 22:40:19 CET 2011


On Nov 3, 2011, at 12:28 PM, Stefano Sofia wrote:

> Dear R users,
> I have got the following data frame, called my_df:
>
>    gender day_birth month_birth year_birth labour
> 1       F        22          10       2001      1
> 2       M        29          10       2001      2
> 3       M         1          11       2001      1
> 4       F         3          11       2001      1
> 5       M         3          11       2001      2
> 6       F         4          11       2001      1
> 7       F         4          11       2001      2
> 8       F         5          12       2001      2
> 9       M        22          14       2001      2
> 10      F        29          13       2001      2
> ...
>
> I need to count data in different ways:
>
> 1. count the births for each day (having 0 when necessary)  
> independently from the value of the "labour" column

xtabs sometimes gives better results. If you want all 31 days, then make  
day_birth a factor with levels=1:31.

 > xtabs(  ~ day_birth + month_birth + year_birth, data=dat)
, , year_birth = 2001

          month_birth
day_birth 10 11 12 13 14
        1   0  1  0  0  0
        3   0  2  0  0  0
        4   0  2  0  0  0
        5   0  0  1  0  0
        22  1  0  0  0  1
        29  1  0  0  1  0
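
The factor suggestion above can be sketched as follows. This is a minimal illustration, not a tested solution for the full data: `dat` is rebuilt from only the ten rows shown in the question (where it is called my_df), and the column name `day_f` is an assumption introduced here.

```r
# Rebuild the ten rows shown in the question; the real data continues past row 10.
dat <- data.frame(
  gender      = c("F", "M", "M", "F", "M", "F", "F", "F", "M", "F"),
  day_birth   = c(22, 29, 1, 3, 3, 4, 4, 5, 22, 29),
  month_birth = c(10, 10, 11, 11, 11, 11, 11, 12, 14, 13),
  year_birth  = 2001,
  labour      = c(1, 2, 1, 1, 2, 1, 2, 2, 2, 2)
)

# day_f is a hypothetical helper column: day_birth as a factor with all 31 levels,
# so days without any birth are kept as rows of zeros.
dat$day_f <- factor(dat$day_birth, levels = 1:31)

tab <- xtabs(~ day_f + month_birth + year_birth, data = dat)
dim(tab)   # one row per day 1..31, one column per month value present
```

xtabs keeps unused factor levels by default, which is what produces the zero rows. The same idea should work with table() as well.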

>
> 2. count the births for each day (having 0 when necessary), divided  
> by the value of "labour" (which can have two values, 1 or 2)

I cannot figure out what is being asked here. What should be done with the  
two values? Just count them? This would give a partitioned count:

 > xtabs( labour==1 ~ day_birth + month_birth , data=dat)
          month_birth
day_birth 10 11 12 13 14
        1   0  1  0  0  0
        3   0  1  0  0  0
        4   0  1  0  0  0
        5   0  0  0  0  0
        22  1  0  0  0  0
        29  0  0  0  0  0
 > xtabs( labour==2 ~ day_birth + month_birth , data=dat)
          month_birth
day_birth 10 11 12 13 14
        1   0  0  0  0  0
        3   0  1  0  0  0
        4   0  1  0  0  0
        5   0  0  1  0  0
        22  0  0  0  0  1
        29  1  0  0  1  0
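
If "divided by the value of labour" means one count per labour value, another way to get the two tables above in a single call is to put labour on the right-hand side of the formula. This is only a sketch under that reading of the question; `dat` holds just the ten rows shown, and the name `by_labour` is introduced here for illustration.

```r
# The ten rows shown in the question (only the columns needed here).
dat <- data.frame(
  day_birth   = c(22, 29, 1, 3, 3, 4, 4, 5, 22, 29),
  month_birth = c(10, 10, 11, 11, 11, 11, 11, 12, 14, 13),
  labour      = c(1, 2, 1, 1, 2, 1, 2, 2, 2, 2)
)

# Three-way table: day x month x labour.
by_labour <- xtabs(~ day_birth + month_birth + labour, data = dat)

by_labour[, , "1"]   # slice for labour == 1
by_labour[, , "2"]   # slice for labour == 2
```

Each slice should hold the same counts as the corresponding xtabs(labour==k ~ ...) call.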


>
> 3. count the births for each day of all the years (i.e. the 22nd of  
> October of all the years present in the data frame) independently  
> from the value of "labour"

If I understand correctly:

 > xtabs(  ~ day_birth + month_birth + year_birth, data=dat)
, , year_birth = 2001

          month_birth
day_birth 10 11 12 13 14
        1   0  1  0  0  0
        3   0  2  0  0  0
        4   0  2  0  0  0
        5   0  0  1  0  0
        22  1  0  0  0  1
        29  1  0  0  1  0
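
If the request is instead for one count per calendar day pooled over all years, a possible variant is to leave year_birth out of the formula, or to collapse the three-way table afterwards. The sample shown contains only 2001, so the sketch below uses a small made-up data frame `toy`; its values are invented purely for illustration and are not from the original data.

```r
# Hypothetical two-year sample; these values are invented for illustration only.
toy <- data.frame(
  day_birth   = c(22, 22, 29, 3),
  month_birth = c(10, 10, 10, 11),
  year_birth  = c(2001, 2002, 2001, 2002)
)

# Pool over years by leaving year_birth out of the formula.
pooled <- xtabs(~ day_birth + month_birth, data = toy)

# Equivalent: build the full table, then sum over the year dimension.
full      <- xtabs(~ day_birth + month_birth + year_birth, data = toy)
collapsed <- margin.table(full, c(1, 2))
```

Here pooled["22", "10"] counts the 22nd of October across both years.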

>
> 4. count the births for each day of all the years (i.e. the 22nd of  
> October of all the years present in the data frame), divided by the  
> value of "labour"

Again confusing. Do you mean to use separate tables for labour==1 and  
labour==2? Perhaps add some context to explain what these values represent.  
Some of us are "concrete". The results of xtabs are tables and can be  
divided like matrices.
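
As a sketch of what dividing tables like matrices could look like, assuming arithmetic division is what is wanted: the labour==1 counts divided elementwise by the total counts give the share of labour==1 births per day. `dat` again holds only the ten rows shown in the question, and the names `n1`, `n` and `share1` are introduced here for illustration.

```r
# The ten rows shown in the question (only the columns needed here).
dat <- data.frame(
  day_birth   = c(22, 29, 1, 3, 3, 4, 4, 5, 22, 29),
  month_birth = c(10, 10, 11, 11, 11, 11, 11, 12, 14, 13),
  labour      = c(1, 2, 1, 1, 2, 1, 2, 2, 2, 2)
)

n1 <- xtabs(labour == 1 ~ day_birth + month_birth, data = dat)  # labour == 1 births
n  <- xtabs(~ day_birth + month_birth, data = dat)              # all births

# Elementwise division, as with matrices; cells with no births give NaN (0/0).
share1 <- n1 / n
```

Cells where no births occurred come out as NaN, so they may need separate handling.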

>
> I tried with the command
>
> table(my_df$year_birth, my_df$month_birth, my_df$day_birth)
>
> which satisfies (partially) question number 1 (I am not able to get  
> 0 for the days with no births).
>
> Is there a smart way to do that without invoking too many loops?
>
> thank you for your help


David Winsemius, MD
Heritage Laboratories
West Hartford, CT


