[R] Beginner help with retrieving frequency and transforming a matrix

Roland Rau roland.rproject at gmail.com
Fri Mar 28 16:52:14 CET 2008


Hi Sean,

Is this roughly what you are looking for? (Please note that the example
data you provided contain only one level of ID: there is no "S-4",
"S-5", ..., so the final table has a single column.)

 > DF
     ID  Cl Co  Brd    Ind A AB AB.1 frq
1  S-3 IND  A BR_F BR_F01 1  0    0 1.0
2  S-3 IND  A BR_F BR_F01 1  0    0 1.0
3  S-3 IND  A BR_F BR_F01 1  0    0 1.0
4  S-3 IND  A BR_F BR_F01 1  0    0 1.0
5  S-3 IND  A BR_F BR_F01 1  0    0 1.0
6  S-3 IND  A BR_F BR_F01 0  1    0 0.5
7  S-3 IND  A BR_F BR_F02 0  0    1 0.0
8  S-3 IND  A BR_F BR_F02 0  1    0 0.5
9  S-3 IND  A BR_F BR_F02 1  0    0 1.0
10 S-3 IND  A BR_F BR_F02 1  0    0 1.0
11 S-3 IND  A BR_F BR_F02 0  1    0 0.5
12 S-3 IND  A BR_F BR_F02 1  0    0 1.0
 > DF2 <- aggregate(x=DF$frq, by=list(ID=DF$ID, Ind=DF$Ind), FUN=mean)
 > DF2
    ID    Ind         x
1 S-3 BR_F01 0.9166667
2 S-3 BR_F02 0.6666667
 > FinalDF <- tapply(X=DF$frq, INDEX=list(Ind=DF$Ind, ID=DF$ID), FUN=mean)
 > FinalDF
         ID
Ind            S-3
   BR_F01 0.9166667
   BR_F02 0.6666667
 >
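
In case it helps, here is a minimal self-contained sketch of the same
steps that you can paste into a fresh session. Two things in it are my
assumptions, not part of your data: the "S-4" rows are invented purely so
that the wide table has a second column, and I dropped the Cl, Co and Brd
columns because they are not used. I also replaced the row-wise apply()
with a vectorized ifelse(), which I would expect to be considerably
faster on about 5 million rows, though I have not timed it on your data.

```r
# Toy data: the 12 "S-3" rows from the post plus 12 invented "S-4" rows
# (hypothetical, for illustration only).
DF <- data.frame(
  ID   = rep(c("S-3", "S-4"), each = 12),
  Ind  = rep(rep(c("BR_F01", "BR_F02"), each = 6), times = 2),
  A    = c(1,1,1,1,1,0, 0,0,1,1,0,1,  1,1,0,0,0,0, 1,1,1,1,1,1),
  AB   = c(0,0,0,0,0,1, 0,1,0,0,1,0,  0,0,1,1,0,0, 0,0,0,0,0,0),
  AB.1 = c(0,0,0,0,0,0, 1,0,0,0,0,0,  0,0,0,0,1,1, 0,0,0,0,0,0)
)

# Vectorized version of the frq rule: 1 if A, 0.5 if AB, otherwise 0.
DF$frq <- ifelse(DF$A == 1, 1, ifelse(DF$AB == 1, 0.5, 0))

# Long format: one row per (ID, Ind) combination, mean frq in column "frq".
DF2 <- aggregate(frq ~ ID + Ind, data = DF, FUN = mean)

# Wide format: Ind codes in rows, IDs in columns.
FinalDF <- tapply(DF$frq, list(Ind = DF$Ind, ID = DF$ID), mean)

print(DF2)
print(FinalDF)
```

For the S-3 rows this gives the same 0.9166667 and 0.6666667 as above.
Combinations of Ind and ID that do not occur in the data show up as NA in
the tapply() result.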

Best,
Roland


Sean MacEachern wrote:
> Hi All,
> 
> Just hoping someone can give me a hand with a problem...
> 
> I have a dataframe (DF) with about 5 million entries that looks something
> like the following:
> 
>> DF
>     ID  Cl Co  Brd    Ind A AB  AB
> 1  S-3 IND  A BR_F BR_F01 1  0   0
> 2  S-3 IND  A BR_F BR_F01 1  0   0
> 3  S-3 IND  A BR_F BR_F01 1  0   0
> 4  S-3 IND  A BR_F BR_F01 1  0   0
> 5  S-3 IND  A BR_F BR_F01 1  0   0
> 6  S-3 IND  A BR_F BR_F01 0  1   0
> 7  S-3 IND  A BR_F BR_F02 0  0   1
> 8  S-3 IND  A BR_F BR_F02 0  1   0
> 9  S-3 IND  A BR_F BR_F02 1  0   0
> 10 S-3 IND  A BR_F BR_F02 1  0   0
> 11 S-3 IND  A BR_F BR_F02 0  1   0
> 12 S-3 IND  A BR_F BR_F02 1  0   0
> 
> I am interested in retrieving the frequency of A for everything with the
> same Ind code.
> 
> I have initially created a column called 'frq' that calculates the
> individual A frequency
> 
> 
>> DF$frq=apply(DF,1,function(x) if(x[6]==1)1 else if (x[7]==1)0.5 else 0)
> 
>> DF
> 
>     ID  Cl Co  Brd    Ind A AB  AB  frq
> 1  S-3 IND  A BR_F BR_F01 1  0   0   1
> 2  S-3 IND  A BR_F BR_F01 1  0   0   1
> 3  S-3 IND  A BR_F BR_F01 1  0   0   1
> 4  S-3 IND  A BR_F BR_F01 1  0   0   1
> 5  S-3 IND  A BR_F BR_F01 1  0   0   1
> 6  S-3 IND  A BR_F BR_F01 0  1   0  0.5
> 7  S-3 IND  A BR_F BR_F02 0  0   1   0
> 8  S-3 IND  A BR_F BR_F02 0  1   0  0.5
> 9  S-3 IND  A BR_F BR_F02 1  0   0   1
> 10 S-3 IND  A BR_F BR_F02 1  0   0   1
> 11 S-3 IND  A BR_F BR_F02 0  1   0  0.5
> 12 S-3 IND  A BR_F BR_F02 1  0   0   1
> 
> I've created a new DF that contains the info I'm interested in:
> 
>> DF2 = cbind(DF[1],DF[5],DF[9])
> 
>> DF2
> 
>     ID    Ind frq
> 1  S-3 BR_F01 1 
> 2  S-3 BR_F01 1 
> ...
> ...
> ...
> 11 S-3 BR_F02 0.5
> 12 S-3 BR_F02 1 
> 
> 
> I am wondering whether there is a function I can call to calculate the
> mean frequency of A (frq) for all individuals with the same Ind code, so
> that the DF (matrix) looks something like the following? (I saw something
> in a tutorial based on t-tests that I thought would work, but no joy...)
> 
> 
>> NewDF
> 
>     ID    Ind frq
> 1  S-3 BR_F01 0.9167
> 2  S-3 BR_F02 0.6667
>  
> 
> Further, is there a way to then transform the matrix to look something
> like the following?
> 
> 
>> FinalDF
> 
> Ind       S-3  S-4  S-5.... S-1000000
> BR_F01 0.9167  0.5   1         0.6667
> BR_F02 0.6667  0.2   1         0.5
> ...
> ...
> ...
> BR_Z98   0.5    1   0.3         1
> BR_Z99    1    0.6   1         0.5
> 
> 
> 
> Thanks in advance for any help you can offer, and please let me know if
> there is any further information I can provide.
> 
> Sean
> 
> 
>> sessionInfo()
> R version 2.6.0 (2007-10-03)
> i386-apple-darwin8.10.1
> 
> locale:
> en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


