[R] Tabulating Sparse Contingency Table
Charles C. Berry
cberry at tajo.ucsd.edu
Sat Mar 29 03:17:39 CET 2008
Dear 'Born',
There was a thread on this recently, but I cannot seem to find it.
The best suggestion (IMHO) was along these lines:
aggregate( rep(1,40), as.data.frame(diag(4)[sample(1:4,40,replace=TRUE),]), sum )
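Applied to a data frame with many categorical columns, the same idea might
look like the sketch below. The data are made up for illustration (20
variables with levels 1..3, drawn from a handful of hypothetical patterns so
that the table is sparse), and the names 'patterns', 'dat' and 'y' are mine.
Only the combinations that actually occur get a row, so the full 3^20 table
is never built.

```r
# Hypothetical sparse data: 500 rows drawn from 6 random patterns of 20 variables
set.seed(1)
patterns <- matrix(sample(1:3, 6 * 20, replace = TRUE), nrow = 6)
dat <- as.data.frame(patterns[sample(1:6, 500, replace = TRUE), ])
names(dat) <- paste0("x", 1:20)

# One row per observed combination, with its count in column 'Freq'
y <- aggregate(list(Freq = rep(1, nrow(dat))), by = dat, FUN = sum)

# Sort by decreasing frequency, as in the original post
y <- y[order(y$Freq, decreasing = TRUE), ]
print(y)
```

The result has the 20 grouping columns plus 'Freq', and its number of rows is
the number of distinct combinations present in the data.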
See also
http://thread.gmane.org/gmane.comp.lang.r.general/104798/focus=104841
and if you have a really big problem and access to unix utilities you
might consider something like this:
dat <- read.table( pipe('sort file.dat | uniq -c' ) )
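A small self-contained sketch of that pipeline follows, assuming a unix-like
system with 'sort' and 'uniq' on the path. The temporary file, its contents
and the column names are hypothetical; each line of the file is one
observation.

```r
# Hypothetical input: one observation per line, fields separated by spaces
tmp <- tempfile(fileext = ".dat")
writeLines(c("1 3 1", "2 3 1", "1 3 1", "2 2 2", "1 3 1"), tmp)

# 'sort' puts identical lines next to each other;
# 'uniq -c' then prefixes each distinct line with the number of times it occurs
cmd <- paste("sort", shQuote(tmp), "| uniq -c")
counts <- read.table(pipe(cmd), col.names = c("Freq", "x1", "x2", "x3"))
print(counts)

unlink(tmp)
```

The count comes first on each output line, so 'Freq' is the first column of
the resulting data frame.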
HTH,
Chuck
p.s. the 'netiquette' of this list is to identify yourself with an
appropriate email handle or signature block.
On Fri, 28 Mar 2008, born.to.b.wyld at gmail.com wrote:
> I have a sparse contingency table (most cells are 0):
>
>> xtabs(~.,data[,idx:(idx+4)])
> , , x3 = 1, x4 = 1, x5 = 1
>
>    x2
> x1    1   2   3
>   1   0   0  31
>   2   0   0 112
>   3   0   0  94
>
> , , x3 = 2, x4 = 1, x5 = 1
>
>    x2
> x1    1   2   3
>   1   0   0   0
>   2   0   0   0
>   3   0   0   0
>
> , , x3 = 3, x4 = 1, x5 = 1
>
>    x2
> x1    1   2   3
>   1   0   0   0
>   2   0   0   0
>   3   0   0   0
>
> , , x3 = 1, x4 = 2, x5 = 1
>
>    x2
> x1    1   2   3
>   1   0   0   0
>   2   0   0   0
>   3   0   0   0
>
> , , x3 = 2, x4 = 2, x5 = 1
>
>    x2
> x1    1   2   3
>   1   0   0   0
>   2   0  18   0
>   3   0  27   0
>
> , , x3 = 3, x4 = 2, x5 = 1
>
>    x2
> x1    1   2   3
>   1   0   0   0
>   2   0   0   0
>   3   0   0   0
>
> , , x3 = 1, x4 = 3, x5 = 1
>
>    x2
> x1    1   2   3
>   1   0   0   0
>   2   0   0   0
>   3   0   0   0
>
> , , x3 = 2, x4 = 3, x5 = 1
>
>    x2
> x1    1   2   3
>   1   0   0   0
>   2   0   0   0
>   3   0   0   0
>
> , , x3 = 3, x4 = 3, x5 = 1
>
>    x2
> x1    1   2   3
>   1   0   0   0
>   2   1   0   0
>   3   2   0   0
>
> , , x3 = 1, x4 = 1, x5 = 2
>
>    x2
> x1    1   2   3
>   1   0   0 142
>   2   0   0 340
>   3   0   0   1
>
> , , x3 = 2, x4 = 1, x5 = 2
>
>    x2
> x1    1   2   3
>   1   0   0   0
>   2   0   0   0
>   3   0   0   0
>
> , , x3 = 3, x4 = 1, x5 = 2
>
>    x2
> x1    1   2   3
>   1   0   0   0
>   2   0   0   0
>   3   0   0   0
>
> , , x3 = 1, x4 = 2, x5 = 2
>
>    x2
> x1    1   2   3
>   1   0   0   0
>   2   0   0   0
>   3   0   0   0
>
> , , x3 = 2, x4 = 2, x5 = 2
>
>    x2
> x1    1   2   3
>   1   0   4   0
>   2   0  41   0
>   3   0   0   0
>
> , , x3 = 3, x4 = 2, x5 = 2
>
>    x2
> x1    1   2   3
>   1   0   0   0
>   2   0   0   0
>   3   0   0   0
>
> , , x3 = 1, x4 = 3, x5 = 2
>
>    x2
> x1    1   2   3
>   1   0   0   0
>   2   0   0   0
>   3   0   0   0
>
> , , x3 = 2, x4 = 3, x5 = 2
>
>    x2
> x1    1   2   3
>   1   0   0   0
>   2   0   0   0
>   3   0   0   0
>
> , , x3 = 3, x4 = 3, x5 = 2
>
>    x2
> x1    1   2   3
>   1   0   0   0
>   2   0   0   0
>   3   0   0   0
>
> , , x3 = 1, x4 = 1, x5 = 3
>
>    x2
> x1    1   2   3
>   1   0   0 173
>   2   0   0   4
>   3   0   0   0
>
> , , x3 = 2, x4 = 1, x5 = 3
>
>    x2
> x1    1   2   3
>   1   0   0   0
>   2   0   0   0
>   3   0   0   0
>
> , , x3 = 3, x4 = 1, x5 = 3
>
>    x2
> x1    1   2   3
>   1   0   0   0
>   2   0   0   0
>   3   0   0   0
>
> , , x3 = 1, x4 = 2, x5 = 3
>
>    x2
> x1    1   2   3
>   1   0   0   0
>   2   0   0   0
>   3   0   0   0
>
> , , x3 = 2, x4 = 2, x5 = 3
>
>    x2
> x1    1   2   3
>   1   0   0   0
>   2   0   0   0
>   3   0   0   0
>
> , , x3 = 3, x4 = 2, x5 = 3
>
>    x2
> x1    1   2   3
>   1   0   0   0
>   2   0   0   0
>   3   0   0   0
>
> , , x3 = 1, x4 = 3, x5 = 3
>
>    x2
> x1    1   2   3
>   1   0   0   0
>   2   0   0   0
>   3   0   0   0
>
> , , x3 = 2, x4 = 3, x5 = 3
>
>    x2
> x1    1   2   3
>   1   0   0   0
>   2   0   0   0
>   3   0   0   0
>
> , , x3 = 3, x4 = 3, x5 = 3
>
>    x2
> x1    1   2   3
>   1   0   0   0
>   2   0   0   0
>   3   0   0   0
>
> Now, I can do the following to get the sparse representation 'y' for the
> table above:
>
>> idx<-2
>> y<-as.data.frame.table(xtabs(~.,data[,idx:(idx+4)]))
>> y<-y[y$Freq>0,]
>> z<-sort(y$Freq,decreasing=T,index.return=T)
>> y<-y[z$ix,]
>> y
>     x1 x2 x3 x4 x5 Freq
> 89   2  3  1  1  2  340
> 169  1  3  1  1  3  173
> 88   1  3  1  1  2  142
> 8    2  3  1  1  1  112
> 9    3  3  1  1  1   94
> 122  2  2  2  2  2   41
> 7    1  3  1  1  1   31
> 42   3  2  2  2  1   27
> 41   2  2  2  2  1   18
> 121  1  2  2  2  2    4
> 170  2  3  1  1  3    4
> 75   3  1  3  3  1    2
> 74   2  1  3  3  1    1
> 90   3  3  1  1  2    1
>
> I am wondering if there is an R function, or a simple R routine, that would
> help me make the data frame 'y' without using 'xtabs'. I need to study
> contingency tables of 20 (or even more) dimensions. R is unable to store a
> full 3^20 contingency table. But since the tables of interest are highly
> sparse, I figure the problem at hand could be highly simplified if I have
> something that would create a sparse representation.
>
> Any help or suggestions would be greatly appreciated.
>
> Thanks,
> A
>
Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu             UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901