[R] computing marginal values based on multiple columns?
Gerrit Eichner
Gerrit.Eichner at math.uni-giessen.de
Tue Dec 4 11:59:08 CET 2012
Hello, Simon,
see below!
On Tue, 4 Dec 2012, Simon wrote:
> Hello all,
>
> I have what feels like a simple problem, but I can't find a simple
> answer. Consider this data frame:
>
>> x <- data.frame(sample1=c(35,176,182,193,124),
> sample2=c(198,176,190,23,15), sample3=c(12,154,21,191,156),
> class=c('a','a','c','b','c'))
>
>> x
>   sample1 sample2 sample3 class
> 1      35     198      12     a
> 2     176     176     154     a
> 3     182     190      21     c
> 4     193      23     191     b
> 5     124      15     156     c
>
> Now I wish to know: for each sample, for values < 20% of the sample mean,
> what percentage of those are class a?
>
> I want to end up with a table like:
>
>   sample1 sample2 sample3
> 1     1.0       0     0.5
I can't reproduce this result from your description above, but if I
understand the latter correctly, maybe the following does what you want:
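x.wo.class <- as.matrix( subset( x, select = -class))
   # extract only the sample-columns
x.small <- sweep( x.wo.class, 2, 0.2 * colMeans( x.wo.class), "<")
   # TRUE where a value is < 20 % of its own column's mean
apply( x.small, 2, function( xx) mean( x$class[ xx] == "a"))
   # share of class "a" among those values; NaN if there are none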
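
The same computation can also be done one column at a time, which keeps each column paired with its own mean. What follows is only a minimal sketch, assuming that "values < 20% of the sample mean" means v < 0.2 * mean(v). The helper share.below and its arguments frac and target are hypothetical names introduced here for illustration, not something from your post:

```r
# Data from the original post
x <- data.frame(sample1 = c(35, 176, 182, 193, 124),
                sample2 = c(198, 176, 190, 23, 15),
                sample3 = c(12, 154, 21, 191, 156),
                class   = c("a", "a", "c", "b", "c"))

# Hypothetical helper: share of class `target` among the values of `v`
# that lie below `frac` times the mean of `v`.
share.below <- function(v, cls, frac = 0.2, target = "a") {
  small <- v < frac * mean(v)     # logical index of the "small" values
  mean(cls[small] == target)      # NaN if no value is below the cutoff
}

# Apply the helper to every column except `class`
samples <- setdiff(names(x), "class")
res <- sapply(x[samples], share.below, cls = x$class)
res
```

On the posted data this gives NaN for sample1 (no value is below 20 % of its mean, 28.4), 0 for sample2 and 0.5 for sample3. Calling it with frac = 1 compares against the plain column mean, as in your single-column expression, and then yields 0.5, 0 and 0.5.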
Hth -- Gerrit
> I can calculate this for an individual sample using this rather clumsy
> expression:
>
> length(which(x$sample1 < mean(x$sample1) & x$class=='a')) /
> length(which(x$sample1 < mean(x$sample1)))
>
> I'd normally propagate it across the data frame using apply, but I
> can't because it depends on more than one column.
>
> Any help much appreciated!
>
> Cheers,
>
> Simon
>