[R] construct boxplots from data with varying column widths
Rory Campbell-Lange
rory at campbell-lange.net
Sun Jul 17 06:47:24 CEST 2011
On 16/07/11, David Winsemius (dwinsemius at comcast.net) wrote:
> From: David Winsemius <dwinsemius at comcast.net>
> On Jul 16, 2011, at 12:15 PM, Rory Campbell-Lange wrote:
> >On 16/07/11, David Winsemius (dwinsemius at comcast.net) wrote:
> >>On Jul 16, 2011, at 11:19 AM, Rory Campbell-Lange wrote:
> >>
> >>>I'm an R beginner, and I would like to construct a set of boxplots
> >>>showing database function runtimes.
> >
> >>>I can easily reformat the base data to provide it to R in a format
> >>>such as:
> >>>
> >>>function1,12.5
> >>>function1,13.11
> >>>function1,35.2
> >
> I would have guessed you would get an error, but maybe if ave() is
> given no grouping factor it just returns a grand mean.
You are correct. My apologies for cross-posting this question both here
and on Stack Overflow.
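A quick illustration of that point, using toy vectors (the values and names below are made up, not my data). With no grouping argument, ave() applies FUN to the whole vector. With a grouping vector, it averages within each group and repeats that mean on every row of the group:

```r
# Toy runtimes and function names (hypothetical values for illustration)
runtime <- c(1, 2, 3, 4)
fname <- c("a", "a", "b", "b")

# No grouping argument: FUN (mean by default) is applied to the whole vector
grand <- ave(runtime)
print(grand)    # 2.5 2.5 2.5 2.5

# With a grouping vector: the mean is computed within each group
grouped <- ave(runtime, fname, FUN = mean)
print(grouped)  # 1.5 1.5 3.5 3.5
```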
> Try instead one of these:
>
> aggregate(data2, data2$function. , FUN=mean)
>
> tapply(data2$runtime, data2$function. , FUN=mean)
Both of the above fail; aggregate complains about 'by':
> aggregate(data2, data2$dbfunc , FUN=mean)
Error in aggregate.data.frame(data2, data2$dbfunc, FUN = mean) :
'by' must be a list
I then tried passing the function names (the levels of dbfunc) as 'by':
> funcnames <- levels(data2$dbfunc)
> aggregate(data2, funcnames, FUN=mean)
but that gives the same error.
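The error message is literal: aggregate() wants 'by' to be a list of grouping vectors, each as long as the data, not a vector of level names. A minimal sketch on a small stand-in data frame (the fn_a/fn_b names and the runtimes are hypothetical), showing three forms that should give the same per-function means:

```r
# Small stand-in for data2 (hypothetical runtimes; column names as in my data)
data2 <- data.frame(
  dbfunc  = factor(c("fn_a", "fn_a", "fn_b", "fn_b", "fn_b")),
  runtime = c(10, 20, 100, 200, 300)
)

# 'by' is a list holding one grouping vector with one entry per row
by_list <- aggregate(data2$runtime, by = list(dbfunc = data2$dbfunc), FUN = mean)

# Formula interface: mean of runtime grouped by dbfunc
by_formula <- aggregate(runtime ~ dbfunc, data = data2, FUN = mean)

# tapply returns a vector of group means named by level
by_tapply <- tapply(data2$runtime, data2$dbfunc, FUN = mean)

print(by_list)
print(by_formula)
print(by_tapply)
```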
> data2$grpmean <- ave( data2$runtime, data2$function. , FUN=mean)
>
> The last one adds a column to the data frame and could be useful for
> identifying items that are some particular distance away from their
> group mean.
At first I didn't see the purpose of adding the grpmean column, but I
think I now 'get it': it lets one filter rows by their group's mean.
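A small sketch of the "distance from the group mean" idea, on a made-up two-function data frame (the names, runtimes and the 10 ms cut-off are all hypothetical). It computes each row's distance from its own function's mean and keeps the rows well above it:

```r
# Hypothetical runtimes for two functions (not real measurements)
data2 <- data.frame(
  dbfunc  = factor(rep(c("fn_a", "fn_b"), each = 4)),
  runtime = c(10, 12, 11, 40, 200, 210, 190, 205)
)

# Mean runtime of each row's own function
data2$grpmean <- ave(data2$runtime, data2$dbfunc, FUN = mean)

# Distance of each row from its group mean
data2$dev <- data2$runtime - data2$grpmean

# Keep rows more than 10 ms above their group mean (arbitrary example cut-off)
slow <- data2[data2$dev > 10, ]
print(slow)
```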
a. build data frame
dbfunc runtime
1 fn_slot03_byperson 38.083
2 fn_slot03_byperson 32.396
3 fn_slot03_byperson 41.246
4 fn_slot03_byperson 92.904
5 fn_slot03_byperson 130.512
6 fn_slot03_byperson 113.853
b. add groupmean
data2$grpmean <- ave(data2$runtime, data2$dbfunc, FUN=mean)
dbfunc runtime grpmean
1 fn_slot03_byperson 38.083 41.8108
2 fn_slot03_byperson 32.396 41.8108
3 fn_slot03_byperson 41.246 41.8108
4 fn_slot03_byperson 92.904 41.8108
5 fn_slot03_byperson 130.512 41.8108
6 fn_slot03_byperson 113.853 41.8108
c. filter to rows where grpmean is over 150 ms
data3 <- data2[data2$grpmean > 150,]
d. attempt to plot
boxplot(runtime ~ dbfunc, data3)
this produces a set of circles for each function, rather than the
box-and-whisker plot I'm expecting.
I'm not sure how to 'fold' the data to get the equivalent of an SQL
'GROUP BY'.
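Here is a self-contained sketch of steps a to d with made-up data (function names and runtimes are hypothetical), assuming dbfunc is a factor and runtime is numeric. One addition to my steps above: after subsetting, the dbfunc factor still carries the levels of the functions that were filtered out, so droplevels() is applied before plotting. I am not certain that explains the circles, but it is worth ruling out. The 'GROUP BY' is done by the formula itself, or explicitly with split():

```r
# a. build a small data frame in the same shape as mine
#    (function names and runtimes are hypothetical)
data2 <- data.frame(
  dbfunc  = factor(rep(c("fn_fast", "fn_slow", "fn_slower"), times = c(5, 5, 5))),
  runtime = c(30, 35, 40, 45, 50,
              140, 160, 170, 180, 200,
              250, 300, 320, 340, 400)
)

# b. add each row's group mean, grouped by the dbfunc column
data2$grpmean <- ave(data2$runtime, data2$dbfunc, FUN = mean)

# c. keep only functions whose mean runtime is over 150 ms
data3 <- data2[data2$grpmean > 150, ]

# Subsetting keeps the now-unused level "fn_fast"; drop it so
# boxplot() does not reserve an empty slot for it
data3 <- droplevels(data3)

# The explicit "GROUP BY": split() returns one runtime vector per function
by_func <- split(data3$runtime, data3$dbfunc)
print(sapply(by_func, summary))

# d. one box per remaining function; the formula does the grouping,
#    and boxplot() invisibly returns the statistics it drew
bp <- boxplot(runtime ~ dbfunc, data = data3,
              ylab = "runtime (ms)",
              main = "Functions with mean runtime > 150 ms")
```

The formula runtime ~ dbfunc already splits runtime by function, so no separate 'fold' step is needed before calling boxplot().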
Thanks very much for your help, and my apologies again for cross-posting
on Stack Overflow
(http://stackoverflow.com/questions/6720036/r-summarise-data-frame-with-repeating-rows-into-boxplots)
Rory