[R] FW: Bubble plots
hadley wickham
h.wickham at gmail.com
Sat Aug 2 15:24:33 CEST 2008
On Sat, Aug 2, 2008 at 8:10 AM, Frank E Harrell Jr
<f.harrell at vanderbilt.edu> wrote:
> Cody Hamilton wrote:
>>
>> Is there a way to create a 'bubble plot' in R?
>>
>> For example, if we define the following data frame containing the level of
>> y observed for 5 patients at three time points:
>>
>> time<-c(rep('time 1',5),rep('time 2',5),rep('time 3',5))
>> y<-c('a','b','c','d','a','b','c','a','d','a','a','a','b','c','d')
>> D<-data.frame(cbind(y,time))
>>
>> I would like to display the percentage of subjects in each level of y at
>> each time point as a bubble whose size is proportional to the percentage of
>> subjects in the given level of y at the given time point. Thus, in the case
>> of the data frame above the plot would have the levels of y
>> ('a','b','c','d') on the y-axis and the levels of time ('time 1','time 2',
>> 'time 3') on the x-axis with four bubbles above each time point (e.g. the
>> size of the bubble in the bottom left corner of the plot would be
>> proportional to the percentage of patients with y='a' at time='time 1').
>>
>> I am running R 2.7.1 under Windows.
>>
>> Regards,
>> -Cody
>>
>
> The xYplot function in the Hmisc package can do that. It may be more
> elegant using ggplot2.
It's certainly possible to do it with ggplot2:
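# proportion of each level of y within each time point (each column sums to 1)
tab <- prop.table(table(D), margin = 2)
df <- as.data.frame(tab, responseName = "freq")
library(ggplot2)
# time on the x-axis, y on the y-axis, bubble size mapped to freq
qplot(time, y, data = df, size = freq)
# scale_area() makes bubble area, rather than radius, proportional to freq
qplot(time, y, data = df, size = freq) + scale_area()
# 'to' sets the range of output bubble sizes
qplot(time, y, data = df, size = freq) + scale_area(to = c(1, 5))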
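Without ggplot2, base graphics can draw a similar bubble chart with symbols(). This is a minimal sketch, not from the original thread, assuming the question's data (rebuilt here so the block runs on its own) and an arbitrary 0.3-inch cap on the largest circle. It puts time on the x-axis as the question asked and uses sqrt(freq) as the radius so that circle area, not radius, tracks the proportion:

```r
# Rebuild the example data from the question
time <- rep(c("time 1", "time 2", "time 3"), each = 5)
y <- c("a", "b", "c", "d", "a", "b", "c", "a", "d", "a",
       "a", "a", "b", "c", "d")
D <- data.frame(y, time)

# Proportion of each level of y within each time point
tab <- prop.table(table(D), margin = 2)
df <- as.data.frame(tab, responseName = "freq")

# Empty plot with one slot per time point (x) and per level of y (y)
plot(as.integer(df$time), as.integer(df$y), type = "n",
     xaxt = "n", yaxt = "n", xlab = "time", ylab = "y",
     xlim = c(0.5, 3.5), ylim = c(0.5, 4.5))
axis(1, at = seq_along(levels(df$time)), labels = levels(df$time))
axis(2, at = seq_along(levels(df$y)), labels = levels(df$y))

# Radius = sqrt(freq), so circle area is proportional to freq;
# inches = 0.3 scales the circles so the largest radius is 0.3 inches
symbols(as.integer(df$time), as.integer(df$y), circles = sqrt(df$freq),
        inches = 0.3, add = TRUE, bg = "grey")
```

Mapping the square root of the value to the radius is the usual way to keep area proportional, which is what scale_area() does in the ggplot2 version.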
But I wouldn't recommend it: you're trying to visualise an important
number (frequency) using a perceptual mapping (size) that humans
aren't very good at judging. Why not do a scatterplot of frequency vs time?
qplot(time, freq, data=df, colour = y)
There are only a few distinct values of freq in this example, so many
points land on top of each other; a little jittering helps:
qplot(time, freq, data=df, colour = y, geom="jitter")
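How much overlap there is can be checked directly. This is a minimal sketch, assuming the question's data (rebuilt so the block runs on its own) and an arbitrary noise bound of 0.01; base R's jitter() is used only to illustrate the idea of jittering, not as a description of ggplot2's internals:

```r
# Rebuild the proportions used in the plots above
time <- rep(c("time 1", "time 2", "time 3"), each = 5)
y <- c("a", "b", "c", "d", "a", "b", "c", "a", "d", "a",
       "a", "a", "b", "c", "d")
D <- data.frame(y, time)
df <- as.data.frame(prop.table(table(D), margin = 2), responseName = "freq")

# Every time point has a = 2/5 and b, c, d = 1/5 each, so only two
# distinct frequencies occur and nine of the twelve points share 0.2
print(table(df$freq))

# jitter() adds uniform random noise; 'amount' bounds it to +/- 0.01
set.seed(1)
jittered <- jitter(df$freq, amount = 0.01)
print(round(jittered, 3))
```

Because b, c and d have identical frequencies at every time point, their points (and later their lines) coincide exactly, which is why some form of jittering is needed to see them at all.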
Since you have time on the x-axis, it's common to use a line plot (first
converting time from labels like "time 1" to numbers):
df$time <- as.numeric(gsub("time ", "", df$time))
qplot(time, freq, data=df, colour = y, geom="line")
although again you have an overplotting problem, which you could solve
with jittering:
qplot(time, freq, data=df, colour = y, geom="line", position="jitter")
Hadley
--
http://had.co.nz/