[R] Scatterplot Showing All Points
Duncan Murdoch
murdoch at stats.uwo.ca
Tue Dec 18 19:05:03 CET 2007
On 12/18/2007 12:44 PM, Antony Unwin wrote:
> On 18 Dec 2007, at 4:49 pm, Duncan Murdoch wrote:
>
>>> One good alternative here is the fluctuation diagram variant of a
>>> mosaic plot:
>>> xx<-as.factor(x)
>>> yy<-as.factor(y)
>>> imosaic(xx,yy, type="f")
>>
>> That plot is better than jittering, but there's the problem in the
>> mosaic plot of understanding the scale of the rectangles: is it
>> area or diameter that encodes the count?
>
> Area is used.
>
>> With a jittered plot, you lose resolution when the number of points
>> gets too high because you just see a mess of ink, but at least you
>> only require the viewer to count in order to get a close numerical
>> reading from the plot.
>
> If someone needs a count, they should be given a table. Graphics
> are for qualitative conclusions not details. Anyway, counting will
> only work for really small datasets.
>
>> I could also claim that while imperfect, at least jittering is
>> widely applicable. For example, if the data were not on a regular
>> grid, perhaps because they had been generated like this:
>>
>> xloc <- rnorm(50)
>> yloc <- rnorm(50)
>> index <- sample(1:50, 5000, replace=TRUE, prob = abs(xloc))
>> x <- xloc[index]
>> y <- yloc[index]
>>
>> then jittering still works as well (or as poorly), but the imosaic
>> would not work at all.
>
> That's right and that's (almost) the sort of example I was thinking
> of. For a limited number of locations like this a bubble plot would
> be best (which has already been suggested in this thread, I think).
> For many locations and few replications I would still go for varying
> pointsize and transparency.
>
> Incidentally, to check your suggestion I ran your code and discovered
> that the transparency in iplot does not seem to like replications.
> Very strange, we'll have to check why. I then looked closely at the
> numbers of replications generated and discovered that case 25 was
> picked 325 times and case 40 only once. Rather too extreme for my
> liking! Running it again gave very similar results, though not
> exactly the same: this time it was 325 times for case 25 and case 40
> was not picked at all. Other numbers varied slightly. This is not
> what I expected, any ideas?
The selection probabilities are proportional to abs(xloc), which
typically varies by a factor of about 100 from smallest to largest.
Sometimes the small end is really small, so the ratio is really big
and that case is hardly ever picked.
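
A rough way to check this is to compare the expected number of picks
per case with what sample() actually returns. This is only a sketch:
the seed and the names w, ratio, expected and counts are my own
additions, not part of the code quoted above. If xloc is not
regenerated between runs, the expected counts stay the same, which
would explain why a second run looks so similar.

```r
set.seed(1)                      # assumed seed, only so the sketch is repeatable
xloc <- rnorm(50)
w <- abs(xloc)                   # unnormalised sampling weights

ratio <- max(w) / min(w)         # largest weight relative to smallest
expected <- 5000 * w / sum(w)    # expected number of picks for each case

index <- sample(1:50, 5000, replace = TRUE, prob = w)
counts <- tabulate(index, nbins = 50)   # observed picks, zeros included

ratio
range(expected)
range(counts)
```

tabulate() is used rather than table() so that cases that were never
picked still show up with a count of zero.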
Duncan Murdoch
>
>> P.S. iplots 1.1-1 may have an init problem in Windows: in my first
>> attempt, the plot made the boxes too large to fit in their cells,
>> but it fixed itself when I resized the window, and the bug doesn't
>> seem to be repeatable.
>
> Thanks. This happens occasionally on the Mac too. Refreshing solves
> it in practice, but we need to find out why it can happen (and stop
> it happening!).
>
> Antony Unwin
> Professor of Computer-Oriented Statistics and Data Analysis,
> University of Augsburg,
> Germany