[R] Graphics and LaTeX documents with the same font [double-Y-axis graphs]
Michael Friendly
friendly at yorku.ca
Sat Sep 29 19:41:40 CEST 2007
hadley wickham wrote:
> On 9/29/07, hadley wickham <h.wickham at gmail.com> wrote:
>> On 9/29/07, Michael Friendly <friendly at yorku.ca> wrote:
>>> hadley wickham wrote:
>>>> I was interested to see that you have code for drawing scatterplots
>>>> with multiple y-axes. As far as I know the only legitimate use for a
>>>> double-axis plot is to confuse or mislead the reader (and this is not
>>>> a very ethical use case). Perhaps you have a counter-example?
>>>>
>>>> Hadley
>>>>
>>> While it is true that the double-Y-axis graph is generally considered
>>> sinful, it can be used effectively to show the relation of two time
>>> series in ways that other graphs can't do as well.
>>>
>>> For one striking example,
>>> a political, presentation graphic, see:
>>> http://www.math.yorku.ca/SCS/Gallery/images/commonsenserevolution6.pdf
>>> described on my Graphical Excellence page,
>>> http://www.math.yorku.ca/SCS/Gallery/excellence.html
>>> I found it easy to excuse the sin by the 'wow effect' produced by the
>>> graph.
>> While I agree that the double y-axis plot can be used to compare two
>> time series, I'm not sure whether or not it actually is effective.
>> The appearance of the display is so critically dependent on the
>> relative scales of the axes, that it is easy to draw the wrong
>> conclusion. Why not use a scatterplot or path plot (i.e. connect
>> subsequent observations with edges) if you want to understand the
>> relationship between two variables?
>
> To compare the scatterplot vs double axis plot, I used graphclick
> (http://www.arizona-software.ch/graphclick/) to digitise the graphic,
> to get the following dataset:
>
> csr <- structure(list(year = c(1985, 1986, 1987, 1988, 1989, 1990, 1991,
> 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002,
> 2003, 2004, 2005, 2006), deaths = c(1, 1, 7, 5, 12, 3, 7, 5,
> 4, 6, 8, 19, 26, 20, 42, 41, 45, 41, 27, 52, 67, 50), income = c(NA,
> 8572, NA, NA, 9264, 10071, 10338, 10687, 10666, 10666, 9907,
> 8141, 8059, 7997, 7874, 7648, 7484, 7319, 7135, 7135, 7011, NA
> )), .Names = c("year", "deaths", "income"), row.names = c(NA,
> -22L), class = "data.frame")
>
> and produce the attached graphic (I'm not sure if the attachment will
> make it to r-help, but the code should be reproducible on any system):
>
> library(ggplot2)
> ggplot(csr, aes(x=deaths, y=income)) +
> geom_path(colour="grey80") + geom_point()
>
> # or without connecting lines
> ggplot(csr, aes(x=deaths, y=income)) + geom_point()
>
> I find this graph much easier to interpret - one can see outliers, the
> suggestion of non-linearity etc. It would also be easy to add the
> political party with colour or shape.
>
> I'm not sure if it's a good idea to include the line or not - the
> gestalt principle of connectedness makes it very difficult to
> interpret the points as separate objects even when the line connecting
> them is so faint.
>
> Hadley
>
Thanks for trying this, Hadley, because the comparison
is instructive in terms of the difference between the
communication goals of analysis and presentation graphs.
Actually, one should regard income as the independent variable and
deaths as the response, so what you want is
ggplot(csr, aes(y=deaths, x=income)) +
  geom_path(colour="grey80") + geom_point()
but, instead of/in addition to geom_path, a bolder loess smooth
would show the trend better.
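A minimal sketch of the loess suggestion, assuming the csr data posted
earlier in this thread and ggplot2's geom_smooth(). The object name p and
the black colour for the smooth are illustrative choices only, not part of
the original plot. Years with no digitised income are dropped by ggplot2
with a warning.

```r
library(ggplot2)

# Data digitised from the original graphic, as posted earlier in this thread
csr <- data.frame(
  year   = 1985:2006,
  deaths = c(1, 1, 7, 5, 12, 3, 7, 5, 4, 6, 8, 19, 26, 20, 42, 41, 45, 41,
             27, 52, 67, 50),
  income = c(NA, 8572, NA, NA, 9264, 10071, 10338, 10687, 10666, 10666, 9907,
             8141, 8059, 7997, 7874, 7648, 7484, 7319, 7135, 7135, 7011, NA)
)

# Income on x (explanatory), deaths on y (response).
# The faint path keeps the time ordering; the loess smooth carries the trend.
p <- ggplot(csr, aes(x = income, y = deaths)) +
  geom_path(colour = "grey80") +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE,
              colour = "black") +
  geom_point()

print(p)
```

Dropping geom_path() from the sum gives the "instead of" variant, with the
smooth alone over the points.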
This does indeed show the inverse, non-linear relation
between welfare income and deaths more directly, along with a few outliers.
Good for an analysis graph, but it fails the Interocular Traumatic
Test for a presentation graph: the message should hit you between
the eyes.
Even with color or shape used to represent the party in power,
the stark message of the original is lost: when the Mike
Harris Conservatives came to power in Ontario in June 1995, they slashed
welfare payments, and the number of deaths of homeless people
increased dramatically. This trend continued under the McGuinty
Liberals, elected in Oct 2003. It's particularly poignant that the
bars for deaths are made from the names of the homeless who died
(and sad to see the number of John/Jane Does among them).
To explore this further, I added a column for party to the
csr data frame, but the transitions between parties occurred
in different months, and one would need a separate data frame
to represent that precisely.
   year deaths income        party
1  1985      1     NA      Liberal
2  1986      1   8572      Liberal
3  1987      7     NA      Liberal
4  1988      5     NA      Liberal
5  1989     12   9264      Liberal
6  1990      3  10071          NDP
7  1991      7  10338          NDP
8  1992      5  10687          NDP
9  1993      4  10666          NDP
10 1994      6  10666          NDP
11 1995      8   9907 Conservative
12 1996     19   8141 Conservative
13 1997     26   8059 Conservative
14 1998     20   7997 Conservative
15 1999     42   7874 Conservative
16 2000     41   7648 Conservative
17 2001     45   7484 Conservative
18 2002     41   7319 Conservative
19 2003     27   7135      Liberal
20 2004     52   7135      Liberal
21 2005     67   7011      Liberal
22 2006     50     NA      Liberal
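For anyone who wants to reproduce the party column, here is a rough sketch
that rebuilds it from the year-by-year listing above and maps it to colour.
It assigns one party per calendar year, so it shares the simplification
already noted (the actual transitions fell in different months). The object
name p_party is only illustrative.

```r
library(ggplot2)

# Data digitised from the original graphic, as posted earlier in this thread
csr <- data.frame(
  year   = 1985:2006,
  deaths = c(1, 1, 7, 5, 12, 3, 7, 5, 4, 6, 8, 19, 26, 20, 42, 41, 45, 41,
             27, 52, 67, 50),
  income = c(NA, 8572, NA, NA, 9264, 10071, 10338, 10687, 10666, 10666, 9907,
             8141, 8059, 7997, 7874, 7648, 7484, 7319, 7135, 7135, 7011, NA)
)

# Party in power, one label per calendar year, copied from the listing:
# 1985-1989 Liberal, 1990-1994 NDP, 1995-2002 Conservative, 2003-2006 Liberal
csr$party <- rep(c("Liberal", "NDP", "Conservative", "Liberal"),
                 times = c(5, 5, 8, 4))

# Same income-vs-deaths scatterplot, with party shown by colour
p_party <- ggplot(csr, aes(x = income, y = deaths, colour = party)) +
  geom_point()

print(p_party)
```

Swapping colour = party for shape = party gives the shape-coded version.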
-Michael
--
Michael Friendly Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street http://www.math.yorku.ca/SCS/friendly.html
Toronto, ONT M3J 1P3 CANADA