multiple downloads of data when evaluating plot() vs. xyplot()

Uwe Ligges ligges at statistik.tu-dortmund.de
Sun Aug 23 20:18:31 CEST 2009

Greg Hirson wrote:
> I have noticed an interesting behavior when comparing how the base 
> plot() function deals with a data argument that downloads data from the 
> internet vs. how xyplot() in lattice performs the same task.
> The goal is to plot hourly temperature data. The data is downloaded and 
> formatted for R using the function cimishourly() in the package cimis. 
> There is a line within the function that outputs the name of the file 
> being downloaded using cat().
> When using plot() to plot the data, the following is written to the 
> console:
> library(cimis)
> plot(air_temp ~ datetime, data = cimishourly("006"))
> Downloading:  ftp://ftpcimis.water.ca.gov/pub/hourly/hourly006.csv
> Downloading:  ftp://ftpcimis.water.ca.gov/pub/hourly/hourly006.csv
> When using xyplot() to perform the same plot, the data is only 
> downloaded once:
> library(lattice)
> xyplot(air_temp ~ datetime, data = cimishourly("006"))
> Downloading:  ftp://ftpcimis.water.ca.gov/pub/hourly/hourly006.csv
> Is this caused by a difference in how the two functions evaluate the 
> data argument?

Looks like nobody answered so far:

Yes, there are several differences in how the two functions evaluate 
their arguments.
In any case, I think you should not wrap a download function inside 
another call: download the data once, before anything else, and then 
start to work on it.
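
A minimal sketch of that advice follows. It does not touch the network: 
fake_download() is a hypothetical stand-in for cimishourly() that counts 
its calls instead of fetching a file, and the column names are just 
copied from the question. The point is only that the expensive call 
runs once, however many plots follow.

```r
# Hypothetical stand-in for cimishourly(): counts calls instead of downloading.
n_calls <- 0
fake_download <- function() {
  n_calls <<- n_calls + 1
  data.frame(datetime = 1:10, air_temp = seq(10, 19))
}

# Evaluate the expensive call once and keep the result.
dat <- fake_download()

pdf(NULL)  # null graphics device, so the sketch writes no file
plot(air_temp ~ datetime, data = dat)
plot(air_temp ~ datetime, data = dat, type = "l")
invisible(dev.off())

n_calls  # 1: the "download" ran once, regardless of the number of plots
```

With the real function, the same pattern would be 
dat <- cimishourly("006") followed by plot(air_temp ~ datetime, data = dat).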

The data argument is evaluated at two positions in plot.formula:

     if (is.matrix(eval(m$data, parent.frame())))

     mf <- eval(m, parent.frame())
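
To illustrate the mechanism, here is a simplified sketch, not the actual 
plot.formula source. The function f() and the helper fake_download() are 
hypothetical; f() only mimics the two eval() calls quoted above. Because 
match.call() captures the data argument as an unevaluated expression, 
each eval() runs that expression again.

```r
# Hypothetical stand-in for a download function: counts how often it runs.
n_calls <- 0
fake_download <- function() {
  n_calls <<- n_calls + 1
  data.frame(x = 1:3, y = c(2, 4, 6))
}

# Simplified imitation of the two evaluation points (an assumption about
# the overall shape, not a copy of plot.formula).
f <- function(formula, data) {
  m <- match.call()
  # First evaluation: the unevaluated 'data' expression is run here.
  is_mat <- is.matrix(eval(m$data, parent.frame()))
  # Second evaluation: the whole call is re-dispatched to model.frame(),
  # which evaluates the 'data' expression once more.
  m[[1L]] <- quote(stats::model.frame)
  mf <- eval(m, parent.frame())
  mf
}

mf <- f(y ~ x, data = fake_download())
n_calls  # 2: the data expression was evaluated twice
```

Passing an already evaluated object, such as data = dat, makes both 
evaluations a cheap variable lookup.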

Generally this is not a big issue, but for your function it causes a 
noticeable performance penalty that can easily be avoided by downloading 
in advance.

Uwe Ligges

> Even more interesting, when adding a type = "l" argument to plot, the 
> data is downloaded 3 times.
> Thank you for your time,
> Greg

