[R] diff() for panel data

hadley wickham h.wickham at gmail.com
Sun Sep 21 03:25:40 CEST 2008


Hi Gabriel,

On Sat, Sep 20, 2008 at 7:54 PM, Gabriel Paul Mihalache
<mihalache at gmail.com> wrote:
> I was told that more details would help re: my question on first
> differences in panel data...
> The data set in question is PWT6.2:
>
>> str(pwt6.2)
> 'data.frame':   10340 obs. of  27 variables:
>  $ country: Factor w/ 188 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
>  $ isocode: Factor w/ 188 levels "AFG","ALB","DZA",..: 1 1 1 1 1 1 1 1 1 1 ...
>  $ year   : int  1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 ...
>  $ pop    : num  8150 8284 8425 8573 8728 ...
>  ...
>  $ cgdp   : num  NA NA NA NA NA NA NA NA NA NA ...
>  ...
>  $ grgdpch: num  NA NA NA NA NA NA NA NA NA NA ...
>
> The panel has countries as units and years for time.
> What I want to do is take the first difference of the logs of cgdp and pop.
>
> The reason why I can't use diff(log(pop)) is because when the data for
> a country ends, e.g. Afghanistan 2004, the next observation belongs to
> another country, e.g. Albania 1950, and R will do a first log
> difference between the population of Albania in 1950 and that of
> Afghanistan in 2004, instead of NA (since I don't have data for
> Albania 1949).
>
> I need a diff() that's aware of the panel structure of the data (i.e.
> the countries).
> The plm package does this, but only in a regression. I found no
> function that I can use to save the log differences.

This doesn't help you immediately, but in the next couple of days I'll
be releasing the plyr package, which would allow you to do something
like:

ddply(pwt6.2, .(country), transform, dpop = c(NA, diff(pop)))

This adds a new column, dpop, holding the population differences within
each country (assuming the data is already sorted by year).
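
For log differences, diff(log(pop)) can be used in place of diff(pop).
The same grouped calculation can also be sketched in base R with ave(),
which applies a function within each level of a grouping variable. The
toy data frame `panel` below is a made-up stand-in for pwt6.2, and the
column name dlpop is only illustrative:

```r
# Toy panel standing in for pwt6.2 (values are invented for illustration)
panel <- data.frame(
  country = rep(c("A", "B"), each = 3),
  year    = rep(2000:2002, times = 2),
  pop     = c(100, 110, 121, 50, 55, 66)
)

# Sort by country and year so diff() runs in time order within each unit
panel <- panel[order(panel$country, panel$year), ]

# First difference of logs within each country; the first year of each
# country gets NA because there is no earlier observation to subtract
panel$dlpop <- ave(log(panel$pop), panel$country,
                   FUN = function(x) c(NA, diff(x)))

print(panel)
```

The first row of each country comes out as NA, so no difference is taken
across the boundary between one country and the next.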

Hadley


-- 
http://had.co.nz/


