[R] subsetting data frame using by() or tapply() or other

Marc Schwartz (via MN) mschwartz at mn.rr.com
Thu Oct 13 23:04:00 CEST 2005


On Thu, 2005-10-13 at 14:28 -0600, Brian S Cade wrote:
> OK, so I see that the problem I'm having creating a new variable (LAG1DBC)
> in the example data transformation below is that tapply() returns a
> list that is not dimensionally consistent with the data frame (data).  So
> how do I go from the list output of tapply() to a dimensionally
> consistent vector that I can use as the new variable in my original data
> frame?  I've been trying to use something like
> data$LAG1DBC <- tapply(data$DBC, data$LOCID, function(x) c(NA, x[-length(x)]))
> which creates a list with one element per LOCID, far fewer elements than
> the number of rows in data. I've also tried wrapping tapply() in
> as.data.frame.array() or as.data.frame.list() and still have the same
> problem.  I know this can't be that unusual a data manipulation and that
> someone must have done something similar before.
> 
> I want to go from something like this:
> 
>        LOCID  POPULATION YEAR        DBC
> 1      algb-1           A 1992 0.70451575
> 2      algb-1           A 1993 0.59506851
> 3      algb-1           A 1997 0.84837544
> 4      algb-1           A 1998 0.50283182
> 5      algb-1           A 2000 0.91242707
> 6      algb-2           A 1992 0.09747155
> 7      algb-2           A 1993 0.84772253
> 8      algb-2           A 1997 0.43974081
> 9      algb-2           A 1998 0.83108544
> 10     algb-2           A 2000 0.22291192
> 11     algb-3           A 1992 0.44234175
> 12     algb-3           A 1993 0.54089534
> 5680 taylr-73           B 2001 0.43918082
> 5681 taylr-73           B 2002 0.34694427
> 5682 taylr-73           B 2003 3.35619190
> 5683 taylr-73           B 2004 0.71575815
> 5684 taylr-73           B 2005 0.42038506
> 5685 taylr-74           B 1992 3.88410354
> 5686 taylr-74           B 1993 3.32472557
> 5687 taylr-74           B 1994 3.29861501
> 5688 taylr-74           B 1996 0.48153827
> 5689 taylr-74           B 1997 3.63570636
> 5690 taylr-74           B 1998 1.94630194
> 
> to something like this:
> 
>        LOCID  POPULATION YEAR        DBC    LAG1DBC
>1      algb-1           A 1992 0.70451575         NA
>2      algb-1           A 1993 0.59506851 0.70451575
>3      algb-1           A 1997 0.84837544 0.59506851
>4      algb-1           A 1998 0.50283182 0.84837544
>5      algb-1           A 2000 0.91242707 0.50283182
>6      algb-2           A 1992 0.09747155         NA
>7      algb-2           A 1993 0.84772253 0.09747155
>8      algb-2           A 1997 0.43974081 0.84772253
>9      algb-2           A 1998 0.83108544 0.43974081
>10     algb-2           A 2000 0.22291192 0.83108544
>11     algb-3           A 1992 0.44234175         NA
>12     algb-3           A 1993 0.54089534 0.44234175
>5680 taylr-73           B 2001 0.43918082         NA
>5681 taylr-73           B 2002 0.34694427 0.43918082
>5682 taylr-73           B 2003 3.35619190 0.34694427
>5683 taylr-73           B 2004 0.71575815 3.35619190
>5684 taylr-73           B 2005 0.42038506 0.71575815
>5685 taylr-74           B 1992 3.88410354         NA
>5686 taylr-74           B 1993 3.32472557 3.88410354
>5687 taylr-74           B 1994 3.29861501 3.32472557
>5688 taylr-74           B 1996 0.48153827 3.29861501
>5689 taylr-74           B 1997 3.63570636 0.48153827
>5690 taylr-74           B 1998 1.94630194 3.63570636
> 
> Brian

Brian,

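Use unlist(). This assumes that data is already sorted by LOCID (in the
order of its factor levels) and by YEAR within each LOCID, because
unlist() concatenates the per-group results in level order, not in the
original row order: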

> data$LAG1DBC <- unlist(tapply(data$DBC, data$LOCID, 
                         function(x) c(NA, x[-length(x)])))

> data
        LOCID POPULATION YEAR        DBC    LAG1DBC
1      algb-1          A 1992 0.70451575         NA
2      algb-1          A 1993 0.59506851 0.70451575
3      algb-1          A 1997 0.84837544 0.59506851
4      algb-1          A 1998 0.50283182 0.84837544
5      algb-1          A 2000 0.91242707 0.50283182
6      algb-2          A 1992 0.09747155         NA
7      algb-2          A 1993 0.84772253 0.09747155
8      algb-2          A 1997 0.43974081 0.84772253
9      algb-2          A 1998 0.83108544 0.43974081
10     algb-2          A 2000 0.22291192 0.83108544
11     algb-3          A 1992 0.44234175         NA
12     algb-3          A 1993 0.54089534 0.44234175
5680 taylr-73          B 2001 0.43918082         NA
5681 taylr-73          B 2002 0.34694427 0.43918082
5682 taylr-73          B 2003 3.35619190 0.34694427
5683 taylr-73          B 2004 0.71575815 3.35619190
5684 taylr-73          B 2005 0.42038506 0.71575815
5685 taylr-74          B 1992 3.88410354         NA
5686 taylr-74          B 1993 3.32472557 3.88410354
5687 taylr-74          B 1994 3.29861501 3.32472557
5688 taylr-74          B 1996 0.48153827 3.29861501
5689 taylr-74          B 1997 3.63570636 0.48153827
5690 taylr-74          B 1998 1.94630194 3.63570636
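If the rows might not be grouped by LOCID, one alternative (a sketch, not
part of the reply above) is base R's ave(), which applies a function within
groups and returns a vector with one value per row, written back into each
row's original position. The small data frame below is hypothetical, with
made-up values. The lag still only means "previous year on record" if rows
are ordered by YEAR within each LOCID, so the sketch sorts first.

```r
# Hypothetical example data (values made up), deliberately unsorted.
data <- data.frame(
  LOCID = c("algb-2", "algb-1", "algb-1", "algb-2", "algb-1"),
  YEAR  = c(1993, 1993, 1992, 1992, 1997),
  DBC   = c(0.4, 0.2, 0.1, 0.3, 0.5)
)

# The lag must follow YEAR within each LOCID, so order by both first.
data <- data[order(data$LOCID, data$YEAR), ]

# ave() splits DBC by LOCID, applies FUN to each piece, and puts the
# results back in the same row positions, so the result has nrow(data)
# elements. FUN shifts each group down by one and starts it with NA.
data$LAG1DBC <- ave(data$DBC, data$LOCID,
                    FUN = function(x) c(NA, x[-length(x)]))

print(data)
```

With ave(), sorting is needed only so that YEAR is in order within each
LOCID. The LOCID groups themselves do not have to be contiguous or in
level order, since results are placed by position rather than concatenated.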

HTH,

Marc Schwartz