[R] Matched pairs with two data frames

David Winsemius dwinsemius at comcast.net
Fri Apr 18 15:29:16 CEST 2008


Udo <ukoenig at med.uni-marburg.de> wrote in
news:1208462659.4807ad43cea9d at webmail.med.uni-marburg.de: 

> Daniel,
> thank you!
> 
> I want to perform the simplest way of matching:
> a one-to-one exact match (by age and school):
> for every case in "treat" find ONE case (if there is one) in
> "control" . The cases in "control" that could be matched, should be
> tagged as not available or taken away (deleted) from the control
> pool (thus, the used ones are not replaced).
> 
> #treatment group
> treat <- data.frame(age=c(1,1,2,2,2,4),
>                     school=c(10,10,20,20,20,11),
>                     out1=c(9.5,2.3,3.3,4.1,5.9,4.6))
> 
> #control group
> control <- data.frame(age=c(1,1,1,1,3,2),
>                       school=c(10,10,10,10,33,20),
>                       out2=c(1.1,2,3.5,4.9,5.2,6.5))
> 
> #one-to-one exact matching algorithm ????
> 
> matched.data.frame <- ?????
> 
> In my example I matched the cases "by hand" to make things clear.
> Case 1 from "treat" was matched with case 1 from "control",
> 2 with 2 and 3 with 6. Cases 4, 5 and 6 could not be matched,
> because there is no "partner" in "control" .
> Thus my matched example data frame has 3 cases.

Is it really the case that SPSS would give the output that you describe 
without any warnings about non-uniqueness? How could they live with 
themselves after such arbitrary behavior? This link is evidence that 
SPSS may not behave as you allege.
<http://kb.iu.edu/data/afit.html>

If you really want to persist in what cannot possibly be called 
"one-to-one exact matching", and is instead "arbitrary convenience 
matching", then you need to construct a function that steps 
sequentially through "treat" and grabs the first match, perhaps with 
something like:

> matched.first <- merge(treat[1,],control, by= c("age","school"))[1,]
> matched.first
  age school out1 out2
1   1     10  9.5  1.1

... except that the "1"s would be replaced with an index variable. The 
function would then mark that control as "taken" (perhaps using all of 
the variables as identifiers) and repeat the match-and-mark step for 
each successive case, searching only among the controls for which 
taken == FALSE.
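A minimal sketch of such a function follows. It is only one way to do 
it, under a few assumptions: the name match_first_available and the 
extra column control.row are made up for illustration, the "taken" 
flag is kept as a logical vector indexed by control row rather than 
built from the variables themselves, and "first match" means the 
lowest-numbered control row that is still free. The result therefore 
depends on the row order of both data frames.

```r
# Data from the original question
treat <- data.frame(age    = c(1, 1, 2, 2, 2, 4),
                    school = c(10, 10, 20, 20, 20, 11),
                    out1   = c(9.5, 2.3, 3.3, 4.1, 5.9, 4.6))
control <- data.frame(age    = c(1, 1, 1, 1, 3, 2),
                      school = c(10, 10, 10, 10, 33, 20),
                      out2   = c(1.1, 2, 3.5, 4.9, 5.2, 6.5))

# Hypothetical helper: first-available matching without replacement
match_first_available <- function(treat, control, by = c("age", "school")) {
  taken    <- rep(FALSE, nrow(control))       # TRUE once a control is used
  ctrl_idx <- rep(NA_integer_, nrow(treat))   # matched control row per case
  for (i in seq_len(nrow(treat))) {
    # free controls that agree with treat[i, ] on every matching variable
    ok <- !taken
    for (k in by) ok <- ok & control[[k]] == treat[[k]][i]
    j <- which(ok)[1]                         # NA if no free control matches
    if (!is.na(j)) {
      taken[j]    <- TRUE
      ctrl_idx[i] <- j
    }
  }
  keep   <- !is.na(ctrl_idx)
  extra  <- setdiff(names(control), by)       # control columns to carry over
  result <- treat[keep, , drop = FALSE]
  result[extra] <- control[ctrl_idx[keep], extra, drop = FALSE]
  result$control.row <- ctrl_idx[keep]        # which control was consumed
  result
}

matched.data.frame <- match_first_available(treat, control)
matched.data.frame
```

With the example data this pairs treat rows 1, 2 and 3 with control 
rows 1, 2 and 6 and drops treat rows 4, 5 and 6, which is the by-hand 
result described in the question. Reordering "control" would give 
different out2 values for the first two pairs, which is exactly the 
arbitrariness objected to above.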

-- 
David Winsemius


