[R] [OT] propensity score implementation

Frank E Harrell Jr f.harrell at vanderbilt.edu
Sun Nov 9 03:39:50 CET 2008

Wensui Liu wrote:
> Dear All,
> My question is more a statistical question than an R question. The reason I
> am posting here is that there are lots of excellent statisticians on this
> list, who can always give me good advice.
> Per my understanding, the purpose of the propensity score is to reduce the
> bias while estimating the treatment effect and its implementation is a
> 2-stage model.
> 1) First of all, we assume that T = 1 if an individual belongs to the
> treatment group and T = 0 otherwise. We further assume that X is a covariate
> matrix to explain the assignment of treatment. Then the propensity score
> should be the probability of treatment exposure T = 1 and can be formulated
> as
> PPscore = Prob(T=1|X) = exp(A * X) / [1 + exp(A * X)] in the range between 0
> and 1.
> 2) At the second stage, let Y = 1 / 0 be a binary outcome variable and Z the
> covariate matrix to explain outcome. In order to balance the probability of
> an individual assigned to the treatment group such that Prob(Y = 1) _|_
> Prob(T = 1|X), we should model the outcome as
> Prob(Y = 1|Z) = exp(B * Z) / [1 + exp(B * Z)] weighting or matching by
> Prob(T=1|X)
> The above is just my general understanding of the propensity score. However,
> I was criticized that my understanding is wrong and was also told that the
> response variable should be Y instead of T in the propensity model at the
> 1st stage. I am very confused and would like to have the opinion of experts
> like you guys.


If the response were Y then this would not be a propensity model. 
Whoever told you that is off the mark.
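
To make the distinction concrete, here is a minimal sketch in base R, on 
simulated data, of what a propensity model looks like: the treatment 
indicator is the response and the baseline covariates are the predictors. 
The variable names (x1, x2, trt) and the simulation itself are illustrative 
assumptions, not anything from the original question.

```r
set.seed(1)
n  <- 500
x1 <- rnorm(n)                      # hypothetical baseline covariates
x2 <- rbinom(n, 1, 0.4)
# Treatment assignment depends on the baseline covariates (confounding)
trt <- rbinom(n, 1, plogis(-0.3 + 0.8 * x1 + 0.5 * x2))

# Propensity model: the response is treatment (T), not the outcome (Y)
ps_fit <- glm(trt ~ x1 + x2, family = binomial)
ps     <- fitted(ps_fit)            # estimated Prob(T = 1 | X)
range(ps)                           # fitted probabilities lie between 0 and 1
```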

Think of the propensity score as a data reduction method that allows you 
to model all known baseline variables against the treatment assignment 
in order to remove confounding bias in all of them.  Then the outcome 
model can have the logit of propensity (plus nonlinear transformations 
of it) as a covariate to account for confounding.  The outcome model 
also needs to have strong predictor variables in it to account for 
outcome heterogeneity not related to confounding.  You can also use 
matching, as you mentioned, but I prefer to adjust for propensity by 
covariate adjustment once I check the overlap of propensity in the two 
groups.
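
A rough sketch of that workflow in base R, again on simulated data, might 
look like the following. It uses glm() and a natural spline from the splines 
package for the nonlinear terms in the logit of propensity; the data, the 
variable names (age, sev, trt, y), and the choice of 3 spline degrees of 
freedom are assumptions made only for illustration.

```r
library(splines)                    # ships with R; provides ns()

set.seed(2)
n   <- 1000
age <- rnorm(n, 50, 10)             # hypothetical baseline covariates
sev <- rnorm(n)
trt <- rbinom(n, 1, plogis(-0.2 + 0.04 * (age - 50) + 0.7 * sev))
y   <- rbinom(n, 1, plogis(-1 + 0.5 * trt + 0.03 * (age - 50) + 0.9 * sev))
d   <- data.frame(y, trt, age, sev)

# Stage 1: model treatment assignment on the baseline variables
ps_fit <- glm(trt ~ age + sev, family = binomial, data = d)
d$lp   <- predict(ps_fit, type = "link")   # logit of the propensity score

# Check overlap of the propensity distributions in the two groups
print(tapply(d$lp, d$trt, quantile, probs = c(0.05, 0.5, 0.95)))
# Region of common support: largest group minimum to smallest group maximum
overlap <- c(max(tapply(d$lp, d$trt, min)), min(tapply(d$lp, d$trt, max)))

# Stage 2: outcome model with treatment, a flexible function of the logit
# propensity, and a strong outcome predictor (sev) entered directly
out_fit <- glm(y ~ trt + ns(lp, df = 3) + sev, family = binomial, data = d)
coef(summary(out_fit))["trt", ]
```

Only one of the two baseline variables is entered directly alongside the 
spline here, because the logit of propensity in this toy example is an exact 
linear combination of age and sev; entering both would make the linear part 
of the spline redundant.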


> Any insight will be appreciated.
> Have a nice weekend!
> wensui

Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University
