[R] (Meta-analysis) How to build|fake a [n]lm[e] object ?

Wed Dec 5 13:47:54 CET 2001

Dear all,

I recently had to review the current literature about a medical treatment
with two possible variants (let's call them A and B). I collected all available
prospective randomized trials of this treatment : I found four trials of the
A variant and three of the B variant, each comparing one variant to a
"suitably chosen" placebo.

Two classes of variables are of interest here :
	a) the net effect of the treatment, which is assessed by some (set of)
	   numerical values, with distributions not too far from the normal ;
	b) the side effects of the treatment, assessed by the number of occurrences
	   of (a set of) undesirable events.

The papers report :
	a) for the numerical variables : sample size, mean and SD (or SE, from which
	   the SD can be recomputed) of each group, plus some test statistic (usually
	   Student's t) ;
	b) for events : the sample size and number of events in each group, plus some
	   test statistic (usually chi-square, sometimes incorrectly used : the
	   continuity correction is often forgotten, and the exact Fisher test is
	   almost unheard of ...).
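Both conversions mentioned above are one-liners in R. The sketch below uses made-up numbers (group sizes, SEs and event counts are purely illustrative): the SD is recovered as SE times sqrt(n), and the same 2 x 2 table is tested without and with the continuity correction, and with Fisher's exact test.

```r
# Made-up summaries for two groups (illustration only)
n  <- c(24, 31)          # group sizes
se <- c(1.2, 0.9)        # published standard errors of the means

# SE of a mean = SD / sqrt(n), hence SD = SE * sqrt(n)
sd_rec <- se * sqrt(n)

# Made-up event counts for the same two groups, as a 2 x 2 table
tab <- matrix(c(3, 21, 9, 22), nrow = 2,
              dimnames = list(Event = c("yes", "no"),
                              Group = c("A", "placebo")))

p_uncorrected <- chisq.test(tab, correct = FALSE)$p.value
p_yates       <- chisq.test(tab, correct = TRUE)$p.value   # Yates-corrected
p_fisher      <- fisher.test(tab)$p.value                  # exact conditional test
c(uncorrected = p_uncorrected, yates = p_yates, fisher = p_fisher)
```

For a 2 x 2 table the corrected statistic is never larger than the uncorrected one, so forgetting the correction can only make a result look more significant.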

It made medical sense to consider the "variant" factor ancillary to the
treatment factor (that is, to *postulate* that the difference in treatment
effects between variants is much smaller than the treatment effect itself);
therefore, it is not a big problem to leave it out of the analysis. So I used the
rmeta package to assess the treatment effects. The results, as far as I can
tell, are not unreasonable.
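For the numerical variables, the pooling step can be reproduced by hand from the published summaries. This is a hand-rolled sketch, not rmeta's code: the trial numbers are invented, the effect measure is assumed to be the raw mean difference, and the between-trial variance uses the DerSimonian-Laird moment estimator.

```r
# Made-up summaries for three trials: drug (t) vs placebo (p)
m_t <- c(10.2, 11.0, 9.5); s_t <- c(4.1, 3.8, 5.0); n_t <- c(40, 120, 25)
m_p <- c( 8.9,  9.7, 9.0); s_p <- c(4.3, 4.0, 4.6); n_p <- c(38, 118, 27)

d <- m_t - m_p                    # per-trial mean difference
v <- s_t^2 / n_t + s_p^2 / n_p    # its sampling variance
w <- 1 / v                        # inverse-variance weights

# Fixed-effect pooled difference
d_fixed  <- sum(w * d) / sum(w)
se_fixed <- sqrt(1 / sum(w))

# DerSimonian-Laird moment estimate of the between-trial variance tau^2
Q    <- sum(w * (d - d_fixed)^2)
k    <- length(d)
tau2 <- max(0, (Q - (k - 1)) / (sum(w) - sum(w^2) / sum(w)))

# Random-effect pooled difference: tau^2 is added to every trial's variance
w_re      <- 1 / (v + tau2)
d_random  <- sum(w_re * d) / sum(w_re)
se_random <- sqrt(1 / sum(w_re))
c(fixed = d_fixed, random = d_random, tau2 = tau2)
```

Because tau^2 is never negative, the random-effect standard error is never smaller than the fixed-effect one.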

However, I have two problems with this approach :

A) Assessing the "variant" effect : how ?
=========================================

My main problem is that I can't formally assess the (quite possibly null)
effect of the "variant" factor (i. e. check, at least a posteriori, that the
"variant" effect is indeed much smaller than the treatment effect). In other
words, if I had had the trials' raw data, I would have used, for numerical
variables, something along the lines of :

meta.lme <- lme(Variable ~ Treatment*Variant, data=xxx, random=~1|Trial)

for a "random trial effect" (à la DerSimonian), and

meta.lm <- lm(Variable ~ Treatment*Variant + Variant/Trial, data=xxx)

for a "fixed trial effect" model, "treatment" and "variant" being of course
fixed effects of interest, and the Treatment*Variant interaction being the term
of interest for checking the homogeneity of the treatment effect between
variants. (In my case, the trials are somewhat heterogeneous, since they do not
have the same inclusion criteria; therefore the "random effect" model makes more
sense.)

However, I do *not* have the raw data. Of course, I can trivially rebuild the
"sum-of-data" and "sum-of-squares" in each "cell" of the potential
"experimental plan". But I'm not able to analyse this. I looked in old books
(some dating back to the '50s, when computers were not readily available for
biostatistics) and saw that all the algorithms used back then assumed a *balanced*
experimental plan. Some approximations were used (such as using the harmonic
means of sample sizes to compute the expectations of "between-rows",
"between-columns", "between-cells" and "within-cells" variances under the null
hypothesis), but those approximations can only be used for *mild* imbalances. In
my case, this won't do : per-group sample size varies between 10 and 244, and
there is always some imbalance between treatment groups (mainly due to
stratification effects). That's *not* "mild" ...

I tried to follow Winer's explanation of what he calls "least-squares
estimation" (which is what all modern ANOVA software, including lm and friends,
does) to see if I could build an algorithm from it ... and got lost (I'm pretty
bad at linear algebra).
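For a layout in which every observation of a cell shares the same design row, least squares needs only the per-cell summaries: weighted regression of the cell means, with the cell sizes as weights, gives the same coefficients as ordinary least squares on the raw data, and adding the pooled within-cell sum of squares, sum((n - 1) * SD^2), to its weighted residual sum of squares recovers the raw residual sum of squares. The sketch below checks this on simulated data (cell sizes and effects are made up) and computes an F test for Treatment adjusted for Trial from the cell summaries alone.

```r
set.seed(1)

# Simulated raw data standing in for an unbalanced two-factor layout
cells <- expand.grid(Treatment = c("placebo", "drug"),
                     Trial = c("T1", "T2", "T3"))
sizes <- c(10, 25, 14, 40, 7, 19)           # made-up, deliberately unbalanced
grp   <- rep(seq_len(nrow(cells)), times = sizes)
raw   <- cells[grp, ]
raw$y <- rnorm(nrow(raw), mean = 5 + (raw$Treatment == "drug"), sd = 2)

# What a paper would give us: size, mean and SD of each cell
summ      <- cells
summ$n    <- sizes
summ$mean <- as.vector(tapply(raw$y, grp, mean))
summ$sd   <- as.vector(tapply(raw$y, grp, sd))

# Least squares on the raw data ...
fit_raw  <- lm(y ~ Trial + Treatment, data = raw)
fit_raw0 <- lm(y ~ Trial, data = raw)

# ... and weighted least squares on the cell means (weights = cell sizes)
fit_cell  <- lm(mean ~ Trial + Treatment, data = summ, weights = n)
fit_cell0 <- lm(mean ~ Trial, data = summ, weights = n)

# Pooled within-cell SS: the part of the residual SS the cell-mean fit cannot see
within_ss <- sum((summ$n - 1) * summ$sd^2)
df_res    <- sum(summ$n) - length(coef(fit_cell))
mse       <- (deviance(fit_cell) + within_ss) / df_res

# F test for Treatment adjusted for Trial, from the summaries only
F_trt <- (deviance(fit_cell0) - deviance(fit_cell)) / mse
p_trt <- pf(F_trt, 1, df_res, lower.tail = FALSE)
c(F = F_trt, p = p_trt)
```

Any other comparison of nested models that are constant within cells works the same way: the within-cell term cancels in the numerator and only enters the denominator mean square.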

However, it appears that an lm object contains just the kind of data one can
extract from a pile of papers : one could build such an object with one line per
group per paper, with a "residual" computed from the published SD, a "value"
computed from the published mean and a "weight" computed from the sample size.
Given that drop, anova and related functions do not have to re-fit the model to
assess effects, one could then analyse this artificially reconstructed lm
object.
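One way to get such an object without hand-building its internals is to regenerate, for each published group, n pseudo-observations having exactly the published mean and SD, and then to call lm or lme as if they were raw data. The sketch below assumes made-up summaries, Trial nested in Variant, and the nlme package; pseudo_values() is a hypothetical helper. For Gaussian models whose fixed and random design is constant within each group, the likelihood depends on the data only through the group sizes, means and within-group sums of squares, so estimates and F tests should match what the raw data would give. Residual plots and other per-observation diagnostics, however, are meaningless on pseudo-data, and this exactness argument is specific to Gaussian models.

```r
library(nlme)

# Hypothetical published summaries: one row per group per trial (made-up numbers)
published <- data.frame(
  Trial     = rep(c("T1", "T2", "T3", "T4"), each = 2),
  Variant   = rep(c("A", "B"), each = 4),
  Treatment = rep(c("placebo", "drug"), times = 4),
  n         = c(12, 15, 40, 38, 22, 25, 60, 55),
  mean      = c(10.1, 12.3, 14.0, 15.9, 8.2, 9.4, 12.1, 13.0),
  sd        = c(3.0, 3.4, 2.8, 3.1, 2.5, 2.9, 3.3, 3.2),
  stringsAsFactors = FALSE
)

# n values with exactly the requested mean and (n - 1)-denominator SD (needs n >= 2)
pseudo_values <- function(n, mean, sd) {
  z <- as.numeric(scale(seq_len(n)))  # centred, unit SD; any non-constant vector would do
  mean + sd * z
}

pseudo <- do.call(rbind, lapply(seq_len(nrow(published)), function(i) {
  p <- published[i, ]
  data.frame(Trial = p$Trial, Variant = p$Variant, Treatment = p$Treatment,
             Variable = pseudo_values(p$n, p$mean, p$sd),
             stringsAsFactors = FALSE)
}))
for (f in c("Trial", "Variant", "Treatment")) pseudo[[f]] <- factor(pseudo[[f]])

# Fixed trial effects (trial baselines nested in variant)
fit_lm <- lm(Variable ~ Treatment * Variant + Variant/Trial, data = pseudo)
anova(fit_lm)

# Random trial intercepts
fit_lme <- lme(Variable ~ Treatment * Variant, random = ~ 1 | Trial, data = pseudo)
anova(fit_lme)
```

Because the pseudo-data are ordinary rows, the same construction accepts any number of groups per trial.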

Hence my questions :
	a) Am I totally wrong ?
	b) If not, how would you build such an object ?
	c) What cautions should be used in interpreting the results ?
	d) Would this approach work with a lme object ? with a (suitably built) nlme
	   object (in order to assess "variant" effect on event data) ?
	e) Would such an approach allow one to assess treatment effects in trials with
	   more than 2 groups (e. g. placebo vs. drug vs. surgery) ?

B) Alternatives to the odds ratio for event data ?
==================================================

The usual way to assess effects for categorical variables is to compute the
log(odds-ratio) for each study and to pool them using inverse variances as
weights (that's what meta.DSL does for the random effect model; meta.MH, for the
fixed effect model, uses Mantel-Haenszel weights instead).
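The fixed-effect, inverse-variance version of this pooling is sketched below with made-up counts for three trials, using Woolf's large-sample variance for each log(OR). A random-effect version would add an estimated between-trial variance to each var_lor before weighting.

```r
# Made-up event counts: events / group size, treated vs placebo, three trials
ev_t <- c(5, 8, 3);  n_t <- c(50, 120, 40)
ev_p <- c(2, 4, 1);  n_p <- c(48, 118, 42)

a  <- ev_t; b <- n_t - ev_t       # treated: events, non-events
c_ <- ev_p; d <- n_p - ev_p       # placebo: events, non-events

log_or  <- log((a * d) / (b * c_))
var_lor <- 1/a + 1/b + 1/c_ + 1/d  # Woolf's variance of log(OR)
w       <- 1 / var_lor             # inverse-variance weights

pooled_lor <- sum(w * log_or) / sum(w)
se_pooled  <- sqrt(1 / sum(w))
or_ci <- exp(pooled_lor + c(lower = -1, est = 0, upper = 1) * qnorm(0.975) * se_pooled)
or_ci
```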

However, in some trials, some events have a frequency of zero in both groups or
in only one of them. In the first case, one can neglect the said trial for the
assessment of the treatment effect, on the basis that it is not informative. In
the second case, however, the data cannot be used (because the OR is then either
zero or infinite, with infinite asymptotic variance). Pooling ORs therefore
dismisses these trials (see the meta.DSL source, for example ; this is also the
case in other meta-analysis packages, such as Cochrane's RevMan).
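The arithmetic behind this is easy to see with Woolf's formula. In the sketch below (hypothetical helper log_or_woolf(), made-up counts), a zero in one group gives an infinite log(OR) and an infinite variance, while zeros in both groups give 0/0 = NaN. The resulting weight of 0, multiplied by an infinite log(OR), is itself NaN, which is one reason pooling code has to drop such trials explicitly.

```r
# log(OR) and Woolf variance for a single 2 x 2 table, without any correction
log_or_woolf <- function(ev_t, n_t, ev_p, n_p) {
  a <- ev_t; b <- n_t - ev_t; c_ <- ev_p; d <- n_p - ev_p
  c(log_or = log((a * d) / (b * c_)), var = 1/a + 1/b + 1/c_ + 1/d)
}

one_zero  <- log_or_woolf(3, 40, 0, 42)  # zero events in the placebo group only
both_zero <- log_or_woolf(0, 40, 0, 42)  # zero events in both groups

# An infinite variance gives a weight of 0, but 0 * Inf is NaN in a pooled sum
w_one_zero <- 1 / one_zero[["var"]]
w_one_zero * one_zero[["log_or"]]
```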

But the asymmetry (some events in one group and none in the other) is indeed
information, and I do not feel at ease with discarding it. The best I can think
of is the ordinary test of independence (Fisher's test, in this case) on a
contingency table "summing" the individual trials' contingency tables. This
analysis confirms the results of the meta-analysis. But it does not account
for the trials' heterogeneity, which is a large part of the point of a
meta-analysis.
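The summed-table test can be set beside tests stratified by trial. R's mantelhaen.test(), given a 2 x 2 x K array and exact = TRUE, gives an exact conditional test of independence within strata, and trials with a zero in only one group still contribute to it without any correction; the Mantel-Haenszel pooled OR, computed by hand below, also includes them. The counts are made up. Both still assume a common odds ratio across trials, so they keep the trials separate without modelling heterogeneity between them.

```r
# Made-up 2 x 2 x K array: event status x group x trial
counts <- array(c(3, 37, 0, 42,
                  5, 45, 2, 48,
                  1, 29, 0, 31),
                dim = c(2, 2, 3),
                dimnames = list(Event = c("yes", "no"),
                                Group = c("treated", "placebo"),
                                Trial = c("T1", "T2", "T3")))

# Collapsing over trials, then Fisher's test (ignores the stratification)
pooled_tab <- apply(counts, c(1, 2), sum)
p_pooled   <- fisher.test(pooled_tab)$p.value

# Exact test stratified by trial
mh_exact <- mantelhaen.test(counts, exact = TRUE)

# Mantel-Haenszel common OR by hand: T1 and T3 enter the numerator
a  <- counts["yes", "treated", ]; b <- counts["no", "treated", ]
c_ <- counts["yes", "placebo", ]; d <- counts["no", "placebo", ]
n  <- a + b + c_ + d
or_mh <- sum(a * d / n) / sum(b * c_ / n)
c(p_pooled = p_pooled, p_stratified = mh_exact$p.value, or_mh = or_mh)
```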

Someone suggested adding a "small" quantity (say 1, or 0.5, as in Yates's
continuity correction) to the event counts in these groups, and seeing whether
including these studies changes the results, but I'm "instinctively" not
satisfied with this approach.
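The suggested sensitivity analysis can be sketched as follows, with made-up counts and a hypothetical helper pool_or_cc(). Note one assumption: the sketch adds the constant to all four cells of a zero-cell trial (the usual convention for this kind of correction) rather than to the event counts only; with cc = 0 those trials are simply dropped.

```r
# Inverse-variance pooled OR after adding `cc` to all four cells of any trial
# that has a zero cell; with cc = 0, such trials are dropped instead
pool_or_cc <- function(ev_t, n_t, ev_p, n_p, cc = 0.5) {
  a <- ev_t; b <- n_t - ev_t; c_ <- ev_p; d <- n_p - ev_p
  zero <- a == 0 | b == 0 | c_ == 0 | d == 0
  if (cc > 0) {
    a[zero] <- a[zero] + cc; b[zero] <- b[zero] + cc
    c_[zero] <- c_[zero] + cc; d[zero] <- d[zero] + cc
  } else {
    keep <- !zero
    a <- a[keep]; b <- b[keep]; c_ <- c_[keep]; d <- d[keep]
  }
  w <- 1 / (1/a + 1/b + 1/c_ + 1/d)          # Woolf inverse-variance weights
  exp(sum(w * log((a * d) / (b * c_))) / sum(w))
}

# Made-up data: trials 1 and 3 have no placebo events
ev_t <- c(3, 5, 1); n_t <- c(40, 50, 30)
ev_p <- c(0, 2, 0); n_p <- c(42, 50, 31)
ors <- sapply(c(0, 0.5, 1), function(k) pool_or_cc(ev_t, n_t, ev_p, n_p, cc = k))
names(ors) <- c("dropped", "cc = 0.5", "cc = 1")
ors
```

The pooled OR moves with the arbitrary constant, which is a concrete form of the unease expressed above.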

In my case, the meta-analysis exhibits an excess of some undesirable events in
one of the treatment groups, while this excess does not reach the sacrosanct
"statistical significance threshold" in any of the papers I analysed
(physicians are sometimes bloody p-value worshippers ...). Therefore, I'd like
to be damn sure to *correctly* use *all* available information.

Any suggestions or pointers to the literature ?

Sincerely yours,

						Emmanuel Charpentier

--
Emmanuel Charpentier			Tel :		+33-01 40 27 35 98
Secrétariat scientifique du CEDIT	Fax :		+33-01 40 27 55 65
Direction de la Politique Médicale // Assistance Publique - Hôpitaux de Paris
3, Avenue Victoria // F-75004 Paris /// France
