[R] Use of geometric mean .. in good data analysis
John Fox
j|ox @end|ng |rom mcm@@ter@c@
Mon Jan 22 18:36:40 CET 2024
Dear Martin,
Helpful general advice, although it's perhaps worth mentioning that the
geometric mean, defined e.g. naively as prod(x)^(1/length(x)), is
necessarily 0 if there are any 0 values in x. That is, the geometric
mean "works" in this case but isn't really informative.
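To illustrate, a minimal sketch using the naive definition above; the helper name geo_mean_naive and the example vectors are mine, chosen only for illustration:

```r
# Naive geometric mean, exactly as defined in the text:
# prod(x)^(1/length(x)). The name geo_mean_naive is hypothetical.
geo_mean_naive <- function(x) prod(x)^(1 / length(x))

x_pos  <- c(1, 10, 100)   # strictly positive values
x_zero <- c(0, 10, 100)   # the same data with one value replaced by 0

gm_pos  <- geo_mean_naive(x_pos)   # cube root of 1000
gm_zero <- geo_mean_naive(x_zero)  # product is 0, so the result is 0

print(gm_pos)
print(gm_zero)
```

A single zero forces the result to 0 whatever the other values are, which is the sense in which the result is not informative.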
Best,
John
--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://www.john-fox.ca/
On 2024-01-22 12:18 p.m., Martin Maechler wrote:
>>>>>> Rich Shepard
>>>>>> on Mon, 22 Jan 2024 07:45:31 -0800 (PST) writes:
>
> > A statistical question, not specific to R. I'm asking for
> > a pointer for a source of definitive descriptions of what
> > types of data are best summarized by the arithmetic,
> > geometric, and harmonic means.
>
> Despite this being off-topic:
>
> I think it is a good question, not really only about
> geo-chemistry, but about statistics in applied sciences (and
> engineering for that matter).
>
> Something I am sure good applied statisticians of the 1980s and
> 1990s would all have known the answer to:
>
> Using the geometric mean instead of the arithmetic mean
> is basically *equivalent* to first log-transforming the data
> and then working with the transformed data:
> not just for computing an average, but for more relevant modelling,
> inference, etc.
>
> John W Tukey (and several other of the grands of the time)
> had the log transform among the "First aid transformations":
>
> If the data for a continuous variable must all be positive it is
> also typically the case that the distribution is considerably
> skewed to the right.
> In such a case behave as a good human who sees another human in
> health distress: apply First Aid -- do the things you learned to
> do quickly without too much thought, because things must happen
> fast ---to hopefully save the other's life.
>
> Here: Do log transform all such variables without further ado,
> and only afterwards start your (exploratory and more) data analysis.
>
> Now, mean(log(y)) = log(geometricmean(y)),
> where mean() is the arithmetic mean as in R
> {mathematically; on the computer you need all.equal(), not '==' !!}
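[Illustrative sketch of the identity above. geometricmean() is not a base R function; it is defined here as a hypothetical helper using the product form, and the simulated log-normal data and the seed are assumptions made only for this example.]

```r
# Hypothetical helper: geometric mean in its product form.
geometricmean <- function(y) prod(y)^(1 / length(y))

set.seed(1)                               # arbitrary seed, for reproducibility
y <- rlnorm(50, meanlog = 1, sdlog = 1)   # simulated positive, right-skewed data

lhs <- mean(log(y))           # arithmetic mean on the log scale
rhs <- log(geometricmean(y))  # log of the geometric mean

# The two agree up to floating-point error, so compare with a
# tolerance via all.equal() rather than with '=='.
same <- isTRUE(all.equal(lhs, rhs))
print(same)
```

For long vectors or very large values prod(y) can overflow or underflow, so exp(mean(log(y))) is usually the safer way to compute the geometric mean in practice.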
>
> I.e., according to Tukey and all the other experienced applied
> statisticians of the past, the geometric mean is the "best thing"
> to do for such positive right-skewed data in the same sense
> that the log-transform is the best "a priori" transformation for
> such data -- with the one added advantage that you need to fiddle
> with zeroes when log-transforming, whereas the geometric mean
> already works for zeroes.
>
> Martin
>
>
> > As an aquatic ecologist I see regulators apply the
> > geometric mean to geochemical concentrations rather than
> > using the arithmetic mean. I want to know whether the
> > geometric mean of a set of chemical concentrations (e.g.,
> > in mg/L) is an appropriate representation of the expected
> > value. If not, I want to explain this to non-technical
> > decision-makers; if so, I want to understand why my
> > assumption is wrong.
>
> > TIA,
>
> > Rich
>
> > ______________________________________________
> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and
> > more, see https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html and provide
> > commented, minimal, self-contained, reproducible code.
>