[R] OT: A philosophical question about statistics
Sorkin, John
j@ork|n @end|ng |rom @om@um@ry|@nd@edu
Mon May 5 23:33:23 CEST 2025
Chris,
In all likelihood, even if computers had been available in the early years, "traditional" statistics would still have been invented, but it would be less fully developed than it currently is.
While resampling, simulations, etc. can give answers, they have at least two drawbacks. First, compared to "traditional" methods, the newer methods can be very time consuming. Second, compared to "traditional" methods, they can be very resource intensive (i.e., requiring a lot of electrical power). Traditional methods, by taking advantage of a few assumptions, can often answer a question faster and with fewer resources than the newer methods.
A modern statistician would do well to learn both "traditional" and newer methods, and use the best method given the question and resources at hand.
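As a rough illustration of the speed trade-off, here is a minimal Python sketch using only the standard library. The simulated data, the sample size of 200, the 2000 bootstrap replicates, and the function names normal_ci, bootstrap_ci and timed are all assumptions chosen for the example. With a sample this large, a Normal (z) interval stands in for the t interval.

```python
import random
import statistics
import time


def normal_ci(sample, level=0.95):
    """Formula-based interval for the mean: mean +/- z * s / sqrt(n)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / n ** 0.5
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * se, mean + z * se


def bootstrap_ci(sample, level=0.95, reps=2000, seed=1):
    """Percentile bootstrap interval for the mean."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(reps)
    )
    alpha = (1 - level) / 2
    lower = means[int(alpha * reps)]
    upper = means[int((1 - alpha) * reps) - 1]
    return lower, upper


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Simulated data (an assumption for the example): 200 draws from Normal(10, 2).
data_rng = random.Random(42)
data = [data_rng.gauss(10.0, 2.0) for _ in range(200)]

formula_ci, formula_secs = timed(normal_ci, data)
boot_ci, boot_secs = timed(bootstrap_ci, data)

print(f"formula   CI: ({formula_ci[0]:.3f}, {formula_ci[1]:.3f}) in {formula_secs:.6f} s")
print(f"bootstrap CI: ({boot_ci[0]:.3f}, {boot_ci[1]:.3f}) in {boot_secs:.6f} s")
```

On well-behaved data like this the two intervals should land close together, while the bootstrap recomputes the mean once per replicate and so does far more work than the single pass the formula needs.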
John
John David Sorkin M.D., Ph.D.
Professor of Medicine, University of Maryland School of Medicine;
Associate Director for Biostatistics and Informatics, Baltimore VA Medical Center Geriatrics Research, Education, and Clinical Center;
PI Biostatistics and Informatics Core, University of Maryland School of Medicine Claude D. Pepper Older Americans Independence Center;
Senior Statistician University of Maryland Center for Vascular Research;
Division of Gerontology and Palliative Care,
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
Cell phone 443-418-5382
________________________________________
From: R-help on behalf of Chris Ryan
Sent: Monday, May 5, 2025 5:02 PM
To: 'R-help email list'
Subject: Re: [R] OT: A philosophical question about statistics
I've often wondered how the field of statistics, and statistical
education, would have evolved if modern-day computers and software and
programming were available in the early years. Would the "traditional"
methods, requiring simplifying assumptions, have been developed at all?
--Chris Ryan
avi.e.gross using gmail.com wrote:
> A brief answer to this OT question is that many disciplines do the same
> thing and teach multiple methods, including some that are historical and are
> no longer really used.
>
> But since you say this was an intro course, teaching only one method would
> not prepare you well if later courses and the real world expose you to the
> other methods, for example when you are asked to maintain or extend
> applications already in use that rely on one method, the other, or a
> combination.
>
> As others have noted, this is not really a case of either/or. It is both.
> Would you make US students choose between knowing the metric system and the
> one more commonly used now? I see many things labeled with both kinds of
> measures, including car speedometers.
>
>
> -----Original Message-----
> From: R-help <r-help-bounces using r-project.org> On Behalf Of Bert Gunter
> Sent: Monday, May 5, 2025 3:09 PM
> To: Ebert,Timothy Aaron <tebert using ufl.edu>
> Cc: R-help email list <r-help using r-project.org>; Kevin Zembower
> <kevin using zembower.org>
> Subject: Re: [R] OT: A philosophical question about statistics
>
> Heh. I suspect you'll get some interesting responses, but I won't try to
> answer your questions. Instead, I'll just say:
>
> (All just imo, so caveat emptor)
>
> 1. What you have been taught is mostly useless for addressing "real"
> statistical issues;
>
> 2. Most of my 40 or so years of statistical practice involved trying to
> define the questions of interest and determining whether there existed or
> how to best obtain relevant data to answer those questions. Once/if that
> was done, how to obtain answers from the data was usually straightforward.
>
> Cheers,
>
> Bert
> "An educated person is one who can entertain new ideas, entertain others,
> and entertain herself."
>
>
> On Mon, May 5, 2025, 18:12 Ebert,Timothy Aaron <tebert using ufl.edu> wrote:
>
>> (adding slightly to Gregg's answer)
>> Why do professionals use both? Computer intensive methods (bootstrap,
>> randomization, jackknife) are data hungry. They do not work well if I have
>> a sample size of 4. One could argue that the traditional methods also have
>> trouble, but one could also think of the traditional approach as assuming
>> unobserved values. Assuming that the true distribution is represented by my
>> 4 observations then ...
>> Computer intensive approaches were not readily available until fast
>> computers became widely accessible. There is a large body of
>> information and long experience with the traditional methods in all
>> scientific disciplines. If you are unfamiliar with these approaches, then
>> you may not fully understand that key paper published 30 years ago.
>> We like to think we have "the answer" but there are times where the
>> answer we get depends on how we ask the question. The different tests ask
>> the same question in different ways. Does the answer for your data change
>> depending on what approach is used? If so, then what assumption or which
>> test is problematic and why?
>>
>> Tim
>>
>>
>> -----Original Message-----
>> From: R-help <r-help-bounces using r-project.org> On Behalf Of Gregg Powell via
>> R-help
>> Sent: Monday, May 5, 2025 12:06 PM
>> To: Kevin Zembower <kevin using zembower.org>
>> Cc: R-help email list <r-help using r-project.org>
>> Subject: [R] OT: A philosophical question about statistics
>>
>> Hi Kevin,
>> It might seem like simulation methods (bootstrapping and randomization)
>> and traditional formulas (Normal or t-distributions) are just two ways to
>> do the same job. So why learn both? Each approach has its own strengths,
>> and statisticians use both in practice.
>>
>> Why do professionals use both?
>> Each method offers something the other can't. In practice, both
>> simulation-based and theoretical techniques have unique strengths and
>> weaknesses, and the better choice depends on the problem and its
>> assumptions (see biopharmaservices.com). Simulation methods are
>> very flexible. They don't need strict formulas and still work even if
>> classical conditions (like "data must be Normal") aren't true. Theoretical
>> methods are quicker and widely understood. When their assumptions hold,
>> they give fast, exact results (a simple formula can yield a confidence
>> interval; again, see biopharmaservices.com).
>>
>> Advantages of each approach
>> * Simulation-based methods: Intuitive and flexible. They require fewer
>> assumptions, so they work well even for odd datasets.
>> * Theoretical methods: Quick to calculate and convenient. Based on
>> well-known formulas and widely trusted (when standard assumptions hold).
>>
>> Why learn both?
>> Knowing both makes you versatile. Simulations give you a feel for what's
>> happening behind the scenes, while theory provides quick shortcuts and
>> deeper insight. A statistician might use a t-test formula for a simple case
>> but switch to bootstrapping for a complex one. Each method can cross-check
>> the other. Mastering both approaches gives you confidence in your results.
>>
>> Will future students learn both?
>> Probably yes. Computers now make simulation methods easy to use, so
>> they're more common in teaching. Meanwhile, classic Normal and t methods
>> aren't going away - they're fundamental and still useful. Future students
>> will continue to learn both, getting the best of both worlds.
>>
>> Good luck in your studies!
>> gregg
>>
>>
>>
>> On Monday, May 5th, 2025 at 8:17 AM, Kevin Zembower via R-help <
>> r-help using r-project.org> wrote:
>>
>>>
>>>
>>> I marked this posting as Off Topic because it doesn't specifically
>>> apply to R and Statistics, but is rather a general question about
>>> statistics and the teaching of statistics. If this is annoying to you,
>>> I apologize.
>>>
>>> As I wrap up my work in my beginning statistics course, I'd like to
>>> ask a philosophical question regarding statistics.
>>>
>>> In my course, we've learned two different ways to solve statistical
>>> problems: simulations, using bootstraps and randomized distributions,
>>> and theoretical methods, using Normal (z) and t-distributions. We've
>>> learned that both systems solve all the questions we've asked of them,
>>> and that both give comparable answers. Out of six chapters that we've
>>> studied in our textbook, the first four only used simulation methods.
>>> Only the last two used theoretical methods.
>>>
>>> My questions are:
>>>
>>> 1) Why don't professional statisticians settle on one or the other,
>>> and just apply that system to their problems and work? What advantage
>>> does one system have over the other?
>>>
>>> 2) As beginning statistics students, why is it important for us to
>>> learn both systems? Do you think that beginning statistics students
>>> will still be learning both systems in the future?
>>>
>>> Thank you very much for your time and effort in answering my questions.
>>> I really appreciate the thoughts of the members of this group.
>>>
>>> -Kevin
>>>
>>>
>>>
>>> ______________________________________________
>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> https://www.r-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
______________________________________________
R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide https://www.r-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.