[R] OT: A philosophical question about statistics

Mon May 5 19:12:39 CEST 2025

(adding slightly to Gregg's answer)
Why do professionals use both? Computer-intensive methods (bootstrap, randomization, jackknife) are data hungry. They do not work well if I have a sample size of 4. One could argue that the traditional methods also have trouble with so few observations, but one could also think of the traditional approach as making an assumption about the values I did not observe. Assuming that the true distribution is represented by my 4 observations then ...
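To make the sample-size point concrete, here is a minimal sketch in Python (standard library only). The four data values are invented for illustration and are not from this thread. It enumerates every possible bootstrap resample of a sample of size 4 and counts how many distinct resample means can occur.

```python
from itertools import combinations_with_replacement
from statistics import mean

# Hypothetical sample of size 4 (values invented for illustration).
sample = [2.1, 3.4, 5.0, 8.7]
n = len(sample)

# A bootstrap resample draws n values with replacement. Order does not
# affect the mean, so each distinct resample is a multiset of size n.
resamples = list(combinations_with_replacement(sample, n))

# Round so floating-point noise is not counted as a separate mean.
distinct_means = sorted({round(mean(r), 10) for r in resamples})

print("ordered resamples:", n ** n)
print("distinct resamples:", len(resamples))
print("distinct means:", len(distinct_means))
print("range of means:", distinct_means[0], "to", distinct_means[-1])
```

With 4 observations there are only 35 distinct resamples, so the bootstrap distribution of the mean is coarse and can never extend beyond the smallest and largest observed value.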
   Computer-intensive approaches were not readily available until fast computers became widely available. There is a large body of information and long experience with the traditional methods in all scientific disciplines. If you are unfamiliar with the traditional methods, then you may not fully understand that key paper published 30 years ago.
   We like to think we have "the answer", but there are times when the answer we get depends on how we ask the question. The different tests ask the same question in different ways. Does the answer for your data change depending on which approach is used? If so, then which assumption or which test is problematic, and why?
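One way to check whether the answer changes with the approach is to run both on the same data. The sketch below is a hedged illustration in Python (standard library only): the two groups are invented values, and the theoretical test uses a Normal (z) approximation because the standard library has no t-distribution. With only 8 values per group a t-test would be the more usual choice, so treat the z-based p-value as an assumption of this sketch.

```python
import random
from statistics import NormalDist, mean, variance

# Hypothetical two-group data (values invented for illustration).
group_a = [12.1, 14.3, 11.8, 15.2, 13.7, 12.9, 14.8, 13.1]
group_b = [10.4, 12.2, 11.1, 13.0, 10.9, 11.7, 12.5, 11.3]
n_a, n_b = len(group_a), len(group_b)

observed = mean(group_a) - mean(group_b)

# Theoretical approach: two-sample z-test (Normal approximation).
se = (variance(group_a) / n_a + variance(group_b) / n_b) ** 0.5
z = observed / se
p_normal = 2 * (1 - NormalDist().cdf(abs(z)))

# Simulation approach: randomization test that shuffles group labels.
rng = random.Random(42)  # fixed seed so the run is repeatable
pooled = group_a + group_b
n_shuffles = 5_000
extreme = 0
for _ in range(n_shuffles):
    rng.shuffle(pooled)
    diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b
    if abs(diff) >= abs(observed):
        extreme += 1
# Add 1 to numerator and denominator so the p-value is never exactly 0.
p_random = (extreme + 1) / (n_shuffles + 1)

print(f"observed difference: {observed:.3f}")
print(f"Normal-approximation p-value: {p_normal:.4f}")
print(f"randomization p-value: {p_random:.4f}")
```

If the two p-values lead to different conclusions, that is a prompt to ask which assumption is carrying the result.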

Tim

-----Original Message-----
From: R-help <r-help-bounces using r-project.org> On Behalf Of Gregg Powell via R-help
Sent: Monday, May 5, 2025 12:06 PM
To: Kevin Zembower <kevin using zembower.org>
Cc: R-help email list <r-help using r-project.org>
Subject: [R] OT: A philosophical question about statistics

Hi Kevin,
It might seem like simulation methods (bootstrapping and randomization) and traditional formulas (Normal or t-distributions) are just two ways to do the same job. So why learn both? Each approach has its own strengths, and statisticians use both in practice.

Why do professionals use both?
Each method offers something the other can't. In practice, both simulation-based and theoretical techniques have their own strengths and weaknesses, and the better choice depends on the problem and its assumptions (see biopharmaservices.com). Simulation methods are very flexible. They don't need strict formulas and can still work even if classical conditions (like "data must be Normal") aren't true. Theoretical methods are quicker and widely understood. When their assumptions hold, they give fast, exact results: a simple formula can yield a confidence interval.

Advantages of each approach
* Simulation-based methods: Intuitive and flexible. They require fewer assumptions, so they work well even for odd datasets.
* Theoretical methods: Quick to calculate and convenient. Based on well-known formulas and widely trusted (when standard assumptions hold).

Why learn both?
Knowing both makes you versatile. Simulations give you a feel for what's happening behind the scenes, while theory provides quick shortcuts and deeper insight. A statistician might use a t-test formula for a simple case but switch to bootstrapping for a complex one. Each method can cross-check the other. Mastering both approaches gives you confidence in your results.
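As a hedged illustration of the cross-check idea, the Python sketch below (standard library only) computes a 95% confidence interval for a mean twice on the same data: once with the Normal-theory formula and once with a bootstrap percentile interval. The sample is simulated from a skewed distribution purely for illustration, and the z critical value stands in for a t value because the standard library has no t-distribution.

```python
import random
from statistics import NormalDist, mean, stdev

rng = random.Random(2025)  # fixed seed so the run is repeatable

# Hypothetical skewed sample of 40 values (simulated, not real data).
sample = [rng.expovariate(1 / 5.0) for _ in range(40)]
n = len(sample)
xbar = mean(sample)

# Theoretical interval: mean +/- z * s / sqrt(n) for 95% coverage.
z = NormalDist().inv_cdf(0.975)
half_width = z * stdev(sample) / n ** 0.5
ci_formula = (xbar - half_width, xbar + half_width)

# Bootstrap percentile interval: resample with replacement, collect means.
n_boot = 5_000
boot_means = sorted(sum(rng.choices(sample, k=n)) / n for _ in range(n_boot))
lo_idx = n_boot * 25 // 1000        # index of the 2.5th percentile
hi_idx = n_boot * 975 // 1000 - 1   # index of the 97.5th percentile
ci_boot = (boot_means[lo_idx], boot_means[hi_idx])

print(f"sample mean: {xbar:.3f}")
print(f"formula interval:   ({ci_formula[0]:.3f}, {ci_formula[1]:.3f})")
print(f"bootstrap interval: ({ci_boot[0]:.3f}, {ci_boot[1]:.3f})")
```

When the two intervals roughly agree, each lends support to the other. When they differ noticeably, for example with skewed data, the gap itself is informative.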

Will future students learn both?
Probably yes. Computers now make simulation methods easy to use, so they're more common in teaching. Meanwhile, classic Normal and t methods aren't going away - they're fundamental and still useful. Future students will continue to learn both, getting the best of both worlds.

Good luck in your studies!
gregg

On Monday, May 5th, 2025 at 8:17 AM, Kevin Zembower via R-help <r-help using r-project.org> wrote:

>
>
> I marked this posting as Off Topic because it doesn't specifically 
> apply to R and Statistics, but is rather a general question about 
> statistics and the teaching of statistics. If this is annoying to you, 
> I apologize.
>
> As I wrap up my work in my beginning statistics course, I'd like to 
> ask a philosophical question regarding statistics.
>
> In my course, we've learned two different ways to solve statistical
> problems: simulations, using bootstraps and randomized distributions, 
> and theoretical methods, using Normal (z) and t-distributions. We've 
> learned that both systems solve all the questions we've asked of them, 
> and that both give comparable answers. Out of six chapters that we've 
> studied in our textbook, the first four only used simulation methods.
> Only the last two used theoretical methods.
>
> My questions are:
>
> 1) Why don't professional statisticians settle on one or the other, 
> and just apply that system to their problems and work? What advantage 
> does one system have over the other?
>
> 2) As beginning statistics students, why is it important for us to 
> learn both systems? Do you think that beginning statistics students 
> will still be learning both systems in the future?
>
> Thank you very much for your time and effort in answering my questions.
> I really appreciate the thoughts of the members of this group.
>
> -Kevin
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> https://www.r-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.