[R] OT: A philosophical question about statistics

@vi@e@gross m@iii@g oii gm@ii@com
Tue May 6 21:53:15 CEST 2025


Actually, what I would love to see discussed here, as more on topic, is which functions and packages commonly used in R rely on the various kinds of methods you mention. Do newer packages lean one way or the other?

What are the drawbacks and advantages? As an example, if a simulation method can return different answers each time it is called, then I see anomalies if you try comparing it to the results obtained another way. I can imagine two methods that each return an answer between 0.99 and 1.01, where one may be a tad higher 90% of the time but lower the other 10%. Comparing their results to each other, or to a classical method that always returns 1.00, can be misleading as it is not always ...
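A tiny Python sketch of that effect (the data values and tolerances here are invented purely for illustration): a bootstrap-style estimate of a mean wanders slightly from run to run unless the random seed is pinned, while the classical sample mean is identical on every call.

```python
import random
import statistics

random.seed(1)  # fixing the seed makes the simulation repeatable

data = [0.98, 1.01, 1.00, 0.99, 1.02, 1.00, 0.97, 1.03]

# Classical estimate: the sample mean, the same on every call.
classical = statistics.mean(data)

def bootstrap_mean(data, reps=1000):
    """Average the means of `reps` resamples drawn with replacement."""
    means = []
    for _ in range(reps):
        resample = [random.choice(data) for _ in data]
        means.append(statistics.mean(resample))
    return statistics.mean(means)

# Close to `classical`, but it would differ slightly on each run
# if the seed above were not fixed.
sim = bootstrap_mean(data)
print(classical, sim)
```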

One final thought. Sometimes a classical approach may still be useful even if a newer twist seems to have advantages. A recent example I noticed was a discussion in the book The Da Vinci Code of the אתבש (atbash) substitution cipher. The suggestion was to write out the alephbet/alphabet (22 letters in ancient Hebrew) in order and then rewrite it immediately below in the opposite order. For example, the short alphabet ABCDEF would be shown as:

ABCDEF
FEDCBA

And you could encode or decode by finding the letter in one line and substituting the corresponding letter in the other line.

A character suggested a new trick that avoids the duplication: just fold one copy of the line in half to make:

ABC
FED

You can now find the letter you need in either line and simply replace it with the letter at the same position in the other line. This is an elegant solution, albeit one that needs a small adjustment for alphabets with an odd number of letters.

But which is easier to do on a computer? Obviously both can easily be done, and perhaps the first is easier to code even if it occupies a bit more space. Basically, you search the top line for the letter, get the index of where it is found, and read off the entry at the same index in the second line. Then again, the second method can use a simple trick with indices to get a result in less space. But some might argue that in a language like Python, an even simpler way is to create a dictionary/hash and skip any linear representation entirely, by just asking for atbash['A'] and letting it compute a hash and address in constant time no matter how large the alphabet in use gets.
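A minimal Python sketch of both approaches, using the 26-letter Latin alphabet rather than the book's 22-letter Hebrew alephbet:

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Approach 1: search the top line, read off the reversed line
# at the same index. Simple, but the search is linear.
reversed_line = ALPHABET[::-1]

def atbash_index(text):
    return "".join(reversed_line[ALPHABET.index(c)] for c in text)

# Approach 2: precompute a dictionary, so each lookup is a
# constant-time hash probe regardless of alphabet size.
atbash = {a: b for a, b in zip(ALPHABET, reversed_line)}

def atbash_dict(text):
    return "".join(atbash[c] for c in text)
```

Either way, the cipher is its own inverse: applying it twice returns the original text, which also makes it easy to test.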

The example may be contrived, but I have seen countless places where people debate which of many methods to use, and often the answer turns out to be that there are tradeoffs. In some languages, the standard sort algorithm recognizes that sorting one, two, three, and maybe four items is trivially done with a few IF statements, and it only performs whatever complex sort is needed when the number of items is larger. It runs really fast for small routine tasks, and even something like a merge sort can be faster as it speeds through the regions where it is down to sorting a few things and can skip some deeper recursive function calls.
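That small-array shortcut might be sketched like this (Python's own built-in sort, Timsort, does something in the same spirit by running insertion sort on small runs; the cutoff of 8 below is an arbitrary choice for illustration):

```python
CUTOFF = 8  # below this size, plain insertion sort tends to win

def insertion_sort(items):
    # Cheap for tiny lists: a few comparisons and shifts in place.
    for i in range(1, len(items)):
        key = items[i]
        j = i - 1
        while j >= 0 and items[j] > key:
            items[j + 1] = items[j]
            j -= 1
        items[j + 1] = key
    return items

def hybrid_sort(items):
    # Merge sort that bottoms out early on small sublists instead
    # of recursing all the way down to single elements.
    if len(items) <= CUTOFF:
        return insertion_sort(list(items))
    mid = len(items) // 2
    left = hybrid_sort(items[:mid])
    right = hybrid_sort(items[mid:])
    # Merge the two sorted halves.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]
```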

R also has some interesting twists in how it does certain calculations, and these may help guide which of the available statistical functions make sense, since some accept data structures that others do not. Sometimes you can use a matrix, or one of the many kinds of data.frame, for example.

So, I am wondering: besides base R functions, are there fairly detailed packages for statistics, perhaps a bit like the tidyverse, that some people prefer because they are well-designed and integrated ...?

-----Original Message-----
From: R-help <r-help-bounces using r-project.org> On Behalf Of Kevin Zembower via R-help
Sent: Tuesday, May 6, 2025 9:15 AM
To: r-help using r-project.org
Subject: Re: [R] OT: A philosophical question about statistics

Thank you to everyone who responded. I gained a lot of insight into
statistical methods and the nature of statistical thinking. I replied
to some people privately, to limit the traffic on this OT question.

And thank you for the patience of all who were annoyed by this off-
topic question, and who didn't write to complain. I promise to limit
off-topic questions in the future.

-Kevin

On Mon, 2025-05-05 at 15:17 +0000, Kevin Zembower wrote:
> I marked this posting as Off Topic because it doesn’t specifically
> apply to R and Statistics, but is rather a general question about
> statistics and the teaching of statistics. If this is annoying to
> you,
> I apologize.
> 
> As I wrap up my work in my beginning statistics course, I’d like to
> ask
> a philosophical question regarding statistics.
> 
> In my course, we’ve learned two different ways to solve statistical
> problems: simulations, using bootstraps and randomized distributions,
> and theoretical methods, using Normal (z) and t-distributions. We’ve
> learned that both systems solve all the questions we’ve asked of
> them,
> and that both give comparable answers. Out of six chapters that we’ve
> studied in our textbook, the first four only used simulation methods.
> Only the last two used theoretical methods.
> 
> My questions are:
> 
> 1) Why don’t professional statisticians settle on one or the other,
> and
> just apply that system to their problems and work? What advantage
> does
> one system have over the other?
> 
> 2) As beginning statistics students, why is it important for us to
> learn both systems? Do you think that beginning statistics students
> will still be learning both systems in the future?
> 
> Thank you very much for your time and effort in answering my
> questions.
> I really appreciate the thoughts of the members of this group.
> 
> -Kevin
> 
> 
> 
> 



______________________________________________
R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide https://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
