[R] Pipe operator

Greg Snow 538280 using gmail.com
Tue Jan 3 18:35:29 CET 2023


To expand a little on Christopher's answer.

The short answer is that having the different syntaxes can lead to
more readable code (when used properly).

Note that there are now 2 different (but somewhat similar) pipes
available in R (there could be more in some package(s) that I don't
know about, but will just talk about the main 2).

The %>% pipe comes from the magrittr package, but many other packages
now import that package.  But you need to load the magrittr package,
either directly or indirectly, before you can use that pipe.  The
magrittr pipe is a function call, so there is a small increase in time
and memory for using it, but it is a small fraction of a second and a
few bytes of memory, so you probably will not notice the increased
usage.

The core R language now has a built-in pipe |> which is handled by the
parser, so there are no extra function calls and you do not need to load
any extra packages (though you need a somewhat recent version of R:
4.1.0 or later).
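
A small sketch of what "handled by the parser" means in practice, assuming R 4.1.0 or later: the piped expression and the nested call should parse to the same unevaluated call, so nothing pipe-related is left to run. The variable names `piped` and `nested` are only for illustration.

```r
# quote() returns the parsed, unevaluated expression
piped  <- quote(x |> log() |> mean() |> exp())
nested <- quote(exp(mean(log(x))))

print(piped)              # shows the call the parser produced from the pipe
identical(piped, nested)  # compares the two parsed calls
```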

The built-in |> pipe is a little pickier: you need to include the
parentheses in a function call, e.g. 1:10 |> mean(), whereas the magrittr
pipe works with that call or with the function without parentheses, e.g.
1:10 %>% mean or 1:10 %>% mean().  This makes %>% a little easier to
work with anonymous functions.  If the previous return needs to be
passed to an argument other than the first, then %>% uses "." and |>
uses "_" (which needs R 4.2.0 or later and must be passed as a named
argument).
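
A rough side-by-side of the placeholder and anonymous-function differences, assuming R 4.2.0 or later for the native "_" placeholder. The magrittr part only runs if that package happens to be installed, and the object names (`fit_native`, `squares`, and so on) are made up for the example.

```r
# Native pipe: "_" must be passed as a named argument (here data = _)
fit_native <- mtcars |> lm(mpg ~ wt, data = _)

# Native pipe with an anonymous function: wrap it in parentheses and call it
squares <- 1:3 |> (\(v) v^2)()

# magrittr equivalents, run only if the package is available
if (requireNamespace("magrittr", quietly = TRUE)) {
  `%>%` <- magrittr::`%>%`
  fit_magrittr <- mtcars %>% lm(mpg ~ wt, data = .)
  squares_m <- 1:3 %>% (function(v) v^2)   # no trailing () needed
}
```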

The magrittr package has additional versions of the pipe and some
functions that wrap around common operators to make it easier to use
them with pipes, so there are still advantages to loading that package
if any of those are helpful.
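
A few of those extras, as a sketch that assumes the magrittr package is installed (the block is skipped otherwise). The data values and the names `total`, `r`, `doubled` and `v` are arbitrary.

```r
if (requireNamespace("magrittr", quietly = TRUE)) {
  library(magrittr)

  # Tee pipe: call print() for its side effect, then pass the original value on
  total <- c(1, 4, 9) %T>% print() %>% sum()

  # Exposition pipe: make the columns of a data frame visible to the next call
  r <- mtcars %$% cor(mpg, wt)

  # Alias function for an operator, so arithmetic reads as a pipeline step
  doubled <- c(1, 4, 9) %>% sqrt() %>% multiply_by(2)

  # Assignment pipe: pipe v through sqrt() and assign the result back to v
  v <- c(9, 16)
  v %<>% sqrt()
}
```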

For a simple case like your example, the pipe probably does not help
with readability much, but it helps more as we string more function
calls together.
For example, here are 3 ways to compute the geometric mean of the data
in a vector "x":

exp(mean(log(x)))

logx <- log(x)
mlx <- mean(logx)
exp(mlx)

x |>
   log() |>
   mean() |>
   exp()

These all do the same thing, but the first option is read from the
middle outward (which can be tricky) and is even more complicated if
you use additional arguments to any of the functions.
The second option reads top down, but requires creating intermediate
variables.  The last reads much like the second, but without the
extra variables.  Spreading the series of function calls across
multiple lines makes it easier to read and lets you insert a
line like `print() |>` for debugging or checking intermediate results;
single lines can also be commented out to skip that step.
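
To make the debugging idea concrete, here is a small sketch with a made-up vector `x`.  It relies on print() returning its argument (invisibly), so the pipeline continues with the same value after printing.

```r
x <- c(1, 2, 4, 8)   # example data, chosen only for illustration

gm_nested <- exp(mean(log(x)))

gm_piped <- x |>
  log() |>
  print() |>   # temporary: show the logged values, then pass them along
  mean() |>
  exp()

# The same pipeline with the debugging step commented out
gm_quiet <- x |>
  log() |>
  # print() |>
  mean() |>
  exp()
```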

I have found myself using code like the following to compute a table,
print it, and compute the proportions all in one step:

table(f, g) |>
  print() |>
  prop.table()
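
A runnable version of the same idea, with two short made-up vectors standing in for f and g:

```r
# Hypothetical grouping variables, invented for this example
f <- c("a", "a", "b", "b", "b")
g <- c("x", "y", "x", "x", "y")

props <- table(f, g) |>
  print() |>        # prints the counts and passes the table along
  prop.table()      # converts the counts to overall proportions

props
```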

The pipes also work very well with the tidyverse, or even the tidy
data ideas without those packages where we use a single function for
each change, e.g. start with a data frame, select a subset of the
columns, filter to a subset of the rows, mutate a column, join to
another data frame, then pass the final result to a modeling function
like `lm` (and then pass that result to a summary function).  This is
nicely readable when each step is its own line.
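
A sketch of that kind of pipeline using only base R functions and the built-in mtcars data, assuming R 4.2.0 or later for the "_" placeholder.  The lookup table `cyl_labels` and the column `wt_kg` are made up for the example; the tidyverse verbs (select, filter, mutate, a join) would slot into the same shape.

```r
# Hypothetical lookup table to join against
cyl_labels <- data.frame(cyl = c(4, 6), label = c("four", "six"))

fit_summary <- mtcars |>
  subset(cyl %in% c(4, 6), select = c(mpg, wt, cyl)) |>  # filter rows, select columns
  transform(wt_kg = wt * 453.592) |>                     # wt is in 1000 lb; convert to kg
  merge(cyl_labels, by = "cyl") |>                       # join to another data frame
  lm(mpg ~ wt_kg, data = _) |>                           # fit the model
  summary()                                              # summarize the fit

fit_summary
```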

On Tue, Jan 3, 2023 at 9:49 AM Sorkin, John <jsorkin using som.umaryland.edu> wrote:
>
> I am trying to understand the reason for existence of the pipe operator, %>%, and when one should use it. It is my understanding that the operator sends the file to the left of the operator to the function immediately to the right of the operator:
>
> c(1:10) %>% mean results in a value of 5.5 which is exactly the same as the result one obtains using the mean function directly, viz. mean(c(1:10)). What is the reason for having two syntactically different but semantically identical ways to call a function? Is one more efficient than the other? Does one use less memory than the other?
>
> P.S. Please forgive what might seem to be a question with an obvious answer. I am a programmer dinosaur. I have been programming for more than 50 years. When I started programming in the 1960s the only pipe one spoke about was a bong.
>
> John



-- 
Gregory (Greg) L. Snow Ph.D.
538280 using gmail.com


