[R] getting summary statistics easily with dplyr
Christopher W Ryan
cry@n @end|ng |rom b|ngh@mton@edu
Tue Nov 5 16:39:27 CET 2019
I'm trying to modernize my way of thinking, and my coding, into the
dplyr/tidyverse way of doing things.
To get basic summary statistics on a variable in a data frame, with the
output also being a data frame, I previously would do something like this,
using other packages:
library(doBy)
library(Hmisc) ## provides latex()
doBy.output <- summaryBy(mpg ~ am, data = mtcars, FUN = fivenum)
str(doBy.output) ## yes, it's a dataframe
## which I would then incorporate into my report via Sweave and latex
latex(doBy.output, file = "")
## Or this:
library(mosaic)
mosaic.output <- favstats(mpg ~ am, data = mtcars)
str(mosaic.output) ## yes, it's a dataframe
latex(mosaic.output, file = "")
## What would be the "dplyr way" of doing this? I know I could specify
## each summary statistic individually:
library(dplyr)
dplyr.output <- mtcars %>%
  group_by(am) %>%
  summarise(min = min(mpg),
            p25 = quantile(mpg, probs = 0.25),
            p50 = median(mpg),
            p75 = quantile(mpg, probs = 0.75),
            max = max(mpg))
str(dplyr.output) ## yes, it's a dataframe
latex(dplyr.output, file = "")
## Is there a way to use a single function like fivenum instead of
## specifying each desired summary statistic? dplyr summarise() wants a
## result of length 1, not 5
dplyr.output.2 <- mtcars %>% group_by(am) %>% summarise(fivenum(mpg) )
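One way around the length-1 restriction, offered only as a sketch, is to wrap the fivenum() result in list() so that each group contributes a single list element. The names dplyr.list.output and stats are made up for illustration. The result is a data frame with a list column, so it would still need unpacking before it could go to latex().

```r
library(dplyr)

## Wrap the length-5 result in list() so summarise() sees one value per group
dplyr.list.output <- mtcars %>%
  group_by(am) %>%
  summarise(stats = list(fivenum(mpg)))

## Each element of the list column holds the five numbers for one group
dplyr.list.output$stats[[1]]
```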
group_map or group_modify seem like they might do the job, but I could
use some guidance on the syntax.
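For what it is worth, here is a minimal sketch of how group_modify() might be used for this, assuming dplyr 0.8.1 or later. The helper five_num_df() and its column names are hypothetical, not part of any package. Note that fivenum() returns Tukey's hinges, which can differ slightly from quantile() at 0.25 and 0.75, hence the column names.

```r
library(dplyr)

## Hypothetical helper: turn fivenum() output into a one-row data frame
five_num_df <- function(x) {
  s <- fivenum(x)
  data.frame(min = s[1], lower_hinge = s[2], median = s[3],
             upper_hinge = s[4], max = s[5])
}

## group_modify() calls the function once per group; .x is that group's
## rows (without the grouping column), and the result must be a data frame
dplyr.output.3 <- mtcars %>%
  group_by(am) %>%
  group_modify(~ five_num_df(.x$mpg)) %>%
  ungroup()

str(dplyr.output.3) ## a tibble with one row per level of am
```

The grouping column am is added back automatically, so the result has one row per group and one column per statistic, much like the doBy and mosaic output.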
Thanks.
--Chris Ryan