[R] dplyr: summarise across using variable names and a condition

Fri Mar 26 18:08:50 CET 2021

Hello,

Here is a way of doing what the question asks for. There might be 
others, simpler, but this one works.

have %>%
   summarise(across(
     .cols = !contains("ptno"),
     .fns = list(mean = mean, std = sd),
     .names = "{col}_{fn}"
   )) %>%
   select(
     -matches("^gender_.*_std$"),
     -matches("^race_.*_std$")
   ) %>%
   rename_with(
     .cols = matches("^gender|^race"),
     ~sub("mean$", "prop", .x)
   ) %>%
   all.equal(need)
#[1] TRUE

Hope this helps,

Rui Barradas

Às 13:47 de 26/03/21, Paul Miller via R-help escreveu:
> Hello All,
> 
> Would like to be able to summarize across in dplyr using variable names and a condition. Below is an example "have" data set followed by an example "need" data set. After that, I've got a vector of numeric variable names. After that, I've got the very humble beginnings of a dplyr-based solution.
> 
> What I think I need to be able to do is to submit my variable names to dplyr and then to have a conditional function. If the variable is is in my list of names, calculate the mean and the std. If not, then calculate the mean but label it as a proportion. The question is how to do that. It appears that using variable names might involve !!, or possibly enquo, or possibly quo, but I haven't had much success with these. I imagine I might have been very close but not quite have gotten it. The conditional part seems less difficult but I'm not quite sure how to do that either.
> 
> Help with this would be greatly appreciated.
> 
> Thanks,
> 
> Paul
> 
> 
> have <- structure(list(
>          ptno = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M",
>                   "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z"),
>          age1 = c(74, 70, 78, 79, 72, 81, 76, 58, 53, 74, 72, 74, 75,
>                   73, 80, 62, 67, 65, 83, 67, 72, 90, 73, 84, 90, 51),
>          age2 = c(71, 67, 72, 74, 65, 79, 70, 49, 45, 68, 70, 71, 74,
>                   71, 69, 58, 65, 59, 80, 60, 68, 87, 71, 82, 80, 49),
>          gender_male = c(1L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 0L,
>                          1L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 1L, 0L),
>          gender_female = c(0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L,
>                            0L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 1L),
>          race_white = c(0L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 0L,
>                         1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
>          race_black = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
>                         0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
>          race_other = c(1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L,
>                         0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)),
>          row.names = c(NA, -26L), class = c("tbl_df", "tbl", "data.frame"))
>   
> 
> need <-structure(list(
>         age1_mean = 72.8076923076923, age1_std = 9.72838827666425,
>         age2_mean = 68.2307692307692, age2_std = 10.2227498934785,
>         gender_male_prop = 0.576923076923077, gender_female_prop = 0.423076923076923,
>         race_white_prop = 0.769230769230769, race_black_prop = 0.0384615384615385,
>         race_other_prop = 0.192307692307692),
>         row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame"))
> 
> vars_num <-  c("age1", "age2")
> 
> library(magrittr)
> library(dplyr)
> 
> have %>%
>    summarise(across(
>    .cols = !contains("ptno"),
>    .fns = list(mean = mean, std = sd),
>    .names = "{col}_{fn}"
> ))
> 
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>