[R] dplyr: summarise across using variable names and a condition
Rui Barradas
Fri Mar 26 18:08:50 CET 2021
Hello,
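Here is one way of doing what the question asks for. There may be
simpler ones, but this one works: compute the mean and sd of every
column except ptno, drop the sd columns of the gender and race
indicators, then rename their means from "_mean" to "_prop".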
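library(dplyr)

# `have` and `need` are defined in the quoted message below.
have %>%
  # Mean and sd for every column except the patient id.
  summarise(across(
    .cols = !contains("ptno"),
    .fns = list(mean = mean, std = sd),
    .names = "{col}_{fn}"
  )) %>%
  # Drop the sd columns of the gender and race indicators.
  select(
    -matches("^gender_.*_std$"),
    -matches("^race_.*_std$")
  ) %>%
  # Relabel the indicator means as proportions.
  rename_with(
    .cols = matches("^gender|^race"),
    ~sub("mean$", "prop", .x)
  ) %>%
  all.equal(need)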
#[1] TRUE
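
Since the question asks about driving the summary from a vector of variable names, here is a minimal sketch of that variant. It assumes every column other than ptno and those in vars_num is a 0/1 indicator; the names vars_prop and result are mine, not from the original post. all_of() accepts a character vector directly, so !!, enquo() or quo() should not be needed here.

```r
library(dplyr)

# Example data copied from the original question.
have <- tibble(
  ptno = LETTERS,
  age1 = c(74, 70, 78, 79, 72, 81, 76, 58, 53, 74, 72, 74, 75,
           73, 80, 62, 67, 65, 83, 67, 72, 90, 73, 84, 90, 51),
  age2 = c(71, 67, 72, 74, 65, 79, 70, 49, 45, 68, 70, 71, 74,
           71, 69, 58, 65, 59, 80, 60, 68, 87, 71, 82, 80, 49),
  gender_male = c(1L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 0L,
                  1L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 1L, 0L),
  gender_female = c(0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L,
                    0L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 1L),
  race_white = c(0L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 0L,
                 1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
  race_black = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
                 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
  race_other = c(1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L,
                 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)
)

# Numeric variables, as given in the question.
vars_num <- c("age1", "age2")

# Assumption: everything else except the id column is a 0/1 indicator.
vars_prop <- setdiff(names(have), c("ptno", vars_num))

result <- have %>%
  summarise(
    # Mean and sd for the numeric variables.
    across(all_of(vars_num), list(mean = mean, std = sd),
           .names = "{.col}_{.fn}"),
    # Mean only, labelled as a proportion, for the indicators.
    across(all_of(vars_prop), mean, .names = "{.col}_prop")
  )

result
```

Selecting the indicator columns by name, rather than by negation inside across(), keeps the second across() from picking up the summary columns created by the first one.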
Hope this helps,
Rui Barradas
At 13:47 on 26/03/21, Paul Miller via R-help wrote:
> Hello All,
>
> Would like to be able to summarize across in dplyr using variable names and a condition. Below is an example "have" data set followed by an example "need" data set. After that, I've got a vector of numeric variable names. After that, I've got the very humble beginnings of a dplyr-based solution.
>
> What I think I need to be able to do is to submit my variable names to dplyr and then to have a conditional function. If the variable is in my list of names, calculate the mean and the std. If not, then calculate the mean but label it as a proportion. The question is how to do that. It appears that using variable names might involve !!, or possibly enquo, or possibly quo, but I haven't had much success with these. I imagine I might have been very close but not quite have gotten it. The conditional part seems less difficult but I'm not quite sure how to do that either.
>
> Help with this would be greatly appreciated.
>
> Thanks,
>
> Paul
>
>
> have <- structure(list(
> ptno = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M",
> "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z"),
> age1 = c(74, 70, 78, 79, 72, 81, 76, 58, 53, 74, 72, 74, 75,
> 73, 80, 62, 67, 65, 83, 67, 72, 90, 73, 84, 90, 51),
> age2 = c(71, 67, 72, 74, 65, 79, 70, 49, 45, 68, 70, 71, 74,
> 71, 69, 58, 65, 59, 80, 60, 68, 87, 71, 82, 80, 49),
> gender_male = c(1L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 0L,
> 1L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 1L, 0L),
> gender_female = c(0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L,
> 0L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 1L),
> race_white = c(0L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 0L,
> 1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
> race_black = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
> 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
> race_other = c(1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L,
> 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)),
> row.names = c(NA, -26L), class = c("tbl_df", "tbl", "data.frame"))
>
>
> need <- structure(list(
> age1_mean = 72.8076923076923, age1_std = 9.72838827666425,
> age2_mean = 68.2307692307692, age2_std = 10.2227498934785,
> gender_male_prop = 0.576923076923077, gender_female_prop = 0.423076923076923,
> race_white_prop = 0.769230769230769, race_black_prop = 0.0384615384615385,
> race_other_prop = 0.192307692307692),
> row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame"))
>
> vars_num <- c("age1", "age2")
>
> library(magrittr)
> library(dplyr)
>
> have %>%
>   summarise(across(
>     .cols = !contains("ptno"),
>     .fns = list(mean = mean, std = sd),
>     .names = "{col}_{fn}"
>   ))
>