[R] "apply" a function that takes two or more vectors as arguments, such as cor(x, y), over a "category" or "grouping variable" or "index"?
Kelly Thompson
kt1572757 @end|ng |rom gm@||@com
Sat Apr 9 03:26:10 CEST 2022
# Q. How can I "apply" a function that takes two or more vectors as
# arguments, such as cor(x, y), over a "category" or "grouping variable"
# or "index"?
# I'm using cor() as an example, but I'd like a way to do this for any
# function that takes two or more vectors as arguments.
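One way to cover any function of two or more vectors is to split every vector by the same grouping variable and hand the matching pieces to the function with Map(). The helper name apply_by_group() below is hypothetical (it is not part of base R); split(), lapply(), Map() and do.call() are base R. Treat this as a minimal sketch, not a finished utility:

```r
# Hypothetical helper, not part of base R: split every vector given in
# `...` by the same grouping variable INDEX, then call FUN once per group
# with the matching pieces, keeping the argument names (x =, y =, ...).
apply_by_group <- function(FUN, ..., INDEX) {
  pieces <- lapply(list(...), split, f = INDEX)
  do.call(Map, c(list(f = FUN), pieces))
}

# Example data, recreated from the post
my_category <- rep(c("a", "b", "c"), 4)
set.seed(12345)
my_x <- rnorm(12)
set.seed(54321)
my_y <- rnorm(12)

# Correlation of x and y within each category (a list named a, b, c)
res_cor <- apply_by_group(cor, x = my_x, y = my_y, INDEX = my_category)
res_cor

# Any other function of two vectors works the same way, e.g. cov()
res_cov <- apply_by_group(cov, x = my_x, y = my_y, INDEX = my_category)
unlist(res_cov)
```

When every per-group result is a single number, wrapping the call in unlist() turns the named list into a named numeric vector.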
# Create example data
my_category <- rep(c("a", "b", "c"), 4)
set.seed(12345)
my_x <- rnorm(12)
set.seed(54321)
my_y <- rnorm(12)
my_df <- data.frame(my_category, my_x, my_y)
#review data
my_df
# If I wanted the correlation of x and y grouped by category, I could
# use this loop:
my_category_unique <- unique(my_category)
my_results <- vector("list", length(my_category_unique))
names(my_results) <- my_category_unique
# start i loop
for (i in seq_along(my_category_unique)) {
  my_criteria_i <- my_category == my_category_unique[i]
  my_x_i <- my_x[my_criteria_i]
  my_y_i <- my_y[my_criteria_i]
  my_correl_i <- cor(x = my_x_i, y = my_y_i)
  my_results[[i]] <- my_correl_i
} # end i loop
#review results
my_results
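For comparison, the loop above can be collapsed with split() and sapply(): split() breaks the data frame into one data frame per category, and sapply() applies an anonymous function that can use as many columns as it needs. A sketch, assuming the column names my_x and my_y from my_df:

```r
# Example data, recreated from the post
my_category <- rep(c("a", "b", "c"), 4)
set.seed(12345)
my_x <- rnorm(12)
set.seed(54321)
my_y <- rnorm(12)
my_df <- data.frame(my_category, my_x, my_y)

# One data frame per category, then one cor() call per data frame;
# sapply() simplifies the scalar results to a named numeric vector.
my_results_split <- sapply(split(my_df, my_df$my_category),
                           function(d) cor(d$my_x, d$my_y))
my_results_split
```

Unlike the loop, the result is ordered by the sorted category levels rather than by unique(); for "a", "b", "c" the two orders happen to coincide.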
# Q. Is there a better or more "elegant" way to do this, using by(),
# aggregate(), apply(), or some other function?
# This does not work and results in this error message:
# "Error in FUN(dd[x, ], ...) : incompatible dimensions"
by(data = my_x, INDICES = my_category, FUN = cor, y = my_y)
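The first by() attempt fails because only data is split by category; the extra argument y = my_y is passed unchanged to every call, so each call pairs the 4 values of my_x in one category with all 12 values of my_y. Splitting both vectors by the same index and walking them in parallel with mapply() avoids that. A minimal base-R sketch:

```r
# Example data, recreated from the post
my_category <- rep(c("a", "b", "c"), 4)
set.seed(12345)
my_x <- rnorm(12)
set.seed(54321)
my_y <- rnorm(12)

# split() both vectors by the same grouping variable so the pieces line up;
# mapply() then calls cor(x_piece, y_piece) once per category and returns
# a named numeric vector.
by_cat <- mapply(cor,
                 split(my_x, my_category),
                 split(my_y, my_category))
by_cat
```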
# This does not work and results in this error message:
# "Error in cor(my_df$x, my_df$y) : ... supply both 'x' and 'y' or a
# matrix-like 'x'"
by(data = my_df, INDICES = my_category,
   FUN = function(x, y) { cor(my_df$x, my_df$y) })
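In the second attempt, by() passes FUN a single argument, the rows of my_df for one category, so function(x, y) never receives a y. The body also refers to the whole my_df rather than that subset, and my_df has no columns named x or y (they are my_x and my_y), so my_df$x and my_df$y are NULL, which matches the error. A sketch of a one-argument version that uses the subset's own columns:

```r
# Example data, recreated from the post
my_category <- rep(c("a", "b", "c"), 4)
set.seed(12345)
my_x <- rnorm(12)
set.seed(54321)
my_y <- rnorm(12)
my_df <- data.frame(my_category, my_x, my_y)

# by() hands FUN one argument: the rows of my_df for one category.
# Use that subset's columns, not the full my_df.
res_by <- by(data = my_df, INDICES = my_df$my_category,
             FUN = function(d) cor(d$my_x, d$my_y))
res_by
```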
# If I wanted the mean of x by category, I could use by() or aggregate():
by(data = my_x, INDICES = my_category, FUN = mean)
aggregate(x = my_x, by = list(my_category), FUN = mean)
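The mean() examples work because mean() needs only one vector per group; aggregate() applies FUN to each column separately, so it never passes two vectors to the same call. One base-R workaround, sketched here, is to let tapply() group the row indices instead of the values and index both vectors inside the function:

```r
# Example data, recreated from the post
my_category <- rep(c("a", "b", "c"), 4)
set.seed(12345)
my_x <- rnorm(12)
set.seed(54321)
my_y <- rnorm(12)

# Group the positions 1..12 by category; each call receives the row
# indices for one category and uses them to subset both vectors.
res_tapply <- tapply(seq_along(my_x), my_category,
                     function(i) cor(my_x[i], my_y[i]))
res_tapply
```

The same index trick extends to three or more vectors: index each one with i inside the function.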
#Thanks!