Jim Lemon
Sat Oct 31 10:28:35 CET 2020
Hi Luigi,
If I understand your request:
library(prettyR)
apply(as.matrix(df),1,Mode)
[1] "C" "B" "D" ">1 mode" ">1 mode" ">1 mode" "D"
[8] "C" "B" ">1 mode"
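If prettyR is not available, or if apply() over a million rows turns out to be slow, something along these lines might work in base R. This is only a sketch under my own assumptions: it counts each distinct value across all rows at once with rowSums() and picks the winner with max.col(). The names m, lev, counts, best and tied are made up for illustration, the seed is arbitrary, and the ">1 mode" label just mimics the output above.

```r
# Toy data in the same shape as the original example (the seed is arbitrary).
set.seed(1)
V <- c("A", "B", "C", "D")
df <- data.frame(n = 1:10,
                 replicate(5, sample(V, 10, replace = TRUE)),
                 stringsAsFactors = FALSE)

# Work on a character matrix, leaving out the id column n.
m <- as.matrix(df[, -1])

# Distinct values that occur anywhere in the data.
lev <- sort(unique(as.vector(m)))

# counts[i, j] = number of times lev[j] occurs in row i.
counts <- vapply(lev, function(v) rowSums(m == v), numeric(nrow(m)))

# Column index of the largest count in each row (first one wins on ties).
best <- max.col(counts, ties.method = "first")
df$most <- lev[best]

# Flag rows where more than one value reaches the maximum count.
row_max <- counts[cbind(seq_len(nrow(counts)), best)]
tied <- rowSums(counts == row_max) > 1
df$most[tied] <- ">1 mode"
```

Each comparison m == v builds a logical matrix as large as m, so for a data frame of the size you describe it would probably be necessary to process the rows in blocks rather than all at once.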
Jim
On Sat, Oct 31, 2020 at 7:56 PM Luigi Marongiu <marongiu.luigi using gmail.com>
wrote:
> Hello,
> I have a large dataframe (1 000 000 rows, 1000 columns) where the
> columns contain a character. I would like to determine the most common
> character for each row.
> In the example below, I can parse one row at a time and find the
> most common character (apart from ties...). But I think this will be
> very slow and memory-intensive.
> Is there a way to run it more efficiently?
> Thank you
>
> ```
> V = c("A", "B", "C", "D")
> df = data.frame(n = 1:10,
>                 col_01 = sample(V, 10, replace = TRUE, prob = NULL),
>                 col_02 = sample(V, 10, replace = TRUE, prob = NULL),
>                 col_03 = sample(V, 10, replace = TRUE, prob = NULL),
>                 col_04 = sample(V, 10, replace = TRUE, prob = NULL),
>                 col_05 = sample(V, 10, replace = TRUE, prob = NULL),
>                 stringsAsFactors = FALSE)
>
> q = vector()
> for(i in 1:nrow(df)) {
>   x = as.vector(t(df[i,2:ncol(df)]))
>   q[i] = names(which.max(table(x)))
> }
> df$most = q
> ```
>
