[R] replace NA's with row means for specific columns
Marco
marco.prado.bs at gmail.com
Thu Nov 12 17:19:03 CET 2015
Excerpts from Zahra via R-help's message of 2015-11-02 17:49:01 -0200:
> Hi there,
>
> I am looking for some help replacing missing values in R with the row mean. This is survey data and I am trying to impute values for missing variables in each set of questions separately using the mean of the scores for the other questions within that set.
>
> I have a dataset that looks like this
>
> ID A1 A2 A3 B1 B2 B3 C1 C2 C3 C4
> b 4 5 NA 2 NA 4 5 1 3 NA
> c 4 5 1 NA 3 4 5 1 3 2
> d NA 5 1 1 NA 4 5 1 3 2
> e 4 5 4 5 NA 4 5 1 3 2
>
>
> I want to replace any NA's in columns A1:A3 with the row mean for those columns only. So for ID=b, I want the NA in A3[ID=b] to be (4+5)/2 which is the average of the values in A1 and A2 for that row.
> Same thing for columns B1:B3 - I want the NA in B2[ID=b] to be the mean of the values of B1 and B3 in row ID=b so that B2[ID=b] becomes 3 which is (2+4)/2. And same in C1:C4, I want C4[ID=b] to become (5+1+3)/3 which is the mean of C1:C3.
>
> Then I want to go to row ID=c and do the same thing and so on.
>
> Can anybody help me do this? I have tried using rowMeans and subsetting but can't figure out the right code to do it.
>
> Thanks so much.
> Zahra
>
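The question mentions trying rowMeans with subsetting. Below is a minimal sketch of that vectorized route; it is not code from the original thread. It assumes the data sit in a data frame called df, with ID as a character column and the answer columns numeric. It also assumes that a question set can be identified by the first letter of its column names (A, B, C), which matches the sample data above.

```r
# Rebuild the sample data from the question (ID as character, answers numeric)
df <- data.frame(
  ID = c("b", "c", "d", "e"),
  A1 = c(4, 4, NA, 4), A2 = c(5, 5, 5, 5), A3 = c(NA, 1, 1, 4),
  B1 = c(2, NA, 1, 5), B2 = c(NA, 3, NA, NA), B3 = c(4, 4, 4, 4),
  C1 = c(5, 5, 5, 5), C2 = c(1, 1, 1, 1), C3 = c(3, 3, 3, 3),
  C4 = c(NA, 2, 2, 2),
  stringsAsFactors = FALSE
)

# Assumed grouping: a question set is identified by the first letter of the
# column name (A1:A3 -> "A", B1:B3 -> "B", C1:C4 -> "C")
answer_cols <- setdiff(names(df), "ID")
groups <- split(answer_cols, substr(answer_cols, 1, 1))

for (cols in groups) {
  block <- as.matrix(df[cols])
  # Mean of each row over the non-missing answers of this set only
  m <- rowMeans(block, na.rm = TRUE)
  # (row, col) positions of the NAs inside this set
  miss <- which(is.na(block), arr.ind = TRUE)
  # Each NA receives the mean of its own row
  block[miss] <- m[miss[, "row"]]
  df[cols] <- block
}

print(df)
```

For ID=b this gives A3 = 4.5, B2 = 3 and C4 = 3, as described in the question. If every answer in a set is missing for some row, rowMeans returns NaN for that row, so such rows may need separate handling.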
Use something like

  df[df$ID == 'b', ] <- fmean(df[df$ID == 'b', ], cols)

where cols is the column selection (A1:A3, B1:B3, etc.). fmean depends on both
that column selection and the row itself, so consider passing the left-hand side
of the assignment (the whole row) in its entirety. I would start from:

  fmean <- function(row, col_selection) {}  # homework for you here
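Here is a minimal sketch of one way the fmean stub could be completed and applied row by row. The body of fmean, the prefix-based groups list and the loop are my assumptions, not part of the original reply. The sketch assumes df has a character ID column and numeric answer columns whose first letter names their question set.

```r
df <- data.frame(
  ID = c("b", "c", "d", "e"),
  A1 = c(4, 4, NA, 4), A2 = c(5, 5, 5, 5), A3 = c(NA, 1, 1, 4),
  B1 = c(2, NA, 1, 5), B2 = c(NA, 3, NA, NA), B3 = c(4, 4, 4, 4),
  C1 = c(5, 5, 5, 5), C2 = c(1, 1, 1, 1), C3 = c(3, 3, 3, 3),
  C4 = c(NA, 2, 2, 2),
  stringsAsFactors = FALSE
)

# Hypothetical completion of the fmean() stub: returns `row` (a one-row data
# frame) with the NAs in `col_selection` replaced by the mean of that row's
# non-missing values in the same columns.
fmean <- function(row, col_selection) {
  vals <- unlist(row[col_selection])
  vals[is.na(vals)] <- mean(vals, na.rm = TRUE)
  row[col_selection] <- as.list(vals)
  row
}

# Assumed grouping by the first letter of each answer column
answer_cols <- setdiff(names(df), "ID")
groups <- split(answer_cols, substr(answer_cols, 1, 1))

# Visit every row and every question set, passing the whole row in and
# assigning the whole row back
for (i in seq_len(nrow(df))) {
  for (cols in groups) {
    df[i, ] <- fmean(df[i, ], cols)
  }
}

print(df)
```

The per-row loop mirrors the "pass the whole row" idea above. On large survey data, a column-wise approach built on rowMeans is likely to be considerably faster.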
Best Regards,
--
Marco Arthur @ (M)arco Creatives