[R] Imputing missing values

Dimitris Rizopoulos dimitris.rizopoulos at med.kuleuven.ac.be
Wed Sep 1 11:33:17 CEST 2004

Hi Jan,

you could try the following:

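# toy data: 9 prices, 3 of them missing
dat <- data.frame(Price=c(10,12,NA,8,7,9,NA,9,NA),
                  Crop=c(rep("Rice", 5), rep("Wheat", 4)),
                  Season=c(rep("Summer", 3), rep("Winter", 4),
                           rep("Summer", 2)))
# sort by Season, then Crop, so that the rows are in the same order as
# the cells returned by tapply() (Crop varies fastest within Season)
dat <- dat[order(dat$Season, dat$Crop),]
# within each Crop-Season cell, replace NAs by the mean of the observed prices
dat$Price.imp <- unlist(tapply(dat$Price, list(dat$Crop, dat$Season),
                               function(x){
  mx <- mean(x, na.rm=TRUE)
  ifelse(is.na(x), mx, x)
}))
dat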
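
The tapply() approach above depends on the rows being sorted to match the order of the cells. As a possible alternative, here is a minimal sketch using ave(), which returns the result in the original row order, so no sorting is needed. The helper name impute_mean is made up for this illustration, and the crop is spelled "Rice" as in the question.

```r
# Toy data from the example above (crop spelled "Rice" as in the question).
dat <- data.frame(
  Price  = c(10, 12, NA, 8, 7, 9, NA, 9, NA),
  Crop   = c(rep("Rice", 5), rep("Wheat", 4)),
  Season = c(rep("Summer", 3), rep("Winter", 4), rep("Summer", 2))
)

# Hypothetical helper: replace the NAs in x by the mean of its observed values.
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# ave() applies FUN within each Crop-Season cell and puts the results back
# in the original row order; FUN must be passed by name.
dat$Price.imp <- ave(dat$Price, dat$Crop, dat$Season, FUN = impute_mean)
dat
```

Note that if all prices in a cell are missing, mean(x, na.rm = TRUE) is NaN, so those rows stay without a usable value and need separate handling.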


However, you should be careful with this imputation technique: the
imputed values are treated as if they had been observed, so the extra
uncertainty due to imputation is ignored and the variability in your
data set is understated. I don't know what analysis you are planning
to do, but in any case I would recommend reading some standard
references on missing values, e.g., Little, R. and Rubin, D. (2002).
Statistical Analysis with Missing Data, New York: Wiley.
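
To illustrate the point about variability, here is a small sketch with made-up numbers (one cell with a single missing price). It is only meant to show the direction of the effect: filling in the cell mean leaves the sum of squares unchanged while increasing the number of values, so the sample variance shrinks.

```r
x <- c(10, 12, NA, 8, 7)   # made-up prices with one missing value

# mean imputation: replace the NA by the mean of the observed values
x.imp <- ifelse(is.na(x), mean(x, na.rm = TRUE), x)

var.obs <- var(x, na.rm = TRUE)   # variance of the 4 observed values
var.imp <- var(x.imp)             # variance after filling in the mean

# the imputed series looks less variable than the observed data
c(observed = var.obs, imputed = var.imp)
```

Standard errors computed from the imputed column would be too small for the same reason.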

I hope this helps.


Dimitris Rizopoulos
Doctoral Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/16/396887
Fax: +32/16/337015
Web: http://www.med.kuleuven.ac.be/biostat/

----- Original Message ----- 
From: "Jan Smit" <janpsmit at yahoo.co.uk>
To: <R-help at stat.math.ethz.ch>
Sent: Wednesday, September 01, 2004 10:43 AM
Subject: [R] Imputing missing values

> Dear all,
> Apologies for this beginner's question. I have a
> variable Price, which is associated with factors
> Season and Crop, each of which have several levels.
> The Price variable contains missing values (NA), which
> I want to substitute by the mean of the remaining
> (non-NA) Price values of the same Season-Crop
> combination of levels.
> Price     Crop    Season
> 10        Rice    Summer
> 12        Rice    Summer
> NA        Rice    Summer
> 8         Rice    Winter
> 9         Wheat    Summer
> Price[is.na(Price)] gives me the missing values, and
> by(Price, list(Crop, Season), mean, na.rm = T) the
> values I want to impute. What I've not been able to
> figure out, by looking at by and the various
> incarnations of apply, is how to do the actual
> substitution.
> Any help would be much appreciated.
> Jan Smit
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
