[R] Creating missing values.

Marc Feldesman feldesmanm at pdx.edu
Sun Mar 24 18:12:31 CET 2002

I'm trying to figure out whether there is a simple one- or two-pass approach 
to randomly creating missing values in an existing (complete) set of 
data.  For example, I want to randomly make 10% of the entries in the Iris 
dataset missing (i.e. NA).  I don't want any case to have all missing 
values and I don't want any case to be missing the classification 
variable.  I can do this in about 3 passes, but I haven't figured out 
whether there is an efficient way to do this in one or two passes through 
the data.

My approach involves creating a dummy vector with a length equal to the 
full length of the Iris data (750 elements):

> sample(1:10, 750, replace=T)

I then assigned all values of 2 to be 0 and all others to be 1.  This left 
me with approximately 10% of the entries flagged as "missing" (the zeros).  
I reshaped this into a 150 x 5 matrix.  From here, things were pretty 
straightforward.
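
For concreteness, the steps above can be sketched roughly as follows. This 
is only an illustration: the object names (draws, flag, mask) and the call 
to set.seed() are assumptions added for the example, not part of my 
original code.

```r
# Rough sketch of the dummy-vector approach described above.
# The names draws, flag and mask are illustrative only.
set.seed(1)                          # assumed, only to make the example repeatable
data(iris)                           # built-in 150 x 5 data frame

# 750 draws from 1..10; each value turns up about 10% of the time
draws <- sample(1:10, 750, replace = TRUE)

# Recode: 2 becomes 0 ("missing"), everything else becomes 1 ("keep")
flag <- ifelse(draws == 2, 0, 1)

# Reshape into a 150 x 5 indicator matrix laid out like the iris data
mask <- matrix(flag, nrow = 150, ncol = 5)

mean(mask == 0)                      # proportion flagged, roughly 0.10
```

Note that at this stage nothing yet prevents a whole row of zeros or a 
zero in the classification column; that is what the later passes deal with.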

Is there any way to bypass the dummy vector and operate directly on a copy 
of the original Iris matrix and get to the point above without the 
intermediate steps?
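
To make the question concrete, the kind of direct operation I have in mind 
might look something like the sketch below. It is untested guesswork rather 
than a settled solution, and it rests on assumptions of my own: the names 
iris.na, miss and mask are made up, the 10% is applied only to the four 
measurement columns (so the classification variable is never touched), and 
a row that comes out entirely missing is repaired by restoring one randomly 
chosen entry, which pulls the realised rate slightly below 10%.

```r
# Hypothetical sketch: blank out entries directly on a copy of iris.
# iris.na, miss and mask are illustrative names, not from any package.
set.seed(1)                          # assumed, only to make the example repeatable
iris.na <- iris                      # work on a copy
n <- nrow(iris.na)                   # 150 cases
p <- ncol(iris.na) - 1               # 4 measurement columns; Species is column 5

# Logical pattern over the measurement columns, TRUE with probability 0.10
miss <- matrix(runif(n * p) < 0.10, nrow = n, ncol = p)

# Repair any case that would lose all four measurements
full <- which(rowSums(miss) == p)
for (i in full) miss[i, sample(p, 1)] <- FALSE

# Add an all-FALSE column for Species and assign NA in one step
mask <- cbind(miss, FALSE)
iris.na[mask] <- NA

colSums(is.na(iris.na))              # missing count per variable
```

Whether something along these lines counts as the efficient route, or 
whether there is a neater idiom, is exactly what I am asking.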


Dr. Marc R. Feldesman
Professor and Chairman
Anthropology Department
Portland State University
1721 SW Broadway
Portland, Oregon 97201
email:  feldesmanm at pdx.edu
phone:  503-725-3081
fax:    503-725-3905
PGP Key Available On Request

"Beyond every credibility gap lies a gullibility fill"

Powered by  Latochoerus and Windows 2000, SP1
