[R] Basic Dummy Variable Creation

Francisco J. Bido bido at mac.com
Fri Sep 5 18:34:21 CEST 2003

Thanks Douglas,

I see what you're saying.  One of the reasons I asked this question 
(besides being a complete R rookie) was to learn good form and habits.  
R seems to be extremely capable and flexible (and, of course, also pretty 
dense), so thank God for the mailing list.  Your example and the feedback 
from others on the list provide very good guidance.  I've got this 
one down.  Thanks again!


On Friday, September 5, 2003, at 11:12 AM, Douglas Bates wrote:

> "Francisco J. Bido" <bido at mac.com> writes:
>> Hi There,
>> While looking through the mailing list archive, I did not come across
>> a simple-minded example regarding the creation of dummy variables.
>> The Gauss language provides the command "y = dummydn(x,v,p)" for
>> creating dummy variables.
>> Here:
>> x = Nx1 vector of data to be broken up into dummy variables.
>> v = Kx1 vector specifying the K-1 breakpoints
>> p = positive integer in the range [1,K], specifying which column
>> should be dropped in the matrix of dummy variables.
>> y = Nx(K-1) matrix containing the K-1 dummy variables.
>> My recent mailing list archive inquiry has led me to examine R's
>> "model.matrix" but it has so many options that I can't see the
>> forest for the trees.  Is that really the easiest way?  Or is
>> there something similar to the dummydn command described above?
>> To provide a concrete scenario, please consider the following.  Using
>> the above notation, say, I had:
>> x <- 1:10         # data to be broken up into dummy variables
>> v <- c(3, 5, 7)   # breakpoints
>> p <- 1            # drop this column to avoid dummy variable trap
>> How can I get a matrix "y" that has the associated dummy variables for
>> columns?
> Don't.
> Consider why you want the dummy variables.  You probably want to use
> them in the specification of a statistical model and R's model
> specification language automatically expands a factor variable into a
> set of contrasts.
> Try
>     data(PlantGrowth)
>     fm <- lm(weight ~ group, data = PlantGrowth)
>     summary(fm)
> and you will see that the `group' factor has been expanded to two of
> the three indicator variables (if you use the default setting for
> contrasts - other possibilities exist).
> You can check explicitly how the model matrix is created with
>     model.matrix(fm)
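
As a rough illustration of the expansion described above, the sketch below
inspects the model matrix for the PlantGrowth fit.  It assumes R's default
treatment contrasts are in effect; the name X is introduced here only for
illustration.

```r
# Fit the model from the quoted example (PlantGrowth ships with base R).
data(PlantGrowth)
fm <- lm(weight ~ group, data = PlantGrowth)

# X is an illustrative name for the design matrix that lm() built.
X <- model.matrix(fm)

# With default treatment contrasts, the three-level factor `group`
# becomes an intercept plus indicators for two of the three levels.
print(colnames(X))

# The coding used for the factor (rows: levels, columns: indicators).
print(contrasts(PlantGrowth$group))
```

The first level (ctrl) is the baseline absorbed by the intercept, which is
why only two indicator columns appear.
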
> The model specification facilities in R are much more flexible than
> those in most other languages, and you almost never need to create
> indicators explicitly.
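
For the concrete scenario in the original question, one possible sketch is
to turn x into a factor with cut() and let model.matrix() build the
indicators.  This is not a port of Gauss's dummydn: it assumes right-closed
intervals with open-ended outer categories, and the names f, full, and y
are purely illustrative.

```r
x <- 1:10          # data to be broken up into categories
v <- c(3, 5, 7)    # breakpoints
p <- 1             # category whose indicator column is dropped

# Assumed category boundaries: (-Inf,3], (3,5], (5,7], (7,Inf]
f <- cut(x, breaks = c(-Inf, v, Inf))

# With a single factor and no intercept, model.matrix() returns one
# indicator column per category.
full <- model.matrix(~ f - 1)

# Drop column p to avoid the dummy variable trap.
y <- full[, -p, drop = FALSE]
print(y)
```

In line with the advice above, the factor f could usually go straight into
a model formula instead, so that the matrix y never has to be built by hand.
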
