Friday, July 31, 2009

Creating Dummy Matrices in R

The following R function creates a matrix of dummy variables. You can access the R code here, and the associated description with the example here.
This is probably not the most efficient way of creating dummy variables, but it works for my purposes.

3 comments:

  1. Hi David. Thanks for posting. I noticed though that you linked the code to a web.ics.purdue.edu address. This link will be taken down after you graduate. I think it would be better to paste in the code directly on this blog to ensure that it remains. Does that make sense? Just a suggestion.

  2. Two things:
    - the link to your code is broken, David
    - second, Mesbah found some pretty neat code from Gary King (Harvard) that creates dummies quickly and concisely.

    http://gking.harvard.edu/zelig/docs/Example_2__Creating.html

    I particularly like his 3rd option (labelled 2.1). If you have your categories or years (or what not) in one vector "state", then you can use the following code:

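    # one column per distinct category, in sorted order
    idx <- sort(unique(mydata$state))
    # pre-allocate the dummy matrix: one row per observation
    dummy <- matrix(NA_integer_, nrow = nrow(mydata), ncol = length(idx))
    for (j in seq_along(idx)) {
      # 1 where the observation belongs to category j, 0 otherwise
      dummy[, j] <- as.integer(mydata$state == idx[j])
    }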

    Works like a charm!
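
For anyone who wants to try the loop without their own data, here is a minimal sketch on a made-up data frame. The state codes and `y` values are invented for illustration; only the column name `state` and the object names `mydata`, `idx`, and `dummy` come from the snippet above.

```r
# Hypothetical toy data, invented for illustration only.
mydata <- data.frame(
  state = c("IN", "OH", "IN", "MI", "OH"),
  y     = c(1.2, 3.4, 0.7, 2.9, 3.1),
  stringsAsFactors = FALSE
)

# Distinct categories in sorted order.
idx <- sort(unique(mydata$state))

# Pre-allocate an integer matrix with one named column per category.
dummy <- matrix(0L, nrow = nrow(mydata), ncol = length(idx),
                dimnames = list(NULL, idx))

# Fill each column with a 0/1 indicator for that category.
for (j in seq_along(idx)) {
  dummy[, j] <- as.integer(mydata$state == idx[j])
}

# Sanity check: every observation falls in exactly one category.
stopifnot(all(rowSums(dummy) == 1))
```

Naming the columns after the categories is optional, but it makes the matrix easier to read once it is bound to other regressors.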

  3. If you really like it short, David's code can be collapsed (using Benoit's example) to:

    david1 <- lm(y ~ factor(state), data=mydata)
    david2 <- lm(y ~ state, contrasts = list(state = "contr.sum"), data=mydata)

    - the first command will add n-1 dummy variables to your regression.
    - the second will constrain the dummy estimates to sum to zero, giving you the seasonal adjustment in David's code. The nice thing about this one is that it effectively gives you an effect for every category plus an intercept (the last category's effect is minus the sum of the others), with an easier interpretation of the fixed effects.

    If state is a character string or a factor, you can ignore factor(), as I did in the second call to lm(). ?factor and ?contrast will help you to relevel the factors so you can change the omitted category group.
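
To make the two points above concrete, here is a small sketch on simulated data. Everything in it is an assumption for illustration: the data are random, and the names `fit_default`, `fit_relevel`, `fit_sum`, `state_oh`, and `effects_sum` are hypothetical. It shows `relevel()` changing the omitted category and sum-to-zero contrasts being requested through the `contrasts` argument of `lm()`.

```r
set.seed(1)  # simulated toy data, for illustration only
mydata <- data.frame(
  state = factor(rep(c("IN", "MI", "OH"), each = 4)),
  y     = rnorm(12)
)

# Default treatment contrasts: the first level ("IN") is the omitted category.
fit_default <- lm(y ~ state, data = mydata)

# relevel() makes "OH" the reference (omitted) category instead.
mydata$state_oh <- relevel(mydata$state, ref = "OH")
fit_relevel <- lm(y ~ state_oh, data = mydata)

# Sum-to-zero contrasts: the category effects are constrained to sum to zero.
fit_sum <- lm(y ~ state, data = mydata,
              contrasts = list(state = "contr.sum"))

# Only n-1 effects are estimated; the last one is minus the sum of the others.
effects_sum <- c(coef(fit_sum)[-1], -sum(coef(fit_sum)[-1]))
names(effects_sum) <- levels(mydata$state)
```

All three fits describe the same model and give the same fitted values; only the parameterisation, and therefore the meaning of each coefficient, changes.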

    Finally, if you really need the matrix, use

    m <- model.matrix(david1)

    to obtain the design matrix.
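
As a rough sketch of what that design matrix looks like, the example below uses a made-up data frame; the values and the names `fit` and `m_full` are hypothetical. It also shows that `model.matrix()` accepts a one-sided formula directly, so no regression is needed if all you want is the dummies.

```r
# Hypothetical toy data, invented for illustration only.
mydata <- data.frame(
  state = factor(c("IN", "OH", "IN", "MI", "OH")),
  y     = c(1.2, 3.4, 0.7, 2.9, 3.1)
)

# Design matrix from a fitted model: intercept plus n-1 dummies.
fit <- lm(y ~ state, data = mydata)
m <- model.matrix(fit)

# Design matrix straight from a formula: dropping the intercept (- 1)
# yields one dummy column for every category.
m_full <- model.matrix(~ state - 1, data = mydata)

colnames(m)       # "(Intercept)" "stateMI" "stateOH"
colnames(m_full)  # "stateIN" "stateMI" "stateOH"
```

The formula route skips the loop entirely, at the cost of column names that are prefixed with the variable name.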

    The following page on UCLA's statistics website is quite useful:

    http://www.ats.ucla.edu/stat/r/modules/dummy_vars.htm
