[R] Multinomial Analysis
soeren.vogel at eawag.ch
soeren.vogel at eawag.ch
Wed Dec 15 18:44:00 CET 2010
I want to analyse data with an unordered, multi-level outcome variable, y. I am asking for the appropriate method (or R procedure) to use for this analysis.
> N <- 500
> set.seed(1234)
> data0 <- data.frame(y = as.factor(sample(LETTERS[1:3], N, repl = T,
+ prob = c(10, 12, 14))), x1 = sample(1:7, N, repl = T, prob = c(8,
+ 8, 9, 15, 9, 9, 8)), x2 = sample(1:7, N, repl = T, prob = 7:1),
+ x3 = round(rnorm(N, 0, 1), 0) + 4)
In [1], the use of "mlogit" is described (same package name). "mlogit" produces a nice output, but for some reason it does not accept the "." operator in my formula. In addition, it requires me to transform my data frame before running the analysis:
> library(mlogit)
> data1 <- mlogit.data(data0, varying = NULL, choice = "y", shape = "wide")
> summary(mod1 <- mlogit(y ~ 1 | ., data = data1, reflevel = "A"))
Error in terms.formula(object) : '.' in formula and no 'data' argument
> form2 <- paste("y", " ~ 1 | ", paste(names(data0)[-1], sep = "",
+ collapse = " + "), sep = "", collapse = "")
> summary(mod2 <- mlogit(as.formula(form2), data = data1, reflevel = "A"))
Call:
mlogit(formula = as.formula(form2), data = data1, reflevel = "A",
method = "nr", print.level = 0)
Frequencies of alternatives:
A B C
0.282 0.338 0.380
nr method
3 iterations, 0h:0m:0s
g'(-H)^-1g = 8.8E-07
gradient close to zero
Coefficients :
Estimate Std. Error t-value Pr(>|t|)
altB 0.269645 0.569943 0.4731 0.6361
altC 0.642654 0.551250 1.1658 0.2437
altB:x1 0.018484 0.061061 0.3027 0.7621
altC:x1 0.075489 0.059837 1.2616 0.2071
altB:x2 -0.102349 0.068194 -1.5008 0.1334
altC:x2 -0.031934 0.065491 -0.4876 0.6258
altB:x3 0.034394 0.115189 0.2986 0.7653
altC:x3 -0.137739 0.112124 -1.2284 0.2193
Log-Likelihood: -542.05
McFadden R^2: 0.0065888
Likelihood ratio test : chisq = 7.1903 (p.value=0.30361)
This is no big problem if the data frame holds only few variables, however, it can easily become confusing if the number variables in the data frame increases. So, I can use the function "multinom" (package "nnet"), as described in [2] and [3]:
> library(nnet)
> summary(mod3 <- multinom(y ~ ., data = data0, na.action = na.omit,
+ Hess = T, model = T))
# weights: 15 (8 variable)
initial value 549.306144
iter 10 value 542.075434
final value 542.046320
converged
Call:
multinom(formula = y ~ ., data = data0, na.action = na.omit,
Hess = T, model = T)
Coefficients:
(Intercept) x1 x2 x3
B 0.2696466 0.01848389 -0.10234898 0.03439345
C 0.6426552 0.07548893 -0.03193395 -0.13773888
Std. Errors:
(Intercept) x1 x2 x3
B 0.5699428 0.06106070 0.06819403 0.1151887
C 0.5512500 0.05983652 0.06549051 0.1121242
Residual Deviance: 1084.093
AIC: 1100.093
The estimates produced by this function are the same like in the analysis above ("mlogit"), and in principle, I should be able to calculate the z and fit statistics by hand. However, this seem more long-winded than using the "mlogit" function.
Alternatively, I may describe my results in the following way: Given that a person knows B and C (which is the case for all my cases), what drives some to use B, and what drives some to use C? Basically, I am wondering whether this analysis can be done using a set of two, parallel binary logistic regressions, one describing the step from A to B, and another one describing the step from A to C, while keeping predictors the same in both regressions.
More generally, I am not sure whether each multinomial logistic model could not just as well be analysed using consecutive binary logistic regressions. This has two advantages in my opinion, one, the model statistics and fit measures are well-described in binary models compared to multinomial ones (see [1] at the bottom), and two, they are much easier to communicate. But I am wondering, analogously to multiple mean comparisons, whether I run into any traps, and I have no idea what these traps look like.
I would kindly appreciate any advice.
Thanks, Sören
[1] http://www.ats.ucla.edu/stat/r/dae/mlogit.htm
[2] Venables, W. N. and Ripley, B. D. (2002). Modern applied statistics with S (4th ed.). New York: Springer.
[3] http://www.stat.washington.edu/quinn/classes/536/S/multinomexample.html
More information about the R-help
mailing list