[R] mixtures as outcome variables
Kjetil Brinchmann Halvorsen
kjetil at acelerate.com
Wed Mar 23 17:36:41 CET 2005
Jason W. Martinez wrote:
>Dear R-users,
>
>I have an outcome variable and I'm unsure about how to treat it. Any
>advice?
>
>I have spending data for each county in the state of California (N=58).
>Each county has been allocated money to spend on any one of the
>following four categories: A, B, C, and D.
>
>Each county may spend the money in any way they see fit. This also means
>that the county need not spend all the money that was allocated to them.
>The data structure looks something like the one below:
>
>COUNTY A B C D Total
>----------------------------------------------------
>alameda 2534221 1555592 2835475 3063249 9988537
>alpine 3174 8500 0 45558 55232
>amador 0 0 0 0 0
>....
>
>
>The goal is to explain variation in spending patterns, which are
>presumably the result of characteristics for each county.
>
>I may treat the problem like a simple linear regression problem for each
>category, but by definition, money spent in one category will take away
>the amount of money that can be spent in any other category---and each
>county is not allocated the same amount of money to spend.
>
>I have constructed proportions of amount spent on each category and have
>conducted quasibinomial regression, on each dependent outcome but that
>does not seem very convincing to me.
>
>Would anyone have any advice about how to treat an outcome variable of
>this sort?
>
>Thanks for any hints!
>
>Jason
If you concentrate only on the relative proportions, these are called
compositional data. If your data are in
mydata (an n x 4 matrix), you obtain compositions by
sweep(mydata, 1, apply(mydata, 1, sum), "/")
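As a minimal sketch of that step, here is the same sweep() call applied to the
three rows quoted above. The matrix name spend and the object names are only
for illustration, and the totals are recomputed from A-D rather than taken
from the Total column. Note that a county that spent nothing (amador) has
total 0, so its shares come out as NaN and must be handled separately:

```r
# Hypothetical matrix holding the three example rows from the question
spend <- matrix(c(2534221, 1555592, 2835475, 3063249,
                     3174,    8500,       0,   45558,
                        0,       0,       0,       0),
                ncol = 4, byrow = TRUE,
                dimnames = list(c("alameda", "alpine", "amador"),
                                c("A", "B", "C", "D")))

totals <- rowSums(spend)               # same as apply(spend, 1, sum)
comp   <- sweep(spend, 1, totals, "/") # divide each row by its total

# Rows with zero total give 0/0 = NaN; keep only counties that spent something
keep    <- totals > 0
comp_ok <- comp[keep, , drop = FALSE]
comp_ok
```

alpine still has a zero share for C; that is a valid proportion, but log(0) is
-Inf, so it matters if you later take logs of the shares.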
There are no specific functions or packages for compositional data in R
(AFAIK), but you
can try googling. Aitchison has a monograph (Chapman & Hall) and a
paper in JRSS B.
One way to start might be lm or anova on the symmetric logratio
transform of the
compositions. The R function lm can take a multivariate response, but
some extra programming will be needed
for interpretation. With simulated data:
> slr
function(y) {   # y should sum to 1
    v <- log(y)
    return( v - mean(v) )
}
> testdata <- matrix( rgamma(120, 2,3), 30, 4)
> str(testdata)
num [1:30, 1:4] 0.200 0.414 0.311 2.145 0.233 ...
> comp <- sweep(testdata, 1, apply(testdata,1,sum), "/")
> # To get the symmetric logratio transform:
> comp <- t(apply(comp, 1, slr))
> # Observe that each row of the covariance matrix sums to (numerically) zero:
> apply(cov(comp), 1, sum)
[1] -5.551115e-17 2.775558e-17 5.551115e-17 -2.775558e-17
> lm( comp ~ 1)
Call:
lm(formula = comp ~ 1)
Coefficients:
[,1] [,2] [,3] [,4]
(Intercept) 0.17606 0.06165 -0.03783 -0.19988
> summary(lm( comp ~ 1))
Response Y1 :
Call:
lm(formula = Y1 ~ 1)
Residuals:
Min 1Q Median 3Q Max
-1.29004 -0.46725 -0.07657 0.55834 1.20551
Coefficients:
Estimate Std. Error t value Pr(>|t|)
[1,] 0.1761 0.1265 1.391 0.175
Residual standard error: 0.6931 on 29 degrees of freedom
Response Y2 :
Call:
lm(formula = Y2 ~ 1)
Residuals:
Min 1Q Median 3Q Max
-1.2982 -0.5711 -0.1355 0.5424 1.6598
Coefficients:
Estimate Std. Error t value Pr(>|t|)
[1,] 0.06165 0.15049 0.41 0.685
Residual standard error: 0.8242 on 29 degrees of freedom
Response Y3 :
Call:
lm(formula = Y3 ~ 1)
Residuals:
Min 1Q Median 3Q Max
-1.97529 -0.41115 0.03666 0.42785 0.88567
Coefficients:
Estimate Std. Error t value Pr(>|t|)
[1,] -0.03783 0.11623 -0.325 0.747
Residual standard error: 0.6366 on 29 degrees of freedom
Response Y4 :
Call:
lm(formula = Y4 ~ 1)
Residuals:
Min 1Q Median 3Q Max
-2.8513 -0.3955 0.2815 0.5939 1.2475
Coefficients:
Estimate Std. Error t value Pr(>|t|)
[1,] -0.1999 0.1620 -1.234 0.227
Residual standard error: 0.8872 on 29 degrees of freedom
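To illustrate the kind of "extra programming" meant above, here is a rough
sketch under stated assumptions, not a worked analysis. The covariate x, the
way category 1 is made to depend on it, the seed, and the helper slr_inv are
all invented for illustration. It fits lm with the transformed compositions as
a multivariate response, checks that the coefficients sum to zero across the
four responses (a consequence of each transformed row summing to zero), and
maps predictions back to shares that sum to 1:

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Symmetric (centred) logratio transform of one composition
slr <- function(y) {
  v <- log(y)
  v - mean(v)
}

# Inverse transform (hypothetical helper): back to shares that sum to 1
slr_inv <- function(z) {
  w <- exp(z)
  w / sum(w)
}

n   <- 30
x   <- rnorm(n)                            # made-up county-level covariate
raw <- matrix(rgamma(4 * n, 2, 3), n, 4)   # simulated positive amounts
raw[, 1] <- raw[, 1] * exp(0.5 * x)        # let category 1 depend on x
comp <- sweep(raw, 1, rowSums(raw), "/")   # compositions
z    <- t(apply(comp, 1, slr))             # transformed responses, n x 4

fit <- lm(z ~ x)                           # one regression per column of z

# Coefficients sum to zero across the four responses (up to rounding)
rowSums(coef(fit))

# Predicted compositions at a few covariate values
newx        <- data.frame(x = c(-1, 0, 1))
pred        <- predict(fit, newdata = newx)    # 3 x 4, on the logratio scale
fitted_comp <- t(apply(pred, 1, slr_inv))      # 3 x 4, rows sum to 1
fitted_comp
```

The back-transformed values are centres on the logratio scale, not arithmetic
means of the raw shares, so they should be read with that in mind. Because the
four responses are linearly dependent, any joint test has to work with only
three of them (or three logratio contrasts).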
Sorry for not being of more help!
Kjetil
--
Kjetil Halvorsen.
Peace is the most effective weapon of mass construction.
-- Mahdi Elmandjra