[R] odd behaviour in quantreg::rq

Wed Jul 1 22:37:55 CEST 2009

It's not clear to me whether you are looking for an exploratory tool
or something more like formal inference.  For the former, it seems
that estimating a few weighted quantiles would be quite useful.  at  
least
it is rather Tukeyesque.  While I'm appealing to authorities, I can't
resist recalling that Galton's "invention of correlation" paper:  Co- 
relations
and their measurement, Proceedings of the Royal Society, 1888-89,
used medians and interquartile ranges.

url:    www.econ.uiuc.edu/~roger            Roger Koenker
email    rkoenker at uiuc.edu            Department of Economics
vox:     217-333-4558                University of Illinois
fax:       217-244-6678                Urbana, IL 61801

On Jul 1, 2009, at 2:48 PM, Dylan Beaudette wrote:

> Thanks Roger. Your comments were very helpful. Unfortunately, each of
> the 'groups' in this example are derived from the same set of data,  
> two of
> which were subsets-- so it is not that unlikely that the weighted  
> medians
> were the same in some cases.
>
> This all leads back to an operation attempting to answer the question:
>
> Of the 2 subsetting methods, which one produces a distribution most  
> like the
> complete data set? Since the distributions are not normal, and there  
> are
> area-weights involved others on the list suggested quantile- 
> regression. For a
> more complete picture of how 'different' the distributions are, I  
> have tried
> looking at the differences between weighted quantiles: (0.1, 0.25,  
> 0.5, 0.75,
> 0.9) as a more complete 'description' of each distribution.
>
> I imagine that there may be a better way to perform this comparison...
>
> Cheers,
> Dylan
>
>
> On Tuesday 30 June 2009, roger koenker wrote:
>> Admittedly this seemed quite peculiar....  but if you look at the
>> entrails
>> of the following code you will see that with the weights the first  
>> and
>> second levels of your x$method variable have the same (weighted)  
>> median
>> so the contrast that you are estimating SHOULD be zero.  Perhaps
>> there is something fishy about the data construction that would have
>> allowed us to anticipate this?  Regarding the "fn" option, and the
>> non-uniqueness warning,  this is covered in the (admittedly obscure)
>> faq on quantile regression available at:
>>
>> 	http://www.econ.uiuc.edu/~roger/research/rq/FAQ
>>
>> # example:
>> library(quantreg)
>>
>> # load data
>> x <- read.csv(url('http://169.237.35.250/~dylan/temp/test.csv'))
>>
>> # with weights
>> summary(rq(sand ~ method, data=x, weights=area_fraction, tau=0.5),
>> se='ker')
>>
>> #Reproduction with more convenient notation:
>>
>> X0 <- model.matrix(~method, data = x)
>> y <- x$sand
>> w <- x$area_fraction
>> f0 <- summary(rq(y ~ X0 - 1, weights = w),se = "ker")
>>
>> #Second reproduction with orthogonal design:
>>
>> X1 <- model.matrix(~method - 1, data = x)
>> f1 <- summary(rq(y ~ X1 - 1, weights = w),se = "ker")
>>
>> #Comparing f0 and f1 we see that they are consistent!!  How can that
>> be??
>> #Since the columns of X1 are orthogonal estimation of the 3  
>> parameters
>> are separable
>> #so we can check to see whether the univariate weighted medians are
>> reproducible.
>>
>> s1 <- X1[,1] == 1
>> s2 <- X1[,2] == 1
>> g1 <- rq(y[s1] ~ X1[s1,1] - 1, weights = w[s1])
>> g2 <- rq(y[s2] ~ X1[s2,2] - 1, weights = w[s2])
>>
>> #Now looking at the g1 and g2 objects we see that they are equal and
>> agree with f1.
>>
>>
>> url:    www.econ.uiuc.edu/~roger            Roger Koenker
>> email    rkoenker at uiuc.edu            Department of Economics
>> vox:     217-333-4558                University of Illinois
>> fax:       217-244-6678                Urbana, IL 61801
>>
>> On Jun 30, 2009, at 3:54 PM, Dylan Beaudette wrote:
>>> Hi,
>>>
>>> I am trying to use quantile regression to perform weighted-
>>> comparisons of the
>>> median across groups. This works most of the time, however I am
>>> seeing some
>>> odd output in summary(rq()):
>>>
>>> Call: rq(formula = sand ~ method, tau = 0.5, data = x, weights =
>>> area_fraction)
>>> Coefficients:
>>>                  Value    Std. Error t value  Pr(>|t|)
>>> (Intercept)        45.44262  3.64706   12.46007  0.00000
>>> methodmukey-HRU     0.00000  4.67115    0.00000  1.00000
>>> 				  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>
>>> When I do not include the weights, I get something a little closer
>>> to a
>>> weighted comparison of means, along with an error message:
>>>
>>> Call: rq(formula = sand ~ method, tau = 0.5, data = x)
>>> Coefficients:
>>>                  Value    Std. Error t value  Pr(>|t|)
>>> (Intercept)        44.91579  2.46341   18.23318  0.00000
>>> methodmukey-HRU     9.57601  9.29348    1.03040  0.30380
>>> Warning message:
>>> In rq.fit.br(x, y, tau = tau, ...) : Solution may be nonunique
>>>
>>>
>>> I have noticed that the error message goes away when specifying
>>> method='fn' to
>>> rq(). An example is below. Could this have something to do with
>>> replication
>>> in the data?
>>>
>>>
>>> # example:
>>> library(quantreg)
>>>
>>> # load data
>>> x <- read.csv(url('http://169.237.35.250/~dylan/temp/test.csv'))
>>>
>>> # with weights
>>> summary(rq(sand ~ method, data=x, weights=area_fraction, tau=0.5),
>>> se='ker')
>>>
>>> # without weights
>>> # note error message
>>> summary(rq(sand ~ method, data=x, tau=0.5), se='ker')
>>>
>>> # without weights, no error message
>>> summary(rq(sand ~ method, data=x, tau=0.5, method='fn'), se='ker')
>>>