[R] quantile from quantile table calculation without original data
Abby Spurdle
spurdle.a at gmail.com
Fri Mar 12 10:22:39 CET 2021
Hi Petr,
In principle, I like David's approach the best.
However, I note that there's a bug in the squared step.
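
To make the squared-step issue concrete, here is a minimal sketch using the table and the parameter estimates reported further down in this thread. The names resid_q, obj_posted, obj_ls and par_david are illustrative only. The posted objective squares the sum of the residuals, so positive and negative residuals can cancel; a least-squares objective sums the squared residuals instead.

```r
# Quantile table from the original post.
temp <- data.frame(
  size    = c(1.6, 0.9466, 0.8062, 0.6477, 0.5069, 0.3781, 0.3047, 0.2681, 0.1907),
  percent = c(0.01, 0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95, 0.99))

# Residuals between model quantiles and tabulated sizes.
# "percent" is an upper-tail probability, hence 1 - percent.
resid_q <- function(par)
  qlnorm(1 - temp$percent, par[1], par[2]) - temp$size

obj_posted <- function(par) sum(resid_q(par))^2   # squares the sum
obj_ls     <- function(par) sum(resid_q(par)^2)   # sums the squares

par_david <- c(-0.7020649, 0.4678656)  # estimates reported in the thread

obj_posted(par_david)  # essentially zero: the residuals cancel
obj_ls(par_david)      # noticeably larger: individual residuals are not small
```
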
Furthermore, the variance of the sample quantiles should increase as
they move away from the modal region.
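
As a rough illustration of the variance point, the sketch below uses the standard large-sample approximation for a sample quantile, var(q_p) ~ p (1 - p) / (n f(q_p)^2), where f is the density. The parameter values are the estimates reported further down in this thread; the sample size n = 1000 and the function name quantile_sd are assumptions made purely for illustration, since the true sample size is unknown.

```r
# Large-sample standard deviation of a sample quantile:
#   sd(q_p) ~ sqrt(p * (1 - p) / n) / f(q_p)
# p is a lower-tail probability here.
quantile_sd <- function(p, meanlog, sdlog, n) {
  q <- qlnorm(p, meanlog, sdlog)
  sqrt(p * (1 - p) / n) / dlnorm(q, meanlog, sdlog)
}

p <- c(0.01, 0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95, 0.99)

# meanlog and sdlog are taken from the thread; n = 1000 is an assumed value.
sd_q <- quantile_sd(p, meanlog = -0.7020649, sdlog = 0.4678656, n = 1000)

data.frame(p = p, q = qlnorm(p, -0.7020649, 0.4678656), sd = sd_q)
```

Under these assumptions the approximate standard deviation at p = 0.99 is considerably larger than at the median, which is the motivation for giving the tail quantiles less weight.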
I've built on David's approach, but changed it to a two-stage
optimization algorithm.
The parameter estimates from the first stage are used to compute density values.
Then the second stage is weighted, using the scaled density values.
I tried to create an iteratively reweighted algorithm.
However, it didn't converge.
(But that doesn't necessarily mean it can't be done).
The following code returns the value: 1.648416e-05
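qfit.lnorm <- function (p, q, lower.tail=TRUE, ...,
    par0 = c (-0.5, 0.5) )
{   n <- length (p)
    qsample <- q
    # stage 1 objective: unweighted mean squared error
    objf <- function (par)
    {   qmodel <- qlnorm (p, par [1], par [2], lower.tail)
        sum ( (qmodel - qsample)^2) / n
    }
    # stage 2 objective: weighted sum of squared errors
    objf.w <- function (wpar, w)
    {   qmodel <- qlnorm (p, wpar [1], wpar [2], lower.tail)
        sum (w * (qmodel - qsample)^2)
    }
    # stage 1: starting values for the weighted fit
    wpar0 <- optim (par0, objf)$par
    # weights from the stage 1 fit; note that the fourth positional
    # argument of dlnorm is "log", so lower.tail is passed as "log" here
    w <- dlnorm (p, wpar0 [1], wpar0 [2], lower.tail)
    # stage 2: weighted fit (the empty argument is optim's gr)
    optim (wpar0, objf.w,, w=w)
}
par <- qfit.lnorm (temp$percent, temp$size, FALSE)$par
plnorm (0.1, par [1], par [2])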
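
One way to sanity-check any set of fitted parameters is to tabulate the fitted sizes next to the tabulated ones. The sketch below does this for the rriskDistributions values quoted later in this message; compare_fit is a hypothetical helper name, not part of any package. The same call can be made with the estimates returned by qfit.lnorm, i.e. compare_fit (par [1], par [2]).

```r
# Quantile table from the original post.
temp <- data.frame(
  size    = c(1.6, 0.9466, 0.8062, 0.6477, 0.5069, 0.3781, 0.3047, 0.2681, 0.1907),
  percent = c(0.01, 0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95, 0.99))

# Hypothetical helper: observed vs fitted sizes for given lognormal parameters.
# "percent" is treated as an upper-tail probability (lower.tail = FALSE).
compare_fit <- function(meanlog, sdlog) {
  fitted <- qlnorm(temp$percent, meanlog, sdlog, lower.tail = FALSE)
  data.frame(percent  = temp$percent,
             observed = temp$size,
             fitted   = fitted,
             residual = fitted - temp$size)
}

# rriskDistributions parameter values as quoted in the thread.
cmp <- compare_fit(-0.6937355, 0.3881209)
cmp

# Lower-tail probability at size 0.1 for the same parameters.
plnorm(0.1, -0.6937355, 0.3881209)
```
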
On Tue, Mar 9, 2021 at 2:52 AM PIKAL Petr <petr.pikal at precheza.cz> wrote:
>
> Hello David, Abby and Bert,
>
> Thank you for your solutions. In the meantime I found the package rriskDistributions, which was able to calculate values for a lognormal distribution from quantiles.
>
> Abby
> > 1-psolution
> [1] 9.980823e-06
>
> David
> > plnorm(0.1, -.7020649, .4678656)
> [1] 0.0003120744
>
> rriskDistributions
> > plnorm(0.1, -.6937355, .3881209)
> [1] 1.697379e-05
>
> Bert suggested asking for the original data from before the quantile calculation, which is probably the best but also the most problematic solution. The original data may well be unavailable, since they come from a particle size measurement, where the software always transforms the raw data and outputs only descriptive results.
>
> All your results are quite consistent with the available values as they are close to 1, so for me, each approach works.
>
> Thank you again.
>
> Best regards.
> Petr
>
> > -----Original Message-----
> > From: David Winsemius <dwinsemius at comcast.net>
> > Sent: Sunday, March 7, 2021 1:33 AM
> > To: Abby Spurdle <spurdle.a at gmail.com>; PIKAL Petr <petr.pikal at precheza.cz>
> > Cc: r-help at r-project.org
> > Subject: Re: [R] quantile from quantile table calculation without original data
> >
> >
> > On 3/6/21 1:02 AM, Abby Spurdle wrote:
> > > I came up with a solution.
> > > But not necessarily the best solution.
> > >
> > > I used a spline to approximate the quantile function.
> > > Then use that to generate a large sample.
> > > (I don't see any need for the sample to be random, as such).
> > > Then compute the sample mean and sd, on a log scale.
> > > Finally, plug everything into the plnorm function:
> > >
> > > p <- seq (0.01, 0.99,, 1e6)
> > > Fht <- splinefun (temp$percent, temp$size)
> > > x <- log (Fht (p) )
> > > psolution <- plnorm (0.1, mean (x), sd (x), FALSE)
> > > psolution
> > >
> > > The value of the solution is very close to one.
> > > Which is not a surprise.
> > >
> > > Here's a plot of everything:
> > >
> > > u <- seq (0.000001, 1.65,, 200)
> > > v <- plnorm (u, mean (x), sd (x), FALSE)
> > > plot (u, v, type="l", ylim = c (0, 1) )
> > > points (temp$size, temp$percent, pch=16)
> > > points (0.1, psolution, pch=16, col="blue")
> >
> > Here's another approach, which uses minimization of the squared error to
> > get the parameters for a lognormal distribution.
> >
> > temp <- structure(list(size = c(1.6, 0.9466, 0.8062, 0.6477, 0.5069, 0.3781,
> > 0.3047, 0.2681, 0.1907), percent = c(0.01, 0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95,
> > 0.99)), .Names = c("size", "percent"
> > ), row.names = c(NA, -9L), class = "data.frame")
> >
> > obj <- function(x) {sum( qlnorm(1-temp$percent, x[[1]], x[[2]])-temp$size
> > )^2}
> >
> > # Note the inversion of the poorly named and flipped "percent" column,
> >
> > optim( list(a=-0.65, b=0.42), obj)
> >
> > #--------------------
> >
> > $par
> >          a          b
> > -0.7020649  0.4678656
> >
> > $value
> > [1] 3.110316e-12
> >
> > $counts
> > function gradient
> >       51       NA
> >
> > $convergence
> > [1] 0
> >
> > $message
> > NULL
> >
> >
> > I'm not sure how principled this might be. There's no consideration in this
> > approach for expected sampling error at the right tail where the magnitudes
> > of the observed values will create much larger contributions to the sum of
> > squares.
> >
> > --
> >
> > David.
> >
> > >
> > >
> > > On Sat, Mar 6, 2021 at 8:09 PM Abby Spurdle <spurdle.a at gmail.com> wrote:
> > >> I'm sorry.
> > >> I misread your example, this morning.
> > >> (I didn't read the code after the line that calls plot).
> > >>
> > >> After looking at this problem again, interpolation doesn't apply, and
> > >> extrapolation would be a last resort.
> > >> If you can assume your data comes from a particular type of
> > >> distribution, such as a lognormal distribution, then a better
> > >> approach would be to find the most likely parameters.
> > >>
> > >> i.e.
> > >> This falls within the broader scope of maximum likelihood.
> > >> (Except that you're dealing with a table of quantile-probability
> > >> pairs, rather than raw observational data).
> > >>
> > >> I suspect that there's a relatively easy way of finding the parameters.
> > >>
> > >> I'll think about it...
> > >> But someone else may come back with an answer first...
> > >>
> > >>
> > >> On Sat, Mar 6, 2021 at 8:17 AM Abby Spurdle <spurdle.a at gmail.com> wrote:
> > >>> I note three problems with your data:
> > >>> (1) The name "percent" is misleading, perhaps you want "probability"?
> > >>> (2) There are straight (or near-straight) regions, each of which is
> > >>> equally (or near-equally) spaced, which is not what I would expect
> > >>> in problems involving "quantiles".
> > >>> (3) Your plot (approximating the distribution function) is
> > >>> back-to-front (relative to what is customary).
> > >>>
> > >>>
> > >>> On Fri, Mar 5, 2021 at 10:14 PM PIKAL Petr <petr.pikal at precheza.cz> wrote:
> > >>>> Dear all
> > >>>>
> > >>>> I have a table of quantiles, probably from a lognormal distribution
> > >>>>
> > >>>> dput(temp)
> > >>>> temp <- structure(list(size = c(1.6, 0.9466, 0.8062, 0.6477,
> > >>>> 0.5069, 0.3781, 0.3047, 0.2681, 0.1907), percent = c(0.01, 0.05,
> > >>>> 0.1, 0.25, 0.5, 0.75, 0.9, 0.95, 0.99)), .Names = c("size", "percent"
> > >>>> ), row.names = c(NA, -9L), class = "data.frame")
> > >>>>
> > >>>> and I need to calculate the quantile for size 0.1
> > >>>>
> > >>>> plot(temp$size, temp$percent, pch=19, xlim=c(0,2))
> > >>>> ss <- approxfun(temp$size, temp$percent)
> > >>>> points((0:100)/50, ss((0:100)/50))
> > >>>> abline(v=.1)
> > >>>>
> > >>>> If I had the original data it would be quite easy with the ecdf/quantile
> > >>>> functions, but without them I am lost as to what function I could use
> > >>>> for such a task.
> > >>>>
> > >>>> Please, give me some hint where to look.
> > >>>>
> > >>>>
> > >>>> Best regards
> > >>>>
> > >>>> Petr
> > >>>> ______________________________________________
> > >>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > >>>> https://stat.ethz.ch/mailman/listinfo/r-help
> > >>>> PLEASE do read the posting guide
> > >>>> http://www.R-project.org/posting-guide.html
> > >>>> and provide commented, minimal, self-contained, reproducible code.