[R] generate distribution based on summary data and add random noise

Bert Gunter bgunter.4567 using gmail.com
Thu Feb 3 18:34:39 CET 2022


Nope. I think I provided what you asked for, random data in each bin with
the amount of data proportional to bin percentage and the distribution of
that data uniform (not normal) within the bin. So maybe someone else can
give you what you want if this ain't it.
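
For reference, a minimal sketch of the uniform-within-bin idea, using the sizes and cumulative percentages from the original post. N = 1000 matches the sample size used in the question; the seed and the names lower, upper and n_per_bin are only illustrative choices.

```r
# Bin edges and cumulative percentages as posted in the original question
size <- c(seq(10, 200, by = 10), 250, 300, 400, 500)
percent <- c(0, 0, 0, 1, 1, 2, 4, 8, 13, 18, 24, 31, 38, 44, 50, 57, 65, 72,
             76, 83, 95, 98, 100, 100)

lower <- head(size, -1)        # lower edge of each bin
upper <- tail(size, -1)        # upper edge of each bin
p     <- diff(percent) / 100   # proportion of the data falling in each bin

set.seed(1)                    # arbitrary seed, only for reproducibility
N <- 1000                      # total sample size
n_per_bin <- round(p * N)      # p * N values for a bin with proportion p

# Uniform draws inside each bin; empty bins (p == 0) contribute nothing
Result <- runif(sum(n_per_bin),
                min = rep(lower, n_per_bin),
                max = rep(upper, n_per_bin))
```

plot(ecdf(Result)) then shows a piecewise-linear approximation of the posted cumulative curve.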

Cheers,
Bert

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Thu, Feb 3, 2022 at 8:44 AM PIKAL Petr <petr.pikal using precheza.cz> wrote:

> Hello Bert,
>
> Probably not, sorry. Did you try my examples?
>
> To make it simpler, maybe:
> 1. sample a vector with given proportion and generate new data
> 2. add random noise to each generated value with sd given by value of a
> vector.
>
> Let's say:
>
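> x <- c(10, 100)   # two example sizes
> y <- c(.6, .4)    # their proportions
> set.seed(200)
> z <- sample(x, 10, replace=TRUE, prob=y)
> ind <- order(z)
> bins <- rle(z[ind])   # run lengths of the sorted values = count per size
> # noise for each size, with sd proportional to that size
> bin1 <- rnorm(bins$lengths[1], mean = 0, sd=bins$values[1]/5)
> bin2 <- rnorm(bins$lengths[2], mean = 0, sd=bins$values[2]/5)
> z[ind] + c(bin1, bin2)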
>
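
An inline note on the two-step recipe quoted just above: since rnorm() recycles a vector-valued sd, the per-bin calls can probably be collapsed into one, with no sorting or rle() needed. This is a sketch that assumes sd = value/5 as in the quoted example; the names noise and z_noisy are made up for illustration.

```r
x <- c(10, 100)   # the two example sizes from the quoted code
y <- c(.6, .4)    # their sampling probabilities
set.seed(200)
z <- sample(x, 10, replace = TRUE, prob = y)

# rnorm() accepts a vector for sd, so every draw gets a spread tied to
# its own value: sd = 2 for the 10s and sd = 20 for the 100s
noise   <- rnorm(length(z), mean = 0, sd = z / 5)
z_noisy <- z + noise
```

The same idea written in one call is rnorm(length(z), mean = z, sd = z / 5). Unlike z[ind] + c(bin1, bin2), the values stay in their original (unsorted) order.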
> Sorry that I did not explain myself more clearly; I hoped that the example
> showed what I have in mind.
>
> Basically, it is a cumulative particle size distribution, but size is
> expressed as size bins. Normally I have an exact size measurement for each
> particle.
>
> S pozdravem | Best Regards
> RNDr. Petr PIKAL
> Vedoucí Výzkumu a vývoje | Research Manager
> PRECHEZA a.s.
> nábř. Dr. Edvarda Beneše 1170/24 | 750 02 Přerov | Czech Republic
> Tel: +420 581 252 256 | GSM: +420 724 008 364
> mailto:petr.pikal using precheza.cz | https://www.precheza.cz/
>
> Osobní údaje: Informace o zpracování a ochraně osobních údajů obchodních
> partnerů PRECHEZA a.s. jsou zveřejněny na:
> https://www.precheza.cz/zasady-ochrany-osobnich-udaju/ | Information
> about processing and protection of business partner’s personal data are
> available on website:
> https://www.precheza.cz/en/personal-data-protection-principles/
> Důvěrnost: Tento e-mail a jakékoliv k němu připojené dokumenty jsou
> důvěrné a podléhají tomuto právně závaznému prohlášení o vyloučení
> odpovědnosti: https://www.precheza.cz/01-dovetek/ | This email and any
> documents attached to it may be confidential and are subject to the legally
> binding disclaimer: https://www.precheza.cz/en/01-disclaimer/
>
> From: Bert Gunter <bgunter.4567 using gmail.com>
> Sent: Thursday, February 3, 2022 5:10 PM
> To: PIKAL Petr <petr.pikal using precheza.cz>
> Cc: R-help <r-help using r-project.org>
> Subject: Re: [R] generate distribution based on summary data and add
> random noise
>
> If I understand correctly:
> To generate a sample of total size N, generate a uniform sample of size
> p*N for a bin with proportion p?
> ?runif
>
>
> Bert Gunter
>
>
> On Thu, Feb 3, 2022 at 7:52 AM PIKAL Petr <petr.pikal using precheza.cz>
> wrote:
> Hallo all
>
> I have summary data with size bins and percentage below that size.
>
> dat <- structure(list(size = c(10L, 20L, 30L, 40L, 50L, 60L, 70L, 80L,
> 90L, 100L, 110L, 120L, 130L, 140L, 150L, 160L, 170L, 180L, 190L,
> 200L, 250L, 300L, 400L, 500L), percent = c(0L, 0L, 0L, 1L, 1L,
> 2L, 4L, 8L, 13L, 18L, 24L, 31L, 38L, 44L, 50L, 57L, 65L, 72L,
> 76L, 83L, 95L, 98L, 100L, 100L)), class = "data.frame", row.names = c(NA,
> -24L))
>
> # I want to regenerate the original distribution (I know it is better not to
> # do it, but I have no other choice), so I calculated the mids of those bins.
>
> xd <- dat$size - c(5, diff(dat$size)/2)
> xd <- xd[-1]
>
> # I can sample the size bins with probability given by percent.
> Result <- sample(xd, 1000, replace=TRUE, prob=diff(dat$percent)/100)
> plot(ecdf(Result))
>
> # And I can add some noise to it, which is satisfactory for the lower size
> # bins but not enough for the higher size bins.
>
> Result <- sample(xd, 1000, replace=TRUE, prob=diff(dat$percent)/100) +
>   rnorm(1000, mean=0, sd=5)
> plot(ecdf(Result))
>
> I can increase sd to suit the bigger bin sizes, but then the noise is too
> big for the lower bin sizes.
>
> I would like to add smaller random noise to the lower size bins and bigger
> random noise to the higher size bins, which seems to be an easy task, but I
> am stuck on how to do it. It should be somehow proportional to the size value.
> The only way forward I see is to sort the generated result and to use
> something like
>
> + rnorm(1000, mean=xd, sd=xd/10)
> But it is not correct.
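
An inline note on why the line quoted just above misbehaves: xd has only 23 elements, so rnorm() recycles it across the 1000 draws instead of matching each sd to the value that was actually sampled. One possible way around this, sketched below, is to sample bin indices and look up the spread per draw. Tying sd to a quarter of the bin width is purely an assumption for illustration (sd = xd[idx]/10 would be the "proportional to size" variant); idx, width and noise are hypothetical names.

```r
# Sizes and cumulative percentages as posted in the question
size <- c(seq(10, 200, by = 10), 250, 300, 400, 500)
percent <- c(0, 0, 0, 1, 1, 2, 4, 8, 13, 18, 24, 31, 38, 44, 50, 57, 65, 72,
             76, 83, 95, 98, 100, 100)

xd    <- (head(size, -1) + tail(size, -1)) / 2   # bin mids, same as posted xd
width <- diff(size)                              # width of each bin
p     <- diff(percent) / 100                     # probability of each bin

set.seed(1)                                      # arbitrary seed
# Sample bin indices instead of values, so each draw remembers its bin
idx <- sample(seq_along(xd), 1000, replace = TRUE, prob = p)

# One sd per draw, looked up from the bin that draw came from
# (sd = width / 4 is an illustrative assumption, not a recommendation)
noise  <- rnorm(length(idx), mean = 0, sd = width[idx] / 4)
Result <- xd[idx] + noise
```

Narrow bins (width 10) then get sd = 2.5, while the 100-wide bin gets sd = 25, so the noise grows with the bin rather than being constant.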
>
> I'd appreciate any hint on how to add random noise to values in an ordered
> manner.
>
> Best regards.
> Petr
>