[R] how to generate random data from an empirical distribution
Marshall Feldman
marsh at uri.edu
Tue Jul 27 16:35:34 CEST 2010
On 7/27/2010 6:00 AM, r-help-request at r-project.org wrote:
> Date: Mon, 26 Jul 2010 11:36:29 -0700 (PDT)
> From: xin wei<xinwei at stat.psu.edu>
> To:r-help at r-project.org
> Subject: [R] how to generate random data from an empirical
> distribution
> Message-ID:<1280169389379-2302716.post at n4.nabble.com>
> Content-Type: text/plain; charset=us-ascii
>
>
> Hi, this is more a statistical question than an R question, but I do want to
> know how to implement this in R.
> I have 10,000 data points. Is there any way to generate an empirical
> probability distribution from them (the problem is that I do not know which
> distribution the data follow: normal, beta?). My ultimate goal is to
> generate an additional 20,000 data points from this empirical distribution
> created from the existing 10,000 data points.
> Thank you all in advance.
Ah! This brings back memories of the halcyon days of my youth when, as a
junior in college, I took a course in introductory probability theory
around this time during the summer in preparation for working as a co-op
student the coming fall.
Conceptually, why not treat your empirical sample as an "urn" with
10,000 items? Then take a sample of 20,000 by sampling with equal
probabilities and with replacement (otherwise you'll run out of cases
before 20,000). Remember that all the common distributions (normal, etc.)
either were derived because they fit certain common situations (e.g.,
binomial), are of particular use (e.g., Student's t), can be derived
from other distributions (e.g., normal and the Central Limit Theorem),
or some combination of such things. In other words, whether or not an
empirical sample fits one of them is always contingent, although
understanding any underlying processes that generate the sample might
point in the direction of certain distributions over others.
Nonetheless, for something like a Monte Carlo simulation, knowledge of
an underlying distribution is not necessary.
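In R, the urn idea might look something like the sketch below. The vector name `observed` and the gamma stand-in data are my own assumptions for illustration; substitute your actual 10,000 values.

```r
# Stand-in for the 10,000 observed values (an assumption for illustration);
# replace this with your real data vector.
set.seed(42)
observed <- rgamma(10000, shape = 2, rate = 0.5)

# Draw 20,000 values from the empirical distribution: every observed value
# is equally likely, and sampling is done with replacement.
simulated <- sample(observed, size = 20000, replace = TRUE)

# Every simulated value is one of the original observations.
all(simulated %in% observed)

# The two sets of quantiles should be close to each other.
quantile(observed, c(0.05, 0.5, 0.95))
quantile(simulated, c(0.05, 0.5, 0.95))
```

Note that this only ever reproduces values already in the sample, so the simulated data cannot go beyond the observed minimum and maximum.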
Also remember that many things in statistics were developed largely
because they made certain problems mathematically tractable. (Hence, for
example, the large number of situations involving independent,
identically distributed random samples or the popularity of ordinary
least-squares regression.) Today, most of us have more computing power
at our desks than entire mainframe computing centers had a few decades
ago. So in many instances, we don't need no stinkin' complex formulas
anymore.
If you suspect the distribution corresponds to one of the mathematically
studied distributions, why not fit a curve to a plot of your data points
and see if it looks familiar? Then do some kind of goodness-of-fit test
to see if the theoretical distribution is a reasonable approximation.
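As a rough sketch of that last suggestion, assuming (purely for illustration) that the plot suggests a gamma shape: `fitdistr()` from the MASS package estimates the parameters, and `ks.test()` is one possible goodness-of-fit check. The stand-in data and the choice of a gamma candidate are my assumptions, not something known about your data.

```r
library(MASS)  # provides fitdistr()

# Stand-in for the observed data (an assumption for illustration).
set.seed(42)
observed <- rgamma(10000, shape = 2, rate = 0.5)

# Look at the shape of the data: histogram on the density scale
# with a kernel density estimate drawn over it.
hist(observed, breaks = 50, freq = FALSE)
lines(density(observed))

# Fit a candidate distribution by maximum likelihood (gamma is hypothetical).
fit <- fitdistr(observed, "gamma")
shape_hat <- fit$estimate[["shape"]]
rate_hat  <- fit$estimate[["rate"]]

# Overlay the fitted density on the histogram for a visual comparison.
curve(dgamma(x, shape = shape_hat, rate = rate_hat), add = TRUE, lty = 2)

# Kolmogorov-Smirnov test against the fitted gamma distribution.
ks <- ks.test(observed, "pgamma", shape = shape_hat, rate = rate_hat)
ks$p.value
```

One caveat: because the parameters were estimated from the same data being tested, the Kolmogorov-Smirnov p-value is only approximate, so treat it as a rough guide rather than an exact test.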
--
Dr. Marshall Feldman, PhD
Director of Research and Academic Affairs
Center for Urban Studies and Research
<http://www.uri.edu/prov/research/urbanstudies.html>
The University of Rhode Island <http://www.uri.edu>
email: marsh @ uri .edu (remove spaces) <mailto:marsh%20%5C%20uri%20.edu>