Bert Gunter
gunter.berton at gene.com
Tue Apr 28 09:21:38 CEST 2015
... Realizing, of course, that after such data dredging, any subsequent
inference is highly biased.
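
A small simulation makes the point concrete. This is only a sketch on made-up data, not the poster's: every predictor is pure noise, and the sizes n, p and n_sim are arbitrary assumptions. Picking the best-looking predictor after the fact still gives a "significant" p-value far more often than the nominal 5%.

```r
# Sketch: smallest p-value among p pure-noise predictors, repeated n_sim times.
# n, p and n_sim are arbitrary choices for illustration.
set.seed(1)
n_sim <- 500   # number of simulated data sets
n     <- 50    # observations per data set
p     <- 10    # candidate predictors, none related to y

min_p <- replicate(n_sim, {
  y <- rnorm(n)
  X <- matrix(rnorm(n * p), nrow = n, ncol = p)
  # test each predictor against y and keep only the smallest p-value
  min(sapply(seq_len(p), function(j) cor.test(X[, j], y)$p.value))
})

# Share of data sets where the "best" predictor looks significant at 5%.
# If the p tests were independent this would be about 1 - 0.95^10, i.e. 0.40.
mean(min_p < 0.05)
```

So a p-value reported for a model or distribution that was chosen by looking at the same data should not be read at face value.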
On Tuesday, April 28, 2015, Jim Lemon <drjimlemon at gmail.com> wrote:
> Hi Lalitha,
> If you want to find a reasonable model distribution for your data, try
> plotting the histogram of the variable you want to predict and comparing
> it to the density curves of the distributions that you think will
> fit. For example:
>
> # plot a histogram of evenly spaced (uniform-like) values on the density scale
> x <- seq(1,10,length.out=100)
> hist(x,freq=FALSE,ylim=c(0,0.4))
> # overlay a normal density function with the same mean (sd = 1)
> curve(dnorm(x,mean=5.5),add=TRUE)
>
> Not a very good fit, but:
>
> y <- rnorm(100,5.5)
> hist(y,freq=FALSE)
> curve(dnorm(x,mean=5.5),add=TRUE)
>
> Much better. You can then perform a "goodness of fit" test if you need
> it to justify your choice of distribution. In most cases, you will
> have to choose a "family" (error distribution plus link function) to use
> in a generalized linear model (glm).
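
As a rough sketch of both steps in the quoted paragraph: shapiro.test() is one goodness-of-fit test (for normality only), and glm() takes the family as an argument. Everything here is simulated. The Mileage and Weight names are borrowed from the poster's header, but the values and the coefficients 4.3 and -0.0004 are my own assumptions, as is the choice of a Gamma family with a log link.

```r
# Sketch on simulated data; none of these numbers come from the poster's file.
set.seed(42)
x_norm <- rnorm(100, mean = 5.5)          # stand-in for a roughly normal variable
x_flat <- seq(1, 10, length.out = 100)    # evenly spaced values, clearly not normal

# Shapiro-Wilk test of normality: a small p-value is evidence against normality
shapiro.test(x_norm)$p.value
shapiro.test(x_flat)$p.value

# A glm with an explicit family: Gamma errors and a log link for a positive,
# right-skewed response (assumed here purely for illustration)
sim <- data.frame(Weight = runif(100, 1800, 3900))
mu  <- exp(4.3 - 0.0004 * sim$Weight)     # assumed mean function
sim$Mileage <- rgamma(100, shape = 20, rate = 20 / mu)

fit <- glm(Mileage ~ Weight, family = Gamma(link = "log"), data = sim)
coef(summary(fit))
```

The family describes how the response varies around its mean, and the link describes how that mean depends on the predictors; both are choices to justify from the subject matter, not only from a plot.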
>
> Another approach is to use a non-parametric test if one gives an
> appropriate answer to your question.
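
For instance, if the question were whether Mileage differs between car types, a Kruskal-Wallis test would avoid choosing a distribution at all. A minimal sketch, again on simulated values (the group means, the spread and the three Type levels are assumptions, not the poster's data):

```r
# Sketch: rank-based comparison of Mileage across three assumed car types.
set.seed(7)
grp <- data.frame(
  Type    = factor(rep(c("Small", "Compact", "Large"), each = 20)),
  Mileage = c(rnorm(20, mean = 32, sd = 3),   # Small
              rnorm(20, mean = 26, sd = 3),   # Compact
              rnorm(20, mean = 20, sd = 3))   # Large
)

kw <- kruskal.test(Mileage ~ Type, data = grp)
kw$p.value   # a small value suggests the groups do not share one distribution
```

For two numeric variables, cor.test(x, y, method = "spearman") plays a similar role.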
>
> Jim
>
>
> On Tue, Apr 28, 2015 at 5:07 AM, David Winsemius <dwinsemius at comcast.net> wrote:
> >
> > On Apr 27, 2015, at 10:50 AM, Lalitha Viswanathan wrote:
> >
> >> Hi
> >> I have a dataset as below
> >> Price  Country  Reliability  Mileage  Type   Weight  Disp.  HP
> >> 8895   USA      4            33       Small  2560    97     113
> >> (Hundreds of rows)
> >>
> >> I am trying to find the best possible distribution to use, to find p-values
> >> and compute which factors most influence efficiency.
> >
> > "Finding p-values" is a task that requires research questions. You
> > obviously have some sort of meaning attached to the word "efficiency" but
> > have not stated what it is. This appears to be a request for a statistical
> > tutorial on a topic that has not been described. (And if this is course
> > homework, then it is off-topic for r-help.)
> >
> >>
> >> Any starting points for the functions I could use, or similar examples I
> >> could follow, would be a start.
> >> I am a relative novice at R having used it many years ago and am now
> >> getting back to it.
> >> So looking for pointers
> >>
> >> Thanks
> >>
> >
> > The Posting Guide suggests that you create a small example in R code and
> > describe your question more clearly (if it's not homework.)
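
One common way to build such an example is to paste the output of dput() for a few rows, so that others can recreate the data exactly. A sketch using the column names from the original post: only the first row is taken from it, the other two rows are invented placeholders, and the name cars_small is hypothetical.

```r
# Sketch of a small reproducible data set for a posting.
# Row 1 is the row quoted in the post; rows 2 and 3 are made-up placeholders.
cars_small <- data.frame(
  Price       = c(8895, 7402, 6319),
  Country     = c("USA", "USA", "Korea"),
  Reliability = c(4, 2, 4),
  Mileage     = c(33, 33, 37),
  Type        = c("Small", "Small", "Small"),
  Weight      = c(2560, 2345, 1845),
  Disp.       = c(97, 114, 81),
  HP          = c(113, 90, 63)
)

# dput() prints R code that rebuilds the object; paste that output into the post
dput(cars_small)
```

Together with a one-sentence statement of the question (for example, which variable is the response), that is usually enough for list members to run something.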
> >
> >
> > David Winsemius
> > Alameda, CA, USA
> >
>
>
