[R] Fwd: Distribution to use to calculate p values

Bert Gunter gunter.berton at gene.com
Tue Apr 28 09:21:38 CEST 2015


... Realizing, of course, that after such data dredging, any subsequent
inference is highly biased.
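
The point about data dredging can be illustrated with a small simulation. This is only a sketch under stated assumptions: the data are pure noise generated here (not the poster's dataset), and the names n_sims, n_obs, n_pred and false_pos_rate are hypothetical. It picks the "best" of several unrelated predictors by smallest p-value and records how often that selected p-value falls below 0.05.

```r
# Simulated noise only: no predictor is truly related to y.
set.seed(123)
n_sims <- 500   # number of simulated data sets
n_obs  <- 50    # observations per data set
n_pred <- 10    # candidate predictors screened per data set

min_p <- replicate(n_sims, {
  y <- rnorm(n_obs)
  X <- matrix(rnorm(n_obs * n_pred), nrow = n_obs, ncol = n_pred)
  # Screen every predictor and keep only the smallest p-value
  min(apply(X, 2, function(x) cor.test(x, y)$p.value))
})

# Share of data sets where the selected predictor looks "significant"
false_pos_rate <- mean(min_p < 0.05)
print(false_pos_rate)
```

With ten independent tests at the 5% level, the chance of at least one spurious "significant" result is about 1 - 0.95^10, roughly 0.40, so the p-value reported after selection no longer means what it claims.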

Cheers,
Bert

On Tuesday, April 28, 2015, Jim Lemon <drjimlemon at gmail.com> wrote:

> Hi Lalitha,
> If you want to find a reasonable model distribution for your data, try
> plotting the histogram of the variable you want to predict and compare
> this to the density curves of the distributions that you think will
> fit. So for example:
>
> # plot a histogram of a uniform distribution
> hist(seq(1,10,length.out=100))
> # overlay a normal density function with the same mean
> lines(seq(1,10,length.out=91),dnorm(seq(1,10,by=0.1),mean=5.5)*30)
>
> Not a very good fit, but:
>
> hist(rnorm(100,5.5))
> lines(seq(1,10,length.out=91),dnorm(seq(1,10,by=0.1),mean=5.5)*90)
>
> Much better. You can then perform a "goodness of fit" test if you need
> it to justify your choice of distribution. In most cases, you will
> have to choose a "family" (an error distribution with its link function)
> to use in a generalized linear model (glm).
>
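
A minimal sketch of the two steps just described, assuming simulated data rather than the original dataset. The variables x and dat, the Gamma family, and the log link are illustrative choices, not something the thread prescribes.

```r
# Hypothetical data, simulated for illustration only.
set.seed(42)
x <- rnorm(100, mean = 5.5, sd = 1)

# Goodness of fit: Shapiro-Wilk test of normality.
# A small p-value would be evidence against normality.
sw <- shapiro.test(x)

# Kolmogorov-Smirnov test against a fully specified normal distribution.
# If mean and sd were estimated from x itself, the p-value would be unreliable.
ks <- ks.test(x, "pnorm", mean = 5.5, sd = 1)
print(c(shapiro = sw$p.value, ks = ks$p.value))

# GLM with an explicit family: a positive, right-skewed response
# modelled with a Gamma error distribution and a log link.
dat <- data.frame(weight = runif(100, 1800, 3800))
mu <- 60 - 0.01 * dat$weight                  # assumed true mean, always > 0
dat$mileage <- rgamma(100, shape = 50, rate = 50 / mu)
fit <- glm(mileage ~ weight, family = Gamma(link = "log"), data = dat)
print(summary(fit)$coefficients)
```

The family argument is where the distributional choice enters the model, so a histogram or goodness-of-fit check should guide it rather than replace it.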
> Another approach is to use a non-parametric test if one gives an
> appropriate answer to your question.
>
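
As a sketch of the non-parametric route, the example below compares a response across groups with a Kruskal-Wallis test, which does not assume a particular distribution. The data frame sim_cars and its values are simulated assumptions that only borrow the Mileage and Type column names from the original post.

```r
# Hypothetical data: three car types with different typical mileage.
set.seed(1)
sim_cars <- data.frame(
  Type    = rep(c("Small", "Medium", "Large"), each = 20),
  Mileage = c(rnorm(20, mean = 32, sd = 3),
              rnorm(20, mean = 25, sd = 3),
              rnorm(20, mean = 20, sd = 3))
)

# Rank-based test of whether Mileage differs between the Type groups
kw <- kruskal.test(Mileage ~ Type, data = sim_cars)
print(kw)
```

For two groups, wilcox.test() plays the same role.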
> Jim
>
>
> On Tue, Apr 28, 2015 at 5:07 AM, David Winsemius
> <dwinsemius at comcast.net> wrote:
> >
> > On Apr 27, 2015, at 10:50 AM, Lalitha Viswanathan wrote:
> >
> >> Hi
> >> I have a dataset as below
> >> Price Country Reliability Mileage Type Weight Disp. HP
> >>
> >>
> >> 8895 USA 4 33 Small 2560 97 113
> >> (Hundreds of rows)
> >>
> >> I am trying to find the best possible distribution to use, to find
> >> p-values and compute which factors most influence efficiency.
> >
> > "Finding p-values" is a task that requires research questions. You
> > obviously have some sort of meaning attached to the word "efficiency" but
> > have not stated what it is. This appears to be a request for a statistical
> > tutorial on a topic that has not been described. (And if this is course
> > homework, then it is off-topic for r-help.)
> >
> >>
> >> Any starting points for the functions I could use, or similar examples I
> >> could follow, would be a start.
> >> I am a relative novice at R having used it many years ago and am now
> >> getting back to it.
> >> So looking for pointers
> >>
> >> Thanks
> >>
> >
> > The Posting Guide suggests that you create a small example in R code and
> > describe your question more clearly (if it's not homework.)
> >
> >> ______________________________________________
> >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >
> > David Winsemius
> > Alameda, CA, USA
> >


-- 

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge is
certainly not wisdom."
Clifford Stoll
