[R] Normalizing data

Ramon Diaz-Uriarte rdiaz02 at gmail.com
Thu Nov 15 23:39:45 CET 2007


Maybe Joren means that the y axis has values greater than 1? If that
is the case, that is certainly not evidence of any problem; the
density can have values larger than 1 and still integrate to 1. (And,
just as a silly example, try "dnorm(0, mean = 0, sd = 0.1)").

Best,

R.



On Nov 15, 2007 11:05 PM, jim holtman <jholtman at gmail.com> wrote:
> I am not sure what you mean when you say it does not integrate to 1.
> Here are a couple of cases, and it seems fine to me:
>
> > x <- density(1:30)
> > str(x)
> List of 7
>  $ x        : num [1:512] -11.0 -10.9 -10.8 -10.7 -10.6 ...
>  $ y        : num [1:512] 6.66e-05 7.22e-05 7.84e-05 8.49e-05 9.20e-05 ...
>  $ bw       : num 4.01
>  $ n        : int 30
>  $ call     : language density.default(x = 1:30)
>  $ data.name: chr "1:30"
>  $ has.na   : logi FALSE
>  - attr(*, "class")= chr "density"
> > plot(x)
> > sum(diff(x$x) * (head(x$y,-1) + tail(x$y,-1))/2)  # integrate
> [1] 1.000823
> > x <- density(rnorm(1000))
> > sum(diff(x$x) * (head(x$y,-1) + tail(x$y,-1))/2) # integrate
> [1] 1.000974
> >
>
>
> On Nov 15, 2007 2:47 PM, Joren Heit <jorenheit at gmail.com> wrote:
> > Hello,
> >
> > I have a data set of about 300.000 measurements made by an STM which should
> > apporximately fix a normal (Gaussian) distribution.
> > I have imported the data in R and used plot(density()) to get a nice plot of
> > the distribution which in fact looks like a real Gaussian.
> > However, the integral over the surface is not equal to one (I know since
> > some of the plots extend to numbers greater then 1). Is there a way to
> > normalize the data so the density function will actualy yield the
> > probability of x (a height in my case)?
> > This is my code so far:
> >
> > #Input path
> > path <- "G:\\C\\Data txt\\1au300.txt"
> >
> > #Dataverwerking
> > data <- read.table(path, header=TRUE)
> > rows <- length(data$height)
> > height <- data$height[1:rows]
> > dens <-density(height)
> >
> > mean <- mean(height)
> > sd <- sd(height)
> > min <- min(hnorm)
> > max <- max(hnorm)
> >
> > #Plot
> > par(new=FALSE)
> > curve(dnorm(x,m=mean,sd=sd),from=min,to=max, xlab="", ylab="", col="white",
> > lwd=2)
> > points(dens, type="h", col="grey" )
> > par(new=TRUE)
> > curve(dnorm(x,m=mean,sd=sd),from=min,to=max, xlab="Height (nm)",
> > ylab="Density", lwd=2, col="darkred")
> >
> >
> > Thanks
> >
> >        [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem you are trying to solve?
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Ramon Diaz-Uriarte
Statistical Computing Team
Structural Biology and Biocomputing Programme
Spanish National Cancer Centre (CNIO)
http://ligarto.org/rdiaz



More information about the R-help mailing list