[R] how to get r-squared for a predefined curve or function with "other" data points
David Winsemius
dwinsemius at comcast.net
Thu Feb 16 23:16:29 CET 2012
On Feb 16, 2012, at 11:43 AM, protoplast wrote:
> hello mailing list!
> i still consider myself an R beginner, so please bear with me if my
> questions seems strange.
>
> i'm in the field of biology, and have done consecutive hydraulic
> conductivity measurements in three parallels ("Sample"), resulting
> in three
> sets of conductivity values ("PLC" for percent loss of conductivity,
> relative to 100%) at multiple pressures ("MPa").
>
> ---
> Sample MPa PLC
> 1 -0.3498324 0.000000
> 1 -1.2414770 15.207821
> 1 -1.7993249 23.819995
> 1 -3.0162866 33.598570
> 1 -3.5184321 46.376933
> 1 -3.9899791 67.532226
> 1 -4.2731145 89.735541
> 1 -4.7597244 99.836239
> 2 -0.2754036 0.000000
> 2 -1.2912619 12.476132
> 2 -1.5128974 13.543273
> ...
> ---
>
> since each sample is a statistical unit, i have fitted each sample-
> subset to
> a sigmoid curve:
>
> ---
> plot(
> NA,
> NA,
> main="",
> xlim=c(-20,0),
> ylim=c(0,100),
> xlab = "water potential [MPa]",
> ylab = "percent loss of conductivity [%]",
> xaxp = c(0,-20,4),
> yaxp = c(0,100,5),
> tck = 0.02,
> mgp = c(1.5,0.1,0),
> )
>
> for(i in 1:3){
> x <- subset(curvedata,Group == i)$MPa
> y <- subset(curvedata,Group == i)$PLC
> name <- subset(curvedata,Group == i)$Sample
> points(x,y)
>
> vlc <- nls(y ~ 100/(1+exp(a*(x-b))), start=c(a=1, b=-3),
> data=list(x,y))
>
> curve(100/(1+exp(coef(vlc)[1]*(x-coef(vlc)[2]))), col=1, add = TRUE)
>
> Rsquared <- 1 - var(residuals(vlc))/var(y)
>
> summarizeall[i ,"Run"] <- i
> summarizeall[i ,"Sample"] <- name[1]
> summarizeall[i ,"a"] <- coef(vlc)[1]
> summarizeall[i ,"b"] <- coef(vlc)[2]
> summarizeall[i ,"R2"] <- Rsquared
>
> listnow <- data.frame(list(Run = c(i),Sample = c(name[1]), a =
> c(coef(vlc)[1]), b = c(coef(vlc)[2]), R2 = c(Rsquared)))
> print(listnow)
>
> i <- i+1
> }
> ---
>
> ...and get three slightly different curves with three different
> estimatinos
> of fit (r², Rsquared).
>
> ---
>> summarizeall
> Sample a b R2
> 1 1 1.388352 -3.277755 0.9379886
> 2 2 1.800007 -3.363075 0.9327164
> 3 3 1.736857 -2.743972 0.9882998
>
>> average
> Var n a b R2
> 1 Mean 3 1.6417389 -3.1282673 NA
> 2 SE . 0.1279981 0.1937197 NA
> ---
>
> by averaging parameters a and b of the curve, i create a "mean
> curve" that
> is added to the plot (red curve in the attached image).
>
> http://r.789695.n4.nabble.com/file/n4394609/conductivity-curve.gif
>
> ---
> meana <- average[1,"a"]
> meanb <- average[1,"b"]
> curve(), col=2, lwd=2, add = TRUE)
> ---
>
> and now here's my problem:
> i'd like to calculate R squared for all points on that mean curve.
> since i have to average the curve parameters, i loose the curve's
> residuals
> that are stored in my variable vlc (the result of the nls function)
> for
> every sample.
> just fitting one curve to all the data points is not good enough.
So just calculate them?
# pseudo-code: residual= actual - predicted
gresid <- curvedata$PLC - 100/(1+exp(meana*(curvedata$MPa-meanb))
If you are convinced that your formula for R^2 makes sense and this
practice is generally accepted in your domain, then you can apply it
across the whole dataset. I would have thought that a single
regression model built with nlmer might have been more statistically
sound. (But this is a bit outside my domain of comfort for giving
advice.)
>
> an extensive google search over several days hasn't gotten me
> anywhere, but
> maybe someone here can help me?
>
> is there an efficient way to calculate r squared for a predefined
> function
> with "unrelated" data points?
> (unrelated as in "not used directly for fitting")
>
>
> thanks in advance
> markus
David Winsemius, MD
West Hartford, CT
More information about the R-help
mailing list