[R] Curve Fitting/Regression with Multiple Observations

kMan kchamberln at gmail.com
Fri Apr 30 16:54:13 CEST 2010


Dear Joseph,

I have had a similar experience with replies. Andy's assessment of the signal-to-noise ratio on the list is, I believe, quite accurate, and quite elegant. My experience has generally been that R-replies get better with age.

I welcome the feedback you just provided.

Sincerely,
KeithC.

-----Original Message-----
From: Kyeong Soo (Joseph) Kim [mailto:kyeongsoo.kim at gmail.com] 
Sent: Friday, April 30, 2010 4:10 AM
To: kMan
Cc: r-help at r-project.org
Subject: Re: [R] Curve Fitting/Regression with Multiple Observations

Dear Keith,

Thanks for the suggestion and taking your time to respond to it.

But you seem to have misunderstood something, and it seems that you did not read all of my previous e-mails.
For instance, can a hand-drawn curve give you an inverse function (analytically or numerically) so that you can find an x value given a y value (not just for one, but for hundreds of points)?
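
A minimal sketch of what such a numerical inverse could look like in R, assuming a monotone fitted curve. The data, the log-linear exponential model, and the helper names f and f_inv are all made up for illustration and are not taken from the actual simulations:

```r
# Hypothetical simulation output: three replications at each x (made-up data).
set.seed(1)
x <- rep(1:10, each = 3)
y <- 0.5 * exp(0.3 * x) * exp(rnorm(length(x), sd = 0.05))

# Fit y = a * exp(b * x) by least squares on the log scale (an assumed model).
fit <- lm(log(y) ~ x)

# Fitted curve as a plain function of x.
f <- function(x) as.numeric(exp(predict(fit, newdata = data.frame(x = x))))

# Numerical inverse: for each target y, solve f(x) - y = 0 on [lower, upper].
f_inv <- function(y, lower = 1, upper = 10) {
  vapply(y, function(yi) {
    uniroot(function(x) f(x) - yi, lower = lower, upper = upper, tol = 1e-9)$root
  }, numeric(1))
}

# Round trip: y values taken from the curve should map back to their x values.
x_known <- c(2.5, 4, 7.25)
x_back <- f_inv(f(x_known))
print(cbind(x_known, x_back))
```

Because f_inv is vectorized over y, the same call handles hundreds of target values. uniroot does require that each target lies between f(lower) and f(upper), which is why the curve has to be monotone on that interval.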

As for the statistical inferences, I admit that my communications were not very clear. My intention is to get a smoothed curve from the simulation data in a way that is as statistically meaningful as possible for my intended use of the resulting curve.

As said before, I don't know all the theoretical details behind the regression and curve fitting functions available in R (I do know the basics, though, as someone with a PhD in Elec. Eng., contrary to someone's assessment), but I am doing my best to catch up by reading textbooks and manuals, and posting this question to the list is definitely a way to learn from the many experts and advanced users of R.

By the way, I wonder why most of the responses I've received from this list are so cynical (or skeptical?) and, in some sense, delivered in a quite arrogant way. It's very hard to imagine that one would receive such responses in my own areas of computer simulation and optical communications/networking. If a newbie asks the list a question that does not make much sense or is just another FAQ, it is usually ignored (i.e., no response) because we are all too busy to deal with it. Sometimes, though, a kind soul (like Gabor) takes his/her own valuable time and doesn't mind explaining all the details, starting from the basics.

Again, what I want to hear from the list is the proper use of the regression/curve fitting functions of R for my simulation data with replications: should they be applied after taking means, or directly to the replicated data? So far I haven't heard anyone specifically address my question, although there were several seemingly related suggestions.
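
For plain least squares the two options can be compared directly. The sketch below, which uses made-up data and an arbitrary quadratic model (both assumptions, not the actual simulation output), fits once on all replications and once on the per-x means weighted by the number of replications:

```r
# Hypothetical replicated data: three runs at each of eight x values.
set.seed(42)
d <- data.frame(x = rep(1:8, each = 3))
d$y <- 1 + 0.8 * d$x - 0.03 * d$x^2 + rnorm(nrow(d), sd = 0.2)

# (a) Fit directly on all replications.
fit_raw <- lm(y ~ poly(x, 2, raw = TRUE), data = d)

# (b) Fit on per-x means, weighted by the number of replications at each x.
means <- aggregate(y ~ x, data = d, FUN = mean)
means$n <- as.vector(table(d$x))
fit_mean <- lm(y ~ poly(x, 2, raw = TRUE), data = means, weights = n)

# Largest absolute difference between the two sets of coefficients.
coef_diff <- max(abs(coef(fit_raw) - coef(fit_mean)))
print(coef_diff)

# The residual degrees of freedom differ, so standard errors differ too.
print(c(raw = df.residual(fit_raw), means = df.residual(fit_mean)))
```

The coefficients agree up to rounding because the raw residual sum of squares equals the weighted sum of squares about the means plus a term that does not depend on the curve. The same identity holds for a penalized least-squares smoother at a fixed smoothing parameter. What changes is the residual degrees of freedom (21 versus 5 here), hence the standard errors, and possibly an automatically selected smoothing parameter.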

Regards,
Joseph

On Fri, Apr 30, 2010 at 4:25 AM, kMan <kchamberln at gmail.com> wrote:
> Dear Joseph,
>
> If you do not need to make any inferences, that is, you just want it to look pretty, then drawing a curve by hand is as good a solution as any. Plus, there is no reason for expert testimony to say that the curve does not mean anything.
>
> Sincerely,
> KeithC.
>
> -----Original Message-----
> From: Kyeong Soo (Joseph) Kim [mailto:kyeongsoo.kim at gmail.com]
> Sent: Tuesday, April 27, 2010 2:33 PM
> To: Gabor Grothendieck
> Cc: r-help at r-project.org
> Subject: Re: [R] Curve Fitting/Regression with Multiple Observations
>
> Frankly speaking, I am not looking for such a framework.
>
> The system I'm studying is a communication network (like an M/M/1 queue, but way too complicated to analyze mathematically using classical queueing theory) and the conclusion I want to make is qualitative rather than quantitative -- a high-level comparative study of various network architectures based on the "equivalence principle" (a concept specific to networking, not in the general sense).
>
> What I want in this regard is a smooth, monotonically increasing (hence one-to-one) function built out of simulation data because later in my processing, I need an inverse function of the said curve to find an x value given the y value. That was, in fact, the reason I used exponential (i.e., monotonically increasing) curve fitting.
>
> Even though I don't need a statistical inference framework for my work, I want to make sure that my use of regression/curve fitting techniques with my simulation data (as a tool for getting the mentioned curve) is proper and a usual practice among experts like you.
>
> To get an answer to my question, I dug through the Internet a lot but have found no clear explanation so far.
>
> Your suggestions and examples (always!) are much appreciated, but I am still not sure that using those regression procedures with the kind of data I described is the right way to go.
>
> Again, many thanks for your prompt and kind answers,
> Joseph
>
>
> On Tue, Apr 27, 2010 at 8:46 PM, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
>> If you are looking for a framework for statistical inference you 
>> could look at additive models as in the mgcv package, which has a
>> book associated with it if you need more info, e.g.
>>
>> library(mgcv)
>> fm <- gam(dist ~ s(speed), data = cars)
>> summary(fm)
>> plot(dist ~ speed, cars, pch = 20)
>> fm.ci <- with(predict(fm, se = TRUE), cbind(0, -2*se.fit, 2*se.fit) + c(fit))
>> matlines(cars$speed, fm.ci, lty = c(1, 2, 2), col = c(1, 2, 2))
>>
>>
>> On Tue, Apr 27, 2010 at 3:07 PM, Kyeong Soo (Joseph) Kim 
>> <kyeongsoo.kim at gmail.com> wrote:
>>> Hello Gabor,
>>>
>>> Many thanks for providing actual examples for the problem!
>>>
>>> In fact I know how to apply and generate plots using various R 
>>> functions including loess, lowess, and smooth.spline procedures.
>>>
>>> My question, however, is whether applying those procedures directly 
>>> on the data with multiple observations/duplicate points(?) is on a
>>> sound basis or not.
>>>
>>> Before asking my question on the list, I checked the smooth.spline
>>> manual page and found a mention of the "cv" option in connection with
>>> duplicate points, but I'm not sure "duplicate points" in the manual
>>> has the same meaning as "multiple observations" in my case. To me, 
>>> the manual seems a bit unclear in this regard.
>>>
>>> Looking at the "cars" data, I found it has multiple points with the same
>>> "speed" but different "dist", which is exactly what I mean by 
>>> multiple observations, but am still not sure.
>>>
>>> Regards,
>>> Joseph
>>>
>>>
>>> On Tue, Apr 27, 2010 at 7:35 PM, Gabor Grothendieck 
>>> <ggrothendieck at gmail.com> wrote:
>>>> This will compute a loess curve and plot it:
>>>>
>>>> example(loess)
>>>> plot(dist ~ speed, cars, pch = 20)
>>>> lines(cars$speed, fitted(cars.lo))
>>>>
>>>> Also this directly plots it but does not give you the values of the 
>>>> curve separately:
>>>>
>>>> library(lattice)
>>>> xyplot(dist ~ speed, cars, type = c("p", "smooth"))
>>>>
>>>>
>>>>
>>>> On Tue, Apr 27, 2010 at 1:30 PM, Kyeong Soo (Joseph) Kim 
>>>> <kyeongsoo.kim at gmail.com> wrote:
>>>>> I recently came to realize the true power of R for statistical 
>>>>> analysis -- mainly for post-processing of data from large-scale 
>>>>> simulations -- and have been converting many of my existing
>>>>> Python (SciPy) scripts to ones based on R and/or Perl.
>>>>>
>>>>> In the middle of this conversion, I revisited the problem of curve 
>>>>> fitting for simulation data with multiple observations resulting 
>>>>> from repetitions.
>>>>>
>>>>> In the past, I first processed simulation data (i.e., multiple y's 
>>>>> from repetitions) to get a mean with a confidence interval for a 
>>>>> given value of x (independent variable) and then applied spline 
>>>>> procedure for those mean values only (i.e., unique pairs of (x_i,
>>>>> y_i) for i=1, 2, ...) to get a smoothed curve. Because of rather
>>>>> large confidence intervals, however, the resulting curves were
>>>>> hardly smooth enough for my purpose, so I had to fix the function to
>>>>> an exponential and use least squares to fit its parameters to the data.
>>>>>
>>>>> From a plot with confidence intervals, it's rather easy for one to
>>>>> visually and manually(?) figure out a smoothed curve for it.
>>>>> So I'm thinking right now of directly applying spline (or whatever 
>>>>> regression procedures for this purpose) to the simulation data 
>>>>> with repetitions rather than means. The simulation data in this 
>>>>> case looks like this (assuming three repetitions):
>>>>>
>>>>> # x    y
>>>>> 1      1.2
>>>>> 1      0.9
>>>>> 1      1.3
>>>>> 2      2.2
>>>>> 2      1.7
>>>>> 2      2.0
>>>>> ...      ....
>>>>>
>>>>> So my idea is to let spline procedure handle the fluctuations in 
>>>>> the data (i.e., in repetitions) by itself.
>>>>> But I wonder whether this direct application of spline procedures 
>>>>> for data with multiple observations makes sense from the 
>>>>> statistical analysis (i.e., theoretical) point of view.
>>>>>
>>>>> It may be a stupid question and quite obvious to many, but 
>>>>> personally I don't know where to start.
>>>>> It would be greatly appreciated if anyone could shed some light on
>>>>> this.
>>>>>
>>>>> Many thanks in advance,
>>>>> Joseph
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>>>
>>>
>>
>


