[R] A model for possibly periodic data with varying amplitude [repost, much edited]

Tue Aug 15 23:39:57 CEST 2006

Hi dear R community,

I have up to 12 measures of a protein for each of 6 patients, taken
every two or three days. The pattern of the protein looks periodic,
but the height of the peaks is highly variable.  I'm testing for
periodicity using a Monte Carlo simulation envelope approach applied
to a cumulative periodogram.  Now I want to predict the location of
the peaks in time.  Of course, the peaks might be occurring on
unmeasured days.

Sadly, an NDA prohibits me from sharing the real data. The data look
something like this:

##################################################################

patient <- data.frame(
	day = c(1, 3, 5, 8, 10, 12, 15, 17, 19, 22, 24, 26),
	protein = c(5, 3, 10, 7, 2, 8, 25, 22, 7, 10, 12, 5)
	)

plot(patient$day, patient$protein, type="b")

# This is my model:

wave.form <-
  deriv3( ~ sin(2*pi*((day-offset)/period + 0.25)) * amplitude + mean,
         c("period", "offset", "amplitude", "mean"),
         function(day, period, offset, amplitude, mean){})

curve(wave.form(x, period=7, offset=2, mean=5, amplitude=4), 
			        from=1, to=30)

require(nlme)

wave.1 <- gnls(protein ~ wave.form(day, period, offset, amplitude, mean),
               start=list(period=7, offset=0, amplitude=10, mean=5),
               weights=varPower(), data=patient)

wave.1

#################################################################

I emphasize that I don't care about the mean or amplitude, just the
offset and the period. The problem for my model is that spikes in the
data, such as at day 15, seem to make fitting the model quite
unstable, (although it is fine in this artificial case).  The best I
can think of right now is to use the "weights=varPower()" model in
gnls(), which I expect will down-weight the extreme spikes.  Often
that model fails to converge.  Then, all I can do is fall back on
modelling the log response.  If I do both, then when the weights model
converges the estimates sometimes differ, and sometimes considerably.

I know that I need more data, but more data are not available.  So,
instead I need to do the best I can :)

I have a half-baked idea of conditionally shrinking the peaks, so that
they all look like the same size - perhaps by discretizing the data
into, say, three possible values defined by quantiles of the response
variable or the estimated slope - but I'm concerned that doing so
risks creating the periodicity that I want to detect.  Also the series
is not necessarily stationary, so I wouild have to smooth it somehow
first.

On the other hand, maybe I can test for periodicity in the original
data, and then fit on some transformed data.  If I reject the null
hypothesis of no periodic oscillation, then is it reasonable to
condition subsequent analysis on the belief that periodicity exists,
and try to determine its characteristics?

I wonder if anyone can suggest an analysis technique or model that
more directly accommodates the varying heights of the peaks?  Or is
this pretty much the best I can do?

Any suggestions will be gratefully received.

Cheers,

Andrew
-- 
Andrew Robinson  
Department of Mathematics and Statistics            Tel: +61-3-8344-9763
University of Melbourne, VIC 3010 Australia         Fax: +61-3-8344-4599
Email: a.robinson at ms.unimelb.edu.au         http://www.ms.unimelb.edu.au