[R] Is R the right choice for simulating first passage times of random walks?

Thu Jul 28 02:24:20 CEST 2011

Top posting cuz hotmail decided not to highlight...

Personally I would tend to use java or c++ for the inner loops
but you could of course later make an R package out of that.
This is especially true if your code will be used elsewhere
in a performance critical system. For example, I wrote some
c++ code for dealing with graphs nothing fancy but it
let me play with some data structure ideas and I could
then build it into standalone programs or perhaps an R
package. 

Many of these things slow down due to memory incoherence
or IO long before you use up the processor. With c++
in principle anyway you have a lot of control over these things.

Once you have your results and want to anlyze them, that's
when I would use R. Dumping simulation samples to a text file
is easy to also lets you use other things like sed/grep or
vi to explore as needed. 

----------------------------------------
From: paulepanter at users.sourceforge.net
To: r-help at r-project.org
Date: Thu, 28 Jul 2011 02:00:13 +0200
Subject: Re: [R] Is R the right choice for simulating first passage times of random walks?

Dear R folks,

Am Donnerstag, den 28.07.2011, 01:36 +0200 schrieb Paul Menzel:

> I need to simulate first passage times for iterated partial sums. The
> related papers are for example [1][2].
>
> As a start I want to simulate how long a simple random walk stays
> negative, which should result that it behaves like n^(-½). My code looks
> like this.
>
> -------- 8< -------- code -------- >8 --------
> n = 100000 # number of simulations
>
> length = 100000 # length of iterated sum
> z = rep(0, times=length + 1)
>
> for (i in 1:n) {
>       x = c(0, sign(rnorm(length)))
>       s = cumsum(x)
>       for (i in 1:length) {
>               if (s[i] < 0 && s[i + 1] >= 0) {
>                       z[i] = z[i] + 1
>               }
>       }
> }
> plot(1:length(z), z/n)
> curve(x**(-0.5), add = TRUE)
> -------- 8< -------- code -------- >8 --------

Of course the program above is not complete, because it only checks for
the first passage from negativ to positive. `if (s[2] < 0) {}` should be
added before the for loop.

> This code already runs for over half an hour on my system¹.
>
> Reading about the for loop [3] it says to try to avoid loops and I
> probably should use a matrix where every row is a sample.
>
> Now my first problem is that there is no matrix equivalent for
> `cumsum()`. Can I use matrices to avoid the for loop?

I mean the inner for loop.

Additionally I wonder if `cumsum` is really faster or if I should sum
the elements by myself and check after every step if the walk gets
non-negative/0. With a length of 1000000 this should save some cycles.
On the other hand adding numbers should be really fast and adding checks
in between could potentially be slower.

> My second question is, is R the right choice for such simulations? It
> would be great when R can also give me a confidence interval(?) and also
> try to fit a curve through the result and give me the rule of
> correspondence(?) [4]. Do you have pointers for those?
>
> I glanced at simFrame [5] and read `?simulate` but could not understand
> it right away and think that this might be overkill.
>
> Do you have any suggestions?

Thanks,

Paul

> ¹ AMD Athlon(tm) X2 Dual Core Processor BE-2350, 2,1 GHz
>
>
> [1] http://www-stat.stanford.edu/~amir/preprints/irw.ps
> [2] http://arxiv.org/abs/0911.5456
> [3] http://cran.r-project.org/doc/manuals/R-intro.html#Repetitive-execution
> [4] https://secure.wikimedia.org/wikipedia/en/wiki/Function_(mathematics)
> [5] http://finzi.psych.upenn.edu/R/library/simFrame/html/runSimulation.html

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.