[R] can you tell what .Random.seed was?

Fri May 15 20:39:22 CEST 2009

G. Jay Kerns wrote:
> 
> I want it to be *difficult* for students to figure out the seed and
> automatically generate solutions on their own.

Hmmm.... Would it really be a bad thing if someone reverse engineered 
this to generate answers given the problem set?  If it's hard enough to 
do that, it'd be more worth solving than the given problem set.  I call 
that "extra credit".

> a brute force search of set.seed() is really
> pretty easy and fast... even for students at this level.

Either you're misunderstanding Stavros' benchmark results, or I am. 
Could easily be the latter...I'm an R newbie.

As far as I can tell, the inner part of the loop does very little.  If 
that's right, Stavros is saying it will take 18 hours to try every 
possible seed when the algorithm based on that seed takes almost no time 
to run.  But, if generating each problem set takes, say, a minute, it 
will take 4.7 million years to generate a complete rainbow table when 
there are 2^32 possible seeds.

> what if the Instructor
> inserted an *unknown* very large number of calls to the RNG near the
> beginning of the .Rnw (but after the set.seed)...  and did not
> distribute this information to the students...  that would make it
> much harder, yes?

There are better ways.

As above, one key to making rainbow tables impractical is making the 
per-iteration time long enough.  Even if it only takes a second to 
generate each possible problem set, that's enough when multiplied by 
high enough powers of 2.

The other key is using big enough powers of 2.

I hadn't looked into R's random number generation before, but it appears 
quite robust.  Seeding it with the current wall clock time (a 32-bit 
integer on most systems) is an insult to its capability.

The default pseudo-random number generator (PRNG) in my copy of R is the 
Mersenne Twister, a truly awesome algorithm.  It's capable of very high 
quality results, as long as you give it a good seed.  It will take a 
vector of *many* integers as a seed, not just one.  It's not clear to me 
from the R docs if you can pass an arbitrary array of integers with any 
value, or if it needs something special.

Assuming you can give it any old passel of randomness as a seed, you 
just have to find a good source of randomness to create that seed.  On a 
Linux box, you could concatenate several dozen bytes read from 
/dev/random, the current wall clock time in microseconds, the inode of 
the R script being run, the process ID of the R interpreter, and the 
current mouse cursor position into a single string.  Feed all that into 
a hash algorithm, and break off pieces of that 4 bytes long, cast them 
to integers, and send that array of ints to set.seed().

If you use SHA-256 as the hash algorithm, that scheme should give you 
enough input randomness to get any of the possible 2^256 hash outputs, 
making that the amount of possible problem sets.  That's more than a 
rainbow table buster...there aren't enough atoms in the visible universe 
to construct a computer big enough to cope with 2^256 possible outputs.

That said, the quality of the PRNG just *allows* you to avoid screwing 
up.  It doesn't make it impossible make a weak algorithm.

[R] can you tell what .Random.seed *was*?

[R] can you tell what .Random.seed was?