[R] can you tell what .Random.seed was?

Fri May 15 15:09:01 CEST 2009

Duncan Murdoch wrote:
>> 1) can you tell me what my original set.seed() value was?  (I wouldn't
>> be able to figure it out, but maybe someone can)
> 
> The only way I know is to test all 2^32 possible values of the seed.  I 
> think cryptographers would know faster ways.

Well, I'm not a cryptographer, but I know a faster way: rainbow tables.

	http://en.wikipedia.org/wiki/Rainbow_table

Given that the algorithm to generate these images is known and that each 
seed always gives the same image as output, you can simply precompute 
all possible images, hash them using your favorite algorithm -- say, 
SHA-256 -- and record the seed-to-hash correspondence on disk.  Then 
given an output image, you can hash it and use that to look up the seed.

It can take a long time to generate all the images, but then you have 
database like lookup speeds for image-to-seed correspondence.

This is not just a theoretical idea.  There are underground sites where 
you can put in, say, an MD5 password hash and get out the likely 
password that was actually used.  This allows a black hat to break into 
one site, grab their password hash database, reverse engineer the 
passwords, and then go use them to bang on the front door of other sites 
users of that site he first compromised also use.  There are defenses 
against this: salting the passwords and using passwords too big to 
appear in rainbow tables are easiest.

> Now, if the seed was removed just before the values were generated, the 
> seed would be generated from the system clock.  If you knew the time 
> that this occurred approximately, the search could be a lot faster.

This also helps with the rainbow table approach.

Given that the seed for the generation algorithm is always the current 
wall time, you can restrict the needed rainbow table size greatly.  You 
simply have to know when the algorithm was first put into use, then 
start your rainbow table with that time's value as the first seed, and 
only compute up to "now" plus whatever you need for future operations.

For instance, you can cover about 3 years worth of image production in 
about 1/45 the time as it takes to cover all 2^32 possible images.  Say 
it takes a month to generate a rainbow table covering those 3 years. 
Full coverage would then take nearly 4 years.

[R] can you tell what .Random.seed *was*?

[R] can you tell what .Random.seed was?