[R] can you tell what .Random.seed *was*?
Warren Young
warren at etr-usa.com
Fri May 15 15:09:01 CEST 2009
Duncan Murdoch wrote:
>> 1) can you tell me what my original set.seed() value was? (I wouldn't
>> be able to figure it out, but maybe someone can)
>
> The only way I know is to test all 2^32 possible values of the seed. I
> think cryptographers would know faster ways.
Well, I'm not a cryptographer, but I know a faster way: rainbow tables.
http://en.wikipedia.org/wiki/Rainbow_table
Given that the algorithm to generate these images is known and that each
seed always gives the same image as output, you can simply precompute
all possible images, hash them using your favorite algorithm -- say,
SHA-256 -- and record the seed-to-hash correspondence on disk. Then
given an output image, you can hash it and use that to look up the seed.
It can take a long time to generate all the images, but then you have
database like lookup speeds for image-to-seed correspondence.
This is not just a theoretical idea. There are underground sites where
you can put in, say, an MD5 password hash and get out the likely
password that was actually used. This allows a black hat to break into
one site, grab their password hash database, reverse engineer the
passwords, and then go use them to bang on the front door of other sites
users of that site he first compromised also use. There are defenses
against this: salting the passwords and using passwords too big to
appear in rainbow tables are easiest.
> Now, if the seed was removed just before the values were generated, the
> seed would be generated from the system clock. If you knew the time
> that this occurred approximately, the search could be a lot faster.
This also helps with the rainbow table approach.
Given that the seed for the generation algorithm is always the current
wall time, you can restrict the needed rainbow table size greatly. You
simply have to know when the algorithm was first put into use, then
start your rainbow table with that time's value as the first seed, and
only compute up to "now" plus whatever you need for future operations.
For instance, you can cover about 3 years worth of image production in
about 1/45 the time as it takes to cover all 2^32 possible images. Say
it takes a month to generate a rainbow table covering those 3 years.
Full coverage would then take nearly 4 years.
More information about the R-help
mailing list