[solved] numpy.random state seems to repeat across multiple OS runs

Wouter · March 2015

Hey there,

I have an experiment that makes use of numpy arrays on various occasions.
Since numpy has a random module, I decided to use that for randomization.

However, on repeated runs the randomization outcome of the numpy functions is repeated.
I have created a [gist](https://gist.github.com/wkbouter/0614abcf1491fc67f7ba) that implements a minimal example of this problem; the critical item in that example is the script that runs:

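import random
import numpy as np

# Shuffle with numpy's global generator
l = [0,1,2,3,4,5,6]
np.random.shuffle(l)
print(l)
# Shuffle with the standard library's generator
l = [0,1,2,3,4,5,6]
random.shuffle(l)
print(l)

# Draw one float in [0, 1) from each generator
print(np.random.random_sample())
print(random.random())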

When I hit any of the run buttons, the print outputs are always the same for the np.random module. Oddly though, not for the random module.

>>> 
Starting experiment as ExperimentProcess-11
Expyriment 0.7.0 (Revision 55a4e7e; Python 2.7.8) 
Warning: OpenGL does not support window mode. OpenGL will be deactivated!
openexp.sampler._legacy.init_sound(): sampling freq = 48000, buffer size = 1024
openexp.sampler._legacy.init_sound(): mixer already initialized, closing
experiment.init_log(): using '//*************/quickrun.csv' as logfile (utf-8)
experiment.run(): experiment started at Fri Mar 27 15:15:30 2015
[2, 3, 5, 4, 6, 0, 1]
[0, 5, 6, 1, 3, 2, 4]
0.612798575407
0.113753964223
experiment.run(): experiment finished at Fri Mar 27 15:15:32 2015
>>> 
Starting experiment as ExperimentProcess-12
Expyriment 0.7.0 (Revision 55a4e7e; Python 2.7.8) 
Warning: OpenGL does not support window mode. OpenGL will be deactivated!
openexp.sampler._legacy.init_sound(): sampling freq = 48000, buffer size = 1024
openexp.sampler._legacy.init_sound(): mixer already initialized, closing
experiment.init_log(): using '/*************/defaultlog.csv' as logfile (utf-8)
experiment.run(): experiment started at Fri Mar 27 15:15:37 2015
[2, 3, 5, 4, 6, 0, 1]
[6, 1, 4, 3, 5, 0, 2]
0.612798575407
0.563259958776
experiment.run(): experiment finished at Fri Mar 27 15:15:39 2015
>>> 
Starting experiment as ExperimentProcess-13
Expyriment 0.7.0 (Revision 55a4e7e; Python 2.7.8) 
openexp.sampler._legacy.init_sound(): sampling freq = 48000, buffer size = 1024
openexp.sampler._legacy.init_sound(): mixer already initialized, closing
experiment.init_log(): using '/*************/defaultlog.csv' as logfile (utf-8)
experiment.run(): experiment started at Fri Mar 27 15:15:46 2015
[2, 3, 5, 4, 6, 0, 1]
[3, 4, 5, 1, 2, 6, 0]
0.612798575407
0.383239057903
experiment.run(): experiment finished at Fri Mar 27 15:15:47 2015
>>>

When I call np.random.shuffle multiple times within an experiment, each call gives a new outcome, so randomization within a session works.

Nevertheless, the sequence of calls as a whole gives the same result on every run.

I'm completely at a loss what could cause this -- any thoughts?

Wouter · March 2015

By the way -- I found I can resolve the issue in terms of its outcome, simply by calling np.random.seed() at the start of my experiment.

-- but I have very little understanding as to why/where this is happening, and it seems that it shouldn't happen to begin with
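For readers following along, here is a minimal sketch of why the workaround helps. It is an illustration only, not OpenSesame code: it uses np.random.get_state() and np.random.set_state() to mimic each run starting from the same generator state, and the helper name shuffled is made up for the example.

```python
from __future__ import print_function
import numpy as np


def shuffled(items):
    """Return a shuffled copy of items using numpy's global generator."""
    copy = list(items)
    np.random.shuffle(copy)
    return copy


items = [0, 1, 2, 3, 4, 5, 6]

# Snapshot the global generator state; restoring it later mimics a new run
# that starts from exactly the same state (an assumption for illustration).
start_state = np.random.get_state()

first_run = shuffled(items)
np.random.set_state(start_state)  # "second run" with identical state
second_run = shuffled(items)
print(first_run == second_run)  # identical state gives an identical shuffle

# The workaround: reseed with no argument, so numpy draws fresh entropy
# and the state no longer matches the snapshot.
np.random.set_state(start_state)
np.random.seed()
sample_after_reseed = np.random.random_sample()
print(sample_after_reseed)
```

Calling np.random.seed() without an argument is the only part of this that an experiment script would actually need.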

sebastiaan · March 2015

Hi Wouter,

This is quite bizarre, especially because when you execute your code directly in a Python script the random number generator is reinitialized. In general terms it must be a side-effect of how OpenSesame imports and re-imports numpy, but beyond that I have no idea. The solution would be to always call numpy.random.seed() when the experiment is launched, but only when numpy is available (it shouldn't become a dependency). I filed an issue for it here.

Thanks for spotting this one!

Cheers,
Sebastiaan

Wouter · March 2015

Since the seed seems to be reset to the same value whenever opensesame runs, it would seem that something that is either imported or run at startup sets it.

But since the process repeats within and not between processes, it can't be a hard-coded seed -- Per's suggestion was that maybe some dependency is using the PID as a seed?
I can't find any out-of-the-ordinary pieces of code in the opensesame source though when searching either for numpy, np or seed.

Also, the seed repeats even when switching back-ends between sessions.

sebastiaan · March 2015

> But since the process repeats within and not between processes, it can't be a hard-coded seed -- Per's suggestion was that maybe some dependency is using the PID as a seed?

According to the documentation, numpy reads the seed from /dev/urandom if available, and otherwise uses the current time. And the process ID is different on different runs anyway, at least when using the multiprocess runner, which is when the problem occurs (not with the other runners). So it must be that for some reason the numpy random seed is simply not reinitialized, even when started in a different process.
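As a rough illustration of that seeding idea (OS entropy first, clock as a fallback), a seed could be built by hand as below. This is a sketch under my own assumptions, not numpy's internal implementation; the function name fresh_seed is hypothetical.

```python
import os
import struct
import time


def fresh_seed():
    """Return a 32-bit seed from OS entropy, falling back to the clock."""
    try:
        # os.urandom reads from the operating system's entropy source.
        return struct.unpack("<I", os.urandom(4))[0]
    except NotImplementedError:
        # No entropy source found: fall back to the current time in ms.
        return int(time.time() * 1000) % (2 ** 32)


seed = fresh_seed()
# The value fits the integer range that numpy.random.seed() accepts,
# so it could be passed on as numpy.random.seed(seed).
print(seed)
```

Printing the seed at the start of each run would also make it easy to see whether a new process really starts from a new value.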

For the standard random module this is clearly different--it works as expected.

> I can't find any out-of-the-ordinary pieces of code in the opensesame source though when searching either for numpy, np or seed.

In principle, OpenSesame doesn't use numpy. That way it remains portable across platforms on which numpy isn't available, which at present is only Android. So you won't find any references to numpy (the synth back-end is an exception--there was little choice there).

So you could say that it's the users' own responsibility to call numpy.random.seed(). However, that's hardly realistic, and I think it makes sense to add a call to numpy.random.seed() to protect users from this weird behavior (wherever the cause may lie).
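A guarded call along those lines might look like the sketch below. This is not the actual OpenSesame change; the function name seed_numpy_if_available is hypothetical, and the only assumption is that numpy may or may not be installed.

```python
def seed_numpy_if_available():
    """Reseed numpy's global generator if numpy is installed.

    Returns True when numpy was found and reseeded, False otherwise,
    so numpy never becomes a hard dependency.
    """
    try:
        import numpy
    except ImportError:
        return False
    # No argument: numpy picks a fresh seed itself.
    numpy.random.seed()
    return True


# Intended to run once, when the experiment is launched.
numpy_reseeded = seed_numpy_if_available()
print(numpy_reseeded)
```

Importing inside the function keeps the check cheap on platforms without numpy, where it simply returns False.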
