[solved] numpy.random state seems to repeat across multiple OS runs
Hey there,
I have an experiment that makes use of numpy arrays on various occasions.
Since numpy has a random module, I decided to use that for randomization.
However, on repeated runs the randomization outcome of the numpy functions is repeated.
I have created a gist (https://gist.github.com/wkbouter/0614abcf1491fc67f7ba) that implements a minimal example of this problem; the critical item in that example is the script that runs:
import random
import numpy as np
l = [0,1,2,3,4,5,6]
np.random.shuffle(l)
print l
####
l = [0,1,2,3,4,5,6]
random.shuffle(l)
print l
#####
print np.random.random_sample()
print random.random()
When I hit any of the run buttons, the print outputs are always the same for the np.random module. Oddly though, not for the random module.
>>>
Starting experiment as ExperimentProcess-11
Expyriment 0.7.0 (Revision 55a4e7e; Python 2.7.8)
Warning: OpenGL does not support window mode. OpenGL will be deactivated!
openexp.sampler._legacy.init_sound(): sampling freq = 48000, buffer size = 1024
openexp.sampler._legacy.init_sound(): mixer already initialized, closing
experiment.init_log(): using '//*************/quickrun.csv' as logfile (utf-8)
experiment.run(): experiment started at Fri Mar 27 15:15:30 2015
[2, 3, 5, 4, 6, 0, 1]
[0, 5, 6, 1, 3, 2, 4]
0.612798575407
0.113753964223
experiment.run(): experiment finished at Fri Mar 27 15:15:32 2015
>>>
Starting experiment as ExperimentProcess-12
Expyriment 0.7.0 (Revision 55a4e7e; Python 2.7.8)
Warning: OpenGL does not support window mode. OpenGL will be deactivated!
openexp.sampler._legacy.init_sound(): sampling freq = 48000, buffer size = 1024
openexp.sampler._legacy.init_sound(): mixer already initialized, closing
experiment.init_log(): using '/*************/defaultlog.csv' as logfile (utf-8)
experiment.run(): experiment started at Fri Mar 27 15:15:37 2015
[2, 3, 5, 4, 6, 0, 1]
[6, 1, 4, 3, 5, 0, 2]
0.612798575407
0.563259958776
experiment.run(): experiment finished at Fri Mar 27 15:15:39 2015
>>>
Starting experiment as ExperimentProcess-13
Expyriment 0.7.0 (Revision 55a4e7e; Python 2.7.8)
openexp.sampler._legacy.init_sound(): sampling freq = 48000, buffer size = 1024
openexp.sampler._legacy.init_sound(): mixer already initialized, closing
experiment.init_log(): using '/*************/defaultlog.csv' as logfile (utf-8)
experiment.run(): experiment started at Fri Mar 27 15:15:46 2015
[2, 3, 5, 4, 6, 0, 1]
[3, 4, 5, 1, 2, 6, 0]
0.612798575407
0.383239057903
experiment.run(): experiment finished at Fri Mar 27 15:15:47 2015
>>>
When I call np.random.shuffle multiple times within an experiment, consecutive calls give different outcomes, so randomization within a session works.
Nevertheless, the sequence of calls as a whole gives the same result on every run.
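The pattern described here (calls differ within a run, but the whole sequence repeats across runs) is what any generator does when every run starts from the same internal state. The following is a minimal sketch of that mechanism only, not a reproduction of what OpenSesame does: it assumes each run starts from an identical state and imitates this by saving and restoring the global numpy state. The helper name shuffle_twice is made up for illustration.

```python
import numpy as np


def shuffle_twice():
    """Shuffle a fresh list twice in a row and return both results."""
    results = []
    for _ in range(2):
        items = [0, 1, 2, 3, 4, 5, 6]
        np.random.shuffle(items)  # shuffles the list in place
        results.append(items)
    return results


# Snapshot the global generator state. This stands in for whatever state
# each new experiment process starts from (an assumption of this sketch).
saved_state = np.random.get_state()

run_a = shuffle_twice()

# Restoring the snapshot imitates a second run starting from the same state.
np.random.set_state(saved_state)
run_b = shuffle_twice()

print(run_a == run_b)  # True: the sequence as a whole repeats
```

Within each run the two shuffles will usually differ from each other, because the state advances after every call; only the starting point is shared.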
I'm completely at a loss what could cause this -- any thoughts?
Comments
By the way -- I found I can resolve the issue in terms of its outcome, simply by calling
np.random.seed()
at the start of my experiment -- but I have very little understanding as to why/where this is happening, and it seems that it shouldn't happen to begin with.
Hi Wouter,
This is quite bizarre, especially because when you execute your code directly in a Python script the random number generator is reinitialized. In general terms it must be a side-effect of how OpenSesame imports and re-imports numpy, but beyond that I have no idea. The solution would be to always call
numpy.random.seed()
when the experiment is launched, but only when numpy is available (it shouldn't become a dependency). I filed an issue for it here. Thanks for spotting this one!
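A rough sketch of what such a guarded call could look like is given below. It is not the actual OpenSesame change; the function name reseed_numpy_if_available is hypothetical, and the only library call it relies on is numpy.random.seed() without arguments, which reseeds the global generator from fresh entropy.

```python
def reseed_numpy_if_available():
    """Reseed numpy's global random generator if numpy can be imported.

    Returns True when numpy was found and reseeded, and False when numpy is
    not installed, so that numpy never becomes a hard dependency.
    """
    try:
        import numpy
    except ImportError:
        # numpy is not installed: nothing to reseed.
        return False
    # Called without an argument, seed() draws a fresh seed instead of
    # reusing whatever state the process started with.
    numpy.random.seed()
    return True


# Hypothetical usage: call once when the experiment is launched.
numpy_was_reseeded = reseed_numpy_if_available()
```

Importing numpy inside the function keeps the module importable on platforms where numpy is missing.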
Cheers,
Sebastiaan
Check out SigmundAI.eu for our OpenSesame AI assistant!
Since the seed seems to be reset to the same value whenever opensesame runs, it would seem that something that is either imported or run at startup sets it.
But since the process repeats within and not between processes, it can't be a hard-coded seed -- Per's suggestion was that maybe some dependency is using the PID as a seed?
I can't find any out-of-the-ordinary pieces of code in the opensesame source though when searching for numpy, np or seed.
Also, the seed repeats even when switching backends in between sessions.
According to the documentation, numpy uses either /dev/urandom or the current time as a seed. And the process id is different on different runs anyway, at least when using the multiprocess runner, which is when the problem occurs (not with the other runners). So it must be that for some reason the numpy random seed is simply not reinitialized, even when started in a different process. For the standard random module this is clearly different -- it works as expected.
In principle, OpenSesame doesn't use numpy. That way it remains portable across platforms on which numpy isn't available, which at present is only Android. So you won't find any references to numpy (the synth back-end is an exception -- there was little choice there).
So you could say that it's the users' own responsibility to call numpy.random.seed(). However, that's hardly realistic, and I think it makes sense to add a call to numpy.random.seed() to protect users from this weird behavior (wherever the cause may lie).