How to get deterministic results in different runs?

hfutxrg · July 10, 2019, 3:38pm

I want to know how to make results deterministic across runs. I am trying to parallelize a model using Horovod, and I want to make sure the results are the same no matter how many processes are used, as long as the hyperparameters are the same.

I started with the simplest MNIST example in https://github.com/apache/incubator-mxnet/blob/master/example/distributed_training-horovod/gluon_mnist.py. Assuming that any run-to-run differences come from the random seed, I set the same seed everywhere by adding the following code to the file:

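import random

import numpy as np
import mxnet as mx  # needed for mx.random.seed below

# Seed MXNet, NumPy, and Python's built-in RNG with the same value.
mx.random.seed(1234)
np.random.seed(1234)
random.seed(1234)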

I ran the file with only one process, but the output accuracy still differs between runs. I also tried setting shuffle to True in both the train and val iterators, but the results were still not reproducible. So how can I make the results deterministic across runs?

I found the same issue reported in https://github.com/apache/incubator-mxnet/issues/10831, but it has not been resolved.

hfutxrg · July 29, 2019, 7:25pm

Why has no one answered this question? I think it is a very important one.

NRauschmayr · July 31, 2019, 4:54pm

Did you set the seed for each device? MXNet uses the device ID together with the seed to set the state of the random number generator. That means random numbers generated on different devices can differ even if they are seeded with the same value. To make sure the random numbers are the same on each device, you need to pass the context when setting the seed.

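import mxnet as mx

# Seed the generator on GPU 0, then draw from it.
mx.random.seed(128, ctx=mx.gpu(0))
print(mx.nd.random.normal(shape=(2,2), ctx=mx.gpu(0)).asnumpy())
# [[ 2.5020072 -1.6884501]
#  [-0.7931333 -1.4218881]]

# Seed GPU 1 with the same value and context argument: the draw matches GPU 0.
mx.random.seed(128, ctx=mx.gpu(1))
print(mx.nd.random.normal(shape=(2,2), ctx=mx.gpu(1)).asnumpy())
# [[ 2.5020072 -1.6884501]
#  [-0.7931333 -1.4218881]]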

For more details, you can check the following documentation: https://mxnet.incubator.apache.org/api/python/symbol/random.html#mxnet.random.seed
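
Building on that, here is a minimal sketch of a helper that seeds Python, NumPy, and MXNet in one call. The name seed_everything is hypothetical (it is not part of MXNet). NumPy and MXNet are seeded only if they can be imported, and ctx is passed through to mx.random.seed only when it is given.

```python
import random


def seed_everything(seed, ctx=None):
    """Seed the RNGs that are available; return the names of those seeded.

    Hypothetical helper for illustration, not an MXNet API.
    """
    seeded = []

    # Python's built-in generator is always available.
    random.seed(seed)
    seeded.append("random")

    # NumPy is optional in this sketch.
    try:
        import numpy as np
        np.random.seed(seed)
        seeded.append("numpy")
    except ImportError:
        pass

    # MXNet is optional too; pass ctx only when the caller supplied one.
    try:
        import mxnet as mx
        if ctx is None:
            mx.random.seed(seed)
        else:
            mx.random.seed(seed, ctx=ctx)
        seeded.append("mxnet")
    except ImportError:
        pass

    return seeded


print(seed_everything(1234))
```

For example, calling seed_everything(1234, ctx=mx.gpu(0)) before building the network would seed the generator on GPU 0. Seeding alone may not be enough if some of the operators used are themselves nondeterministic.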

hfutxrg · August 1, 2019, 7:33pm

I only use one GPU, so the context should not matter much. I also tried adding ctx=mx.gpu(0) to my code, but the output accuracy still differs between runs.
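
One way to narrow this down might be to check pieces of the script in isolation: re-seed, run a small function several times, and compare the outputs. Below is a minimal sketch that uses only Python's random module. The names runs_match and draw are hypothetical, and in practice draw would be replaced by something like one forward pass or one training step that returns comparable values.

```python
import random


def runs_match(fn, seed_fn, seed=1234, runs=3):
    """Re-seed with seed_fn, call fn, and check that all runs agree."""
    results = []
    for _ in range(runs):
        seed_fn(seed)          # reset the RNG state before every run
        results.append(fn())   # fn must return something comparable with ==
    return all(r == results[0] for r in results[1:])


def draw():
    # Stand-in for the code under test.
    return [random.random() for _ in range(5)]


# Seeded before each run, so every run should see the same stream.
print(runs_match(draw, random.seed))

# Never re-seeded, so each run continues the stream and the values differ.
print(runs_match(draw, lambda seed: None))
```

If a step passes this check on CPU but fails on GPU, the difference is more likely to come from the operators than from the seeds.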

PascalIv · October 13, 2020, 2:03pm

I have the same problem: no reproducibility with either mxnet 1.6.0 or 1.7.0.
Any news?
