The backward() of a custom operator doesn't run to completion

Hi everyone,
I am trying to create multiple NDArrayIters inside the backward() method of a custom operator. However, the program always terminates before backward() has finished. The implementations of forward and backward are as follows:

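```python
import logging

import mxnet as mx
import numpy as np


class MyCustomOp(mx.operator.CustomOp):  # class name is hypothetical; only the two methods were posted
    def forward(self, is_train, req, in_data, out_data, aux):
        logging.debug('forward')
        for i in range(10):
            logging.debug('forward {}'.format(i))
            # Create a small throwaway iterator on every iteration
            train_iter = mx.io.NDArrayIter(np.array([1, 2, 3]), np.array([2, 2, 2]))
        logging.debug('forward finished')

    def backward(self, req, out_grad, in_data, out_data, in_grad, aux):
        logging.debug('backward')
        for i in range(10):
            logging.debug('backward {}'.format(i))
            # Same pattern as forward(); this is the loop that stops early
            train_iter = mx.io.NDArrayIter(np.array([1, 2, 3]), np.array([2, 2, 2]))
        logging.debug('backward finished')
```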

The output looks like this:

```text
DEBUG:root:forward
DEBUG:root:forward 0
DEBUG:root:forward 1
DEBUG:root:forward 2
DEBUG:root:forward 3
DEBUG:root:forward 4
DEBUG:root:forward 5
DEBUG:root:forward 6
DEBUG:root:forward 7
DEBUG:root:forward 8
DEBUG:root:forward 9
DEBUG:root:forward finished
INFO:root:Epoch[0] Train-dummy_metric=0.000000
INFO:root:Epoch[0] Time cost=0.015
DEBUG:root:backward
DEBUG:root:backward 0
DEBUG:root:backward 1
DEBUG:root:backward 2
DEBUG:root:backward 3
```

As you can see, the program terminates before backward() finishes. Note that backward() does not always stop at iteration 3; the stopping point varies from run to run. Could anyone tell me what the problem is? Thanks :blush:

Also, I have uploaded my full code to Google Drive here.

OK, this problem is caused by a bug in the stable release 1.4.1; the latest pre-release version fixes it.
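If you maintain code that other people run, a small guard can flag the release reported here. This is a hypothetical helper, not part of MXNet; it only knows about 1.4.1 because that is the single version this thread names, and it makes no claim about other releases. A pre-release can typically be installed with pip's `--pre` flag (for example `pip install --upgrade --pre mxnet`), though the exact package variant you need depends on your platform.

```python
import warnings

# Assumption: the only version this thread reports as affected is the stable
# 1.4.1 release; nothing is claimed here about any other version.
REPORTED_AFFECTED = {"1.4.1"}


def check_mxnet_version(version):
    """Warn and return True if `version` is the release reported as buggy."""
    if version in REPORTED_AFFECTED:
        warnings.warn(
            "MXNet %s was reported to stop partway through CustomOp.backward(); "
            "consider upgrading to a newer (pre-)release." % version
        )
        return True
    return False


if __name__ == "__main__":
    try:
        import mxnet as mx
    except ImportError:
        print("mxnet is not installed")
    else:
        check_mxnet_version(mx.__version__)
```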


Thanks for sharing your solution @xu.xing! Glad you found a work-around.