Memory issue with Module forward function

I recently tried to implement CycleGAN with the symbolic API (together with the Module API). During training, MXNet takes up more and more of my RAM until it uses all of it (32 GB). This does not happen while the dataset is loading, only while training is running. I went through different parts of my code to find the source, and found that the growth happens during forward propagation in the generator network.
So I wrote the code below. No dataset is loaded; it is just a network that accepts a random vector and generates an image at its output. The network is the generator network from

Here we just run forward propagation 10000000 times. If you run this code, you can see Python taking more and more of your RAM as it runs.
Can someone please explain what is happening and how I can fix it? (I am using a Titan X (Maxwell) with CUDA 8.0 on Ubuntu 16.04 and MXNet 1.0.0.)

import mxnet as mx

# Iterator that yields an endless stream of random noise batches
class RandIter(mx.io.DataIter):
    def __init__(self, batch_size, ndim):
        self.batch_size = batch_size
        self.ndim = ndim
        self.provide_data = [('rand', (batch_size, ndim, 1, 1))]
        self.provide_label = []

    def iter_next(self):
        return True

    def getdata(self):
        # Return random numbers drawn from a Gaussian (normal) distribution
        # with mean 0 and standard deviation 1
        return [mx.random.normal(0, 1.0, shape=(self.batch_size, self.ndim, 1, 1))]

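Z = 100          # dimensionality of the random input vector
batch_size = 64  # assumed value; batch_size is not defined in the original post
rand_iter = RandIter(batch_size, Z)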

no_bias = True
fix_gamma = True
epsilon = 1e-5 + 1e-12

rand = mx.sym.Variable('rand')

g1 = mx.sym.Deconvolution(rand, name='g1', kernel=(4,4), num_filter=1024, no_bias=no_bias)
gbn1 = mx.sym.BatchNorm(g1, name='gbn1', fix_gamma=fix_gamma, eps=epsilon)
gact1 = mx.sym.Activation(gbn1, name='gact1', act_type='relu')

g2 = mx.sym.Deconvolution(gact1, name='g2', kernel=(4,4), stride=(2,2), pad=(1,1), num_filter=512, no_bias=no_bias)
gbn2 = mx.sym.BatchNorm(g2, name='gbn2', fix_gamma=fix_gamma, eps=epsilon)
gact2 = mx.sym.Activation(gbn2, name='gact2', act_type='relu')

g3 = mx.sym.Deconvolution(gact2, name='g3', kernel=(4,4), stride=(2,2), pad=(1,1), num_filter=256, no_bias=no_bias)
gbn3 = mx.sym.BatchNorm(g3, name='gbn3', fix_gamma=fix_gamma, eps=epsilon)
gact3 = mx.sym.Activation(gbn3, name='gact3', act_type='relu')

g4 = mx.sym.Deconvolution(gact3, name='g4', kernel=(4,4), stride=(2,2), pad=(1,1), num_filter=128, no_bias=no_bias)
gbn4 = mx.sym.BatchNorm(g4, name='gbn4', fix_gamma=fix_gamma, eps=epsilon)
gact4 = mx.sym.Activation(gbn4, name='gact4', act_type='relu')

g5 = mx.sym.Deconvolution(gact4, name='g5', kernel=(4,4), stride=(2,2), pad=(1,1), num_filter=3, no_bias=no_bias)
generatorSymbol = mx.sym.Activation(g5, name='gact5', act_type='tanh')

sigma = 0.02
lr = 0.0002
beta1 = 0.5
# Define the compute context, use GPU if available
ctx = mx.gpu() if mx.test_utils.list_gpus() else mx.cpu()

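#=============Generator Module=============
generator = mx.mod.Module(symbol=generatorSymbol, data_names=('rand',), label_names=None, context=ctx)
# The module must be bound and its parameters initialized before forward() can be called
generator.bind(data_shapes=rand_iter.provide_data)
generator.init_params(initializer=mx.init.Normal(sigma))

for i in range(10000000):
    rbatch = rand_iter.next()
    generator.forward(rbatch, is_train=True)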

MXNet has an asynchronous execution engine. When you call forward, the call only schedules the sequence of operations, along with the data they need, for execution by the engine; it returns before the work is done. Your loop schedules operations faster than the engine can execute them, so the backlog of pending work and its data keeps growing, and memory usage grows with it. If you use any of the blocking calls (nd.waitall(), NDArray.asnumpy(), NDArray.asscalar(), or NDArray.wait_to_read()), the loop waits for the engine to catch up and the memory growth stops.
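To make the mechanism concrete, here is a toy model of it in plain Python. It is not MXNet code and says nothing about how the real engine is implemented: the worker thread stands in for the engine, `queue.Queue.join()` stands in for a blocking call such as nd.waitall(), and the names `peak_backlog`, `sync_every` and `op_seconds` are made up for this sketch.

```python
import queue
import threading
import time


def peak_backlog(num_ops, sync_every=None, op_seconds=0.002):
    """Schedule num_ops fake operations and return the largest backlog seen."""
    pending = queue.Queue()

    def engine():
        # Stand-in for the execution engine: runs one operation at a time.
        while True:
            op = pending.get()
            if op is None:              # sentinel: no more work
                pending.task_done()
                return
            time.sleep(op_seconds)      # pretend to execute the operation
            pending.task_done()

    worker = threading.Thread(target=engine, daemon=True)
    worker.start()

    peak = 0
    for i in range(num_ops):
        pending.put(i)                  # like forward(): schedules and returns at once
        peak = max(peak, pending.qsize())
        if sync_every and (i + 1) % sync_every == 0:
            pending.join()              # blocking call: wait until the backlog is empty

    pending.put(None)
    pending.join()
    worker.join()
    return peak
```

Calling `peak_backlog(100)` lets the producer run far ahead of the worker, so the queue fills up, while `peak_backlog(100, sync_every=1)` never has more than one operation waiting. In the real case each pending operation also holds on to its input and output arrays, which is where the RAM goes.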


Thanks for the reply. Adding mx.ndarray.waitall() after each iteration solved the memory issue.
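For anyone finding this later, the fixed loop might look roughly like the sketch below. The helper name `forward_with_sync` and the `sync_every` parameter are my own, not part of MXNet; the blocking call is passed in as `wait` so the helper itself does not depend on MXNet. Syncing on every iteration (the default here) is what solved it for me; a larger `sync_every` is an untested assumption that should trade a bit more memory for fewer stalls.

```python
def forward_with_sync(module, data_iter, wait, num_iters, sync_every=1):
    """Run forward passes, blocking every sync_every iterations.

    module    -- object with forward(batch, is_train=...), e.g. an mx.mod.Module
    data_iter -- object with next() returning a batch
    wait      -- zero-argument blocking call, e.g. mx.nd.waitall
    """
    for i in range(num_iters):
        batch = data_iter.next()
        module.forward(batch, is_train=True)   # only schedules work on the engine
        if (i + 1) % sync_every == 0:
            wait()                             # let the engine drain its backlog


# Intended use with the code from the first post (requires MXNet):
#   forward_with_sync(generator, rand_iter, mx.nd.waitall, num_iters=10000000)
```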