Gluon Out of Memory Issue (cudaMalloc failed: out of memory)

Hi,
I’m using Gluon to construct a custom neural net and train it. The network is a sequence of 15 conv2D blocks (each block is conv2d - activation - norm - dropout). It has around 7M parameters, which is reasonably small compared to other networks I’ve used, yet I still seem to be running into memory issues. What might be the issue here? Any ideas?

I tried to figure out where the memory starts to explode. I use an input batch of shape (2, 1, 10000); a batch size of 2 is also pretty small compared to the networks I’ve trained before. Part of my forward pass is below.

Before the x = self.conv5(self.conv4(self.conv3(self.conv2(self.conv1(x))))) step, memory usage is about 900 MB. After it executes, usage jumps to ~7 GB when I check my NVIDIA GPUs. It blows up even further from there, since I have 15 convolutions overall.
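A rough back-of-the-envelope suggests the activations alone could account for a jump like this (the sizes below are made up for illustration, since the real t and f come out of my front-end transform):

    # Hypothetical sizes, not measured: suppose the front end yields
    # f = 1025 frequency bins and t = 2000 frames, and each conv block
    # keeps C = 64 channels at full resolution.
    bt, C, t, f = 2, 64, 2000, 1025
    bytes_per_float32 = 4
    mb = bt * C * t * f * bytes_per_float32 / 1024**2
    print(f"one conv block's output: {mb:.0f} MB")  # ~1000 MB

    # During training every block's output is kept for the backward pass,
    # so 5 such blocks would already account for ~5 GB of activations.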

Code snippet

    # Batch size
    bt = x.shape[0]

    # Duration
    T = x.shape[2]

    # Apply some dropout on the input
    x = self.dropout_layer(x)

    # Apply the front-end transform to get magnitude and phase
    Mag, Phs = self.front_end_transform(x)

    # Add a channel dimension to facilitate stacking
    M_ed = Mag.expand_dims(axis=1)
    P_ed = Phs.expand_dims(axis=1)

    x = nd.concat(M_ed, P_ed, dim=1)

    # Transpose to make bt x nchannels x t x f
    x = nd.transpose(x, axes=[0, 1, 3, 2])

    pdb.set_trace()  # memory usage here: ~900 MB

    # Convolutions 1 - 5
    x = self.conv5(self.conv4(self.conv3(self.conv2(self.conv1(x)))))

    pdb.set_trace()  # memory usage here: ~7 GB
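To pin down which layer causes the jump more precisely than the two pdb.set_trace() calls, I could also apply the convolutions one at a time and read device memory between them. A minimal sketch, assuming a reasonably recent MXNet (mx.context.gpu_memory_info) and that this replaces the nested call above:

    import mxnet as mx
    from mxnet import nd

    def report(tag, device_id=0):
        nd.waitall()  # flush MXNet's async engine before reading memory
        free, total = mx.context.gpu_memory_info(device_id)
        print(f"{tag}: {(total - free) / 1024**2:.0f} MB used")

    for i, conv in enumerate([self.conv1, self.conv2, self.conv3,
                              self.conv4, self.conv5], start=1):
        x = conv(x)
        report(f"after conv{i}")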

Thanks,
Shrikant

Hi, this sounds like you are (maybe) not reducing the size of the feature maps with “pooling” operations, so every conv layer’s activations stay at full resolution. Can you please post the full architecture? From your code snippet I cannot see the sizes of the convolution layers. This topic discusses a similar problem.
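To illustrate the pooling point, here is a shape-only toy (the channel counts and sizes are made up) showing how pooling shrinks the activations each layer has to store:

    from mxnet import nd
    from mxnet.gluon import nn

    # Two toy stacks: one keeps full resolution, one pools after each conv.
    full_res = nn.HybridSequential()
    pooled = nn.HybridSequential()
    for _ in range(5):
        full_res.add(nn.Conv2D(64, kernel_size=3, padding=1, activation='relu'))
        pooled.add(nn.Conv2D(64, kernel_size=3, padding=1, activation='relu'),
                   nn.MaxPool2D(pool_size=2))
    full_res.initialize()
    pooled.initialize()

    # A small stand-in for the real bt x 2 x t x f input
    x = nd.random.uniform(shape=(2, 2, 400, 256))
    print(full_res(x).shape)  # (2, 64, 400, 256): resolution never drops
    print(pooled(x).shape)    # (2, 64, 12, 8): each pool halves t and f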