Low GPU usage when training on CIFAR-10

Environment: Python 3.6.4 / mxnet-cu91 / GTX 1070

Hi everyone, I’m trying to run the code from the “deep CNN” chapter of the tutorials (https://gluon.mxnet.io/chapter04_convolutional-neural-networks/deep-cnns-alexnet.html).

It’s an AlexNet model trained on the CIFAR-10 dataset. I’ve tried different batch sizes (64/128/512), but GPU usage always stays at about 20%, with the card running at about 30% TDP. GPU memory usage is pretty high (5886 MB of 8 GB when the batch size is 512). It’s depressing.

Am I doing something wrong? Could anyone help? Thanks.

By the way, I’m running on Windows. Is that the problem?

The bottleneck could be data loading. Can you try using this instead:

import multiprocessing
train_data = gluon.data.DataLoader(
    gluon.data.vision.CIFAR10('./data', train=True, transform=transformer),
    batch_size=batch_size, shuffle=True, last_batch='discard', num_workers=multiprocessing.cpu_count())

test_data = gluon.data.DataLoader(
    gluon.data.vision.CIFAR10('./data', train=False, transform=transformer),
    batch_size=batch_size, shuffle=False, last_batch='discard', num_workers=multiprocessing.cpu_count())
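
One rough way to check whether data loading really is the bottleneck is to time the loader on its own, with no model involved. The sketch below is only an illustration: `seconds_per_batch` is a hypothetical helper (not part of MXNet), and it works on any iterable, so a DataLoader such as train_data could be passed in.

```python
import time


def seconds_per_batch(batches, max_batches=50):
    """Average wall-clock time to fetch one batch from an iterable.

    Hypothetical helper: it only pulls batches and does no training,
    so the result reflects loading and transform cost alone.
    """
    start = time.perf_counter()
    count = 0
    for _ in batches:
        count += 1
        if count >= max_batches:
            break
    elapsed = time.perf_counter() - start
    # Avoid dividing by zero when the iterable is empty.
    return elapsed / count if count else 0.0
```

If the number returned for the loader is close to the time a full training iteration takes, the GPU is probably waiting on input most of the time.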

Also, on every batch you are calling .asscalar(), which forces a synchronous copy to the CPU. Moving this call to the beginning of your batch loop, rather than the end, should help, because the next batch gets loaded while the GPU is still working on the previous one:

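        if i > 0:
            # `loss` still holds the previous batch's loss at this point
            curr_loss = nd.mean(loss).asscalar()
            # the first update happens at i == 1, because i == 0 is skipped
            moving_loss = (curr_loss if ((i == 1) and (e == 0))
                           else (1 - smoothing_constant) * moving_loss + (smoothing_constant) * curr_loss)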
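
The same idea can be sketched without MXNet, to make the one-batch lag explicit. This is a simplified illustration, not the tutorial's code: `update_moving_loss` and `train_with_deferred_read` are hypothetical names, plain floats stand in for the per-batch loss, and `read_scalar` stands in for the blocking read (the role .asscalar() plays above).

```python
def update_moving_loss(moving_loss, curr_loss, smoothing_constant=0.01):
    """Exponential moving average; the first observed loss seeds it."""
    if moving_loss is None:
        return curr_loss
    return (1 - smoothing_constant) * moving_loss + smoothing_constant * curr_loss


def train_with_deferred_read(batch_losses, read_scalar=float,
                             smoothing_constant=0.01):
    """Read each batch's loss one iteration late, then flush the last one."""
    moving_loss = None
    pending = None  # loss of the previous batch, not yet read back
    for loss in batch_losses:
        if pending is not None:
            # Blocking read happens here, after the next batch was fetched.
            moving_loss = update_moving_loss(
                moving_loss, read_scalar(pending), smoothing_constant)
        # ... forward / backward / trainer step for this batch goes here ...
        pending = loss
    if pending is not None:
        # The final batch's loss would otherwise never be counted.
        moving_loss = update_moving_loss(
            moving_loss, read_scalar(pending), smoothing_constant)
    return moving_loss
```

Seeding the average with None avoids depending on a particular batch index, and the flush after the loop keeps the last batch from being dropped.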

Edit: not sure multiprocessing will work on Windows, as it does not support forking.
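
If the worker processes do turn out to be unavailable on Windows, one option is to choose num_workers per platform. A minimal sketch, assuming that 0 (loading in the main process) is the safe fallback there; `pick_num_workers` is a hypothetical helper, not an MXNet function.

```python
import os
import sys


def pick_num_workers(platform=None):
    """Return 0 on Windows, otherwise the number of CPU cores.

    Assumption: multi-process loading is unreliable on Windows, so the
    loader falls back to the main process there.
    """
    platform = sys.platform if platform is None else platform
    if platform.startswith("win"):
        return 0
    # os.cpu_count() can return None when the count is unknown.
    return os.cpu_count() or 1
```

The result could then be passed as num_workers=pick_num_workers() in the DataLoader calls above.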



Thanks a lot. I couldn’t run your multiprocessing code, but performance improved significantly after I moved .asscalar() to the start of each loop iteration, as you said!
