The following sequence works for me on Google Colab (Jupyter notebook):
!nvcc --version
# Run this first, on a non-GPU instance, to see which CUDA version is installed.
Result:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
Then, on a GPU-accelerated instance:
!pip install mxnet-cu100
# The package suffix must match the CUDA version reported above (cu100 for CUDA 10.0).
import mxnet as mx
# Check that the GPU works by creating an array on it and doing some arithmetic.
a = mx.nd.ones((2, 3), mx.gpu())
b = a * 2 + 1
print(b)
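If the CUDA release on your instance differs from mine, the package name changes with it. Here is a minimal sketch of how the name could be derived from the `nvcc --version` output. The helper `mxnet_package_for` is my own hypothetical name, not part of MXNet, and it assumes the `mxnet-cu<major><minor>` naming used above. Not every CUDA release has a published wheel, so check that the resulting package actually exists before installing it.

```python
import re


def mxnet_package_for(nvcc_output):
    """Guess the MXNet pip package name from `nvcc --version` output.

    Hypothetical helper: assumes the mxnet-cu<major><minor> naming scheme
    (e.g. CUDA 10.0 -> mxnet-cu100). Falls back to the CPU-only package
    name when no CUDA release string is found.
    """
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return "mxnet"
    major, minor = match.groups()
    return f"mxnet-cu{major}{minor}"


# The last line of the nvcc output shown above.
sample = "Cuda compilation tools, release 10.0, V10.0.130"
print(mxnet_package_for(sample))  # prints: mxnet-cu100
```

In a notebook you can paste the `nvcc` output into the function and then run `!pip install` with whatever name it returns.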