I didn’t observe this for Adam or SGD. GPU memory usage was about 3.2G during initialization. After that it went to 6.9G, then 6.0G, then 4.6G, and so on, until the out-of-memory error below was reported.
/home/ubuntu/src/mxnet/dmlc-core/include/dmlc/./logging.h:308: src/storage/./pooled_storage_manager.h:102: cudaMalloc failed: out of memory
Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet-0.11.0-py2.7.egg/mxnet/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7fe2e107bf0c]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet-0.11.0-py2.7.egg/mxnet/libmxnet.so(_ZN5mxnet7storage23GPUPooledStorageManager5AllocEm+0x15e) [0x7fe2e20acbae]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet-0.11.0-py2.7.egg/mxnet/libmxnet.so(_ZN5mxnet11StorageImpl5AllocEmNS_7ContextE+0x69) [0x7fe2e20b00b9]
[bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet-0.11.0-py2.7.egg/mxnet/libmxnet.so(+0x175f895) [0x7fe2e20d4895]
[bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet-0.11.0-py2.7.egg/mxnet/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x93) [0x7fe2e209ccb3]
[bt] (5) /usr/local/lib/python2.7/dist-packages/mxnet-0.11.0-py2.7.egg/mxnet/libmxnet.so(_ZNSt17_Function_handlerIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEZZNS2_23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE1_clEvEUlS5_E_E9_M_invokeERKSt9_Any_dataOS5_+0x123) [0x7fe2e20a59d3]
[bt] (6) /usr/local/lib/python2.7/dist-packages/mxnet-0.11.0-py2.7.egg/mxnet/libmxnet.so(_ZNSt6thread5_ImplISt12_Bind_simpleIFSt8functionIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEES8_EEE6_M_runEv+0x4a) [0x7fe2e209f13a]
[bt] (7) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7fe2f7d0cc80]
[bt] (8) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7fe2fd9dc6ba]
[bt] (9) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fe2fd7123dd]
/home/ubuntu/src/mxnet/dmlc-core/include/dmlc/./logging.h:308: src/engine/./threaded_engine.h:347: src/storage/./pooled_storage_manager.h:102: cudaMalloc failed: out of memory
For comparison, GPU memory usage is about 6G with Adam and 4.5G with SGD.
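In case it helps anyone reproduce the readings, here is a minimal sketch of one way to log per-GPU memory between iterations. It assumes `nvidia-smi` is on the PATH; the helper names `parse_memory_used` and `gpu_memory_used_mib` are made up for illustration and are not part of MXNet.

```python
import subprocess


def parse_memory_used(smi_output):
    """Turn nvidia-smi CSV output (one number per line) into a list of MiB values."""
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip()]


def gpu_memory_used_mib():
    """Query used memory for every visible GPU, in MiB (assumes nvidia-smi is installed)."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"]
    )
    return parse_memory_used(out.decode("utf-8"))
```

Calling `gpu_memory_used_mib()` after each batch and printing the result should show whether usage keeps fluctuating before the `cudaMalloc` failure.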
I'd appreciate it if anybody could help.