Hi,
I am trying to train a Faster R-CNN on a custom dataset using the script provided on the GluonCV site.
I am using an RTX 2070 with 8 GB of memory, in a Docker container with CUDA 10.2 and the git version of GluonCV,
since the pip version would yield an import error for the Faster R-CNN model.
The training runs normally for a seemingly random number of epochs, and then I get the following error:
mxnet.base.MXNetError: [18:13:30] src/operator/random/./../tensor/./broadcast_reduce-inl.cuh:554: Check failed: err == cudaSuccess (2 vs. 0) : Name: reduce_kernel ErrStr:out of memory
I am using batch_size=1 with disable-hybridization, and I reduced short to 600 in order to minimize memory usage.
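For reference, my invocation looks roughly like the sketch below. The flag names are from the version of the script I am running and may differ in newer revisions; the short=600 change was made in the training transform rather than via a flag:

```shell
# Rough sketch of how I launch training (flag names may vary by script revision).
# short=600 is set inside the script's train transform, not on the command line.
python train_faster_rcnn.py \
    --dataset custom \
    --gpus 0 \
    --batch-size 1 \
    --disable-hybridization
```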
This problem is really strange, because last year I was able to train Faster R-CNN on the same hardware with the same dataset. At that time I used the previous version of the script (I have noticed that it gets updated from time to time).
So, what might be the problem?