GluonCV on Jupyter: "The kernel appears to have died. It will restart automatically."

olivcruche · November 25, 2018, 9:26pm

Hi, I’m adapting this GluonCV demo https://gluon-cv.mxnet.io/build/examples_detection/finetune_detection.html to another dataset.
Apart from using a different dataset, I changed the model to ssd_512_resnet50_v1_custom. The DataLoader does not seem to handle these changes: the snippet below kills the Jupyter kernel (“The kernel appears to have died. It will restart automatically.”) on both ml.m4.2xlarge and ml.p2.xlarge Amazon SageMaker instances. What could cause this?

import mxnet as mx
from mxnet import autograd, gluon
import gluoncv as gcv
from gluoncv.data.batchify import Tuple, Stack
from gluoncv.data.transforms.presets.ssd import SSDDefaultTrainTransform

def get_dataloader(net, train_dataset, data_shape, batch_size, num_workers):
    width, height = data_shape, data_shape
    # use fake data to generate fixed anchors for target generation
    with autograd.train_mode():
        _, _, anchors = net(mx.nd.zeros((1, 3, height, width)))
    batchify_fn = Tuple(Stack(), Stack(), Stack())  # stack image, cls_targets, box_targets
    train_loader = gluon.data.DataLoader(
        train_dataset.transform(SSDDefaultTrainTransform(width, height, anchors)),
        batch_size=batch_size, shuffle=True, batchify_fn=batchify_fn,
        last_batch='rollover', num_workers=num_workers)
    return train_loader

# net is the ssd_512_resnet50_v1_custom model created earlier in the notebook
train_data = get_dataloader(
    net=net,
    train_dataset=gcv.data.RecordFileDetection('train.rec'),
    data_shape=600,
    batch_size=4,
    num_workers=0)

# print each batch: [images, cls_targets, box_targets]
for i in train_data:
    print(i)
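When Jupyter reports that the kernel died, the Python process hosting it has usually crashed in native code or been killed (for example by the out-of-memory killer), so no Python traceback appears. One way to see how it dies, without losing the notebook state, is to run the same loader code in a separate interpreter and inspect its exit status. This is a minimal sketch using only the standard library; the helper name run_isolated is hypothetical, and the signal decoding assumes a POSIX system such as a SageMaker instance.

```python
import signal
import subprocess
import sys


def run_isolated(code, timeout=None):
    """Run Python source in a fresh interpreter and report how it exited.

    A native crash in the child ends only the child process, so the calling
    notebook kernel survives and can inspect the result.
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode < 0:
        # On POSIX, a negative return code means the child was terminated by a signal.
        try:
            name = signal.Signals(-proc.returncode).name
        except ValueError:
            name = "signal %d" % -proc.returncode
        return "killed by " + name, proc.stderr
    # Non-negative: the child exited normally (0) or with an error status.
    return "exit code %d" % proc.returncode, proc.stderr
```

To use it, save the snippet above (imports, model creation, loader and loop) as a script and pass its contents, e.g. run_isolated(open(path).read()) with a path of your choosing. A result such as "killed by SIGSEGV" points at a crash inside a native library rather than at the Python code, and the returned stderr may contain the library's own error output.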

DarkWings · November 26, 2018, 2:12am

Hi friend, I noticed something wrong in your code: "for i in train_data:" should be replaced by "for i in enumerate(train_data):".
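For reference, enumerate only pairs each item with a running index; the batches themselves are the same with or without it. A minimal sketch, in which a plain list stands in for the DataLoader (an assumption to keep it self-contained) and each element mimics one (image, cls_targets, box_targets) batch:

```python
# Stand-in for the DataLoader: each tuple mimics one (image, cls_targets, box_targets) batch.
train_data = [("img0", "cls0", "box0"), ("img1", "cls1", "box1")]

# Iterating directly yields each batch.
direct = [i for i in train_data]

# enumerate() yields (batch_index, batch) pairs instead.
indexed = [i for i in enumerate(train_data)]

# The usual idiom unpacks the index and then the batch contents.
for batch_index, (image, cls_targets, box_targets) in enumerate(train_data):
    print(batch_index, image, cls_targets, box_targets)
```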

NRauschmayr · November 27, 2018, 8:30pm

I tried to reproduce the problem, but your code example runs fine on my p2 instance. I suspect an issue with your input data, train.rec. Would you mind sharing the file with me so that I can investigate why your script fails?

NRauschmayr · November 29, 2018, 5:37pm

I could not reproduce your problem at first because I was running a different MXNet version on my P2 instance. On a fresh instance, I encountered the same issue. A GitHub issue has been opened: https://github.com/apache/incubator-mxnet/issues/13448
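Since the crash depended on the MXNet version installed, it helps to record the exact package versions when reproducing or reporting it (mx.__version__ and gcv.__version__ give the same information once the packages import). This is a minimal sketch using the standard library's importlib.metadata (Python 3.8+), which works even if a package fails to import; the helper name is hypothetical, and the distribution names listed are assumptions, since SageMaker images may ship a CPU build ("mxnet") or a differently named CUDA build.

```python
from importlib import metadata


def installed_versions(dist_names):
    """Map each distribution name to its installed version, or None if it is absent."""
    versions = {}
    for name in dist_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Assumed distribution names: CPU MXNet, one CUDA 9.0 build, and GluonCV.
print(installed_versions(["mxnet", "mxnet-cu90", "gluoncv"]))
```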
