Unable to run sample code on GPU

Hi,

I’ve been trying to update the segmentation demo so I can run it on GPU. I’ve installed mxnet-cu91 to match my CUDA version and have updated the context used in the FCN pre-trained model demo:

ctx = mx.gpu(0)

transform_fn = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize([.485, .456, .406], [.229, .224, .225])
])
img = transform_fn(img)
img = img.expand_dims(0).as_in_context(ctx)
model = gluoncv.model_zoo.get_model('fcn_resnet50_voc', pretrained=True, ctx=ctx)

The img context is gpu(0) (I’ve printed it out), but as soon as I attempt to load the model I get the error:

Failed to load Parameter 'fcn0_dilatedresnetv00_layers2_batchnorm11_running_var' on [gpu(0)] because it was previously initialized on [cpu(0)]

Since I pass the correct context into the get_model call, why am I ending up with mixed contexts?

I am not sure why you are seeing this problem, but when I run this script on CUDA 9.0 (I don’t have access to 9.1) I don’t get the exception you mention:

import mxnet as mx
import gluoncv

model = gluoncv.model_zoo.get_model("fcn_resnet50_voc", pretrained=True, ctx=mx.gpu(0))

I also don’t think it is related to the CUDA version, because the exception looks like it is generated at the MXNet level.

Can you check what your versions of MXNet and gluoncv are? Mine are 1.2.0 and 0.2.0 respectively. Maybe you need to update yours?

Can you tell me which versions you use?
Can you try my minimal example and tell me if the problem is still there?
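
For reference, both versions can be checked from Python:

import mxnet as mx
import gluoncv

print(mx.__version__)       # e.g. 1.2.0
print(gluoncv.__version__)  # e.g. 0.2.0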

Thanks, I was indeed still on a combination of mxnet-cu91 1.2.0 and gluoncv 0.1.0.

After updating to 1.2.0 and gluoncv 0.2.0b20180618 the model loads without error on GPU.

However, a bigger issue is that the segmentation output is now incorrect. The output appears to be fixed at 480x480 and no longer matches the output I had with 0.1.0.

This happens in both CPU and GPU mode.

I have adjusted the code a little bit so you can get a result with the same dimensions as the input. Basically, I use MultiEvalModel from the original test.py.

It runs much slower (on my CPU it takes about half a minute to generate an output), and it also requires installing the latest version of gluoncv with pip install gluoncv --pre.

Here is the code for the airplane example you posted:

"""1. Getting Started with FCN Pre-trained Models
==============================================
This is a quick demo of using GluonCV FCN model.
"""
import mxnet as mx
from mxnet import image
from mxnet.gluon.data.vision import transforms
from gluoncv.model_zoo.segbase import *
import gluoncv
# using cpu
ctx = mx.cpu(0)


##############################################################################
# Prepare the image
# -----------------
#
# download the example image
url = 'https://raw.githubusercontent.com/dmlc/web-data/master/gluoncv/segmentation/voc_examples/4.jpg'
filename = 'example.jpg'
gluoncv.utils.download(url, filename)

##############################################################################
# load the image
img = image.imread(filename)

from matplotlib import pyplot as plt
plt.imshow(img.asnumpy())
plt.show()

##############################################################################
# normalize the image using dataset mean
transform_fn = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize([.485, .456, .406], [.229, .224, .225])
])
img = transform_fn(img).as_in_context(ctx)

##############################################################################
# Load the pre-trained model and make prediction
# ----------------------------------------------
#
# get pre-trained model
model = gluoncv.model_zoo.get_model('fcn_resnet50_voc', pretrained=True)
evaluator = MultiEvalModel(model, 21, ctx_list=ctx)

##############################################################################
# make prediction using single scale
output = evaluator(img)
predict = mx.nd.squeeze(mx.nd.argmax(output, 1)).asnumpy()

##############################################################################
# Add color palette for visualization
from gluoncv.utils.viz import get_color_pallete
import matplotlib.image as mpimg
mask = get_color_pallete(predict, 'pascal_voc')
mask.save('output.png')

##############################################################################
# show the predicted mask
mmask = mpimg.imread('output.png')
plt.imshow(mmask)
plt.show()

Hope it works for you. If you want to do it in batches, take a look at the original test.py file mentioned above.
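
If you eventually want to run it on multiple GPUs, here is a rough sketch of batched evaluation (the two-GPU ctx_list and the imgs list are assumptions, not part of the demo):

import mxnet as mx
import gluoncv
from gluoncv.model_zoo.segbase import MultiEvalModel

# hypothetical two-GPU setup; adjust ctx_list to the devices you have
ctx_list = [mx.gpu(0), mx.gpu(1)]
model = gluoncv.model_zoo.get_model('fcn_resnet50_voc', pretrained=True)
evaluator = MultiEvalModel(model, 21, ctx_list=ctx_list)

# imgs is assumed to be a list of transformed CHW images (as produced by
# transform_fn above); parallel_forward scatters them across ctx_list
outputs = evaluator.parallel_forward(imgs)
predicts = [mx.nd.squeeze(mx.nd.argmax(out[0], 1)).asnumpy() for out in outputs]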

Thanks, that worked! The output mask is not consistent with the previous version, though it now has the correct dimensions. Performance is good running on one of my GeForce GTX 1080 Ti cards (I’ve yet to try using more than one).

Hi Sergey,

I am using test.py for testing on the Cityscapes dataset. However, I got the issue below:

multiprocessing.pool.RemoteTraceback:

Traceback (most recent call last):
  File "/home/v2m/anaconda3/envs/my_env3/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/home/v2m/anaconda3/envs/my_env3/lib/python3.7/site-packages/mxnet/gluon/data/dataloader.py", line 400, in _worker_fn
    batch = batchify_fn([_worker_dataset[i] for i in samples])
  File "/home/v2m/anaconda3/envs/my_env3/lib/python3.7/site-packages/mxnet/gluon/data/dataloader.py", line 400, in <listcomp>
    batch = batchify_fn([_worker_dataset[i] for i in samples])
  File "/home/v2m/anaconda3/envs/my_env3/lib/python3.7/site-packages/gluoncv/data/cityscapes.py", line 52, in __getitem__
    img = self.transform(img)
  File "/home/v2m/anaconda3/envs/my_env3/lib/python3.7/site-packages/mxnet/gluon/block.py", line 540, in __call__
    out = self.forward(*args)
  File "/home/v2m/anaconda3/envs/my_env3/lib/python3.7/site-packages/mxnet/gluon/nn/basic_layers.py", line 53, in forward
    x = block(x)
  File "/home/v2m/anaconda3/envs/my_env3/lib/python3.7/site-packages/mxnet/gluon/block.py", line 540, in __call__
    out = self.forward(*args)
  File "/home/v2m/anaconda3/envs/my_env3/lib/python3.7/site-packages/mxnet/gluon/block.py", line 921, in forward
    "Symbol or NDArray, but got %s"%type(x)
AssertionError: HybridBlock requires the first argument to forward be either Symbol or NDArray, but got <class 'PIL.Image.Image'>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "test.py", line 81, in <module>
    test(args)
  File "test.py", line 59, in test
    for i, (data, dsts) in enumerate(tbar):
  File "/home/v2m/anaconda3/envs/my_env3/lib/python3.7/site-packages/tqdm/_tqdm.py", line 1005, in __iter__
    for obj in iterable:
  File "/home/v2m/anaconda3/envs/my_env3/lib/python3.7/site-packages/mxnet/gluon/data/dataloader.py", line 450, in __next__
    batch = pickle.loads(ret.get()) if self._dataset is None else ret.get()
  File "/home/v2m/anaconda3/envs/my_env3/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
AssertionError: HybridBlock requires the first argument to forward be either Symbol or NDArray, but got <class 'PIL.Image.Image'>

Can you give me some recommendations?

Hard to say without seeing the actual code you are trying to use.
But based on the exception message, I would assume that you didn’t apply the transformation to your dataset:

input_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize([.485, .456, .406], [.229, .224, .225]),
])

The ToTensor() in input_transform converts an image to a tensor by moving the channel axis to the front and dividing by 255. After that operation the data is no longer a PIL image.
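
A minimal sketch of that effect on a dummy image (shapes and dtypes shown in the comments):

import mxnet as mx
from mxnet.gluon.data.vision import transforms

# dummy HWC uint8 image, the layout that image.imread returns
img = mx.nd.zeros((480, 480, 3), dtype='uint8')
out = transforms.ToTensor()(img)
print(img.shape, img.dtype)  # (480, 480, 3) uint8
print(out.shape, out.dtype)  # (3, 480, 480) float32, values scaled to [0, 1]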

Try applying the transformation to your dataset and see if it works. It should look something like this, if you have followed the instructions for obtaining the dataset from here:

train_dataset = CitySegmentation(split='train')
transformed_train_dataset = train_dataset.transform_first(input_transform)
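
Note that transform_first applies the transform only to the first element of each (image, mask) pair, so the segmentation labels stay untouched.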

This is my source code for testing:

import os
from tqdm import tqdm
import numpy as np

import mxnet as mx
from mxnet import gluon
from mxnet.gluon.data.vision import transforms

import gluoncv
from gluoncv.model_zoo.segbase import *
from gluoncv.model_zoo import get_model
from gluoncv.data import get_segmentation_dataset, ms_batchify_fn
from gluoncv.utils.viz import get_color_pallete

from train import parse_args

def test(args):
    # output folder
    outdir = 'outdir'
    if not os.path.exists(outdir):
        os.makedirs(outdir)
    # image transform
    input_transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize([.485, .456, .406], [.229, .224, .225]),
    ])
    # dataset and dataloader
    if args.eval:
        testset = get_segmentation_dataset(
            args.dataset, split='val', mode='testval', transform=input_transform)
        total_inter, total_union, total_correct, total_label = \
            np.int64(0), np.int64(0), np.int64(0), np.int64(0)
    else:
        testset = get_segmentation_dataset(
            args.dataset, split='test', mode='test', transform=input_transform)
    test_data = gluon.data.DataLoader(
        testset, args.test_batch_size, shuffle=False, last_batch='keep',
        batchify_fn=ms_batchify_fn, num_workers=args.workers, thread_pool=True)
    # create network
    if args.model_zoo is not None:
        model = get_model(args.model_zoo, pretrained=True)
    else:
        model = get_segmentation_model(model=args.model, dataset=args.dataset, ctx=args.ctx,
                                       backbone=args.backbone, norm_layer=args.norm_layer,
                                       norm_kwargs=args.norm_kwargs, aux=args.aux,
                                       base_size=args.base_size, crop_size=args.crop_size)
        # load pretrained weight
        assert args.resume is not None, '=> Please provide the checkpoint using --resume'
        if os.path.isfile(args.resume):
            model.load_parameters(args.resume, ctx=args.ctx)
        else:
            raise RuntimeError("=> no checkpoint found at '{}'" \
                .format(args.resume))
    print(model)
    evaluator = MultiEvalModel(model, testset.num_class, ctx_list=args.ctx)
    metric = gluoncv.utils.metrics.SegmentationMetric(testset.num_class)

    tbar = tqdm(test_data)
    for i, (data, dsts) in enumerate(tbar):
        if args.eval:
            predicts = [pred[0] for pred in evaluator.parallel_forward(data)]
            targets = [target.as_in_context(predicts[0].context) \
                       for target in dsts]
            metric.update(targets, predicts)
            pixAcc, mIoU = metric.get()
            tbar.set_description( 'pixAcc: %.4f, mIoU: %.4f' % (pixAcc, mIoU))
        else:
            im_paths = dsts
            predicts = evaluator.parallel_forward(data)
            for predict, impath in zip(predicts, im_paths):
                predict = mx.nd.squeeze(mx.nd.argmax(predict[0], 1)).asnumpy() + \
                    testset.pred_offset
                mask = get_color_pallete(predict, args.dataset)
                outname = os.path.splitext(impath)[0] + '.png'
                mask.save(os.path.join(outdir, outname))

if __name__ == "__main__":
    args = parse_args()
    args.test_batch_size = args.ngpus
    print('Testing model: ', args.resume)
    test(args)

The command line for running the test is:

CUDA_VISIBLE_DEVICES=0,1 python test.py --dataset citys --model psp --backbone resnet101 --syncbn --lr 0.01 --ngpus 2 --base-size 2048 --test-batch-size 4 --checkname res101 --resume runs/citys/psp/res101/checkpoint.params

The --resume parameter points to my trained model.

As far as I can tell, I already apply the transform to my dataset. Here is my test set. However, it still doesn’t work, even though the same setup works fine with PASCAL VOC 2012. So I think the method def __getitem__(self, index): in cityscapes.py is missing the line of code below, but I’m not sure:

img = self._img_transform(img)
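
For comparison, the test-mode branch of __getitem__ in the VOC dataset converts the PIL image to an NDArray before applying the transform, roughly like this (paraphrased, not an exact copy of the library code):

if self.mode == 'test':
    # _img_transform turns the PIL image into an mx.nd.NDArray, so the
    # HybridBlock transform receives an NDArray instead of a PIL image
    img = self._img_transform(img)
    if self.transform is not None:
        img = self.transform(img)
    return img, os.path.basename(self.images[index])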

In addition, is there any way to use the test dataset via the CitySegmentation class? It looks like the code below does not work:

test_dataset = CitySegmentation(split = 'test')

Thank you!