Custom HybridBlock problem with random_uniform when not hybridizing

I wrote the following custom layer:

class HybridGumbleSoftmax(HybridBlock):
    def __init__(self, temp=1):
        super(HybridGumbleSoftmax, self).__init__()
        self.temp = temp

    def hybrid_forward(self, F, x):
        noise = F.random_uniform()
        # G = μ − log(− log(U ))
        noise = F.negative((noise.__add__(1e-10).log()))
        noise = F.negative((noise.__add__(1e-10).log()))
        gumble_trick_log_prob_samples = x + noise
        soft_samples = F.softmax(gumble_trick_log_prob_samples / self.temp, axis=1)
        return soft_samples
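
For context, the layer is meant to implement the Gumbel-softmax trick: with one uniform sample U per logit, G = −log(−log(U)) is Gumbel noise, and softmax((x + G) / temp) is a differentiable approximation of sampling from the categorical distribution defined by the logits x. A plain NumPy sketch of the intended computation (values are only illustrative):

import numpy as np

x = np.log(np.array([0.7, 0.3]))            # logits of a 2-class categorical
U = np.random.uniform(size=x.shape)         # one uniform sample per logit
G = -np.log(-np.log(U + 1e-10) + 1e-10)     # Gumbel noise, same shape as x
temp = 1.0
soft_sample = np.exp((x + G) / temp)
soft_sample /= soft_sample.sum()            # approaches one-hot as temp -> 0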

This does what I want if I call hybridize() before training. However, I am unable to modify it into a HybridBlock that can run in both symbolic and imperative mode. (I also have a version that works only in imperative mode, using the ndarray API, but that's also not what I want.) If I just run the code above without hybridize(), the result is:
mxnet.base.MXNetError: [21:02:32] ../src/imperative/./imperative_utils.h:122: Check failed: infershape[attrs.op](attrs, &in_shapes, &out_shapes)
Now if I edit the call to add the shape parameter to random_uniform:
F.random_uniform(shape=x.shape)
and try to run it on the GPU, I get an error in the add function:
mxnet.base.MXNetError: [21:11:58] ../src/imperative/./imperative_utils.h:70: Check failed: inputs[i]->ctx().dev_mask() == ctx.dev_mask() (1 vs. 2) Operator broadcast_add require all inputs live on the same context. But the first argument is on gpu(0) while the 2-th argument is on cpu(0)

This really astonishes me. Can I not just call __add__ with a hardcoded scalar? How else would I do that? I can't hardcode the creation of an nd.array or a symbol there, as I would lose the hybrid property again :frowning:
But I guess the first error (infer shape) is more relevant here, as I cannot do x.shape with symbols anyway.

I actually know the exact shape of the noise that I need, but passing a tuple for the shape argument results in "Deferred initialization failed because shape cannot be inferred." if I hybridize, and again in the different-context error in the plus operator if I do not hybridize.

I am puzzled why mxnet can infer the shape with symbols, but not if I use ndarrays. Can I somehow specify shapes (similar to a custom operator property class) for hybrid blocks? Or is there something else wrong with the above code for imperative mode?

Please let me know if I have left out necessary information such as full stack traces. Any help is much appreciated.

Cheers,
Adrian

In the line where you call noise = F.random_uniform(), noise is always a single random value, which is definitely not what you want for proper random sampling :slight_smile:

In the symbolic case, it appears that a shape is inferred, but what actually happens is that when you call x + noise, noise is broadcast to the shape of x.
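
To make the shape issue concrete, here is a small imperative check (a sketch; the shapes are only illustrative):

import mxnet as mx

# With no shape argument, the uniform sampler draws a single value,
# not one sample per element of x.
noise = mx.nd.random.uniform()
print(noise.shape)                                                # (1,)

# A single value can only be combined with a larger tensor by broadcasting,
# which broadcast_add makes explicit:
x = mx.nd.zeros((4, 2, 3, 3))
print(mx.nd.broadcast_add(x, noise.reshape((1, 1, 1, 1))).shape)  # (4, 2, 3, 3)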

The following simple changes will fix your problem:

        noise = F.random.uniform_like(x)
        # G = μ − log(− log(U ))
        noise = -((noise + 1e-10).log())
        noise = -((noise + 1e-10).log())

This means that your block would look like this:

class HybridGumbleSoftmax(gluon.HybridBlock):
    def __init__(self, temp=1):
        super(HybridGumbleSoftmax, self).__init__()
        self.temp = temp

    def hybrid_forward(self, F, x):
        noise = F.random.uniform_like(x)
        # G = μ − log(− log(U ))
        noise = -((noise + 1e-10).log())
        noise = -((noise + 1e-10).log())

        gumble_trick_log_prob_samples = x + noise
        soft_samples = F.softmax(gumble_trick_log_prob_samples / self.temp, axis=1)
        return soft_samples
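
For completeness, a usage sketch (assuming MXNet ≥ 1.3.1, where uniform_like is available; the shape (4, 2, 1, 1) is just an example). The block has no parameters, so it can be called directly, with and without hybridize():

import mxnet as mx

net = HybridGumbleSoftmax(temp=0.5)
x = mx.nd.random.uniform(shape=(4, 2, 1, 1))   # stand-in for log-probabilities

y_imp = net(x)                                 # imperative mode: noise now matches x element-wise
net.hybridize()
y_sym = net(x)                                 # symbolic mode works as well
print(y_imp.shape, y_sym.shape)                # (4, 2, 1, 1) (4, 2, 1, 1)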

Thanks for your reply. It makes sense to me, and I had already been looking for uniform_like. But I cannot find it, and your code throws the expected error:
AttributeError: module 'mxnet.symbol.random' has no attribute 'uniform_like'
It is really weird that this function is mentioned in the documentation (Random Distribution Generator Symbol API — mxnet documentation), but as far as I can see it is not implemented: https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/symbol/random.py I am on mxnet 1.3.0, but looking at the code on GitHub, I don't think upgrading would solve my issue.

I know the shapes in my case, so I made a compromise to solve my problem. I now have the same noise for every image in the batch, which I think is fine for my case. I still couldn't find a way to generate the noise with respect to varying batch sizes.

class HybridGumbleSoftmax(HybridBlock):
    def __init__(self, temp=1):
        super(HybridGumbleSoftmax, self).__init__()
        self.temp = temp

    def hybrid_forward(self, F, x):
        noise = F.random.uniform(shape=(1, 2, 1, 1))
        # G = μ − log(− log(U ))
        noise = -((noise + 1e-10).log())
        noise = -((noise + 1e-10).log())

        gumble_trick_log_prob_samples = F.broadcast_add(x, noise)
        soft_samples = F.softmax(gumble_trick_log_prob_samples / self.temp, axis=1)
        return soft_samples
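
As a quick sanity check of the broadcasting (a sketch; batch size 4 is just an example):

import mxnet as mx

noise = mx.nd.random.uniform(shape=(1, 2, 1, 1))
x = mx.nd.zeros((4, 2, 1, 1))
# broadcast_add repeats the (1, 2, 1, 1) noise along the batch axis, so every
# image in the batch receives the same two noise values.
print(mx.nd.broadcast_add(x, noise).shape)     # (4, 2, 1, 1)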

It’s in the 1.3.1 release. Just do a pip install with the -U switch to update your mxnet installation.

The problem with your implementation is that you’re drawing only 2 random samples. Is your batch-size always 1? What about the last two axes?

I work on a bigger project where mxnet is built from source. Updating it might be possible, but I think it’s not needed for this use case.

My batch size varies, and I need 2 random samples for each image (the last 2 axes are always fixed in this part of the network). So with my current implementation, as far as I understand it, I have the same noise (the same 2 random values) for every image in my batch instead of drawing independently for each image. It’s not optimal, but it should be OK. If I find that it hinders the training process, I will update to 1.3.1 and use uniform_like to get independent noise for each image in the batch.
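
A quick way to see the effect of this compromise (a sketch using the workaround block above): identical images in one batch produce identical soft samples, because the two random values are shared across the batch.

import mxnet as mx

net = HybridGumbleSoftmax()
net.hybridize()
x = mx.nd.ones((2, 2, 1, 1))                 # two identical "images"
y = net(x)
# Both rows see the same fixed-shape (1, 2, 1, 1) noise, so their outputs
# match; with uniform_like they would generally differ.
print(bool((y[0] == y[1]).asnumpy().all()))  # True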

Thanks again for your quick help.
