Is Gluon going to replace Symbol?

Hello all! I am working on a machine learning project using MXNet, and I am currently using Symbol-based models from the MXNet Model Zoo. I’m wondering if I should switch over to Gluon models. I know Gluon is an easier interface, but will it be replacing the Symbol API in the future? I ask because it seems that the Gluon forward pass was slower for me than Symbol, so I would prefer to stick with Symbol for performance, but I also don’t want to be stuck with a legacy platform.

Thanks!

Hi @qheaden, the Gluon API would be the recommended route if you’re getting started now.

Just to clarify, I think you’re actually deciding between the Gluon API and the Module API. Under the hood, Gluon still uses the NDArray API and the Symbol API (depending on whether you hybridize your model).

With regards to performance in Gluon, the key is hybridization. When you first create a model with Gluon you’ll be using the NDArray API under the hood and you’ll get more flexibility and easier debugging at the expense of performance, which is what you’re seeing. But when you’re finished creating your network (e.g. net), you can call net.hybridize() to use Symbol API under the hood and get improved performance, similar to what you’d see when using Module API with Symbols.
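
Here’s a minimal sketch of what that looks like (a hypothetical toy network, not from the tutorial linked below):

```python
import mxnet as mx
from mxnet.gluon import nn

net = nn.HybridSequential()
net.add(nn.Dense(128, activation='relu'),
        nn.Dense(10))
net.initialize()

x = mx.nd.random.uniform(shape=(1, 64))
y = net(x)          # imperative execution: NDArray API under the hood

net.hybridize()     # switch to the Symbol API under the hood
y = net(x)          # the first call after hybridize() traces and caches the graph
```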

Check out this tutorial, and you can find the Gluon Model Zoo here.


Thanks @thomelane! I think I know what my performance issue was with Gluon. I was calling hybridize(), but I didn’t realize that it builds the graph after the first forward pass. I was testing my model with a one-shot python script that would build everything, run the model, print the results, then exit. So it never got a chance to use the cache. In reality, I am going to build an API around this, so the cache would definitely be used.
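
For anyone else who hits this, here’s a rough benchmarking sketch (a hypothetical model and timing loop, not my actual script) of what I should have been doing — one warm-up forward pass so the cached graph is used before timing anything:

```python
import time
import mxnet as mx
from mxnet.gluon import nn

net = nn.HybridSequential()
net.add(nn.Dense(128, activation='relu'), nn.Dense(10))
net.initialize()
net.hybridize()

x = mx.nd.random.uniform(shape=(32, 64))
net(x).wait_to_read()        # warm-up pass: builds and caches the graph

start = time.time()
for _ in range(100):
    out = net(x)
out.wait_to_read()           # wait for the async engine to finish
print('avg forward time: %.5f s' % ((time.time() - start) / 100))
```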

I already have some models trained using the Symbol-based Model Zoo. Is there an easy way to load Symbol-based models using Gluon?

Thanks again.

I’m facing a similar problem.

For me, using Symbol/NDArray feels more intuitive and helps me understand ML/MXNet better, but it’s really hard to debug pure Symbol code. So I think using Gluon with hybridize() is probably the right way to go in the future.

Yes, you can integrate an existing symbol into Gluon. Check out SymbolBlock, as this can be used to wrap a Symbol in a Block (which are the building ‘blocks’ of Gluon).
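
As a rough sketch (assuming a recent MXNet where SymbolBlock.imports is available, and hypothetical file names for a model exported by the Symbol API with a single input called 'data'):

```python
import mxnet as mx
from mxnet import gluon

# wrap an exported symbolic model so it behaves like any other Gluon block
net = gluon.nn.SymbolBlock.imports('model-symbol.json', ['data'],
                                   'model-0000.params', ctx=mx.cpu())

x = mx.nd.random.uniform(shape=(1, 3, 224, 224))
out = net(x)    # forward pass through the imported symbolic model
```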

Gluon’s definitely the way to go if you’re looking for improved debugging. Working with NDArrays (instead of Symbols) allows you to step through code and inspect values and shapes easily. Check out this video for an example of Gluon debugging.
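
For example, this kind of inspection (a toy sketch, not from the video) is straightforward in imperative mode, since intermediate results are ordinary NDArrays:

```python
import mxnet as mx
from mxnet.gluon import nn

dense = nn.Dense(32, activation='relu')
dense.initialize()

h = dense(mx.nd.random.uniform(shape=(4, 16)))   # h is an ordinary NDArray
print(h.shape)    # (4, 32) -- inspect shapes mid-computation
print(h.mean())   # and values too
```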

I understand that Gluon is much more flexible and more user friendly, and you can even speed it up by hybridizing the model; after hybridizing, your model would be almost 2x faster than before.

But the problem is that symbolic graphs built with the Symbol API are still about 30% faster than a hybridized Gluon model, so why should we invest in something slower? Moreover, it is easy to write your own custom computation and backpropagate through it using the Symbol API, while in Gluon you have to write your own class using Block or HybridBlock (a rough sketch of what that looks like is below).
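
A toy sketch of such a custom block (hypothetical example, just to show the shape of the code):

```python
import mxnet as mx
from mxnet.gluon import nn

class ScaledDense(nn.HybridBlock):
    """Toy custom block: Dense -> ReLU -> multiply by a fixed scale."""
    def __init__(self, units, scale=2.0, **kwargs):
        super(ScaledDense, self).__init__(**kwargs)
        self.scale = scale
        with self.name_scope():
            self.dense = nn.Dense(units)

    def hybrid_forward(self, F, x):
        # F is mx.nd before hybridize() and mx.sym after, so the same code
        # runs imperatively or symbolically, and autograd handles the backward
        return F.relu(self.dense(x)) * self.scale

block = ScaledDense(8)
block.initialize()
block.hybridize()
print(block(mx.nd.random.uniform(shape=(2, 4))).shape)   # (2, 8)
```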

I think the only use case of Gluon is its imperative and flexible nature, which lets you debug your model by changing the computation graph (which is not possible with the Symbol API), and then finally build the model using the Symbol API.

So it’s like using Gluon for research and learning, and the Symbol API for production.

PS: Please correct me if I am wrong anywhere; I am just 18 years old and have just started deep learning. I started using MXNet when I found that doing research work in TensorFlow is very difficult.

With hybridize(static_shape=True, static_alloc=True), the difference in speed between symbolic and Gluon is within ~1-2%.
I agree that we could have a functional way of building networks rather than wiring them ourselves in a custom block; that’s a good feature request for Gluon.

-------------------
Edit: don’t read this. The comparison of Gluon and Symbol using the code below is wrong, because the two scripts use different CUDA versions and a different number of GPUs.
-------------------

After checking out this repository I am a bit confused; take a look at this.
Here is the CNN implemented using the Gluon API:

It takes 37 seconds to train.

And below is the same model, but built using the Symbol API:

It takes 48 seconds to train.

Holy smokes, why and how is Gluon faster than the symbolic API?? In all my personal tests I found the symbolic API almost 30% faster than Gluon’s hybridized model.
Here’s the link for that test:

What am I missing?
Please help me understand what’s going on.

Just tried this out myself and I had a similar finding using a p3.2xlarge AWS EC2 instance.

Gluon API: 33s
Module API: 40s
Result: Gluon 17.5% faster than Module

I installed the nightly build of MXNet (e.g. pip install --pre --upgrade mxnet-cu90), and changed the hybridization in Gluon to use static_shape=True, static_alloc=True as mentioned by @ThomasDelteil.

One reason for Gluon being quicker than Module here is the lack of blocking operations in the Gluon training loop for each batch (e.g. loss not printed every batch). You can get a speedup from this, but it could cause out of memory issues if your data is reasonably large. So essentially the Gluon code here can overlap the processing of batches to fully use the compute available.

Adding a blocking operation at the end of the batch loop (e.g. loss.asnumpy()) evens things out a lot.
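
Here’s a rough sketch of what I mean (a toy model and random data, not the actual benchmark script):

```python
import mxnet as mx
from mxnet import autograd, gluon
from mxnet.gluon import nn

net = nn.HybridSequential()
net.add(nn.Dense(64, activation='relu'), nn.Dense(10))
net.initialize()
net.hybridize(static_alloc=True, static_shape=True)

loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})

for _ in range(10):                                   # toy "batches"
    data = mx.nd.random.uniform(shape=(32, 100))
    label = mx.nd.random.randint(0, 10, shape=(32,)).astype('float32')
    with autograd.record():
        loss = loss_fn(net(data), label)
    loss.backward()
    trainer.step(data.shape[0])
    loss.asnumpy()   # blocking call: waits for this batch before queuing the next
```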

Gluon API: 39s
Module API: 40s
Result: Gluon 2.5% faster than Module

One more thing I’d like to point out: the Gluon code is also not calculating accuracy at every iteration, unlike the symbolic code, and that’s also giving it a lot of extra speed.

I just tried Gluon after adding the accuracy calculation, for a totally fair test.

I still get equivalent speed between the Gluon API and the Symbol API, and there’s essentially no change in time because the computation required for calculating accuracy is minimal.
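
For reference, the per-batch accuracy update looks roughly like this (a small sketch with made-up predictions and labels, using MXNet’s metric API):

```python
import mxnet as mx

metric = mx.metric.Accuracy()

# pretend `output` holds a batch of class scores and `label` the true classes
output = mx.nd.array([[0.1, 0.9], [0.8, 0.2]])
label = mx.nd.array([1, 0])

metric.update(labels=[label], preds=[output])
print(metric.get())   # ('accuracy', 1.0)
```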


Is there a Gluon API for Scala?

Hi @adwivedi, it’s just a Python API for now.

It’s now available for Perl though.
