Keras-MXNet optimizations?

I’m running a Keras LSTM-based sequence classifier for a recommender system, where the goal is to predict the next item consumed, given a sequence of items.
The data is a sequence of strings, pre-processed using sklearn's preprocessing.LabelEncoder(), plus to_categorical and sequence.pad_sequences from Keras.
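To make the pipeline above concrete, here is a minimal sketch of what those three steps compute. It uses plain NumPy stand-ins rather than the actual sklearn and Keras calls so that it runs anywhere. The toy sessions, the variable names, and the choice to shift ids by 1 (keeping 0 free for padding) are assumptions for illustration, not details from the original setup.

```python
import numpy as np

# Hypothetical toy data: each session is a sequence of item strings.
sessions = [
    ["milk", "bread", "eggs"],
    ["bread", "jam"],
    ["eggs", "milk", "jam", "bread"],
]

# Step 1, like LabelEncoder: map sorted unique strings to integer ids.
# Assumption: ids start at 1 so that 0 can be used as the padding value.
classes = sorted({item for s in sessions for item in s})
item_to_id = {item: i + 1 for i, item in enumerate(classes)}
encoded = [[item_to_id[item] for item in s] for s in sessions]

# Inputs are all items but the last; the target is the last item.
inputs = [s[:-1] for s in encoded]
targets = np.array([s[-1] for s in encoded])

# Step 2, like pad_sequences with its default padding="pre":
# left-pad every input with zeros to a common length.
max_len = max(len(s) for s in inputs)
X = np.zeros((len(inputs), max_len), dtype="int32")
for row, s in enumerate(inputs):
    X[row, max_len - len(s):] = s

# Step 3, like to_categorical: one-hot encode the targets.
num_classes = len(classes) + 1  # +1 for the reserved padding id 0
y = np.eye(num_classes, dtype="float32")[targets]
```

In a real notebook the three steps would be the LabelEncoder, pad_sequences and to_categorical calls themselves; the arrays X and y have the shapes an Embedding + LSTM + softmax model would expect.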
For a given architecture and optimizer setting, just switching the Jupyter kernel in a SageMaker notebook makes keras-MXNet 25% faster than keras-TF on a single GPU (1 V100 on a P3.16xl) and 60% faster on multiple GPUs (8 V100s on a P3.16xl). Are there MXNet-specific optimizations that can be used to push keras-MXNet further? In particular, is it possible to use the following MXNet features in keras-MXNet?

  1. mixed precision?
  2. hybridization?
  3. multi-processed loaders?


Hi @olivcruche, thank you for trying out keras-mxnet, it’s nice to see the performance boost on your use case. Some context: Keras is a pure symbolic framework and we are using MXNet’s Symbol API under the hood. To address your questions:

  1. Mixed precision is something we can look into supporting. MXNet’s Symbol API supports it:
  2. Hybridization is useful for models built with the imperative API (Gluon); it improves performance by leveraging the Symbol API. Keras-MXNet already uses the Symbol API and does not have an imperative interface, so we don’t need hybridization here. Refer to:
  3. I believe Keras already uses multi-process loaders, and this is the same regardless of the backend. Look for the workers argument in fit_generator and the Sequence class.
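For point 3, a rough sketch of what a multi-process loader looks like on the Keras side. The class name SessionBatches and the toy arrays are hypothetical. The import fallback is only there so the sketch can run without Keras installed. Whether the worker processes behave identically under the MXNet backend is something to verify on your own setup.

```python
import numpy as np

try:
    from keras.utils import Sequence  # real base class when Keras is installed
except Exception:
    Sequence = object  # fallback so the sketch still runs without Keras


class SessionBatches(Sequence):
    """Hypothetical batch loader; Keras calls __len__ and __getitem__."""

    def __init__(self, X, y, batch_size):
        self.X, self.y, self.batch_size = X, y, batch_size

    def __len__(self):
        # Number of batches per epoch, counting a final partial batch.
        return int(np.ceil(len(self.X) / self.batch_size))

    def __getitem__(self, idx):
        # Return the idx-th batch as an (inputs, targets) pair.
        rows = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        return self.X[rows], self.y[rows]


# Toy arrays standing in for padded sequences and one-hot targets.
X = np.arange(50).reshape(10, 5)
y = np.eye(4)[np.arange(10) % 4]
batches = SessionBatches(X, y, batch_size=4)

# With a compiled Keras model (not built here), the call would look like:
# model.fit_generator(batches, epochs=5, workers=4, use_multiprocessing=True)
```

Because a Sequence is indexed by batch number, Keras can hand different indices to different workers without duplicating batches within an epoch, which is why it is generally preferred over a plain Python generator when use_multiprocessing is enabled.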