I am one of the developers of PyMC3, a package for bayesian statistics. At the moment we use Theano as backend, but as you might have heard development of Theano is about to stop. Currently, we are looking at TensorFlow, MXNet and PyTorch as possible replacements.
Our use case for MxNet would be different to most deep learning applications in some ways:
- We do not build models ourselves, but provide building blocks and inference algorithms for users. Users would use both our library and mxnet in their models. Not all our users come from a computer science background, but from many different fields in science and business.
- Many models in PyMC3 are much smaller than the typical neural net, but in many cases we require a larger number of function and gradient evaluations. Even relatively small overhead on the python side or inside the computational graph can add up quickly.
- Custom Ops tend to be relatively common, e.g. for calling into external code or to minimise functions
Do you think MxNet could be a good fit for this use case?
While prototyping we ran into a couple of issues (in no particular order):
- It is important that users get good error messages for misspecified models, that point to the correct source of the problem. If you define symbols with incompatible shapes and add them, you currently don’t get an exception until you call
bind
orinfer_shape
. (i.e.mx.sym.var(‘a’, shape=2) + mx.sym.var(‘b’, shape=3)
doesn’t raise an exception) - We couldn’t find a way to create a modified copy of a symbol. Theano allows us to replace inputs or parts of the graph by other subgraphs (the
replace
arg oftheano.clone
). This gives us a lot of flexibility to switch around inference algorithms for a given model. -
mx.sym
doesn’t have a concept of a scalar, which can lead to a lot of confusion in PyMC3 (and looking at the invalid memory access in #8133, for mxnet as well). This fact would be exposed to users, who usually are used to numpy. - There doesn’t seem to be support of constants. In large parts of our code base each value can be either a parameter or a constant depending on how it is used. We can’t easily represent constant values as symbols to make calls like
mx.sym.broadcast_add(a, b)
possible, independent of whethera
andb
are symbols, python scalars or numpy arrays. Similarly, we and our users can’t add or multiply symbols by numpy arrays. Would it be possible to implement something likemx.sym.constant
, that returns a symbol with auxiliary data, such that this auxiliary data has a default value? - Some special functions like
erf
or the incomplete gamma and beta functions are missing at the moment. A functionbroadcast_sum_n
would be very nice. - We can’t live without fancy indexing.
Do you think those could be solved or worked around in principle?