Documentation Request: Model Parallelism Tutorial

sumneuron · November 16, 2017, 4:08pm

Note: I have been to and read

which are the two tutorials about model parallelism.

They link to this git repo:

github.com/apache/incubator-mxnet/tree/master/example/image-classification

which has some files that let you train on the CIFAR-10 and MNIST datasets using a lot of nifty arguments.

However, this code is written in a very esoteric way (from the perspective of someone new to MXNet).

So my question, or topic, is basically a request for a tutorial on using multiple GPUs.
Something simple, like initializing a simple network with Gluon and then training it on multiple GPUs.

Also, explaining what the following is and how to use it

module = mx.module.Module(context=[mx.gpu(0), mx.gpu(2)], ...)

would be nice, as the current isolated documentation does not make it very clear (again, from the point of view of a newcomer; perhaps it makes more sense for those who are more experienced with MXNet).

I appreciate your assistance and clarification in advance.

eric-haibin-lin · November 16, 2017, 10:05pm

Hi,

There are some well-written tutorials for multi-GPU training here: http://gluon.mxnet.io/chapter07_distributed-learning/multiple-gpus-scratch.html
Maybe you want to also read the first few chapters to get the basics of the Gluon API.
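
As a rough illustration of what passing a list of contexts such as `[mx.gpu(0), mx.gpu(2)]` means for training, here is a pure-Python sketch of data parallelism: each batch is split across the devices, every device computes gradients on its own shard, and the gradients are summed before the update. This is not MXNet code. The device names are plain strings, and `split_batch`, `shard_gradient`, and `data_parallel_step` are hypothetical helpers written only for this example. In Gluon, the splitting step is handled by `gluon.utils.split_and_load`.

```python
# Pure-Python sketch of data parallelism. "Devices" are just labels here;
# nothing in this block touches MXNet or a real GPU.

def split_batch(batch, num_devices):
    """Split a batch into num_devices contiguous, near-equal shards."""
    size, rem = divmod(len(batch), num_devices)
    shards, start = [], 0
    for i in range(num_devices):
        end = start + size + (1 if i < rem else 0)
        shards.append(batch[start:end])
        start = end
    return shards

def shard_gradient(w, shard):
    """Gradient of sum((w * x - y)^2) with respect to w over one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard)

def data_parallel_step(w, batch, devices, lr):
    """One SGD step: per-device gradients are summed, then averaged."""
    shards = split_batch(batch, len(devices))
    # On real hardware each shard's gradient would be computed concurrently
    # on its own device; here the loop runs sequentially.
    grads = [shard_gradient(w, shard) for shard in shards]
    return w - lr * sum(grads) / len(batch)

devices = ["gpu(0)", "gpu(2)"]                             # labels only
batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]   # samples of y = 2x
w = 0.0
for _ in range(50):
    w = data_parallel_step(w, batch, devices, lr=0.05)
print(round(w, 3))  # converges towards 2.0
```

Because the per-shard gradients add up to the full-batch gradient, the result matches single-device training on the whole batch; the gain on real hardware is that the shards are processed at the same time.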

Jerry · November 17, 2017, 3:25am

I think having documentation reduces the friction for experienced users. Reading docs is usually faster than reading a tutorial.

sumneuron · November 17, 2017, 10:13am

It is always a trade-off. Great documentation will be faster as a reference for those who are familiar with the library and API. However, for those starting to learn the library (which is currently in transition from Symbol to Gluon), the current documentation is not yet accessible. So both are needed. Perhaps the best example of a well-documented language is Mathematica, which has extensive documentation at a variety of depths, as well as examples.

sumneuron · November 17, 2017, 10:15am

Thank you for sending me that link. I had not seen it before. I have read the previous Gluon tutorials, but even after reading and implementing those (as well as other tests that I have tried), I cannot confidently use the documentation or be sure whether I am implementing something in the “proper” way.

feevos · February 14, 2018, 12:59pm

+1 vote for a model parallelism tutorial. It is really important and currently not covered extensively, e.g. in medical segmentation tasks, where one needs to tackle 3D convolution problems and the memory bottleneck is a big problem.

eric-haibin-lin · March 10, 2018, 2:52pm

There is a model parallel example using the Module API: https://github.com/apache/incubator-mxnet/tree/master/example/model-parallel/matrix_factorization. I do agree that a tutorial would be much better.
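
For contrast with splitting a batch across GPUs, here is a minimal pure-Python sketch of the model-parallel idea: different layers live on different devices, and activations are copied from one device to the next during the forward pass. It is not taken from the linked example and uses no MXNet API. `Layer`, `copy_to`, and `model_parallel_forward` are hypothetical names, and the "devices" are plain strings.

```python
# Pure-Python sketch of model parallelism. Each layer is pinned to one
# device label; no real device placement or MXNet call happens here.

class Layer:
    """A toy layer: scale each input by a weight, then apply ReLU."""

    def __init__(self, weight, device):
        self.weight = weight
        self.device = device  # where this layer's parameters would live

    def forward(self, x):
        return [max(0.0, self.weight * v) for v in x]

def copy_to(x, device):
    """Stand-in for a cross-device copy of the activations."""
    return list(x)

def model_parallel_forward(layers, x):
    """Run the layers in order, moving activations to each layer's device."""
    trace = []
    for layer in layers:
        x = copy_to(x, layer.device)  # activations follow the parameters
        x = layer.forward(x)
        trace.append(layer.device)
    return x, trace

layers = [Layer(2.0, "gpu(0)"), Layer(0.5, "gpu(1)")]
out, trace = model_parallel_forward(layers, [1.0, -3.0, 4.0])
print(out, trace)
```

Only the activations cross devices, so each device holds just its own layers' parameters. That is why this layout helps when a single model does not fit in one GPU's memory, as in the 3D convolution case mentioned above.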
