What is the best way of saving a gluon.Trainer? I need to update a pre-trained model by training it on a new dataset; however, if I recreate the Trainer, it would start from the initial learning rate, which is a problem for optimizers such as AdaGrad that adjust the learning rate with respect to frequently occurring features. I could not find a method such as save_params for Trainer, so please let me know if there is an easy way of saving it. Thanks!
From the documentation: you can use save_states(fname) to save your trainer's state, and then load_states(fname) to restore it to its previous configuration.
e.g. (assuming mynet is an initialized gluon network and lr is the learning rate):

from mxnet import gluon

trainer = gluon.Trainer(mynet.collect_params(), 'adam', {'learning_rate': lr})
flname = 'trainer_adam.states'
trainer.save_states(flname)

then restore:

trainer.load_states(flname)
edit: The save command works, but when trying to restore the trainer with load_states I get an error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-13-a93de5be24c4> in <module>()
1 with autograd.record():
----> 2 trainer.load_states(r'../saved_models/resunet-trainer-epoch-41-stats.states')
/home/foivos/mxnet/gluon/trainer.pyc in load_states(self, fname)
224 Path to input states file.
225 """
--> 226 if self._update_on_kvstore:
227 self._kvstore.load_optimizer_states(fname)
228 self._optimizer = self._kvstore._updater.optimizer
AttributeError: 'Trainer' object has no attribute '_update_on_kvstore'
edit 2: Without completely understanding what is going on (I am learning Gluon/mxnet these days), it seems you need to call the step operation at least once in order to “create” the attribute '_update_on_kvstore'. Once I perform at least one trainer.step(Nbatch), loading states works normally. You also need to perform a trainer.step(Nbatch) before saving states for the first time; see the sketch below.
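For reference, a minimal sketch of this workaround; the network, data shapes, loss, and Nbatch below are assumptions for illustration only:

import mxnet as mx
from mxnet import gluon, autograd

# hypothetical small network and dummy batch, just to force one update step
mynet = gluon.nn.Dense(1)
mynet.initialize()
trainer = gluon.Trainer(mynet.collect_params(), 'adam', {'learning_rate': 0.01})

Nbatch = 32
x = mx.nd.random.uniform(shape=(Nbatch, 10))
y = mx.nd.random.uniform(shape=(Nbatch, 1))
loss_fn = gluon.loss.L2Loss()

with autograd.record():
    loss = loss_fn(mynet(x), y)
loss.backward()
trainer.step(Nbatch)  # one update, so '_update_on_kvstore' exists

trainer.save_states('trainer_adam.states')  # now saving works
trainer.load_states('trainer_adam.states')  # ...and so does loading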
edit 3: Updated to the latest version of mxnet, v1.1.0, and now previously saved states load directly without a problem (the error I described above no longer appears). Because of delayed initialization, you still need to make a single forward pass before updating the parameters, so that the optimizer knows the correct dimensions of the layers (I got an error calling trainer.load_states('some_flname.states') without running a single forward pass first). I think it relates to this issue.
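A minimal sketch of that ordering, assuming the same toy network as in the sketch above and a hypothetical states filename:

import mxnet as mx
from mxnet import gluon

mynet = gluon.nn.Dense(1)
mynet.initialize()  # with deferred initialization, parameter shapes are not known yet
trainer = gluon.Trainer(mynet.collect_params(), 'adam', {'learning_rate': 0.01})

_ = mynet(mx.nd.random.uniform(shape=(4, 10)))  # one forward pass triggers shape inference
trainer.load_states('some_flname.states')  # now the optimizer knows the layer dimensions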