I tried to train my model with Adam by modifying the code as follows:
adam_optimizer = mx.optimizer.Adam()
optimizer_params = {'wd': 0.0,
                    'learning_rate': lr,
                    'lr_scheduler': lr_scheduler,
                    'rescale_grad': (1.0 / batch_size)}
#train
mod.fit(train_data, eval_metric=eval_metrics, epoch_end_callback=epoch_end_callback,
batch_end_callback=batch_end_callback, kvstore=args.kvstore,
optimizer=adam_optimizer, optimizer_params=optimizer_params,
arg_params=arg_params, aux_params=aux_params, begin_epoch=begin_epoch,
num_epoch=end_epoch)
However, the training loss at the beginning became very large (Train-RPNAcc=0.843564, RPNLogLoss=1.250399, RPNL1Loss=30.644248, RCNNAcc=0.818452, RCNNLogLoss=2.148393, RCNNL1Loss=90.618619). If I switch back to SGD, the loss values look normal (Train-RPNAcc=0.904762, RPNLogLoss=0.297273, RPNL1Loss=1.345441, RCNNAcc=0.849330, RCNNLogLoss=0.461176, RCNNL1Loss=1.354734).
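For reference, here is a minimal pure-Python sketch (with hypothetical lr and batch size values, not my actual config) of why the two optimizers can take very different step sizes under the same settings: SGD's update scales with the raw gradient magnitude (and with rescale_grad), while Adam's first update is roughly lr in magnitude no matter how large or small the gradient is.

```python
import math

def sgd_step(grad, lr, rescale):
    # Plain SGD (no momentum, no wd): step size is proportional
    # to the rescaled gradient.
    return -lr * rescale * grad

def adam_first_step(grad, lr, rescale, beta1=0.9, beta2=0.999, eps=1e-8):
    # One Adam update from zero-initialized moments, following the
    # standard Adam formulas with bias correction at t = 1.
    g = rescale * grad
    m = (1 - beta1) * g           # first-moment estimate
    v = (1 - beta2) * g * g      # second-moment estimate
    m_hat = m / (1 - beta1)      # bias-corrected moments at t = 1
    v_hat = v / (1 - beta2)
    return -lr * m_hat / (math.sqrt(v_hat) + eps)

lr, rescale = 0.001, 1.0 / 128   # hypothetical values for illustration
small_grad, large_grad = 0.01, 100.0

# SGD steps differ by a factor of 10^4 between the two gradients:
print(sgd_step(small_grad, lr, rescale), sgd_step(large_grad, lr, rescale))
# Adam's first step is about -lr for both, regardless of gradient scale:
print(adam_first_step(small_grad, lr, rescale),
      adam_first_step(large_grad, lr, rescale))
```

So with a learning rate tuned for SGD, Adam can take much larger effective steps on parameters whose gradients are small, which may be related to the loss blow-up I see.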
It seems as if the pretrained model is not loaded correctly when I use Adam.
Why would that be?
Thanks a lot!