I can’t use Gluon’s LR because I have a large dataset that must be loaded with MXNet’s DataIter, but I don’t know whether it has an L2 or L1 penalty function. How should I deal with this?
@janelu9, can you explain why you cannot use Gluon to load your data?
Are you aware of the Dataset and DataLoader classes, which might help you with that? DataLoader can use multiple worker processes to pre-fetch data asynchronously.
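For a dataset too large to fit in memory, one option is a Dataset that reads samples lazily from disk. The sketch below assumes the features and labels are stored as two .npy files (hypothetical paths) and memory-maps them with numpy.load(..., mmap_mode='r'), so only the rows actually requested are read. It is written as a plain Python class implementing __len__ and __getitem__, the two methods gluon's Dataset interface defines, so that it runs without MXNet; with MXNet installed, subclassing mxnet.gluon.data.Dataset would be the more conventional choice. The class name NpyMemmapDataset is hypothetical.

```python
import numpy as np


class NpyMemmapDataset:
    """Hypothetical lazy dataset over a features .npy file and a labels .npy file.

    Implements __len__ and __getitem__ (the interface gluon's Dataset defines).
    Files are memory-mapped read-only, so rows are read from disk on demand.
    """

    def __init__(self, data_path, label_path):
        self.data_path = data_path
        self.label_path = label_path
        self._data = None   # memmaps are opened lazily, on first access
        self._label = None
        # Opening a memmap only reads the .npy header; the rows stay on disk.
        self._len = np.load(data_path, mmap_mode='r').shape[0]

    def _open(self):
        if self._data is None:
            self._data = np.load(self.data_path, mmap_mode='r')
            self._label = np.load(self.label_path, mmap_mode='r')

    def __getstate__(self):
        # When pickled (e.g. sent to a worker process), drop any open memmaps
        # so only the file paths travel; the receiving process reopens lazily.
        state = self.__dict__.copy()
        state['_data'] = None
        state['_label'] = None
        return state

    def __len__(self):
        return self._len

    def __getitem__(self, idx):
        self._open()
        # np.array copies the requested row out of the memory map.
        return np.array(self._data[idx]), np.array(self._label[idx])


# Possible usage with Gluon (requires MXNet; not executed in this sketch):
# from mxnet import gluon
# loader = gluon.data.DataLoader(NpyMemmapDataset('X.npy', 'y.npy'),
#                                batch_size=128, shuffle=True, num_workers=4)
```

Opening the memory maps lazily and dropping them in __getstate__ means that if the dataset is pickled to hand it to worker processes, only the file paths are copied and each process opens its own memory map.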
You can use the Trainer’s weight decay parameter, wd, which is accepted by all optimizers. In most cases you can think of it as L2 regularization, and with SGD it is exactly that up to a factor of 2: a weight decay of wd corresponds to adding (wd/2)·‖w‖² to the loss. More details here: https://bbabenko.github.io/weight-decay/
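from mxnet import gluon

# net, LEARNING_RATE, WDECAY and MOMENTUM are assumed to be defined beforehand
trainer = gluon.Trainer(net.collect_params(), 'sgd',
                        {'learning_rate': LEARNING_RATE,
                         'wd': WDECAY,          # weight decay (L2-style penalty)
                         'momentum': MOMENTUM})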
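To make the factor of 2 concrete, here is a small NumPy check that needs no MXNet. It assumes the plain SGD update w ← w − lr·(grad + wd·w), i.e. the form of MXNet’s SGD update with momentum, gradient rescaling and clipping left out; the least-squares toy problem and all helper names are hypothetical. One SGD step with weight decay wd on the unpenalized loss should match one gradient step on the loss plus (wd/2)·‖w‖², where that gradient is estimated by finite differences so its form is not assumed.

```python
import numpy as np


def sgd_wd_step(w, grad, lr, wd):
    # Assumed plain SGD update with coupled weight decay
    # (no momentum, rescale_grad=1, no clipping): w <- w - lr * (grad + wd * w)
    return w - lr * (grad + wd * w)


def data_loss(w, X, y):
    # Hypothetical unpenalized least-squares loss: 0.5 * ||Xw - y||^2
    r = X @ w - y
    return 0.5 * (r @ r)


def data_grad(w, X, y):
    # Analytic gradient of data_loss
    return X.T @ (X @ w - y)


def numeric_grad(f, w, eps=1e-6):
    # Central finite-difference gradient of a scalar function f at w
    g = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (f(w + e) - f(w - e)) / (2 * eps)
    return g


def penalized(v):
    # Loss plus an explicit L2 penalty of (wd / 2) * ||v||^2
    return data_loss(v, X, y) + 0.5 * wd * (v @ v)


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)
w = np.array([1.0, -2.0, 0.5])
lr, wd = 0.01, 0.1

# SGD step on the unpenalized loss, using weight decay wd
w_wd = sgd_wd_step(w, data_grad(w, X, y), lr, wd)

# Plain gradient step on the explicitly penalized loss
w_l2 = w - lr * numeric_grad(penalized, w)

print(np.allclose(w_wd, w_l2, atol=1e-6))  # True
```

Using a penalty of wd·‖w‖² instead (without the 1/2) makes the two updates disagree, which is where the factor of 2 comes from.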