According to the published paper, Nadam seems to be the best optimization method. However, when I tried the Nadam optimizer on the MNIST example, something weird happened: accuracy dropped quickly and finally settled at about 0.2. Why would this optimizer return an even worse result? Is this kind of behavior normal?
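For reference, the Nadam update (Adam combined with Nesterov momentum, as described by Dozat, 2016) can be sketched for a single scalar parameter. This is a minimal pure-Python sketch assuming the Keras-style formulation with a warming momentum schedule. The names `NadamState` and `nadam_step` are hypothetical, and the defaults (`lr=0.001`, `beta1=0.9`, `beta2=0.999`, `epsilon=1e-8`, `schedule_decay=0.004`) are assumptions that may not match MXNet's built-in `nadam` optimizer in every detail.

```python
import math
from dataclasses import dataclass


@dataclass
class NadamState:
    """Per-parameter optimizer state (scalar version, for illustration only)."""
    m: float = 0.0           # first-moment (momentum) estimate
    v: float = 0.0           # second-moment estimate
    m_schedule: float = 1.0  # running product of momentum-schedule values
    t: int = 0               # step counter


def nadam_step(x, grad, state, lr=0.001, beta1=0.9, beta2=0.999,
               epsilon=1e-8, schedule_decay=0.004):
    """Apply one Nadam update to scalar x and return the new value.

    Assumed Keras-style formulation; `state` is updated in place.
    """
    state.t += 1
    t = state.t
    # Warming momentum schedule for the current and the next step.
    mu_t = beta1 * (1.0 - 0.5 * 0.96 ** (t * schedule_decay))
    mu_next = beta1 * (1.0 - 0.5 * 0.96 ** ((t + 1) * schedule_decay))
    m_schedule_new = state.m_schedule * mu_t
    m_schedule_next = m_schedule_new * mu_next
    state.m_schedule = m_schedule_new

    # Bias-corrected gradient and moment estimates.
    grad_prime = grad / (1.0 - m_schedule_new)
    state.m = beta1 * state.m + (1.0 - beta1) * grad
    m_prime = state.m / (1.0 - m_schedule_next)
    state.v = beta2 * state.v + (1.0 - beta2) * grad * grad
    v_prime = state.v / (1.0 - beta2 ** t)

    # Nesterov-style mix of the current gradient and the look-ahead momentum.
    m_bar = (1.0 - mu_t) * grad_prime + mu_next * m_prime
    return x - lr * m_bar / (math.sqrt(v_prime) + epsilon)


# Usage: minimize f(x) = x**2 (gradient 2x) starting from x = 1.
state = NadamState()
x = 1.0
for _ in range(1000):
    x = nadam_step(x, 2.0 * x, state, lr=0.01)
print(x)
```

Because the update divides by `sqrt(v_prime)`, the first step moves `x` by roughly `lr` (about 1.06 × `lr` in this example) regardless of the gradient's scale, so in Adam-family optimizers the learning rate directly sets the step size.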
Here are the test results.
Using ctx=cpu(), optimizer='nadam':
INFO:root:Epoch[0] Batch [100] Speed: 22101.11 samples/sec accuracy=0.249307
INFO:root:Epoch[0] Batch [200] Speed: 27835.34 samples/sec accuracy=0.244600
INFO:root:Epoch[0] Batch [300] Speed: 27830.02 samples/sec accuracy=0.272900
INFO:root:Epoch[0] Batch [400] Speed: 27835.30 samples/sec accuracy=0.254700
INFO:root:Epoch[0] Batch [500] Speed: 26670.45 samples/sec accuracy=0.199500
INFO:root:Epoch[0] Train-accuracy=0.194444
INFO:root:Epoch[0] Time cost=2.359
INFO:root:Epoch[0] Validation-accuracy=0.208100
INFO:root:Epoch[1] Batch [100] Speed: 21338.35 samples/sec accuracy=0.190990
INFO:root:Epoch[1] Batch [200] Speed: 7712.44 samples/sec accuracy=0.191300
INFO:root:Epoch[1] Batch [300] Speed: 6215.06 samples/sec accuracy=0.208300
INFO:root:Epoch[1] Batch [400] Speed: 6810.12 samples/sec accuracy=0.199500
INFO:root:Epoch[1] Batch [500] Speed: 6810.13 samples/sec accuracy=0.202300
INFO:root:Epoch[1] Train-accuracy=0.194747
INFO:root:Epoch[1] Time cost=7.764
INFO:root:Epoch[1] Validation-accuracy=0.211700
INFO:root:Epoch[2] Batch [100] Speed: 6883.36 samples/sec accuracy=0.194455
INFO:root:Epoch[2] Batch [200] Speed: 6809.94 samples/sec accuracy=0.195800
INFO:root:Epoch[2] Batch [300] Speed: 6668.24 samples/sec accuracy=0.201100
INFO:root:Epoch[2] Batch [400] Speed: 6668.24 samples/sec accuracy=0.194900
INFO:root:Epoch[2] Batch [500] Speed: 6810.11 samples/sec accuracy=0.197300
INFO:root:Epoch[2] Train-accuracy=0.192727
INFO:root:Epoch[2] Time cost=8.842
INFO:root:Epoch[2] Validation-accuracy=0.208300
INFO:root:Epoch[3] Batch [100] Speed: 6809.95 samples/sec accuracy=0.191485
INFO:root:Epoch[3] Batch [200] Speed: 6662.77 samples/sec accuracy=0.194300
INFO:root:Epoch[3] Batch [300] Speed: 6810.12 samples/sec accuracy=0.207700
INFO:root:Epoch[3] Batch [400] Speed: 6810.28 samples/sec accuracy=0.182800
INFO:root:Epoch[3] Batch [500] Speed: 6810.12 samples/sec accuracy=0.195300
INFO:root:Epoch[3] Train-accuracy=0.193131
INFO:root:Epoch[3] Time cost=8.827
INFO:root:Epoch[3] Validation-accuracy=0.209400
INFO:root:Epoch[4] Batch [100] Speed: 6883.33 samples/sec accuracy=0.198119
INFO:root:Epoch[4] Batch [200] Speed: 6810.21 samples/sec accuracy=0.198100
INFO:root:Epoch[4] Batch [300] Speed: 6809.96 samples/sec accuracy=0.202300
INFO:root:Epoch[4] Batch [400] Speed: 6738.59 samples/sec accuracy=0.201000
INFO:root:Epoch[4] Batch [500] Speed: 6809.96 samples/sec accuracy=0.197300
INFO:root:Epoch[4] Train-accuracy=0.194747
INFO:root:Epoch[4] Time cost=8.826
INFO:root:Epoch[4] Validation-accuracy=0.211400
INFO:root:Epoch[5] Batch [100] Speed: 6738.41 samples/sec accuracy=0.203168
INFO:root:Epoch[5] Batch [200] Speed: 6688.91 samples/sec accuracy=0.196500
INFO:root:Epoch[5] Batch [300] Speed: 6738.44 samples/sec accuracy=0.203700
INFO:root:Epoch[5] Batch [400] Speed: 6810.14 samples/sec accuracy=0.201000
INFO:root:Epoch[5] Batch [500] Speed: 6810.12 samples/sec accuracy=0.197100
INFO:root:Epoch[5] Train-accuracy=0.194646
INFO:root:Epoch[5] Time cost=8.868
INFO:root:Epoch[5] Validation-accuracy=0.190300
INFO:root:Epoch[6] Batch [100] Speed: 6668.08 samples/sec accuracy=0.202772
INFO:root:Epoch[6] Batch [200] Speed: 6810.12 samples/sec accuracy=0.201900
INFO:root:Epoch[6] Batch [300] Speed: 6810.11 samples/sec accuracy=0.201300
INFO:root:Epoch[6] Batch [400] Speed: 6810.13 samples/sec accuracy=0.203800
INFO:root:Epoch[6] Batch [500] Speed: 6810.12 samples/sec accuracy=0.199600
INFO:root:Epoch[6] Train-accuracy=0.188485
INFO:root:Epoch[6] Time cost=8.826
INFO:root:Epoch[6] Validation-accuracy=0.191500
INFO:root:Epoch[7] Batch [100] Speed: 6809.97 samples/sec accuracy=0.192178
INFO:root:Epoch[7] Batch [200] Speed: 6810.14 samples/sec accuracy=0.195700
INFO:root:Epoch[7] Batch [300] Speed: 6810.12 samples/sec accuracy=0.192200
INFO:root:Epoch[7] Batch [400] Speed: 6521.27 samples/sec accuracy=0.198400
INFO:root:Epoch[7] Batch [500] Speed: 6599.68 samples/sec accuracy=0.191200
INFO:root:Epoch[7] Train-accuracy=0.194242
INFO:root:Epoch[7] Time cost=8.979
INFO:root:Epoch[7] Validation-accuracy=0.193200
INFO:root:Epoch[8] Batch [100] Speed: 6605.41 samples/sec accuracy=0.197525
INFO:root:Epoch[8] Batch [200] Speed: 6527.83 samples/sec accuracy=0.197900
INFO:root:Epoch[8] Batch [300] Speed: 6627.89 samples/sec accuracy=0.200800
INFO:root:Epoch[8] Batch [400] Speed: 6668.57 samples/sec accuracy=0.202500
INFO:root:Epoch[8] Batch [500] Speed: 6599.34 samples/sec accuracy=0.190000
INFO:root:Epoch[8] Train-accuracy=0.195051
INFO:root:Epoch[8] Time cost=9.073
INFO:root:Epoch[8] Validation-accuracy=0.208400
INFO:root:Epoch[9] Batch [100] Speed: 6532.14 samples/sec accuracy=0.200891
INFO:root:Epoch[9] Batch [200] Speed: 6466.19 samples/sec accuracy=0.200800
INFO:root:Epoch[9] Batch [300] Speed: 6599.30 samples/sec accuracy=0.199600
INFO:root:Epoch[9] Batch [400] Speed: 6738.63 samples/sec accuracy=0.205600
INFO:root:Epoch[9] Batch [500] Speed: 6338.11 samples/sec accuracy=0.198600
INFO:root:Epoch[9] Train-accuracy=0.197071
INFO:root:Epoch[9] Time cost=9.123
INFO:root:Epoch[9] Validation-accuracy=0.209500
Using ctx=gpu(), optimizer='nadam':
INFO:root:Epoch[0] Batch [100] Speed: 14225.57 samples/sec accuracy=0.296238
INFO:root:Epoch[0] Batch [200] Speed: 14548.91 samples/sec accuracy=0.412700
INFO:root:Epoch[0] Batch [300] Speed: 14548.88 samples/sec accuracy=0.434700
INFO:root:Epoch[0] Batch [400] Speed: 14548.88 samples/sec accuracy=0.423600
INFO:root:Epoch[0] Batch [500] Speed: 14548.89 samples/sec accuracy=0.423300
INFO:root:Epoch[0] Train-accuracy=0.424747
INFO:root:Epoch[0] Time cost=4.140
INFO:root:Epoch[0] Validation-accuracy=0.418100
INFO:root:Epoch[1] Batch [100] Speed: 14548.89 samples/sec accuracy=0.432772
INFO:root:Epoch[1] Batch [200] Speed: 14548.89 samples/sec accuracy=0.425300
INFO:root:Epoch[1] Batch [300] Speed: 14225.57 samples/sec accuracy=0.383200
INFO:root:Epoch[1] Batch [400] Speed: 14548.90 samples/sec accuracy=0.319800
INFO:root:Epoch[1] Batch [500] Speed: 14225.58 samples/sec accuracy=0.299300
INFO:root:Epoch[1] Train-accuracy=0.300707
INFO:root:Epoch[1] Time cost=4.155
INFO:root:Epoch[1] Validation-accuracy=0.271500
INFO:root:Epoch[2] Batch [100] Speed: 14548.91 samples/sec accuracy=0.275149
INFO:root:Epoch[2] Batch [200] Speed: 14548.89 samples/sec accuracy=0.325900
INFO:root:Epoch[2] Batch [300] Speed: 14548.89 samples/sec accuracy=0.357300
INFO:root:Epoch[2] Batch [400] Speed: 14548.91 samples/sec accuracy=0.347500
INFO:root:Epoch[2] Batch [500] Speed: 14225.57 samples/sec accuracy=0.361400
INFO:root:Epoch[2] Train-accuracy=0.342626
INFO:root:Epoch[2] Time cost=4.140
INFO:root:Epoch[2] Validation-accuracy=0.354200
INFO:root:Epoch[3] Batch [100] Speed: 14548.91 samples/sec accuracy=0.357426
INFO:root:Epoch[3] Batch [200] Speed: 14548.88 samples/sec accuracy=0.202400
INFO:root:Epoch[3] Batch [300] Speed: 14548.88 samples/sec accuracy=0.206400
INFO:root:Epoch[3] Batch [400] Speed: 14548.89 samples/sec accuracy=0.199300
INFO:root:Epoch[3] Batch [500] Speed: 14548.92 samples/sec accuracy=0.192800
INFO:root:Epoch[3] Train-accuracy=0.194141
INFO:root:Epoch[3] Time cost=4.124
INFO:root:Epoch[3] Validation-accuracy=0.193400
INFO:root:Epoch[4] Batch [100] Speed: 14548.91 samples/sec accuracy=0.192574
INFO:root:Epoch[4] Batch [200] Speed: 13916.31 samples/sec accuracy=0.192300
INFO:root:Epoch[4] Batch [300] Speed: 14548.87 samples/sec accuracy=0.206800
INFO:root:Epoch[4] Batch [400] Speed: 14548.91 samples/sec accuracy=0.203000
INFO:root:Epoch[4] Batch [500] Speed: 14225.58 samples/sec accuracy=0.200900
INFO:root:Epoch[4] Train-accuracy=0.194444
INFO:root:Epoch[4] Time cost=4.187
INFO:root:Epoch[4] Validation-accuracy=0.197000
INFO:root:Epoch[5] Batch [100] Speed: 14548.88 samples/sec accuracy=0.199604
INFO:root:Epoch[5] Batch [200] Speed: 14225.58 samples/sec accuracy=0.195100
INFO:root:Epoch[5] Batch [300] Speed: 14225.59 samples/sec accuracy=0.202500
INFO:root:Epoch[5] Batch [400] Speed: 14548.88 samples/sec accuracy=0.204800
INFO:root:Epoch[5] Batch [500] Speed: 14548.89 samples/sec accuracy=0.202400
INFO:root:Epoch[5] Train-accuracy=0.197071
INFO:root:Epoch[5] Time cost=4.155
INFO:root:Epoch[5] Validation-accuracy=0.206700
INFO:root:Epoch[6] Batch [100] Speed: 14225.60 samples/sec accuracy=0.196733
INFO:root:Epoch[6] Batch [200] Speed: 14225.57 samples/sec accuracy=0.195900
INFO:root:Epoch[6] Batch [300] Speed: 14548.88 samples/sec accuracy=0.205500
INFO:root:Epoch[6] Batch [400] Speed: 14548.91 samples/sec accuracy=0.206000
INFO:root:Epoch[6] Batch [500] Speed: 14225.58 samples/sec accuracy=0.204300
INFO:root:Epoch[6] Train-accuracy=0.194444
INFO:root:Epoch[6] Time cost=4.171
INFO:root:Epoch[6] Validation-accuracy=0.208300
INFO:root:Epoch[7] Batch [100] Speed: 14225.60 samples/sec accuracy=0.202772
INFO:root:Epoch[7] Batch [200] Speed: 13620.24 samples/sec accuracy=0.201000
INFO:root:Epoch[7] Batch [300] Speed: 14225.58 samples/sec accuracy=0.207600
INFO:root:Epoch[7] Batch [400] Speed: 13916.32 samples/sec accuracy=0.205500
INFO:root:Epoch[7] Batch [500] Speed: 14225.58 samples/sec accuracy=0.200200
INFO:root:Epoch[7] Train-accuracy=0.192323
INFO:root:Epoch[7] Time cost=4.249
INFO:root:Epoch[7] Validation-accuracy=0.191600
INFO:root:Epoch[8] Batch [100] Speed: 14548.90 samples/sec accuracy=0.198713
INFO:root:Epoch[8] Batch [200] Speed: 14225.56 samples/sec accuracy=0.196500
INFO:root:Epoch[8] Batch [300] Speed: 13916.36 samples/sec accuracy=0.204200
INFO:root:Epoch[8] Batch [400] Speed: 14548.87 samples/sec accuracy=0.201800
INFO:root:Epoch[8] Batch [500] Speed: 14225.58 samples/sec accuracy=0.197700
INFO:root:Epoch[8] Train-accuracy=0.194949
INFO:root:Epoch[8] Time cost=4.202
INFO:root:Epoch[8] Validation-accuracy=0.208800
INFO:root:Epoch[9] Batch [100] Speed: 14548.89 samples/sec accuracy=0.202376
INFO:root:Epoch[9] Batch [200] Speed: 14225.59 samples/sec accuracy=0.201400
INFO:root:Epoch[9] Batch [300] Speed: 14548.89 samples/sec accuracy=0.205800
INFO:root:Epoch[9] Batch [400] Speed: 14225.57 samples/sec accuracy=0.203800
INFO:root:Epoch[9] Batch [500] Speed: 14548.89 samples/sec accuracy=0.200600
INFO:root:Epoch[9] Train-accuracy=0.193939
INFO:root:Epoch[9] Time cost=4.171
INFO:root:Epoch[9] Validation-accuracy=0.206100
I first ran into this several days ago and attributed it to an incorrect learning rate: in this example, simply removing the explicit learning rate (so the default is used) made things better.
But now the problem has come back.
With batch_size=10000, ctx=mx.gpu(), optimizer='nadam', and the default learning rate:
......
INFO:root:Epoch[25] Validation-accuracy=0.988000
INFO:root:Epoch[26] Train-accuracy=0.991017
INFO:root:Epoch[26] Time cost=0.984
INFO:root:Epoch[26] Validation-accuracy=0.986100
INFO:root:Epoch[27] Train-accuracy=0.987683
INFO:root:Epoch[27] Time cost=0.969
INFO:root:Epoch[27] Validation-accuracy=0.976800
INFO:root:Epoch[28] Train-accuracy=0.786283
INFO:root:Epoch[28] Time cost=0.984
INFO:root:Epoch[28] Validation-accuracy=0.106700
INFO:root:Epoch[29] Train-accuracy=0.103833
INFO:root:Epoch[29] Time cost=0.984
INFO:root:Epoch[29] Validation-accuracy=0.089200
INFO:root:Epoch[30] Train-accuracy=0.097083
INFO:root:Epoch[30] Time cost=0.984
INFO:root:Epoch[30] Validation-accuracy=0.100900
INFO:root:Epoch[31] Train-accuracy=0.102583
INFO:root:Epoch[31] Time cost=0.984
......
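To locate where a run collapses, the per-epoch validation accuracy can be pulled out of logs like the ones above. This is a minimal sketch using only the standard library; the helper names `validation_curve` and `first_collapse`, and the rule "accuracy falls below half of the best earlier value", are hypothetical choices for illustration, not part of MXNet.

```python
import re

# Matches log lines such as "INFO:root:Epoch[3] Validation-accuracy=0.193400"
VAL_RE = re.compile(r"Epoch\[(\d+)\] Validation-accuracy=([0-9.]+)")


def validation_curve(log_text):
    """Return {epoch: validation accuracy} parsed from the log text."""
    return {int(epoch): float(acc) for epoch, acc in VAL_RE.findall(log_text)}


def first_collapse(curve, drop=0.5):
    """Return the first epoch whose validation accuracy is below `drop`
    times the best accuracy of the earlier epochs, or None."""
    best = 0.0
    for epoch in sorted(curve):
        acc = curve[epoch]
        if best > 0.0 and acc < drop * best:
            return epoch
        best = max(best, acc)
    return None


# Usage on an excerpt of the batch_size=10000 run.
sample = """INFO:root:Epoch[25] Validation-accuracy=0.988000
INFO:root:Epoch[26] Train-accuracy=0.991017
INFO:root:Epoch[26] Validation-accuracy=0.986100
INFO:root:Epoch[27] Validation-accuracy=0.976800
INFO:root:Epoch[28] Validation-accuracy=0.106700"""
print(first_collapse(validation_curve(sample)))  # → 28
```

On the GPU log above the same check reports epoch 3, while the CPU run never passes the threshold because it stays near 0.2 from the first epoch. For ten-class MNIST, an accuracy around 0.1 is chance level, so the drop at epoch 28 means the model is effectively guessing at random.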