Seq2seq Discussion

https://d2l.ai/chapter_recurrent-modern/seq2seq.html

Shouldn't the masked softmax loss output be
[2.3026, 2.3026, 0]?
The first sequence has 4 valid elements, each with a loss of 2.3026,
and the second sequence has 2 valid elements, each with a loss of 2.3026.
Dividing each sum by its valid length gives 2.3026*4/4 = 2.3026 and 2.3026*2/2 = 2.3026.
It seems to me that the loss is instead divided by 4, which includes the padding length. That makes no sense!
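
To make the arithmetic concrete, here is a minimal pure-Python sketch comparing the two normalizations. It assumes the setup I believe the chapter's example uses: 4 time steps, valid lengths [4, 2, 0], and uniform logits over 10 classes, so every unmasked token has a loss of ln(10) ≈ 2.3026. The function names are my own and are not taken from the book's code.

```python
import math

NUM_STEPS = 4                 # padded sequence length (assumed)
VALID_LENS = [4, 2, 0]        # valid tokens per sequence (assumed)
PER_TOKEN = math.log(10)      # cross-entropy of uniform logits over 10 classes

def masked_losses(valid_len):
    # Keep the loss on valid positions, zero it on padded positions.
    return [PER_TOKEN if t < valid_len else 0.0 for t in range(NUM_STEPS)]

def mean_over_all_steps(valid_len):
    # Divide by the padded length, so padding dilutes the average.
    return sum(masked_losses(valid_len)) / NUM_STEPS

def mean_over_valid_steps(valid_len):
    # Divide by the valid length; define the loss of an empty sequence as 0.
    if valid_len == 0:
        return 0.0
    return sum(masked_losses(valid_len)) / valid_len

by_steps = [round(mean_over_all_steps(v), 4) for v in VALID_LENS]
by_valid = [round(mean_over_valid_steps(v), 4) for v in VALID_LENS]
print(by_steps)  # [2.3026, 1.1513, 0.0]
print(by_valid)  # [2.3026, 2.3026, 0.0]
```

The first list is what you get when the masked losses are averaged over all 4 time steps; the second is the per-valid-token average I expected.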