Batch Norm And Batch Size 1 Recommendations

Hi,

I’ve been trying to train a model using ResnetV2 from the model zoo. Because I am generating training data on the fly, I have been training with a batch size of 1.

I noticed some odd behavior when running my network for inference: I get more accurate results when I run it inside with ag.record() than inside with ag.predict_mode().

According to this:

Warning: the estimates for the batch mean and variance can themselves have high variance when the batch size is small (or when the spatial dimensions of samples are small). This can lead to instability during training, and unreliable estimates for the global statistics.
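To make the question concrete, here is a minimal pure-Python sketch of how I understand the two modes to differ for a single channel. This is not MXNet code: the function name batch_norm, the momentum of 0.9, and the epsilon value are all assumptions chosen for illustration. In training mode the layer normalizes with the statistics of the current batch, while in inference mode it uses the running averages accumulated during training.

```python
import statistics

EPS = 1e-5       # assumed small constant for numerical stability
MOMENTUM = 0.9   # assumed momentum for the running averages


def batch_norm(values, running, training):
    """Normalize the activations of one channel.

    values:   flat list of activations for this channel
              (all samples and spatial positions in the batch)
    running:  dict with 'mean' and 'var'; updated in place when training
    training: True  -> use the statistics of the current batch
              False -> use the stored running statistics
    """
    if training:
        mean = statistics.fmean(values)
        var = statistics.pvariance(values, mu=mean)
        # Exponential moving average of the batch statistics.
        running["mean"] = MOMENTUM * running["mean"] + (1 - MOMENTUM) * mean
        running["var"] = MOMENTUM * running["var"] + (1 - MOMENTUM) * var
    else:
        mean, var = running["mean"], running["var"]
    return [(v - mean) / (var + EPS) ** 0.5 for v in values]


# One "batch" consisting of a single sample with four spatial positions.
sample = [10.0, 11.0, 12.0, 13.0]
running = {"mean": 0.0, "var": 1.0}

# Training mode: normalized with the sample's own mean and variance.
train_out = batch_norm(sample, running, training=True)

# Inference mode: normalized with the running statistics, which after
# one update are still far from this sample's statistics.
eval_out = batch_norm(sample, running, training=False)

print("training mode :", train_out)
print("inference mode:", eval_out)
```

With a batch size of 1, each update to the running averages comes from a single sample, so my guess is that they end up being a noisy estimate of the statistics the network actually saw during training, which would explain the mismatch between the two modes.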

How should I approach BatchNorm usage? Is there any danger in using ag.record() for inference? Should I make a custom ResnetV2 model with no BatchNorm layers?