MXNet model for training on images annotated with multiple different classes

Hello intelligent people. I’m looking for an answer to the following question: is there an MXNet model that can be trained on a dataset in which a single image is annotated with multiple different classes? For example, one annotated image might contain 5 bounding boxes, each of a different class; let’s say the classes are named ‘A’, ‘B’, ‘C’, ‘D’, and ‘E’. If a model that can handle this type of dataset exists, could you please share a link to it so that I can check it and see whether I can start from there?
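To make the dataset concrete, here is a minimal sketch of what the labels for one such image might look like. The `[xmin, ymin, xmax, ymax, class_id]` row layout, the pixel coordinates, and the helper name `classes_in_image` are assumptions made up for illustration; my actual annotation files may be laid out differently.

```python
# Class names from the example above.
CLASSES = ["A", "B", "C", "D", "E"]

# Labels for ONE annotated image: one row per bounding box.
# Assumed row layout: [xmin, ymin, xmax, ymax, class_id].
# The coordinates are made-up pixel values, not real data.
labels = [
    [10.0, 20.0, 110.0, 120.0, 0],   # box of class 'A'
    [150.0, 30.0, 220.0, 90.0, 1],   # box of class 'B'
    [40.0, 200.0, 130.0, 280.0, 2],  # box of class 'C'
    [300.0, 60.0, 380.0, 160.0, 3],  # box of class 'D'
    [250.0, 220.0, 330.0, 300.0, 4], # box of class 'E'
]


def classes_in_image(rows):
    """Return the class name of every box in one image (hypothetical helper)."""
    return [CLASSES[int(row[4])] for row in rows]


print(classes_in_image(labels))
```

So every image carries a variable-length list of boxes, and the boxes within one image can belong to different classes.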

Thanks in advance for your answer.