Hi, I'm training a Gluon ResNet in the AWS SageMaker MXNet 1.3 container. I apply data augmentation through mxnet.io.ImageRecordIter
(https://mxnet.incubator.apache.org/api/python/io/io.html).
With the following configuration, training runs correctly:
import mxnet as mx

def get_data(path, augment, num_cpus, batch_size, data_shape, resize=-1, num_parts=1, part_index=0):
    return mx.io.ImageRecordIter(
        path_imgrec=path,
        resize=resize,
        data_shape=data_shape,
        batch_size=batch_size,
        rand_crop=augment,
        #random_resized_crop=augment,
        #max_rotate_angle=25,
        #max_aspect_ratio=0.2,
        #max_shear_ratio=0.2,
        #brightness=0.2,
        #contrast=0.2,
        #saturation=0.2,
        #pca_noise=0.2,
        rand_mirror=augment,
        preprocess_threads=num_cpus,
        num_parts=num_parts,
        part_index=part_index)
With the version below, which enables several extra augmentations, the job crashes:
def get_data(path, augment, num_cpus, batch_size, data_shape, resize=-1, num_parts=1, part_index=0):
    return mx.io.ImageRecordIter(
        path_imgrec=path,
        resize=resize,
        data_shape=data_shape,
        batch_size=batch_size,
        rand_crop=augment,
        random_resized_crop=augment,
        max_rotate_angle=25,
        max_aspect_ratio=0.2,
        max_shear_ratio=0.2,
        brightness=0.2,
        contrast=0.2,
        saturation=0.2,
        pca_noise=0.2,
        rand_mirror=augment,
        preprocess_threads=num_cpus,
        num_parts=num_parts,
        part_index=part_index)
The error is:
terminate called recursively
terminate called after throwing an instance of 'dmlc::Error'
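Since the log does not say which option is at fault, one way to narrow it down might be to enable the extra augmentations one at a time on top of the working configuration. Below is a minimal sketch of that idea. The helper names `one_at_a_time` and `try_config` are hypothetical (they are not part of MXNet), and I'm assuming `augment` is True so the base config uses `rand_crop` and `rand_mirror`.

```python
# Working baseline (assumes augment=True in the get_data function).
BASE_KWARGS = {"rand_crop": True, "rand_mirror": True}

# The extra options that appear only in the failing configuration.
EXTRA_AUGMENTATIONS = {
    "random_resized_crop": True,
    "max_rotate_angle": 25,
    "max_aspect_ratio": 0.2,
    "max_shear_ratio": 0.2,
    "brightness": 0.2,
    "contrast": 0.2,
    "saturation": 0.2,
    "pca_noise": 0.2,
}


def one_at_a_time(base, extras):
    """Yield (name, kwargs) pairs, each adding exactly one extra option to base."""
    for name, value in extras.items():
        kwargs = dict(base)  # copy so the baseline is never mutated
        kwargs[name] = value
        yield name, kwargs


def try_config(path, data_shape, batch_size, kwargs):
    """Hypothetical helper: build the iterator and pull one batch."""
    import mxnet as mx  # imported lazily so the helpers above work without MXNet

    data_iter = mx.io.ImageRecordIter(
        path_imgrec=path,
        data_shape=data_shape,
        batch_size=batch_size,
        **kwargs)
    return data_iter.next()  # force at least one batch through the augmenters


if __name__ == "__main__":
    # Print the trial configurations; call try_config on each in its own process.
    for name, kwargs in one_at_a_time(BASE_KWARGS, EXTRA_AUGMENTATIONS):
        print(name, kwargs)
```

Because "terminate called" suggests the whole process aborts rather than raising a Python exception, a try/except around `try_config` probably won't catch the failure. Running each trial in a separate process (for example, one script invocation per configuration) and checking the exit code seems safer.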
Let me know if the question is more appropriate for AWS. Cheers