How do I train MXNet with Parquet files?
I have the training data stored in hundreds of Parquet files totalling over 2 TB, so it cannot fit in memory. Until now we were able to sidestep the issue by loading all the training data into memory (we ran on 728 GB SageMaker instances), but that is no longer sufficient.
We have been looking for a solution for a long time, but nothing seems to work. We are considering switching to PyTorch, which can consume a Petastorm reader and should therefore handle Parquet files. However, we feel there must be a solution we are not seeing.