Load data from csv file with pandas and feed to NN models

LewsTherin511 · December 21, 2020, 3:10pm

Hi there, I'm having trouble correctly loading a .csv file to use as input for a very simple dense NN model.
The CSV file contains all the input features plus a 'target' column, which is the output for regression.

This is what I’m doing so far:

def main():

	batch_size = 500

	## load input file
	df_data = pd.read_csv('some_file.csv', index_col=0)
	## random train/test split
	df_train = df_data.sample(frac=0.8,random_state=200)
	df_test = df_data.drop(df_train.index)

	## data pre-processing
	df_train.reset_index(drop=True, inplace=True)
	df_test.reset_index(drop=True, inplace=True)
	y_train = df_train['target'].to_numpy(dtype=np.float64)
	y_test = df_test['target'].to_numpy(dtype=np.float64)
	X_train = df_train.drop(['target'], axis=1).to_numpy(dtype=np.float64)
	X_test = df_test.drop(['target'], axis=1).to_numpy(dtype=np.float64)


	dataset = mx.gluon.data.dataset.ArrayDataset(X_train, y_train)
	data_loader = mx.gluon.data.DataLoader(dataset, batch_size=batch_size, shuffle=True)

	##   building model 
	model = nn.Sequential()
	model.add(nn.Dense(150))
	model.add(nn.Dense(1))
	model.initialize(init.Normal(sigma=0.01))

	## loss function (squared loss)
	loss = gloss.L2Loss()

	## optimization algorithm, specify:
	trainer = gluon.Trainer(model.collect_params(), 'sgd', {'learning_rate': 0.03})

	##   training   #
	num_epochs = 10
	for epoch in range(1, num_epochs + 1):
		for X_batch, Y_batch in data_loader:
			with autograd.record():
				l = loss(model(X_batch), Y_batch)
			l.backward()
			trainer.step(batch_size)
		# overall (entire dataset) loss after epoch
		l = loss(model(X_train), y_train)
		print(f'\nEpoch {epoch}, loss: {l.mean().asnumpy()}')

I was getting the error:

mxnet.base.MXNetError: [16:09:03] src/operator/numpy/linalg/./../../tensor/../elemwise_op_common.h:135: Check failed: assign(&dattr, vec.at(i)): Incompatible attr in node  at 1-th input: expected float64, got float32

I tried switching np.float64 to np.float32, but then I get:

File "/home/lews/anaconda3/envs/gluon/lib/python3.7/site-packages/mxnet/gluon/block.py", line 1136, in forward
raise ValueError('In HybridBlock, there must be one NDArray or one Symbol in the input.'
ValueError: In HybridBlock, there must be one NDArray or one Symbol in the input. Please check the type of the args.

What is the correct way to load this data?

murry01 · December 25, 2020, 6:44pm

Hi. Based on the little I know, I think you should check the line of your code that reads the file.

It should be:

df_data = pd.read_csv('some_file.csv')

not:

df_data = pd.read_csv('some_file.csv', index_col=0)

Kindly drop index_col=0 from the call.
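
For anyone weighing this suggestion, here is a minimal sketch of what index_col=0 changes. The CSV contents and the column names f1 and f2 are made up for illustration; the real file is not shown in this thread. It assumes the first column of the file is a row index rather than a feature.

```python
import io

import pandas as pd

# Hypothetical CSV: the first (unnamed) column is a row index, not a feature.
csv_text = ",f1,f2,target\n0,1.0,2.0,0.5\n1,3.0,4.0,1.5\n"

# With index_col=0 the first column becomes the DataFrame index.
with_index = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Without it, the first column is kept as an ordinary data column.
without_index = pd.read_csv(io.StringIO(csv_text))

print(list(with_index.columns))
print(list(without_index.columns))
```

In this sketch, dropping index_col=0 adds an extra column to the feature matrix, so whether to keep the argument depends on whether the first column of the file is a real feature. Neither variant changes the dtype of the arrays, which is what the two error messages above complain about.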

LewsTherin511 · December 29, 2020, 1:54pm

I fixed it by using:

## data pre-processing
y_train = np.array(df_train['target'].to_numpy().reshape(-1,1), dtype=np.float32)
y_test = np.array(df_test['target'].to_numpy().reshape(-1,1), dtype=np.float32)
X_train = np.array(df_train.drop(['target'], axis=1).to_numpy(), dtype=np.float32)
X_test = np.array(df_test.drop(['target'], axis=1).to_numpy(), dtype=np.float32)
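
A small self-contained sketch of the same pre-processing, for readers who want to check dtypes and shapes before handing the arrays to ArrayDataset. It assumes np is standard NumPy (the imports are not shown in this thread), and the DataFrame and the helper to_arrays are hypothetical stand-ins, not part of the original code.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV data: two features plus 'target'.
df = pd.DataFrame({
    "f1": [0.1, 0.2, 0.3, 0.4, 0.5],
    "f2": [1.0, 2.0, 3.0, 4.0, 5.0],
    "target": [1.1, 2.2, 3.3, 4.4, 5.5],
})

# Same random 80/20 split as in the question.
df_train = df.sample(frac=0.8, random_state=200)
df_test = df.drop(df_train.index)


def to_arrays(frame):
    """Return (features, target) as float32 arrays; target has shape (N, 1)."""
    X = frame.drop(columns=["target"]).to_numpy(dtype=np.float32)
    y = frame["target"].to_numpy(dtype=np.float32).reshape(-1, 1)
    return X, y


X_train, y_train = to_arrays(df_train)
X_test, y_test = to_arrays(df_test)

print(X_train.dtype, X_train.shape, y_train.shape)
```

float32 matches the default dtype of Gluon layer parameters, which is likely why the float64 version hit the "expected float64, got float32" check. The reshape to (-1, 1) lines the target up with the (N, 1) output of the final Dense(1) layer. If np is standard NumPy, calling the model directly on such an array (as in the per-epoch loss line of the question) would presumably still need an NDArray, for instance via mx.nd.array(X_train); that step is an assumption and is not shown in this thread.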
