Hi,
I'm trying to train a Persian G2P (grapheme-to-phoneme) model.
I have a CSV file containing the data, where each row looks like word,pos,phoneset.
I read it with Python's csv module and try to encode each field as a one-hot representation,
but I get the error “ValueError: setting an array element with a sequence”.
I know my arrays have different sizes, but I'm confused about how to fix that error.
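For context, NumPy cannot build one rectangular array out of items whose shapes differ, which is the usual cause of this error. A common workaround is to pad every sequence to a shared length before stacking. The sketch below is only an illustration under assumptions: `pad_one_hot` is a hypothetical helper, the inputs are already small integer indices, and all-zero rows stand in for padding.

```python
import numpy as np

def pad_one_hot(sequences, depth, max_len=None):
    """Stack variable-length index sequences into one (N, max_len, depth) array."""
    if max_len is None:
        max_len = max(len(seq) for seq in sequences)
    out = np.zeros((len(sequences), max_len, depth), dtype=np.float32)
    for i, seq in enumerate(sequences):
        for t, idx in enumerate(seq[:max_len]):
            out[i, t, idx] = 1.0  # time steps past len(seq) stay all-zero (padding)
    return out

# hypothetical index sequences of different lengths
batch = pad_one_hot([[1, 2, 3], [4, 5]], depth=8)
print(batch.shape)  # (2, 3, 8)
```

Because every sample now has the same shape, the result is a regular float32 array. A model trained on it would normally also need to mask or ignore the padded time steps.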
This is my preprocessing and training code:
import csv
from seq2seq import *
from scikit_learn import *
import numpy as np
import mxnet as mx
from mxnet import autograd, gluon, nd

print("preprocessing data...")
x, y = [], []

# this function converts each character to its code point (ord) and returns an nd.array
def convert_ascii(t):
    return nd.array([ord(c) for c in t])

with open("l1.csv", "r") as f:
    s = 0  # number of rows read
    r = csv.reader(f)
    for row in r:
        # input: one-hot word and one-hot POS tag; target: one-hot phoneme string
        a = [nd.one_hot(convert_ascii(row[0]), depth=32), nd.one_hot(convert_ascii(row[1]), depth=10)]
        b = nd.one_hot(convert_ascii(row[2]), depth=32)
        x.append(a)
        y.append(b)
        s += 1

x = np.array(x, dtype=np.float32)
y = np.array(y, dtype=np.float32)
net = seq2seq(s, x.size, 5000000, 5000000)

# train
print("training the data...")
clf = GluonClassifier(model=net, loss_function=gluon.loss.SoftmaxCrossEntropyLoss, init_function=mx.initializer.Xavier, batch_size=256, epochs=1000000, verbose=True)
clf.fit(x, y)
GluonClassifier is my scikit-learn-like wrapper around Gluon, and seq2seq is an LSTM sequence-to-sequence model.
Thanks in advance.
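One detail in the code above may be worth checking separately from the shape error: `convert_ascii` produces `ord` values, and Persian letters sit in the Unicode Arabic block (U+0600 and up), so they are far larger than a one-hot `depth` of 32 can represent. A possible alternative is to map each distinct symbol to a small integer first. The sketch below assumes that approach; `build_vocab`, `encode`, and the reserved `<pad>` entry are hypothetical names, not part of MXNet or of the code above.

```python
def build_vocab(sequences, pad_token="<pad>"):
    """Map each distinct symbol to a small integer; index 0 is reserved for padding."""
    symbols = sorted({sym for seq in sequences for sym in seq})
    vocab = {pad_token: 0}
    for sym in symbols:
        vocab[sym] = len(vocab)
    return vocab

def encode(seq, vocab):
    """Turn a string (or list of symbols) into a list of vocabulary indices."""
    return [vocab[sym] for sym in seq]

# hypothetical sample words; in practice these would come from the CSV column
words = ["سلام", "کتاب"]
vocab = build_vocab(words)
ids = encode(words[0], vocab)
print(len(vocab), ids)
```

With this mapping, `len(vocab)` is a natural choice for the one-hot depth, since every index is guaranteed to be below it. The same idea would apply to the POS and phoneme columns, each with its own vocabulary.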