How to hash mxnet GPU arrays fast?

I have a class that holds an array instance. The array represents the state of the game, and the class also stores whose turn it is. When I used numpy, I hashed the class like this:

def __hash__(self):
    # Interpret the raw array bytes as one big integer, then offset it by the turn.
    return int.from_bytes(self.array.tobytes(), byteorder='big') + self.turn * self.array.size ** 2

Now I am trying to migrate to mxnet on the GPU. Unfortunately, .tobytes() is not available there, and if I convert to a numpy array first:

def __hash__(self):
    # asnumpy() copies the data back to the host before the bytes are read.
    return int.from_bytes(self.array.asnumpy().tobytes(), byteorder='big') + self.turn * self.array.size ** 2

the hash function becomes very slow. I created an array anp with numpy and an array amx with mxnet, and defined state_hash_function_np and state_hash_function_mx for numpy.ndarray and the mxnet NDArray respectively. The mxnet function is about 115x slower:

In [22]: %timeit state_hash_function_np(anp, 1)
394 ns ± 1.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [23]: %timeit state_hash_function_mx(amx, 1)
45.5 µs ± 1.82 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
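
For reference, here is a minimal sketch of what the two timed functions look like, assuming they take the array and the turn as arguments (as the %timeit calls suggest). The 3x3 int8 board is only an illustrative assumption, not my real game state, and the mxnet variant is just the numpy one applied after asnumpy():

```python
import numpy as np


def state_hash_function_np(array, turn):
    # Read the raw bytes as one big-endian integer and offset it by the turn.
    return int.from_bytes(array.tobytes(), byteorder='big') + turn * array.size ** 2


def state_hash_function_mx(array, turn):
    # Assumes `array` is an mxnet NDArray; asnumpy() copies it to host memory
    # first, which is where the extra time seems to go.
    return state_hash_function_np(array.asnumpy(), turn)


# Illustrative board (an assumption): 3x3 int8 with a single piece in the centre.
anp = np.zeros((3, 3), dtype=np.int8)
anp[1, 1] = 1
h = state_hash_function_np(anp, 1)
```

With this board the centre byte is followed by four zero bytes, so the hash is 256**4 plus the turn offset of 81.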

I tried other variations, using flatten, sum and other ways to get a number out of the array, but every implementation is slower than the one for the numpy array.

How can I hash mxnet GPU arrays efficiently enough?