Is it possible to speed up fullyconnected calculation for sparse input?

tppppppppp · November 29, 2017, 3:55pm

Hi,
I'm training a neural network model with MXNet. The input is a sparse one-hot vector with 1 million dimensions, of which only a few dozen are 1. It is fully connected to a hidden layer with 200 nodes, and training is very slow. Is there any way to speed up the FullyConnected computation, given how sparse the input is?

eric-haibin-lin · November 29, 2017, 9:37pm

You can encode the data in CSR format and replace FC with sparse.dot and broadcast_add. You can see the linear classification example here: https://github.com/apache/incubator-mxnet/tree/master/example/sparse
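To make the suggestion concrete, here is a minimal NumPy-only sketch of what a CSR dot product followed by a broadcast add computes. It is an illustration, not MXNet's implementation: the helper names `to_csr` and `csr_dot_dense` and the small shapes are assumptions chosen for the example.

```python
import numpy as np

def to_csr(dense):
    """Encode a 2-D array as CSR arrays (data, indices, indptr)."""
    data, indices, indptr = [], [], [0]
    for row in dense:
        cols = np.nonzero(row)[0]
        data.extend(row[cols])
        indices.extend(cols)
        indptr.append(len(indices))
    return (np.array(data, dtype=dense.dtype),
            np.array(indices, dtype=np.int64),
            np.array(indptr, dtype=np.int64))

def csr_dot_dense(data, indices, indptr, w):
    """Compute X @ w while touching only the stored non-zeros of X."""
    out = np.zeros((len(indptr) - 1, w.shape[1]), dtype=w.dtype)
    for i in range(len(indptr) - 1):
        lo, hi = indptr[i], indptr[i + 1]
        # Weighted sum of the rows of w picked out by this row's non-zero columns.
        out[i] = data[lo:hi] @ w[indices[lo:hi]]
    return out

rng = np.random.default_rng(0)
x = np.zeros((4, 1000), dtype=np.float32)
x[np.arange(4), [3, 250, 250, 999]] = 1.0
x[0, 17] = 1.0
w = rng.standard_normal((1000, 8)).astype(np.float32)
b = rng.standard_normal(8).astype(np.float32)

data, indices, indptr = to_csr(x)
sparse_out = csr_dot_dense(data, indices, indptr, w) + b  # dot, then broadcast add
dense_out = x @ w + b                                     # dense reference
```

The work in the sparse path scales with the number of non-zeros times the hidden size, rather than with the full input dimension.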

tppppppppp · December 8, 2017, 3:00am

@eric-haibin-lin I tried encoding the data in CSR format and replacing nd.FullyConnected with nd.sparse.dot, but it became even slower. My code is as follows.

Original (time cost 0:00:00.365667):
# out1 = mx.nd.FullyConnected(features, self.w1.data(ctx), self.b1.data(ctx), num_hidden=self.num_hidden)
# act1 = mx.nd.Activation(out1, act_type='relu')

New (time cost 0:00:00.495941):
out1 = mx.nd.sparse.dot(features, self.w1.data(ctx))
act1 = mx.nd.broadcast_add(out1, self.b1.data(ctx))

Here w1 is the weight matrix and b1 is the bias. features is the input, a 200 x 1000000 matrix with about 2000 non-zero values, which I have encoded in CSR format.

eric-haibin-lin · December 8, 2017, 7:40pm

Hi @tppppppppp
Did you call act1.wait_to_read() to make sure the operation is completed?
https://mxnet.incubator.apache.org/tutorials/basic/ndarray.html#lazy-evaluation-and-automatic-parallelization

tppppppppp · December 10, 2017, 9:04am

@eric-haibin-lin I called act1.asnumpy() instead.

eric-haibin-lin · December 13, 2017, 7:20pm

I am a bit confused: what is the shape of the w1 you are using? FullyConnected computes dot(feature, w1.T), which is different from dot(feature, w1). Are you getting consistent results here?

The following code works for me:

import time

import mxnet as mx
import scipy.sparse as spsp

# 200 x 1,000,000 CSR input with roughly 2000 non-zeros
csr = spsp.rand(200, 1000000, format='csr', density=0.00001)
x_sparse = mx.nd.sparse.csr_matrix(csr)
w = mx.nd.ones((1000000, 100))

# Time the sparse dot product
mx.nd.waitall()
a = time.time()
y = mx.nd.sparse.dot(x_sparse, w)
y.wait_to_read()
b = time.time()
print(b - a)
# 0.00143098831177

# Time FullyConnected on the densified input
w_t = w.T
x_dense = x_sparse.tostype('default')
mx.nd.waitall()
c = time.time()
y2 = mx.nd.FullyConnected(x_dense, w_t, no_bias=True, num_hidden=100)
y2.wait_to_read()
d = time.time()
print(d - c)
# 0.451608896255
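The weight-layout point raised in this post can be checked with a small NumPy sketch. The `fully_connected` helper below is a stand-in written for illustration, assuming the layer convention that the weight has shape (num_hidden, in_dim); it is not the MXNet operator itself.

```python
import numpy as np

def fully_connected(x, weight, bias=None):
    """Stand-in for a dense layer whose weight has shape (num_hidden, in_dim)."""
    out = x @ weight.T
    return out if bias is None else out + bias

rng = np.random.default_rng(0)
batch, in_dim, num_hidden = 5, 12, 3
x = rng.standard_normal((batch, in_dim))
w_fc = rng.standard_normal((num_hidden, in_dim))  # layout the dense layer expects
w_dot = w_fc.T                                    # layout a plain dot(x, w) expects

fc_out = fully_connected(x, w_fc)
dot_out = x @ w_dot
```

Swapping FullyConnected for a plain dot therefore needs the weight transposed (or stored as (in_dim, num_hidden) from the start); passing the untransposed weight to dot fails with a shape mismatch unless the two dimensions happen to be equal.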

BenBBear · June 4, 2018, 2:35pm

But have you tried it on a GPU?

BenBBear · June 4, 2018, 3:14pm

import time

import mxnet as mx
import scipy.sparse as spsp

# Same benchmark, with the input and weights placed on the GPU
csr = spsp.rand(200, 1000000, format='csr', density=0.00001).astype('float32')
x_sparse = mx.nd.sparse.csr_matrix(csr).as_in_context(mx.gpu())
w = mx.nd.ones((1000000, 100)).as_in_context(mx.gpu())

# Time the sparse dot product
mx.nd.waitall()
a = time.time()
y = mx.nd.sparse.dot(x_sparse, w)
y.wait_to_read()
b = time.time()
print(b - a)
# 0.3979964256286621

# Time FullyConnected on the densified input
w_t = w.T
x_dense = x_sparse.tostype('default')
mx.nd.waitall()
c = time.time()
y2 = mx.nd.FullyConnected(x_dense, w_t, no_bias=True, num_hidden=100)
y2.wait_to_read()
d = time.time()
print(d - c)
# 0.0007915496826171875

Correct me if anything here is wrong, thanks!

yaoqi-zd · November 6, 2020, 3:07am

Hi, do you have any ideas for speeding up the dot product between a sparse and a dense matrix on GPU?

BenBBear · November 6, 2020, 12:00pm

This topic is quite old, but here is my advice.

Turn the dense tensor into a sparse one using the indices of the sparse tensor, then do a sparse-by-sparse dot product. This might speed up execution when the sparsity is high.

As a thought experiment it has lower time complexity than the full dot product, but I'm not sure it would work well in practice.
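One possible reading of this suggestion, sketched in NumPy on the CPU: use the sparse input's column indices to pull out only the weight rows that some non-zero touches, then multiply against that compact block. This is an interpretation under stated assumptions, not a tested GPU recipe; the variable names and sizes are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_cols, n_hidden = 6, 10_000, 4

# Sparse input in coordinate form: (row, col, value) triples.
rows = np.array([0, 0, 1, 3, 3, 5])
cols = np.array([7, 4200, 7, 15, 9999, 4200])
vals = rng.standard_normal(rows.size)
w = rng.standard_normal((n_cols, n_hidden))

# Keep only the rows of w that some non-zero actually references.
used_cols, compact_cols = np.unique(cols, return_inverse=True)
w_small = w[used_cols]  # shape (len(used_cols), n_hidden)

# Accumulate each non-zero's contribution into its output row.
out = np.zeros((n_rows, n_hidden))
np.add.at(out, rows, vals[:, None] * w_small[compact_cols])

# Reference: the full dense product.
x_dense = np.zeros((n_rows, n_cols))
x_dense[rows, cols] = vals
expected = x_dense @ w
```

Here only 4 of the 10,000 weight rows are read. Whether the gather pays off on a GPU depends on the sparsity and on memory access patterns, which this sketch does not measure.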

yaoqi-zd · November 8, 2020, 2:53am

Thanks for your reply! I'll give it a try.
