How to efficiently compute the following tensor-matrix product?

I have a tensor A \in R^{m \times n \times d} and a matrix B \in R^{m \times d}.

Let A_i denote the n \times d matrix at location A[i].

Let b_j denote the vector in the j'th row of B, i.e. b_j = B[j].

I am trying to figure out the right set of MXNet operations to efficiently compute the matrix C whose k'th row is the matrix-vector product A_k b_k = nd.dot(A[k], B[k]).
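
For concreteness, here is the target computation written as an explicit loop. This is only a sketch with placeholder sizes, not the efficient version I'm after:

```python
# Sketch of the computation I want (placeholder sizes m, n, d).
from mxnet import nd

m, n, d = 4, 3, 5
A = nd.random.uniform(shape=(m, n, d))  # m matrices, each n x d
B = nd.random.uniform(shape=(m, d))     # one length-d vector per matrix

# C[k] = A[k] * b_k; correct, but the Python loop is slow for large m.
C = nd.stack(*[nd.dot(A[k], B[k]) for k in range(m)])  # shape (m, n)
```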

Any help is greatly appreciated.

Found the answer: reshape B to shape (m, d, 1) and use batch_dot:

B = B.reshape((m, d, 1)) and then call nd.batch_dot(A, B). The result has shape (m, n, 1), so reshape it to (m, n) to get C.
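
A minimal end-to-end sketch of this solution, assuming the nd (mxnet.ndarray) API and placeholder sizes, with a sanity check against the explicit loop:

```python
# Minimal sketch of the batch_dot solution (placeholder sizes m, n, d).
from mxnet import nd

m, n, d = 4, 3, 5
A = nd.random.uniform(shape=(m, n, d))
B = nd.random.uniform(shape=(m, d))

# Treat each row of B as a d x 1 column so batch_dot pairs A[k] with b_k.
C = nd.batch_dot(A, B.reshape((m, d, 1)))  # shape (m, n, 1)
C = C.reshape((m, n))                      # drop the trailing unit axis

# Sanity check against the per-row loop.
C_loop = nd.stack(*[nd.dot(A[k], B[k]) for k in range(m)])
print((C - C_loop).abs().max())  # prints a value close to 0
```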