The running time of MXPredForward and MXPredGetOutput

yizhao · June 27, 2019, 2:08am

The running time of the C API call MXPredForward is much shorter than that of MXPredGetOutput:

// Needs <chrono> and <vector>; pred_hnd, output_index and size are set up earlier.
// Time the forward call.
auto start = std::chrono::high_resolution_clock::now();
MXPredForward(pred_hnd);
auto stop = std::chrono::high_resolution_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
// count() is a 64-bit integer here, so cast it and print with %lld instead of %d.
LOGI("MXPredForward: %lld microseconds.", static_cast<long long>(duration.count()));
// Time the copy of the output into a host buffer.
std::vector<float> data(size);
start = std::chrono::high_resolution_clock::now();
MXPredGetOutput(pred_hnd, output_index, data.data(), static_cast<mx_uint>(size));
stop = std::chrono::high_resolution_clock::now();
duration = std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
LOGI("MXPredGetOutput: %lld microseconds.", static_cast<long long>(duration.count()));

The result is:

I/MXNET: MXPredForward: 106 microseconds.
I/MXNET: MXPredGetOutput: 3748967 microseconds.

Why? Is this related to lazy evaluation?
The code runs on a Pixel3 with Snapdragon 835.

lgo · June 30, 2019, 9:34pm

Hi yizhao,

MXPredForward is an asynchronous call: it only starts processing your input data through the network and then returns.

MXPredGetOutput requires that all operations are finished in order to extract the output data, so it blocks until the inference is finished.
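
The same pattern can be reproduced without MXNet. Below is a minimal Python sketch that uses a standard-library thread pool as a stand-in for an asynchronous execution engine. The names `slow_inference` and `measure_async_call` and the 0.2-second sleep are hypothetical, chosen only for illustration; none of them are MXNet APIs.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_inference(x):
    """Stand-in for the real network computation (hypothetical workload)."""
    time.sleep(0.2)
    return x * 2

def measure_async_call(executor, x):
    """Time the enqueue step and the blocking read separately."""
    t0 = time.perf_counter()
    future = executor.submit(slow_inference, x)  # returns at once, like an asynchronous forward call
    t1 = time.perf_counter()
    result = future.result()                     # blocks until the work is done, like reading the output
    t2 = time.perf_counter()
    return result, t1 - t0, t2 - t1

with ThreadPoolExecutor(max_workers=1) as executor:
    result, enqueue_s, read_s = measure_async_call(executor, 21)

print(f"enqueue: {enqueue_s:.6f} s, read: {read_s:.6f} s, result: {result}")
```

Almost all of the elapsed time shows up in the read step, even though the work was launched by the enqueue step. The sum of the two timings is the figure that reflects the actual computation.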

regards,

Lieven

yizhao · July 3, 2019, 2:26pm

Hi Lieven,
Thanks for your reply. It’s really helpful, and I understand it now.
I have another question about measuring the running time of networks.
In Python, I divide a network into two parts, say part1 and part2. The output of part1 is the input to part2. I want to measure their running times separately to find out which part takes longer to compute.
Code version 1:

import time
# run part 1
start = time.time()
module_part1.forward(...)
output_1 = module_part1.get_outputs()
end = time.time()
time1 = end - start
# run part 2
start = time.time()
module_part2.forward(output_1, ...)
output_2 = module_part2.get_outputs()
print(output_2)
end = time.time()
time2 = end - start

Running the above code shows that time2 > time1.
However, if I change the code to version 2:

import time
# run part 1
start = time.time()
module_part1.forward(...)
output_1 = module_part1.get_outputs()
if output_1[0][0][0][0] == 0: # Add this line, access output_1 in some way
  do_nothing = 1
end = time.time()
time1 = end - start
# run part 2
start = time.time()
module_part2.forward(output_1, ...)
output_2 = module_part2.get_outputs()
print(output_2)
end = time.time()
time2 = end - start

Running version 2 shows that time1 > time2.
Is this related to lazy evaluation?
Which version of code gives the correct running time of the two parts of the network?
Thanks very much!

lgo · July 3, 2019, 5:09pm

Hi,

Both calls to forward return immediately and process their respective inputs asynchronously. So in both part 1 and part 2 you should wait for the output to be available before you stop the timer.
In your version 2 you only wait for the output of part 1.

Assuming output_1 and output_2 are of type mxnet.ndarray.NDArray, you can call output_1.wait_to_read() and output_2.wait_to_read(). If they are Python lists containing NDArrays, just iterate over the list elements and invoke wait_to_read() on each of them.
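
Put together, this advice might look like the sketch below. It is not code from this thread: `time_stage` and `FakeArray` are hypothetical names, and `FakeArray` only imitates an object with a `wait_to_read()` method so that the example runs without MXNet installed.

```python
import time

def time_stage(run_stage):
    """Run one stage and return (outputs, seconds), blocking on every output.

    run_stage is a callable that launches the stage and returns a list of
    array-like objects exposing wait_to_read(), as MXNet NDArrays do.
    """
    start = time.perf_counter()
    outputs = run_stage()
    for out in outputs:
        out.wait_to_read()  # block until this output has actually been computed
    return outputs, time.perf_counter() - start

class FakeArray:
    """Hypothetical stand-in for an asynchronously computed array."""
    def __init__(self, delay_s):
        self.ready_at = time.perf_counter() + delay_s
        self.waited = False

    def wait_to_read(self):
        remaining = self.ready_at - time.perf_counter()
        if remaining > 0:
            time.sleep(remaining)
        self.waited = True

# Two fake stages whose results become available after 0.05 s and 0.15 s.
outputs_1, time1 = time_stage(lambda: [FakeArray(0.05)])
outputs_2, time2 = time_stage(lambda: [FakeArray(0.15)])
print(f"part 1: {time1:.3f} s, part 2: {time2:.3f} s")
```

With the modules from the question, `run_stage` would be a small function that calls forward(...) on the module and returns get_outputs(). Because each stage waits on all of its own outputs before the clock stops, pending work from part 1 cannot leak into the measurement of part 2.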

Lieven

yizhao · July 4, 2019, 12:48am

Hi, Lieven
Thanks so much for your help! I understand it now.
