Hi Lieven,
Thanks for your reply. It’s really helpful, I understand it now.
There is another question about measuring the running time of networks:
In Python, I divide a network into two parts, say part1 and part2. The output of part1 is the input to part2. I want to measure their running times separately to find out which part takes longer to compute.
Code version 1:
import time
# run part 1
start = time.time()
module_part1.forward(...)
output_1 = module_part1.get_outputs()
end = time.time()
time1 = end - start
# run part 2
start = time.time()
module_part2.forward(output_1, ...)
output_2 = module_part2.get_outputs()
print(output_2)
end = time.time()
time2 = end - start
Running the above code shows that time2 > time1.
However, if I change the code to version 2:
import time
# run part 1
start = time.time()
module_part1.forward(...)
output_1 = module_part1.get_outputs()
if output_1[0][0][0][0] == 0:  # Add this line, access output_1 in some way
    do_nothing = 1
end = time.time()
time1 = end - start
# run part 2
start = time.time()
module_part2.forward(output_1, ...)
output_2 = module_part2.get_outputs()
print(output_2)
end = time.time()
time2 = end - start
Running version 2 shows that time1 > time2.
I suspect that this is related to lazy evaluation?
Which version of code gives the correct running time of the two parts of the network?
Thanks very much!
MXNet executes operations asynchronously, so both calls to forward return immediately and process their respective inputs in the background. In both part 1 and part 2 you should therefore wait for the output to be available before you stop the timer.
In your version 2 you only wait for the output of part 1.
Assuming output_1 and output_2 are of type mxnet.ndarray.NDArray, you can call output_1.wait_to_read() and output_2.wait_to_read(). If they are Python lists of NDArrays, just iterate over the list elements and invoke wait_to_read() on each of them.
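To make the pattern concrete, here is a minimal sketch of timing with an explicit wait on every output. So that it runs without MXNet installed, it uses a hypothetical stand-in class, FakeAsyncArray, which only mimics the one behaviour that matters here (work continues in the background until wait_to_read() is called). The helper name timed is also made up for illustration; neither is part of MXNet.

```python
import threading
import time


class FakeAsyncArray:
    """Hypothetical stand-in for an NDArray: its 'computation' runs in the background."""

    def __init__(self, seconds):
        self._thread = threading.Thread(target=time.sleep, args=(seconds,))
        self._thread.start()  # returns immediately, like an asynchronous forward()

    def wait_to_read(self):
        self._thread.join()  # block until the background work has finished


def timed(run):
    """Call run(), then block on every returned output before stopping the clock."""
    start = time.time()
    outputs = run()
    for out in outputs:
        out.wait_to_read()
    return outputs, time.time() - start


# Part 1 "computes" for 0.2 s and part 2 for 0.05 s (made-up durations).
_, time1 = timed(lambda: [FakeAsyncArray(0.2)])
_, time2 = timed(lambda: [FakeAsyncArray(0.05)])
print("time1:", time1, "time2:", time2)
```

With your real modules, the function passed to timed would call module_part1.forward(...) and return module_part1.get_outputs(), and likewise for part 2. Because each measurement now waits for its own outputs, the work of part 1 can no longer spill over into time2.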