I’m a beginner. From the diagram of BRNN in unrolled form, it seems that forward and backward passes can run in parallel. Therefore, please explain this statement:
that they require both a forward and a backward pass and that the backward pass is dependent on the outcomes of the forward pass
Hey @arvindpdmn, theoretically yes. But that is the ideal case, which assumes we have an extra GPU available. In practice, we assume that all GPUs are already used at full capacity even when training a standard RNN. As a result, the computation time will double for a BRNN.
I agree with @arvindpdmn: the statement that there is a dependence between the backward pass and the outcomes of the forward pass is strange.
@gold_piggy has a point: the computation time doubles, and you will need to keep the output of the backward pass in memory if the GPU only accommodates one direction at a time. However, this does not mean there is a dependence.
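To make the independence concrete, here is a minimal sketch in plain Python. It assumes a toy scalar tanh RNN; the helper `run_direction`, the weights, and the input values are all made up for illustration and are not taken from the book. Each direction reads only the inputs and its own previous hidden state, so running the two directions one after the other or submitting them concurrently gives identical hidden states. The thread pool is only there to show that neither job waits on the other's result; it is not meant as a speed benchmark.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def run_direction(xs, w_x, w_h, reverse=False):
    """Run a toy scalar tanh RNN over xs in one direction.

    Returns the hidden states aligned with the positions of xs.
    (Hypothetical helper, for illustration only.)
    """
    order = reversed(range(len(xs))) if reverse else range(len(xs))
    h = 0.0  # initial hidden state
    states = [0.0] * len(xs)
    for t in order:
        # The update uses only the input at t and this direction's own state.
        h = math.tanh(w_x * xs[t] + w_h * h)
        states[t] = h
    return states


xs = [0.5, -1.0, 0.25, 2.0]  # made-up input sequence

# Option 1: one direction after the other (e.g. a single busy GPU).
fwd_seq = run_direction(xs, 0.8, 0.3)
bwd_seq = run_direction(xs, -0.5, 0.6, reverse=True)

# Option 2: both directions submitted at once; neither needs the other's output.
with ThreadPoolExecutor(max_workers=2) as pool:
    fwd_job = pool.submit(run_direction, xs, 0.8, 0.3)
    bwd_job = pool.submit(run_direction, xs, -0.5, 0.6, True)
    fwd_par, bwd_par = fwd_job.result(), bwd_job.result()

# Only the output layer combines the two directions at each time step.
outputs = list(zip(fwd_par, bwd_par))
```

The two directions meet only in the last line, where the states for each time step are paired up. So the cost is the extra compute and the memory needed to hold one direction's states until the other is done, not a data dependence between the directions.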