The TensorFlow documentation for dynamic range quantization states:
"At inference, weights are converted from 8-bits of precision to floating point and computed using floating-point kernels. This conversion is done once and cached to reduce latency."
Also, in dynamic range quantization, the activations are always stored in float32; they are converted to 8-bit integers for processing and back to floating point once processing is done.
I am confused: if the weights are converted back to float32 at inference time, where does the quantization actually happen, and what is gained from it?
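To make the question concrete, here is a minimal pure-Python sketch of the two behaviours described above as I understand them. It assumes symmetric per-tensor int8 quantization and uses made-up values; the helpers `quantize` and `dequantize` are my own illustrative names, not TensorFlow APIs, and the real TFLite kernels may well differ in the details.

```python
def quantize(values, num_bits=8):
    """Symmetric per-tensor quantization (an assumption): floats -> (int codes, scale)."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp to [-qmax, qmax].
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]


# Hypothetical float32 weights of a single dense unit and one input vector.
weights = [0.5, -1.27, 0.02, 0.9]
activations = [1.0, 2.0, -0.5, 0.25]

# Conversion time: the weights are stored as 8-bit codes plus one float scale.
w_codes, w_scale = quantize(weights)

# Behaviour 1 (the quoted sentence): dequantize the weights once, cache them,
# and run an ordinary floating-point kernel.
w_float = dequantize(w_codes, w_scale)
out_float_kernel = sum(w * a for w, a in zip(w_float, activations))

# Behaviour 2 (dynamic activations): quantize the activations at run time,
# accumulate in integers, then rescale the result back to float.
a_codes, a_scale = quantize(activations)
acc = sum(wc * ac for wc, ac in zip(w_codes, a_codes))
out_int_kernel = acc * w_scale * a_scale

# Unquantized result, for comparison.
reference = sum(w * a for w, a in zip(weights, activations))

print(w_codes, w_scale)
print(out_float_kernel, out_int_kernel, reference)
```

In both cases the stored weights are 8-bit, so the model file is smaller either way; what I cannot tell from the documentation is which of the two compute paths is actually used at inference.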