
I trained a face recognition model with quantization-aware training in TensorFlow 1.12.0. The network is inception-resnet_v1 (the code comes from tensorflow/models/research/slim/nets/). After training completes I get a checkpoint, then I create a freeze.py file to generate eval.pb, and then successfully convert it to a tflite model with toco. But when I finally test the tflite model on an image, I get the following error:

File "src/test_tflite.py", line 21, in <module>
    Interpreter.allocate_tensors()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/lite/python/interpreter.py", line 71, in allocate_tensors
    Return self._interpreter.AllocateTensors()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/lite/python/interpreter_wrapper/tensorflow_wrap_interpreter_wrapper.py", line 106, in AllocateTensors
    Return _tensorflow_wrap_interpreter_wrapper.InterpreterWrapper_AllocateTensors(self)
RuntimeError: tensorflow/contrib/lite/kernels/pooling.cc:103 input->params.scale != output->params.scale (102483008 != 102482528)Node number 116 (MAX_POOL_2D) failed to prepare.

I tried replacing the network with inception-v3 and inception-resnet-v2, but each produced a similar error.

My training code is based on the facenet framework, with small changes on top of the original training script. After defining the total loss op, I added the following two lines of code:

train_graph = tf.get_default_graph()
tf.contrib.quantize.create_training_graph(input_graph=train_graph, quant_delay=20000)
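
For context, this rewrite has to happen after the loss is defined but before the optimizer is created, so the train op picks up the fake-quant variables. Below is a minimal, self-contained sketch of that ordering; the tiny conv net is a hypothetical stand-in for inception-resnet_v1, not the real facenet code:

import tensorflow as tf

# Hypothetical stand-in for the real inception-resnet_v1 tower; any
# graph built from standard layers is rewritten the same way.
images = tf.placeholder(tf.float32, [None, 160, 160, 3], name='input')
labels = tf.placeholder(tf.int64, [None])
net = tf.layers.conv2d(images, 8, 3, activation=tf.nn.relu)
net = tf.layers.flatten(net)
logits = tf.layers.dense(net, 10)
total_loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)

# Insert fake-quant ops after the loss but before the optimizer, so
# the rewritten graph is the one that actually gets trained.
tf.contrib.quantize.create_training_graph(
    input_graph=tf.get_default_graph(), quant_delay=20000)

train_op = tf.train.AdamOptimizer(1e-3).minimize(total_loss)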

In the freeze.py file, once the inference graph is defined, I add the following code:

g = tf.get_default_graph()
tf.contrib.quantize.create_eval_graph(input_graph=g)

Then I load the previously trained checkpoint and save the graph as a pb file. The code is as follows:

import os

import tensorflow as tf
from tensorflow.python.framework import graph_util

saver = tf.train.Saver(tf.global_variables())
sess = tf.Session()
with sess.as_default():
    saver.restore(sess, ckpt_model_path)
    frozen_graph_def = graph_util.convert_variables_to_constants(
        sess, sess.graph_def, ['embeddings'])
    tf.train.write_graph(
        frozen_graph_def,
        os.path.dirname(save_pb_path),
        os.path.basename(save_pb_path),
        as_text=False)
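
As a quick sanity check (my addition, not part of the original freeze.py), it is worth confirming that create_eval_graph actually inserted fake-quant ops into the graph g before freezing:

# Count the FakeQuantWithMinMaxVars ops inserted by create_eval_graph;
# zero here would mean the rewrite never touched the graph.
fake_quant_ops = [op.name for op in g.get_operations()
                  if op.type == 'FakeQuantWithMinMaxVars']
print('FakeQuant ops found: %d' % len(fake_quant_ops))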

Then I used the TensorFlow 1.12.0 toco tool to convert the pb file and successfully generated a tflite model. The specific command is as follows:

./bazel-bin/tensorflow/contrib/lite/toco/toco \
--input_file=inception_resnet_v1_fake_quantized_eval.pb \
--output_file=tflite_model.tflite \
--input_format=TENSORFLOW_GRAPHDEF \
--output_format=TFLITE \
--inference_type=QUANTIZED_UINT8 \
--input_shape="1,160,160,3" \
--input_array=input \
--output_array=embeddings \
--std_value=127.5 \
--mean_value=127.5 \
--default_ranges_min=-1.0 \
--default_ranges_max=1.0
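
For reference, the same conversion can also be driven from Python. This is a sketch assuming the tf.contrib.lite.TFLiteConverter API that ships with 1.12, with file and array names copied from the command above:

import tensorflow as tf

converter = tf.contrib.lite.TFLiteConverter.from_frozen_graph(
    'inception_resnet_v1_fake_quantized_eval.pb',
    input_arrays=['input'],
    output_arrays=['embeddings'],
    input_shapes={'input': [1, 160, 160, 3]})
converter.inference_type = tf.contrib.lite.constants.QUANTIZED_UINT8
# (mean, std) for the uint8 input, matching --mean_value/--std_value.
converter.quantized_input_stats = {'input': (127.5, 127.5)}
# Fallback min/max for tensors with no recorded range, matching
# --default_ranges_min/--default_ranges_max.
converter.default_ranges_stats = (-1.0, 1.0)
with open('tflite_model.tflite', 'wb') as f:
    f.write(converter.convert())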

Finally, when I used the generated tflite model to run inference on an image, I got the following error:

RuntimeError: tensorflow/contrib/lite/kernels/pooling.cc:103 input->params.scale != output->params.scale (102483008 != 102482528)Node number 116 (MAX_POOL_2D) failed to prepare.

My test code is as follows:

import numpy as np
import scipy.misc
import tensorflow as tf

# Load the TFLite model and allocate tensors.
interpreter = tf.contrib.lite.Interpreter(model_path="tensorflow-1.12.0/tflite_model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# The model expects a uint8 batch of shape [1, 160, 160, 3].
image = scipy.misc.imread("src/1511.jpg")
image_ = np.array([image.astype('uint8')])
print(image_.shape)
print(type(image_))
print(input_details)
print(output_details)

interpreter.set_tensor(input_details[0]['index'], image_)
interpreter.invoke()
output_data = interpreter.get_tensor(output_details[0]['index'])
print(output_data)
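
One more thing to keep in mind if allocate_tensors() ever succeeds: with QUANTIZED_UINT8 the returned output_data is raw uint8, so the embeddings have to be dequantized using the scale and zero point the interpreter reports. A sketch using the 'quantization' field of get_output_details():

# Dequantize: real_value = scale * (quantized_value - zero_point).
scale, zero_point = output_details[0]['quantization']
embeddings = scale * (output_data.astype(np.float32) - zero_point)
print(embeddings)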

1 Answer


While converting the model, the function HardcodeMinMaxForConcatenation in hardcode_min_max.cc tweaks the min/max of the input arrays and the output array of a concatenation layer so that they all match.

Then the function HardcodeMinMaxForAverageOrMaxPool in the same file finds that the output array of the max pooling layer already has min/max information, and skips setting it to the same values as the input array's.

As a result, the min/max of the input array and the output array of the pooling layer end up different, which is exactly the scale mismatch the error reports.

I believe it is a bug.
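
One way to see the mismatch directly is to dump the per-tensor quantization parameters of the converted model. A sketch, assuming a TensorFlow build whose interpreter exposes get_tensor_details() (added to the Python API after 1.12):

import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path='tflite_model.tflite')
# Print scale/zero_point for every tensor; the input and output of
# node 116 (MAX_POOL_2D) should show the two differing scales.
for t in interpreter.get_tensor_details():
    scale, zero_point = t['quantization']
    print(t['index'], t['name'], scale, zero_point)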