2 votes

I have read about TensorFlow's translate.py, which uses Seq2SeqModel, and I am interested in using two seq2seq models (BotEngine, GrammarGenerator) based on the translate.py code from the TF tutorial, like this:

    with tf.Session() as sess:
        with tf.variable_scope("be_model"):
            model_be = BotEngine.create_model(sess, True)
            print("be model created")
        with tf.variable_scope("gg_model"):
            model_gg = GrammarGenerator.create_model(sess, True)
            print("gg model created")

When I trained and tested the two models separately (each reading its own checkpoint file), no error occurred. But when I read the two checkpoint files in succession, the following error occurs:

    (tensorflow) C:\test>python conversation.py --conversation
    2017-10-05 14:43:52.150316: W c:\l\tensorflow_1501907206084\work\tensorflow-1.2.1\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE instructions, but these are available on your machine and could speed up CPU computations.
    [... identical cpu_feature_guard warnings for SSE2, SSE3, SSE4.1, SSE4.2, AVX, AVX2, and FMA elided ...]
    Reading model parameters from ./seq2seq_bemodel\seq2seq.ckpt-6000
    [be model created]

    Reading model parameters from ./seq2seq_ggmodel\seq2seq.ckpt-43800
    2017-10-05 14:45:06.332559: W c:\l\tensorflow_1501907206084\work\tensorflow-1.2.1\tensorflow\core\framework\op_kernel.cc:1158] Not found: Key be_model/Variable_1 not found in checkpoint
    [... similar "Not found: Key be_model/... not found in checkpoint" warnings for every remaining be_model variable elided ...]
    Traceback (most recent call last):
      File "C:\Users\coco\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\pyt
    hon\client\session.py", line 1139, in _do_call
        return fn(*args)
      File "C:\Users\coco\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\pyt
    hon\client\session.py", line 1121, in _run_fn
        status, run_metadata)
      File "C:\Users\coco\Anaconda3\envs\tensorflow\lib\contextlib.py", line 66, in
    __exit__
        next(self.gen)
      File "C:\Users\coco\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\pyt
    hon\framework\errors_impl.py", line 466, in raise_exception_on_not_ok_status
        pywrap_tensorflow.TF_GetCode(status))
    tensorflow.python.framework.errors_impl.NotFoundError: Key be_model/Variable_1 n
    ot found in checkpoint
             [[Node: gg_model/save/RestoreV2_1 = RestoreV2[dtypes=[DT_INT32], _devic
    e="/job:localhost/replica:0/task:0/cpu:0"](_arg_gg_model/save/Const_0_0, gg_mode
    l/save/RestoreV2_1/tensor_names, gg_model/save/RestoreV2_1/shape_and_slices)]]

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "conversation.py", line 113, in <module>
        main()
      File "conversation.py", line 102, in main
        conversation()
      File "conversation.py", line 55, in conversation
        model_gg = GrammarGenerator.create_model(sess, True)
      File "C:\test\GrammarGenerator.py", line 50, in create_model
        model.saver.restore(session,ckpt.model_checkpoint_path)  
      File "C:\Users\coco\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\pyt
    hon\training\saver.py", line 1548, in restore
        {self.saver_def.filename_tensor_name: save_path})
      File "C:\Users\coco\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\pyt
    hon\client\session.py", line 789, in run
        run_metadata_ptr)
      File "C:\Users\coco\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\pyt
    hon\client\session.py", line 997, in _run
        feed_dict_string, options, run_metadata)
      File "C:\Users\coco\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\pyt
    hon\client\session.py", line 1132, in _do_run
        target_list, options, run_metadata)
      File "C:\Users\coco\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\pyt
    hon\client\session.py", line 1152, in _do_call
        raise type(e)(node_def, op, message)
    tensorflow.python.framework.errors_impl.NotFoundError: Key be_model/Variable_1 n
    ot found in checkpoint
             [[Node: gg_model/save/RestoreV2_1 = RestoreV2[dtypes=[DT_INT32], _devic
    e="/job:localhost/replica:0/task:0/cpu:0"](_arg_gg_model/save/Const_0_0, gg_mode
    l/save/RestoreV2_1/tensor_names, gg_model/save/RestoreV2_1/shape_and_slices)]]

    Caused by op 'gg_model/save/RestoreV2_1', defined at:
      File "conversation.py", line 113, in <module>
        main()
      File "conversation.py", line 102, in main
        conversation()
      File "conversation.py", line 55, in conversation
        model_gg = GrammarGenerator.create_model(sess, True)
      File "C:\test\GrammarGenerator.py", line 45, in create_model
        dtype=tf.float32)
      File "C:\test\seq2seq_model.py", line 159, in __init__
        self.saver = tf.train.Saver(tf.global_variables()) 
      File "C:\Users\coco\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\pyt
    hon\training\saver.py", line 1139, in __init__
        self.build()
      File "C:\Users\coco\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\pyt
    hon\training\saver.py", line 1170, in build
        restore_sequentially=self._restore_sequentially)
      File "C:\Users\coco\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\pyt
    hon\training\saver.py", line 691, in build
        restore_sequentially, reshape)
      File "C:\Users\coco\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\pyt
    hon\training\saver.py", line 407, in _AddRestoreOps
        tensors = self.restore_op(filename_tensor, saveable, preferred_shard)
      File "C:\Users\coco\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\pyt
    hon\training\saver.py", line 247, in restore_op
        [spec.tensor.dtype])[0])
      File "C:\Users\coco\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\pyt
    hon\ops\gen_io_ops.py", line 640, in restore_v2
        dtypes=dtypes, name=name)
      File "C:\Users\coco\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\pyt
    hon\framework\op_def_library.py", line 767, in apply_op
        op_def=op_def)
      File "C:\Users\coco\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\pyt
    hon\framework\ops.py", line 2506, in create_op
        original_op=self._default_original_op, op_def=op_def)
      File "C:\Users\coco\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\pyt
    hon\framework\ops.py", line 1269, in __init__
        self._traceback = _extract_stack()

    NotFoundError (see above for traceback): Key be_model/Variable_1 not found in ch
    eckpoint
             [[Node: gg_model/save/RestoreV2_1 = RestoreV2[dtypes=[DT_INT32], _devic
    e="/job:localhost/replica:0/task:0/cpu:0"](_arg_gg_model/save/Const_0_0, gg_mode
    l/save/RestoreV2_1/tensor_names, gg_model/save/RestoreV2_1/shape_and_slices)]]


    (tensorflow) C:\test> :(

The error message above says

'Key be_model/Variable_1 not found in checkpoint'.

But I have already used inspect_checkpoint.py to list all tensors stored in both checkpoint files: there is no duplicated tensor scope between them, and above all, the be_model/Variable_1 tensor does exist in my be_model checkpoint file.

    (tensorflow) C:\test\seq2seq_bemodel>python inspect_checkpoint.py --file_name se
    q2seq.ckpt-5800
    be_model/Variable (DT_FLOAT) []
    be_model/Variable_1 (DT_INT32) []
    be_model/be_model/be_pro_w/Adam (DT_FLOAT) [2179,150]

I also don't understand why the error mentions Variable_1; I never explicitly created a tensor with that name in my code.
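For reference, the same listing can be produced programmatically with tf.train.NewCheckpointReader, the API that inspect_checkpoint.py itself wraps. The sketch below is written against the tf.compat.v1 namespace so it also runs on TF 2.x (under the question's TF 1.2, the same names live directly under tf); the variable name and shape are invented for illustration, and it saves a throwaway checkpoint to a temp directory just so there is something to read:

```python
import os
import tempfile
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()

# Write a tiny checkpoint with one scoped variable, standing in for the
# real seq2seq checkpoints from the question.
ckpt_path = os.path.join(tempfile.mkdtemp(), "seq2seq.ckpt")
with tf.Graph().as_default():
    with tf.variable_scope("be_model"):
        tf.get_variable("be_pro_w", shape=[2, 2])
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        tf.train.Saver().save(sess, ckpt_path)

# List every key stored in the checkpoint, as inspect_checkpoint.py does.
reader = tf.train.NewCheckpointReader(ckpt_path)
for name, shape in sorted(reader.get_variable_to_shape_map().items()):
    print(name, shape)
```

Note that the keys carry the full scope prefix (here `be_model/...`), which is exactly what the restore error is complaining about not finding.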

1 Answer

2 votes

The problem was in the saver attribute of my seq2seq object: I constructed it as tf.train.Saver(tf.global_variables()), so each model's saver tried to restore every variable in the graph, including the other model's. The fix is to build each Saver only over the variables of its own variable scope. I had copied the saver setup habitually without looking at it closely; that was my mistake.
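A minimal sketch of that fix (not the exact code from seq2seq_model.py; the variable names are invented for illustration, and it uses the tf.compat.v1 namespace so it also runs on TF 2.x): give each model a Saver built from only the variables in its own scope, so restoring the gg_model checkpoint never asks for be_model/* keys:

```python
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()

# Two models in one graph, each under its own variable scope,
# mirroring the be_model / gg_model setup from the question.
with tf.variable_scope("be_model"):
    tf.get_variable("be_pro_w", shape=[4, 3])
with tf.variable_scope("gg_model"):
    tf.get_variable("gg_pro_w", shape=[4, 3])

# Collect only the variables belonging to each scope...
be_vars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope="be_model")
gg_vars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope="gg_model")

# ...and build one Saver per model instead of Saver(tf.global_variables()).
# Each saver's restore now only looks up its own model's checkpoint keys.
saver_be = tf.train.Saver(be_vars)
saver_gg = tf.train.Saver(gg_vars)

print([v.op.name for v in be_vars])  # only be_model/... names
print([v.op.name for v in gg_vars])  # only gg_model/... names
```

Inside the model class this would mean passing the scoped collection to tf.train.Saver in __init__ instead of tf.global_variables(); since each checkpoint was written while only its own model existed in the graph, the scoped keys already match what is stored.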

When you encounter a problem involving variable scopes, and especially the Saver object, inspect_checkpoint.py is very useful for seeing exactly which keys a checkpoint actually contains.