Predicting and Training in different threads Keras Tensorflow

Question

I am using Keras and TensorFlow for a kind of online learning: I receive new data periodically and retrain my models with it. I can have several models stored in ".h5" files, so that when I need to train or predict I load the model and then perform the necessary operations.

Currently I have separated training and prediction into two different threads, so that predictions can be made while the other thread trains. With locks I try to make sure that prediction and training never run on the same model at the same time (I think this works), but I am aware that Keras is not really designed for this. I always get different errors regarding the TensorFlow graph or session, for instance:

Traceback (most recent call last):
  File "C:\Users\a703572\AppData\Local\Programs\Python\Python36\lib\site-packages\flask\app.py", line 2292, in wsgi_app
    response = self.full_dispatch_request()
  File "C:\Users\a703572\AppData\Local\Programs\Python\Python36\lib\site-packages\flask\app.py", line 1815, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "C:\Users\a703572\AppData\Local\Programs\Python\Python36\lib\site-packages\flask\app.py", line 1718, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "C:\Users\a703572\AppData\Local\Programs\Python\Python36\lib\site-packages\flask\_compat.py", line 35, in reraise
    raise value
  File "C:\Users\a703572\AppData\Local\Programs\Python\Python36\lib\site-packages\flask\app.py", line 1813, in full_dispatch_request
    rv = self.dispatch_request()
  File "C:\Users\a703572\AppData\Local\Programs\Python\Python36\lib\site-packages\flask\app.py", line 1799, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "C:\Users\a703572\PycharmProjects\ai-pred-eng\src\run_keras_server.py", line 859, in predict_times
    0] + '.h5')
  File "C:\Users\a703572\PycharmProjects\ai-pred-eng\src\run_keras_server.py", line 164, in get_prediction
    model, scaler = self.load_model_file(self.graph_pred, self.session, path)
  File "C:\Users\a703572\PycharmProjects\ai-pred-eng\src\run_keras_server.py", line 114, in load_model_file
    model = load_model(path)
  File "C:\Users\a703572\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\engine\saving.py", line 419, in load_model
    model = _deserialize_model(f, custom_objects, compile)
  File "C:\Users\a703572\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\engine\saving.py", line 287, in _deserialize_model
    K.batch_set_value(weight_value_tuples)
  File "C:\Users\a703572\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\backend\tensorflow_backend.py", line 2470, in batch_set_value
    get_session().run(assign_ops, feed_dict=feed_dict)
  File "C:\Users\a703572\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\backend\tensorflow_backend.py", line 206, in get_session
    session.run(tf.variables_initializer(uninitialized_vars))
  File "C:\Users\a703572\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\variables.py", line 2831, in variables_initializer
    return control_flow_ops.group(*[v.initializer for v in var_list], name=name)
  File "C:\Users\a703572\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\control_flow_ops.py", line 3432, in group
    return _GroupControlDeps(dev, deps, name=name)
  File "C:\Users\a703572\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\control_flow_ops.py", line 3384, in _GroupControlDeps
    return no_op(name=name)
  File "C:\Users\a703572\AppData\Local\Programs\Python\Python36\lib\contextlib.py", line 88, in __exit__
    next(self.gen)
  File "C:\Users\a703572\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\ops.py", line 4249, in device
    self._device_function_stack.pop_obj()
  File "C:\Users\a703572\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\traceable_stack.py", line 110, in pop_obj
    return self._stack.pop().obj
IndexError: pop from empty list

Or the error:

Exception in thread Thread-1:
Traceback (most recent call last):
  File "C:\Users\a703572\AppData\Local\Programs\Python\Python36\lib\threading.py", line 916, in _bootstrap_inner
    self.run()
  File "C:\Users\a703572\AppData\Local\Programs\Python\Python36\lib\threading.py", line 1182, in run
    self.function(*self.args, **self.kwargs)
  File "C:\Users\a703572\PycharmProjects\ai-pred-eng\src\run_keras_server.py", line 632, in train
    self.update_prediction_historics_all()
  File "C:\Users\a703572\PycharmProjects\ai-pred-eng\src\run_keras_server.py", line 649, in update_prediction_historics_all
    self.update_prediction_historics_dataset(new_dataset, loadModel=True)
  File "C:\Users\a703572\PycharmProjects\ai-pred-eng\src\run_keras_server.py", line 672, in update_prediction_historics_dataset
    0] + ".h5", loadModel=loadModel)[
  File "C:\Users\a703572\PycharmProjects\ai-pred-eng\src\run_keras_server.py", line 198, in get_predictions_sequential
    model, scaler = self.load_model_file(self.graph_pred, self.session, path)
  File "C:\Users\a703572\PycharmProjects\ai-pred-eng\src\run_keras_server.py", line 114, in load_model_file
    model = load_model(path)
  File "C:\Users\a703572\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\engine\saving.py", line 419, in load_model
    model = _deserialize_model(f, custom_objects, compile)
  File "C:\Users\a703572\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\engine\saving.py", line 225, in _deserialize_model
    model = model_from_config(model_config, custom_objects=custom_objects)
  File "C:\Users\a703572\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\engine\saving.py", line 458, in model_from_config
    return deserialize(config, custom_objects=custom_objects)
  File "C:\Users\a703572\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\layers\__init__.py", line 55, in deserialize
    printable_module_name='layer')
  File "C:\Users\a703572\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\utils\generic_utils.py", line 145, in deserialize_keras_object
    list(custom_objects.items())))
  File "C:\Users\a703572\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\engine\sequential.py", line 301, in from_config
    model.add(layer)
  File "C:\Users\a703572\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\engine\sequential.py", line 181, in add
    output_tensor = layer(self.outputs[0])
  File "C:\Users\a703572\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\engine\base_layer.py", line 431, in __call__
    self.build(unpack_singleton(input_shapes))
  File "C:\Users\a703572\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\layers\core.py", line 872, in build
    constraint=self.bias_constraint)
  File "C:\Users\a703572\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\a703572\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\engine\base_layer.py", line 252, in add_weight
    constraint=constraint)
  File "C:\Users\a703572\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\backend\tensorflow_backend.py", line 402, in variable
    v = tf.Variable(value, dtype=tf.as_dtype(dtype), name=name)
  File "C:\Users\a703572\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\variables.py", line 183, in __call__
    return cls._variable_v1_call(*args, **kwargs)
  File "C:\Users\a703572\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\variables.py", line 146, in _variable_v1_call
    aggregation=aggregation)
  File "C:\Users\a703572\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\variables.py", line 125, in <lambda>
    previous_getter = lambda **kwargs: default_variable_creator(None, **kwargs)
  File "C:\Users\a703572\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\variable_scope.py", line 2444, in default_variable_creator
    expected_shape=expected_shape, import_scope=import_scope)
  File "C:\Users\a703572\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\variables.py", line 187, in __call__
    return super(VariableMetaclass, cls).__call__(*args, **kwargs)
  File "C:\Users\a703572\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\variables.py", line 1329, in __init__
    constraint=constraint)
  File "C:\Users\a703572\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\variables.py", line 1492, in _init_from_args
    ops.add_to_collections(collections, self)
  File "C:\Users\a703572\AppData\Local\Programs\Python\Python36\lib\contextlib.py", line 88, in __exit__
    next(self.gen)
  File "C:\Users\a703572\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\ops.py", line 5347, in init_scope
    yield
  File "C:\Users\a703572\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\ops.py", line 4369, in __exit__
    self._graph._pop_control_dependencies_controller(self)
  File "C:\Users\a703572\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\ops.py", line 4390, in _pop_control_dependencies_controller
    assert self._control_dependencies_stack[-1] is controller
AssertionError

My attempted solution was to use one graph for prediction and another for training; every time I want to perform a TensorFlow operation I use:

with server_predict.graph_pred.as_default():
    with tf.Session(graph=server_predict.graph_pred) as sess:

And I also added the line:

        backend.set_session(sess)

Despite this, I keep getting errors from the TensorFlow session or graph, as it seems that the operations are not properly separated. Another error is the one I described in this issue, which is still open, regarding the TensorFlow session. The solution given there, using k.clear_session() (k = Keras backend), did not work for me.

Has anyone had a similar problem, or programmed a similar task, and might be able to help me?

Thanks!!

Found a workaround to make this work. Instead of launching two threads over the same (custom) class, I now have two objects of the same class: one dedicated to training and the other to prediction. This is not a real multithreaded app (even though the two objects are launched from the same main). Until I (we) find a proper multithreaded solution, this might help.

However, I do not understand why I got the errors before and no longer get them just by having two objects, even though these objects run in the same process. Is it that Keras/TensorFlow can only operate on one graph at a time, but defines different graphs for different objects in the same process?

Tough one... but it seems keras has only one graph, no matter how many models you have. Is it possible to have two keras instances, one in each thread? — Daniel Möller
Are you actually asking me whether that's possible, or whether I can program it? XD I don't know if it is possible to have two Keras instances; do you know anything about this? — Adrián Arroyo Perez
No, I don't... :( --- I don't really know much about threads, but if two different threads imported their own Keras and kept everything internal, maybe it would be feasible? — Daniel Möller

Ian Quah · Accepted Answer · 2018-12-04T18:59:47

The easiest solution is to have two separate Keras models: the first runs in inference mode and the second runs in training mode. Every time the inference model gets new data to predict on, it first checks whether it has the most up-to-date .h5 file; if not, it loads that file first and then runs the prediction. This way you can avoid locks and such.
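A minimal sketch of the "check for a newer .h5 file before predicting" idea, using only the standard library. `ReloadingModel` and its `loader` argument are hypothetical names: `loader` stands in for whatever actually loads the model (for example Keras's `load_model`), and the sketch assumes the file's modification time is a good enough signal that the trainer has written a new version.

```python
import os


class ReloadingModel:
    """Caches a loaded model and reloads it when the file on disk changes.

    `loader` is any callable taking a path and returning a model object
    (hypothetical here; in practice it might be keras.models.load_model).
    """

    def __init__(self, path, loader):
        self.path = path
        self.loader = loader
        self._model = None
        self._mtime_ns = None

    def get(self):
        # Compare the file's current modification time with the one
        # recorded at the last load; reload only if it has changed.
        mtime_ns = os.stat(self.path).st_mtime_ns
        if self._model is None or mtime_ns != self._mtime_ns:
            self._model = self.loader(self.path)
            self._mtime_ns = mtime_ns
        return self._model
```

The prediction side would call `get()` before each prediction and use whatever it returns, so the training side never has to notify it directly.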

It's hard to give advice specific to your case, because what you want is likely not the same as what I need.

This is my opinion after having done something similar with TensorFlow and multiprocessing.
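One thing to watch with a file-based handoff like this is that the predictor could open the .h5 file while the trainer is still writing it. A possible way around that, sketched here with hypothetical names (`publish_atomically`, `save_fn`), is to save to a temporary file in the same directory and then swap it into place with `os.replace`. This is an assumption about how the training side might be written, not something from the setup described above, and on Windows the swap may fail if another process has the target file open.

```python
import os
import tempfile


def publish_atomically(save_fn, final_path):
    """Write a model file via a temporary file, then swap it into place.

    `save_fn` is any callable that writes the model to the path it is
    given (hypothetical here; in practice it might be `model.save`).
    """
    directory = os.path.dirname(os.path.abspath(final_path))
    # Create the temporary file next to the target so the final rename
    # stays on the same filesystem; keep the .h5 suffix for the saver.
    fd, tmp_path = tempfile.mkstemp(prefix=".tmp-", suffix=".h5", dir=directory)
    os.close(fd)
    try:
        save_fn(tmp_path)
        # Readers see either the old file or the complete new one.
        os.replace(tmp_path, final_path)
    except BaseException:
        # On failure, leave the previously published file untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If saving fails partway, the previously published file is left as it was, so the predictor keeps using the last complete model.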
