I have a graph that unfortunately has some nodes that do not support batching (custom ops that have yet to be fleshed out). I've been able to run it in parallel by having multiple threads call sess.run(), each feeding data in through the feed_dict. I have now converted my data to TFRecords to properly utilize queues, but I still can't find a way to run multiple instances of the graph in parallel other than having multiple threads call sess.run(). I assume the TensorFlow developers have created a more "pythonic" way to do this somewhere, but I have yet to find it. How do I do this within TensorFlow?
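For reference, the threaded approach I've been using looks roughly like this (the tiny graph and load_sample here are stand-ins for my real graph and data loading):

import threading
import tensorflow as tf

# Stand-ins for my real graph and data loading:
input_ph = tf.placeholder(tf.float32, shape=[4])
output = tf.reduce_sum(tf.square(input_ph))  # stands in for the CPU op + GPU ops

def load_sample():
    return [1.0, 2.0, 3.0, 4.0]  # stands in for reading one real sample

def worker(sess, num_steps):
    for _ in range(num_steps):
        sess.run(output, feed_dict={input_ph: load_sample()})

with tf.Session() as sess:
    threads = [threading.Thread(target=worker, args=(sess, 100))
               for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

It works, but spinning up my own threads and feed_dicts feels like I'm fighting the framework rather than using it.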
edit: even with the data batched, the previous question stands: my computation spends half its time on the CPU and half on the GPU, so regardless of batching, each device sits idle for half the time waiting on the other. I'd like the graph to train multiple samples asynchronously to fill that gap.
edit 2: I guess I have to put pseudocode here for people who don't want to read the text above.
import tensorflow as tf

resultOfCPUCalculation = someCPUOnlyOP(inputData)  # does not support batching
gpuResults = aBunchOfGPUOps(resultOfCPUCalculation)

with tf.Session() as sess:
    sess.run([gpuResults])
    # only uses 1 CPU core, and the GPU is idle while the CPU op is doing its thing
I'd like to do this in a "pipeline" manner, where as soon as the CPU op finishes one sample, it starts on the next.
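To make that concrete, here's a rough sketch of the kind of thing I imagine, using a tf.FIFOQueue plus a QueueRunner to decouple the CPU stage from the GPU stage. I don't know if this is the intended mechanism; someCPUOnlyOP and aBunchOfGPUOps are stand-ins for my real ops, and the queue capacity and thread count are numbers I made up:

import tensorflow as tf

def someCPUOnlyOP(x):
    return tf.reduce_sum(x)  # stand-in for my custom non-batchable op

def aBunchOfGPUOps(x):
    return tf.square(x)      # stand-in for the GPU portion of my graph

inputData = tf.random_normal([128])  # stand-in for my TFRecord reader output

with tf.device('/cpu:0'):
    cpu_result = someCPUOnlyOP(inputData)

# A queue between the stages, so the CPU op can work ahead of the GPU.
queue = tf.FIFOQueue(capacity=16, dtypes=[tf.float32],
                     shapes=[cpu_result.get_shape()])
enqueue_op = queue.enqueue(cpu_result)

# Several enqueue threads keep the CPU op busy while the GPU consumes results.
tf.train.add_queue_runner(tf.train.QueueRunner(queue, [enqueue_op] * 4))

gpuResults = aBunchOfGPUOps(queue.dequeue())

with tf.Session() as sess:
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    for _ in range(1000):
        sess.run(gpuResults)
    coord.request_stop()
    coord.join(threads)

Is this roughly what the queue machinery is meant for, or is there a cleaner built-in way?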