I am developing an AI model with TensorFlow.js and Node.js. As part of this, I need to read and parse my large dataset in a streaming fashion (it's way too big to fit in memory all at once). This process ultimately results in a pair of generator functions (one for the input data, another for the output data) that iteratively yield TensorFlow.js Tensors:
function* example_parser() {
    while (thereIsData) {
        // do reading & parsing here....
        yield next_tensor;
    }
}
....which are each wrapped in a tf.data.generator(), and then combined with tf.data.zip().
This process can be fairly computationally intensive at times, so I would like to refactor it into a separate Node.js worker process / thread, since Node.js executes JavaScript on a single thread.
However, I am also aware that if I were to transmit the data naively via e.g. process.send(), the serialisation / deserialisation overhead would slow things down so much that I would be better off keeping everything inside the same process.
To this end, my question is this:
How can I efficiently transmit (a stream of) Tensorflow.js Tensors between Node.js processes without incurring a heavy serialisation / deserialisation penalty?