I have a generator that yields an infinite stream of data (random image crops). I would like to create a tf.data.Dataset from, say, the first 10,000 data points and cache it, so that I can reuse those points to train models. How can I do that?
Currently, the generator takes 1-2 seconds to produce each data point, and this is the main performance bottleneck: I have to wait a minute or more to generate a batch of 64 images. The preprocessing() function is very expensive, so I would like to reuse its results.
The tf.data.Dataset.from_generator() method lets us create such an infinite dataset. Instead, I would like to create a finite dataset from the first N outputs of the generator and cache it, like:
ds = ds.cache()
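A minimal sketch of the "take the first N, then cache" idea, assuming TensorFlow 2.4+ (for the output_signature argument). random_crops, CROP_SHAPE and N are hypothetical stand-ins for the real generator, crop size and the 10,000-element pool; the expensive preprocessing() call would live inside the generator. The key point is that take(N) must come before cache(), because caching an infinite dataset never completes.

```python
import numpy as np
import tensorflow as tf

CROP_SHAPE = (32, 32, 3)  # hypothetical crop size, for illustration only
N = 100                   # hypothetical pool size (e.g. 10_000 in practice)

def random_crops():
    """Hypothetical stand-in for the expensive, infinite generator."""
    rng = np.random.default_rng(0)
    while True:
        # In the real pipeline, preprocessing() would run here.
        yield rng.random(CROP_SHAPE, dtype=np.float32)

ds = tf.data.Dataset.from_generator(
    random_crops,
    output_signature=tf.TensorSpec(shape=CROP_SHAPE, dtype=tf.float32),
)

# Keep only the first N elements, then cache them. The generator is consumed
# during the first full pass; later passes read from the cache instead.
ds = ds.take(N).cache()

# Shuffle *after* caching so each epoch sees the cached crops in a new order.
train_ds = ds.shuffle(N).batch(64).prefetch(tf.data.AUTOTUNE)
```

With no argument, cache() keeps the elements in memory, which can be large for 10,000 image crops; passing a file path (ds.cache("some/path")) stores them on disk instead. The cache is only finalized after one complete pass over the N elements, so the first epoch still pays the generator's full cost.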
An alternative would be to keep generating new data while the generator is running, and to train on the cached data points in the meantime.
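One possible way to approximate this alternative is to mix a replayed cached pool with a live stream using tf.data.Dataset.sample_from_datasets (a Dataset method since TF 2.7; older versions expose it as tf.data.experimental.sample_from_datasets). This is only a sketch under assumptions: the 90/10 weights, the pool size N and the hypothetical random_crops generator are illustrative choices, not part of the original setup.

```python
import numpy as np
import tensorflow as tf

CROP_SHAPE = (32, 32, 3)  # hypothetical crop size
N = 100                   # hypothetical size of the cached pool

def random_crops():
    """Hypothetical stand-in for the expensive, infinite generator."""
    rng = np.random.default_rng()
    while True:
        yield rng.random(CROP_SHAPE, dtype=np.float32)

spec = tf.TensorSpec(shape=CROP_SHAPE, dtype=tf.float32)

# Pool of N crops: generated once, cached, then replayed indefinitely.
cached = (
    tf.data.Dataset.from_generator(random_crops, output_signature=spec)
    .take(N)
    .cache()
    .repeat()
)

# Live stream that keeps producing brand-new crops.
fresh = tf.data.Dataset.from_generator(random_crops, output_signature=spec)

# Draw about 90% of elements from the cache and about 10% from the live
# generator, so the expensive generator is hit for roughly 1 in 10 examples.
mixed = tf.data.Dataset.sample_from_datasets(
    [cached, fresh], weights=[0.9, 0.1], seed=42
)

train_ds = mixed.shuffle(256).batch(64).prefetch(tf.data.AUTOTUNE)
```

This does not make the generator itself faster; it only reduces how often training has to wait on it. Lowering the weight on fresh trades data variety for throughput.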