In the book *Deep Learning with Python* by François Chollet (the creator of Keras), section 5.3 (see also the companion Jupyter notebook), the following passage is unclear to me:
> Let's put this in practice by using the convolutional base of the VGG16 network, trained on ImageNet, to extract interesting features from our cat and dog images, and then training a cat vs. dog classifier on top of these features.
>
> [...]
>
> There are two ways we could proceed:
>
> - Running the convolutional base over our dataset, recording its output to a Numpy array on disk, then using this data as input to a standalone densely-connected classifier similar to those you have seen in the first chapters of this book. This solution is very fast and cheap to run, because it only requires running the convolutional base once for every input image, and the convolutional base is by far the most expensive part of the pipeline. However, for the exact same reason, this technique would not allow us to leverage data augmentation at all.
> - Extending the model we have (conv_base) by adding Dense layers on top, and running the whole thing end-to-end on the input data. This allows us to use data augmentation, because every input image is going through the convolutional base every time it is seen by the model. However, for this same reason, this technique is far more expensive than the first one.
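For reference, the fast feature-extraction path (the first option) looks roughly like this in the companion notebook; the 150×150 input size and the cats-vs-dogs directory layout are the book's setup:

```python
import numpy as np
from keras.applications import VGG16
from keras.preprocessing.image import ImageDataGenerator

# Frozen VGG16 convolutional base, pretrained on ImageNet
conv_base = VGG16(weights='imagenet',
                  include_top=False,
                  input_shape=(150, 150, 3))

datagen = ImageDataGenerator(rescale=1./255)  # no augmentation here
batch_size = 20

def extract_features(directory, sample_count):
    # VGG16's last conv block yields 4x4x512 feature maps for 150x150 inputs
    features = np.zeros(shape=(sample_count, 4, 4, 512))
    labels = np.zeros(shape=(sample_count,))
    generator = datagen.flow_from_directory(
        directory,
        target_size=(150, 150),
        batch_size=batch_size,
        class_mode='binary')
    i = 0
    for inputs_batch, labels_batch in generator:
        # The expensive convolutional base runs exactly once per image
        features_batch = conv_base.predict(inputs_batch)
        features[i * batch_size : (i + 1) * batch_size] = features_batch
        labels[i * batch_size : (i + 1) * batch_size] = labels_batch
        i += 1
        if i * batch_size >= sample_count:
            break  # the generator loops forever, so stop after one pass
    return features, labels
```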
Why can't we augment our data (generate more images from the existing data), run the convolutional base over the augmented dataset just once, record its output, and then use that data as input to a standalone fully-connected classifier?
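Concretely, I am imagining something like the rough sketch below (not the book's code), reusing `conv_base`, `batch_size`, and the imports from the snippet above; `train_dir` and `n_augmented_batches` are placeholder names of mine:

```python
# Generator that yields randomly augmented training batches
augmenting_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

generator = augmenting_datagen.flow_from_directory(
    train_dir,                # hypothetical path to the training images
    target_size=(150, 150),
    batch_size=batch_size,
    class_mode='binary')

features, labels = [], []
for _ in range(n_augmented_batches):  # how many augmented variants to keep
    inputs_batch, labels_batch = next(generator)
    # The conv base still runs only once per (augmented) image
    features.append(conv_base.predict(inputs_batch))
    labels.append(labels_batch)
features = np.concatenate(features)
labels = np.concatenate(labels)
# 'features' and 'labels' could now be saved to disk and used to train a
# standalone densely-connected classifier, exactly as in the first option.
```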
Wouldn't that give results similar to the second option while running much faster?
What am I missing?