I am planning to perform grid search with k-fold cross-validation (CV) to optimise the hyperparameters of an LSTM. Say I have n hyperparameter combinations and a k-fold CV scheme: that means I have to train the LSTM n × k times, which can be computationally intensive.
Q1. Are there any practical tips that can save time?
To save time, what if 1) I split the whole training data into a train vs. validation set (e.g., 80:20), with no k-fold, and find the optimal hyperparameters on that single split (n × 1 runs), and then 2) perform k-fold CV only with the optimal hyperparameter combination found in 1), to report the overall performance of the selected LSTM? Does this make sense?
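For concreteness, here is a minimal sketch of the two-stage idea I have in mind. The grid values and the `train_and_score` function are purely illustrative stand-ins (a real version would build and fit an LSTM and return its validation metric); the point is only the run count, n + k instead of n × k:

```python
import itertools

runs = 0

# Hypothetical stand-in for "train an LSTM with `params` on train_idx and
# return its score on val_idx". A toy deterministic score keeps it runnable.
def train_and_score(params, train_idx, val_idx):
    global runs
    runs += 1
    return -abs(params["lr"] - 1e-3) - abs(params["units"] - 64) * 1e-6

data_idx = list(range(100))

# n = 2 * 2 * 2 = 8 hyperparameter combinations (illustrative grid).
grid = {"units": [32, 64], "lr": [1e-3, 1e-2], "dropout": [0.0, 0.2]}
combos = [dict(zip(grid, v)) for v in itertools.product(*grid.values())]

# Stage 1: a single 80:20 train/validation split -> n runs instead of n*k.
cut = int(0.8 * len(data_idx))
tr, va = data_idx[:cut], data_idx[cut:]
best = max(combos, key=lambda p: train_and_score(p, tr, va))

# Stage 2: k-fold CV only for the selected combination -> k additional runs.
k = 5
fold = len(data_idx) // k
cv_scores = []
for i in range(k):
    val_fold = data_idx[i * fold:(i + 1) * fold]
    trn_fold = data_idx[:i * fold] + data_idx[(i + 1) * fold:]
    cv_scores.append(train_and_score(best, trn_fold, val_fold))

print(runs)  # n + k = 8 + 5 = 13 total runs, versus n * k = 40 for full CV
```

(For time-series data, note that a plain chronological 80:20 split and forward-chaining folds would be more appropriate than shuffled k-fold.)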