I am working on a document classification problem using CNN/LSTM models with embeddings generated by the Universal Sentence Encoder. I have 10,000 records, and each record has about 100 to 600 sentences. I save all of the document embedding matrices into a single JSON file before feeding them into the neural network models. The resulting JSON file is about 20 GB, which is too large to load into memory.
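For context, this is roughly what I do now (a minimal sketch; the toy `documents` dict and the output file name are placeholders for my real data):

```python
import json
import tensorflow_hub as hub

# Placeholder data; the real corpus has 10,000 docs with 100-600 sentences each
documents = {
    "doc1": ["First sentence of doc1.", "Second sentence of doc1."],
    "doc2": ["Only sentence of doc2."],
}

embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

doc_matrices = {}
for doc_id, sentences in documents.items():
    # One 512-dim USE vector per sentence -> a (num_sentences, 512) matrix
    doc_matrices[doc_id] = embed(sentences).numpy().tolist()

with open("doc_matrices.json", "w") as f:
    json.dump(doc_matrices, f)  # with the full corpus this grows to ~20 GB
```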
I'm not sure whether I should instead store the documents as plain text and convert them into sentence embeddings on the fly during training. What would be a good solution?
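The on-the-fly alternative I have in mind would look something like this (a sketch only, assuming a Python generator fed to `model.fit`; the `documents`/`labels` dict structure, the function name, and the padding choices are my assumptions):

```python
import numpy as np
import tensorflow_hub as hub
from tensorflow.keras.preprocessing.sequence import pad_sequences

embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

def batch_generator(documents, labels, batch_size=32, max_sentences=600):
    """Yield (batch_matrices, batch_labels), embedding sentences on the fly.

    documents: dict mapping doc_id -> list of sentence strings
    labels:    dict mapping doc_id -> class label
    """
    ids = list(documents.keys())
    while True:
        np.random.shuffle(ids)
        for start in range(0, len(ids), batch_size):
            batch_ids = ids[start:start + batch_size]
            # Embed each document's sentences now instead of reading a 20 GB JSON
            matrices = [embed(documents[i]).numpy() for i in batch_ids]
            # Pad/truncate to a fixed sentence count so the batch is one tensor
            padded = pad_sequences(matrices, maxlen=max_sentences,
                                   dtype="float32", padding="post",
                                   truncating="post")
            yield padded, np.array([labels[i] for i in batch_ids])
```

My worry is the tradeoff: this keeps memory usage down to one batch at a time, but it re-computes the same embeddings every epoch, so training would presumably be slower than loading precomputed matrices.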