
I am a newbie in deep learning with TensorFlow. I am trying out some seq2seq model sample code.

I wanted to understand:

  1. What are the minimum values for number of layers, layer size, and batch size that I could start with to be able to test the seq2seq model with satisfactory accuracy?

  2. Also, what is the minimum infrastructure setup required, in terms of memory and CPU capability, to train this deep learning model within a maximum of a few hours?

In my experience, training a seq2seq model with 2 layers of size 900 and batch size 4:

  • took around 3 days to train on a 4 GB RAM, 3 GHz Intel i5 single-core processor.
  • took around 1 day to train on an 8 GB RAM, 3 GHz Intel i5 single-core processor.

Which helps the most for faster training: more RAM capacity, multiple CPU cores, or a CPU + GPU combination?


1 Answer


Disclaimer: I'm also new, and could be wrong on a lot of this.

I am a newbie in deep learning with TensorFlow. I am trying out some seq2seq model sample code.

I wanted to understand:

What are the minimum values for number of layers, layer size, and batch size that I could start with to be able to test the seq2seq model with satisfactory accuracy?

I think this will just have to come down to experimentation: find out what works for your data set. I have heard a few pieces of advice: don't design your own architecture if you can avoid it; find one that is tried and tested. Deeper networks seem to work better than wider ones if you have to choose between the two. I also think bigger batch sizes are better if you have the memory. Finally, I've heard to maximize network size and then regularize so you don't overfit.

I have the impression these are big questions that no one really knows the answer to (I could be very wrong about this!). We'd all love a principled way of choosing layer size and number of layers, but no one knows exactly how changing these things affects training.
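One thing you can do cheaply is estimate how big the model is before training it. Here is a back-of-the-envelope parameter count for a 2-layer seq2seq with hidden size 900, like the one in the question. This is only a rough sketch: it assumes LSTM cells, assumes an embedding size of 900, and ignores the embedding and output-projection weights, which can be large.

```python
# Rough parameter count for a 2-layer LSTM encoder-decoder with
# hidden size 900. All sizes are assumptions for illustration.

def lstm_params(input_size, hidden_size):
    """Parameters in one LSTM layer: 4 gates, each with input
    weights, recurrent weights, and a bias vector."""
    return 4 * (hidden_size * (input_size + hidden_size) + hidden_size)

hidden = 900
embed = 900  # assumed embedding size feeding the first layer

# Each side (encoder, decoder) has 2 layers: embed->hidden, hidden->hidden.
one_side = lstm_params(embed, hidden) + lstm_params(hidden, hidden)
total_params = 2 * one_side  # encoder + decoder

weights_mb = total_params * 4 / 1e6  # float32 weights
training_mb = weights_mb * 4         # very rough: weights + gradients + optimizer state

print(f"params: {total_params:,}")
print(f"weights alone: ~{weights_mb:.0f} MB, training footprint: ~{training_mb:.0f} MB")
```

Under these assumptions that's about 26 million parameters, roughly 100 MB of float32 weights, and on the order of 400 MB once gradients and optimizer state are included. That's one reason the 4 GB machine in the question struggles: activations, data pipelines, and the rest of the process all compete for what's left.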

Also, what is the minimum infrastructure setup required, in terms of memory and CPU capability, to train this deep learning model within a maximum of a few hours?

Depending on your model, that could be an unreasonable request. Some models train for hundreds if not thousands of hours (on GPUs).
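You can at least budget the training time before committing to it. The arithmetic below uses entirely hypothetical numbers (dataset size, epoch count, and a per-step time you'd measure on your own hardware), but it shows why batch size 4 makes runs take days:

```python
import math

# All numbers here are hypothetical, for illustration only.
dataset_size = 100_000  # sentence pairs (assumed)
batch_size = 4          # as in the question
epochs = 10             # assumed
sec_per_step = 0.5      # time one batch takes, measured on your hardware

steps = epochs * math.ceil(dataset_size / batch_size)
hours = steps * sec_per_step / 3600
print(f"{steps:,} steps, ~{hours:.0f} hours")
```

With batch size 4 that's 250,000 updates, around 35 hours at half a second each. Raising the batch size cuts the number of steps proportionally; the time per step grows too, but on hardware with spare compute (especially a GPU) it grows much more slowly, which is where most of the speedup comes from.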

In my experience, training a seq2seq model with 2 layers of size 900 and batch size 4 took around 3 days on a 4 GB RAM, 3 GHz Intel i5 single-core processor, and around 1 day on an 8 GB RAM, 3 GHz Intel i5 single-core processor. Which helps the most for faster training: more RAM capacity, multiple CPU cores, or a CPU + GPU combination?

I believe a GPU will help you the most. I have seen some work that sticks to CPUs (asynchronous advantage actor-critic, I think — the updates were lock-free) where CPU seemed to win, but for a model like yours I think a GPU will give you huge speedups.