81
votes

I am training on 970 samples and validating on 243 samples.

How big should batch size and number of epochs be when fitting a model in Keras to optimize the val_acc? Is there any sort of rule of thumb to use based on data input size?

5
I would say this depends heavily on your data. If you are just playing around with a simple task, like an XOR classifier, a few hundred epochs with a batch size of 1 is enough to reach around 99.9% accuracy. For MNIST, I have mostly seen reasonable results with a batch size somewhere between 10 and 100 and fewer than 100 epochs. Without details about your problem, your architecture, your learning rules / cost functions, your data and so on, one cannot answer this accurately. - daniel451
Is there a way to include all the data in every training epoch? - kRazzy R
@kRazzyR Every epoch already uses all the data, split into batches. If you want to pass all the data in a single batch, set the batch size to the length of the data. - Wickkiey
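
To make the exchange above concrete, here is a minimal sketch of how one epoch is split into batches. The helper name `iter_batches` is hypothetical and not part of Keras; it only mimics the slicing, assuming the samples are visited in order without shuffling.

```python
def iter_batches(n_samples, batch_size):
    """Yield (start, stop) index pairs that together cover every sample once."""
    for start in range(0, n_samples, batch_size):
        yield start, min(start + batch_size, n_samples)


n_samples = 970  # training set size from the question

# Mini-batches of 32: the last batch is smaller, but no sample is skipped.
batches = list(iter_batches(n_samples, 32))
covered = sum(stop - start for start, stop in batches)
print(len(batches), covered)  # prints: 31 970

# Batch size equal to the data length: the whole set goes in a single batch.
full = list(iter_batches(n_samples, n_samples))
print(full)  # prints: [(0, 970)]
```

So a smaller batch size does not drop data; it only changes how many pieces each epoch is cut into.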

5 Answers

75
votes

Since you have a pretty small dataset (~ 1000 samples), you would probably be safe using a batch size of 32, which is pretty standard. It won't make a huge difference for your problem unless you're training on hundreds of thousands or millions of observations.

To answer your questions on Batch Size and Epochs:

In general: larger batch sizes get through each epoch faster, but don't always converge as fast. Smaller batch sizes take longer per epoch, but can converge in fewer epochs. It's definitely problem dependent.
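
One way to see this trade-off is to count weight updates. Assuming one gradient update per batch, the sketch below compares a few batch sizes on the 970 training samples from the question. The function name `updates_per_epoch` and the choice of 50 epochs are made up for illustration.

```python
import math


def updates_per_epoch(n_samples, batch_size):
    """Number of gradient updates in one pass over the data (one per batch)."""
    return math.ceil(n_samples / batch_size)


n_samples = 970  # training set size from the question
epochs = 50      # illustrative value, not a recommendation

# Total number of updates over the whole run for each candidate batch size.
table = {
    batch_size: updates_per_epoch(n_samples, batch_size) * epochs
    for batch_size in (1, 32, 256, 970)
}
for batch_size, total in table.items():
    print(f"batch_size={batch_size:>3}: {total} updates over {epochs} epochs")
```

With a batch size of 1 the model is updated 48500 times in 50 epochs, with 32 it is updated 1550 times, and with the full 970 samples only 50 times. Small batches give many noisy updates, large batches give few smoother ones, which is roughly why neither end is always better.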

In general, the models improve with more epochs of training, to a point. They'll start to plateau in accuracy as they converge. Try something like 50 and plot number of epochs (x axis) vs. accuracy (y axis). You'll see where it levels out.
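
If you prefer a number to eyeballing the plot, a sketch like the following can locate the plateau. It assumes you have the per-epoch validation accuracies as a plain list (in Keras, `model.fit` returns a `History` object whose `history` dict holds such lists; the key is `val_acc` or `val_accuracy` depending on the version). The function `plateau_epoch`, its thresholds, and the sample values are all hypothetical.

```python
def plateau_epoch(values, min_delta=0.005, patience=3):
    """Return the 1-based epoch of the last meaningful improvement.

    The scan stops once `patience` consecutive epochs fail to beat the best
    value so far by more than `min_delta`.
    """
    best, best_epoch, waited = float("-inf"), 0, 0
    for epoch, value in enumerate(values, start=1):
        if value > best + min_delta:
            # Meaningful improvement: remember it and reset the counter.
            best, best_epoch, waited = value, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch


# Made-up validation accuracies, one per epoch.
val_acc = [0.61, 0.70, 0.76, 0.80, 0.82, 0.822, 0.821, 0.823, 0.822, 0.824]
print(plateau_epoch(val_acc))  # prints: 5
```

Here the curve stops improving by more than half a percentage point after epoch 5, so training much longer than that buys little on this made-up run.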

What is the type and/or shape of your data? Are these images, or just tabular data? This is an important detail.

21
votes

Great answers above. Everyone gave good inputs.

Ideally, this is the sequence of the batch sizes that should be used:

{1, 2, 4, 8, 16} - slow

{[32, 64], [128, 256]} - good starters

[32, 64] - CPU

[128, 256] - GPU, for an extra boost

11
votes

I use Keras to perform non-linear regression on speech data. Each of my speech files gives me features in a text file of 25000 rows, with each row containing 257 real-valued numbers. I use a batch size of 100 and 50 epochs to train a Sequential model in Keras with 1 hidden layer. After 50 epochs of training, it converges quite well to a low val_loss.

6
votes

I used Keras to perform non-linear regression for market mix modelling. I got the best results with a batch size of 32 and epochs = 100 while training a Sequential model in Keras with 3 hidden layers. Generally, a batch size of 32 or 25 is good, with epochs = 100, unless you have a large dataset. For a large dataset, you can go with a batch size of 10 and between 50 and 100 epochs. Again, these figures have worked fine for me.

-8
votes

The number of epochs is up to you: keep training until the validation loss stops improving. The batch size can be chosen with a function like this:


    # Heuristic for choosing a training batch size from the model size,
    # the available hardware, and the current memory usage.

    import os

    import psutil
    import tensorflow as tf


    def FindBatchSize(model):
        """Return a batch size for `model`, a Keras model that is yet to be trained."""
        batch_found = 16  # default when none of the rules below match

        total_params = int(model.count_params())
        cpu_count = os.cpu_count() or 1  # os.cpu_count() may return None
        many_cores = cpu_count > 15

        # Find whether a GPU is available (tf.config API, TensorFlow 2.1+).
        has_gpu = len(tf.config.list_physical_devices("GPU")) > 0

        # Decide the batch size from GPU availability and model complexity.
        # The checks run in order, so a later match overrides an earlier one.
        if has_gpu and many_cores and total_params < 1000000:
            batch_found = 64
        if not many_cores and total_params < 500000:
            batch_found = 64
        if has_gpu and many_cores and 1000000 <= total_params < 2000000:
            batch_found = 32
        if has_gpu and many_cores and 2000000 <= total_params < 10000000:
            batch_found = 16
        if has_gpu and many_cores and total_params >= 10000000:
            batch_found = 8
        if not many_cores and total_params > 5000000:
            batch_found = 8

        # Override with a small batch when system RAM is already heavily used.
        memory_used = psutil.virtual_memory().percent
        if memory_used > 75.0:
            batch_found = 8
        if memory_used > 85.0:
            batch_found = 4
        if memory_used > 90.0:
            batch_found = 2

        # Very large models always get a batch size of 1.
        if total_params > 100000000:
            batch_found = 1

        print("Batch Size:  " + str(batch_found))
        return batch_found