MNIST training time in CPU

Question

I have created a simple feed-forward neural network library in Java, and I need a benchmark to compare against so I can troubleshoot my library.

Computer specs:

AMD Ryzen 7 2700X Eight-Core Processor
RAM 16.0 GB
WINDOWS 10 OS
JVM args: -Xms1024m -Xmx8192m

Note that I am not using a GPU.

Please list the following specs:

Computer specs?
GPU or CPU (CPU is preferred, but GPU numbers are useful too)
Number of inputs 784 (this is fixed)
For each layer:
- How many nodes?
- What activation function?
Output layer:
- How many nodes? (10 for classification or 1 for regression)
- What activation function?
What loss function?
What gradient descent algorithm? (e.g. vanilla)
What batch size?
How many epochs? (not iterations)
And finally, what is the training time and accuracy?

Thank you so much

Edit

Just to give an idea of what I am dealing with, I created a network with:

784 input nodes
784 in hidden layer 0
256 in hidden layer 1
128 in hidden layer 2
1 output node
mini-batch size 5
16 threads for backprop

It has been training for ~8 hours and has completed only 694 iterations - that is not even 20% of one epoch.

How is this done in minutes, as some people claim?
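For scale, the progress figures above can be turned into rough numbers. This is only a sketch: it assumes the full standard MNIST training set of 60,000 images and that one iteration means one mini-batch of 5; `EpochProgress` and `epochFraction` are names made up for the example.

```java
final class EpochProgress {

    // Fraction of one epoch completed after the given number of mini-batch iterations.
    static double epochFraction(int iterations, int batchSize, int trainingSamples) {
        return (double) iterations * batchSize / trainingSamples;
    }

    public static void main(String[] args) {
        int trainingSamples = 60_000; // assumption: full standard MNIST training set
        double fraction = epochFraction(694, 5, trainingSamples);
        double secondsPerIteration = 8 * 3600.0 / 694; // ~8 hours for 694 iterations

        System.out.printf("epoch completed: %.1f%%%n", fraction * 100);
        System.out.printf("seconds per iteration: %.1f%n", secondsPerIteration);
    }
}
```

Under those assumptions this works out to roughly 6% of an epoch and about 41 seconds per mini-batch of 5 images.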

I'm unclear as to what you want here. Do you want someone to train a network with the MNIST dataset and report the performance? — rayryeng
@rayryeng I am assuming someone already has - and they have the information available — Edv Beq
Probably not - MNIST is a toy dataset and usually just used for demonstration purposes to show that neural networks can achieve higher accuracy and performance within just a few epochs. Results are quite reproducible regardless of any package or framework you use. Most likely someone will have to set this up and run it again to give you what you want - that I don't believe many people will want to do. — rayryeng
@EdvBeq, you might want to consider spinning up virtual machines of varying specs in AWS, GCP, Azure, etc. and building the desired list yourself :) — Peter Leimbigler

user8426627 · Accepted Answer · 2019-07-18T00:16:20

784 input nodes, 784 in hidden layer 0, 256 in hidden layer 1, 128 in hidden layer 2, 1 output node, mini-batch size 5

You could make the network thinner: 784 => 784/2, 160, 40, with a batch size of at least 50.
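To put a number on "thinner", here is a rough sketch that counts the weights (biases ignored) in both layouts. The class and method names are hypothetical, and the single output node is carried over from the question because the suggestion above does not specify an output layer.

```java
final class WeightCount {

    // Number of weights in a fully connected network, biases ignored.
    static long countWeights(int[] layerSizes) {
        long total = 0;
        for (int i = 0; i + 1 < layerSizes.length; i++) {
            total += (long) layerSizes[i] * layerSizes[i + 1];
        }
        return total;
    }

    public static void main(String[] args) {
        int[] original = {784, 784, 256, 128, 1};
        int[] thinner = {784, 392, 160, 40, 1}; // output size assumed from the question

        System.out.println("original: " + countWeights(original)); // 848256
        System.out.println("thinner:  " + countWeights(thinner));  // 376488
    }
}
```

The thinner layout has fewer than half as many weights, so each forward and backward pass does correspondingly less work.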

And yes, even in Java, which is generally slow, a naive implementation should need only several minutes for a COMPLETE training run, meaning 10~20 epochs.

How have you implemented it? Don't tell me you have a neuron class and each neuron is represented by an instance.
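If the library does create one object per neuron, a common alternative is to keep a whole layer's weights in one flat array and compute the layer with two nested loops. The following is a minimal sketch under that assumption; `DenseLayerSketch`, its fields, and the sigmoid activation are illustrative choices, not anything taken from the question.

```java
final class DenseLayerSketch {
    final int inputs;
    final int outputs;
    final double[] weights; // row-major: weights[o * inputs + i]
    final double[] biases;  // one bias per output node

    DenseLayerSketch(int inputs, int outputs) {
        this.inputs = inputs;
        this.outputs = outputs;
        this.weights = new double[inputs * outputs];
        this.biases = new double[outputs];
    }

    // Writes sigmoid(W * in + b) into out without allocating any objects.
    void forward(double[] in, double[] out) {
        for (int o = 0; o < outputs; o++) {
            double sum = biases[o];
            int row = o * inputs; // start of this output node's weights
            for (int i = 0; i < inputs; i++) {
                sum += weights[row + i] * in[i];
            }
            out[o] = 1.0 / (1.0 + Math.exp(-sum));
        }
    }
}
```

Each output node reads a contiguous slice of `weights`, so the inner loop walks memory sequentially instead of chasing references between neuron objects.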

It isn't supposed to run so horribly slowly. The optimizations I know of are to store the second matrix of the dot product transposed and to use the Strassen-Winograd algorithm, but you must be doing something else wrong.

Look at my dot implementation:

import java.util.Arrays;

public class JTensor {

    // Not used by the benchmark below.
    private float[] data;
    private int width;

    // Multiplies x by a second matrix that is passed already transposed as y.
    // Both arrays are row-major with rows of length wy, so
    // out[r * (y.length / wy) + c] = dot(row r of x, row c of y).
    public static void dot_T(double[] out, double[] x, double[] y, int wy) {
        int iOut = 0;
        for (int ix = 0; ix < x.length; ix += wy) {   // start of each row of x
            for (int iy = 0; iy < y.length;) {        // iy is advanced by the inner loop
                int ixv = ix;
                int iyLimit = iy + wy;
                double summ = 0;
                for (; iy < iyLimit;) {
                    summ += x[ixv++] * y[iy++];
                }
                out[iOut++] = summ;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("generate random");

        int size = 1000;
        double[] x = new double[size * size];
        double[] y = new double[size * size];
        double[] out = new double[size * size];

        // The inputs are simply 0, 1, 2, ...; the values do not affect the timing.
        for (int i = 0; i < x.length; i++) {
            x[i] = (double) i;
        }
        for (int i = 0; i < y.length; i++) {
            y[i] = (double) i;
        }

        System.out.println("start ");
        long start = System.nanoTime();

        JTensor.dot_T(out, x, y, size);

        long end = System.nanoTime();
        System.out.println("elapsed " + ((end - start) / (1000.0 * 1000 * 1000)));

        //System.out.println(Arrays.toString(x));
        //System.out.println(Arrays.toString(y));
        //System.out.println(Arrays.toString(out));
    }
}
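As a quick sanity check of the transposed layout, the sketch below multiplies two 2x2 matrices. `dotT` mirrors `dot_T` from the code above, written with plain counted loops; `transpose` and the class name `DotTCheck` are additions for this example only.

```java
import java.util.Arrays;

final class DotTCheck {

    // Returns the transpose of a row-major rows x cols matrix.
    static double[] transpose(double[] m, int rows, int cols) {
        double[] t = new double[m.length];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                t[c * rows + r] = m[r * cols + c];
            }
        }
        return t;
    }

    // Same computation as dot_T: every row of x dotted with every row of yT.
    static void dotT(double[] out, double[] x, double[] yT, int width) {
        int iOut = 0;
        for (int ix = 0; ix < x.length; ix += width) {
            for (int iy = 0; iy < yT.length; iy += width) {
                double sum = 0;
                for (int k = 0; k < width; k++) {
                    sum += x[ix + k] * yT[iy + k];
                }
                out[iOut++] = sum;
            }
        }
    }

    public static void main(String[] args) {
        double[] a = {1, 2, 3, 4}; // [[1, 2], [3, 4]]
        double[] b = {5, 6, 7, 8}; // [[5, 6], [7, 8]]
        double[] out = new double[4];

        dotT(out, a, transpose(b, 2, 2), 2);
        System.out.println(Arrays.toString(out)); // [19.0, 22.0, 43.0, 50.0]
    }
}
```

Passing the second matrix transposed means the inner loop reads both arrays sequentially, which is friendlier to the CPU cache than striding down a column.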
