I am new to TensorFlow, having previously used scikit-learn extensively. As one of my first exercises in transitioning to TensorFlow, I'm trying to reproduce some of the results I obtained with scikit-learn's MLPClassifier.
When I use the MLPClassifier with mostly default settings, I get up to 98% accuracy on the test set. However, when I implement what I believe is an equivalent single-hidden-layer network in TensorFlow, I get less than 90% accuracy on the test set. The only way I can get TensorFlow to yield similar accuracy is to make multiple (> 50) passes over the training set.
Any idea where the difference may be coming from? Or is there an implementation of sklearn's MLPClassifier in TensorFlow to which I can compare my code?
As far as I can tell, I am using the same optimizer (Adam), the same learning rate, L2 regularization with the same coefficient, the same activation function (ReLU), and softmax evaluation at the output layer.
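For reference, my understanding of the scikit-learn defaults I am relying on, written out explicitly (this is an assumption based on my reading of the documentation, not code I actually run; clf_explicit is just a placeholder name):

from sklearn import neural_network

# Explicit form of the defaults I believe MLPClassifier uses (my assumption):
# adam solver, learning_rate_init=0.001, L2 penalty alpha=0.0001, ReLU activation,
# batch_size='auto' (min(200, n_samples)), up to max_iter=200 passes over the data
clf_explicit = neural_network.MLPClassifier(
    hidden_layer_sizes=(500,),
    activation='relu',
    solver='adam',
    alpha=0.0001,
    learning_rate_init=0.001,
    batch_size='auto',
    max_iter=200,
    random_state=42)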
My implementation of the TensorFlow graph is the following:
import tensorflow as tf

# n_features and n_classes are set elsewhere from the shape of the data
n_units = 500
X = tf.placeholder(tf.float32, [None, n_features])
Y = tf.placeholder(tf.float32, [None, n_classes])
# Create weights for all layers
W_input = tf.Variable(tf.truncated_normal([n_features, n_units]))
W_out = tf.Variable(tf.truncated_normal([n_units, n_classes]))
# Create biases for all layers
b_1 = tf.Variable(tf.zeros([n_units]))
b_2 = tf.Variable(tf.zeros([n_classes]))
# Mount layers
hidden_layer = tf.nn.relu(tf.matmul(X, W_input) + b_1)
logits = tf.matmul(hidden_layer, W_out) + b_2
# Flatten all weights into a single tensor for the L2 penalty
all_weights = tf.concat([tf.reshape(W_input, [-1]), tf.reshape(W_out, [-1])], 0)
# Compute loss function
cross_entropy = tf.reduce_mean(
    tf.losses.softmax_cross_entropy(onehot_labels=Y, logits=logits))
# Compute L2 regularization term (alpha = 0.0001)
regularizer = 0.0001*tf.nn.l2_loss(all_weights)
# Train step
train_step = tf.train.AdamOptimizer(0.001).minimize(cross_entropy + regularizer)
# Boolean vector marking correct predictions
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(Y, 1))
# Class prediction
prediction = tf.argmax(tf.nn.softmax(logits), 1)
# Get accuracy
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
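A minimal sketch of the session loop I drive this graph with (the batch size, epoch count, and the X_train / Y_train_onehot / X_test / Y_test_onehot names are placeholders for my actual data handling):

n_epochs = 1      # a single pass; accuracy only approaches sklearn's with > 50 passes
batch_size = 200  # placeholder value

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(n_epochs):
        for start in range(0, len(X_train), batch_size):
            end = start + batch_size
            sess.run(train_step, feed_dict={X: X_train[start:end],
                                            Y: Y_train_onehot[start:end]})
    # Evaluate on the held-out test set
    print(sess.run(accuracy, feed_dict={X: X_test, Y: Y_test_onehot}))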
My implementation of the sklearn model is simply:
from sklearn import neural_network

clf = neural_network.MLPClassifier(hidden_layer_sizes=(500,), random_state=42)
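Accuracy is then measured with the usual fit/score pattern; a sketch (X_train, y_train, X_test, y_test stand in for my actual split, with y as integer class labels rather than one-hot):

clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # this is the test-set accuracy that reaches up to 98%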