I am new to TensorFlow, having previously used scikit-learn extensively. As one of my first exercises in transitioning to TensorFlow, I'm trying to reproduce some of the results I obtained with scikit-learn's MLPClassifier.
When I use the MLPClassifier with mostly default settings, I get up to 98% accuracy on the test set. However, when I implement what I believe is an equivalent single-hidden-layer network in TensorFlow, I get less than 90% accuracy on the test set. The only way I can get TensorFlow to yield similar accuracy is to make multiple (> 50) passes over the training set.
Any idea where the difference may be coming from? Or is there an implementation of sklearn's MLPClassifier in TensorFlow to which I can compare my code?
As far as I can tell, I am using the same optimizer (Adam), the same learning rate, L2 regularization with the same coefficient, the same activation function (ReLU), and softmax evaluation at the output layer.
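For reference, my understanding of the scikit-learn defaults I am relying on, written out explicitly (this is an assumption based on my reading of the documentation, not code I actually run; clf_explicit is just a placeholder name):

from sklearn import neural_network

# Explicit form of the defaults I believe MLPClassifier uses (my assumption):
# adam solver, learning_rate_init=0.001, L2 penalty alpha=0.0001, ReLU activation,
# batch_size='auto' (min(200, n_samples)), up to max_iter=200 passes over the data
clf_explicit = neural_network.MLPClassifier(
    hidden_layer_sizes=(500,),
    activation='relu',
    solver='adam',
    alpha=0.0001,
    learning_rate_init=0.001,
    batch_size='auto',
    max_iter=200,
    random_state=42)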
My implementation of the TensorFlow graph is the following:
import tensorflow as tf

# n_features and n_classes are set elsewhere from the shape of the data
n_units = 500
X = tf.placeholder(tf.float32, [None, n_features])
Y = tf.placeholder(tf.float32, [None, n_classes])
# Create weights for all layers
W_input = tf.Variable(tf.truncated_normal([n_features, n_units]))
W_out = tf.Variable(tf.truncated_normal([n_units, n_classes]))
# Create biases for all layers
b_1 = tf.Variable(tf.zeros([n_units]))
b_2 = tf.Variable(tf.zeros([n_classes]))
# Mount layers
hidden_layer = tf.nn.relu(tf.matmul(X, W_input) + b_1)
logits = tf.matmul(hidden_layer, W_out) + b_2
# Flatten all weights into a single tensor for the L2 penalty
all_weights = tf.concat([tf.reshape(W_input, [-1]), tf.reshape(W_out, [-1])], 0)
# Compute loss function
cross_entropy = tf.reduce_mean(
    tf.losses.softmax_cross_entropy(onehot_labels=Y, logits=logits))
# Compute L2 regularization term (alpha = 0.0001)
regularizer = 0.0001*tf.nn.l2_loss(all_weights)
# Train step
train_step = tf.train.AdamOptimizer(0.001).minimize(cross_entropy + regularizer)
# Boolean vector marking correct predictions
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(Y, 1))
# Class prediction
prediction = tf.argmax(tf.nn.softmax(logits), 1)
# Get accuracy
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
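A minimal sketch of the session loop I drive this graph with (the batch size, epoch count, and the X_train / Y_train_onehot / X_test / Y_test_onehot names are placeholders for my actual data handling):

n_epochs = 1      # a single pass; accuracy only approaches sklearn's with > 50 passes
batch_size = 200  # placeholder value

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(n_epochs):
        for start in range(0, len(X_train), batch_size):
            end = start + batch_size
            sess.run(train_step, feed_dict={X: X_train[start:end],
                                            Y: Y_train_onehot[start:end]})
    # Evaluate on the held-out test set
    print(sess.run(accuracy, feed_dict={X: X_test, Y: Y_test_onehot}))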
My implementation of the sklearn model is simply:
from sklearn import neural_network

clf = neural_network.MLPClassifier(hidden_layer_sizes=(500,), random_state=42)
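Accuracy is then measured with the usual fit/score pattern; a sketch (X_train, y_train, X_test, y_test stand in for my actual split, with y as integer class labels rather than one-hot):

clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # this is the test-set accuracy that reaches up to 98%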