I'm currently applying Tensorflow to the Titanic machine learning problem on Kaggle: https://www.kaggle.com/c/titanic
My training data is 891 by 8 (891 data points and 8 features). The goal is to predict whether a passenger on the Titanic survived or not. So it's a binary classification problem.
I'm using a single layer neural network. This is my cost function:
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=y))
This is my optimizer:
optimizer = tf.train.MomentumOptimizer(learning_rate=learning_rate, momentum=momentum).minimize(cost)
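For reference, `softmax_cross_entropy_with_logits` expects unnormalized logits and one-hot labels, and applies the softmax internally. A minimal NumPy sketch of what the op computes (the 2-class logits and labels here are made-up toy values, not the questioner's data):

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    # Shift logits by the row max for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    # log-softmax: log(exp(z_i) / sum_j exp(z_j))
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Cross-entropy: negative log-probability of the true (one-hot) class
    return -(labels * log_probs).sum(axis=1)

logits = np.array([[2.0, 0.5], [0.1, 3.0]])   # raw network outputs
labels = np.array([[1.0, 0.0], [0.0, 1.0]])   # one-hot: [survived, died]
loss = softmax_cross_entropy(logits, labels).mean()
```

Feeding already-softmaxed probabilities (rather than raw logits) into this op is a common source of silently bad training, since the softmax would then be applied twice.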
Here is my question/problem:
I tried submitting some predictions made by the neural network to Kaggle, and so far all my attempts have scored 0% accuracy. However, when I replaced the predictions for the first 10 passengers with the predictions made by RandomForestClassifier() from scikit-learn, the accuracy jumped to 50%.
My guess is that the network's poor performance is caused by inadequate training data. So I was thinking about adding noise to the input data, but I don't really have an idea how.
My 8 features of the training data are: ['Pclass', 'Sex', 'Age', 'Fare', 'Child', 'Fam_size', 'Title', 'Mother']. Some are categorical and some are continuous.
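Mixing categorical and continuous features usually calls for one-hot encoding the categorical ones before feeding them to a network. A small sketch using `pandas.get_dummies`, reusing some of the column names from the post (the values are made up for illustration):

```python
import pandas as pd

# Toy frame with a few of the post's column names; values are invented
df = pd.DataFrame({
    "Pclass": [1, 3, 2],
    "Sex": ["male", "female", "female"],
    "Age": [22.0, 38.0, 26.0],
    "Fare": [7.25, 71.28, 7.92],
})

# One-hot encode the categorical columns; leave the continuous ones as-is
encoded = pd.get_dummies(df, columns=["Pclass", "Sex"])
```

Without this, an integer-coded category like `Pclass` would be treated as an ordered magnitude by the network.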
Any ideas/links are much appreciated! Thanks a lot in advance.
EDIT:
I found what was wrong with my submissions. For some reason my predictions were all floats instead of ints. So I just did this:
result_df = result_df.astype(int)
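Note that `astype` returns a new DataFrame rather than modifying in place, so the result has to be reassigned (or written out directly). A small sketch, assuming a `Survived` column of float predictions (the frame below is a made-up example, not the post's actual data):

```python
import pandas as pd

result_df = pd.DataFrame({
    "PassengerId": [892, 893, 894],
    "Survived": [0.0, 1.0, 0.0],  # float predictions from the network
})

# astype returns a copy; reassign it so the int conversion is kept
result_df = result_df.astype({"Survived": int})
```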
Thank you everyone for pointing out that my submission format is wrong.