
I've got a simple program that is supposed to create a logistic regression training model for some data.

There is one output class y (0 = false, 1 = true) and there are 25 features. I'm struggling to define the shapes of my variables and placeholders correctly.
Here's the code:

#!/usr/bin/env python3

import tensorflow as tf
import numpy as np
import pandas as pd
from sklearn import preprocessing
from sklearn import model_selection
import matplotlib.pyplot as plt
import seaborn as sns
import sys

sns.set(style='white')
sns.set(style='whitegrid',color_codes=True)



bank_data = pd.read_csv('data/bank.csv',header=0,delimiter = ';')
bank_data = bank_data.dropna()

bank_data.drop(bank_data.columns[[0,3,8,9,10,11,12,13]],axis=1,inplace=True)
data_set = pd.get_dummies(bank_data,columns = ['job','marital','default','housing','loan','poutcome'])
data_set.drop(data_set.columns[[14,27]],axis=1,inplace=True)
data_set_y = data_set['y']
data_set_y.replace(('yes','no'),(1.0,0.0),inplace=True)
data_set_X = data_set.drop(['y'],axis=1)
num_samples = data_set.shape[0]
num_features = data_set_X.shape[1]
print ('num_features = ', num_features)


X = tf.placeholder('float',[None,num_features])
y = tf.placeholder('float',[None,1])

W = tf.Variable(tf.zeros([num_features,1]),dtype=tf.float32)
b = tf.Variable(tf.zeros([1]),dtype=tf.float32)

train_X,test_X,train_y,test_y = model_selection.train_test_split(data_set_X,data_set_y,random_state=0)

print (train_y.head())
print (train_X.head())

prediction = tf.add(tf.matmul(X,W),b)
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction,labels=y))
num_epochs = 1000


with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(num_epochs):
        _,l = sess.run([optimizer,cost],feed_dict = {X: train_X, y: train_y})
        if epoch % 50 == 0:
            print ('loss = %f' % (l))

The current error I'm getting is: ValueError: Cannot feed value of shape (3390,) for Tensor 'Placeholder_1:0', which has shape '(?, 1)'

train_y is a pandas Series that simply contains either a 0 or a 1. Do I need to reshape train_y into two-element one-hot vectors and change the dimensions of the y placeholder accordingly?

Here is the head output for the y training data:

4384    0.0
2560    0.0
1470    0.0
1771    0.0
2604    0.0

Having to deal with shaping my tensors is becoming a serious nightmare. Any help appreciated.

1
This is not the first time I have seen this strange thing. Your prediction is a single scalar for every element in the batch. What do you think the result of applying softmax to a scalar is? - Vladimir Bystricky
There are so many logistic regression examples using TF. For example, the Stanford TF class is open access and its code is on GitHub. Logistic regression examples are also included in the TF repo. Please read through the numerous examples first. - brown.2179
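To make the first comment concrete, here is a minimal NumPy sketch (not the asker's TensorFlow code) of what happens when softmax is applied to a single logit per sample. The `softmax` and `sigmoid` helpers and the sample `logits` and `labels` values are made up for illustration. With only one logit, softmax always returns 1, so the cross-entropy is 0 regardless of the weights; a sigmoid-based loss does not have that problem.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def sigmoid(logits):
    return 1.0 / (1.0 + np.exp(-logits))

# Hypothetical logits of shape (batch, 1), the same shape as
# tf.matmul(X, W) + b in the question; the values are invented.
logits = np.array([[-2.0], [0.0], [3.0]])
labels = np.array([[0.0], [1.0], [1.0]])

# Softmax over a single entry normalizes it against itself: always 1.
softmax_probs = softmax(logits)
softmax_loss = -(labels * np.log(softmax_probs)).sum(axis=-1)

# Binary cross-entropy on sigmoid probabilities depends on the logits.
sigmoid_probs = sigmoid(logits)
sigmoid_loss = -(labels * np.log(sigmoid_probs)
                 + (1.0 - labels) * np.log(1.0 - sigmoid_probs)).sum(axis=-1)

print(softmax_probs.ravel())
print(softmax_loss)
print(sigmoid_loss)
```

In TensorFlow 1.x, the usual counterpart for a single-logit binary classifier is `tf.nn.sigmoid_cross_entropy_with_logits`, which takes the same `logits` and `labels` keyword arguments; whether that is the right change for this particular model is left to the asker.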

1 Answer


You should convert train_y from a 1-dimensional array to a 2-dimensional one, so that its shape (n, 1) matches the (?, 1) shape of the y placeholder. For example, add the line:

....
train_X,test_X,train_y,test_y = model_selection.train_test_split(data_set_X,data_set_y,random_state=0)
# Take the underlying NumPy array of the Series and turn shape (n,) into (n, 1)
train_y = np.reshape(train_y.values, (-1,1))
....
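As a quick check of what the reshape does, here is a small standalone sketch. The `to_column` helper and the sample label values are hypothetical; `np.asarray` is used so the same call would accept either a NumPy array or a pandas Series.

```python
import numpy as np

def to_column(labels):
    # Hypothetical helper: convert 1-D labels of shape (n,) into a
    # column of shape (n, 1), as a [None, 1] placeholder expects.
    return np.asarray(labels, dtype=np.float32).reshape(-1, 1)

# Invented stand-in for train_y: five binary labels.
flat = np.array([0.0, 0.0, 1.0, 0.0, 1.0])
column = to_column(flat)

print(flat.shape)    # 1-D shape, which the (?, 1) placeholder rejects
print(column.shape)  # 2-D shape with a single column
```

The `-1` tells NumPy to infer the number of rows from the data, so the same line works for any batch size.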