0
votes

I've written some code based on this TensorFlow example. The issue I'm having is that the accuracy I get doesn't make any sense (it's either 1 or 0), so my question is: what am I missing here?

import tensorflow as tf
import numpy as np
import csv
import os


# defining the batch function

def batch(iterable, n=1):
    l = len(iterable)
    for ndx in range(0, l, n):
        yield iterable[ndx:min(ndx + n, l)]
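# usage sketch: batch() yields consecutive slices of at most n items, e.g.
# list(batch([1, 2, 3, 4, 5], 2)) == [[1, 2], [3, 4], [5]]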


os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
Training_File = 'Training.csv'
Test_File     = 'Test.csv'
numberOFClasses = 19
batchSize = 19

# read training data
filePointer = open(Training_File, 'r', newline='')
reader = csv.reader(filePointer)
Training_Data = []
Training_Labels = []
row = next(reader)  # skip the header row
#### Getting Training_Data and labels
for row in reader:
    Training_Data.append(row[:-2])
    Training_Labels.append(row[-1])
# close the training file, then get data and labels from the test file
filePointer.close()

filePointer = open(Test_File, 'r', newline='')
reader = csv.reader(filePointer)
Test_Data = []
Test_Labels = []
row = next(reader)  # skip the header row

for row in reader:
    Test_Data.append(row[:-2])
    Test_Labels.append(row[-1])
filePointer.close()



x = tf.placeholder('float', [None, len(row[:-2])])
w = tf.Variable(tf.zeros([len(row[:-2]), numberOFClasses]))
b = tf.Variable(tf.zeros([numberOFClasses]))
model = tf.add(tf.matmul(x, w), b)
y_ = tf.placeholder(tf.float32, [None, numberOFClasses])
y = tf.nn.softmax(model)

cross_entropy = -tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1])
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
sess = tf.Session()
sess.run(tf.global_variables_initializer())
batch_xs = []
batch_ys = []
batch_txs = []
batch_tys = []
# Training processing


for i in batch(Training_Data, batchSize):
    batch_xs.append(i)
for i in batch(Training_Labels, batchSize):
    batch_ys.append(i)

for i in batch(Test_Data, batchSize):
    batch_txs.append(i)
for i in batch(Test_Labels, batchSize):
    batch_tys.append(i)


for i in range(len(batch_xs) - 1):
    sess.run(train_step, feed_dict={x: batch_xs[i], y_: np.reshape(batch_ys[i], (1, batchSize))})



correct_prediction = tf.equal(tf.arg_max(y, 1), tf.arg_max(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
for i in range(len(batch_txs) - 1):
    print(sess.run(accuracy, feed_dict={x: batch_txs[i], y_: np.reshape(batch_tys[i], (1, batchSize))}))

UPDATE: I've changed the size of the batches:

.............................................
numberOFClasses = 19

batchSize = 19 * 3
....................................
for i in range(int(len(batch_xs) / batchSize)):
    print(sess.run(train_step, feed_dict={x: batch_xs[i], y_: np.reshape(batch_ys[i], (batchSize, numberOFClasses))}))



correct_prediction = tf.equal(tf.arg_max(y, 1), tf.arg_max(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
for i in range(len(batch_txs) - 1):
    print(sess.run(accuracy, feed_dict={x: batch_txs[i], y_: np.reshape(batch_tys[i], (1, batchSize))}))

The result is still the same, so I just don't get what I'm missing here.

2nd Update

Running this part of the code:

for j in range(len(batch_xs) - 1):
    print(sess.run(train_step, feed_dict={x: batch_xs[j], y_: np.reshape(batch_ys[j], (numberOFClasses, 3))}))

This delivers a huge error message, but I guess this part is relevant:

InvalidArgumentError (see above for traceback): Incompatible shapes: [19,3] vs. [57,19]
 [[Node: mul = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_Placeholder_1_0, Log)]]

So since my batch size is three times the number of classes, I should get 57 predictions -> y_.

Shaping the feed for y_ as [57, 1]:

for j in range(len(batch_xs) - 1):
    print(sess.run(train_step, feed_dict={x: batch_xs[j], y_: np.reshape(batch_ys[j], (batchSize, 1))}))

The print delivers None as the return value, but no error, which is (I guess) OK.

But running the accuracy part delivers 1 and 0 as mentioned in the beginning.

The test and training data and labels are 100% correct!

Here is part of the end of the CSV file:

[screenshot of the CSV file]

2
I'm not sure, but I feel there's something strange with y_:np.reshape(batch_tys[i],(1,batchSize)): shouldn't y_ be of shape (batch_size, numOfClasses) instead? Since batch_size == numOfClasses incidentally, that may not raise any exception, but it may just not do what you think at all, like comparing all y's to the same y_ (not a different one per sample)... If your network is not trained enough and always outputs the same class, for instance (big biases), that would explain the 0 or 1 accuracy. - gdelab
@gdelab thanks a lot for your comment. Here is how I understand the whole thing: the training data is x, the predictions from training are output in y, and I'm using y_ to evaluate how good the training is by reading the labels from the file. I get your point; how may I solve this? By making the batch size bigger? - Engine
y_ just needs to be of the right shape... If y_ is given in one-hot vector form (for each sample, so a 2-D matrix), then it should be of shape (batch_size, numOfClasses) just like y, and you only need to change your reshapes in the feed_dicts accordingly (to set the shape to (batch_size, numOfClasses)). If it is given as the class index for each sample (a 1-D vector of length batch_size, containing numbers from 0 to numOfClasses - 1), then you should change the reshape (to (batch_size, 1)) and replace argmax(y_) by y_. - gdelab
And to be sure you don't miss exceptions that could be useful, I suggest you use a batch_size different from numOfClasses, for example 17 and 19. - gdelab
y_ is one-hot coded, and the reshaping is working for batchSize = numberOFClasses, and I'm not reading till the end, to avoid the exception for now. I've tried to make a bigger batch size, but I get shape errors! - Engine

2 Answers

1
votes

It can't be the source of your problem or errors, but I think all occurrences of row[:-2] should be replaced by row[:-1], if you want to take all columns but the last one (Python excludes the end index given in a slice like row[begin:end]).
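For example, with a made-up row of four features plus a label:

row = ['f1', 'f2', 'f3', 'f4', 'label']
print(row[:-1])  # ['f1', 'f2', 'f3', 'f4'] -- every column except the label
print(row[:-2])  # ['f1', 'f2', 'f3']       -- silently drops the last feature too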

You should have:

y_ = tf.placeholder(tf.float32, [None, numberOFClasses])
...
sess.run(train_step, feed_dict={x: batch_xs[i], y_: np.reshape(batch_ys[i], (batchSize, numberOFClasses))})
...
print(sess.run(accuracy, feed_dict={x: batch_txs[i], y_: np.reshape(batch_tys[i], (batchSize, numberOFClasses))}))

Anyway, you should definitely use batch_size != numberOFClasses, because then it throws an error that you can use to understand what is wrong in your code. If you don't, you lose the exception message, but the error is still there, hidden (your network still does not learn what you want). When you get the error, look at which reshape causes the problem, and try to understand why (look at what the shapes are and what they should be).
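If your labels are stored as class indices, a minimal sketch of building proper one-hot targets up front (assuming the label column holds integers from 0 to numberOFClasses - 1) could look like this:

import numpy as np

# sketch only: assumes each entry of Training_Labels is a class index "0".."18"
labels = np.array([int(l) for l in Training_Labels])   # shape (num_samples,)
one_hot_labels = np.eye(numberOFClasses)[labels]       # shape (num_samples, numberOFClasses)
# batches of one_hot_labels already have shape (batchSize, numberOFClasses),
# matching the y_ placeholder, so no reshape is needed in the feed_dict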

0
votes

From your code sample it's impossible to tell exactly (at least batch() and batchSize would be needed to be sure), but my guess is that you have batches of size one (whether intended or not), so you get either an accuracy of one (if the sample was predicted correctly) or zero (if the sample was misclassified). For meaningful accuracies, you want to evaluate over larger batches.
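To illustrate with a minimal sketch (made-up predictions, not your data): averaging correctness over one sample can only yield 0.0 or 1.0, while a larger batch yields a meaningful fraction:

import numpy as np

correct = np.array([True])                     # effective batch of one sample
print(correct.astype(np.float32).mean())       # 1.0 (or 0.0 if misclassified)

correct = np.array([True, False, True, True])  # batch of four samples
print(correct.astype(np.float32).mean())       # 0.75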