0
votes

As you know, tf.one_hot can do one-hot encoding. However, my dataset is very large, so I need to train in batches. When I use a for loop over all the batches and call tf.one_hot in each iteration, the one-hot matrix has fewer columns than I expected.

For example, column 'a' has 47 categories, but a single batch might contain only 20 of them. When I call one_hot on that batch, it creates a matrix of dimension rows * 20 instead of rows * 47.

How can I get a one-hot matrix of dimension rows * 47 in each batch?

Thank you!

1 Answer

1
votes

tf.one_hot() takes depth as its second argument, which determines how long each one-hot vector is. If you run your operation like this:

b = tf.one_hot( a, 47 )

it should give you a last dimension of 47.
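
As a minimal sketch of the fixed-depth approach in a batch loop: the label values, the batch size of 4, and the name NUM_CATEGORIES below are made up for illustration, and the labels are assumed to already be integer codes in the range 0..46 (tf.one_hot expects integer indices, not raw category strings).

```python
import tensorflow as tf

NUM_CATEGORIES = 47  # total number of categories in column 'a', taken from the question

# Hypothetical integer-encoded labels; real ones would come from your dataset.
labels = tf.constant([0, 3, 19, 5, 46, 2, 7, 7], dtype=tf.int32)

batch_shapes = []
for batch in tf.data.Dataset.from_tensor_slices(labels).batch(4):
    # depth is fixed, so the width does not depend on which classes the batch contains
    encoded = tf.one_hot(batch, depth=NUM_CATEGORIES)
    batch_shapes.append(encoded.shape.as_list())

print(batch_shapes)  # [[4, 47], [4, 47]]
```

Every batch comes out 47 columns wide, even though neither batch contains all 47 classes.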

It's tough to say without seeing the code, but some people don't hard-code the one-hot depth and instead try to derive it from the label tensor, with something like:

max_class = tf.reduce_max( a )
b = tf.one_hot( a, max_class )

If that is the case in your code, then a batch whose labels only go up to class 20 will produce a one-hot matrix that is only about that wide.
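
To make the difference concrete, here is a small sketch comparing a depth derived from the batch with a fixed depth. The batch contents are hypothetical, and the labels are assumed to be zero-based integer codes, so the derived depth is the largest index plus one.

```python
import tensorflow as tf

# Hypothetical batch whose highest class index is 19 (so 20 possible columns)
batch = tf.constant([0, 5, 19, 3], dtype=tf.int32)

# Depth derived from the batch itself: the width changes from batch to batch
data_dependent = tf.one_hot(batch, tf.reduce_max(batch) + 1)

# Depth fixed to the known number of categories: the width is always 47
fixed = tf.one_hot(batch, 47)

print(data_dependent.shape.as_list())  # [4, 20]
print(fixed.shape.as_list())           # [4, 47]
```

If the total number of categories is known up front, passing it as a constant is the simpler choice.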

Otherwise, I'd need to see your code to say more.

If TensorFlow runs out of memory, it will stop with an error; it won't just silently drop half of your data. :)