
The code:

import numpy as np
import tensorflow as tf
import pandas as pd
from sklearn.model_selection import train_test_split

x_data = np.linspace(0, 1000000, 1000)
y_true = np.sin(x_data)
y_true += np.random.randn(len(x_data))




feature_columns = [tf.feature_column.numeric_column('x', shape=[1])]
estimator = tf.estimator.DNNRegressor(
    feature_columns=feature_columns,
    hidden_units=[10, 10, 10],
    optimizer=lambda: tf.train.AdamOptimizer(learning_rate=0.1))

X_train, X_test, y_train, y_test = train_test_split(x_data, y_true, test_size=0.3)

input_function = tf.estimator.inputs.numpy_input_fn(
    {'x': X_train}, y_train, batch_size=8, num_epochs=None, shuffle=True)

train_input_function = tf.estimator.inputs.numpy_input_fn(
    {'x': X_train}, y_train, batch_size=8, num_epochs=1000, shuffle=False)
test_input_function = tf.estimator.inputs.numpy_input_fn(
    {'x': X_test}, y_test, batch_size=8, num_epochs=1000, shuffle=False)


estimator.train(input_fn=input_function, steps=1000)

train_metrics = estimator.evaluate(input_fn=train_input_function, steps=1000)
test_metrics = estimator.evaluate(input_fn=test_input_function, steps=1000)


print('TRAINING DATA METRICS')
print(train_metrics)
print()

print('TEST DATA METRICS')
print(test_metrics)
print()

This works well. But if I change the line y_true = np.sin(x_data) to y_true = tf.square(x_data), I get an error:

Traceback (most recent call last):
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 1576, in _create_c_op
    c_op = c_api.TF_FinishOperation(op_desc)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape must be rank 1 but is rank 2 for 'strided_slice' (op: 'StridedSlice') with input shapes: [1000], [1,700], [1,700], [1].

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/Users/Admin/Documents/PycharmProjects/TF_API_2/api.py", line 21, in <module>
    X_train, X_test, y_train, y_test = train_test_split(x_data, y_true, test_size=0.3)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\model_selection\_split.py", line 2059, in train_test_split
    safe_indexing(a, test)) for a in arrays))
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\model_selection\_split.py", line 2059, in <genexpr>
    safe_indexing(a, test)) for a in arrays))
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\utils\__init__.py", line 162, in safe_indexing
    return X[indices]
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\ops\array_ops.py", line 524, in _slice_helper
    name=name)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\ops\array_ops.py", line 690, in strided_slice
    shrink_axis_mask=shrink_axis_mask)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\ops\gen_array_ops.py", line 10187, in strided_slice
    name=name)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\util\deprecation.py", line 454, in new_func
    return func(*args, **kwargs)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 3155, in create_op
    op_def=op_def)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 1731, in __init__
    control_input_ops)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 1579, in _create_c_op
    raise ValueError(str(e))
ValueError: Shape must be rank 1 but is rank 2 for 'strided_slice' (op: 'StridedSlice') with input shapes: [1000], [1,700], [1,700], [1].

If I use **2 instead of tf.square, the code fails as well, this time during training:

ERROR:tensorflow:Model diverged with loss = NaN.
Traceback (most recent call last):
  File "C:/Users/Admin/Documents/PycharmProjects/TF_API_2/api.py", line 35, in <module>
    estimator.train(input_fn=input_function, steps=1000)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\estimator\estimator.py", line 376, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\estimator\estimator.py", line 1145, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\estimator\estimator.py", line 1173, in _train_model_default
    saving_listeners)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\estimator\estimator.py", line 1451, in _train_with_estimator_spec
    _, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\training\monitored_session.py", line 583, in run
    run_metadata=run_metadata)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\training\monitored_session.py", line 1059, in run
    run_metadata=run_metadata)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\training\monitored_session.py", line 1150, in run
    raise six.reraise(*original_exc_info)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python36\lib\site-packages\six.py", line 693, in reraise
    raise value
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\training\monitored_session.py", line 1135, in run
    return self._sess.run(*args, **kwargs)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\training\monitored_session.py", line 1215, in run
    run_metadata=run_metadata))
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\training\basic_session_run_hooks.py", line 635, in after_run
    raise NanLossDuringTrainingError
tensorflow.python.training.basic_session_run_hooks.NanLossDuringTrainingError: NaN loss during training.

What is the problem with this one line [y_true = tf.square(x_data)]?

1
Don't mix numpy and tensorflow operations. I'd suggest y_true = np.square(x_data) instead. – cs95
OK, but I still get the same "tensorflow.python.training.basic_session_run_hooks.NanLossDuringTrainingError: NaN loss during training." error. – mikinoqwert
Looks like you ran into numeric overflow. Your numbers are just too big. If you are trying to teach the network how to square numbers, you may want to try it on smaller numbers. – cs95
Oh, right. I changed x_data to x_data = np.linspace(0, 1000, 1000) and the problem no longer occurs. Please add your last comment as an answer so I can mark it as the solution. – mikinoqwert
@coldspeed Please tell me why the program outputs such a large loss: TRAINING DATA METRICS {'average_loss': 13975338000.0, 'label/mean': 349618.28, 'loss': 111802700000.0, 'prediction/mean': 359011.06, 'global_step': 1000} TEST DATA METRICS {'average_loss': 12280204000.0, 'label/mean': 293979.97, 'loss': 98241634000.0, 'prediction/mean': 325393.22, 'global_step': 1000} – mikinoqwert

1 Answer


There are two distinct issues here:

#1, don't mix numpy and tensorflow operations. Unless you're running in eager execution mode, they almost never work together: tf.square returns a symbolic Tensor rather than an array, so train_test_split cannot index it with an array of row indices.
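As a rough sketch of the first point, here is the data generation done with NumPy only. The smaller range for x_data is the one from the comments, and the index array stands in for what train_test_split does internally (the traceback shows it ending up in X[indices]); the 700 is simply 70% of the 1000 samples.

```python
import numpy as np

x_data = np.linspace(0, 1000, 1000)

# np.square keeps the result a plain NumPy array, so later NumPy/sklearn
# code can index it freely.
y_true = np.square(x_data)
y_true = y_true + np.random.randn(len(x_data))

# Indexing with an integer index array (illustrative stand-in for the
# indexing train_test_split performs on each input array).
indices = np.random.permutation(len(x_data))[:700]
y_subset = y_true[indices]

print(type(y_true).__name__, y_subset.shape)
```

With tf.square in graph mode, the same indexing step is where the StridedSlice error is raised.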

#2, when your network produces NaNs after a few iterations, that's usually a sign you're running into numeric overflow/underflow. In this case, the culprit is x_data, whose values are too large. Either normalize it to the range 0-1 or reduce the range of the generated data (np.random.randint is one option).
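A minimal sketch of the normalization option, assuming plain min-max scaling done in NumPy before the data reaches the estimator. The helper min_max_scale is a hypothetical name used only for illustration, and scaling the labels as well as the inputs is an assumption on my part rather than something stated above.

```python
import numpy as np

x_data = np.linspace(0, 1000000, 1000)
y_true = np.square(x_data) + np.random.randn(len(x_data))


def min_max_scale(values):
    """Hypothetical helper: linearly rescale an array to the range [0, 1]."""
    lo, hi = values.min(), values.max()
    return (values - lo) / (hi - lo)


x_scaled = min_max_scale(x_data)
y_scaled = min_max_scale(y_true)  # assumption: labels are rescaled too

# The raw labels are on the order of 1e12, while the scaled arrays
# stay within [0, 1].
print(y_true.max())
print(x_scaled.min(), x_scaled.max())
print(y_scaled.min(), y_scaled.max())
```

The scaled arrays can then be passed to train_test_split and numpy_input_fn in place of the raw ones. In a real setup the minimum and maximum would normally be computed on the training split only and reused for the test split, and predictions would need to be mapped back to the original scale.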