
I am using TensorFlow 2.0 and Python 3.7 for CIFAR-10 classification.

Dimensions of training and testing sets are:

X_train.shape = (50000, 32, 32, 3), y_train.shape = (50000, 10)

X_test.shape = (10000, 32, 32, 3), y_test.shape = (10000, 10)

But when I execute the following code:

# Create training and testing datasets-
train_dataset = tf.data.Dataset.from_tensor_slices((X_train, y_train))
test_dataset = tf.data.Dataset.from_tensor_slices((X_test, y_test))

It gives me the error:

---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-...> in <module>
      1 # Create training and testing datasets-
----> 2 train_dataset = tf.data.Dataset.from_tensor_slices((X_train, y_train))
      3 test_dataset = tf.data.Dataset.from_tensor_slices((X_test, y_test))

~/.local/lib/python3.7/site-packages/tensorflow_core/python/data/ops/dataset_ops.py in from_tensor_slices(tensors)
    433       Dataset: A Dataset.
    434     """
--> 435     return TensorSliceDataset(tensors)
    436
    437   class _GeneratorState(object):

~/.local/lib/python3.7/site-packages/tensorflow_core/python/data/ops/dataset_ops.py in __init__(self, element)
   2352   def __init__(self, element):
   2353     """See Dataset.from_tensor_slices() for details."""
-> 2354     element = structure.normalize_element(element)
   2355     batched_spec = structure.type_spec_from_value(element)
   2356     self._tensors = structure.to_batched_tensor_list(batched_spec, element)

~/.local/lib/python3.7/site-packages/tensorflow_core/python/data/util/structure.py in normalize_element(element)
    109       else:
    110         normalized_components.append(
--> 111             ops.convert_to_tensor(t, name="component_%d" % i))
    112   return nest.pack_sequence_as(element, normalized_components)
    113

~/.local/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py in convert_to_tensor(value, dtype, name, preferred_dtype, dtype_hint)
   1182   preferred_dtype = deprecation.deprecated_argument_lookup(
   1183       "dtype_hint", dtype_hint, "preferred_dtype", preferred_dtype)
-> 1184   return convert_to_tensor_v2(value, dtype, preferred_dtype, name)
   1185
   1186

~/.local/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py in convert_to_tensor_v2(value, dtype, dtype_hint, name)
   1240       name=name,
   1241       preferred_dtype=dtype_hint,
-> 1242       as_ref=False)
   1243
   1244

~/.local/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py in internal_convert_to_tensor(value, dtype, name, as_ref, preferred_dtype, ctx, accept_composite_tensors)
   1294
   1295     if ret is None:
-> 1296       ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
   1297
   1298     if ret is NotImplemented:

~/.local/lib/python3.7/site-packages/tensorflow_core/python/framework/tensor_conversion_registry.py in _default_conversion_function(failed resolving arguments)
     50 def _default_conversion_function(value, dtype, name, as_ref):
     51   del as_ref  # Unused.
---> 52   return constant_op.constant(value, dtype, name=name)
     53
     54

~/.local/lib/python3.7/site-packages/tensorflow_core/python/framework/constant_op.py in constant(value, dtype, shape, name)
    225   """
    226   return _constant_impl(value, dtype, shape, name, verify_shape=False,
--> 227                         allow_broadcast=True)
    228
    229

~/.local/lib/python3.7/site-packages/tensorflow_core/python/framework/constant_op.py in _constant_impl(value, dtype, shape, name, verify_shape, allow_broadcast)
    233   ctx = context.context()
    234   if ctx.executing_eagerly():
--> 235     t = convert_to_eager_tensor(value, ctx, dtype)
    236     if shape is None:
    237       return t

~/.local/lib/python3.7/site-packages/tensorflow_core/python/framework/constant_op.py in convert_to_eager_tensor(value, ctx, dtype)
     93     except AttributeError:
     94       dtype = dtypes.as_dtype(dtype).as_datatype_enum
---> 95   ctx.ensure_initialized()
     96   return ops.EagerTensor(value, ctx.device_name, dtype)
     97

~/.local/lib/python3.7/site-packages/tensorflow_core/python/eager/context.py in ensure_initialized(self)
    490         if self._default_is_async == ASYNC:
    491           pywrap_tensorflow.TFE_ContextOptionsSetAsync(opts, True)
--> 492         self._context_handle = pywrap_tensorflow.TFE_NewContext(opts)
    493       finally:
    494         pywrap_tensorflow.TFE_DeleteContextOptions(opts)

InvalidArgumentError: device CUDA:0 not supported by XLA service while setting up XLA_GPU_JIT device number 0

I also tried the following code to fix the error:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1" 

When I then list the available GPUs, I get an empty list:

gpus = tf.config.experimental.list_physical_devices('GPU')

gpus
[]

How do I fix this?

Thanks


1 Answer


I ran into the same issue. The cause might be that the GPU your process selected doesn't have enough free GPU memory to run it.

My solution is:

  1. Prefix the launch command in your training/testing script (.sh file) with CUDA_VISIBLE_DEVICES=<id>. Example: CUDA_VISIBLE_DEVICES=0 python3 train_cifar.py
  2. Add this line to your training .py file: os.environ['CUDA_VISIBLE_DEVICES'] = '0'

Tip: the GPU IDs in steps 1 and 2 must match.

(When I first hit this error, I already had os.environ['CUDA_VISIBLE_DEVICES'] = '0' in my code, but the process still seemed to select device cuda:1 for training, which was weird. Adding step 1 solved my problem; I hope it solves yours too.)
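For what it's worth, here is a minimal sketch of how the in-script part could be written so it cannot disagree with the value exported by the shell. The helper names select_gpu and visible_gpus are hypothetical (they are not part of TensorFlow), and the sketch assumes the variable has to be in place before TensorFlow initializes its GPU devices; setting it before the import is the simplest way to guarantee that.

```python
import os


def select_gpu(default_id: str = "0") -> str:
    """Pin this process to one GPU and return the ID actually in effect.

    setdefault keeps a value already exported by the shell (step 1), so the
    in-script value (step 2) can never disagree with it.
    """
    return os.environ.setdefault("CUDA_VISIBLE_DEVICES", default_id)


def visible_gpus():
    """List the GPUs TensorFlow can see (hypothetical helper)."""
    # Imported here on purpose, after the environment variable is set.
    import tensorflow as tf

    return tf.config.experimental.list_physical_devices("GPU")
```

You would call select_gpu("0") at the very top of the training script and then check visible_gpus() before building any datasets. Note that an ID that does not exist on the machine hides every GPU, which would be consistent with an empty device list after setting the variable to "1" on a single-GPU machine.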