Using tf.data.experimental.CsvDataset in Keras Sequential Model

Question

I'm trying to train a Keras model with my structured input data stored in csv files. I' reading files as

import tensorflow as tf
from tensorflow.keras import layers
from tensorflow import feature_column

import pathlib

csvs =  sorted(str(p) for p in pathlib.Path('.').glob("My_Dataset/*/*/*.csv"))

data_set=tf.data.experimental.CsvDataset(
    csvs, record_defaults=defaults, compression_type=None, buffer_size=None,
    header=True, field_delim=',', use_quote_delim=True, na_value=""
)
print(type(data_set))

#Output: <class 'tensorflow.python.data.experimental.ops.readers.CsvDatasetV2'>

data_set.take(1)

#Output: <TakeDataset shapes: ((), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), ()), types: (tf.float32, tf.float32, tf.float32, tf.float32, tf.float32, tf.float32, tf.float32, tf.float32, tf.float32, tf.float32, tf.float32, tf.float32, tf.float32, tf.float32, tf.float32, tf.float32, tf.float32, tf.float32, tf.float32, tf.float32, tf.float32, tf.float32, tf.float32, tf.float32, tf.float32, tf.float32, tf.float32, tf.float32, tf.float32, tf.float32, tf.float32, tf.float32, tf.float32, tf.float32, tf.float32, tf.float32, tf.float32, tf.float32, tf.float32, tf.float32, tf.float32, tf.float32, tf.float32, tf.float32, tf.float32, tf.float32, tf.float32, tf.float32, tf.float32, tf.float32)>

validate_ds = data_set.batch(1000).take(20).repeat()
train_ds = data_set.batch(1000).skip(20).take(80).repeat()

model = tf.keras.Sequential([
    layers.Dense(49,activation='elu'),  
    layers.Dense(49,activation='elu'),  
    layers.Dense(49,activation='elu'),  
    layers.Dense(1,activation='sigmoid') 
])


model.compile(optimizer='adam',
            loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
            metrics=['accuracy'])    #have to find the related evaluation metrics


model.fit(train_ds,
        validation_data=validate_ds,
        validation_steps=5,
        steps_per_epoch= 5,
        epochs=20,
        verbose=1
        )

But when I compile the model, I get this error:

ValueError: in user code: ValueError: Data is expected to be in format x, (x,), (x, y), or (x, y, sample_weight), found: (<tf.Tensor 'ExpandDims:0' shape=(None, 1) dtype=float32>, <tf.Tensor 'ExpandDims_1:0' shape=(None, 1) dtype=float32>, <tf.Tensor 'ExpandDims_2:0' shape=(None, 1) dtype=float32>, ..... <tf.Tensor 'ExpandDims_49:0' shape=(None, 1) dtype=float32>)

I'm just stuck... Please help!

Edit

As per the answer by Nikaido, I managed to fix the syntactical errors, but now, I'm getting zero accuracy on model training. Which is very unlikely. At least I know there is no problem with my dataset in the csv files. I have checked on the same model using Dataframe. But the issue is I have a large dataset, and now I have to configure my input pipelines to load the dataset from the disk.

Nikaido Nikaido · Accepted Answer · 2020-11-14T20:04:35

You didn't specify the target in your csv. The model fit expects couples (x, y)

if your csv has the target as last values, you can create a function like this one:

def preprocess(*fields):
    return tf.stack(fields[:-1]), tf.stack(fields[-1:]) # x, y

to split the dataset in data and labels

validate_ds = dataset.map(preprocess).take(1000).batch(32).repeat()
train_ds = dataset.map(preprocess).skip(1000).take(1000).batch(32).repeat()

Regarding the results, I suppose that the problem is that you are trying to make a classification on a target value that is a float

(All you values in the CSV are floats)

What kind of dataset do you have and what you need to do?

Anyway for this specific update I suggest you to open a new question, because it is a totally different problem

Using tf.data.experimental.CsvDataset in Keras Sequential Model

1 Answers