
I am new to TensorFlow. I am trying to build a model with an Estimator and serve it on Google ML Engine. However, after trying a few approaches, I am still not sure how to save the model for serving.

I have successfully trained the model with acceptable accuracy. When I tried to save the model for serving, I searched around and found a few ways to do so, but I still ran into a number of problems.

I tried three ways of exporting, based on suggestions made in answers to other questions:

1) A serialized example as input: I ran into the error "TypeError: Object of type bytes is not JSON serializable". I also couldn't find a good way to feed serialized examples to the served model. As I am using ML Engine for serving, JSON input seemed easier.

2) JSON as input with "basic" pre-processing: I was able to export the model successfully. After deploying it to ML Engine, I made a few predictions. A prediction result was returned, but no matter how I changed the JSON inputs, the result was always the same. Judging by the validation results from training, the model should return a variety of results. I suspected that something was wrong with the pre-processing in the serving function, so I tried the third way...

3) JSON input with the "same" pre-processing: I couldn't quite get my head around this, but I think the serving function might need to apply exactly the same pre-processing as my training input pipeline. However, since the serving input function works with tf.placeholder tensors, I have no idea how to replicate that pre-processing so the exported model works...
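For the constant predictions in the second attempt, one thing that may be worth checking (an unconfirmed guess, not a diagnosis) is whether the dtypes agree end to end. In the training code below, `col_def` writes featureA as a float and featureB as an int into the TFRecords, while the numeric columns and the Export Trial 2 placeholders declare featureA as `tf.int64` and featureB as `tf.float32`. A small stdlib sketch that compares the two tables; the dictionaries are transcribed by hand, and `TYPE_TO_DTYPE` and `dtype_mismatches` are hypothetical names:

```python
# Types written to the TFRecords, transcribed from col_def in the training code.
TRAINING_TYPES = {"featureA": "float", "featureB": "int", "featureC": "bytes"}

# dtypes declared by the Export Trial 2 placeholders (and the numeric columns).
SERVING_DTYPES = {"featureA": "int64", "featureB": "float32", "featureC": "string"}

# Assumed correspondence between col_def type names and TensorFlow dtype names.
TYPE_TO_DTYPE = {"float": "float32", "int": "int64", "bytes": "string"}


def dtype_mismatches(training_types, serving_dtypes):
    """Return {feature: (dtype written for training, dtype declared for serving)}
    for every feature where the two disagree."""
    mismatches = {}
    for name, type_name in training_types.items():
        expected = TYPE_TO_DTYPE[type_name]
        actual = serving_dtypes.get(name)
        if actual != expected:
            mismatches[name] = (expected, actual)
    return mismatches


print(dtype_mismatches(TRAINING_TYPES, SERVING_DTYPES))
```

If the types really are swapped, integer training values for featureB would only ever fill bucket 0 (value 0) or bucket 3 (values of 1 or more) of the `[0.25, 0.5, 0.75]` boundaries, so a served value such as 0.3 would land in a bucket the trees never saw populated.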

(Please pardon my bad coding style...)


Training code:

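import json
import random

import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder, tf.python_io, ...)

col_names = ['featureA','featureB','featureC']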
target_name = 'langIntel'

col_def = {}
col_def['featureA'] = {'type':'float','tfType':tf.float32,'len':'fixed'}
col_def['featureB'] = {'type':'int','tfType':tf.int64,'len':'fixed'}
col_def['featureC'] = {'type':'bytes','tfType':tf.string,'len':'var'}


def _float_feature(value):
    if not isinstance(value, list): value = [value]
    return tf.train.Feature(float_list=tf.train.FloatList(value=value))

def _int_feature(value):
    if not isinstance(value, list): value = [value]
    return tf.train.Feature(int64_list=tf.train.Int64List(value=value))

def _bytes_feature(value):
    if not isinstance(value, list): value = [value]
    return tf.train.Feature(
        bytes_list=tf.train.BytesList(
            value=[p.encode('utf-8') for p in value]
        )
    )

functDict = {'float':_float_feature,
    'int':_int_feature,'bytes':_bytes_feature
}

training_targets = []
# Omitted validation partition


with open('[JSON FILE PATH]') as jfile:
    json_data_input = json.load(jfile)

random.shuffle(json_data_input)


with tf.python_io.TFRecordWriter('savefile1.tfrecord') as writer:
    for item in json_data_input:
        if item[target_name] > 0:
            feature = {}

            for col in col_names:
                feature[col] = functDict[col_def[col]['type']](item[col])

            # Store the label in the Example as well, so _parse_function
            # can return it alongside the features.
            feature[target_name] = _float_feature(float(item[target_name]))
            training_targets.append(item[target_name])

            example = tf.train.Example(
                features=tf.train.Features(feature=feature)
            )
            writer.write(example.SerializeToString())


def _parse_function(example_proto):
    example = {}

    for col in col_names:
        if col_def[col]['len'] == 'fixed':
            example[col] = tf.FixedLenFeature([], col_def[col]['tfType'])
        else:
            example[col] = tf.VarLenFeature(col_def[col]['tfType'])

    # The label needs its own entry in the parse spec; without it,
    # parsed_example.get(target_name) is always None.
    example[target_name] = tf.FixedLenFeature([], tf.float32)

    parsed_example = tf.parse_single_example(example_proto, example)

    features = {}

    for col in col_names:
        features[col] = parsed_example[col]

    labels = parsed_example[target_name]

    return features, labels


def my_input_fn(batch_size=1,num_epochs=None):
    dataset = tf.data.TFRecordDataset('savefile1.tfrecord')

    dataset = dataset.map(_parse_function)
    dataset = dataset.shuffle(10000)
    dataset = dataset.repeat(num_epochs)
    dataset = dataset.batch(batch_size)
    iterator = dataset.make_one_shot_iterator()
    features, labels = iterator.get_next()

    return features, labels

allColumns = None

def train_model(
    learning_rate,
    n_trees,
    n_batchespl,
    batch_size):

    periods = 10

    vocab_list = ('vocab1', 'vocab2', 'vocab3')

    featureA_bucket = tf.feature_column.bucketized_column(
        tf.feature_column.numeric_column(
            key="featureA",dtype=tf.int64
            ), [5,10,15]
    )
    featureB_bucket = tf.feature_column.bucketized_column(
        tf.feature_column.numeric_column(
            key="featureB",dtype=tf.float32
        ), [0.25,0.5,0.75]
    )
    featureC_cat = tf.feature_column.indicator_column(
        tf.feature_column.categorical_column_with_vocabulary_list(
            key="featureC",vocabulary_list=vocab_list,
            num_oov_buckets=1
        )
    )


    theColumns = [featureA_bucket,featureB_bucket,featureC_cat]

    global allColumns
    allColumns = theColumns

    regressor = tf.estimator.BoostedTreesRegressor(
        feature_columns=theColumns,
        n_batches_per_layer=n_batchespl,
        n_trees=n_trees,
        learning_rate=learning_rate
    )

    training_input_fn = lambda: my_input_fn(batch_size=batch_size,num_epochs=5)
    predict_input_fn = lambda: my_input_fn(num_epochs=1)

    regressor.train(
        input_fn=training_input_fn
    )

    # omitted evaluation part

    return regressor

regressor = train_model(
    learning_rate=0.05,
    n_trees=100,
    n_batchespl=50,
    batch_size=20)
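Regarding the third attempt: the bucketing and vocabulary lookup are defined by the feature columns in `train_model`, and as I understand the Estimator export, those transformations run inside the model function and therefore end up in the exported graph. If that holds, the serving function only has to supply raw values under the same keys. A rough plain-Python illustration (not TensorFlow's implementation) of what those columns compute for the boundaries and vocabulary used above; `bucketize` and `vocab_counts` are hypothetical names, and treating `''` as missing reflects my understanding of how TF 1.x feature columns handle dense string input:

```python
import bisect

# Boundaries and vocabulary copied from train_model(); the functions below are
# a plain-Python approximation of the feature columns, not TensorFlow code.
FEATURE_A_BOUNDARIES = [5, 10, 15]
FEATURE_B_BOUNDARIES = [0.25, 0.5, 0.75]
VOCAB = ("vocab1", "vocab2", "vocab3")
OOV_ID = len(VOCAB)  # with num_oov_buckets=1, every unknown token gets this id


def bucketize(value, boundaries):
    """Bucket id like bucketized_column: boundaries [a, b, c] define buckets
    (-inf, a), [a, b), [b, c), [c, +inf) with ids 0..3."""
    return bisect.bisect_right(boundaries, value)


def vocab_counts(tokens):
    """Vector like indicator_column over the vocabulary-list column: one slot
    per vocabulary entry plus one OOV slot, counting occurrences."""
    counts = [0] * (len(VOCAB) + 1)
    for token in tokens:
        if token == "":
            continue  # assumed: empty strings in dense input count as missing
        counts[VOCAB.index(token) if token in VOCAB else OOV_ID] += 1
    return counts


print(bucketize(7, FEATURE_A_BOUNDARIES))    # falls in [5, 10)
print(bucketize(0.3, FEATURE_B_BOUNDARIES))  # falls in [0.25, 0.5)
print(vocab_counts(["vocab2", "unseen"]))
```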

Export Trial 1:

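def _serving_input_receiver_fn():
    # A batch (1-D tensor) of serialized tf.train.Example protos.
    serialized_tf_example = tf.placeholder(dtype=tf.string, shape=[None],
        name='input_example_tensor'
    )

    receiver_tensors = {'examples': serialized_tf_example}
    # Parse spec derived from the same feature columns used in training.
    feature_spec = tf.feature_column.make_parse_example_spec(allColumns)
    features = tf.parse_example(serialized_tf_example, feature_spec)
    return tf.estimator.export.ServingInputReceiver(features,
        receiver_tensors
    )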

servable_model_dir = "[OUT PATH]"
servable_model_path = regressor.export_savedmodel(servable_model_dir,
    _serving_input_receiver_fn
)
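On the "bytes is not JSON serializable" error from the first attempt: `json.dumps` cannot encode raw `bytes`. The Cloud ML Engine prediction documentation describes sending binary values as a JSON object with a single `b64` key holding base64 text, which the service decodes back into bytes; it also asks that binary input aliases end in `_bytes`, so the `examples` key may need renaming. A minimal stdlib sketch of building such a request body, assuming the alias from `receiver_tensors` above and using placeholder bytes in place of real `example.SerializeToString()` output; `make_b64_instance` and `make_request_body` are hypothetical names:

```python
import base64
import json


def make_b64_instance(serialized_example, alias="examples"):
    """Wrap serialized tf.train.Example bytes so they survive JSON encoding.

    The bytes are base64-encoded and placed under a "b64" key; `alias` is
    assumed to match the key used in receiver_tensors.
    """
    encoded = base64.b64encode(serialized_example).decode("ascii")
    return {alias: {"b64": encoded}}


def make_request_body(serialized_examples):
    """Build an online-prediction request body with one instance per example."""
    return json.dumps(
        {"instances": [make_b64_instance(s) for s in serialized_examples]}
    )


# Placeholder bytes standing in for example.SerializeToString() output.
fake_serialized = [b"\x0a\x00", b"\x0a\x02\x08\x01"]
body = make_request_body(fake_serialized)
print(body)
```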

Export Trial 2:

def serving_input_fn():
    feature_placeholders = {
        'featureA': tf.placeholder(tf.int64, [None]),
        'featureB': tf.placeholder(tf.float32, [None]),
        'featureC': tf.placeholder(tf.string, [None, None])
    }

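    # Note: this dict is not used below.
    receiver_tensors = {'inputs': feature_placeholders}

    feature_spec = tf.feature_column.make_parse_example_spec(allColumns)

    # Note: tf.parse_example expects a 1-D tf.string tensor of serialized
    # tf.train.Example protos, not a dict of raw-feature placeholders.
    features = tf.parse_example(feature_placeholders, feature_spec)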
    return tf.estimator.export.ServingInputReceiver(features, 
        feature_placeholders
    )

servable_model_dir = "[OUT PATH]"
servable_model_path = regressor.export_savedmodel(
    servable_model_dir, serving_input_fn
)
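With raw-feature placeholders like the ones in the second attempt, each JSON instance needs one key per placeholder name, with the outer batch dimension dropped: featureA and featureB become scalars and featureC a list of strings. Because featureC's placeholder has the dense shape `[None, None]`, all rows in one request presumably need the same length; the sketch pads with `''`, which, as far as I know, TF 1.x categorical columns treat as a missing value for dense string input (an assumption worth verifying). Types follow the Export Trial 2 placeholders rather than `col_def`, and `to_instance` and `pad_var_len` are hypothetical names:

```python
import json


def to_instance(record):
    """One raw record -> one JSON-serialisable prediction instance.

    Keys match the Export Trial 2 placeholder names; the batch dimension is
    dropped, so featureA/featureB are scalars and featureC is a list.
    """
    tokens = record["featureC"]
    if isinstance(tokens, str):
        tokens = [tokens]
    return {
        "featureA": int(record["featureA"]),    # placeholder dtype tf.int64
        "featureB": float(record["featureB"]),  # placeholder dtype tf.float32
        "featureC": list(tokens),               # placeholder shape [None, None]
    }


def pad_var_len(instances, key="featureC", pad=""):
    """Return copies of the instances with `key` padded to a common length,
    since a dense [None, None] placeholder cannot hold ragged rows."""
    width = max(len(inst[key]) for inst in instances)
    return [
        dict(inst, **{key: inst[key] + [pad] * (width - len(inst[key]))})
        for inst in instances
    ]


records = [
    {"featureA": 7, "featureB": 0.3, "featureC": "vocab1"},
    {"featureA": 12, "featureB": 0.9, "featureC": ["vocab2", "vocab3"]},
]
body = json.dumps({"instances": pad_var_len([to_instance(r) for r in records])})
print(body)
```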

Export Trial 3:

def serving_input_fn():
    feature_placeholders = {
        'featureA': tf.placeholder(tf.int64, [None]),
        'featureB': tf.placeholder(tf.float32, [None]),
        'featureC': tf.placeholder(tf.string, [None, None])
    }

    def toBytes(t):
        t = str(t)
        return t.encode('utf-8')

    tmpFeatures = {}

    tmpFeatures['featureA'] = tf.train.Feature(
        int64_list=feature_placeholders['featureA']
    )
    # TypeError: Parameter to MergeFrom() must be instance
    # of same class: expected tensorflow.Int64List got Tensor.
    tmpFeatures['featureB'] = tf.train.Feature(
        float_list=feature_placeholders['featureB']
    )
    tmpFeatures['featureC'] = tf.train.Feature(
        bytes_list=feature_placeholders['featureC']
    )

    tmpExample = tf.train.Example(
        features=tf.train.Features(feature=tmpFeatures)
    )
    tmpExample_proto = tmpExample.SerializeToString()

    example = {}

    for key, tensor in feature_placeholders.items():
        if col_def[key]['len'] == 'fixed':
            example[key] = tf.FixedLenFeature(
                [], col_def[key]['tfType']
            )
        else:
            example[key] = tf.VarLenFeature(
                col_def[key]['tfType']
            )

    parsed_example = tf.parse_single_example(
        tmpExample_proto, example
    )

    features = {}

    for key in tmpFeatures.keys():
        features[key] = parsed_example[key]

    return tf.estimator.export.ServingInputReceiver(
        features, feature_placeholders
    )

servable_model_dir = "[OUT PATH]"
servable_model_path = regressor.export_savedmodel(
    servable_model_dir, serving_input_fn
)

How should the serving input function be structured so that the exported model accepts JSON input for prediction? Many thanks for any insights!

Answer:

Just to provide an update: I still wasn't able to get the export working, so I rebuilt the training models using Keras and successfully exported them for serving. (In my case, rebuilding the models probably took less time than figuring out how to export an Estimator model.)