We want to deploy a trained TensorFlow model to AWS SageMaker for inference with a tensorflow-serving-container. The TensorFlow version is 2.1. Following the guide at https://github.com/aws/sagemaker-tensorflow-serving-container, we have taken the following steps:
- Built the TF 2.1 container image and published it to AWS ECR after successful local testing
- Set the SageMaker execution role permissions for S3 and ECR
- Packed the saved TF model folder (saved_model.pb, assets, variables) into model.tar.gz
- Created an endpoint with a real-time predictor:
import os
import sagemaker
from sagemaker.tensorflow.serving import Model
from sagemaker.tensorflow.model import TensorFlowModel
from sagemaker.predictor import json_deserializer, json_serializer, RealTimePredictor
from sagemaker.content_types import CONTENT_TYPE_JSON

def create_tfs_sagemaker_model():
    sagemaker_session = sagemaker.Session()
    role = 'arn:aws:iam::XXXXXXXXX:role/service-role/AmazonSageMaker-ExecutionRole-XXXXXXX'
    bucket = 'tf-serving'
    prefix = 'sagemaker/tfs-test'
    s3_path = 's3://{}/{}'.format(bucket, prefix)
    image = 'XXXXXXXX.dkr.ecr.eu-central-1.amazonaws.com/sagemaker-tensorflow-serving:2.1.0-cpu'
    model_data = sagemaker_session.upload_data('model.tar.gz', bucket, os.path.join(prefix, 'model'))
    endpoint_name = 'tf-serving-ep-test-1'
    tensorflow_serving_model = Model(model_data=model_data, role=role, sagemaker_session=sagemaker_session, image=image, framework_version='2.1')
    tensorflow_serving_model.deploy(initial_instance_count=1, instance_type='ml.c4.xlarge', endpoint_name=endpoint_name)
    rt_predictor = RealTimePredictor(endpoint=endpoint_name, sagemaker_session=sagemaker_session, serializer=json_serializer, content_type=CONTENT_TYPE_JSON, accept=CONTENT_TYPE_JSON)
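Once deployed, the predictor would be invoked with a JSON payload in the TensorFlow Serving REST "instances" format. A minimal sketch of building such a payload (the values and shape are placeholders, not our actual model input):

```python
import json

# Hypothetical request body in the TF Serving REST "instances" format;
# the inner list stands in for one input tensor of the model.
payload = {"instances": [[0.1, 0.2, 0.3]]}
body = json.dumps(payload)

# rt_predictor.predict(payload) would serialize this and POST it
# to the endpoint's /invocations route with Content-Type application/json.
print(body)
```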
- Created a batch-transform job:
def create_tfs_sagemaker_batch_transform():
    sagemaker_session = sagemaker.Session()
    print(sagemaker_session.boto_region_name)
    role = 'arn:aws:iam::XXXXXXXXXXX:role/service-role/AmazonSageMaker-ExecutionRole-XXXXXXXX'
    bucket = 'XXXXXXX-tf-serving'
    prefix = 'sagemaker/tfs-test'
    image = 'XXXXXXXXXX.dkr.ecr.eu-central-1.amazonaws.com/sagemaker-tensorflow-serving:2.1.0-cpu'
    s3_path = 's3://{}/{}'.format(bucket, prefix)
    model_data = sagemaker_session.upload_data('model.tar.gz', bucket, os.path.join(prefix, 'model'))
    tensorflow_serving_model = Model(model_data=model_data, role=role, sagemaker_session=sagemaker_session, image=image, name='deep-net-0', framework_version='2.1')
    print(tensorflow_serving_model.model_data)
    out_path = 's3://XXXXXX-serving-out/'
    input_path = "s3://XXXXXX-serving-in/"
    tensorflow_serving_transformer = tensorflow_serving_model.transformer(instance_count=1, instance_type='ml.c4.xlarge', accept='application/json', output_path=out_path)
    tensorflow_serving_transformer.transform(input_path, content_type='application/json')
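The input bucket contains JSON files for the transform job. A sketch of how one such file could be written, one request per line, matching the content_type='application/json' passed to transform() (the file name and values here are hypothetical placeholders):

```python
import json

# Hypothetical batch input: one JSON request per line; each record uses
# the same placeholder "instances" format as the real-time payload.
records = [
    {"instances": [[0.1, 0.2, 0.3]]},
    {"instances": [[0.4, 0.5, 0.6]]},
]
with open("batch-input.json", "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")
# This file would then be uploaded to the S3 input_path for the transform job.
```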
Both the endpoint deployment and the batch-transform job run, and in the AWS CloudWatch logs we see the instances starting successfully, the model loading, and TF Serving entering its event loop – see below:
2020-07-08T17:07:16.156+02:00 INFO:main:starting services
2020-07-08T17:07:16.156+02:00 INFO:main:nginx config:
2020-07-08T17:07:16.156+02:00 load_module modules/ngx_http_js_module.so;
2020-07-08T17:07:16.156+02:00 worker_processes auto;
2020-07-08T17:07:16.156+02:00 daemon off;
2020-07-08T17:07:16.156+02:00 pid /tmp/nginx.pid;
2020-07-08T17:07:16.157+02:00 error_log /dev/stderr error;
2020-07-08T17:07:16.157+02:00 worker_rlimit_nofile 4096;
2020-07-08T17:07:16.157+02:00 events { worker_connections 2048;
2020-07-08T17:07:16.157+02:00 }
2020-07-08T17:07:16.162+02:00 http { include /etc/nginx/mime.types; default_type application/json; access_log /dev/stdout combined; js_include tensorflow-serving.js; upstream tfs_upstream { server localhost:10001; } upstream gunicorn_upstream { server unix:/tmp/gunicorn.sock fail_timeout=1; } server { listen 8080 deferred; client_max_body_size 0; client_body_buffer_size 100m; subrequest_output_buffer_size 100m; set $tfs_version 2.1; set $default_tfs_model None; location /tfs { rewrite ^/tfs/(.*) /$1 break; proxy_redirect off; proxy_pass_request_headers off; proxy_set_header Content-Type 'application/json'; proxy_set_header Accept 'application/json'; proxy_pass http://tfs_upstream; } location /ping { js_content ping; } location /invocations { js_content invocations; } location /models { proxy_pass http://gunicorn_upstream/models; } location / { return 404 '{"error": "Not Found"}'; } keepalive_timeout 3; }
2020-07-08T17:07:16.162+02:00 }
2020-07-08T17:07:16.162+02:00 INFO:tfs_utils:using default model name: model
2020-07-08T17:07:16.162+02:00 INFO:tfs_utils:tensorflow serving model config:
2020-07-08T17:07:16.162+02:00 model_config_list: { config: { name: "model", base_path: "/opt/ml/model", model_platform: "tensorflow" }
2020-07-08T17:07:16.162+02:00 }
2020-07-08T17:07:16.162+02:00 INFO:main:using default model name: model
2020-07-08T17:07:16.162+02:00 INFO:main:tensorflow serving model config:
2020-07-08T17:07:16.163+02:00 model_config_list: { config: { name: "model", base_path: "/opt/ml/model", model_platform: "tensorflow" }
2020-07-08T17:07:16.163+02:00 }
2020-07-08T17:07:16.163+02:00 INFO:main:tensorflow version info:
2020-07-08T17:07:16.163+02:00 TensorFlow ModelServer: 2.1.0-rc1+dev.sha.075ffcf
2020-07-08T17:07:16.163+02:00 TensorFlow Library: 2.1.0
2020-07-08T17:07:16.163+02:00 INFO:main:tensorflow serving command: tensorflow_model_server --port=10000 --rest_api_port=10001 --model_config_file=/sagemaker/model-config.cfg --max_num_load_retries=0
2020-07-08T17:07:16.163+02:00 INFO:main:started tensorflow serving (pid: 13)
2020-07-08T17:07:16.163+02:00 INFO:main:nginx version info:
2020-07-08T17:07:16.163+02:00 nginx version: nginx/1.18.0
2020-07-08T17:07:16.163+02:00 built by gcc 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1)
2020-07-08T17:07:16.163+02:00 built with OpenSSL 1.1.1 11 Sep 2018
2020-07-08T17:07:16.163+02:00 TLS SNI support enabled
2020-07-08T17:07:16.163+02:00 configure arguments: --prefix=/etc/nginx --sbin-path=/usr/sbin/nginx --modules-path=/usr/lib/nginx/modules --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --pid-path=/var/run/nginx.pid --lock-path=/var/run/nginx.lock --http-client-body-temp-path=/var/cache/nginx/client_temp --http-proxy-temp-path=/var/cache/nginx/proxy_temp --http-fastcgi-temp-path=/var/cache/nginx/fastcgi_temp --http-uwsgi-temp-path=/var/cache/nginx/uwsgi_temp --http-scgi-temp-path=/var/cache/nginx/scgi_temp --user=nginx --group=nginx --with-compat --with-file-aio --with-threads --with-http_addition_module --with-http_auth_request_module --with-http_dav_module --with-http_flv_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_mp4_module --with-http_random_index_module --with-http_realip_module --with-http_secure_link_module --with-http_slice_module --with-http_ssl_module --with-http_stub_status_module --with-http_sub_module --with-http_v2_module --with-mail --with-mail_ssl_module --with-stream --with-stream_realip_module --with-stream_ssl_module --with-stream_ssl_preread_module --with-cc-opt='-g -O2 -fdebug-prefix-map=/data/builder/debuild/nginx-1.18.0/debian/debuild-base/nginx-1.18.0=. -fstack-protector-strong -Wformat -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fPIC' --with-ld-opt='-Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,now -Wl,--as-needed -pie'
2020-07-08T17:07:16.163+02:00 INFO:main:started nginx (pid: 15)
2020-07-08T17:07:16.163+02:00 2020-07-08 15:07:15.075708: I tensorflow_serving/model_servers/server_core.cc:462] Adding/updating models.
2020-07-08T17:07:16.163+02:00 2020-07-08 15:07:15.075760: I tensorflow_serving/model_servers/server_core.cc:573] (Re-)adding model: model
2020-07-08T17:07:16.163+02:00 2020-07-08 15:07:15.180755: I tensorflow_serving/util/retrier.cc:46] Retrying of Reserving resources for servable: {name: model version: 1} exhausted max_num_retries: 0
2020-07-08T17:07:16.163+02:00 2020-07-08 15:07:15.180887: I tensorflow_serving/core/basic_manager.cc:739] Successfully reserved resources to load servable {name: model version: 1}
2020-07-08T17:07:16.163+02:00 2020-07-08 15:07:15.180919: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: model version: 1}
2020-07-08T17:07:16.163+02:00 2020-07-08 15:07:15.180944: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: model version: 1}
2020-07-08T17:07:16.163+02:00 2020-07-08 15:07:15.180995: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /opt/ml/model/1
2020-07-08T17:07:16.163+02:00 2020-07-08 15:07:15.205712: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2020-07-08T17:07:16.164+02:00 2020-07-08 15:07:15.205825: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:264] Reading SavedModel debug info (if present) from: /opt/ml/model/1
2020-07-08T17:07:16.164+02:00 2020-07-08 15:07:15.208599: I external/org_tensorflow/tensorflow/core/common_runtime/process_util.cc:147] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
2020-07-08T17:07:16.164+02:00 2020-07-08 15:07:15.328057: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:203] Restoring SavedModel bundle.
2020-07-08T17:07:17.165+02:00 2020-07-08 15:07:16.578796: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:152] Running initialization op on SavedModel bundle at path: /opt/ml/model/1
2020-07-08T17:07:17.165+02:00 2020-07-08 15:07:16.626494: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:333] SavedModel load for tags { serve }; Status: success: OK. Took 1445495 microseconds.
2020-07-08T17:07:17.165+02:00 2020-07-08 15:07:16.630443: I tensorflow_serving/servables/tensorflow/saved_model_warmup.cc:105] No warmup data file found at /opt/ml/model/1/assets.extra/tf_serving_warmup_requests
2020-07-08T17:07:17.165+02:00 2020-07-08 15:07:16.632461: I tensorflow_serving/util/retrier.cc:46] Retrying of Loading servable: {name: model version: 1} exhausted max_num_retries: 0
2020-07-08T17:07:17.165+02:00 2020-07-08 15:07:16.632484: I tensorflow_serving/core/loader_harness.cc:87] Successfully loaded servable version {name: model version: 1}
2020-07-08T17:07:17.165+02:00 2020-07-08 15:07:16.634727: I tensorflow_serving/model_servers/server.cc:362] Running gRPC ModelServer at 0.0.0.0:10000 ...
2020-07-08T17:07:17.165+02:00 [warn] getaddrinfo: address family for nodename not supported
2020-07-08T17:07:17.165+02:00 2020-07-08 15:07:16.635747: I tensorflow_serving/model_servers/server.cc:382] Exporting HTTP/REST API at:localhost:10001 ...
2020-07-08T17:07:17.165+02:00 [evhttp_server.cc : 238] NET_LOG: Entering the event loop …
But both (endpoint and batch transform) fail the SageMaker ping health check with:
2020-07-08T17:07:32.169+02:00 2020/07/08 15:07:31 [error] 16#16: *1 js: failed ping{ "error": "Could not find any versions of model None" }
2020-07-08T17:07:32.170+02:00 169.254.255.130 - - [08/Jul/2020:15:07:31 +0000] "GET /ping HTTP/1.1" 502 157 "-" "Go-http-client/1.1"
Also, when tested locally with the self-built TF Serving Docker container, the model runs without problems and can be queried with curl. What could be the issue?