I am running a Python script with TensorFlow in an Amazon SageMaker notebook instance. I can write to the notebook's storage normally, but saving TensorFlow model checkpoints does not work. The same code worked before it was ported to SageMaker.
Below is a reduced version of my code:
import os
from time import time

# sess, saver and current_step are defined elsewhere in the full script
bucket = 'sagemaker-complaints-data'
prefix = 'DeepTestV2'  # place to upload training files within the bucket
timestamp = str(int(time()))
# abspath resolves against the current working directory
out_dir = os.path.abspath(os.path.join(bucket, prefix, "runs", timestamp))
checkpoint_dir = os.path.abspath(os.path.join(out_dir, "checkpoints"))
checkpoint_prefix = os.path.join(checkpoint_dir, "model")
path = saver.save(sess, checkpoint_prefix, global_step=current_step)
print("Saved model checkpoint to {}\n".format(path))
No errors are thrown, and the print statement outputs the expected path. I have searched for known issues with checkpoints in SageMaker but have not found any posts describing this.
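To check where these paths actually point, here is a minimal sketch using only the standard library. The helper name `resolve_checkpoint_prefix` and its `base_dir` parameter are my own, for illustration; they are not part of the original script. Since `os.path.abspath` resolves a relative path against the current working directory, the bucket name here ends up as an ordinary local directory name and not as an S3 location.

```python
import os
from time import time


def resolve_checkpoint_prefix(bucket, prefix, timestamp, base_dir=None):
    """Build the checkpoint prefix the same way as the reduced code above.

    base_dir is a hypothetical parameter added for illustration; it defaults
    to the current working directory, which is what os.path.abspath uses.
    """
    base_dir = base_dir or os.getcwd()
    out_dir = os.path.abspath(
        os.path.join(base_dir, bucket, prefix, "runs", timestamp)
    )
    checkpoint_dir = os.path.join(out_dir, "checkpoints")
    return os.path.join(checkpoint_dir, "model")


checkpoint_prefix = resolve_checkpoint_prefix(
    "sagemaker-complaints-data", "DeepTestV2", str(int(time()))
)
checkpoint_dir = os.path.dirname(checkpoint_prefix)

# Show where the checkpoint files would be written on the local filesystem
print("Working directory:", os.getcwd())
print("Checkpoint prefix:", checkpoint_prefix)
print("Directory exists: ", os.path.isdir(checkpoint_dir))
```

Running this in the notebook next to the training script should show whether the checkpoint directory is where I expect it to be, and whether it exists at all.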