
I had created a Jupyter notebook for training and fitting a learning model, and I have been using it on my gaming laptop for the past year with few issues.

Since I am now increasing the training dataset 10-fold, I wanted to move the Jupyter notebook to AWS SageMaker, both for the extra horsepower and so that I don't have to leave my laptop open on my desk, unusable, until training is complete.

I created the SageMaker notebook instance and opened the Jupyter notebook. Using the same code and the original data, which ran within 3 hours on my laptop, I attempted to run the cells in the notebook to get an overall time, so that I can pick the right hardware for my larger runs.
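Since the training cell only records its start and end times to the minute, a finer-grained measurement can help when comparing instance types. The sketch below times a callable with `time.perf_counter` and makes a naive linear extrapolation to the 10-fold dataset. `train_fn` is a hypothetical stand-in for the training cell, and linear scaling is an assumption: it roughly holds when the epoch count and batch size stay fixed, because the number of gradient steps then grows in proportion to the data.

```python
import time
from typing import Callable, Tuple


def time_call(fn: Callable[[], object]) -> Tuple[object, float]:
    """Run fn() and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def extrapolate_hours(measured_hours: float, data_scale: float) -> float:
    """Naive linear estimate: run time grows in proportion to dataset size.

    Assumes the same hardware, epoch count, and batch size.
    """
    return measured_hours * data_scale


def train_fn():
    """Hypothetical stand-in for the training cell."""
    return sum(i * i for i in range(100_000))


_, seconds = time_call(train_fn)
print(f"measured: {seconds:.3f} s")

# 3 hours on the laptop with 10x the data -> about 30 hours on comparable hardware
print(extrapolate_hours(3.0, 10))
```

The estimate is only a starting point for choosing an instance: a faster instance shortens each step, but the step count itself still grows tenfold.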

Every time I attempt to run the notebook, it crashes the browser. I have tried Chrome and Firefox on both a Windows 10 laptop and an Ubuntu 16.04 laptop.

I can't figure out how to do two things that I believe might help:
1) Review the Jupyter notebook server log. I created lifecycle configuration scripts (on create / on start) and viewed the resulting logs in CloudWatch, but there is nothing about the crashes there.
2) Review the browser's console log. I have opened the developer tools in both browsers, but once the tab crashes with an "Aw, Snap!" (or similar) page, I can no longer interact with the window, so I can't see any output.
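For the second point, one workaround is to send the training cell's console output to a file instead of the notebook, so the log survives a browser crash and the browser no longer has to render it. A minimal sketch using the standard library's `contextlib.redirect_stdout`, assuming tflearn prints its progress through Python's `sys.stdout`. `noisy_training` is a hypothetical stand-in for the `phr_model.fit(...)` call, and the file name `train_output.log` is an assumption. Output written directly by native code (for example TensorFlow's C++ logging, which goes to stderr) is not captured this way.

```python
import contextlib


def noisy_training(steps: int) -> float:
    """Hypothetical stand-in for phr_model.fit(...): prints one line per step."""
    loss = 1.0
    for step in range(1, steps + 1):
        loss *= 0.99
        print(f"Training Step: {step} | loss: {loss:.4f}")
    return loss


log_path = "train_output.log"  # assumed file name

# Everything printed inside the block goes to the file, not the notebook cell.
with open(log_path, "w") as log_file, contextlib.redirect_stdout(log_file):
    final_loss = noisy_training(1000)

# Only a short summary reaches the notebook; the full log stays on disk.
print(f"final loss: {final_loss:.4f}; details in {log_path}")
```

While training runs, the file can be followed from a terminal on the instance (for example with `tail -f train_output.log`), independently of the browser tab.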

Here is the piece of code I am attempting to run. I have tried both show_metric=True and show_metric=False:

```python
from datetime import datetime

import tensorflow as tf  # TF 1.x API (tf.reset_default_graph)
import tflearn

# phr_train_x, phr_train_y and EPOCH_RUN_LENGTH are defined in earlier cells
start_time = datetime.now().strftime("%Y-%m-%d %H:%M")
tf.reset_default_graph()

# Build neural network: two hidden layers of 8 units, softmax output
phr_net = tflearn.input_data(shape=[None, len(phr_train_x[0])])
phr_net = tflearn.fully_connected(phr_net, 8)
phr_net = tflearn.fully_connected(phr_net, 8)
phr_net = tflearn.fully_connected(phr_net, len(phr_train_y[0]), activation='softmax')
phr_net = tflearn.regression(phr_net)

# Define model and set up TensorBoard logging
phr_model = tflearn.DNN(phr_net, tensorboard_dir='phr_tflearn_logs')
# Start training (apply gradient descent algorithm)
phr_model.fit(phr_train_x, phr_train_y, n_epoch=EPOCH_RUN_LENGTH, batch_size=8, show_metric=True)
phr_model.save('model.phr_tflearn')
print("start: ", start_time, "end: ", datetime.now().strftime("%Y-%m-%d %H:%M"))
```
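To get a sense of how much work, and how much progress output, this cell produces, it helps to count training steps: with `batch_size=8`, each epoch takes ⌈N / 8⌉ gradient steps. The sample count and epoch count below are hypothetical placeholders, since the post does not give them; substitute `len(phr_train_x)` and `EPOCH_RUN_LENGTH`. How often tflearn refreshes its progress display per step may vary by version, so treat the step count as an upper-level indicator of output volume rather than an exact line count.

```python
import math


def training_steps(n_samples: int, batch_size: int, n_epoch: int) -> int:
    """Total gradient steps: ceil(n_samples / batch_size) per epoch."""
    return math.ceil(n_samples / batch_size) * n_epoch


# Hypothetical values; substitute len(phr_train_x) and EPOCH_RUN_LENGTH.
steps_now = training_steps(n_samples=50_000, batch_size=8, n_epoch=100)
steps_10x = training_steps(n_samples=500_000, batch_size=8, n_epoch=100)
print(steps_now, steps_10x)  # 625000 6250000
```

A larger `batch_size` reduces the step count proportionally, although it can also change how the model converges.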

I searched extensively and did not find anything that helps, and the AWS documentation just sends me in circles. Does anyone have any advice?

Which part of the AWS documentation is unclear? If you can point it out to us, we can improve it. Thank you. - Pranav Chiplunkar

1 Answer


Thanks for using Amazon SageMaker. I would suggest opening a post in the AWS Forums under Amazon SageMaker (https://forums.aws.amazon.com/forum.jspa?forumID=285&start=0) so that the SageMaker team can work with you and learn more about your setup, such as which instance type you are using and your notebook instance ARN.
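The details mentioned above can be collected with boto3's SageMaker client, whose `describe_notebook_instance` call returns the instance type and ARN among other fields. A minimal sketch, assuming boto3 is installed and AWS credentials are configured; `my-notebook` is a hypothetical instance name. boto3 is imported inside the function so that the helper which trims the response can be tried without AWS access.

```python
from typing import Any, Dict


def summarize_notebook_instance(response: Dict[str, Any]) -> Dict[str, Any]:
    """Pick out the fields useful for a support request; missing keys become None."""
    keys = (
        "NotebookInstanceName",
        "NotebookInstanceArn",
        "InstanceType",
        "NotebookInstanceStatus",
        "VolumeSizeInGB",
    )
    return {k: response.get(k) for k in keys}


def describe_instance(name: str) -> Dict[str, Any]:
    """Call the SageMaker API (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helper above works offline

    client = boto3.client("sagemaker")
    response = client.describe_notebook_instance(NotebookInstanceName=name)
    return summarize_notebook_instance(response)


# Usage (hypothetical instance name):
# print(describe_instance("my-notebook"))
```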