4
votes

My company does online consumer-behavior analysis, and we make real-time predictions using data collected from various websites (via our embedded JavaScript).

We have been using AWS ML for real-time prediction, but now that we are experimenting with AWS SageMaker, it has become clear that real-time data processing is a problem compared to AWS ML. For example, we have some string variables that AWS ML can automatically convert to numeric values and use for real-time prediction. It does not look like SageMaker can do this.

Does anyone have any experience with real time data processing and prediction in AWS SageMaker?


4 Answers

1
votes

Yes, it can! You have to create a pipeline (preprocess + model + postprocess) and deploy it as an endpoint for real-time inference. You can double-check the inference example on the SageMaker GitHub site; it uses the sagemaker-python-sdk to train and deploy. 1: This example is for a small-data scikit-learn model:

https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker-python-sdk/scikit_learn_inference_pipeline

2: It also supports big data (a Spark ML pipeline serving container); you can find that example in the official GitHub repository as well.

4
votes

It sounds like you're only familiar with the training component of SageMaker. SageMaker has several different components:

  1. Jupyter Notebooks
  2. Labeling
  3. Training
  4. Inference

You're most likely dealing with #3 and #4. There are a few ways to work with SageMaker here. You can use one of the built-in algorithms, which provide both training and inference containers that can be launched on SageMaker. To use these you can work entirely from the console and just point at your data in S3, similar to AWS ML. If you're not using the built-in algorithms, you can use the sagemaker-python-sdk to create both training and prediction containers if you're using a common framework like TensorFlow, MXNet, PyTorch, or others. Finally, if you're using a fully custom algorithm (which you weren't if you're porting from AWS ML), you can bring your own Docker container for training and for inference.

To create an inference endpoint you can go to the console under the inference section and click through to build your endpoint. See the GIF here for an example: [gif showing building an endpoint]

Beyond that, if you want to invoke the endpoint in real time from code, you can use any of the AWS SDKs. I'll demonstrate with the Python SDK, boto3:

import boto3

client = boto3.client("sagemaker-runtime")  # runtime API for invoking endpoints
response = client.invoke_endpoint(
    EndpointName="herpderp", ContentType="text/csv", Body="some content"
)
print(response["Body"].read())

In this code, if you needed to convert incoming string values to numerical values, you could easily do that before building the request body.
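As a minimal sketch of that conversion step (the feature names and the encoding map here are hypothetical, not from the original answer — in practice the encoding must match whatever was used at training time):

```python
# Hypothetical mapping from a categorical string feature to a numeric code.
DEVICE_CODES = {"mobile": 0, "desktop": 1, "tablet": 2}

def build_csv_payload(record):
    """Convert a raw event dict into the CSV row the endpoint expects."""
    device = DEVICE_CODES[record["device"]]       # string category -> numeric code
    age = int(record["age"])                      # numeric string -> int
    time_on_page = float(record["time_on_page"])  # numeric string -> float
    return f"{device},{age},{time_on_page}"

payload = build_csv_payload({"device": "mobile", "age": "34", "time_on_page": "12.5"})
# payload == "0,34,12.5", ready to pass as Body= to invoke_endpoint
```

The resulting string is what you would pass as `Body` (with `ContentType="text/csv"`) in the `invoke_endpoint` call above.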

0
votes

In this case, you will need to preprocess your data before putting it into the InvokeEndpoint request body. In Python, you can use int('your_integer_string') or float('your_float_string') to convert a string to an integer or a float. In Java, you can use Integer.parseInt("yourIntegerString"), Long.parseLong("yourLongString"), Double.parseDouble("yourDoubleString"), or Float.parseFloat("yourFloatString").
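A slightly more defensive version of that conversion in Python (a sketch, not part of the original answer) tries int first and falls back to float:

```python
def to_number(s):
    """Parse a numeric string, trying int first, then float."""
    try:
        return int(s)
    except ValueError:
        return float(s)  # still raises ValueError if the string isn't numeric at all

values = [to_number(s) for s in ["42", "3.14", "-7"]]
# values == [42, 3.14, -7]
```

This keeps integers as ints (useful if the model expects exact categorical codes) while still accepting floating-point strings.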

Hope this helps!

-Han

0
votes

AWS SageMaker is a robust machine learning service in AWS that manages every major aspect of machine learning implementation, including data preparation, model construction, training and fine-tuning, and deployment.

Preparation

SageMaker provides a range of resources to make it simple to prepare data for machine learning models, even when the data comes from many sources or arrives in a variety of formats.

With SageMaker Ground Truth, it's simple to label data, including video, images, and text, which is automatically processed into usable training data. Ground Truth will process and merge this data using auto-segmentation and a suite of tools to create consistent data labels that can be used in machine learning models. AWS, in conjunction with SageMaker Data Wrangler and SageMaker Processing, reduces a data preparation phase that might take weeks or months to a matter of days, if not hours.

Build

SageMaker Studio Notebooks centralize everything relevant to your machine learning models, allowing them to be conveniently shared along with their associated data. You can choose from a variety of built-in, open-source algorithms to start processing your data with SageMaker JumpStart, or you can build custom parameters for your machine learning model.

Once you've chosen a model, SageMaker starts processing data automatically and offers a simple, easy-to-understand interface for tracking your model's progress and performance.

Training

SageMaker provides a range of tools for training your model from the data you've prepared, including a built-in debugger for detecting possible errors.

The training job's results are saved in an Amazon S3 bucket, where they can be analyzed with other AWS services, including Amazon QuickSight.

Deployment

It's pointless to have strong machine learning models if they can't be easily deployed to your hosting infrastructure. Fortunately, SageMaker makes deploying machine learning models to your existing services and applications as easy as a single click.

SageMaker allows for real-time data processing and prediction after deployment. This has far-reaching consequences in a variety of areas, including finance and health. Businesses operating in the stock market, for example, can make real-time financial decisions and identify more attractive acquisitions by pinpointing the best time to buy.

Integration with Amazon Comprehend allows for natural language processing, transforming human speech into usable data to train better models, or you can provide a chatbot to customers through Amazon Lex.

In conclusion…

Machine learning is no longer a niche technological curiosity; it now plays a critical role in the decision-making processes of thousands of companies around the world. With support for virtually every major framework and simple integration into the AWS ecosystem, there has never been a better time to start your machine learning journey.