When you create feature groups in SageMaker Feature Store, you take the following steps:

  1. Set up the SageMaker Python SDK and boto client
  2. Inspect the data you want to use, and apply transformations (e.g. remove NAs, round numbers, etc.)
  3. Ingest the transformed data into the feature store
  4. Build training data by running an Athena query on the feature groups
  5. Select columns for training
  6. Save the training dataset to an S3 bucket
  7. Train and deploy the model
  8. Use the GetRecord functionality to make predictions on recent data from the feature store.

You can see a detailed example of the steps here.

But how does the feature store apply the transformations to the data prior to making a prediction? Newly ingested data must be transformed in the same way as the training data, but we only made these transformations in step 2, BEFORE anything was added to a feature group. Nothing in these steps appears to give the feature store any knowledge of the transformations.

For example, in the linked example they add the transformed data to the transaction_feature_group as follows:

transaction_feature_group.ingest(data_frame=transformed_transaction_data, max_workers=5, wait=True)

So we can see that the transformed data is what gets loaded into the feature group. But what about new data added over time? How is this new data getting automatically transformed?

1 Answer

Feature Store is not aware of the transformations; it stores whatever values you ingest. New data is not transformed automatically, so you have to apply the same transformations yourself, for example in a feature generation pipeline that runs prior to ingesting features into Feature Store. Please check out this link for more details.
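
One way to keep training and inference consistent is to put the transformation in a single function and call it on every record before it is written. The following is only a minimal sketch under assumptions: the feature names (transaction_id, amount, event_time), the rounding rule, and the helper names transform_transaction, to_feature_store_record and ingest_new_transaction are all hypothetical and not taken from the linked example. The only AWS call used is PutRecord on the Feature Store runtime client.

```python
import math


def transform_transaction(raw):
    """Hypothetical transformation, mirroring "remove NAs, round numbers".

    Returns a dict of cleaned features, or None if the record should be dropped.
    """
    amount = raw.get("amount")
    # Drop records whose amount is missing or NaN.
    if amount is None or (isinstance(amount, float) and math.isnan(amount)):
        return None
    return {
        "transaction_id": str(raw["transaction_id"]),
        "amount": round(float(amount), 2),
        "event_time": float(raw["event_time"]),
    }


def to_feature_store_record(features):
    """Convert a feature dict into the list-of-dicts shape PutRecord expects."""
    return [
        {"FeatureName": name, "ValueAsString": str(value)}
        for name, value in features.items()
    ]


def ingest_new_transaction(runtime_client, feature_group_name, raw):
    """Transform one raw record and write it to the feature group.

    runtime_client is assumed to be a boto3 "sagemaker-featurestore-runtime"
    client (or any object with a compatible put_record method).
    Returns True if the record was written, False if it was dropped.
    """
    features = transform_transaction(raw)
    if features is None:
        return False
    runtime_client.put_record(
        FeatureGroupName=feature_group_name,
        Record=to_feature_store_record(features),
    )
    return True
```

In practice the client would be created with boto3.client("sagemaker-featurestore-runtime"), and the same transform_transaction function would also be used when building the training data, so that the values returned by GetRecord at prediction time are already in the form the model was trained on.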

source:aws