
I have a record that I want to ingest into a feature group in SageMaker Feature Store. The feature 'z' is defined as Fractional in the feature group's schema, and some records have no value for 'z'. Ingesting the record below fails with the error shown after it:

[{'FeatureName': 'ji', 'ValueAsString': '8829a094'}, {'FeatureName': 'time', 'ValueAsString': '2020-08-27T13:00:00Z'}, {'FeatureName': 't2', 'ValueAsString': '289.26111111111106'}, {'FeatureName': 're', 'ValueAsString': '86'}, {'FeatureName': 'pwat', 'ValueAsString': '0.9609375'}, {'FeatureName': 'li700', 'ValueAsString': '3'}, {'FeatureName': 'c', 'ValueAsString': '0'}, {'FeatureName': 'd', 'ValueAsString': '0'}, {'FeatureName': 'x', 'ValueAsString': '0'}, {'FeatureName': 'y', 'ValueAsString': '0.0'}, {'FeatureName': 'z', 'ValueAsString': 'None'}]

Attempted to parse the feature value for the feature named [z] into a FeatureValue of type Fractional. The provided value must be within the range of a double precision floating point number defined by the IEEE 754 standard. The input format can be in either decimal form or scientific notation.

How do you handle missing data when ingesting into a feature group?


1 Answer


Found the answer myself: leave a feature out of the record when its value is missing, instead of sending the string 'None'.

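import argparse

import boto3


def transform_row(row) -> list:
    """Convert a row exposing asDict() (e.g. a PySpark Row) into a list of feature dicts."""
    # String forms that a missing value takes after str() conversion.
    missing_values = {'NaN', 'NA', 'None', 'nan', 'none'}
    record = []
    for column, value in row.asDict().items():
        # A null cannot be ingested as a feature value, so skip the feature entirely.
        if str(value) not in missing_values:
            record.append({'FeatureName': column, 'ValueAsString': str(value)})
    return record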


def ingest_to_feature_store(args: argparse.Namespace, rows) -> None:
    feature_group_name = args.feature_group_name
    session = boto3.session.Session()
    featurestore_runtime_client = session.client(service_name='sagemaker-featurestore-runtime')
    for row in rows:
        record = transform_row(row)
        response = featurestore_runtime_client.put_record(FeatureGroupName=feature_group_name, Record=record)
        # Fail fast if the service did not accept the record.
        assert response['ResponseMetadata']['HTTPStatusCode'] == 200
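
One caveat with comparing against strings is that a legitimate String feature whose value happens to be 'NA' or 'none' would also be dropped. A possible variant, sketched here under the assumption that each row is available as a plain dict (for example via row.asDict()), checks the Python value itself before converting it. The names is_missing and build_record are hypothetical helpers, not part of boto3 or the SageMaker SDK.

```python
import math
from typing import Any, Dict, List, Mapping


def is_missing(value: Any) -> bool:
    """Return True for None and for float NaN; everything else counts as present."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def build_record(row: Mapping[str, Any]) -> List[Dict[str, str]]:
    """Build a put_record payload, leaving out features that have no value."""
    return [
        {'FeatureName': name, 'ValueAsString': str(value)}
        for name, value in row.items()
        if not is_missing(value)
    ]


# Hypothetical sample row: 'z' is missing, so it is omitted from the payload.
record = build_record({'ji': '8829a094', 'y': 0.0, 'z': None})
print(record)
```

The resulting list can be passed as the Record argument in the same way as the output of transform_row. A string value such as 'None' is kept by this version, so it only suits data where missing values arrive as real nulls or NaN rather than as text.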