I created an XGBoost model with AWS SageMaker. Now I'm trying to use it through a Batch Transform job, and it all works fine for small batches.
However, there's a slightly bigger batch of 600,000 rows in a ~16 MB file, and I can't manage to run it in one go. I tried two things:
1. Setting 'Max payload size' (max_payload) of the transform job to its maximum (100 MB):
import sagemaker

transformer = sagemaker.transformer.Transformer(
    model_name=config.model_name,
    instance_count=config.inference_instance_count,
    instance_type=config.inference_instance_type,
    output_path="s3://{}/{}".format(config.bucket, config.s3_inference_output_folder),
    sagemaker_session=sagemaker_session,
    base_transform_job_name=config.inference_job_prefix,
    max_payload=100  # in MB
)
However, I still get an error (seen in the CloudWatch logs from the console):
413 Request Entity Too Large
The data value transmitted exceeds the capacity limit.
2. Setting max_payload to 0, which, according to the documentation, Amazon SageMaker should interpret as no limit on the payload size (roughly as in the snippet below).
In that case the job finishes successfully, but the output file is empty (0 bytes).
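For completeness, the second attempt looks roughly like this; it reuses the same config and sagemaker_session as above, and the input S3 key and content type shown here are placeholders rather than my real values:

import sagemaker

# Same setup as above, but with max_payload=0 ("no limit")
transformer = sagemaker.transformer.Transformer(
    model_name=config.model_name,
    instance_count=config.inference_instance_count,
    instance_type=config.inference_instance_type,
    output_path="s3://{}/{}".format(config.bucket, config.s3_inference_output_folder),
    sagemaker_session=sagemaker_session,
    base_transform_job_name=config.inference_job_prefix,
    max_payload=0
)

# Start the job on the 600,000-row CSV (the key below is a placeholder)
transformer.transform(
    data="s3://{}/{}".format(config.bucket, "path/to/input.csv"),
    content_type="text/csv"
)
transformer.wait()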
Any ideas on what I'm doing wrong, or how to run a bigger batch?