In my use case, I am hitting the 150-value limit on an SNS subscription filter policy, as described under "Filter policy constraints" in the SNS filter policies documentation. In total, I expect to have about 500 to 1,500 values that I would like to use as inclusion criteria in my filter policy.
There also appears to be a limit of one filter policy per SNS subscription: applying a second filter policy JSON via the CLI's set-subscription-attributes replaces the first. Finally, there appears to be a limit of one subscription per subscribing resource (such as an SQS queue) on an SNS topic, per my reading of SNS subscribe. Using the CLI's subscribe multiple times for the same topic and queue returned the same subscription ARN each time.
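One way to confirm this is to list the topic's subscriptions and check that there is a single entry per endpoint (assuming we have list permission on the topic, which we do not own):

aws sns --profile myProfile list-subscriptions-by-topic \
    --topic-arn arn:aws:sns:us-east-1:123456789012:mySNS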
So my only options seem to be to add more SQS queues whenever I hit the 150-value limit, each queue getting its own subscription to the SNS topic (a rough sketch of this follows), or to come up with a different, less precise filter policy that stays below the 150-value limit and do the additional filtering inside my subscriber app.
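For the first option, here is a rough sketch of the fan-out, assuming a datasets.txt file with one dataset name per line and jq available; the queue and file names are made up for illustration, and each new queue would also need an access policy allowing the topic to send to it (omitted here):

TOPIC_ARN=arn:aws:sns:us-east-1:123456789012:mySNS
# break the dataset list into chunks of at most 150 values
split -l 150 datasets.txt chunk_
i=0
for f in chunk_*; do
  i=$((i + 1))
  # one queue per chunk
  QUEUE_URL=$(aws sqs --profile myProfile create-queue \
      --queue-name "mySQS-$i" --output text --query QueueUrl)
  QUEUE_ARN=$(aws sqs --profile myProfile get-queue-attributes \
      --queue-url "$QUEUE_URL" --attribute-names QueueArn \
      --output text --query Attributes.QueueArn)
  # one subscription per queue
  SUB_ARN=$(aws sns --profile myProfile subscribe \
      --topic-arn "$TOPIC_ARN" --protocol sqs \
      --notification-endpoint "$QUEUE_ARN" \
      --return-subscription-arn --output text --query SubscriptionArn)
  # build {"dataset": [<chunk values>], "_SUCCESS": ["True"]} from the chunk file
  jq -R . "$f" | jq -s '{dataset: ., "_SUCCESS": ["True"]}' > "policy_$i.json"
  aws sns --profile myProfile set-subscription-attributes \
      --subscription-arn "$SUB_ARN" --attribute-name FilterPolicy \
      --attribute-value "file://policy_$i.json"
done

The subscriber would then need to consume from all of these queues.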
I did not see any SO threads on this. Am I missing something, or has anyone found a better way around the 150-value filter policy limit via the AWS CLI or SDK?
Additional background info: The subscriber app is an existing prod service that produces data-quality metrics on newly arriving instances of Parquet datasets, which are contained in an enterprise S3 data lake and have been on-boarded to this division-level service. As part of on-boarding lake datasets to this service, we add them to the filter policy of our subscription to a data lake SNS topic. This topic publishes a list of dataset attributes (S3 bucket, key, dataset name, timestamp, etc.) to subscribers for all puts of lake dataset instances, spanning thousands of datasets and a large number of buckets. We do not control this enterprise-level SNS topic, but we can subscribe to it.

Currently, our subscriber app sees one message per day per on-boarded dataset. The app runs in an auto-scaling group that scales on the number of messages visible in our SQS queue, and it has some functionality to discard non-conforming messages. We recently hit the filter policy limit when we attempted to extend the service to additional datasets in the lake. I am leaning toward changing our filter policy to include only messages for puts to our division-level S3 buckets (see the sketch after this paragraph) and doing the dataset-level filtering inside the app. I will have to see how this impacts the auto-scaling.
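For that bucket-level approach, the filter policy might look something like the following; the "bucket" attribute name is my guess (the exact message attribute key is whatever the enterprise topic actually publishes), and the bucket names are made up:

{
  "bucket": [
    "division-bucket-1",
    "division-bucket-2"
  ],
  "_SUCCESS": [
    "True"
  ]
}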
For reference, here is how I subscribe and apply the filter policy (note the --output text --query options, so the variable holds just the ARN rather than the full JSON response):

SUBSCRIPTION_ARN=$(aws sns --profile myProfile subscribe \
    --topic-arn arn:aws:sns:us-east-1:123456789012:mySNS \
    --protocol sqs \
    --notification-endpoint arn:aws:sqs:us-east-1:999999999999:mySQS \
    --return-subscription-arn \
    --output text --query SubscriptionArn)
aws sns --profile myProfile set-subscription-attributes \
    --subscription-arn "$SUBSCRIPTION_ARN" \
    --attribute-name FilterPolicy \
    --attribute-value file:///myUser/github/repo/filter_policy1.json
where filter_policy1.json is limited to 150 dataset values and takes the form (abbreviated here):
{
  "dataset": [
    "datasetname_1",
    "datasetname_5",
    "datasetname_256"
  ],
  "_SUCCESS": [
    "True"
  ]
}
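For completeness, I verify what was actually applied with:

aws sns --profile myProfile get-subscription-attributes \
    --subscription-arn "$SUBSCRIPTION_ARN" \
    --query Attributes.FilterPolicy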