
I've set up a Kinesis Firehose for others to send me data on, and noticed that the data is occasionally malformed. The malformed documents fail to ETL into Redshift: they end up left in the intermediary Firehose S3 bucket, where they keep generating spammy error messages referencing the STL_LOAD_ERRORS table.

Is there a safe way to remove the problematic records from the S3 bucket? Or any other safe way to clean up the malformed records?

--

Note that I've already tried simply deleting the malformed records from S3. This seems to put Kinesis Firehose into an infinite loop, generating error spam with the message: "One or more S3 files required by Redshift have been removed from the S3 bucket". As far as I can tell, this spam is supposed to stop eventually, but in my experiments it kept going without a break.


1 Answer


Here is what will work.

  1. The STL_LOAD_ERRORS table will give you the S3 filename, along with the line number and the reason for the error.
  2. Find the erroneous record, correct it, and re-stream it from the source via Firehose.
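Step 1 can be done with a query like the following against your Redshift cluster (a sketch; adjust the `LIMIT` or add filters such as `WHERE filename LIKE '%your-bucket%'` to narrow it down):

```sql
-- Most recent COPY failures: which S3 file, which line, and why
SELECT starttime,
       filename,      -- S3 object that failed to load
       line_number,   -- offending line within that object
       err_reason,    -- Redshift's explanation of the failure
       raw_line       -- the malformed record itself
FROM stl_load_errors
ORDER BY starttime DESC
LIMIT 20;
```

The `raw_line` column is handy here: it shows you the record as Redshift saw it, so you can correct it at the source before re-streaming through Firehose.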