My pipeline is as follows:
Firehose -> Lambda (AWS' Java SDK) -> (S3 & Redshift)
An un-encoded (raw) JSON record is submitted to Firehose. It then triggers a Lambda function which transforms it slightly. Firehose then puts the transformed record into an S3 bucket and into Redshift.
For Firehose to add the transformed data to S3, it requires that the data be Base64 encoded (and Firehose decodes it before adding it to S3).
However, the data contains a URL whose `=` characters, once decoded, are replaced with their equivalent Unicode escape sequence (`\u003d`), since `=` is the character that Amazon's Base64 decoder uses as padding.
https://www.[snipped].com/...?returnurl\u003dnull\u0026referrer\u003dnull
How can I retain those `=` characters within the decoded data?
Note: I've tried using `Base64.getUrlEncoder()`, but AWS only seems to support `Base64.getEncoder()`.
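One way to check whether Base64 itself alters the `=` characters is to round-trip a similar URL through `java.util.Base64` locally. The sketch below is only an illustration: the class name `Base64RoundTrip` and the `example.com` URL are made up, and it exercises the JDK's encoder and decoder, not Firehose's own decoding step.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class Base64RoundTrip {

    // Encode text as standard, padded Base64 (the Base64.getEncoder() variant).
    static String encode(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    // Decode standard Base64 back into UTF-8 text.
    static String decode(String base64) {
        return new String(Base64.getDecoder().decode(base64), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical URL with the same query-string shape as the one in the question.
        String url = "https://www.example.com/?returnurl=null&referrer=null";
        String roundTripped = decode(encode(url));
        // Compare the decoded text with the original input.
        System.out.println(roundTripped.equals(url)); // prints true
    }
}
```

Base64 padding is only ever appended to the end of the encoded text and is not part of the decoded bytes, so a round trip like this leaves `=` characters inside the payload untouched.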
`=` with `\u003d`. Interestingly, the S3 record is also added to Redshift via Firehose, and Redshift does show the `=` character. – Jacob G.

`\u003d` isn't equivalent to `=` in a UTF-8 text file, but it is in JSON, and of course the interface to Lambda is always JSON (though that is irrelevant if the data in and out is always represented in Base64). I don't actually understand your setup well enough to know if this is a useful piece of speculation on my part. – Michael - sqlbot
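To illustrate the point about JSON, the sketch below decodes Unicode escape sequences the way a JSON parser does when it reads a string value. It is a hand-rolled helper (the class `JsonUnicodeUnescape` and its `unescape` method are hypothetical names) that handles only four-digit Unicode escapes and ignores the rest of the JSON grammar, including escaped backslashes. It is not what Firehose or Lambda runs internally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JsonUnicodeUnescape {

    // Matches a backslash, the letter u, and exactly four hex digits.
    private static final Pattern UNICODE_ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Replace every four-digit Unicode escape with the character it denotes.
    static String unescape(String text) {
        Matcher matcher = UNICODE_ESCAPE.matcher(text);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            char decoded = (char) Integer.parseInt(matcher.group(1), 16);
            // quoteReplacement keeps characters such as $ from being read as group references.
            matcher.appendReplacement(result, Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        matcher.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        // The query string from the question, as it appears in the S3 object.
        String escaped = "returnurl\\u003dnull\\u0026referrer\\u003dnull";
        System.out.println(unescape(escaped)); // prints returnurl=null&referrer=null
    }
}
```

A consumer that parses the record as JSON would see the same `=` and `&` characters, whereas a tool that reads the S3 object as plain text shows the escape sequences verbatim.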