
I have an AWS Kinesis Firehose stream set up to feed data into an AWS Elasticsearch cluster. I can successfully insert documents by sending them to the Firehose stream, which loads them into Elasticsearch.

However, I would like to set a document's ID myself when sending it to the Firehose stream. I'm successfully using the AWS PHP SDK to send data to Firehose; I just can't figure out whether there is a way to set the document ID manually.

$firehoseParams = [
    'DeliveryStreamName' => 'myStreamName', // REQUIRED
    'Record' => [ // REQUIRED
        'Data' => '{"json_encoded": "data", ...}', // REQUIRED
    ],
];
$firehoseResult = $this->_firehoseClient->putRecord($firehoseParams);

I've tried setting id, _id, and esDocumentId values in the JSON data, all to no avail.

Anyone have any ideas?

I tried changing the ID once a few years back, and afterwards some queries (for example, ones using avg) stopped returning correct values. You might want to double-check that everything still works once you figure it out. – WoodyDRN

1 Answer


You can use Kinesis Data Streams for this instead: send your documents to the stream, then consume them with a Lambda function that indexes each document through the official Elasticsearch API, where you can set the _id property yourself.
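A minimal sketch of the Lambda side, assuming each record you put on the stream is a JSON object carrying an "id" field that you want to use as the Elasticsearch _id (that field name, the function name buildBulkBody and the index name are hypothetical). A Kinesis Data Streams event delivers each payload base64-encoded under Records[].kinesis.data; the function turns the batch into a newline-delimited body for the Elasticsearch _bulk API, where every action line carries an explicit _id. Note that Lambda has no built-in PHP runtime, so this assumes a custom runtime (such as Bref); the same logic ports directly to any supported language.

```php
<?php
// Hypothetical helper: convert a Kinesis Data Streams Lambda event into an
// Elasticsearch _bulk request body that sets each document's _id explicitly.
// Assumption: every payload is a JSON object with an "id" field.
function buildBulkBody(array $event, string $index): string
{
    $lines = [];
    foreach ($event['Records'] ?? [] as $record) {
        // Kinesis puts the record payload, base64-encoded, in kinesis.data.
        $doc = json_decode(base64_decode($record['kinesis']['data']), true);
        if (!is_array($doc) || !isset($doc['id'])) {
            continue; // skip payloads that are not JSON objects with an id
        }
        // Action line: target index plus the _id we want Elasticsearch to use.
        $lines[] = json_encode(['index' => ['_index' => $index, '_id' => (string) $doc['id']]]);
        // Source line: the document itself (the id also stays in the source).
        $lines[] = json_encode($doc);
    }
    // The bulk API expects newline-delimited JSON terminated by a newline.
    return $lines ? implode("\n", $lines) . "\n" : '';
}

// Usage with a hand-built sample event:
$event = ['Records' => [
    ['kinesis' => ['data' => base64_encode('{"id":"abc-123","title":"hello"}')]],
]];
echo buildBulkBody($event, 'my-index');
```

The resulting string would then be sent to the cluster's _bulk endpoint, for example with the official elasticsearch-php client's bulk() method or a plain HTTP POST with Content-Type application/x-ndjson. On an AWS-hosted domain the request typically also has to be signed (SigV4) unless the domain's access policy allows it otherwise. Because you choose the _id, re-sending the same record overwrites the existing document instead of creating a duplicate.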