19
votes

I'm building an app that is constantly appending to a buffer while many readers consume from this buffer independently (write-once-read-many / WORM). At first I thought of using Apache Kafka, but as I prefer an as-a-service option I started investigating AWS Kinesis Streams + KCL and it seems I can accomplish this task with them.

Basically I need 2 features: ordering (the events must be read in the same order by all readers) and the ability to choose the offset in the buffer from where the reader starts consuming onwards.

Now I'm also evaluating Google Cloud Platform. As I am reading the documentation it seems that Google Pub/Sub is suggested as the equivalent to AWS Kinesis Stream, but at a more detailed level these products seem a lot different:

  • Kinesis guarantees ordering inside a shard, while on Pub/Sub ordering is on a best-effort basis;
  • Kinesis makes the whole buffer (retained for at most 7 days) available to readers, which can use an offset to choose where to start reading, while on PubSub only the messages published after the subscription was created are available for consumption.

If I got it right, PubSub cannot be considered a Kinesis equivalent. Perhaps if used together with Google Dataflow? I must confess that I still can't see how.

So, is PubSub an alternative to Kinesis? If not, is there a Google Cloud Product that would fulfill my requirements?

Thanks!

2
That is what I could see as well. PubSub + Dataflow is (approximately) not equivalent to Kinesis. While I have used Kinesis extensively, I don't see such documentation or functionality around PubSub and Dataflow. They might be a bit far apart. - Kannaiyan
The post at cloud.google.com/blog/big-data/2016/09/… just made me a little more confused. It implies (subtly) that PubSub is an alternative to Kafka, but I still don't see the same capabilities. - Renan
With Pub/Sub you need to add the ordering information in the message payload. This may or may not be an issue with your application. - gdahlm

2 Answers

8
votes

A rather convoluted solution but it might help:

  • Push your events through Pub/Sub to a single topic. At this point they will be unordered.
  • Create a Cloud Dataflow streaming pipeline that reads from the Pub/Sub topic and does streaming writes to BigQuery, adding a timestamp to each table entry.
  • Have your readers query the BigQuery table, ordering by timestamp to get a consistent order. You can use ROW_NUMBER as your offset.
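
To make the last step concrete, here is a minimal sketch of the "order by timestamp, use ROW_NUMBER as the offset" idea. It runs against an in-memory SQLite table as a stand-in for BigQuery (SQLite 3.25+ supports the same window function); the `events` table, its `ts` and `payload` columns, and the `read_from` helper are all hypothetical names chosen for illustration.

```python
import sqlite3

# Stand-in for the BigQuery table; `events`, `ts` and `payload` are hypothetical names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ts REAL, payload TEXT)")

# Rows are inserted out of order, as they may arrive from an unordered topic.
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(3.0, "c"), (1.0, "a"), (4.0, "d"), (2.0, "b")],
)


def read_from(conn, offset, limit=100):
    """Return (position, payload) pairs starting at a 1-based offset."""
    query = """
        SELECT pos, payload FROM (
            SELECT ROW_NUMBER() OVER (ORDER BY ts) AS pos, payload
            FROM events
        )
        WHERE pos >= ?
        ORDER BY pos
        LIMIT ?
    """
    return conn.execute(query, (offset, limit)).fetchall()


first_reader = read_from(conn, 1)  # starts at the beginning of the buffer
late_reader = read_from(conn, 3)   # starts at offset 3, sees the same order
```

Note that offsets computed this way are only stable if timestamps are unique and rows never arrive with a timestamp earlier than one already read; a tiebreaker column in the ORDER BY would be needed otherwise.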

Hope that helps.

2
votes

Pub/Sub now supports ordering natively, through ordering keys. As for the requirement that a subscription (roughly a consumer group in Kafka) exist before you consume, it's very rarely a problem for users. If nothing else, you can create snapshots, which allow you to reset a new subscription to the state of any other existing subscription.
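
As a rough sketch of what native ordering looks like with the `google-cloud-pubsub` Python client: I am assuming here that "ordering" refers to ordering keys, and the helper names, the topic path, and the `"stream-0"` key are hypothetical. The publisher is passed in as a parameter, so the helper can be tried with a stub before pointing it at a real project.

```python
def make_ordered_publisher():
    """Build a real publisher with ordering enabled (needs google-cloud-pubsub)."""
    # Imported lazily so this sketch loads even without the library installed.
    from google.cloud import pubsub_v1

    return pubsub_v1.PublisherClient(
        publisher_options=pubsub_v1.types.PublisherOptions(
            enable_message_ordering=True
        )
    )


def publish_in_order(publisher, topic_path, events, ordering_key="stream-0"):
    """Publish byte payloads under one ordering key and return their message IDs."""
    futures = [
        publisher.publish(topic_path, data=event, ordering_key=ordering_key)
        for event in events
    ]
    # Each future resolves to the server-assigned message ID.
    return [future.result() for future in futures]


# Hypothetical usage against a real topic:
# publisher = make_ordered_publisher()
# publish_in_order(publisher, "projects/my-project/topics/my-topic", [b"a", b"b"])
```

Ordering applies per key, and the subscription also has to be created with message ordering enabled for readers to receive messages in publish order.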

This is a bit late, but @Renan, if you are still watching, I would love to hear how you ended up building your system.