1 vote

I am using Pub/Sub (push) and Cloud Run, where I will deploy a Java application built with Spring Boot.

I have two cases. Let's say I have Service A running on Cloud Run with 10 containers/instances due to high load. I want to:

  • push a message (from a Cloud Function) to all containers of Service A (broadcast)
  • push a message (from a Cloud Function) to a single arbitrary container of Service A

Background: My Cloud Run service will be using server-sent events (SSE) to push data directly to the client/browser. This of course means the containers/instances will keep state. There are cases where I need to push a message to all SSE/WebSocket connections on all containers (imagine a chat application with a public chat room, where everyone can see a published message). Since there is no way for the containers in Cloud Run to know about or see each other (I assume), I figured the right way to solve this is with Pub/Sub.

Please point me in the right direction if there are tools that suit this situation better.


4 Answers

1 vote

You can only push to the service endpoint (URL), not to individual instances of that service.

There is only one container per Cloud Run service. You control how many instances are created via the maximum number of concurrent requests per instance (concurrency).

Cloud Run instances are created and destroyed dynamically based on traffic. Pub/Sub is a subscription-based service: each subscriber receives one copy of the message. You would be looking at X copies of the same message at one point in time and Y copies at another, which violates the Pub/Sub model of message delivery.

1 vote

Cloud Run instances are independent and, as you said, they can't see or know about each other. In addition, the Cloud Run contract is to be stateless, so instances can't hold state and have it updated by a pushed message.

Instances can be active or inactive (processing requests or not), and if you have 10 currently active instances, there may be 20 or 30 instances provisioned in advance (started) to absorb a traffic increase (if one happens).

All of this is to say that your design is wrong. You shouldn't rely on state in Cloud Run instances or plan to update it by push.

You need to store the state externally, in Memorystore or Firestore, and fetch the data on each request, for example.
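As a sketch of that pattern: the `StateStore` interface and its in-memory stand-in below are hypothetical; production code would back the interface with Memorystore (Redis) or Firestore. The point is that the handler fetches state on every request instead of caching it in the instance, so any instance can serve any request.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: instances stay stateless and read shared state per request.
public class SharedState {

    // Hypothetical abstraction over Memorystore/Firestore.
    interface StateStore {
        Optional<String> get(String key);
        void put(String key, String value);
    }

    // In-memory stand-in for the external store, for illustration only.
    static class InMemoryStore implements StateStore {
        private final Map<String, String> data = new ConcurrentHashMap<>();
        public Optional<String> get(String key) { return Optional.ofNullable(data.get(key)); }
        public void put(String key, String value) { data.put(key, value); }
    }

    // A request handler fetches state per request rather than keeping it
    // in the instance, so scaling up/down loses nothing.
    static String handleRequest(StateStore store, String room) {
        return store.get("room:" + room).orElse("(no messages yet)");
    }
}
```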

0 votes

If you want to use Pub/Sub for this, you will have to create a subscription for each Cloud Run instance. You can do this while your program boots, but then you will have to clean up that subscription when your program exits, or run a job that periodically deletes subscriptions with a large backlog of unacked messages.
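A minimal sketch of that per-instance naming: `chat-broadcast` is a hypothetical topic name, and the actual create/delete calls (via the google-cloud-pubsub admin client) are omitted. The predictable prefix is what lets a cleanup job later find and delete stale subscriptions.

```java
import java.util.UUID;

// Sketch: derive a unique, discoverable subscription name per Cloud Run
// instance. The UUID suffix keeps instances from sharing a subscription;
// the fixed prefix lets a reaper job find orphaned ones by name before
// checking their unacked-message backlog.
public class InstanceSubscription {

    static final String PREFIX = "chat-broadcast-instance-"; // hypothetical naming scheme

    // One name per process lifetime; created at boot, deleted at shutdown.
    static String newSubscriptionName() {
        return PREFIX + UUID.randomUUID();
    }

    // Cleanup job: match candidates by prefix.
    static boolean isInstanceSubscription(String name) {
        return name.startsWith(PREFIX);
    }
}
```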

I would suggest you look into NATS subjects (https://docs.nats.io/nats-concepts/subjects); this does exactly what you want to do.

Also, since you have mentioned that this is going to be a stateful service, it may be better to use App Engine than Cloud Run, as a Cloud Run instance is only guaranteed to stay active while it is handling active connections. If a connection breaks for some reason, you may lose the container along with its state.

0 votes

Cloud Run supports accepting WebSocket connections. While those connections aren't permanent, long-lived connections (they have a 15-minute timeout in GA and up to a 60-minute timeout in beta), they do prevent Google from terminating a container instance as long as at least one WebSocket connection is alive in that container. You can have up to 250 WebSocket connections (or, in general, 250 concurrent HTTP connections) per container.

This means you can make your Java application subscribe to a Google Pub/Sub topic as soon as it starts up and wait for Pub/Sub messages, which are then relayed to any (or all) WebSocket clients connected to that particular Cloud Run instance.

Google Cloud Pub/Sub supports a one-to-many subscription pattern, so you can have a single message published to a Pub/Sub topic delivered to all subscribers, which in this case will be each individual Cloud Run container instance that has active WebSocket connections.

  1. The Java app subscribes to the Pub/Sub topic when it starts up.
  2. The Java app accepts WebSocket connections.
  3. The Java app relays messages from its Pub/Sub subscription to the corresponding clients, based on the message body and your filtering logic.
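The relay in step 3 could be sketched as follows. This is an assumption-heavy sketch: the Pub/Sub `Subscriber` and the Spring WebSocket wiring are omitted, `Consumer<String>` stands in for a session's send method, and routing messages by a room name is a hypothetical convention for the filtering logic.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Sketch: in-process fan-out from a received Pub/Sub message to the
// WebSocket sessions held by this one container instance.
public class RelayHub {

    private final Map<String, List<Consumer<String>>> sessionsByRoom =
            new ConcurrentHashMap<>();

    // Called when a WebSocket connects and declares which room it wants.
    public void register(String room, Consumer<String> sendToClient) {
        sessionsByRoom
                .computeIfAbsent(room, r -> new CopyOnWriteArrayList<>())
                .add(sendToClient);
    }

    // Called from the Pub/Sub message receiver: route by room, then fan
    // out to every local session that belongs to that room.
    public void relay(String room, String payload) {
        List<Consumer<String>> sessions = sessionsByRoom.get(room);
        if (sessions == null) {
            return; // no local clients for this room; other instances may have some
        }
        for (Consumer<String> send : sessions) {
            send.accept(payload);
        }
    }
}
```

In a real service, `register` would also need an unregister path tied to the WebSocket close event, so dropped connections don't leak.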

So your design is feasible with Google Cloud Run (now that it supports WebSockets) and Google Cloud Pub/Sub. I do have some concerns, so I am putting them here.

My first concern is the 15-minute (60 minutes in beta) HTTP timeout that Google imposes on Cloud Run, which means your clients' WebSocket connections will be dropped after that threshold and you will need to handle reconnection. In that reconnection window, some messages can be lost, so it would be difficult to achieve 100% guaranteed message delivery.

My second concern (which you can probably worry about much further down the road) is that, due to the one-to-many fan-out architecture of Pub/Sub, a single message published to the topic will be relayed to all subscribers, meaning all Cloud Run container instances will receive it. If that message is meant for just one WebSocket in one of many containers, it is a waste of CPU/network resources (cost), and this problem only gets bigger when many Cloud Run containers are running at the same time with a large message volume. Of course, you can create a topic per container or per "chatroom", but that increases complexity, and I believe Google imposes limits on the number of topics you can have, as well as TPS limits on admin operations.

You might also want to take a look at Redis Pub/Sub, which allows you to subscribe to specific channels with no topic create/destroy overhead. You could technically create a channel for each user, or each "chatroom", and have your Java app subscribe to channels based on the connected WebSockets' interests. This may solve the second concern above, as each container instance will only receive messages relevant to it... but the tradeoff of this approach is that your Redis instance can become a bottleneck.
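That per-channel bookkeeping might be sketched like this. The Redis calls are left as hypothetical hooks (a real implementation would plug in a client such as Lettuce or Jedis there); the substance is the reference count, which keeps the container subscribed to a room's channel only while at least one local WebSocket cares about it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: subscribe to a room's Redis channel on the first local client,
// unsubscribe on the last, so this container never receives messages
// for rooms none of its WebSocket clients are in.
public class RoomSubscriptions {

    private final Map<String, Integer> localClientsPerRoom = new ConcurrentHashMap<>();

    // Hypothetical hooks; a real implementation would call the Redis client.
    protected void redisSubscribe(String channel) { }
    protected void redisUnsubscribe(String channel) { }

    public synchronized void clientJoined(String room) {
        int count = localClientsPerRoom.merge(room, 1, Integer::sum);
        if (count == 1) {
            redisSubscribe("room:" + room); // first local client: start listening
        }
    }

    public synchronized void clientLeft(String room) {
        Integer count = localClientsPerRoom.computeIfPresent(room, (r, c) -> c - 1);
        if (count != null && count == 0) {
            localClientsPerRoom.remove(room);
            redisUnsubscribe("room:" + room); // last local client: stop listening
        }
    }

    public boolean isSubscribed(String room) {
        return localClientsPerRoom.containsKey(room);
    }
}
```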