1
votes

My company is interested in using a stand-alone Service Fabric cluster to manage communications with robots. In our scenario, each robot would host its own rosbridge server, and our Service Fabric application would maintain WebSocket clients to each robot. I envision a stateful service partitioned along device ids which opens connections on startup. It should monitor connection health via heartbeats, pass messages from the robots to some protocol gateway service, and listen to other services for messages to pass to the robots.

I have not seen discussion of this style of external communications in the Service Fabric documentation - I cannot tell if this is because:

  • There are no special considerations for managing WebSockets (or any two-way network protocol) this way from Service Fabric. I've seen no discussion of restrictions and see no reason, conceptually, why I can't do this. I originally thought replication would be problematic (duplicate messages?), but since only one replica can be primary at any time this appears to be a non-issue.
  • Service Fabric is not well-suited to bi-directional communication with external devices

I would appreciate some guidance on whether this architecture is feasible. If not, discussion on why it won't work will be helpful. General discussion of limitations around bi-directional communication between Service Fabric services and external devices is welcome. I would prefer if we could keep discussion to stand-alone clusters - we have no plans to use Azure services at this time.

2
Do you have any specific reason to use Service Fabric instead of WebApp?Alex Riabov
We are interested in using reliable collections for storage in some cases. We like the affordances of scalability and reliability on Service Fabric. Although not using Azure at this time, we would eventually like to sell projects hosted in Azure. We like Service Fabric since we can choose between cloud and standalone hosting models while maintaining similar deployment processes.abarger

2 Answers

1
votes

Any particular reason you want SF to host the client and not the other way around?

Doing the way you suggest, I think you will face big challenges to make SF find these devices on your network and keep track of them, for example, Firewall, IPs, NAT, planned maintenance, failures, connection issues, unless you are planning to do it by hand.

From the brief description I saw in the docs your provided about rosbridge server, I could understand that you have to host it on a Server(like you would with a service fabric service) and your devices would connect to it, in this case, your devices would have installed the ROS to make this communication.

Regarding your concerns about the communication, service fabric services are just executable programs you would normally run on your local machine, if it works there will likely work on service fabric environment on premise, the only extra care you have to worry is the external access to the cluster(if in azure or network configurations) and service discovery.

In my point of view, you should use SF as the central point of communication, and each device would connect to SF services. The other approach would be using Azure IoT Hub to bridge the communication between both. There is a nice Iot Hub + Service Fabric Sample that might be suitable for your needs.

Because you want to avoid Azure, you could in this case replace IoT Hub with another messaging platform or implement the rosbridge in your service to handle the calls.

1
votes

I hope I understood everything right.

About the obstacles:

I think the major issue here is that bi-directional connection can be established between service replica and the robot.

This has two major problems:

  1. Only primary replica has write access - i.e. only one replica would be able to modify state. This issue hence could be mitigated by creating a separate partition for each robot (but please remember that you can't change partition count after the service was created) or by creating a separate service instance for each robot (this would allow you to dynamically add or remove robots but would require additional logic related to service discoverability).
  2. The replica can be shutdown (terminated), moved to another node (shutdown and start of new replica) or even demoted (the primary replica get's demoted to secondary and another secondary replica get's promoted to primary) by various reasons. So the service code and robot communication code should be able to handle this.

About WebSockets

This looks possible by implementing custom ICommunicationListener and other things using WebSockets.