What are the pros and cons of using a remote JanusGraph connection instead of embedded mode?

Question

I am using embedded JanusGraph in my Java backend. My code depends on a JanusGraph instance created with graph = JanusGraphFactory.open(conf).

AFAIK, this connects directly to Cassandra and Elasticsearch and runs the JanusGraph query processing inside my backend application's JVM. But if I want to scale JanusGraph, I need to run separate JanusGraph servers on a cluster and connect to them as a client from my backend.
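For reference, the embedded setup described above can be sketched as follows. This is a minimal sketch, assuming janusgraph-cql and janusgraph-es (with their dependencies) are on the classpath; the class name, the helper methods and the host addresses are hypothetical. Everything except the storage and index calls runs in the calling JVM, and the management API is available directly on the graph object.

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;
import org.janusgraph.core.schema.JanusGraphManagement;

public class EmbeddedJanusGraphExample {

    // Hypothetical helper: settings that point the embedded instance at
    // Cassandra (via the CQL backend) and Elasticsearch.
    static Map<String, Object> embeddedConfig(String cassandraHost, String elasticsearchHost) {
        Map<String, Object> conf = new LinkedHashMap<>();
        conf.put("storage.backend", "cql");
        conf.put("storage.hostname", cassandraHost);
        conf.put("index.search.backend", "elasticsearch");
        conf.put("index.search.hostname", elasticsearchHost);
        return conf;
    }

    // Opens the graph inside this JVM; no Gremlin Server is involved.
    static JanusGraph open(Map<String, Object> conf) {
        JanusGraphFactory.Builder builder = JanusGraphFactory.build();
        for (Map.Entry<String, Object> e : conf.entrySet()) {
            builder.set(e.getKey(), e.getValue());
        }
        return builder.open();
    }

    public static void main(String[] args) throws Exception {
        JanusGraph graph = open(embeddedConfig("127.0.0.1", "127.0.0.1"));
        try {
            // Traversals are compiled and executed locally.
            GraphTraversalSource g = graph.traversal();
            System.out.println("vertices: " + g.V().count().next());

            // The management API is a plain method call in embedded mode.
            JanusGraphManagement mgmt = graph.openManagement();
            System.out.println("'name' key defined: " + mgmt.containsPropertyKey("name"));
            mgmt.rollback(); // read-only inspection, nothing to commit
        } finally {
            graph.close();
        }
    }
}
```

Keeping the settings in a plain map is only a convenience so they can come from your own configuration source; passing a properties file path to JanusGraphFactory.open works just as well.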

According to the remote JanusGraph example on GitHub, this is done by instantiating EmptyGraph graph = EmptyGraph.instance(); which is not an instance of JanusGraph but of org.apache.tinkerpop.gremlin.structure.util.empty.EmptyGraph.
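A minimal sketch of the remote setup, assuming TinkerPop 3.4 or later (where AnonymousTraversalSource.traversal() is the usual replacement for EmptyGraph.instance().traversal()), gremlin-driver on the classpath, and a JanusGraph Server on localhost:8182 that binds its traversal source as "g"; the host, port, source name and class name are assumptions. The local g is only a traversal builder: each traversal is serialized as bytecode and executed on the server, so there is no JanusGraph object on the client.

```java
import static org.apache.tinkerpop.gremlin.process.traversal.AnonymousTraversalSource.traversal;

import org.apache.tinkerpop.gremlin.driver.Cluster;
import org.apache.tinkerpop.gremlin.driver.remote.DriverRemoteConnection;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;

public class RemoteJanusGraphExample {

    // Assumed name of the traversal source bound by the server's startup script.
    static final String TRAVERSAL_SOURCE = "g";

    public static void main(String[] args) throws Exception {
        Cluster cluster = Cluster.build()
                .addContactPoint("localhost")
                .port(8182)
                .create();
        try {
            // Older TinkerPop: EmptyGraph.instance().traversal().withRemote(...)
            GraphTraversalSource g = traversal()
                    .withRemote(DriverRemoteConnection.using(cluster, TRAVERSAL_SOURCE));

            // Sent as bytecode and executed inside JanusGraph Server.
            long vertices = g.V().count().next();
            System.out.println("vertices: " + vertices);

            // Only the Gremlin API is available here; there is no openManagement().
            g.close();
        } finally {
            cluster.close();
        }
    }
}
```

Depending on the serializer you use, JanusGraph-specific types such as edge IDs may also require registering JanusGraphIoRegistry with the driver; check the JanusGraph documentation for your version.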

From the example above, I understand that I can only run Gremlin queries by submitting them to the JanusGraph server, and that I cannot use the management API directly unless I submit the code to the server as a string.
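Submitting management code as a string can look roughly like this. It is a sketch, assuming the server binds its JanusGraph instance to the variable graph (as the stock JanusGraph Server configurations do) and accepts Groovy scripts; the property key "name", the binding keyName and the class name are hypothetical. Values are passed as parameters rather than concatenated into the script.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.tinkerpop.gremlin.driver.Client;
import org.apache.tinkerpop.gremlin.driver.Cluster;
import org.apache.tinkerpop.gremlin.driver.Result;

public class RemoteSchemaExample {

    // Groovy evaluated on the server; `graph` is the server-side JanusGraph,
    // `keyName` is supplied as a request binding. The last line is the result.
    static final String CREATE_KEY_SCRIPT =
            "mgmt = graph.openManagement()\n"
          + "if (mgmt.getPropertyKey(keyName) == null) {\n"
          + "  mgmt.makePropertyKey(keyName).dataType(String.class).make()\n"
          + "}\n"
          + "mgmt.commit()\n"
          + "keyName";

    public static void main(String[] args) throws Exception {
        Cluster cluster = Cluster.build().addContactPoint("localhost").port(8182).create();
        Client client = cluster.connect();
        try {
            Map<String, Object> params = new HashMap<>();
            params.put("keyName", "name"); // hypothetical property key
            List<Result> results = client.submit(CREATE_KEY_SCRIPT, params).all().get();
            System.out.println("ensured key: " + results.get(0).getString());
        } finally {
            client.close();
            cluster.close();
        }
    }
}
```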

Finally, I understand that running JanusGraph Server separately is better for scalability, but I will lose direct access to the JanusGraph APIs in my code. Is there something I misunderstand? What are the pros and cons of the remote deployment approach, and what do I lose compared with the embedded approach?

Edit:

According to this answer (please correct it if it is wrong):

Pros/Cons of connecting to the remote Gremlin Server

Pros

The server has much more control, and all queries are centralized.
Since everyone runs traversals/queries through the remote Gremlin Server, all of them are transactionally protected: by default, the remote Gremlin Server runs each traversal/query in a transaction.
Central strategy management
Central schema management

Cons

Manual transaction management is harder.
You have to send your code to the remote server as a Groovy script string (Client.submit) to have it executed transactionally.
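The "script as a string" approach can be sketched as below: several writes travel in a single sessionless request, which Gremlin Server runs in one transaction, committing when the script succeeds and rolling back if it throws. This assumes the same server setup as a default JanusGraph Server (the graph binding, Groovy scripts on localhost:8182); the labels, property values, bindings and class name are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.tinkerpop.gremlin.driver.Client;
import org.apache.tinkerpop.gremlin.driver.Cluster;
import org.apache.tinkerpop.gremlin.driver.Result;

public class RemoteScriptTransactionExample {

    // One request = one server-managed transaction: all three writes
    // are committed together, or none of them if any line fails.
    static final String ADD_FRIENDS_SCRIPT =
            "a = graph.addVertex('person'); a.property('name', nameA)\n"
          + "b = graph.addVertex('person'); b.property('name', nameB)\n"
          + "a.addEdge('knows', b)\n"
          + "'ok'";

    public static void main(String[] args) throws Exception {
        Cluster cluster = Cluster.build().addContactPoint("localhost").port(8182).create();
        Client client = cluster.connect(); // sessionless client
        try {
            Map<String, Object> params = new HashMap<>();
            params.put("nameA", "alice");
            params.put("nameB", "bob");
            List<Result> results = client.submit(ADD_FRIENDS_SCRIPT, params).all().get();
            System.out.println(results.get(0).getString()); // the script's last value
        } finally {
            client.close();
            cluster.close();
        }
    }
}
```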

Hi, do you have an example of how to do transaction management with remote JanusGraph? - Serhii Zadorozhnyi
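One way to manage a transaction across several requests is a Gremlin Server session, sketched below under the same assumptions as a default JanusGraph Server (Groovy scripts, graph binding, localhost:8182). Inside a session the server does not commit for you; variables such as a and b stay bound between requests, and the work is only persisted when the script calls graph.tx().commit(). The statements, labels and class name are hypothetical. Newer TinkerPop versions (3.5+) also offer g.tx() for remote bytecode traversals; check whether your JanusGraph version supports it.

```java
import java.util.UUID;

import org.apache.tinkerpop.gremlin.driver.Client;
import org.apache.tinkerpop.gremlin.driver.Cluster;

public class RemoteSessionTransactionExample {

    // Sent one by one; the session keeps a single open transaction
    // (and the bindings `a` and `b`) until the final commit.
    static final String[] STATEMENTS = {
        "a = graph.addVertex('person'); a.property('name', 'alice'); null",
        "b = graph.addVertex('person'); b.property('name', 'bob'); null",
        "a.addEdge('knows', b); null",
        "graph.tx().commit()"
    };

    public static void main(String[] args) throws Exception {
        Cluster cluster = Cluster.build().addContactPoint("localhost").port(8182).create();
        // Passing a session id makes this a sessioned client.
        Client session = cluster.connect(UUID.randomUUID().toString());
        try {
            for (String statement : STATEMENTS) {
                session.submit(statement).all().get();
            }
        } catch (Exception e) {
            // Discard everything written in this session so far.
            session.submit("graph.tx().rollback()").all().get();
            throw e;
        } finally {
            session.close();
            cluster.close();
        }
    }
}
```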

Bishnu Bishnu · Accepted Answer · 2020-05-18T11:57:28

The pros and cons listed above are correct. In addition, here is what I have learned:

With the Gremlin Server approach, the architecture looks, from the user's point of view, like an additional web-server layer (an extra cost) sitting in front of the storage system. These Gremlin Servers have to be scaled up and down manually based on the load; otherwise they become the bottleneck of the entire system.
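On the client side, a pool of Gremlin Servers can be listed as several contact points, as in this sketch; the driver's default load-balancing strategy spreads requests across the hosts round-robin. The host names, port, traversal source name "g" and class name are assumptions. Note that scaling out still means starting another server and adding it here (or putting the servers behind a load balancer), which is the manual work described above.

```java
import static org.apache.tinkerpop.gremlin.process.traversal.AnonymousTraversalSource.traversal;

import org.apache.tinkerpop.gremlin.driver.Cluster;
import org.apache.tinkerpop.gremlin.driver.remote.DriverRemoteConnection;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;

public class GremlinServerPoolExample {

    // Hypothetical JanusGraph Server hosts; they must resolve when the Cluster is built.
    static final String[] SERVERS = {"janus-1.internal", "janus-2.internal", "janus-3.internal"};

    public static void main(String[] args) throws Exception {
        Cluster cluster = Cluster.build()
                .addContactPoints(SERVERS) // requests are distributed across these hosts
                .port(8182)
                .create();
        try {
            GraphTraversalSource g = traversal()
                    .withRemote(DriverRemoteConnection.using(cluster, "g"));
            System.out.println("vertices: " + g.V().count().next());
            g.close();
        } finally {
            cluster.close();
        }
    }
}
```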

In embedded mode, you have a storage system (say, Cassandra) and your own application, which interacts with it through TinkerPop Gremlin. You don't have to maintain Gremlin Servers; your program/client talks to the storage servers directly.

For example, consider data loading via Apache Spark: once you run a job with more executors, the Gremlin Servers must be capable of handling the extra load.
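The capacity concern can be made concrete with a back-of-the-envelope estimate. The sizing rule below is a hypothetical simplification, not a JanusGraph or Spark formula: it assumes every executor keeps a fixed number of requests in flight and that each Gremlin Server can handle a fixed number of concurrent requests, then rounds the ratio up.

```java
public class GremlinServerSizing {

    // Hypothetical rule: servers = ceil(executors * requestsPerExecutor / requestsPerServer).
    static int serversNeeded(int executors, int requestsPerExecutor, int requestsPerServer) {
        if (requestsPerServer <= 0) {
            throw new IllegalArgumentException("requestsPerServer must be positive");
        }
        long demand = (long) executors * requestsPerExecutor;
        // Integer ceiling division.
        return (int) ((demand + requestsPerServer - 1) / requestsPerServer);
    }

    public static void main(String[] args) {
        // Hypothetical numbers: 50 executors x 4 in-flight requests = 200,
        // at 64 concurrent requests per server -> ceil(3.125) = 4 servers.
        System.out.println(serversNeeded(50, 4, 64));
    }
}
```

Doubling the executors to 100 under the same assumptions raises the estimate to 7 servers, which is why the Gremlin Server tier has to be resized together with the Spark job.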
