
I am working on an import runner for a new graph database.

It needs to work with:

  • Amazon Neptune - a Gremlin implementation with great infrastructure support in production, but a pain to work with locally, and it does not support Cypher. No visualization tool provided.

  • JanusGraph - easy to work with locally as a Gremlin implementation, but requires heavy investment to support in production, hence the use of Amazon Neptune. No visualization tool provided.

  • Neo4j - excellent visualization tool, and the Cypher language feels very familiar; it even works with Gremlin clients. However, it requires heavy investment to support in production, and there appears to be no visualization tool for Gremlin implementations that is anywhere near as good as the one in Neo4j.

So I am creating a graph in which each Entity (Node/Vertex) has multiple Types (Labels), some of which are orthogonal to each other and some multi-dimensional.

For example, an Entity representing an order made online would be labeled as Order, Online, Spend, Transaction.

             | Spend       Chargeback
----------------------------------------
 Transaction | Purchase    Refund
 Line        | Sale        Return

Zooming into the Spend column.

          | Online      Instore
----------------------------------------
 Purchase | Order       InstorePurchase
 Sale     | OnlineSale  InstoreSale 

In Neo4j and its Cypher query language, this proves very powerful for creating Relationships/Edges across multiple types without explicitly knowing which transaction_id values are in the graph:

MATCH (a:Transaction), (b:Line)
WHERE a.transaction_id = b.transaction_id
MERGE (a)<-[edge:TRANSACTED_IN]-(b)
RETURN count(edge);

The problem is that Gremlin/TinkerPop does not natively support multiple Labels on a Vertex.

Some server implementations, such as AWS Neptune, support this using a delimiter, e.g. Order::Online::Spend::Transaction, and the Gremlin client also supports it against a Neo4j server, but I haven't been able to find an example of this working with JanusGraph.
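To make the delimiter idea concrete, here is a minimal Python sketch of the matching rule it implies, assuming a vertex counts as having a label when that label is any one of the ::-separated parts. The names split_labels and has_label are hypothetical illustrations, not part of Neptune, JanusGraph, or TinkerPop.

```python
# Hypothetical helpers (not a Neptune or TinkerPop API) that model a
# "::"-delimited label string such as Order::Online::Spend::Transaction.
LABEL_DELIMITER = "::"


def split_labels(label: str) -> frozenset:
    """Split a delimited label string into its individual labels."""
    return frozenset(part for part in label.split(LABEL_DELIMITER) if part)


def has_label(label: str, wanted: str) -> bool:
    """Assumed matching rule: the vertex matches if `wanted` is any one of its labels."""
    return wanted in split_labels(label)


order_label = "Order::Online::Spend::Transaction"
print(sorted(split_labels(order_label)))  # ['Online', 'Order', 'Spend', 'Transaction']
print(has_label(order_label, "Transaction"))  # True
print(has_label(order_label, "Line"))  # False
```

JanusGraph stores a single label string per vertex, so it has no equivalent splitting rule; this is the behaviour that would otherwise have to be emulated with properties.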

Ultimately, I need to be able to run a Gremlin query equivalent to the Cypher one above:

g
  .V().hasLabel("Line").as("b")
  .V().hasLabel("Transaction").as("a")
  .where("b", eq("a")).by("transaction_id")
  .addE("TRANSACTED_IN").from("b").to("a")
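The two queries are not quite equivalent. Cypher's MERGE first looks for an existing TRANSACTED_IN relationship and creates one only if none exists, whereas Gremlin's addE() always creates a new edge, so rerunning the traversal duplicates edges unless it is guarded (for example with a coalesce()-based existence check). The minimal Python sketch below models only that difference on hypothetical in-memory data; Vertex, matching_pairs, merge_edges, and add_edges are illustrative names, not part of any graph database API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vertex:
    vid: int
    labels: frozenset
    transaction_id: str


def matching_pairs(vertices):
    """Yield (line, transaction) pairs that share a transaction_id, mirroring
    MATCH (a:Transaction), (b:Line) WHERE a.transaction_id = b.transaction_id."""
    lines = [v for v in vertices if "Line" in v.labels]
    transactions = [v for v in vertices if "Transaction" in v.labels]
    for b in lines:
        for a in transactions:
            if a.transaction_id == b.transaction_id:
                yield b, a


def merge_edges(edges: set, vertices) -> set:
    """MERGE-like: an edge (from, label, to) is only created if it is absent."""
    for b, a in matching_pairs(vertices):
        edges.add((b.vid, "TRANSACTED_IN", a.vid))
    return edges


def add_edges(edges: list, vertices) -> list:
    """addE-like: a new edge is appended for every match, even if one exists."""
    for b, a in matching_pairs(vertices):
        edges.append((b.vid, "TRANSACTED_IN", a.vid))
    return edges


# Hypothetical sample data using the label scheme from the tables above.
VERTICES = [
    Vertex(1, frozenset({"Order", "Online", "Spend", "Transaction"}), "t1"),
    Vertex(2, frozenset({"OnlineSale", "Online", "Spend", "Line"}), "t1"),
    Vertex(3, frozenset({"OnlineSale", "Online", "Spend", "Line"}), "t1"),
    Vertex(4, frozenset({"Refund", "Chargeback", "Transaction"}), "t2"),
]

merged = merge_edges(merge_edges(set(), VERTICES), VERTICES)  # run twice
print(sorted(merged))  # [(2, 'TRANSACTED_IN', 1), (3, 'TRANSACTED_IN', 1)]

appended = add_edges(add_edges([], VERTICES), VERTICES)  # run twice
print(len(appended))  # 4
```

For an import runner that may replay a step, the idempotent behaviour is usually the one you want.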

So there are multiple questions here:

  1. Is there a way to make JanusGraph accept multiple vertex labels?
  2. If not possible, or this is not the best approach, should there be an additional vertex property containing a list of labels?
  3. In the case of option 2, should the label name be the high-level label (Transaction) or the low-level label (Order)?
Just a quick remark on your 'no visualization tool provided' for Amazon Neptune: there is a graph database browser product available on the AWS Marketplace for use against a Neptune instance. - James Render
I'm using Amazon Neptune. When working locally, I spin up a Gremlin Server using Docker. Once a graph traversal works against that, I try it against Neptune. In several months of doing this, I think I've had only one issue, which was resolved by updating the version of Neptune I was using. - James Render

1 Answer


Is there a way to make JanusGraph accept multiple vertex labels?

No, there is not a way to have multiple vertex labels in JanusGraph.

If not possible, or this is not the best approach, should there be an additional vertex property containing a list of labels?

In the case of option 2, should the label name be the high-level label (Transaction) or the low-level label (Order)?

I'll answer these two together. Based on what you have described, I would create a single label, probably named Transaction, and store the other distinctions as properties, such as Location (Online or InStore) and Type (Purchase, Refund, Return, Chargeback, etc.). From your description, you are really talking about a single entity, a Transaction, and all the other items you are using as labels (Online/InStore, Spend/Chargeback) are just additional metadata about how that Transaction occurred. With this approach you can filter on one or more of these attributes to achieve anything you could do with the multiple labels you are using in Neo4j.
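A minimal Python sketch of that idea, assuming property names location and type (the Location and Type suggested above) and hypothetical in-memory data rather than a real JanusGraph connection. In Gremlin the same filter would be written along the lines of g.V().hasLabel('Transaction').has('location', 'Online').has('type', 'Purchase'); the has() helper here only illustrates that filtering and is not a TinkerPop API.

```python
# Hypothetical vertices following the single-label design: one "Transaction"
# label, with location and type stored as ordinary properties.
TRANSACTIONS = [
    {"id": 1, "label": "Transaction", "location": "Online", "type": "Purchase", "transaction_id": "t1"},
    {"id": 2, "label": "Transaction", "location": "InStore", "type": "Purchase", "transaction_id": "t2"},
    {"id": 3, "label": "Transaction", "location": "Online", "type": "Refund", "transaction_id": "t3"},
]


def has(vertices, label, **props):
    """Keep vertices with this label whose properties equal every given value,
    roughly what chaining hasLabel(label).has(key, value) does in Gremlin."""
    return [
        v for v in vertices
        if v["label"] == label and all(v.get(key) == value for key, value in props.items())
    ]


# "Order" in the question's tables is an Online Purchase.
orders = has(TRANSACTIONS, "Transaction", location="Online", type="Purchase")
print([v["id"] for v in orders])  # [1]

# Dropping a filter widens the query, like matching a higher-level label.
purchases = has(TRANSACTIONS, "Transaction", type="Purchase")
print([v["id"] for v in purchases])  # [1, 2]
```

Each former label becomes a property value, so the question's low-level names (Order, InstorePurchase, ...) become combinations of filters instead of labels that must be kept in sync.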