
Let's say we have two types of vertex: LOGIN_USER (property: user_id) and IP (property: ip). The edge between them is LOGIN (properties: session_id, login_time).

The problem with this model is that there are too many edges between one USER and one IP (there can be thousands). Is there any way to reduce the number of edges between the two vertices while still keeping the session_id and login_time properties? We need to filter on these two properties in some queries. Edge properties don't support cardinality LIST, which vertex properties do.

If we put all the edge properties on the vertex instead, does that hurt the performance of fetching the vertex? When does Titan load the properties of a vertex? When we traverse to a vertex, say with g.V(1).next(), does Titan load all of its properties?


When you say "thousands" of edges between USER and IP, do you think it could actually be "millions", "tens of millions" or more? If not, then "thousands" should not be a problem for Titan with vertex-centric indices. Index your edge properties and you should get fast ordering and traversals.
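
To make the vertex-centric index idea concrete, here is a toy in-memory model in Python, not Titan code: one vertex's LOGIN edges are kept sorted by login_time, so a time-range filter or a "most recent N logins" query becomes a binary search plus a slice instead of a scan over thousands of edges. The class and method names (`SortedAdjacency`, `between`, `latest`) are hypothetical. In Titan itself the index is declared through the management API (`buildEdgeIndex`) and the sorted layout lives in the storage backend, so treat this only as a sketch of why such an index keeps these queries cheap.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class LoginEdge:
    session_id: str
    login_time: int  # assumed to be epoch seconds


class SortedAdjacency:
    """Toy stand-in for a vertex-centric index: one vertex's LOGIN edges sorted by login_time."""

    def __init__(self):
        self._times = []  # sorted login_time keys
        self._edges = []  # edges, kept in the same order as _times

    def add(self, edge):
        # Insert where it keeps both lists sorted (O(n) in a Python list;
        # the point of this toy is the cheap read path, not the write path).
        i = bisect_right(self._times, edge.login_time)
        self._times.insert(i, edge.login_time)
        self._edges.insert(i, edge)

    def between(self, start, end):
        """Edges with start <= login_time < end, located by binary search."""
        lo = bisect_left(self._times, start)
        hi = bisect_left(self._times, end)
        return self._edges[lo:hi]

    def latest(self, n):
        """The n most recent logins, newest first (reading the index in decreasing order)."""
        return list(reversed(self._edges[-n:])) if n > 0 else []
```

For example, after adding logins at times 300, 100, 200, 500 and 400, `between(200, 450)` returns only the edges at 200, 300 and 400 without touching the others.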

When you start to get deep into "millions", you might start to experience some problems. For me, those have always come up when processing global queries with titan-hadoop, because a vertex and all of its edges must be held in memory, which can create trouble spots when you're doing global analytics. From an operational perspective, Titan was always happy to keep writing millions of edges to a single vertex, but I'd tend to avoid it. Of course, much of my experience with this came before vertex cutting in Titan 1.0:

Cutting a vertex means storing a subset of that vertex’s adjacency list on each partition in the graph. In other words, the vertex and its adjacency list is partitioned thereby effectively distributing the load on that single vertex across all of the instances in the cluster and removing the hot spot.

which you might experiment with as you start to grow supernodes into the millions.
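
A rough way to picture vertex cutting is the toy Python model below, which illustrates the idea in the quote and is not Titan's implementation: each edge of one vertex is assigned to a partition by a stable hash, so no single partition holds the whole adjacency list, while a full read of the adjacency list has to fan out to every partition and merge the pieces. The class name, the use of CRC32 as the hash and the partition count are all assumptions made for the example.

```python
import zlib
from collections import defaultdict


class CutVertex:
    """Toy model of a cut (partitioned) vertex: its adjacency list is split across partitions."""

    def __init__(self, vertex_id, num_partitions):
        self.vertex_id = vertex_id
        self.num_partitions = num_partitions
        self._parts = defaultdict(list)  # partition index -> edges stored there

    def _partition_for(self, edge_key):
        # CRC32 instead of hash(): Python's str hash is randomized per process,
        # and an edge must always map to the same partition.
        return zlib.crc32(edge_key.encode("utf-8")) % self.num_partitions

    def add_edge(self, edge_key, payload):
        # Writes for this vertex land on whichever partition owns the edge.
        self._parts[self._partition_for(edge_key)].append((edge_key, payload))

    def partition_sizes(self):
        return [len(self._parts.get(p, [])) for p in range(self.num_partitions)]

    def all_edges(self):
        # Reading the full adjacency list means visiting every partition.
        return [e for p in range(self.num_partitions) for e in self._parts.get(p, [])]
```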

I suppose the other option for supernodes with millions of edges would be to model around them. Perhaps you introduce some structure between USER and IP: convert that single LOGIN edge into vertices and edges that introduce a time concept between them, like:

USER -> LOGIN_YEAR -> LOGIN_MONTH -> IP

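So now, instead of creating just one edge between USER and IP, you create a LOGIN_YEAR vertex and a LOGIN_MONTH vertex, which spreads the logins across many smaller vertices instead of piling them all between one USER and one IP.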
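
A minimal sketch of that bucketing, as plain Python rather than Titan code: raw LOGIN records are grouped into USER -> year -> month -> IP buckets, and a query for one month only touches that month's entries. Placing session_id and login_time on the LOGIN_MONTH -> IP edge is an assumption (the answer does not say where those properties go), `build_time_tree` and `logins_in_month` are hypothetical helper names, and timestamps are assumed to be epoch seconds interpreted in UTC.

```python
from collections import defaultdict
from datetime import datetime, timezone


def build_time_tree(logins):
    """Group LOGIN records into USER -> LOGIN_YEAR -> LOGIN_MONTH -> IP buckets.

    logins: iterable of (user_id, ip, session_id, login_time), login_time in epoch seconds.
    Returns {user_id: {year: {month: [(ip, session_id, login_time), ...]}}}.
    """
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for user_id, ip, session_id, login_time in logins:
        ts = datetime.fromtimestamp(login_time, tz=timezone.utc)
        # Each (year, month) pair stands in for one LOGIN_MONTH vertex.
        # Assumption: session_id/login_time ride on its edge to the IP vertex.
        tree[user_id][ts.year][ts.month].append((ip, session_id, login_time))
    return tree


def logins_in_month(tree, user_id, year, month):
    """Visit only the edges under one LOGIN_MONTH vertex, not the user's whole history."""
    return tree.get(user_id, {}).get(year, {}).get(month, [])
```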