1
votes

I have a titan graph with two sets of vertices, one for UserID and another for Products. The edges between the UserID and Product vertices contain the "Dates" when products were purchased. I am looking for a gremlin query that can give me a groupcount of userids by the last product purchased

For example if had data like below

UserID,Date,Product
A,2016-01-01,'Razor'
A,2016-01-02,'Toothpaste'
B,2016-02-01,'Toothpaste'
B,2016-02-02,'Razor'
C,2016-01-04,'Toothpaste'

I am looking for an output like below

Product, Count
'Toothpaste',2
'Razor',1

Would appreciate any help.

2

2 Answers

3
votes

This solution works for your example data:

g.V().hasLabel('Product').as('p').inE('Purchase').order().by('Date', decr).outV().dedup().select('p').groupCount().by('Name')

This is the algorithm:

  • Start from the products
  • Traverse to the purchase edges
  • Order the edges by date descending
  • Traverse to the users
  • Dedup the users; only the most recent edge per user will remain because of sorting
  • Jump back to the products
  • Group count by product name

Here's a Gremlin Console dump showing it in action:

gremlin> graph = TinkerGraph.open()
==>tinkergraph[vertices:0 edges:0]
gremlin> a = graph.addVertex(T.label, 'User', 'UserID', 'A')
==>v[0]
gremlin> b = graph.addVertex(T.label, 'User', 'UserID', 'B')
==>v[2]
gremlin> c = graph.addVertex(T.label, 'User', 'UserID', 'C')
==>v[4]
gremlin> r = graph.addVertex(T.label, 'Product', 'Name', 'Razor')
==>v[6]
gremlin> t = graph.addVertex(T.label, 'Product', 'Name', 'Toothpaste')
==>v[8]
gremlin> a.addEdge('Purchase', r, 'Date', new Date(2016, 0, 1))
==>e[10][0-Purchase->6]
gremlin> a.addEdge('Purchase', t, 'Date', new Date(2016, 0, 2))
==>e[11][0-Purchase->8]
gremlin> b.addEdge('Purchase', t, 'Date', new Date(2016, 1, 1))
==>e[12][2-Purchase->8]
gremlin> b.addEdge('Purchase', r, 'Date', new Date(2016, 1, 2))
==>e[13][2-Purchase->6]
gremlin> c.addEdge('Purchase', t, 'Date', new Date(2016, 0, 4))
==>e[14][4-Purchase->8]
gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:5 edges:5], standard]
gremlin> g.V().hasLabel('Product').as('p').inE('Purchase').order().by('Date', decr).outV().dedup().select('p').groupCount().by('Name')
==>[Toothpaste:2,Razor:1]
2
votes

The following query works in OLTP and OLAP and doesn't touch more vertices than necessary:

g.V().hasLabel("User").
  local(outE("purchased").order().by("date", decr).limit(1)).inV().
  groupCount().by("name")

Furthermore this query can be perfectly optimized by Titan, when you create a vertex centric index on date.