
Am I doing something wrong with this Gremlin query? Is it simply not a performant query? My 2 Node.js instances on AWS use the gremlin client, which talks over WebSockets through an AWS ELB to 2 Titan 1.0/Gremlin Server instances. The backend is DynamoDB. We now have the right read/write throughput configured for DynamoDB.

Log:

WARN org.apache.tinkerpop.gremlin.server.op.AbstractEvalOpProcessor - Exception processing a script on request [RequestMessage{, requestId=r1, op='eval', processor='', args={gremlin=

def user = g.V().has("userId", userId1).has("tenantId", tenantId).hasLabel(userLabel).next();g.V(user).outE(eIsOwnedByLabel).inV().as('path').inE(eHasAccessToLabel).or(__.has('shareToType',allType).outV().has('tenantId',tenantId).outE(eHasAccessToLabel),__.has('shareToType',groupType).outV().hasLabel(groupLabel).inE(eIsMemberOfLabel,eIsAdminOfLabel).outV().has('userId',userId).outE(eIsMemberOfLabel,eIsAdminOfLabel).inV().outE(eHasAccessToLabel),__.has('shareToType',userType).outV().hasLabel(userLabel).has('userId',userId).outE(eHasAccessToLabel)).as('role').inV().select('role','path').by('role').by('path');,

bindings={tenantId=1, userLabel=User, userId1=2, eIsOwnedByLabel=is_owned_by, eHasAccessToLabel=has_access_to, eIsMemberOfLabel=is_member_of, eIsAdminOfLabel=is_admin_of, userId=a1, groupLabel=Group, groupType=group, userType=user, allType=all}, accept=application/json, language=gremlin-groovy}}]. org.apache.tinkerpop.gremlin.process.traversal.util.FastNoSuchElementException

When we stress test, the Gremlin Server instances just stop responding and give us errors like this:

{"name":"logger","hostname":"a","pid":27881,"level":"ERROR","err":{"message":"null (Error 597)","name":"Error","stack":"Error: null (Error 597)\n at GremlinClient.handleProtocolMessage (/opt/application/sharing-app/node_modules/gremlin/lib/GremlinClient.js:204:39)\n at WebSocketGremlinConnection.<anonymous> (/opt/application/sharing-app/node_modules/gremlin/lib/GremlinClient.js:120:23)\n at emitOne (events.js:96:13)\n at WebSocketGremlinConnection.emit (events.js:188:7)\n at WebSocketGremlinConnection.handleMessage (/opt/application/sharing-app/node_modules/gremlin/lib/WebSocketGremlinConnection.js:69:12)\n at WebSocketGremlinConnection._this.ws.onmessage (/opt/application/sharing-app/node_modules/gremlin/lib/WebSocketGremlinConnection.js:46:20)\n

I tried to run a profile() locally using g.V().has("userId", '1').has("tenantId", '2').hasLabel('User').outE('is_owned_by')....:

==>Traversal Metrics

Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
TitanGraphStep([userId.eq(51ce1780-1998-47eb-a1...                     0           0         190.524    24.91
  optimization                                                                               176.456
  backend-query                                                        0                       6.074
  backend-query                                                        0                       5.067
TitanVertexStep(OUT,[is_owned_by],vertex)@[path]                       0           0           0.005     0.00
TitanVertexStep(IN,[has_access_to],edge)                               0           0         190.539    24.91
OrStep([[HasStep([shareToType.eq(all)]), Profil...                     0           0           0.012     0.00
  HasStep([shareToType.eq(all)])                                       0           0           0.000
  EdgeVertexStep(OUT)                                                  0           0           0.000
  HasStep([tenantId.eq(ndgThunderDome)])                               0           0           0.000
  TitanVertexStep(OUT,[has_access_to],edge)                            0           0           0.000
  HasStep([shareToType.eq(group)])                                     0           0           0.000
  EdgeVertexStep(OUT)                                                  0           0           0.000
  HasStep([~label.eq(Group)])                                          0           0           0.000
  TitanVertexStep(IN,[is_member_of, is_admin_of...                     0           0           0.000
  HasStep([userId.eq(a257c260-261f-45df-a1e7-92...                     0           0           0.000
  TitanVertexStep(OUT,[is_member_of, is_admin_o...                     0           0           0.000
  TitanVertexStep(OUT,[has_access_to],edge)                            0           0           0.000
  HasStep([shareToType.eq(user)])                                      0           0           0.000
  EdgeVertexStep(OUT)                                                  0           0           0.000
  HasStep([~label.eq(User)])                                           0           0           0.000
  HasStep([userId.eq(a257c260-261f-45df-a1e7-92...                     0           0           0.000
  TitanVertexStep(OUT,[has_access_to],edge)                            0           0           0.000
EdgeVertexStep(IN)                                                     0           0         190.550    24.91
SelectStep([role, path],[value(role), value(pat...                     0           0           0.021     0.00
SideEffectCapStep([~metrics])                                          1           1         193.286    25.27
                                            >TOTAL                     -           -         764.940        -

TIA

Is your question about Gremlin Server stopping or the error itself? The error itself from the server log is the error you get when a traversal returns no data. That must be happening somewhere in your script. Based on your description it's hard to say if that error is related to the server not handling further requests. My intuition says that the two likely aren't related. I think you should try to debug your Gremlin a bit to get rid of that error thus removing it as an issue related to the server hanging. - stephen mallette
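To make the comment's point concrete: a TinkerPop traversal is a java.util.Iterator, and calling next() on one that matched nothing throws (FastNoSuchElementException is a subclass of NoSuchElementException). In the posted script that would be the `.next()` that ends the user lookup. The sketch below is a plain-Java analog that runs without TinkerPop on the classpath; the class and method names are made up for illustration. In the Gremlin script itself the same guard would be a hasNext() check before next(), or tryNext(), which returns an Optional.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Optional;

// Hypothetical helper, not part of TinkerPop: shows the hasNext() guard on a plain Iterator.
final class EmptyTraversalGuard {

    // Returns the first element if one exists instead of letting next() throw.
    static <T> Optional<T> firstOrEmpty(Iterator<T> it) {
        return it.hasNext() ? Optional.of(it.next()) : Optional.empty();
    }

    public static void main(String[] args) {
        // Stand-in for a user lookup that matched no vertex.
        Iterator<String> noMatch = Collections.<String>emptyIterator();
        try {
            noMatch.next(); // unguarded, like the .next() in the script
        } catch (NoSuchElementException e) {
            System.out.println("unguarded next() threw " + e.getClass().getSimpleName());
        }

        // Guarded lookups: an empty result becomes Optional.empty() rather than an exception.
        Optional<String> missing = firstOrEmpty(Collections.<String>emptyIterator());
        Optional<String> found = firstOrEmpty(Arrays.asList("user-vertex").iterator());
        System.out.println("missing present? " + missing.isPresent());
        System.out.println("found present? " + found.isPresent());
    }
}
```

With a guard like this, a request for an unknown user can return an empty result instead of logging an exception on the server.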

1 Answer


The script was not the issue. Titan was overloaded with requests, and performance degraded until scripts started timing out. I changed dynamodb.properties to add:

cache.db-cache=true
cache.db-cache-time=...
cache.db-cache-size=0.3
cache.db-cache-clean-wait=50

Enabling the cache reduced the load on the database and increased the requests/sec flowing through.
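For a sense of scale: as I understand the Titan configuration reference, a cache.db-cache-size value between 0 and 1 is read as a fraction of the JVM heap, and a larger value as an absolute size in bytes. The sketch below only does that arithmetic. The class and method names are made up, the handling of the exact boundary values is my assumption, and the heap it reports is that of whatever JVM runs the snippet, not necessarily the Gremlin Server heap.

```java
// Hypothetical helper, not part of Titan: estimates the database-level cache size.
final class DbCacheSize {

    // Assumption: values strictly between 0 and 1 are a fraction of the max heap;
    // anything else is treated as an absolute byte count.
    static long cacheBytes(double dbCacheSize, long maxHeapBytes) {
        if (dbCacheSize > 0 && dbCacheSize < 1) {
            return (long) (dbCacheSize * maxHeapBytes);
        }
        return (long) dbCacheSize;
    }

    public static void main(String[] args) {
        long heap = Runtime.getRuntime().maxMemory(); // heap of this JVM only
        long bytes = cacheBytes(0.3, heap);           // 0.3 as in the properties above
        System.out.println("max heap bytes:   " + heap);
        System.out.println("cache size bytes: " + bytes);
    }
}
```

Because the cache comes out of the same heap the traversals use, a larger fraction leaves less room for query processing.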

I changed gremlin-server.yaml too, setting threadPoolWorker: 2. I'm not sure how to choose threadPoolWorker based on the CPU cores, though; our m4.large AWS instance has 2 CPU cores. After playing around with the values, I also set maxAccumulationBufferComponents: 8192 and resultIterationBatchSize: 2048.
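On sizing threadPoolWorker: as far as I recall, the Gremlin Server tuning notes suggest starting at the default of 1 and raising it only as needed, up to at most 2 * the number of cores. Treat that ceiling as my reading of the docs rather than a verified rule for Titan 1.0. A minimal sketch of the arithmetic, with made-up class and method names:

```java
// Hypothetical helper, not part of Gremlin Server: computes the suggested ceiling.
final class WorkerPoolSizing {

    // Upper bound under the assumed "2 * cores" guidance; at least one core is assumed.
    static int maxThreadPoolWorker(int cores) {
        return 2 * Math.max(1, cores);
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors(); // cores visible to this JVM
        System.out.println("cores: " + cores);
        System.out.println("threadPoolWorker range to try: 1.." + maxThreadPoolWorker(cores));
    }
}
```

On a 2-core m4.large that gives a range of 1 to 4, so the value of 2 above sits inside it.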