I am using neo4j to store application data and below image depicts the graph structure
Each circle is a node and each arrow depicts a relationship and relation type is defined above. It also defines many to many or one to many or one to one relationship for nodes
.
What I would like to retrieve from the graph.
I want to list all the positions for a company and each position will have a array of users and each user will have a array of feedback for each interview round, like below
position ---> candidate1
interview round name (Telephonic)
question1
answer1
and answer given by user1
question1
answer1
and answer given by user2
interview round name (HR Round)
question1
answer1
and answer given by user1
question1
answer1
and answer given by user2
candidate2
interview round name (Telephonic)
question1
answer1
and answer given by user1
question1
answer1
and answer given by user2
interview round name (HR Round)
question1
answer1
and answer given by user1
question1
answer1
and answer given by user2
.
.
.
Many candidates will not have interview rounds than for those candidates the rounds should be null.
Below is the query I am using to retrieve the data I need.
MATCH (comp:Company {dId: "155dyv1wgT"})<-[:`POSITION_COMPANY`]-(pos: Position {status: 'OPEN'})-[:`POSITION_WORKFLOW`]->(:WorkFlow)-[:`WORKFLOW_CANDIDATE-WORKFLOW`]->(cw : CandidateWorkFlow)-[:`CANDIDATE-WORKFLOW_COMPANY-CANDIDATE`]->(cc : CompanyCandidate)
where ((not (has(cc.isSpam) or has(cc.isTrash))) OR (cc.isSpam=false and cc.isTrash=false)) and pos.positionType IN ['PUBLIC','DISCRETE'] with distinct comp, {dId: pos.dId, title: pos.title} as pos, cw, cc
OPTIONAL MATCH (cw)-[:`CANDIDATE_WORKFLOW_INTERVIEW`]->(inwrkflw: InterviewWorkFlow)-[:`INTERVIEW_ROUND`]->(intrnd: InterviewRound)-[:`INTERVIEW_ROUND_FEEDBACK`]->(ffform: FeedbackForm)-[:`FEEDBACK_QUESTION`]-(ffq: Question)
OPTIONAL MATCH (inwrkflw)-[:`INTERVIEW_WORKFLOW_FEEDBACK`]-(ff:Feedback)
OPTIONAL MATCH (iwr : User)-[:`FEEDBACK_BY`]->(ff)-[:`FEEDBACK_ANSWER`]->(answer:Answer)-[:`QUESTION_ANSWER`]->(ffq)
with collect({answer : answer.value, rating: answer.rating, question : ffq.qText, givenBy : iwr.fullName, type: ffq.questionType, givenOn: answer.lastModifiedDate}) as rnds, cc, pos, intrnd
with filter(rnd IN rnds WHERE rnd.type = 'COMMENTS') as comments, filter(rnd IN rnds WHERE rnd.type = 'LINEAR_GENERIC') as ratings, cc, pos, intrnd
with distinct collect({roundName: intrnd.name, ratings: ratings, comments: comments}) as rounds, cc, pos
return collect({cc: cc, rounds: rounds}) as data, pos.dId as posId, pos.title as posTitle
dId
is unique on every node.
The problem with this query is for small data set let's say 1000 candidates with 10 positions it will run fine. But for large data sets it it taking too long to return results. I even waited for 5 mins in neo4j console for the response but there was no response in 5 minutes.
The application will not have 1000 candidates. The candidates count will go upto 100000
for a very least and maximum I can assume to be 1 million per company.
I have tried various way to optimise this query but couldn't able to get response.
The response SLA should be within 20 seconds.
My questions are
- How can I optimise this query to get the results I want ?
- What is wrong in current query ?