2
votes

I am pulling Test instance data from HPALM into Neo4j & then trying to create dashboard graphs in grafana using Neo4j Cypher queries. With every test run a new row is created in Neo4j with the same "test-id". So if I ran a test & it failed yesterday & that same test passed today then there are 2 entries/rows created for same test-id with different status=Passed/Failed & different Date/Time last modified. To create a summary I need query which will filter out duplicate test-id entries such that the most recent (distinct) last run data is shown along with the test status so total test count matches that shown in HPALM. Below is the query which is showing multiple rows with same id but different test status on different date / time of test run. I need query to sort datewise & then eliminate/filter 1st & 3rd row from result because it is old test run entry. I only want latest run entries which are row 2 & 4. row 1 & 2 have duplicate "test-id"='72534' which needs to be made distinct.

MATCH (a:HP) 
WHERE a.project='CHP' and a.type='test-instance' and a.`cycle-id` ='11743' 
RETURN distinct a.id, a.`test-id`, a.`exec-date`, a.`last-modified`, 
       a.status, a.name, a.owner 
ORDER BY a.id

╒═════╤═══════════╤═════════════╤═══════════════════╤════════╤══════╤═══════╕
│a.id │a.`test-id`│a.`exec-date`│a.`last-modified`  │a.status│a.name│a.owner│
╞═════╪═══════════╪═════════════╪═══════════════════╪════════╪══════╪═══════╡
│44079│72534      │2018-03-09   │2018-03-09 07:42:30│Failed  │(null)│g942477│
├─────┼───────────┼─────────────┼───────────────────┼────────┼──────┼───────┤
│44079│72534      │2018-03-20   │2018-03-20 13:01:31│Passed  │(null)│g942477│
├─────┼───────────┼─────────────┼───────────────────┼────────┼──────┼───────┤
│44080│72533      │2018-03-07   │2018-03-07 11:12:54│Failed  │(null)│g942477│
├─────┼───────────┼─────────────┼───────────────────┼────────┼──────┼───────┤
│44080│72533      │2018-03-21   │2018-03-21 10:42:11│Passed  │(null)│g942477│
├─────┼───────────┼─────────────┼───────────────────┼────────┼──────┼───────┤
│44081│72532      │2018-03-05   │2018-03-05 11:51:28│Passed  │(null)│g942477│
├─────┼───────────┼─────────────┼───────────────────┼────────┼──────┼───────┤
│44082│72526      │2018-03-05   │2018-03-05 11:56:54│Failed  │(null)│g942477│

The last-modified column is of type "String" & I also need help on how to sort the entries datewise before filtering duplicates.

Currently because of duplicate entries the summary response shows higher than expected counts of number of tests 'Failed' ,'no run' & 'Not completed' states as a result of below query: I also need help with updated query which gives correct summary count. (after filtering duplicates)

MATCH (a:HP)
WHERE  a.project='CHP'  and a.type='test-instance' and a.`cycle-id`='11743'
RETURN  a.status , count(*) as cnt ,a.`cycle-id`
ORDER BY a.status

╒═════════════╤═══╤════════════╕
│a.status     │cnt│a.`cycle-id`│
╞═════════════╪═══╪════════════╡
│Failed       │4  │11743       │
├─────────────┼───┼────────────┤
│No Run       │2  │11743       │
├─────────────┼───┼────────────┤
│Not Completed│5  │11743       │
├─────────────┼───┼────────────┤
│Passed       │31 │11743       │
└─────────────┴───┴────────────┘

Need sample query in to understand How to use a "Group By" & "Having" clause in neo4j in my case above.

I also need to know sample query for [DateCreated-Today(date-time)]. How to add subtract dates in String type fields?

I am using Neo4j Version:3.0.6 Let me know if you need further details.

1

1 Answers

0
votes

In Neo4j, when using aggregation functions (such as collect(), count(), and sum()), the non-aggregation variables in your WITH clause become the implicit grouping key. So in this case, you need to have the test id as the non-aggregation variable, and collect the rest.

To make sure your collection is in the right order, you'll need to sort before the collection, and then when it's collected just get the head of the collection.

The WITH clause is the key here, as it is a place where you can change what variables are in scope and perform aggregation, and then follow it up with additional operations.

MATCH (a:HP) 
WHERE a.project='CHP' and a.type='test-instance' and a.`cycle-id` ='11743' 
WITH a.`test-id` as testId, a
ORDER BY a.`exec-date` DESC, a.`last-modified` DESC
WITH testId, head(collect(a)) as a
ORDER BY a.id
RETURN a.id as id, testId, a.`exec-date` as `exec-date`, 
       a.`last-modified` as `last-modified`, a.status as status, 
       a.name as name, a.owner as owner

As for your summary query, we'll take a similar approach to ensure we only have the latest a node per exec-date and last-modified, and only then will we get the counts per status:

MATCH (a:HP)
WHERE  a.project='CHP'  and a.type='test-instance' and a.`cycle-id`='11743'
WITH a.`test-id` as testId, a
ORDER BY a.`exec-date` DESC, a.`last-modified` DESC
WITH testId, head(collect(a)) as a
RETURN a.status as status, count(a) as cnt, a.`cycle-id` as `cycle-id`
ORDER BY status

Now, all this said, your data model doesn't look very graphy to me. You may want to make sure this is a good fit for what you want to do, as graph databases excel when there is connected data, but relational databases tend to outperform them when it comes to these kinds of queries, where there are no relationship traversals or interesting graphical connections to evaluate.