0
votes

Neo4j and Cypher gurus,

I am using Neo4j, Elasticsearch and Spring Data Neo4j. I have entity nodes which are related to each other. On the relationship there is a count field which is the total number of relationships between two entities. I am using the following Cypher to bring back the top 50 relationships for a entity:

MATCH (e1:Entity)-[r1:RELATED_TO]-(e2:Entity)
WHERE e1.uuid = '<ENTITY_ID>'
RETURN e1,r1,e2
ORDER BY r1.count DESC
LIMIT 50

Now what I would like to do is visualise a time based graph for the entity by bringing back the top 50 relationships for the last week (last month etc). I am not storing time series data in Neo4j, only the total count for the relationship. The time series data is stored in an Elasticsearch index with the following format.

Date, entityOrRelationshipId, startId, endId, type

Each time a relationship is updated, a row is inserted into the index with the datetime, relationshipId and entityId.

The relationship count can be searched and aggregated using the following Elasticsearch query:

GET localhost:9200/trends/_search

{
    "size": 0,
    "query": {
        "bool": {
            "must": [
                {
                    "term": {
                        "type": "RELATIONSHIP"
                    }
                },
                {
                    "range": {
                        "date": {
                            "gte": "2020-04-01T00:00:00.000+00:00",
                            "lt": "2020-04-28T00:00:00.000+00:00"
                        }
                    }
                },
                {
                    "bool": {
                        "should": [
                            { "term": { "startId": "<ENTITY_ID>"} },
                            { "term": { "endId": "<ENTITY_ID>" } }
                        ]
                    }
                }
            ]
        }
    },
    "aggs": {
        "my_rels": {
            "terms": {
                "field": "entityOrRelationshipId",
                "size": 50
            }
        }
    }
} 

This produces the following results with counts (doc_count) for each relationship id for a certain date range:

{
    "took": 5,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 2273,
        "max_score": 0.0,
        "hits": []
    },
    "aggregations": {
        "my_rels": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 145,
            "buckets": [
                {
                    "key": "2fa94be4-828b-4a20-b5f8-4965d5516149",
                    "doc_count": 303
                },
                {
                    "key": "74fb5f46-a6e8-41a8-bd11-cb374324b285",
                    "doc_count": 197
                },
                {
                    "key": "dc57fdcf-ea88-4808-9310-4e09d368e743",
                    "doc_count": 178
                },
                {
                    "key": "c4fbda1f-717e-4422-bc10-66ca6a6f39d7",
                    "doc_count": 79
                },
                etc.

            ]
        }
    }
}

Using the Neo4J APOC library, how would I combine the Elasticsearch count results into my Cypher query without having to store the count values in Neo4J?

Any help would be much appreciated.

1

1 Answers

0
votes

Assuming that:

  • RELATED_TO relationships have a uuid property for the relationship id, and
  • the Entity id and the "buckets" list are passed to the query as the parameters entityId and buckets, this should work:

    UNWIND $buckets AS b
    MATCH (e1:Entity)-[r1:RELATED_TO]-(e2)
    WHERE e1.uuid = $entityId AND r1.uuid = b.key
    RETURN e1, r1, e2, b.doc_count AS count
    ORDER BY count DESC
    

The LIMIT clause is not needed, since the number of result rows will be determined by the size of the buckets list.