
I have a Neo4j database that I use in embedded mode. It contains millions of nodes, and each node has multiple labels. I can get all the nodes with a single label like this:


    GlobalGraphOperations gb = GlobalGraphOperations.at(graphDb);
    ResourceIterable<Node> iterable = gb.getAllNodesWithLabel(DynamicLabel.label("LABEL1"));

This works fine. Now I want to do the same thing with multiple labels: I want all the nodes that have "LABEL1" and "LABEL2" and "LABEL3", and so on.


2 Answers


Internally, Neo4j maintains a label scan store that quickly gives you an iterator over all nodes with a given label, but there is no such scan store for combinations of labels.

If you want to find all nodes that carry several labels, the strategy is to iterate over the nodes of the "cheapest" label (the one with the fewest nodes) and filter them on the other labels.
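To make the strategy concrete without a running database, here is a minimal plain-Java sketch. It models the label scan store as a hypothetical in-memory map from label name to node ids, and Node.hasLabel as a hypothetical map from node id to its labels; CheapestLabelFilter, buildLabelIndex and nodesWithAllLabels are illustrative names, not Neo4j API. The sketch picks the label with the fewest nodes, scans only that label's nodes, and keeps those that carry every required label.

```java
import java.util.*;

public class CheapestLabelFilter {

    // Hypothetical stand-in for the label scan store: label -> ids of nodes with that label.
    static Map<String, Set<Long>> buildLabelIndex(Map<Long, Set<String>> nodeLabels) {
        Map<String, Set<Long>> index = new HashMap<>();
        for (Map.Entry<Long, Set<String>> e : nodeLabels.entrySet()) {
            for (String label : e.getValue()) {
                index.computeIfAbsent(label, k -> new HashSet<>()).add(e.getKey());
            }
        }
        return index;
    }

    // Returns the sorted ids of the nodes that carry every label in `required`.
    static List<Long> nodesWithAllLabels(Map<String, Set<Long>> labelIndex,
                                         Map<Long, Set<String>> nodeLabels,
                                         List<String> required) {
        if (required.isEmpty()) {
            throw new IllegalArgumentException("at least one label is required");
        }
        // 1. Pick the "cheapest" label: the one with the fewest nodes.
        String cheapest = required.get(0);
        for (String label : required) {
            if (labelIndex.getOrDefault(label, Collections.emptySet()).size()
                    < labelIndex.getOrDefault(cheapest, Collections.emptySet()).size()) {
                cheapest = label;
            }
        }
        // 2. Scan only that label's nodes and keep those that have all required labels.
        List<Long> result = new ArrayList<>();
        for (Long id : labelIndex.getOrDefault(cheapest, Collections.emptySet())) {
            if (nodeLabels.getOrDefault(id, Collections.emptySet()).containsAll(required)) {
                result.add(id);
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        Map<Long, Set<String>> nodeLabels = new HashMap<>();
        nodeLabels.put(1L, new HashSet<>(Arrays.asList("LABEL1", "LABEL2", "LABEL3")));
        nodeLabels.put(2L, new HashSet<>(Arrays.asList("LABEL2", "LABEL3")));
        nodeLabels.put(3L, new HashSet<>(Arrays.asList("LABEL1", "LABEL3")));
        nodeLabels.put(4L, new HashSet<>(Arrays.asList("LABEL1", "LABEL2", "LABEL3")));
        nodeLabels.put(5L, new HashSet<>(Arrays.asList("LABEL2")));

        Map<String, Set<Long>> labelIndex = buildLabelIndex(nodeLabels);
        // LABEL1 has 3 nodes, LABEL2 and LABEL3 have 4 each, so only LABEL1's nodes are scanned.
        System.out.println(nodesWithAllLabels(labelIndex, nodeLabels,
                Arrays.asList("LABEL1", "LABEL2", "LABEL3")));  // prints [1, 4]
    }
}
```

Here the per-label counts come for free from the in-memory sets; with a real database you would need to know, or look up, which label is the cheapest yourself.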

The code snippet below uses try-with-resources and a JDK 8 lambda (on JDK versions before 8, just create a class implementing Predicate instead). I'm assuming LABEL1 is the label with the fewest nodes:

    import java.util.Iterator;

    import org.neo4j.graphdb.*;
    import org.neo4j.helpers.Predicate;
    import org.neo4j.helpers.collection.FilteringIterator;

    ...

    // Reading in embedded mode requires an open transaction.
    try (Transaction tx = graphDatabaseService.beginTx();
         ResourceIterator<Node> nodes =
                 graphDatabaseService.findNodes(DynamicLabel.label("LABEL1"))) {

        // Keep only the LABEL1 nodes that also carry LABEL2 and LABEL3.
        Iterator<Node> nodeWithAllLabels = new FilteringIterator<>(nodes,
                node -> node.hasLabel(DynamicLabel.label("LABEL2")) &&
                        node.hasLabel(DynamicLabel.label("LABEL3"))
        );

        // do stuff with nodeWithAllLabels
    }

You could execute a Cypher query. Here is a code snippet:

    Map<String, Object> params = new HashMap<String, Object>();
    params.put( "required", Arrays.asList( "LABEL1", "LABEL2", "LABEL3" ) );
    String query = "MATCH (n) WHERE ALL(x IN {required} WHERE x IN LABELS(n)) RETURN n";
    Result result = db.execute( query, params );

[UPDATE]

However, the query above has no label in its MATCH clause, so it iterates through every node in the database, which is slow with millions of nodes.

Thanks to @StefanArmbruster's suggestion, we can make the query more efficient by putting the label with the fewest nodes in the MATCH clause, so that Neo4j can start from its internal label scan store instead of scanning all nodes:

    Map<String, Object> params = new HashMap<String, Object>();
    params.put( "otherLabels", Arrays.asList( "LABEL2", "LABEL3" ) );
    String query = "MATCH (n:LABEL1) WHERE ALL(x IN {otherLabels} WHERE x IN LABELS(n)) RETURN n";
    Result result = db.execute( query, params );
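Cypher (at least in the versions this answer targets) does not accept a label in a MATCH pattern as a parameter, so the cheapest label has to be written into the query text while the remaining labels can still be passed as a parameter. Below is a minimal sketch of a helper that does this, assuming the caller passes the labels ordered cheapest-first; LabelQueryBuilder, LabelQuery and build are hypothetical names. It quotes the label with backticks (a literal backtick inside the name is doubled) and keeps the {otherLabels} parameter syntax used above; newer Cypher versions write parameters as $otherLabels instead.

```java
import java.util.*;

public class LabelQueryBuilder {

    // Hypothetical holder for the query text and its parameters.
    static final class LabelQuery {
        final String cypher;
        final Map<String, Object> params;

        LabelQuery(String cypher, Map<String, Object> params) {
            this.cypher = cypher;
            this.params = params;
        }
    }

    // Assumes the labels are ordered cheapest-first: the first one goes into the
    // MATCH pattern, the rest become the {otherLabels} parameter.
    static LabelQuery build(List<String> labelsCheapestFirst) {
        if (labelsCheapestFirst.isEmpty()) {
            throw new IllegalArgumentException("at least one label is required");
        }
        // Quote the label with backticks, doubling any backtick inside the name.
        String first = "`" + labelsCheapestFirst.get(0).replace("`", "``") + "`";
        String cypher = "MATCH (n:" + first + ") "
                + "WHERE ALL(x IN {otherLabels} WHERE x IN LABELS(n)) RETURN n";
        Map<String, Object> params = new HashMap<>();
        params.put("otherLabels",
                new ArrayList<>(labelsCheapestFirst.subList(1, labelsCheapestFirst.size())));
        return new LabelQuery(cypher, params);
    }

    public static void main(String[] args) {
        LabelQuery q = build(Arrays.asList("LABEL1", "LABEL2", "LABEL3"));
        // prints MATCH (n:`LABEL1`) WHERE ALL(x IN {otherLabels} WHERE x IN LABELS(n)) RETURN n
        System.out.println(q.cypher);
        // prints {otherLabels=[LABEL2, LABEL3]}
        System.out.println(q.params);
    }
}
```

The resulting query text and parameter map can be passed to db.execute( query, params ) exactly as in the snippet above. With a single label, otherLabels is an empty list and the ALL(...) predicate is trivially true.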