
I am trying to test some queries on several Neo4j databases that differ in the amount of data. When I run the queries on a small amount of data, everything works and the execution time is short. But when I execute them on a database with 2794 nodes and 94863 relationships, they run for a long time and then fail with the following error in the Neo4j API: Java heap space Neo.DatabaseError.General.UnknownFailure

First query:

    MATCH (u1:User)-[r1:Rated]->(m:Movie)<-[r2:Rated]-(u2:User)
    WITH 1.0*SUM(r1.Rate)/count(r1) as pX,
         1.0*SUM(r2.Rate)/count(r2) as pY, u1, u2
    MATCH (u1:User)-[r1:Rated]->(m:Movie)<-[r2:Rated]-(u2:User)
    WITH SUM((r1.Rate-pX)*(r2.Rate-pY)) as pomProm,
         SQRT(SUM((r1.Rate-pX)^2)) as sumX,
         SQRT(SUM((r2.Rate-pY)^2)) as sumY, pX,pY,u1,u2
    SET s.value = pomProm / (sumX * sumY)

And the second query:

    MATCH (u1:User)-[r1:Rated]->(m:Movie)<-[r2:Rated]-(u2:User)
    WITH SUM(r1.Rate * r2.Rate) AS pomProm,
         SQRT(REDUCE(r1Pom = 0, i IN COLLECT(r1.Rate) | r1Pom + toInt(i^2))) AS r1V,
         SQRT(REDUCE(r2Pom = 0, j IN COLLECT(r2.Rate) | r2Pom + toInt(j^2))) AS r2V,
         u1, u2
    SET s.value = pomProm / (r1V * r2V)

The data in the database is generated by the following Java code:

public enum Labels implements Label {
    Movie, User

public enum RelationshipLabels implements RelationshipType {

public static void main(String[] args) throws IOException, BiffException {
    Workbook workbook = Workbook.getWorkbook(new File("C:/Users/User/Desktop/DP/dvdlist.xls"));
    Workbook names = Workbook.getWorkbook(new File("C:/Users/User/Desktop/DP/names.xls"));
    String path = new String("C:/Users/User/Documents/Neo4j/test7.graphDatabase");
    GraphDatabaseFactory dbFactory = new GraphDatabaseFactory();
    GraphDatabaseService db = dbFactory.newEmbeddedDatabase(path);
    int countMovies = 0;
    int numberOfSheets = workbook.getNumberOfSheets();
    IndexDefinition indexDefinition;
    try (Transaction tx = db.beginTx()) {
        Schema schema = db.schema();
        indexDefinition = schema.indexFor(DynamicLabel.label(Labels.Movie.toString()))
    try (Transaction tx = db.beginTx()) {
        Schema schema = db.schema();
        indexDefinition = schema.indexFor(DynamicLabel.label(Labels.User.toString()))
    try (Transaction tx = db.beginTx()) {

        for (int i = 0; i < numberOfSheets; i++) {
            Sheet sheet = workbook.getSheet(i);
            int numberOfRows = 6000;//sheet.getRows();
            for (int j = 1; j < numberOfRows; j++) {
                Cell cell1 = sheet.getCell(0, j);
                Cell cell2 = sheet.getCell(9, j);
                Node movie = db.createNode(Labels.Movie);
                movie.setProperty("Name", cell1.getContents());
                movie.setProperty("Genre", cell2.getContents());



    } catch (Exception e) {
        System.out.println("Something goes wrong!");

    Random random = new Random();
    int countUsers = 0;
    Sheet sheetNames = names.getSheet(0);
    Cell cell;
    Node user;

    int numberOfUsers = 1500;//sheetNames.getRows();
    for (int i = 0; i < numberOfUsers; i++) {
        cell = sheetNames.getCell(0, i);
        try (Transaction tx = db.beginTx()) {
            user = db.createNode(Labels.User);
            user.setProperty("Name", cell.getContents());
            List<Integer> listForUser = new ArrayList<>();

            for (int x = 0; x < 1000; x++) {
                int j = random.nextInt(countMovies);
                if (!listForUser.isEmpty()) {
                    if (!listForUser.contains(j)) {
                } else {
            for (int j = 0; j < listForUser.size(); j++) {
                Node movies = db.getNodeById(listForUser.get(j));
                int rate = 0;

                rate = random.nextInt(10) + 1;

                Relationship relationship = user.createRelationshipTo(movies, RelationshipLabels.Rated);
                relationship.setProperty("Rate", rate);

            System.out.println("Number of user: " + countUsers);
        } catch (Exception e) {
            System.out.println("Something goes wrong!");



Does anyone know how to solve this issue? Or is there some workaround to get results from these queries on a database with a large amount of data, or some improvement to the queries or settings? I would really appreciate it.


2 Answers


I had a similar problem (in version 4.1). The relevant properties can be found in conf/neo4j.conf, or under active database -> Manage -> Settings. Increase:


More details about performance can be found in the documentation.
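Since the data in the question was loaded through the embedded Java API, it may also be worth checking how much heap the JVM running that code was actually given. Below is a minimal sketch using only the standard `Runtime` API; the class name `HeapCheck` is hypothetical. It reports the limit of whichever JVM executes it, so for a standalone server the configured values are what matter.

```java
// Hypothetical helper, not part of Neo4j: prints the heap limit of the current JVM.
class HeapCheck {

    /** Returns the maximum heap the current JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MiB): " + maxHeapMiB());
    }
}
```

If the reported value is much smaller than expected, the heap setting probably did not reach the process that runs the queries.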


You may need to configure the amount of memory available to Neo4j. You can configure the Neo4j server heap size by editing conf/neo4j-wrapper.conf:


See this page for more info.

However, looking at your queries (which perform graph-global, all-pairs operations), you might want to consider running them in batches. For example:

// Find users with overlapping movie ratings
MATCH (u1:User)-[:Rated]->(:Movie)<-[:Rated]-(u2:User)
// only for users whose similarity has not yet been calculated
WHERE NOT exists((u1)-[:SIMILARITY]-(u2))
// consider only up to 50 pairs of users
WITH DISTINCT u1, u2 LIMIT 50
// compute similarity metric and set SIMILARITY relationship with coef

Then execute this query repeatedly until you have computed the similarity metric for all users with overlapping movie ratings.
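The repeat-until-done idea can be sketched in plain Java. This is only an illustration under stated assumptions: an in-memory map stands in for the database, the class `BatchedSimilarity` and all of its methods are hypothetical, and the metric is the cosine-style formula from the second query in the question. Against a real database, each call to `processBatch` would instead be one execution of the batched Cypher query in its own transaction.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the batched similarity computation.
class BatchedSimilarity {
    // user -> (movie -> rating); plays the role of (:User)-[:Rated]->(:Movie)
    private final Map<String, Map<String, Integer>> ratings;
    // "u1|u2" -> similarity; plays the role of the SIMILARITY relationships
    private final Map<String, Double> similarity = new HashMap<>();

    BatchedSimilarity(Map<String, Map<String, Integer>> ratings) {
        this.ratings = ratings;
    }

    /** Cosine similarity computed over the movies that both users rated. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other == null) continue; // movie not rated by both users
            dot += e.getValue() * other;
            normA += e.getValue() * e.getValue();
            normB += other * other;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Processes at most `limit` not-yet-computed pairs; returns how many were stored. */
    int processBatch(int limit) {
        List<String> users = new ArrayList<>(ratings.keySet());
        Collections.sort(users); // fixed order, so each pair is keyed as "smaller|larger"
        int done = 0;
        for (int i = 0; i < users.size() && done < limit; i++) {
            for (int j = i + 1; j < users.size() && done < limit; j++) {
                String key = users.get(i) + "|" + users.get(j);
                if (similarity.containsKey(key)) continue; // already computed
                Map<String, Integer> a = ratings.get(users.get(i));
                Map<String, Integer> b = ratings.get(users.get(j));
                if (Collections.disjoint(a.keySet(), b.keySet())) continue; // no shared movie
                similarity.put(key, cosine(a, b));
                done++;
            }
        }
        return done;
    }

    /** Repeats batches until one stores nothing; returns the number of non-empty batches. */
    int runUntilDone(int limit) {
        int batches = 0;
        while (processBatch(limit) > 0) batches++;
        return batches;
    }

    /** Looks up a stored value; u1 must sort before u2. Returns null if absent. */
    Double similarityOf(String u1, String u2) {
        return similarity.get(u1 + "|" + u2);
    }

    int storedPairs() {
        return similarity.size();
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> ratings = new HashMap<>();
        Map<String, Integer> alice = new HashMap<>();
        alice.put("m1", 2);
        alice.put("m2", 4);
        Map<String, Integer> bob = new HashMap<>();
        bob.put("m1", 1);
        bob.put("m2", 2);
        ratings.put("alice", alice);
        ratings.put("bob", bob);

        BatchedSimilarity s = new BatchedSimilarity(ratings);
        int batches = s.runUntilDone(50); // 50 mirrors the LIMIT in the query above
        System.out.println(batches + " batch(es), " + s.storedPairs() + " pair(s) stored");
    }
}
```

Each batch touches only a bounded number of pairs, so the working set per step stays small. The total amount of work is unchanged; it is just spread over many short steps.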