0
votes

I have a large graph (millions of nodes/edges, no parallel edges or loops) in my neo4j database where nodes represent persons and edges represent friendship. I wish to find all maximal cliques (considering edges as undirected) with at least k nodes.

A maximal clique is a clique that cannot be extended by including one more adjacent vertex, meaning it is not a subset of a larger clique

I looked into neo4j's Graph Data Science library, but could only find an algorithm for triangle cliques. Also, there seems to be a procedure apoc.algo.cliques in apoc v3.5, but it was marked as a work in progress. I didn't find any clique algorithm in the recent versions of apoc.

Is there some cypher query or neo4j plugin that could help me? Any other way to approach this?

1
@ravenspoint Yes, I did. But these are exponential time algorithms. I am not sure if it would be wise to implement them for large graphs. I was hoping for some pre-built plugin or query because they tend to be optimized for performance.mrpandey
Your faith in the folks who code libraries is really quite touching.ravenspoint
I am pretty new to neo4j and I don't know the underlying architecture. So I am inclined to trust their work.mrpandey
"To retain respect for sausages and libraries, one must not watch them in the making." - Bismarkravenspoint

1 Answers

0
votes

Any other way to approach this?

I would write my own code. Since I prefer C++ I came up with this:

void cPathFinder::cliques()
{
    // working copy on input graph
    auto work = myGraph;

    // store for maximal clique collection
    std::vector<std::vector<int>> vclique;

    while (1)
    {
        std::vector<int> clique;

        while (work.nodeCount())
        {
            if (!clique.size())
            {
                // start by moving an arbitrary node to the clique from the work graph
                auto nit = work.nodes().begin();
                clique.push_back(nit->first);
                work.removeNode(nit->first);
                continue;
            }
            bool found = false;

            // loop over nodes remaining in work graph
            for (auto &u : work.nodes())
            {
                // loop over nodes in clique
                for (int v : clique)
                {
                    if (work.includes_link(v, u.first) ||
                        work.includes_link(u.first, v))
                    {
                        // found node in work that is connected to clique nodes.
                        // move it to clique
                        clique.push_back(u.first);
                        work.removeNode(u.first);
                        found = true;
                        break;
                    }
                }
                if (found)
                    break;  // found a node to add to clique
            }
            if (!found)
                break;      // no more nodes can be added, the clique is maximal
        }

        if (!clique.size())
            break;          // did not find a clique

        // add to collection of maximal cliques
        vclique.push_back(clique);
    }

    // Display results
    for (auto &c : vclique)
    {
        std::cout << "clique: ";
        for (int n : c)
            std::cout << myGraph.name(n) << " ";
        std::cout << "\n";
    }
}

For this input

l 1 5
l 1 3
l 2 8
l 2 6
l 3 1
l 3 7
l 4 8
l 4 6
l 5 1
l 5 7
l 6 4
l 6 2
l 7 3
l 7 5
l 8 2
l 8 2

I get this output

clique: 5 1 3 7 clique: 8 2 6 4

To check this, here is what graphViz produces with the same input

enter image description here

You can find the complete code here.