Eq testing for large DAG structures in Haskell

Question

I'm new to Haskell ( a couple of months ). I have a Haskell program that assembles a large expression DAG (not a tree, a DAG), potentially deep and with multiple merging paths (ie, the number of different paths from root to leaves is huge). I need a fast way to test these dags for equality.The default Eq derivation will just recurse, exploring the same nodes multiple times. Currently this causes my program to take 60 seconds for relatively small expressions, and not even finish for larger ones. The profiler indicates it is busy checking equality most of the time. I would like to implement a custom Eq that does not have this problem. I don't have a way to solve this problem that does not involve a lot of rewriting. So I want to hear your thoughts.

My first attempt was to 'instrument' tree nodes with a hash that I compute incrementally, using Data.Hashable.hash, as I build the tree. This approach gave me an easy way to test two things aren't equal without looking deep into the structure. But often in this DAG, because of the paths in the DAG merging, the structures are indeed equal. So the hashes are equal, and I revert to full blown equality testing.

If I had a way to do physical equality, then a lot of my problems here would go away: if they are physically equal, then that's it. Otherwise if the hash is different then that's it. Only go deeper if they are physically not the same, but their hash agrees.

I could also imitate git, and compute a SHA1 per node to decide if they are equal period (no need to recurse). I know for a fact that this would help, because If I let equality be decided fully in terms of hash equality, then the program runs in tens milliseconds for the largest expressions. This approach also has the nice advantage that if for some reason there are two equal dags are not physically equal but are content-equal, I would be able to detect it fast in that case as well. (With Ids, Id still have to do a traversal at that point). So I like the semantics more.

This approach, however involves a lot more work than just calling the Data.Hashable.hash function, because I have to derive it for every variant of the dag node type. And moreover, I have multiple dag representations, with slightly different node definitions, so I would need to basically do this hashing trick thing twice or more if I decide to add more representations.

What would you do?

Are you asking how to define (==) with a custom function, or what custom function best suits your needs? The question is a bit bloated/confused as it stands now... btw, I don't think "physical equality" as you know it from other languages exists in haskell, you simply define your class Eq a where (==) .... and that's it... — mb21
@mb21 He is either asking how to define an efficient instance for Eq or how to define a data structure for his DAG for which the default instance for Eq is efficient... not sure which one of the two, and given no code we can provide little help. — Bakuriu

Paul Johnson Paul Johnson · Accepted Answer · 2016-06-26T19:34:50

Part of the problem here is that Haskell has no concept of object identity, so when you say you have a DAG where you refer to the same node twice, as far as Haskell is concerned its just two values in different places on a tree. This is fundamentally different from the OO concept where an object is indexed by its location in memory, so the distinction between "same object" and "different objects with equal fields" is meaningful.

To solve your problem you need to detect when you are visiting the same object that you saw earlier, and in order to do that you need to have a concept of "same object" that is independent of the value. There are two basic ways to attack this:

Store all your objects in a vector (i.e. an array), and use the vector index as an object identity. Replace values with indices throughout your data structure.
Give each object a unique "identity" field so you can tell if you have seen this one before when traversing the DAG.

The former is how the Data.Graph module in the containers package does it. One advantage is that, if you have a single mapping from DAG to vector, then DAG equality becomes just vector equality.

Eq testing for large DAG structures in Haskell

2 Answers