I frequently have occasion to perform modular arithmetic in Haskell, where the modulus is usually large and often prime (like 2000000011). Currently, I just use functions like (modAdd m a b), (modMul m a b), (modDiv m a b) etc. But that is rather inconvenient, requiring an additional parameter to always be specified and carried around and creating my various functions in both the regular integral form and separately in mod- form.
So it occurs that it might be a good idea to create a new class something like this:
class Integral a => Mod a
m = 2000000011
instance Integral a => Num (Mod a) where
... defining (+), etc. as modulo m
Then one could just perform regular arithmetic, using regular functions, and define useful structures like
factorials :: [Mod Int]
factorials = 1:zipWith (*) factorials [1..]
But that has one problem: All values of type Mod Int necessarily must have the same modulus. However, often I need to work in multiple moduli in one program (of course always only combining values of the same modulus).
I think, but do not understand sufficiently well to be sure, that this could be overcome by something like this:
class Integral a => Mod Nat a
where Nat is a type that encodes the modulus in Peano fashion. This would be advantageous: I could have values of different moduli and the type-checker would save me from accidentally combining this value.
Is something like this feasible and efficient? Will it cause either the compiler or the RTS to try to actually construct the enormous (Succ (Succ (Succ ... repeated 2000000011 times) if I try to use that modulus, rendering the solution effectively useless? Will the RTS try to check the type match on every operation? Will the RTS representation of every value be blown up from what could otherwise be just an unboxed int?
Is there a better way?
CONCLUSION
Thanks to helpful comments from cirdec, dfeuer, user5402, and tikhon-jelvis, I learned that (unsurprisingly) I was not the first to have this idea. In particular, there is a recent paper by Kiselyov and Shan that gives an implementation and tikhon-jelvis has posted to Hackage a solution called (surprise!) modular-arithmetic, that offers even nicer semantics using fancy ghc pragmas.
OPEN QUESTION (TO ME)
What happens behind the curtain? In particular, would a million-element list of [Mod Int 2000000011] carry around an extra million copies of 2000000011 around? Or does it compile to the same code as a list of a million Int with the modulus parameter carried separately would? The latter would be nice.
ADDENDUM ON PERFORMANCE
I ran a bit of a benchmark on a current problem I am working on. The first run used an unboxed 10,000-element Int vector and performed 10,000 operations on it:
4,810,589,520 bytes allocated in the heap
107,496 bytes copied during GC
1,197,320 bytes maximum residency (1454 sample(s))
734,960 bytes maximum slop
10 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 6905 colls, 0 par 0.109s 0.101s 0.0000s 0.0006s
Gen 1 1454 colls, 0 par 0.812s 0.914s 0.0006s 0.0019s
TASKS: 13 (1 bound, 12 peak workers (12 total), using -N11)
SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
INIT time 0.000s ( 0.001s elapsed)
MUT time 2.672s ( 2.597s elapsed)
GC time 0.922s ( 1.015s elapsed)
EXIT time 0.000s ( 0.001s elapsed)
Total time 3.594s ( 3.614s elapsed)
Alloc rate 1,800,454,557 bytes per MUT second
Productivity 74.3% of total user, 73.9% of total elapsed
For the second run, I performed the same operations on an unboxed Vector of 10,000 (Mod Int 1000000007). That made my code a little simpler, but took about 3 times as long (while having nearly identical memory profile):
4,810,911,824 bytes allocated in the heap
107,128 bytes copied during GC
1,199,408 bytes maximum residency (1453 sample(s))
736,928 bytes maximum slop
10 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 6906 colls, 0 par 0.094s 0.107s 0.0000s 0.0007s
Gen 1 1453 colls, 0 par 1.516s 1.750s 0.0012s 0.0035s
TASKS: 13 (1 bound, 12 peak workers (12 total), using -N11)
SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
INIT time 0.000s ( 0.001s elapsed)
MUT time 8.562s ( 8.323s elapsed)
GC time 1.609s ( 1.857s elapsed)
EXIT time 0.000s ( 0.001s elapsed)
Total time 10.172s ( 10.183s elapsed)
Alloc rate 561,858,315 bytes per MUT second
Productivity 84.2% of total user, 84.1% of total elapsed
I wonder why that happens and if it could be fixed. Still, I really like the modular-arithmetic package and will use it where performance is not absolutely critical.
Given
inreflection
is Considered Unwise; the rest is really cool. – dfeuer