9
votes

I frequently have occasion to perform modular arithmetic in Haskell, where the modulus is usually large and often prime (like 2000000011). Currently, I just use functions like (modAdd m a b), (modMul m a b), (modDiv m a b) etc. But that is rather inconvenient, requiring an additional parameter to always be specified and carried around and creating my various functions in both the regular integral form and separately in mod- form.

So it occurs that it might be a good idea to create a new class something like this:

class Integral a => Mod a

m = 2000000011

instance Integral a => Num (Mod a) where
   ... defining (+), etc. as modulo m

Then one could just perform regular arithmetic, using regular functions, and define useful structures like

factorials :: [Mod Int]
factorials = 1:zipWith (*) factorials [1..]

But that has one problem: All values of type Mod Int necessarily must have the same modulus. However, often I need to work in multiple moduli in one program (of course always only combining values of the same modulus).

I think, but do not understand sufficiently well to be sure, that this could be overcome by something like this:

class Integral a => Mod Nat a

where Nat is a type that encodes the modulus in Peano fashion. This would be advantageous: I could have values of different moduli and the type-checker would save me from accidentally combining this value.

Is something like this feasible and efficient? Will it cause either the compiler or the RTS to try to actually construct the enormous (Succ (Succ (Succ ... repeated 2000000011 times) if I try to use that modulus, rendering the solution effectively useless? Will the RTS try to check the type match on every operation? Will the RTS representation of every value be blown up from what could otherwise be just an unboxed int?

Is there a better way?

CONCLUSION

Thanks to helpful comments from cirdec, dfeuer, user5402, and tikhon-jelvis, I learned that (unsurprisingly) I was not the first to have this idea. In particular, there is a recent paper by Kiselyov and Shan that gives an implementation and tikhon-jelvis has posted to Hackage a solution called (surprise!) modular-arithmetic, that offers even nicer semantics using fancy ghc pragmas.

OPEN QUESTION (TO ME)

What happens behind the curtain? In particular, would a million-element list of [Mod Int 2000000011] carry around an extra million copies of 2000000011 around? Or does it compile to the same code as a list of a million Int with the modulus parameter carried separately would? The latter would be nice.

ADDENDUM ON PERFORMANCE

I ran a bit of a benchmark on a current problem I am working on. The first run used an unboxed 10,000-element Int vector and performed 10,000 operations on it:

   4,810,589,520 bytes allocated in the heap
         107,496 bytes copied during GC
       1,197,320 bytes maximum residency (1454 sample(s))
         734,960 bytes maximum slop
              10 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      6905 colls,     0 par    0.109s   0.101s     0.0000s    0.0006s
  Gen  1      1454 colls,     0 par    0.812s   0.914s     0.0006s    0.0019s

  TASKS: 13 (1 bound, 12 peak workers (12 total), using -N11)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.000s  (  0.001s elapsed)
  MUT     time    2.672s  (  2.597s elapsed)
  GC      time    0.922s  (  1.015s elapsed)
  EXIT    time    0.000s  (  0.001s elapsed)
  Total   time    3.594s  (  3.614s elapsed)

  Alloc rate    1,800,454,557 bytes per MUT second

  Productivity  74.3% of total user, 73.9% of total elapsed

For the second run, I performed the same operations on an unboxed Vector of 10,000 (Mod Int 1000000007). That made my code a little simpler, but took about 3 times as long (while having nearly identical memory profile):

   4,810,911,824 bytes allocated in the heap
         107,128 bytes copied during GC
       1,199,408 bytes maximum residency (1453 sample(s))
         736,928 bytes maximum slop
              10 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      6906 colls,     0 par    0.094s   0.107s     0.0000s    0.0007s
  Gen  1      1453 colls,     0 par    1.516s   1.750s     0.0012s    0.0035s

  TASKS: 13 (1 bound, 12 peak workers (12 total), using -N11)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.000s  (  0.001s elapsed)
  MUT     time    8.562s  (  8.323s elapsed)
  GC      time    1.609s  (  1.857s elapsed)
  EXIT    time    0.000s  (  0.001s elapsed)
  Total   time   10.172s  ( 10.183s elapsed)

  Alloc rate    561,858,315 bytes per MUT second

  Productivity  84.2% of total user, 84.1% of total elapsed

I wonder why that happens and if it could be fixed. Still, I really like the modular-arithmetic package and will use it where performance is not absolutely critical.

2
You might be interested in the paper Functional Pearl: Implicit Configuration and the corresponding reflection package.Cirdec
Do note that everything to do with Given in reflection is Considered Unwise; the rest is really cool.dfeuer
Am reading the paper now. Very interesting. Thanks to Cirdec and dfeuer. This may be just what I was looking for. My main remaining point of confusion is whether it would be possible to construct Kiselyov/Shan modular numbers without the strangeness of considering them functions of a discarded argument.CarlEdman

2 Answers

4
votes

Newer versions of GHC have type-level numbers built in, which should be more efficient than ones you roll yourself using Peano arithmetic. You can use them by enabling DataKinds. As a bonus, you'd also get some nice syntax:

factorials :: [Mod Int 20]

Whether this is efficient or not depends on how you implement the Mod type. In the most general case, you probably want to just mod after every arithmetic operation. Unless you're in a hot loop where saving a handful of instructions matters, this should be fine. (And inside the hot loop, it's probably better to be explicit about when you mod anyhow.)

I actually implemented this type in a library on Hackage: modular-arithmetic. It has a test suite but no benchmarks so I can't vouch for the absolute performance, but it's not doing anything that should be slow and it was fast enough for my purposes. (Which, admittedly, involved small moduli.) If you try it and run into performance issues, I'd love to hear about them so I can try to fix them.

4
votes

Here is some working code using Data.Reflection:

{-# LANGUAGE Rank2Types #-}
{-# LANGUAGE FlexibleContexts #-}

import Data.Reflection
import Data.Proxy

data M a s = M a -- Note the phantom comes *after* the concrete

-- In `normalize` we're tying the knot to get the phantom types to align
-- note that reflect :: Reifies s a => forall proxy. proxy s -> a

normalize :: (Reifies s a, Integral a) => a -> M a s
normalize a = b where b = M (mod a (reflect b)) 

instance (Reifies s a, Integral a) => Num (M a s) where
  M a + M b = normalize (a + b)
  M a - M b = normalize (a - b)
  M a * M b = normalize (a * b)
  fromInteger n = normalize (fromInteger n)
  abs _     = error "abs not implemented"
  signum _  = error "sgn not implemented"

withModulus :: Integral a => a -> (forall s. Reifies s a => M a s) -> a
withModulus m ma = reify m (runM . asProxyOf ma)
  where asProxyOf :: f s -> Proxy s -> f s
        asProxyOf a _ = a

runM :: M a s -> a
runM (M a) = a

example :: (Reifies s a, Integral a) => M a s
example = normalize 3

example2 :: (Reifies s a, Integral a, Num (M a s)) => M a s
example2 = 3*3 + 5*5

mfactorial :: (Reifies s a, Integral a, Num (M a s)) => Int -> M a s
mfactorial n = product $ map fromIntegral [1..n]

test1 p n = withModulus p $ mfactorial n

madd :: (Reifies s Int, Num (M Int s)) => M Int s -> M Int s -> M Int s
madd a b = a + b

test2 :: Int -> Int -> Int -> Int
test2 p a b = withModulus p $ madd (fromIntegral a) (fromIntegral b)