3
votes

I'm looking to implement a version of Generator combinators (e.g. analogous to those in ScalaCheck or Haskell's QuickCheck), in which a Generator contains an instance of Rand, a monad representing a probability distribution (taken from the breeze library). Since it is a monad, Rand implements map and flatMap. As is common, I also want to implement Gen as a monad. As shown below, the implementation of map for Gen is straightforward:

// Rand is from the breeze library
trait Rand[T] {
    def map[U](f: T => U): Rand[U]
    def flatMap[U](f: T => Rand[U]): Rand[U]
}

case class Gen[T](dist: Rand[T]) {
  def map[U](f: T => U): Gen[U] = Gen(dist.map { f })

  def flatMap[U](f: T => Gen[U]): Gen[U] = {
    // How to implement this?
  }
}

However, it is not clear to me how flatMap should be implemented. Is this easily achieved, or does it (for example) require a level of indirection via some intermediate datatype?

2

2 Answers

1
votes

A possible implementation could be

def flatMap[U](f: T => Gen[U]): Gen[U] = 
  Gen (dist.flatMap {f(_).dist})
1
votes

Are the intended semantics of Gen something like the following?

val UnfairCoin = Gen(Rand(Heads -> 0.9, Tails -> 0.1))
val UnfairDieHi = Gen(Rand(1 -> 0.0, 2 -> 0.0, 3 -> 0.0, 4 -> 0.0, 5 -> 0.5. 6 -> 0.5))
val UnfairDieLo = Gen(Rand(1 -> 0.25, 2 -> 0.25, 3 -> 0.25, 4 -> 0.25, 5 -> 0.0, 6 -> 0.0))

def headsHiTailsLo(coinFlip: CoinFlip) = coinFlip match {
  case Heads => UnfairDieHi
  case Tails => UnfairDieLo
}

val flipAndRoll = UnfairCoin flatMap { headsHighTailsLo _ }

where:

flipAndRoll == Gen(1 -> 0.025, 2 -> 0.025, 3 -> 0.025, 4 -> 0.025, 5 -> 0.45, 5 -> 0.45)

If that's right, then the output of Gen.flatMap will either need to generate two random samples, one from each distribution, or Rand must provide a way to calculate the joint distribution and project it to U.

I took a quick look at the documentation for Rand.flatMap, and it appears that it generates a Rand that does the two consecutive random samples, although it's not entirely clear. If that's right, then chi's answer is your winner.

If the two random variables are not independent, then you might have some work to do.