5
votes

Given an arbitrary tree, I can construct a subtype relation over that tree, using Schubert numbering:

constructH :: Tree a -> Tree (Type a)

where Type nests the original label, and additionally provides the data needed to perform child/parent (or subtype) checks. With Schubert Numbering, the two Int parameters are sufficient for that.

data Type a where !Int -> !Int -> a -> Type a

This leads to the binary predicate

subtypeOf :: Type a -> Type a -> Bool

I now want to test with QuickCheck that this does indeed do what I want it to do. The following property, however, does not work, because QuickCheck just gives up:

subtypeSanity ∷ Tree (Type ()) → Gen Prop
subtypeSanity Node { rootLabel = t, subForest = f } =
  let subtypes = concatMap flatten f
  in (not $ null subtypes) ==> conjoin
     (forAll (elements subtypes) (\x → x `subtypeOf` t):(map subtypeSanity f))

If I leave out the recursive call to subtypeSanity, i.e. the tail of the list I'm passing to conjoin, the property runs fine, but tests just the root node of the tree! How can I descend into my data structure recursively without QuickCheck giving up on generating new test cases?

If needed, I could provide the code to construct the Schubert Hierarchy, and the Arbitrary instance for Tree (Type a), to provide a complete runnable example, but that would be quite a bit of code. I'm convinced that I'm just not "getting" QuickCheck, and using it in the wrong way here.

EDIT: unfortunately, the sized function does not seem to eliminate the problem here. It ends up with the same result (see comment to J. Abrahamson's answer.)

EDIT II: I ended up "fixing" my problem by avoiding the recursive step, and avoiding conjoin. We just make a list of all nodes in the tree, then test the single-node property (which worked fine from the beginning) on those.

allNodes ∷ Tree a → [Tree a]
allNodes n@(Node { subForest = f }) = n:(concatMap allNodes f)

subtypeSanity ∷ Tree (Type ()) → Gen Prop
subtypeSanity tree = forAll (elements $ allNodes tree)
  (\(Node { rootLabel = t, subForest = f }) →
    let subtypes = concatMap flatten f
    in (not $ null subtypes) ==> forAll (elements subtypes) (\x → x `subtypeOf` t))

Tweaking the Arbitrary instance for trees did not work. Here is the arbitrary instance I'm still using:

instance (Arbitrary a, Eq a) ⇒ Arbitrary (Tree (Type a)) where
  arbitrary = liftM (constructH) $ sized arbTree

arbTree ∷ Arbitrary a ⇒ Int → Gen (Tree a)
arbTree n = do
  m ← choose (0,n)
  if m == 0
    then Node <$> arbitrary <*> (return [])
    else do part ← randomPartition n m
            Node <$> arbitrary <*> mapM arbTree part

-- this is a crude way to find a sufficiently random x1,..,xm,
-- such that x1 + .. + xm = n, for any n, m, with 0 < m.
randomPartition ∷ Int → Int → Gen [Int]
randomPartition n m' = do
  let m = m' - 1
  seed ← liftM ((++[n]) . sort) $ replicateM m (choose (0,n))
  return $ zipWith (-) seed (0:seed)

I consider the problem "solved for now," but if someone could explain to me why the recursive step and/or conjoin made QuickCheck give up (after passing "only" 0 tests,) I would be more than grateful.

2

2 Answers

7
votes

When generating Arbitrary recursive structures, QuickCheck is often a bit too eager and generates sprawling, enormous random examples. These are undesirable as they usually don't better check the properties of interest and can be very slow. Two solutions are

  1. Use things like the size parameter (sized function) and frequency function to bias the generator toward small trees.

  2. Use a small-type oriented generator like those in smallcheck. These try to exhaustively generate all "small" examples and thus help to keep the size of the tree down.

To clarify the sized and frequency method of controlling generation size, here's an example RoseTree

data Rose a = It a | Rose [Rose a]

instance Arbitrary a => Arbitrary (Rose a) where
  arbitrary = frequency 
    [ (3, It <$> arbitrary)                   -- The 3-to-1 ratio is chosen, ah,
                                              -- arbitrarily...
                                              -- you'll want to tune it
    , (1, Rose <$> children)
    ]
    where children = sized $ \n -> vectorOf n arbitrary

It can be done even more simply with a different Rose formation by very carefully controlling the size of the child list

data Rose a = Rose a [Rose a]

instance Arbitrary a => Arbitrary (Rose a) where
  arbitrary = Rose <$> arbitrary <*> sized (\n -> vectorOf (tuneUp n) arbitrary)
    where tuneUp n = round $ fromIntegral n / 4.0

You could do this without referencing sized, but that gives the user of your Arbitrary instance a knob to ask for larger trees if needed.

3
votes

In case it's useful for those stumbling across this issue: when QuickCheck "gives up", it's a sign that your pre-condition (using ==>) is too hard to satisfy.

QuickCheck uses a simple rejection sampling technique: pre-conditions have no effect on the generation of values. QuickCheck generates a bunch of random values like normal. After these are generated, they're sent through the pre-condition: if the result is True, the property is tested with that value; if it's False, that value is discarded. If your pre-condition rejects most of the values QuickCheck has generated, then QuickCheck will "give up" (better to give up completely, than to make statistically dubious pass/fail claims).

In particular, QuickCheck will not attempt to produce values which satisfy a given pre-condition. It's up to you to make sure that the generator you're using (arbitrary or otherwise) produces lots of values which pass your pre-condition.

Let's see how this is manifesting in your example:

subtypeSanity :: Tree (Type ()) -> Gen Prop
subtypeSanity Node { rootLabel = t, subForest = f } =
  let subtypes = concatMap flatten f
  in (not $ null subtypes) ==> conjoin
     (forAll (elements subtypes) (`subtypeOf` t):(map subtypeSanity f))

There is only one occurance of ==>, so its precondition (not $ null subtypes) must be too hard to satisfy. This is due to the recursive call map subtypeSanity f: not only are you rejecting any Tree which has an empty subForest, you're also (due to the recursion) rejecting any Tree where the subForest contains Trees with empty subForests, and rejecting any Tree where the subForest contains Trees with subForests containing Trees with empty subForests, and so on.

According to your arbitrary instance, Trees are only nested to finite depth: eventually we will always reach an empty subForest, hence your recursive precondition will always fail, and QuickCheck will give up.