How does agda pattern match with same typed constructors?

Question

I have started working my way through the Programming Language Foundations in Agda book and have become confused while attempting to perform the exercise which asks you to write an inc function which increments a binary number represented by the following Bin type:

data Bin : Set where
  ⟨⟩ : Bin
  _O : Bin → Bin
  _I : Bin → Bin

I started writing my inc function as

inc_ : Bin → Bin
inc ⟨⟩ = ⟨⟩
inc (x O) = {!!}
inc (x I) = {!!}

and then thought: "How does Agda know if a value of type Bin can be matched and destructed into x O and x I without ambiguity if the _O and _I constructors have the same type?". The more I thought about this the more confused I became, if we take the Nat example:

data ℕ : Set where
  zero : ℕ
  succ : ℕ → ℕ

the zero and succ constructors both produce a value of type ℕ, so how does Agda know whether some particular value of type ℕ would match the patterns I have in place for functions such as _+_?

_+_ : ℕ → ℕ → ℕ
zero + m = m
succ n-1 + m = succ (n-1 + m)

It seems to me the only way this could work is if Agda keeps track of the constructor used to create each value, and then allows for using that same constructor in pattern matching?

I am very new to Agda and trying to wrap my head around the relationship between types, values, constructors and pattern matching. I would very much appreciate an explanation of how these relate with reference to the Bin and ℕ examples in my question. I have tried reading some additional articles/books/lecture sides and the Agda docs but I could not find an answer to this question.

Could you give us some idea of your background? Are you a programmer, a mathematician, or something else? This might help someone (maybe me) give an appropriate and relatable answer. — mudri
My background is as a programmer, at uni I did an introductory unit on discrete structures if that helps. — HennyH

mudri mudri · Accepted Answer · 2021-02-14T22:22:22

This issue first arises with a very simple type – the Booleans:

data Bool : Set where
  true : Bool
  false : Bool

We can write functions out of Bool by pattern matching, for example:

not : Bool → Bool
not true = false
not false = true

Bool, like similar types in other programming languages, is a type with two canonical inhabitants: true and false. A Bool value stores 1 bit of information. At run time, and also for the sake of computation at type checking time, we could imagine that there literally is a bit in the specified memory location indicating whether the value is true or false.

In terms of data declarations, this bit comes about because Bool has two constructors. Similarly, ℕ has two constructors, so that will also contain a bit indicating whether the value is a zero or suc (and if it is a suc, there will also be a pointer to the rest of the number). For Bin, we will store a 0, 1, or 2, to indicate whether we have a ⟨⟩, _O, or _I.

It is this extra information (one or two bits for the examples here) that pattern matching relies on. In fact, types are typically erased, so pattern matching couldn't use them. As such, not is essentially compiled to something like the following pseudo-C.

int* not(int* b) {
  switch (*b) {                    // Look at the tag of b.
    case 0:                          // If b is true,
      int* r = malloc(sizeof(int));    // Allocate a new boolean;
      *r = 1;                          // Set the value of the new boolean to false;
      return r;                        // Return the new boolean.
    case 1:                          // If b is false,
      int* r = malloc(sizeof(int));    // Allocate a new boolean;
      *r = 0;                          // Set the value of the new boolean to false;
      return r;                        // Return the new boolean.
  }
}

Meanwhile, the _+_ function on ℕ will be compiled into something like:

int* add(int* n, int* m) {
  switch (*n) {                        // Look at the tag of n.
    case 0:                              // If n is zero,
      return m;                            // Return m.
    case 1:                              // If n is a suc,
      int* r = malloc(2 * sizeof(int));    // Allocate space for a suc cell;
      *r = 1;                              // Indicate that it is a suc by setting the tag to 1;
      *(r+1) = add(n+1, m);              // After the tag is a pointer, set to the result of the recursive call to add.
      return r;                            // Return the new ℕ.
  }
}

Note that n+1 here refers to the memory location after n's tag, i.e, the pointer to the predecessor of n (n-1 in the Agda code). I'm assuming for simplicity that sizeof(int) = sizeof(int*), and it's fine to store pointers in the int type. The key detail is that we always tag data values with which constructor they are made of, and having done this, branching via pattern matching amounts to looking at this tag.

How does agda pattern match with same typed constructors?

1 Answers