3
votes

I am generating input to be compiled by Chisel. Doing it the easy way might result in suboptimal boolean expressions. For example, I tend to generate chains of nested Mux()-es, like this:

  x :=
    Mux(!a && !b && !c && d, 13,
        Mux(!a && !b && c, 3,
            Mux(!a && !b, 2,
                Mux(!a && b, temp1,
                    Mux(a && e, 11,
                        Mux(a, 0,
                            -1))))))

As you can see,

  1. some of the boolean expressions are repeated, such as "!a", so likely some optimization could be done to express the same function using fewer evaluations, such as common sub-expression elimination,

  2. tests are repeated, again, such as "!a", so likely some optimization could be done to factor that out and test it once instead, and

  3. similar to point 2 above, the expression is very deep, so likely some optimization could be done to make it more like a tree and less like a linear sequence of Mux-es.

One thing I do not do is have complex predicate expressions: every predicate is just a conjunction of terms and each term is just a var or its negation.

I could try to implement these kind of transforms in my code generator, but in doing so I would end up writing my own optimizing compiler for boolean expressions. Instead can I just generate the above and rely on the Chisel/FIRRTL toolchain to optimize boolean expressions of this level of complexity? Such expressions are likely to be about the size of the one above or up to maybe twice its size.

1

1 Answers

0
votes

The FIRRTL compiler does support Common Subexpression Elimination (CSE), but not Global Value Numbering (GVN). In effect, you can expect that most common subexpressions will be combined as you'd expect in emitted Verilog.

The FIRRTL compiler does not do mux tree optimization. The synthesis tool should be able to optimized whatever it's given, but it's sadly not always the case. Therefore, Chisel and the FIRRTL compiler choose to not do mux tree optimization to preserve the intent of the user. Commonly, the user is writing some specific Chisel intended to be optimized in a certain way by the synthesis tool. If the FIRRTL compiler reorders the mux tree and produces a quality of result (QOR) regression, that's really bad. Consider this comment for more context.

That said, if a user really wants to apply some mux reordering at the FIRRTL-level, they can write a custom FIRRTL optimization transform (that may be scoped to only the module/region they want to optimize). This could be a good optional feature of the FIRRTL compiler. This is also an option available if you're generating Chisel---it may be simpler to write an optimization over FIRRTL IR instead of in the Chisel generation library.

Now, how does this interact with the original example? Start with a slightly simplified version:

import chisel3._
import chisel3.internal.sourceinfo.UnlocatableSourceInfo

class Foo extends RawModule {

  private implicit val noInfo = UnlocatableSourceInfo

  val a = IO(Input(Bool()))
  val b = IO(Input(Bool()))
  val c = IO(Input(Bool()))
  val d = IO(Input(Bool()))
  val e = IO(Input(Bool()))

  val x = IO(Output(UInt()))

  x := Mux(!a && !b && !c && d, 1.U,
           Mux(!a && !b && c, 2.U,
               Mux(!a && !b, 3.U,
                   Mux(!a && b, 4.U,
                       Mux(a && e, 5.U,
                           Mux(a, 6.U, 0.U))))))

}

When compiled with Chisel 3.3.2 and FIRRTL 1.3.2, the following Verilog is the result:

module Foo(
  input        a,
  input        b,
  input        c,
  input        d,
  input        e,
  output [2:0] x
);
  wire  _T = ~a;
  wire  _T_1 = ~b;
  wire  _T_2 = _T & _T_1;
  wire  _T_3 = ~c;
  wire  _T_4 = _T_2 & _T_3;
  wire  _T_5 = _T_4 & d;
  wire  _T_9 = _T_2 & c;
  wire  _T_14 = _T & b;
  wire  _T_15 = a & e;
  wire [2:0] _T_16 = a ? 3'h6 : 3'h0;
  wire [2:0] _T_17 = _T_15 ? 3'h5 : _T_16;
  wire [2:0] _T_18 = _T_14 ? 3'h4 : _T_17;
  wire [2:0] _T_19 = _T_2 ? 3'h3 : _T_18;
  wire [2:0] _T_20 = _T_9 ? 3'h2 : _T_19;
  assign x = _T_5 ? 3'h1 : _T_20;
endmodule

Observations:

  1. CSE is doing it's job, e.g., ~a & ~b is put in _T_2 and reused.
  2. The mux tree structure is unmodified.

Chisel does have a reduceTree method defined for Vec which can be used to produce balanced mux trees. Also, the chain of muxes in the original example can be perhaps more scalably described with util.MuxCase (without affecting the resulting mux tree):

x := MuxCase(
  default = 0.U,
  mapping = Seq(
    (!a && !b && !c && d) -> 1.U,
    (!a && !b && c)       -> 2.U,
    (!a && !b)            -> 3.U,
    (!a && b)             -> 4.U,
    (a && e)              -> 5.U,
    (a)                   -> 6.U)
)