Parallel multiplier-accumulator based on radix-4 modified Booth algorithm

Question

I am designing a multiplier-accumulator (MAC) for signed numbers based on the architecture named in the title. I have written modules for the Booth encoder, which generates the partial products, and for the carry-save accumulator, and both work properly. In the final module, which integrates these sub-parts, I want the MAC to accept two inputs in the first clock cycle, produce the partial products, and pass them to the carry-save adder, which accumulates the result of the previous multiplication together with the present one. The result is stored and displayed in a second register in the next clock cycle. Initially, all registers are reset to 0. The carry-save accumulator is based on figure A.6 in the following link: http://infolab.stanford.edu/pub/cstr/reports/csl/tr/94/617/CSL-TR-94-617.appendix.pdf. The final level consists of a carry-lookahead adder and outputs the accumulated result. The relevant part of the code is:

    CSA_hope csahope (znew, zcnew, pv[0][8:0], pv[1][8:0], pv[2][8:0], pv[3][8:0],
                      sasa, product, xy[19:16], dealsign);
    // pv = partial products, znew = output of csa,
    // product = final accumulated result, xy = input values

    always @ (posedge clk)
    begin
        if (reset)
        begin
            xy <= 20'b0;
            product <= 16'b0;
            sasa <= 2'b0;
            dealsign <= 5'b0;
        end
        else
        begin
            dealsign[0] = ~(multiplicand[7] ^ pv[0][8]);
            dealsign[1] = ~(multiplicand[7] ^ pv[1][8]);
            dealsign[2] = ~(multiplicand[7] ^ pv[2][8]);
            dealsign[3] = ~(multiplicand[7] ^ pv[3][8]);
            dealsign[4] = (multiplicand[7] ^ pv[0][8]);

            xy <= {N, multiplicand, multiplier};
            sasa <= 2'b11;
            product <= znew;
        end
    end

The registers sasa and dealsign contain zeros while reset=1, and as soon as reset=0 they are supposed to take the values '1' and 'E' respectively for the carry-save accumulator (refer to fig. A.6). However, this does not happen: they take an extra clock cycle to change to 1 and E, and hence an erroneous result is produced. Here is the testbench I have written for the code:

    always
        #5 clk = !clk;

    initial
    begin
        $monitor ($time, " clk=%b reset=%b x=%d  y=%d xy=%b p0=%b p1=%b p2=%b p3=%b znew=%b  product=%b(%d) dealsign=%b sasa=%b\n",
                  clk, reset, multiplicand, multiplier, fmac.xy,
                  fmac.pv[0][8:0], fmac.pv[1][8:0], fmac.pv[2][8:0], fmac.pv[3][8:0],
                  fmac.znew, product, product, fmac.dealsign, fmac.sasa);
        #0 clk = 0; multiplicand = 10; multiplier = 19; reset = 1;
        #10 reset = 0; N = 4'b0001;
        #30 multiplicand = 11; multiplier = 13; N = 4'b0010;
        #50 $finish;
    end

So the required value of dealsign arrives at t=25 rather than at t=15, and hence the product at t=25 comes out as 0000001010111110 (702) instead of 0000000010111110 (190). Can someone please help me debug this code or suggest an alternative way of going about it?

Your lines similar to dealsign[0] = should be changed to dealsign[0] <=. — Morgan
What is t? If it is time, how is digital logic in an RTL simulation taking anything other than zero time? If module CSA_hope contains timing information, then that implies the maximum frequency at which it can be run. — Morgan
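
As a sketch of the nonblocking change suggested in the first comment, the clocked block could look like the following. The module wrapper, the port list, the bit widths, and the pv_sign input (standing in for the sign bits pv[0][8] through pv[3][8]) are assumptions made so that the fragment compiles on its own; they are not taken from the original design. Whether this change alone removes the extra cycle depends on how pv is generated, which the question does not show.

```verilog
// Sketch only: the clocked block from the question, with every register
// written using nonblocking assignments. The wrapper module and the
// pv_sign port are hypothetical.
module mac_regs_sketch (
    input  wire        clk,
    input  wire        reset,
    input  wire [3:0]  N,
    input  wire [7:0]  multiplicand,
    input  wire [7:0]  multiplier,
    input  wire [3:0]  pv_sign,   // assumed: {pv[3][8], pv[2][8], pv[1][8], pv[0][8]}
    input  wire [15:0] znew,      // assumed width, matching product
    output reg  [19:0] xy,
    output reg  [15:0] product,
    output reg  [1:0]  sasa,
    output reg  [4:0]  dealsign
);
    always @(posedge clk) begin
        if (reset) begin
            xy       <= 20'b0;
            product  <= 16'b0;
            sasa     <= 2'b0;
            dealsign <= 5'b0;
        end else begin
            // Nonblocking: all right-hand sides are sampled before any
            // register in this block is updated.
            dealsign[0] <= ~(multiplicand[7] ^ pv_sign[0]);
            dealsign[1] <= ~(multiplicand[7] ^ pv_sign[1]);
            dealsign[2] <= ~(multiplicand[7] ^ pv_sign[2]);
            dealsign[3] <= ~(multiplicand[7] ^ pv_sign[3]);
            dealsign[4] <=  (multiplicand[7] ^ pv_sign[0]);

            xy      <= {N, multiplicand, multiplier};
            sasa    <= 2'b11;
            product <= znew;
        end
    end
endmodule
```

Using one assignment style throughout the clocked block keeps the update order of dealsign independent of how the simulator orders other processes triggered by the same clock edge.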

Tim · Accepted Answer · 2014-01-09T16:19:29

Ideally, you should not change your input signals from a different block at the exact edge of the clock.

While your simulation should still work, you have a race condition at time #10, since it is not deterministic which happens first: the clock edge or the change in the input values.

Since the clock always toggles at multiples of #5, it would be best if you scheduled all your input changes at some non-multiple of this (for example at #11, #41, #91, etc.).

Then you will not have a race condition and it will be much easier to understand what is happening when looking at waves.
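
A minimal sketch of that scheduling, assuming a stand-in design: dut_stub is a hypothetical module that only registers its inputs, used here so that the testbench simulates on its own, and the 8-bit input widths are likewise assumed. The delays in the initial block are relative, so #11, #30 and #50 place the events at t = 11, 41 and 91, none of which coincides with a clock transition.

```verilog
`timescale 1ns/1ps

// Hypothetical stand-in for the MAC: it only registers its inputs,
// so the stimulus timing can be observed in isolation.
module dut_stub (
    input  wire        clk,
    input  wire        reset,
    input  wire [7:0]  multiplicand,
    input  wire [7:0]  multiplier,
    output reg  [15:0] captured
);
    always @(posedge clk) begin
        if (reset) captured <= 16'b0;
        else       captured <= {multiplicand, multiplier};
    end
endmodule

module tb_offset_stimulus;
    reg         clk, reset;
    reg  [7:0]  multiplicand, multiplier;
    wire [15:0] captured;

    dut_stub dut (.clk(clk), .reset(reset),
                  .multiplicand(multiplicand), .multiplier(multiplier),
                  .captured(captured));

    // Clock toggles at t = 5, 10, 15, ...
    always #5 clk = !clk;

    initial begin
        $monitor($time, " clk=%b reset=%b x=%d y=%d captured=%h",
                 clk, reset, multiplicand, multiplier, captured);
        clk = 0; multiplicand = 10; multiplier = 19; reset = 1;
        #11 reset = 0;                            // t = 11, off the clock edges
        #30 multiplicand = 11; multiplier = 13;   // t = 41
        #50 $finish;                              // t = 91
    end
endmodule
```

With the inputs moved off the clock transitions, each value is stable well before the rising edge that samples it, so the waveform shows unambiguously which edge captured which input.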
