I am trying to write Verilog code to implement the census transform on a 640x480-pixel image. I wrote the complete code in behavioral form, but it is taking too long to synthesize. I understand that the reason might be the long register arrays and loops, but I am not sure how to handle that.
Here is my code:

module test(in,clk,out
    );
    input clk;
    input [7:0] in;
    output  [119:0]out;
    reg [7:0]matrix[0:639][0:479];
    //reg [119:0]win[0:10][0:10];
    reg [9:0] i = 0;
    reg [8:0] j = 0;
    reg [12:0] count = 0;
    integer p,q = 6;
    integer a,b = -6;
    reg [119:0]censusTransformedImage;
    reg [119:0]census=0;
    always@ (posedge clk)
    begin
        if(count<=6411)
            count = count+1;
    end
    always @ (posedge clk )
    begin
        if(i<=639)
        begin
            matrix[i][j]=in;
            i=i+1;
        end
        else if(i==639 && j<=479)
        begin
            i=0;
            j=j+1;
        end
        //end
    end

    always @ (posedge clk)
    begin
        if(count > 6411)
        begin
            if(p<=634)
            begin
                if(q<=479)
                begin
                    //census = 0;
                    if(a<=6)
                    begin
                        if(b<=6)
                        begin
                            if(~(a==0 && b==0))
                                census=census<<1;
                            if (matrix[p+a][q+b] > matrix[p][q])
                                census=census+1;
                            b = b+1;
                        end
                        else
                        begin
                            b=-6;
                            a=a+1;
                        end 
                    end
                    else
                    begin
                        censusTransformedImage=census;
                        census=0;
                        a=-6;
                        q=q+1;
                    end
                end
                else
                begin
                    q=0;
                    p=p+1;
                end
            end
        end
    end
   assign out = censusTransformedImage;
endmodule
The window size of census is 11X11. - Utkarsh Jain
Have you actually simulated this code? There is no way that it actually does what you want. So many problems, starting with: every element of matrix is going to be equal to in on every clock tick. - nguthrie
Thanks @nguthrie, that was a blunder of mine and I have edited the code accordingly. But the problem is still there: it is taking too long to synthesize. - Utkarsh Jain
You've probably violated some coding guidelines in your synthesis tool. Please refer to stackoverflow.com/questions/7565095/… - e19293001
You should be using non-blocking assignments (<=) in your always@(posedge) block, not blocking assignments (=). - wilcroft

1 Answer

The synthesizer is likely trying to implement your matrix as distributed memory, that is, out of the flip-flops and LUTs in the slices of the FPGA. This has to be avoided: the array holds 640 x 480 x 8 = 2,457,600 bits, so you would exhaust nearly all the logic resources of your FPGA device just to implement that piece of memory.

Instead, design your matrix memory as an independent module, with one address input (the coordinates i, j), one 8-bit data input, and one 8-bit data output. Something like:

module matrix (
  input clk,
  input wire [9:0] i,
  input wire [8:0] j,
  input write_enable,
  input wire [7:0] din,
  output reg [7:0] dout
  );

  reg [7:0] M[0:307199]; // your 640x480 matrix
  wire [18:0] addr;

  assign addr = j*640+i; // row-major: row j (0..479), column i (0..639),
                         // so the largest address is 479*640+639 = 307199.
                         // 640 = 512+128, so the synthesizer should be able
                         // to build this from shifts and adds without
                         // an actual multiplication engine
  always @(posedge clk) begin
    if (write_enable == 1'b1)
      M[addr] <= din;
    dout <= M[addr];
  end
endmodule

The key point here is that on every clock cycle there is only one access to the memory array (M), and the read is registered (dout is updated on the clock edge). This way, the synthesizer will be able to implement this huge array with block RAM instead of distributed RAM, provided the device has enough block RAM for it, saving tons of slices and speeding up the synthesis process.

Of course, this also means that your controller has to be written in such a way that, in every clock cycle, only one operation on your matrix is performed, either a read or a write. You are not allowed to, for instance, read two different elements in the same clock cycle. If two different elements are needed in the same clock cycle (as your current code seems to do), rewrite this module so that two sets of input coordinates are available, along with two output data ports. Hopefully, the synthesizer will infer a dual-port memory block for it.
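
A minimal sketch of such a two-port variant, for illustration only. The module name matrix_dp and the _a/_b port names are made up here, and port B is read-only to keep it short. Addresses are computed row-major as j*640 + i so that the largest one (479*640 + 639 = 307199) stays inside M. Whether this actually maps to dual-port block RAM depends on the synthesis tool and the target device, so check the synthesis report.

```verilog
// Hypothetical dual-port version of the matrix memory.
// Port A can read and write; port B can only read.
module matrix_dp (
  input  wire       clk,
  // Port A: read/write
  input  wire [9:0] i_a,             // column, 0..639
  input  wire [8:0] j_a,             // row, 0..479
  input  wire       write_enable_a,
  input  wire [7:0] din_a,
  output reg  [7:0] dout_a,
  // Port B: read only
  input  wire [9:0] i_b,             // column, 0..639
  input  wire [8:0] j_b,             // row, 0..479
  output reg  [7:0] dout_b
  );

  reg [7:0] M[0:307199];             // 640x480 pixels, 8 bits each

  // Row-major addresses; maximum value is 307199
  wire [18:0] addr_a = j_a*640 + i_a;
  wire [18:0] addr_b = j_b*640 + i_b;

  // Port A: one synchronous write and one registered read per clock
  always @(posedge clk) begin
    if (write_enable_a == 1'b1)
      M[addr_a] <= din_a;
    dout_a <= M[addr_a];
  end

  // Port B: one registered read per clock
  always @(posedge clk) begin
    dout_b <= M[addr_b];
  end
endmodule
```

With this, the centre pixel could be fetched through one port and a neighbour through the other in the same cycle; both values appear on dout_a and dout_b one clock after the addresses are presented.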

As a test, instruct the synthesizer to synthesize only the matrix module and watch for synthesis messages about M being implemented using block RAM, absorbing this and that register, etc., to make sure it won't be implemented using distributed RAM again.
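
Before that synthesis-only run, a quick simulation of the memory on its own can catch addressing mistakes. The following is a rough testbench sketch, assuming a module named matrix with the ports clk, i, j, write_enable, din and dout as above. The name matrix_tb, the 10 ns clock and the test value 8'hA5 are arbitrary choices. It writes the last pixel (i=639, j=479), which exercises the highest address, and reads it back one clock later.

```verilog
`timescale 1ns/1ps
// Hypothetical testbench for the single-port matrix module.
module matrix_tb;
  reg        clk = 0;
  reg  [9:0] i = 0;
  reg  [8:0] j = 0;
  reg        write_enable = 0;
  reg  [7:0] din = 0;
  wire [7:0] dout;

  matrix dut (
    .clk(clk), .i(i), .j(j),
    .write_enable(write_enable), .din(din), .dout(dout)
  );

  always #5 clk = ~clk;              // 10 ns clock period (arbitrary)

  initial begin
    // Drive inputs on falling edges so they are stable at the rising edge
    @(negedge clk);
    i = 639; j = 479; din = 8'hA5; write_enable = 1;  // write last pixel
    @(negedge clk);
    write_enable = 0;                // same address, now a plain read
    @(negedge clk);
    // dout was registered at the previous rising edge
    if (dout !== 8'hA5)
      $display("FAIL: dout = %h", dout);
    else
      $display("PASS");
    $finish;
  end
endmodule
```

A FAIL here (for example dout showing x) would suggest that the computed address falls outside M, so the address calculation or the memory depth should be checked first.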