2
votes

I want to use the output of another module inside an always block. Currently the only way to make this code work is by adding #1 after the pi_in assignment so that enough time has passed to allow Pi to finish.

Relevant part from module pLayer.v:

Pi pi(pi_in,pi_out);

always @(*)
begin

    for(i=0; i<constants.nSBox; i++) begin
        for(j=0; j<8; j++) begin
            x               = (state_value[(constants.nSBox-1)-i]>>j) & 1'b1;
            pi_in           = 8*i+j;#1; /* wait for pi to finish */
            PermutedBitNo   = pi_out;
            y               = PermutedBitNo>>3;

            tmp[(constants.nSBox-1)-y] ^= x<<(PermutedBitNo-8*y);
        end
    end
    state_out = tmp;
end

Modllue Pi.v

`include "constants.v" 

module Pi(in, out);
input  [31:0]   in;
output [31:0]   out;
reg [31:0] out;

always @* begin
    if (in != constants.nBits-1) begin
        out = (in*constants.nBits/4)%(constants.nBits-1);
    end else begin
        out = constants.nBits-1;
    end
end
endmodule

Delays should not be used in the final implementation, so is there another way without using #1?

In essence i want PermutedBitNo = pi_out to be evaluated only after the Pi module has finished its job with pi_in (=8*i+j) as input. How can i block this line until Pi has finished?

Do i have to use a clock? If that's the case, please give me a hint.

update:

Based on Krouitch suggestions i modified my modules. Here is the updated version:

From pLayer.v:

    Pi pi(.clk (clk),
      .rst (rst),
      .in  (pi_in),
      .out (pi_out));

counter c_i (clk, rst, stp_i, lmt_i, i);
counter c_j (clk, rst, stp_j, lmt_j, j);

always @(posedge clk)
begin
    if (rst) begin
        state_out = 0;
    end else begin
        if (c_j.count == lmt_j) begin
            stp_i = 1;
        end else begin
            stp_i = 0;
        end

        // here, the logic starts
        x               = (state_value[(constants.nSBox-1)-i]>>j) & 1'b1;
        pi_in           = 8*i+j;
        PermutedBitNo   = pi_out;
        y               = PermutedBitNo>>3;
        tmp[(constants.nSBox-1)-y] ^= x<<(PermutedBitNo-8*y);

        // at end
        if (i == lmt_i-1)
            if (j == lmt_j) begin
                state_out = tmp;
            end
    end
end
endmodule

module counter(
  input  wire clk,
  input  wire rst,
  input  wire stp,
  input  wire [32:0] lmt,
  output reg  [32:0] count
);

always@(posedge clk or posedge rst)
    if(rst)
        count <= 0;
    else if (count >= lmt)
        count <= 0;
    else if (stp)
        count <= count + 1;
endmodule

From Pi.v:

always @* begin
    if (rst == 1'b1) begin
        out_comb = 0;
    end
    if (in != constants.nBits-1) begin
        out_comb = (in*constants.nBits/4)%(constants.nBits-1);
    end else begin
        out_comb = constants.nBits-1;
    end
end

always@(posedge clk) begin
    if (rst)
        out <= 0;
    else
        out <= out_comb;
end
1

1 Answers

3
votes

That's a nice piece of software you have here...

The fact that this language describes hardware is not helping then.

In verilog, what you write will simulate in zero time. it means that your loop on i and j will be completely done in zero time too. That is why you see something when you force the loop to wait for 1 time unit with #1.

So yes, you have to use a clock.

For your system to work you will have to implement counters for i and j as I see things.

A counter synchronous counter with reset can be written like this:

`define SIZE 10
module counter(
  input  wire clk,
  input  wire rst_n,
  output reg [`SIZE-1:0] count
);

always@(posedge clk or negedge rst_n)
  if(~rst_n)
    count <= `SIZE'd0;
  else
    count <= count + `SIZE'd1;
endmodule

You specify that you want to sample pi_out only when pi_in is processed. In a digital design it means that you want to wait one clock cycle between the moment when you are sending pi_in and the moment when you are reading pi_out.

The best solution, in my opinion, is to make your pi module sequential and then consider pi_out as a register.

To do that I would do the following:

module Pi(in, out);
input           clk;
input  [31:0]   in;
output [31:0]   out;
reg [31:0]  out;
wire        clk;
wire [31:0] out_comb;
always @* begin
  if (in != constants.nBits-1) begin
    out_comb = (in*constants.nBits/4)%(constants.nBits-1);
  end else begin
    out_comb = constants.nBits-1;
  end
end

always@(posedge clk)
  out <= out_comb;

endmodule

Quickly if you use counters for i and j and this last pi module this is what will happen:

  1. at a new clock cycle, i and j will change --> pi_in will change accordingly at the same time(in simulation)
  2. at the next clock cycle out_comb will be stored in out and then you will have the new value of pi_out one clock cycle later than pi_in

EDIT

First of all, when writing (synchronous) processes, I would advise you to deal only with 1 register by process. It will make your code clearer and easier to understand/debug.

Another tip would be to separate combinatorial circuitry from sequential. It will also make you code clearer and understandable.

If I take the example of the counter I wrote previously it would look like :

`define SIZE 10
module counter(
  input  wire clk,
  input  wire rst_n,
  output reg [`SIZE-1:0] count
);

//Two way to do the combinatorial function
//First one
wire [`SIZE-1:0] count_next;
assign count_next = count + `SIZE'd1; 

//Second one  
reg [`SIZE-1:0] count_next;
always@*
  count_next = count + `SIZE'1d1;


always@(posedge clk or negedge rst_n)
  if(~rst_n)
    count <= `SIZE'd0;
  else
    count <= count_next;


endmodule

Here I see why you have one more cycle than expected, it is because you put the combinatorial circuitry that controls your pi module in you synchronous process. It means that the following will happen :

  1. first clk positive edge i and j will be evaluated
  2. next cycle, the pi_in is evaluated
  3. next cycle, pi_out is captured

So it makes sense that it takes 2 cycles.

To correct that you should take out of the synchronous process the 'logic' part. As you stated in your commentaries it is logic, so it should not be in the synchronous process.

Hope it helps