
This is a general FPGA design question. I'm fairly new to FPGA design and have just embarked on my first large-scale project, building some nice linear algebra solvers. The systems are pretty large, so getting it right the first time is important.

After successful simulations I am now synthesizing, and it is a nightmare: I am having to build and test the design component by component because nothing behaves as it does in simulation! I'm mainly having issues with state machines whose outputs are not synchronized. For example, this is a data loader I'm using:

entity TriDiag_Data_Scheduler is
    generic( W : integer :=16;  
                AW : integer := 10 -- address width
                );
    Port ( clk : in  STD_LOGIC;
           rst : in  STD_LOGIC; --attached to data finished
           d_ready : in  STD_LOGIC;
           din : in  STD_LOGIC_VECTOR (W-1 downto 0);
              wr_en : out STD_LOGIC_VECTOR (3 downto 0);
              dout_a, dout_b, dout_c, dout_y : out  STD_LOGIC_VECTOR (W-1 downto 0);
              addr_out : out STD_LOGIC_VECTOR (AW-1 downto 0));
end TriDiag_Data_Scheduler;

architecture Behavioral of TriDiag_Data_Scheduler is

type state is (a,b,c,y);
signal state_pr, state_next : state := y;
signal addr_reg, addr_next : std_logic_vector(AW-1 downto 0) :=(others =>'1');

signal wr_en_next : std_logic_vector(3 downto 0);

--data buffer
signal d_buff, d_buff_a, d_buff_b, d_buff_c, d_buff_y : std_logic_vector (W-1 downto 0) :=(others =>'0');
signal d_buff_a_reg, d_buff_b_reg, d_buff_c_reg, d_buff_y_reg : std_logic_vector (W-1 downto 0) :=(others =>'0');

begin

process(clk,rst)
begin
    if(clk'event and clk ='1') then
        state_pr <= state_next;
        d_buff_a <= d_buff_a_reg;
        d_buff_b <= d_buff_b_reg;
        d_buff_c <= d_buff_c_reg;
        d_buff_y <= d_buff_y_reg;

        addr_reg <= addr_next;
        wr_en <= wr_en_next;
    end if;

end process;

addr_out <= addr_reg;
dout_a <= d_buff_a;
dout_b <= d_buff_b;
dout_c <= d_buff_c;
dout_y <= d_buff_y;


--Data out logic
process(state_pr, d_buff_a, d_buff_b, d_buff_c, d_buff_y, d_buff)
begin

    d_buff_a_reg <= d_buff_a;
    d_buff_b_reg <= d_buff_b;
    d_buff_c_reg <= d_buff_c;
    d_buff_y_reg <= d_buff_y;

    case state_pr is
        when a => --move data to a reg
            d_buff_a_reg <= d_buff;
        when b => --move data to b reg
            d_buff_b_reg <= d_buff;
        when c => --move data to c reg
            d_buff_c_reg <= d_buff;
        when y => 
            d_buff_y_reg <= d_buff;
    end case; 
end process;

--next state and addr logic
process(state_pr, d_ready, rst, din)
begin

    state_next <= state_pr;
    addr_next <= addr_reg;
    wr_en_next <= (others => '0');

if(rst = '1') then
    state_next <= a;
    addr_next <= (others =>'1');
    wr_en_next <= (others => '0');
elsif(d_ready = '1') then
--Read in the data to the buffer
    d_buff <= din;
--next state logic
    case state_pr is
        when a => --move data to a reg
            addr_next <= addr_reg + 1;
        --  d_buff_a_reg <= din;
            wr_en_next <= "0001";
            state_next <= b;
        when b => --move data to b reg
            wr_en_next <= "0010";
        --  d_buff_b_reg <= din;
            state_next <= c;
        when c => --move data to c reg
            wr_en_next <= "0100";
        --  d_buff_c_reg <= din;
            state_next <= y;
        when y => 
        --  d_buff_y_reg <= din;
            wr_en_next <= "1000";
            state_next <= a;
    end case; 
end if;
end process;
end Behavioral;

Basically, when data is received via a UART module, this entity's job is to load it into the correct memory (selected by the wr_en signal). The issue is that in simulation of all my designs (this is revision 7) addr_out, wr_en and the data are in sync, but after synthesis I keep finding that the address and wr_en are not in sync with the data, and what gets written comes half from one state and half from the state before it.

What design practices should I use to make my VHDL more synthesis-friendly? At this rate I will have to rewrite all my previous hard work for each component!

Many thanks, Sam

My VHDL is home-taught, so apologies if it offends anyone! - Sam Palmer
Whenever I see non-clocked processes with big sensitivity lists, I assume something's missing from them. I haven't gone through it in detail, but check the synthesizer warnings to see if it's spotted anything. Personally I would rewrite this as a single clocked process. - Martin Thompson
Thanks again Martin. The problem was that it was sensitive to the external d_ready signal. I fixed the issue by adding a synchronous tick which is set by the d_ready signal but then asserted on the clk. I think I was suffering from some nice skew between d_ready and the clk, which is why it wasn't all in sync. Now everything is active on the synchronous tick and it seems to work. - Sam Palmer
My experience is that software people have more success using a larger, single, clocked process per entity and variables for everything that doesn't involve communication with another process. YMMV of course, but it might be worth a try... - Martin Thompson
I agree with @MartinThompson. As a demonstration of how dangerous such large combinatorial processes are, take a look at the latch generated for d_buff. It is just a small oversight that won't show up in simulation but has a large impact on synthesis. Also, naming something _reg when it is not a register is very confusing (e.g., d_buff_a_reg)! - zennehoy
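
For illustration, here is a minimal sketch of what the single clocked process suggested in the comments might look like for this loader. It is not the poster's code: the entity name tridiag_loader_sketch, the state names st_a to st_y and the use of numeric_std are assumptions made for the example, and it assumes d_ready is already synchronous to clk and high for exactly one cycle per word. Because every assignment sits inside the clocked branch, no latch can be inferred, and wr_en, the address and the data all change on the same clock edge.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical single-process variant of the question's data loader.
-- Assumes d_ready is synchronous to clk and one cycle wide per word.
entity tridiag_loader_sketch is
    generic (
        W  : integer := 16;  -- data width
        AW : integer := 10   -- address width
    );
    port (
        clk      : in  std_logic;
        rst      : in  std_logic;  -- synchronous reset
        d_ready  : in  std_logic;
        din      : in  std_logic_vector(W-1 downto 0);
        wr_en    : out std_logic_vector(3 downto 0);
        dout_a, dout_b, dout_c, dout_y : out std_logic_vector(W-1 downto 0);
        addr_out : out std_logic_vector(AW-1 downto 0)
    );
end entity tridiag_loader_sketch;

architecture rtl of tridiag_loader_sketch is
    type state_t is (st_a, st_b, st_c, st_y);
    signal state : state_t := st_a;
    signal addr  : unsigned(AW-1 downto 0) := (others => '1');
begin
    addr_out <= std_logic_vector(addr);

    process (clk)
    begin
        if rising_edge(clk) then
            wr_en <= (others => '0');  -- default: no write this cycle
            if rst = '1' then
                state <= st_a;
                addr  <= (others => '1');
            elsif d_ready = '1' then
                case state is
                    when st_a =>
                        addr   <= addr + 1;  -- new row; wraps from all ones to 0 on the first word
                        dout_a <= din;
                        wr_en  <= "0001";
                        state  <= st_b;
                    when st_b =>
                        dout_b <= din;
                        wr_en  <= "0010";
                        state  <= st_c;
                    when st_c =>
                        dout_c <= din;
                        wr_en  <= "0100";
                        state  <= st_y;
                    when st_y =>
                        dout_y <= din;
                        wr_en  <= "1000";
                        state  <= st_a;
                end case;
            end if;
        end if;
    end process;
end architecture rtl;
```

The data outputs have no reset value in this sketch, so they read as 'U' in simulation until the first word of each kind has been written.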

1 Answer


Working design below. The old design was suffering from skew between the d_ready signal, asserted from another module, and this module's clk. As a result the changes were not being synchronized as first hoped. To fix this I put in the d_tick_next signal, which drives a locally synchronized signal d_tick that gives the correct behavior. One lesson learnt (correct me if I'm wrong) is that you can't rely on supposedly externally clocked signals (such as d_ready) to be in sync with the receiving module's clk.

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use ieee.std_logic_unsigned.all;

-- Uncomment the following library declaration if using
-- arithmetic functions with Signed or Unsigned values
--use IEEE.NUMERIC_STD.ALL;

-- Uncomment the following library declaration if instantiating
-- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;


-- ////////////////////////////////////////////////////////////////////
--Takes in the linear stream of data and builds up the memory structure

--INPUT DATA MUST FOLLOW SEQUENCE [a b c] x = [y] (0 if a or c do not exist), rows 0 and N respectively
--//////////////////////////////////////////////////////////////////////
entity TriDiag_Data_Scheduler is
    generic( W : integer :=16;  
                AW : integer := 10 -- address width
                );
    Port ( clk : in  STD_LOGIC;
           rst : in  STD_LOGIC; --attached to data finished
           d_ready : in  STD_LOGIC;
           din : in  STD_LOGIC_VECTOR (W-1 downto 0);
              wr_en : out STD_LOGIC_VECTOR (3 downto 0);
              dout_a, dout_b, dout_c, dout_y : out  STD_LOGIC_VECTOR (W-1 downto 0);
              addr_out : out STD_LOGIC_VECTOR (AW-1 downto 0));
end TriDiag_Data_Scheduler;

architecture Behavioral of TriDiag_Data_Scheduler is

type state is (a,b,c,y);
signal state_pr, state_next : state := y;
signal addr_reg, addr_next : std_logic_vector(AW-1 downto 0) :=(others =>'1');

signal wr_en_next : std_logic_vector(3 downto 0);

signal d_tick, d_tick_next : std_logic;

--data buffer
signal d_buff, d_buff_a, d_buff_b, d_buff_c, d_buff_y : std_logic_vector (W-1 downto 0) :=(others =>'0');
signal d_buff_a_reg, d_buff_b_reg, d_buff_c_reg, d_buff_y_reg : std_logic_vector (W-1 downto 0) :=(others =>'0');

begin

process(clk)
begin
    if(clk'event and clk ='1') then
        state_pr <= state_next;

        d_buff_a <= d_buff_a_reg;
        d_buff_b <= d_buff_b_reg;
        d_buff_c <= d_buff_c_reg;
        d_buff_y <= d_buff_y_reg;

        d_tick <= d_tick_next;

        addr_reg <= addr_next;
        wr_en <= wr_en_next;
    end if;

end process;

addr_out <= addr_reg;
dout_a <= d_buff_a;
dout_b <= d_buff_b;
dout_c <= d_buff_c;
dout_y <= d_buff_y;


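--Data out logic (sensitivity list covers every signal the process reads)
process(state_pr, d_tick, rst, addr_reg, d_buff, d_buff_a, d_buff_b, d_buff_c, d_buff_y)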
begin

    d_buff_a_reg <= d_buff_a;
    d_buff_b_reg <= d_buff_b;
    d_buff_c_reg <= d_buff_c;
    d_buff_y_reg <= d_buff_y;

    wr_en_next <= (others => '0');

if(rst = '1') then
    addr_next <= (others =>'1');
else
    addr_next <= addr_reg;
end if;

if(d_tick = '1') then
    case state_pr is
        when a => --move data to a reg
            d_buff_a_reg <= d_buff;
            addr_next <= addr_reg + 1;
            wr_en_next <= "0001";
        when b => --move data to b reg
            d_buff_b_reg <= d_buff;
            wr_en_next <= "0010";
        when c => --move data to c reg
            d_buff_c_reg <= d_buff;
            wr_en_next <= "0100";
        when y => 
            d_buff_y_reg <= d_buff;
            wr_en_next <= "1000";
    end case; 
end if;
end process;

--next state and d_tick
process(state_pr, d_ready, rst, din)
begin

    state_next <= state_pr;
    d_tick_next <='0';

if(rst = '1') then
    state_next <= y;
elsif(d_ready = '1') then
--Read in the data to the buffer
--(d_buff is assigned only in this branch, so synthesis infers a latch for it)
    d_buff <= din;
-- set sync tick
    d_tick_next <= '1';
--next state logic
    case state_pr is
        when a => 
            state_next <= b;
        when b => 
            state_next <= c;
        when c =>
            state_next <= y;
        when y => 
            state_next <= a;
    end case; 
end if;
end process;

end Behavioral;
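
If d_ready really can change at arbitrary times relative to clk, a common way to build the local tick described above is a two-flop synchronizer followed by an edge detector. The following is only a sketch under that assumption, not part of the design above: the entity name ready_sync_sketch and the signal name sync are made up for the example, and it assumes din is held stable for a few clk cycles after d_ready rises so that it is still valid when d_tick fires.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical helper: brings a possibly asynchronous d_ready into the
-- clk domain and produces a one-cycle tick on its rising edge.
entity ready_sync_sketch is
    port (
        clk     : in  std_logic;
        d_ready : in  std_logic;  -- may change at any time relative to clk
        d_tick  : out std_logic   -- one clk cycle wide, synchronous to clk
    );
end entity ready_sync_sketch;

architecture rtl of ready_sync_sketch is
    signal sync : std_logic_vector(2 downto 0) := (others => '0');
begin
    process (clk)
    begin
        if rising_edge(clk) then
            -- sync(0) and sync(1) form the two-flop synchronizer;
            -- sync(2) keeps the previous synchronized value for edge detection.
            sync <= sync(1 downto 0) & d_ready;
        end if;
    end process;

    -- High for one cycle when the synchronized signal goes from 0 to 1.
    d_tick <= sync(1) and not sync(2);
end architecture rtl;
```

The tick arrives two to three clk cycles after d_ready rises, so downstream logic should sample din on d_tick rather than on d_ready itself.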