type dist_ram is array (0 to 99) of std_logic_vector(7 downto 0);
signal ram : dist_ram := (others => (others => '0'));
-- The attribute specification must come after the declaration of the signal it applies to.
attribute ram_style : string;
attribute ram_style of ram : signal is "distributed";
begin
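-- Pseudocode: every location is written in the same clock cycle
process (Clk)
begin
    if rising_edge(Clk) then
        -- Values are written as 8-bit hex literals to match the element width.
        ram(0) <= x"00";
        ram(2) <= x"01";
        ram(3) <= x"02";
        ram(4) <= x"03";
        ram(5) <= x"01";
        ram(6) <= x"02";
        -- ...
        -- ...
        ram(99) <= x"03";
    end if;
end process;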
In the scenario above, the complete RAM is updated in one clock cycle. If I used a block RAM instead, I would need a minimum of 100 clock cycles to update the entire memory, because a block RAM port accepts only one write per clock.
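For comparison, a block RAM version of the same storage might look like the minimal sketch below. The entity name bram_100x8 and the ports we, addr, din and dout are assumptions made up for illustration, and ram_style "block" is only a hint to the Xilinx synthesis tool. The point is that a single-port memory takes one write per clock, so filling 100 locations costs at least 100 cycles.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical single-port RAM, 100 x 8 bits, one access per clock.
entity bram_100x8 is
    port (
        Clk  : in  std_logic;
        we   : in  std_logic;                     -- write enable (assumed name)
        addr : in  unsigned(6 downto 0);          -- 7 bits; caller must keep this in 0 to 99
        din  : in  std_logic_vector(7 downto 0);
        dout : out std_logic_vector(7 downto 0)
    );
end entity bram_100x8;

architecture rtl of bram_100x8 is
    type ram_t is array (0 to 99) of std_logic_vector(7 downto 0);
    signal ram : ram_t := (others => (others => '0'));
    attribute ram_style : string;
    attribute ram_style of ram : signal is "block";
begin
    process (Clk)
    begin
        if rising_edge(Clk) then
            if we = '1' then
                ram(to_integer(addr)) <= din;     -- only one location can change per cycle
            end if;
            dout <= ram(to_integer(addr));        -- registered read, returns the old value on a write
        end if;
    end process;
end architecture rtl;
```

Unlike the distributed version, there is no way here to assign ram(0) through ram(99) in the same clock edge; the address port serializes every access.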
I also understand that distributed RAM is not advisable for large memories, because it eats up the FPGA's logic resources. So what is the best design for this situation (say, a few KB of RAM) if the goal is maximum throughput?
Should I use block RAM or distributed RAM, assuming a Xilinx FPGA? Your suggestions are highly appreciated.
Thanks for your replies. Let me make this a bit clearer. My purpose is not RAM initialization. I have a 100 x 20 RAM of 8-bit entries that needs to be updated after a certain computation. After each computation I have to store the results and then read them back for the next iteration. This is an iterative process, and I am expected to finish at least 2 iterations within 3000 clock cycles. If I use block RAM to store these coefficients, then just reading and writing them would take at least 100 x 20 = 2000 cycles per iteration plus some latency, so two iterations would not fit in the 3000-cycle budget. How should I go about designing this?