I have designed a single-cycle MIPS processor in Xilinx using VHDL. The abstract design is based on the theory in the Patterson and Hennessy book. After completing the design, I ran a few assembly programs to check that it works, and it gave the expected results. My problem is with the "TIMING SUMMARY" in the design summary report (the ".SYR" file). Every time I change the assembly program stored in the instruction memory (which is my ROM), the minimum clock period of the single-cycle processor changes. I don't understand why.

Timing Summary:
---------------
Speed Grade: -4

   Minimum period: 17.561ns (Maximum Frequency: 56.945MHz)
   Minimum input arrival time before clock: No path found
   Maximum output required time after clock: 16.296ns
   Maximum combinational path delay: No path found

Timing Detail:
--------------
All values displayed in nanoseconds (ns)

=========================================================================
Timing constraint: Default period analysis for Clock 'clk'
  Clock period: 17.561ns (frequency: 56.945MHz)
  Total number of paths / destination ports: 6965792 / 616
-------------------------------------------------------------------------
Delay:               17.561ns (Levels of Logic = 22)
  Source:            MIPS_processor_unit/Datapath_comp/PC_reg/q_5_1 (FF)
  Destination:       MIPS_processor_unit/Datapath_comp/RegF/memory_0_0 (FF)
  Source Clock:      clk rising
  Destination Clock: clk rising

  Data Path: MIPS_processor_unit/Datapath_comp/PC_reg/q_5_1 to MIPS_processor_unit/Datapath_comp/RegF/memory_0_0
                                Gate     Net
    Cell:in->out      fanout   Delay   Delay  Logical Name (Net Name)
    ----------------------------------------  ------------
     FDCE:C->Q             2   0.591   0.622  MIPS_processor_unit/Datapath_comp/PC_reg/q_5_1 (MIPS_processor_unit/Datapath_comp/PC_reg/q_5_1)
     LUT2_L:I0->LO         1   0.704   0.104  Instruction_memory_unit/Mrom_Instruction_out391220_SW0 (N1361)
     LUT4:I3->O            3   0.704   0.535  Instruction_memory_unit/Mrom_Instruction_out391236_SW0 (N141)
     LUT4:I3->O           17   0.704   1.051  Instruction_memory_unit/Mrom_Instruction_out391236 (Instruction_tl_s)
     MUXF5:S->O            2   0.739   0.526  MIPS_processor_unit/Datapath_comp/RegF/mux8_8_f5 (MIPS_processor_unit/Datapath_comp/RegF/mux8_8_f5)
     LUT4:I1->O            1   0.704   0.000  MIPS_processor_unit/Datapath_comp/ALUSrc_mux/y1_F (N276)
     MUXF5:I0->O           3   0.321   0.610  MIPS_processor_unit/Datapath_comp/ALUSrc_mux/y1 (MIPS_processor_unit/Datapath_comp/ALU_2nd_input_s)
     LUT2:I1->O            1   0.704   0.000  MIPS_processor_unit/Datapath_comp/ALU_comp/Msub_y_sig_addsub0001_lut (MIPS_processor_unit/Datapath_comp/ALU_comp/Msub_y_sig_addsub0001_lut)
     MUXCY:S->O            1   0.464   0.000  MIPS_processor_unit/Datapath_comp/ALU_comp/Msub_y_sig_addsub0001_cy (MIPS_processor_unit/Datapath_comp/ALU_comp/Msub_y_sig_addsub0001_cy)
     MUXCY:CI->O           1   0.059   0.000  MIPS_processor_unit/Datapath_comp/ALU_comp/Msub_y_sig_addsub0001_cy (MIPS_processor_unit/Datapath_comp/ALU_comp/Msub_y_sig_addsub0001_cy)
     MUXCY:CI->O           1   0.059   0.000  MIPS_processor_unit/Datapath_comp/ALU_comp/Msub_y_sig_addsub0001_cy (MIPS_processor_unit/Datapath_comp/ALU_comp/Msub_y_sig_addsub0001_cy)
     MUXCY:CI->O           1   0.059   0.000  MIPS_processor_unit/Datapath_comp/ALU_comp/Msub_y_sig_addsub0001_cy (MIPS_processor_unit/Datapath_comp/ALU_comp/Msub_y_sig_addsub0001_cy)
     MUXCY:CI->O           1   0.059   0.000  MIPS_processor_unit/Datapath_comp/ALU_comp/Msub_y_sig_addsub0001_cy (MIPS_processor_unit/Datapath_comp/ALU_comp/Msub_y_sig_addsub0001_cy)
     MUXCY:CI->O           1   0.059   0.000  MIPS_processor_unit/Datapath_comp/ALU_comp/Msub_y_sig_addsub0001_cy (MIPS_processor_unit/Datapath_comp/ALU_comp/Msub_y_sig_addsub0001_cy)
     MUXCY:CI->O           0   0.059   0.000  MIPS_processor_unit/Datapath_comp/ALU_comp/Msub_y_sig_addsub0001_cy (MIPS_processor_unit/Datapath_comp/ALU_comp/Msub_y_sig_addsub0001_cy)
     XORCY:CI->O           1   0.804   0.424  MIPS_processor_unit/Datapath_comp/ALU_comp/Msub_y_sig_addsub0001_xor (MIPS_processor_unit/Datapath_comp/ALU_comp/y_sig_addsub0001)
     LUT4:I3->O            1   0.704   0.000  MIPS_processor_unit/Datapath_comp/ALU_comp/y_sig_mux0000_f5_G (N237)
     MUXF5:I1->O         259   0.321   1.334  MIPS_processor_unit/Datapath_comp/ALU_comp/y_sig_mux0000_f5 (Output_address_0_OBUF)
     RAM32X1S:A0->O        1   1.025   0.499  Data_memory_unit/Mram_data_mem1 (N10)
     LUT3:I1->O            1   0.704   0.000  inst_LPM_MUX_6 (inst_LPM_MUX_6)
     MUXF5:I0->O           1   0.321   0.000  inst_LPM_MUX_4_f5 (inst_LPM_MUX_4_f5)
     MUXF6:I0->O           1   0.521   0.455  inst_LPM_MUX_2_f6 (Read_data_tl_s)
     LUT3:I2->O            8   0.704   0.000  MIPS_processor_unit/Datapath_comp/WB_mux/y1 (MIPS_processor_unit/Datapath_comp/write_data_s)
     FDCE:D                    0.308          MIPS_processor_unit/Datapath_comp/RegF/memory_0_0
    ----------------------------------------
    Total                     17.561ns (11.401ns logic, 6.160ns route)
                                       (64.9% logic, 35.1% route)

=========================================================================


Timing Summary:
---------------
Speed Grade: -4

   Minimum period: 13.551ns (Maximum Frequency: 73.798MHz)
   Minimum input arrival time before clock: No path found
   Maximum output required time after clock: 14.466ns
   Maximum combinational path delay: No path found

Timing Detail:
--------------
All values displayed in nanoseconds (ns)

=========================================================================
Timing constraint: Default period analysis for Clock 'clk'
  Clock period: 13.551ns (frequency: 73.798MHz)
  Total number of paths / destination ports: 256927 / 278
-------------------------------------------------------------------------
Delay:               13.551ns (Levels of Logic = 13)
  Source:            MIPS_processor_unit/Datapath_comp/PC_reg/q_6 (FF)
  Destination:       MIPS_processor_unit/Datapath_comp/PC_reg/q_2 (FF)
  Source Clock:      clk rising
  Destination Clock: clk rising

  Data Path: MIPS_processor_unit/Datapath_comp/PC_reg/q_6 to MIPS_processor_unit/Datapath_comp/PC_reg/q_2
                                Gate     Net
    Cell:in->out      fanout   Delay   Delay  Logical Name (Net Name)
    ----------------------------------------  ------------
     FDCE:C->Q            71   0.591   1.354  MIPS_processor_unit/Datapath_comp/PC_reg/q_6 (MIPS_processor_unit/Datapath_comp/PC_reg/q_6)
     LUT3_D:I1->O          8   0.704   0.761  Instruction_memory_unit/Mrom_Instruction_out4711110 (N91)
     LUT4:I3->O           17   0.704   1.051  Instruction_memory_unit/Mrom_Instruction_out43111_2 (Instruction_memory_unit/Mrom_Instruction_out43111_1)
     MUXF5:S->O            1   0.739   0.000  MIPS_processor_unit/Datapath_comp/RegF/mux3_7_f5_0 (MIPS_processor_unit/Datapath_comp/RegF/mux3_7_f51)
     MUXF6:I0->O           1   0.521   0.424  MIPS_processor_unit/Datapath_comp/RegF/mux3_5_f6_0 (MIPS_processor_unit/Datapath_comp/RegF/mux3_5_f61)
     LUT4:I3->O            1   0.704   0.424  MIPS_processor_unit/Datapath_comp/RegF/read_data_11 (MIPS_processor_unit/Datapath_comp/read_data_1_s)
     LUT4:I3->O            1   0.704   0.000  MIPS_processor_unit/Datapath_comp/ALU_comp/Maddsub_y_sig_addsub0000_lut (MIPS_processor_unit/Datapath_comp/ALU_comp/Maddsub_y_sig_addsub0000_lut)
     MUXCY:S->O            1   0.464   0.000  MIPS_processor_unit/Datapath_comp/ALU_comp/Maddsub_y_sig_addsub0000_cy (MIPS_processor_unit/Datapath_comp/ALU_comp/Maddsub_y_sig_addsub0000_cy)
     MUXCY:CI->O           1   0.059   0.000  MIPS_processor_unit/Datapath_comp/ALU_comp/Maddsub_y_sig_addsub0000_cy (MIPS_processor_unit/Datapath_comp/ALU_comp/Maddsub_y_sig_addsub0000_cy)
     MUXCY:CI->O           1   0.059   0.000  MIPS_processor_unit/Datapath_comp/ALU_comp/Maddsub_y_sig_addsub0000_cy (MIPS_processor_unit/Datapath_comp/ALU_comp/Maddsub_y_sig_addsub0000_cy)
     MUXCY:CI->O           0   0.059   0.000  MIPS_processor_unit/Datapath_comp/ALU_comp/Maddsub_y_sig_addsub0000_cy (MIPS_processor_unit/Datapath_comp/ALU_comp/Maddsub_y_sig_addsub0000_cy)
     XORCY:CI->O          18   0.804   1.072  MIPS_processor_unit/Datapath_comp/ALU_comp/Maddsub_y_sig_addsub0000_xor (MIPS_processor_unit/Datapath_comp/write_data_s)
     LUT4_D:I3->O          5   0.704   0.637  MIPS_processor_unit/Controller_comp/PCSrc9 (MIPS_processor_unit/Controller_comp/PCSrc9)
     LUT4:I3->O            1   0.704   0.000  MIPS_processor_unit/Datapath_comp/Jump_mux/y1 (MIPS_processor_unit/Datapath_comp/Next_PC_1_s)
     FDCE:D                    0.308          MIPS_processor_unit/Datapath_comp/PC_reg/q_6
    ----------------------------------------
    Total                     13.551ns (7.828ns logic, 5.723ns route)
                                       (57.8% logic, 42.2% route)

=========================================================================
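The totals in both reports are simply the column sums of the "Gate Delay" and "Net Delay" entries, and the maximum frequency is 1000 divided by the period in ns. The following Python sketch re-adds both critical paths from the two "Data Path" tables (values copied by hand; the destination flip-flop's FDCE:D setup time is counted as a gate delay with zero net delay). It shows that the two minimum periods come from two different chains of cells (22 versus 13 levels of logic), not from a different delay on the same path. The small last-digit difference in frequency is presumably because XST divides by its unrounded internal delay.

```python
from typing import List, Tuple

# (gate delay, net delay) in ns for each row of the "Data Path" tables,
# in report order; the final row is the destination FDCE:D setup time.
Path = List[Tuple[float, float]]

PATH_1: Path = [  # PC_reg/q_5_1 -> RegF/memory_0_0 (first program)
    (0.591, 0.622), (0.704, 0.104), (0.704, 0.535), (0.704, 1.051),
    (0.739, 0.526), (0.704, 0.000), (0.321, 0.610), (0.704, 0.000),
    (0.464, 0.000), *([(0.059, 0.000)] * 6), (0.804, 0.424),
    (0.704, 0.000), (0.321, 1.334), (1.025, 0.499), (0.704, 0.000),
    (0.321, 0.000), (0.521, 0.455), (0.704, 0.000), (0.308, 0.000),
]

PATH_2: Path = [  # PC_reg/q_6 -> PC register (second program)
    (0.591, 1.354), (0.704, 0.761), (0.704, 1.051), (0.739, 0.000),
    (0.521, 0.424), (0.704, 0.424), (0.704, 0.000), (0.464, 0.000),
    *([(0.059, 0.000)] * 3), (0.804, 1.072), (0.704, 0.637),
    (0.704, 0.000), (0.308, 0.000),
]


def summarize(path: Path) -> Tuple[int, float, float, float, float]:
    """Return (levels of logic, logic ns, route ns, total ns, Fmax MHz)."""
    levels = len(path) - 2  # exclude source clock-to-Q and destination setup
    logic = sum(gate for gate, _ in path)
    route = sum(net for _, net in path)
    total = logic + route
    return levels, round(logic, 3), round(route, 3), round(total, 3), 1000.0 / total


for name, path in (("path 1", PATH_1), ("path 2", PATH_2)):
    levels, logic, route, total, fmax = summarize(path)
    print(f"{name}: {levels} levels, {logic} + {route} = {total} ns, ~{fmax:.3f} MHz")
```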

As can be seen, I gave my Instruction_memory_unit two different assembly programs, and the minimum period of the single-cycle processor changed. These are my questions:

1) Every time I change my assembly code, does Xilinx evaluate the critical path based on the instructions I have specified in it? If yes, how should I get a general minimum period for my design?

2) RegF is my register file, which is basically the RAM holding the 32 registers of the MIPS processor. What I can't understand is why the 'Gate Delay + Net Delay' differs between these two timing summaries. Theoretically, shouldn't the register file, being a memory, have a fixed read time?
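One way to look at question 2 is to pull out, from each report, only the read-side rows whose logical name lies under Datapath_comp/RegF. In these reports the register file appears as flip-flops (FDCE, e.g. RegF/memory_0_0) feeding multiplexer cells (MUXF5, MUXF6, LUT4), so the read delay depends on how many mux stages and how much routing the reported bit happens to pass through. This short Python sketch sums those rows (values copied by hand from the two tables; the helper name segment_delay is illustrative only).

```python
from typing import List, Tuple

# (cell:pin and logical name, gate delay ns, net delay ns) for RegF read-side rows only.
REGF_READ_1: List[Tuple[str, float, float]] = [
    ("MUXF5:S->O  RegF/mux8_8_f5", 0.739, 0.526),
]
REGF_READ_2: List[Tuple[str, float, float]] = [
    ("MUXF5:S->O  RegF/mux3_7_f5_0", 0.739, 0.000),
    ("MUXF6:I0->O RegF/mux3_5_f6_0", 0.521, 0.424),
    ("LUT4:I3->O  RegF/read_data_11", 0.704, 0.424),
]


def segment_delay(rows: List[Tuple[str, float, float]]) -> float:
    """Sum gate + net delay over a contiguous stretch of a timing path."""
    return round(sum(gate + net for _, gate, net in rows), 3)


print("RegF read, path 1:", segment_delay(REGF_READ_1), "ns over", len(REGF_READ_1), "stage(s)")
print("RegF read, path 2:", segment_delay(REGF_READ_2), "ns over", len(REGF_READ_2), "stage(s)")
```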

Is there a register between the output of your instruction ROM and the input to your instruction decoder? A register there would take the ROM contents out of the critical timing path. - markgz
Notice that your delay paths differ between the two reports in both the number and the type of elements. My guess is that you're changing the size of the ROM and re-synthesizing the whole thing. A pipeline model would let you hide differences in size behind register transfers and can provide layout isolation. Here you're dealing with a flat design. Potentially you can specify tight timing in part of your design to force layout locality and limit implementation alternatives for parts of the design. - user1155120
The MUXF5 cell in the first path has a large fanout (259). This is indicative of architectural problems that could be improved by inserting registers. These variations most likely come from different placements of the logic around the chip. Floorplanning will help stabilize the results. - Kevin Thibedeau
The 'Maximum Frequency' reported after synthesis is just an estimate. It can be very different from the value after place and route (P&R). If you want to constrain your design, add an *.xcf file. This 'synthesis constraint file' is read during synthesis, whereas normal *.ucf files are read in the translate phase. So with an *.xcf file, synthesis already knows your desired F_max. Also note that optimization stops once all constraints are met. - Paebbels
1) No, I have not added any register between the ROM and the register file; the output of the ROM goes straight to the register file. 2) I didn't understand the part "Potentially you can specify tight timing in part of your design to force layout locality and limit implementation alternatives for parts of the design." Could you give me more hints? 3) This timing summary is from the "Synthesize" process; I have not studied the timing summary from "Implementation". If I add a UCF file, will that solve my problem, which is related to RAM and ROM inference? - Tapojyoti Mandal
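To make the constraint discussion concrete: a period constraint gives the tools a required time, and each path is judged by its slack, the constrained period minus the path delay. This Python sketch applies a hypothetical 15 ns (about 66.7 MHz) target to the two synthesis estimates reported in the question. It ignores clock skew, jitter, and the gap between synthesis estimates and post-P&R timing that Paebbels points out, so treat it as arithmetic only.

```python
def slack_ns(target_period: float, path_delay: float) -> float:
    """Positive: the path fits in the clock period; negative: it violates it.

    Simplified: no clock skew, jitter or uncertainty terms are included.
    """
    return target_period - path_delay


TARGET = 15.0  # ns, hypothetical period constraint

for label, delay in (("first program", 17.561), ("second program", 13.551)):
    s = slack_ns(TARGET, delay)
    print(f"{label}: slack {s:+.3f} ns -> {'met' if s >= 0 else 'FAILED'}")
print(f"target frequency: {1000.0 / TARGET:.1f} MHz")
```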

2 Answers


It may be synthesising your ROM down into gates or LUTs or SRL16s. ... check the device usage (just before the timing report in the .syr file) to see whether it's using block memory for the ROM - it may not be.

In fact, according to the timing report, that does appear to be the problem: there are a lot of LUTs in the path and no sign of a BRAM.

If that's the problem, look up the "ram_style" attribute (and its ROM counterpart, "rom_style", whose value for block memory is "block") in the Xilinx constraints guide. If you apply it to the array containing your ROM, you may be able to overcome this. Once the ROM is in block memory, timings should be more stable.

NOTE that block RAMs are synchronous: you present the address in one clock cycle and get the contents a cycle later. If that doesn't fit your pipeline model, you will have to rethink it in order to let synthesis implement the ROM in block memory.
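The latency difference described in this answer can be modelled in a few lines. This is a behavioural Python sketch, not VHDL and not an XST coding template: an asynchronous (combinational) read returns the word for the current address in the same cycle, which is what a single-cycle datapath expects, whereas a block-RAM-style synchronous read samples the address at the clock edge, so the word requested in cycle n is only visible in cycle n + 1. The class and function names and the program words (MIPS encodings of addi, addi, add) are hypothetical.

```python
from typing import List, Optional


class SyncROM:
    """ROM with a registered read port (block-RAM style)."""

    def __init__(self, contents: List[int]) -> None:
        self.contents = list(contents)
        self.dout: Optional[int] = None  # output register; unknown before the first edge

    def rising_edge(self, addr: int) -> None:
        # The address presented during the cycle is sampled at the edge,
        # and the corresponding word appears on dout afterwards.
        self.dout = self.contents[addr]


def async_trace(contents: List[int], addrs: List[int]) -> List[int]:
    """Word visible to the datapath in each cycle with a combinational read."""
    return [contents[a] for a in addrs]


def sync_trace(contents: List[int], addrs: List[int]) -> List[Optional[int]]:
    """Word visible to the datapath in each cycle with a registered read."""
    rom = SyncROM(contents)
    seen: List[Optional[int]] = []
    for a in addrs:
        seen.append(rom.dout)  # what the decoder sees during this cycle
        rom.rising_edge(a)     # clock edge at the end of the cycle
    return seen


program = [0x20080005, 0x20090003, 0x01095020]  # hypothetical: addi, addi, add
pcs = [0, 1, 2]
print("async:", [hex(w) for w in async_trace(program, pcs)])
print("sync: ", [None if w is None else hex(w) for w in sync_trace(program, pcs)])
```

In the synchronous trace every instruction arrives one cycle after its address was presented, which is the mismatch with a single-cycle datapath that this answer warns about.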


Every time you implement your design, even without any logic changes, the timing results may be different. In some cases, where there is routing congestion, many levels of logic in many paths, or many "difficult" paths, you may encounter wildly different results from run to run.

As an experiment, change nothing in your design and run implementation 2 or 3 times. I bet you will get at least some variation in the runs.

There are some knobs you can adjust to minimize this variability (for example, using a fixed seed for the implementation process), but I don't recommend it. Most likely something else is going on here.
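The role of a seed can be illustrated with a toy heuristic. This Python sketch is a greedy random-swap placer on a one-dimensional row of slots, far simpler than the Xilinx placer, and its netlist (NETS) is invented. All of its randomness comes from one seed, so the same seed always reproduces the same result, while different seeds may settle in different local minima with different total wirelength, loosely analogous to run-to-run timing variation.

```python
import random
from typing import List, Tuple

# Hypothetical netlist: pairs of cell indices that are wired together.
NETS: List[Tuple[int, int]] = [
    (0, 1), (1, 2), (2, 3), (3, 4), (0, 4),
    (1, 3), (2, 5), (5, 6), (6, 7), (4, 7),
]
N_CELLS = 8


def wirelength(slots: List[int]) -> int:
    """slots[c] is the 1-D position of cell c; cost is the total wire span."""
    return sum(abs(slots[a] - slots[b]) for a, b in NETS)


def toy_place(seed: int, moves: int = 200) -> int:
    rng = random.Random(seed)  # every random choice below comes from this seed
    slots = list(range(N_CELLS))
    rng.shuffle(slots)  # random initial placement
    cost = wirelength(slots)
    for _ in range(moves):
        i, j = rng.sample(range(N_CELLS), 2)
        slots[i], slots[j] = slots[j], slots[i]
        new_cost = wirelength(slots)
        if new_cost <= cost:
            cost = new_cost  # keep swaps that do not make things worse
        else:
            slots[i], slots[j] = slots[j], slots[i]  # undo the swap
    return cost


for seed in range(5):
    print(f"seed {seed}: wirelength {toy_place(seed)}")
```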

Other possible factors:

  1. Are all of your IO fixed to specific IO locations? If not, the tools could be randomly selecting IO pins for the IO of your design, which will greatly affect timing.
  2. Have you tried placing constraints on your design (on your clock, for example)? This will indicate to the tools "how hard they should try" in order to improve your design to meet a certain goal. If you have some performance in mind (e.g. 66MHz, 100MHz... etc) you can provide that as a constraint to the tools and they will attempt to meet that constraint.
  3. Look into how your ROM/RAM is actually implemented. The tools may be taking liberties to make optimizations depending on the contents of the ROM, which may simplify the design in some cases. In short, they may be implementing your ROM as LUTs instead of a RAM-type primitive. This could be helping you out, and it might just be an artifact of how you have things implemented at this time (coding style, resets, etc.). If in the future the ROM becomes more generic (e.g. loaded at run time), the tools won't be able to take the same optimization liberties, and performance will be similar from run to run.
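Point 3 can be checked directly on the ROM contents. This Python sketch (the function name and the example program are hypothetical) lists which of the 32 instruction-word bits actually change across the program. A bit that is the same in every word is a constant that synthesis can tie off with no logic, and each remaining bit is a Boolean function of the address whose LUT structure depends on the data. So two different programs generally leave different logic, and therefore different paths, between the PC and the register file.

```python
from typing import List

WORD_BITS = 32


def varying_bits(words: List[int]) -> List[int]:
    """Bit positions that are 0 in some ROM word and 1 in another.

    Every other bit is constant across the whole ROM, so a synthesizer
    that can see the contents may replace it with a constant.
    """
    any_one = 0                     # OR of all words: bit set if some word has a 1
    all_one = (1 << WORD_BITS) - 1  # AND of all words: bit set if every word has a 1
    for w in words:
        any_one |= w
        all_one &= w
    differs = any_one & ~all_one
    return [b for b in range(WORD_BITS) if (differs >> b) & 1]


program_a = [0x20080005, 0x20090003, 0x01095020]  # hypothetical: addi, addi, add
bits = varying_bits(program_a)
print(f"{len(bits)} of {WORD_BITS} bits vary: {bits}")
```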

In summary, I don't think changing the contents of your ROM/RAM is the culprit in the timing changes you are seeing, but some other factor.