Vivado: Finding the “maximal frequency” after synthesis

This post was written by eli on January 3, 2017
Posted Under: FPGA,Vivado


Somewhere at the bottom of ISE’s xst synthesizer’s report, it says what the maximal frequency is, along with an outline the slowest path. This is a rather nice feature, in particular when attempting to optimize a specific module. There is no such figure given after a regular Vivado synthesis, possibly because the guys at Xilinx thought this “maximal frequency” could be misleading. If so, they had two good reasons for that:

  • There is no such thing as a “maximal frequency”: The tools pay attention to the timing constraints and do their best accordingly. Put shortly, you might not get frequency X unless you ask for it.
  • In a typical design, there are many clocks with different frequencies. The slowest path might belong to a clock that’s slow anyhow.

And still, it’s sometimes useful to get an idea of where things stand before the Via Dolorosa of a full implementation.

How to do it in Vivado

First and foremost: Set the timing constraints according to your expectations. Or at least, in a way that makes it clear which clock is important, and which can be slow. Then synthesize the design.

After the synthesis has completed successfully, open the synthesized design (clicking “Open Synthesized Design” on the left bar or with the Tcl command “open_run synth_1″).

In the Tcl window, issue the command

report_timing_summary -file mytiming.rpt

which writes a full post-synthesis timing report into mytiming.rpt. Just “report_timing_summary” prints it out to the console.

There’s also a “Report Timing Summary” option under “Synthesized Design” on the left bar, but I find it difficult to navigate my way to getting information in the GUI representation of the report.

Reading the report

RULE #1: The synthesis report is no more than a rough estimation. The routing delays are guesses. It might report timing failures where the implementation will succeed to fix things, and it might say all is fine where the implementation will fail colossally (in particular when the FPGA’s logic usage goes close to 100%).

Now to action: The first thing to look at is the clock summary and Intra Clock Table, and get to know how Vivado has named which clock. For example,

| Clock Summary
| -------------

Clock                  Waveform(ns)         Period(ns)      Frequency(MHz)
-----                  ------------         ----------      --------------
clk_fpga_1             {0.000 5.000}        10.000          100.000
gclk                   {0.000 4.000}        8.000           125.000
  audio_mclk_OBUF      {0.000 41.667}       83.333          12.000
  clk_fb               {0.000 20.000}       40.000          25.000
  vga_clk_ins/clk_fb   {0.000 20.000}       40.000          25.000
  vga_clk_ins/clkout0  {0.000 1.538}        3.077           325.000
  vga_clk_ins/clkout1  {0.000 7.692}        15.385          65.000
  vga_clk_ins/clkout2  {0.000 7.692}        15.385          65.000          

| Intra Clock Table
| -----------------

Clock                      WNS(ns)      TNS(ns)  TNS Failing Endpoints  TNS Total Endpoints      WHS(ns)      THS(ns)  THS Failing Endpoints  THS Total Endpoints     WPWS(ns)     TPWS(ns)  TPWS Failing Endpoints  TPWS Total Endpoints
-----                      -------      -------  ---------------------  -------------------      -------      -------  ---------------------  -------------------     --------     --------  ----------------------  --------------------
clk_fpga_1                   3.791        0.000                      0                12474        0.135        0.000                      0                12474        3.750        0.000                       0                  5021
gclk                                                                                                                                                                     6.751        0.000                       0                     2
  audio_mclk_OBUF                                                                                                                                                       76.667        0.000                       0                     1
  clk_fb                                                                                                                                                                12.633        0.000                       0                     2
  vga_clk_ins/clk_fb                                                                                                                                                    38.751        0.000                       0                     2
  vga_clk_ins/clkout0                                                                                                                                                    1.410        0.000                       0                    10
  vga_clk_ins/clkout1       10.747        0.000                      0                  215       -0.029       -0.229                      8                  215        6.712        0.000                       0                   195
  vga_clk_ins/clkout2        3.990        0.000                      0                  415        0.135        0.000                      0                  415        7.192        0.000                       0                   211

If the clock frequencies listed in the Clock Summary (which are derived from the timing constraints) don’t help matching between a clock and a name, the TNS Total Endpoints of each clock in the Intra Clock Table helps telling which clock is which. So once the name of the clock of interest is nailed down, search for it in the file, and find something like this:

Max Delay Paths
Slack (MET) :             3.791ns  (required time - arrival time)
  Source:                 xillybus_ins/xillybus_core_ins/unitw_1_ins/unitw_1_offset_limit_1/C
                            (rising edge-triggered cell FDRE clocked by clk_fpga_1  {rise@0.000ns fall@5.000ns period=10.000ns})
  Destination:            xillybus_ins/xillybus_core_ins/unitw_1_ins/unitw_1_end_offset_0/D
                            (rising edge-triggered cell FDRE clocked by clk_fpga_1  {rise@0.000ns fall@5.000ns period=10.000ns})
  Path Group:             clk_fpga_1
  Path Type:              Setup (Max at Slow Process Corner)
  Requirement:            10.000ns  (clk_fpga_1 rise@10.000ns - clk_fpga_1 rise@0.000ns)
  Data Path Delay:        6.077ns  (logic 2.346ns (38.605%)  route 3.731ns (61.395%))
  Logic Levels:           8  (CARRY4=3 LUT3=1 LUT4=1 LUT6=3)
  Clock Path Skew:        -0.040ns (DCD - SCD + CPR)
    Destination Clock Delay (DCD):    0.851ns = ( 10.851 - 10.000 )
    Source Clock Delay      (SCD):    0.901ns
    Clock Pessimism Removal (CPR):    0.010ns
  Clock Uncertainty:      0.154ns  ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
    Total System Jitter     (TSJ):    0.071ns
    Total Input Jitter      (TIJ):    0.300ns
    Discrete Jitter          (DJ):    0.000ns
    Phase Error              (PE):    0.000ns

    Location             Delay type                Incr(ns)  Path(ns)    Netlist Resource(s)
  -------------------------------------------------------------------    -------------------
                         (clock clk_fpga_1 rise edge)
                                                      0.000     0.000 r
                         PS7                          0.000     0.000 r  xillybus_ins/system_i/vivado_system_i/processing_system7_0/inst/PS7_i/FCLKCLK[1]
                         net (fo=1, unplaced)         0.000     0.000    xillybus_ins/system_i/vivado_system_i/processing_system7_0/inst/n_707_PS7_i
                         BUFG (Prop_bufg_I_O)         0.101     0.101 r  xillybus_ins/system_i/vivado_system_i/processing_system7_0/inst/buffer_fclk_clk_1.FCLK_CLK_1_BUFG/O
                         net (fo=5023, unplaced)      0.800     0.901    xillybus_ins/xillybus_core_ins/bus_clk_w
                                                                      r  xillybus_ins/xillybus_core_ins/unitw_1_ins/unitw_1_offset_limit_1/C
  -------------------------------------------------------------------    -------------------
                         FDRE (Prop_fdre_C_Q)         0.496     1.397 f  xillybus_ins/xillybus_core_ins/unitw_1_ins/unitw_1_offset_limit_1/Q
                         net (fo=5, unplaced)         0.834     2.231    xillybus_ins/xillybus_core_ins/unitw_1_ins/unitw_1_offset_limit[1]
                         LUT4 (Prop_lut4_I0_O)        0.289     2.520 r  xillybus_ins/xillybus_core_ins/unitw_1_ins/Mcompar_n0037_lutdi/O
                         net (fo=1, unplaced)         0.000     2.520    xillybus_ins/xillybus_core_ins/unitw_1_ins/Mcompar_n0037_lutdi
                         CARRY4 (Prop_carry4_DI[0]_CO[3])
                                                      0.553     3.073 r  xillybus_ins/xillybus_core_ins/unitw_1_ins/Mcompar_n0037_cy[0]_CARRY4/CO[3]
                         net (fo=1, unplaced)         0.000     3.073    xillybus_ins/xillybus_core_ins/unitw_1_ins/Mcompar_n0037_cy[3]
                         CARRY4 (Prop_carry4_CI_CO[3])
                                                      0.114     3.187 r  xillybus_ins/xillybus_core_ins/unitw_1_ins/Mcompar_n0037_cy[4]_CARRY4/CO[3]
                         net (fo=3, unplaced)         0.936     4.123    xillybus_ins/xillybus_core_ins/unitw_1_ins/Mcompar_n0037_cy[7]
                         LUT6 (Prop_lut6_I4_O)        0.124     4.247 f  xillybus_ins/xillybus_core_ins/unitw_1_ins/unitw_1_wr_request_condition/O
                         net (fo=7, unplaced)         0.480     4.727    xillybus_ins/xillybus_core_ins/unitw_1_ins/unitw_1_wr_request_condition
                         LUT3 (Prop_lut3_I2_O)        0.124     4.851 r  xillybus_ins/xillybus_core_ins/unitw_1_ins/unitw_1_flush_condition_unitw_1_wr_request_condition_AND_179_o3_lut/O
                         net (fo=1, unplaced)         0.000     4.851    xillybus_ins/xillybus_core_ins/unitw_1_ins/unitw_1_flush_condition_unitw_1_wr_request_condition_AND_179_o3_lut
                         CARRY4 (Prop_carry4_S[2]_CO[3])
                                                      0.398     5.249 f  xillybus_ins/xillybus_core_ins/unitw_1_ins/unitw_1_flush_condition_unitw_1_wr_request_condition_AND_179_o2_cy_CARRY4/CO[3]
                         net (fo=21, unplaced)        0.979     6.228    xillybus_ins/xillybus_core_ins/unitw_1_ins/unitw_1_flush_condition_unitw_1_wr_request_condition_AND_179_o
                         LUT6 (Prop_lut6_I5_O)        0.124     6.352 r  xillybus_ins/xillybus_core_ins/unitw_1_ins/_n03401/O
                         net (fo=15, unplaced)        0.502     6.854    xillybus_ins/xillybus_core_ins/unitw_1_ins/_n0340
                         LUT6 (Prop_lut6_I5_O)        0.124     6.978 r  xillybus_ins/xillybus_core_ins/unitw_1_ins/unitw_1_end_offset_0_rstpot/O
                         net (fo=1, unplaced)         0.000     6.978    xillybus_ins/xillybus_core_ins/unitw_1_ins/unitw_1_end_offset_0_rstpot
                         FDRE                                         r  xillybus_ins/xillybus_core_ins/unitw_1_ins/unitw_1_end_offset_0/D
  -------------------------------------------------------------------    -------------------

                         (clock clk_fpga_1 rise edge)
                                                     10.000    10.000 r
                         PS7                          0.000    10.000 r  xillybus_ins/system_i/vivado_system_i/processing_system7_0/inst/PS7_i/FCLKCLK[1]
                         net (fo=1, unplaced)         0.000    10.000    xillybus_ins/system_i/vivado_system_i/processing_system7_0/inst/n_707_PS7_i
                         BUFG (Prop_bufg_I_O)         0.091    10.091 r  xillybus_ins/system_i/vivado_system_i/processing_system7_0/inst/buffer_fclk_clk_1.FCLK_CLK_1_BUFG/O
                         net (fo=5023, unplaced)      0.760    10.851    xillybus_ins/xillybus_core_ins/bus_clk_w
                                                                      r  xillybus_ins/xillybus_core_ins/unitw_1_ins/unitw_1_end_offset_0/C
                         clock pessimism              0.010    10.861
                         clock uncertainty           -0.154    10.707
                         FDRE (Setup_fdre_C_D)        0.062    10.769    xillybus_ins/xillybus_core_ins/unitw_1_ins/unitw_1_end_offset_0
                         required time                         10.769
                         arrival time                          -6.978
                         slack                                  3.791

This is a rather messy piece of text, but the key elements are marked in red.

Before drawing any conclusions, make sure it’s the right part you’re looking at:

  • It’s the Max Delay Paths section. The mimimal paths section is useful for spotting hold time violation, and has no effect on the maximal frequency.
  • It’s the right clock. In the example above, it’s clk_fpga_1. The Requirement line states not only the constraint given for this clock (10 ns = 100 MHz), but also that it goes from one rising edge of clk_fpga_1 to another.

Once that’s done, let’s see what we’ve got: The requirement was 10 ns, and the slack 3.791 ns (note that it’s positive), which means that we could have asked for a clock period  3.791 ns shorter and it would still be OK. So it could have been 10 – 3.791 = 6.2090 ns, which is some 161 MHz.

So the short answer to the “maximal clock” question for clk_fpga_1 is 161 MHz. But remember that this figure might change if the constraints change.

And a final note: The Data Path Delay tells us something about what made this worst path slow or fast. How much delay went on logic, and how much on the (estimated) route delays. So does the detailed delay report that follows. For a more detailed report, consider using the “-noworst” flag when requesting the timing report, so a few worst-case paths are listed. This can help solving timing problems.

Reader Comments

There is much easier way to find TNS/THS/WNS/WHS in Vivado.
Those are properties of the run object.
Here is a TCL snippet:

foreach run [get_runs -filter {IS_IMPLEMENTATION}] {
set tns [get_property STATS.TNS $run]
set ths [get_property STATS.THS $run]

Moreover, it doesn’t require opening the implemented design, which can take 25-30 min.

Written By Evgeni Stavinov on March 1st, 2017 @ 02:22

Hello Eli Billauer,
My name is Jason and thank you very much for the design experience shared in your blog. After reading your blog introducing how to find the maximal frequency, a question comes to my mind.
When I am design a general processor (or DSP such as a multiplier), I tried to examine the timing summary. However, the design is reported empty since I didn’t add any constraint to my design. If I add some modifiers (* mark_debug = “true” *) into some of my Verilog files and connect the clock and reset signals by defining the constraints, my design is preserved and the timing summary did report the WNS information. However, I think this timing information would not be accurate since the (* mark_debug = “true” *) affects (slows down) the clock frequency.
I am wondering if you have met the same issues in your design experience. Any comments or suggestions would be highly appreciated. Thank you.

Written By Jason Syu on December 26th, 2017 @ 04:14

Thanks a lot for your information.

Written By Hossam Hassan on June 5th, 2020 @ 06:59

Add a Comment

required, use real name
required, will not be published
optional, your blog address