Quartus / sdc: Constraining I/O ports clocked by an internal clock

This post was written by eli on August 5, 2018
Posted Under: FPGA,Intel FPGA (Altera)

Introduction

This post is an expansion of another post of mine, which deals with register I/O packing. It’s recommended to read that one first.

Timing constraining of I/O ports is typically intended to ensure timing relations between an external clock and the timing of signals that are clocked by this clock (or derived from this clock, with the same frequency or a simple relation to it).

However, in some cases the clock of the I/O registers is generated by a PLL within the FPGA, and is practically unrelated to the originating clock. There are still good reasons to constrain the timing of such ports, among others:

  • Even though the external clock source isn’t involved directly, the timing must still be under control. In particular, when the interface with an external device is bidirectional, the timing of the signals arriving from the device depends on the signals the FPGA sends to it. Constraining the ports is part of ensuring that this timing loop is fast enough.
  • Ensuring that I/O registers are used. Tight constraints, which can only be met with I/O registers, will fail if the tools don’t pack those registers as desired.
  • Ensuring that no delay is inserted by the tools between the input pad and the register.

Clearly, nobody at Altera thought that this kind of constraining was necessary. Consequently, getting this done in a fairly clean manner is nontrivial, to say the least (yours truly wasted a full week of work to figure this out). This post suggests a methodology which is hopefully clean enough. This is the best I managed to work out, anyhow.

The relevant documentation:

The Intel / Altera toolset used is Quartus Prime 15.1 (Web Edition).

The goal

The naïve approach is to apply set_input_delay and set_output_delay to the I/O ports as usual, using the clock from the PLL in the -clock argument. However, the tools interpret this as constraining the timing relative to the external clock, which is inherently pointless, since this relation has no meaning. To make things worse, if the clock frequencies aren’t simple multiples of each other, the timing requirements become unrealistic, as the closest clock edge relations between the two clocks are applied as the worst case. So this doesn’t work at all.
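For contrast, the naïve constraints described above would look roughly like this. The delay values are hypothetical placeholders, and this is precisely the form that gives meaningless results with a PLL-generated clock:

```tcl
# Naive approach: I/O delays relative to the PLL-generated main_clk.
# Does NOT give meaningful results when main_clk is generated internally.
# The delay values (in ns) are hypothetical placeholders.
set_output_delay -clock main_clk -max 3.0 [get_ports pixadc_*]
set_input_delay  -clock main_clk -max 5.0 [get_ports pixadc_*]
```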

Ideally, we’d like to constrain only the path between the I/O pin and the register connected directly to it. There appears to be no way to do exactly that, as Quartus’ Timing Analyzer automatically mixes in the clock’s path delay when a register is involved.

So the goal is to define the delay between the external pin and the register that samples its state or vice versa, with as little interference as possible.

A sample set of constraints

This is the sdc file that worked for me. Each part is explained in detail afterwards.

create_clock -name root_clk -period 20.833 [get_ports {osc_clock}]

# The 60 MHz clock is defined on the global clock buffer's output pin:
create_clock -name main_clk -period 16.666 [get_pins {clkrst_ins|altpll_component|auto_generated|wire_pll1_clk[0]~clkctrl|outclk}]

# The oscillator clock and the PLL-generated clock are treated as unrelated
set_clock_groups -asynchronous -group [ get_clocks root_clk ] \
    -group [ get_clocks main_clk ]

# Force the global clock network delay after the clock buffer to zero
# (braces keep Tcl from treating [0] as command substitution)
set_annotated_delay -from {clkrst_ins|altpll_component|auto_generated|wire_pll1_clk[0]~clkctrl|outclk} 0

derive_pll_clocks
derive_clock_uncertainty

# No hold timing on the I/O paths (explained below)
set_false_path -hold -from [get_ports pixadc_*] -to [get_registers]
set_false_path -hold -from [get_registers] -to [get_ports pixadc_*]

# The actual limits, in ns: register-to-pin (output) and pin-to-register (input)
set_max_delay -from [get_registers] -to [get_ports pixadc_*] 2.7
set_max_delay -from [get_ports pixadc_*] -to [get_registers] 0.7

create_clock assignments

In the relevant design, the external clock source runs at 48 MHz (20.833 ns period) and there’s a PLL on the FPGA generating a 60 MHz clock (16.666 ns period) based upon the external clock.
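A quick calculation shows why these two frequencies make the naïve approach hopeless. Assuming both clocks have a rising edge at time zero (no phase shift), the edges of root_clk fall on multiples of a quarter of their common period, and those of main_clk on multiples of a fifth, so the gap between an edge of one clock and the next edge of the other can be as small as a twentieth of that period:

```latex
\begin{aligned}
T_{\text{root}} &= 1/48\,\text{MHz} = 20.833\,\text{ns}, \quad T_{\text{main}} = 1/60\,\text{MHz} = 16.667\,\text{ns} \\
T_{\text{common}} &= 4\,T_{\text{root}} = 5\,T_{\text{main}} = 83.333\,\text{ns} \\
t_{\text{main},m} - t_{\text{root},k} &= m\,\frac{T_{\text{common}}}{5} - k\,\frac{T_{\text{common}}}{4} = (4m - 5k)\,\frac{T_{\text{common}}}{20} \\
\Delta_{\min} &= \frac{T_{\text{common}}}{20} = 4.167\,\text{ns} \quad (\text{e.g. } k = 3,\ m = 4:\ 66.667 - 62.5)
\end{aligned}
```

So the tightest edge relation the analyzer can find between the two clocks is about 4.167 ns, which has nothing to do with the actual I/O requirements.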

First, and somewhat unrelated: note that neither the duty cycle nor the waveform attributes are given in these definitions. If the duty cycle is 50%, don’t add that rubbish. It’s the default anyhow, even though the automatic constraint generator adds it nevertheless.
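To illustrate, the two lines below are equivalent: the first spells out a 50% duty cycle with -waveform (the kind of thing an automatic generator tends to add), the second relies on the default. The -waveform numbers are computed here for illustration rather than copied from actual generated output, and only one of the two lines should be used:

```tcl
# Explicit waveform: rising edge at 0 ns, falling edge at half the period
create_clock -name root_clk -period 20.833 -waveform {0.000 10.417} [get_ports {osc_clock}]

# Equivalent minimal form: -waveform defaults to a 50% duty cycle starting at 0
create_clock -name root_clk -period 20.833 [get_ports {osc_clock}]
```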

The definition of root_clk is quite standard. But pay attention to the way the derived clock, main_clk, is defined. Not only is it not given as a clock derived from root_clk (in which case I could have relied on an automatic derivation made with “derive_pll_clocks”), but it’s also assigned to an output pin: that of the global clock buffer fed by the PLL. This is no coincidence: That specific “get_pins” format is mandatory in the create_clock definition, or else the PLL’s input-to-output delay is included (around 2 ns). For example, even

create_clock -name main_clk -period 16.666 {clkrst_ins|altpll_component|auto_generated|wire_pll1_clk[0]}

(which is what the tools would generate automatically with derive_pll_clocks) will include the delay from inclk[0], even though the identifier given in the constraint is that of the PLL’s output net. This is also the case if the net is referred to with “get_nets”. Only pinpointing the clock buffer’s output pin with get_pins makes it clear that the clock path should start there, after the PLL.

A peculiar thing is that the fitter issues warnings like

Warning (332049): Ignored create_clock at test.sdc: Argument <targets> is an empty collection File: ...
 Info (332050): create_clock -name main_clk -period 16.666 [get_pins {clkrst_ins|altpll_component|auto_generated|wire_pll1_clk[0]~clkctrl|outclk}] File: ...

and main_clk indeed doesn’t appear in the list of clocks made by the fitter, even though the constraint’s effect is clear in the analysis made by the Timing Analyzer.
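If the warning is a nuisance, one possible workaround (a sketch, not part of the method used in this post) is to look the pin up first, and only issue create_clock when it’s actually found. get_pins -nowarn and get_collection_size are regular Quartus SDC/Tcl commands; the pin name is the one from the sample above:

```tcl
# Look up the clock buffer's output pin without a warning if it's absent
set main_clk_pin [get_pins -nowarn {clkrst_ins|altpll_component|auto_generated|wire_pll1_clk[0]~clkctrl|outclk}]

# Create main_clk only where the pin exists (e.g. in the Timing Analyzer).
# Elsewhere, the clock from derive_pll_clocks covers the PLL's output.
if {[get_collection_size $main_clk_pin] > 0} {
    create_clock -name main_clk -period 16.666 $main_clk_pin
}
```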

As shown below, the clock path is included in the timing calculation, no matter what. So this is one of the necessities for keeping that interference in the timing calculations minimal.

IMPORTANT: Setting the clock constraint on the clock buffer’s output pin as shown above, as well as the set_annotated_delay constraint, slightly disrupts the constraining of register-to-register paths, as the original, rigorous timing calculation assumes that the destination register receives the clock slightly earlier than the source register (presumably to represent a worst-case scenario involving global clock skew and other effects). It’s therefore recommended to check the difference between the source and destination clock paths with the clock constraint made the classic way, and subtract this difference from the clock’s period, to ensure an accurate calculation. It should be no more than a few hundred picoseconds.
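In SDC terms, that correction can go directly into the period argument. The 0.300 ns margin below is a hypothetical placeholder; the real figure has to be read from the design’s own timing reports:

```tcl
# Hypothetical clock path difference (ns), measured by comparing the source
# and destination clock paths with a classic create_clock for main_clk
set main_clk_margin 0.300

# Shorten the nominal 16.666 ns period by that margin
create_clock -name main_clk -period [expr {16.666 - $main_clk_margin}] \
    [get_pins {clkrst_ins|altpll_component|auto_generated|wire_pll1_clk[0]~clkctrl|outclk}]
```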

Ah, and another thing I tried that led nowhere: The Handbook says that a virtual clock can be defined with something like

create_clock -name main_clk -period 16.666

In other words, this is a clock that can be mentioned in constraints, but has no signal in the FPGA related to it. My original thought was that if this clock doesn’t relate to anything real, it won’t have any global clock delay in its timing calculations.

But when I tried constraining I/O pins with the virtual clock as the -clock argument, no timing calculations were made at all. The timing report ended up with “Nothing to report”. So the virtual clock concept didn’t help.
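A sketch of that kind of attempt is shown below, with a hypothetical clock name (chosen so it doesn’t clash with main_clk) and a placeholder delay value. As said, constraints of this form ended up with “Nothing to report”:

```tcl
# Virtual clock: no target, so it isn't tied to any signal in the FPGA
create_clock -name virt_main_clk -period 16.666

# I/O constraint referencing the virtual clock (placeholder delay, in ns)
set_output_delay -clock virt_main_clk -max 2.0 [get_ports pixadc_*]
```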

set_clock_groups

Nothing special about this. Just declaring that root_clk and main_clk should be considered unrelated (all paths between these clock domains are false).
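For two clocks, the set_clock_groups line has roughly the same effect as cutting the paths in both directions with set_false_path, which may be a more familiar way to read it:

```tcl
# Cut all timing paths between the two clock domains, in both directions
set_false_path -from [get_clocks root_clk] -to [get_clocks main_clk]
set_false_path -from [get_clocks main_clk] -to [get_clocks root_clk]
```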

set_annotated_delay

This command tells Timing Analyzer to consider the delay of the global clock buffer as zero. This is yet another necessity to keep unrelated delays out of the calculation.

Together with the definition of main_clk above, which relates the timed clock to the PLL’s output, the delay of the global clock network in the FPGA fabric is left out of the calculations. As shown below, there’s still a clock delay component that is counted in, but it’s presumably the delay of the clock within the logic element.

As the clock’s delay is left out, it doesn’t matter if the PLL is set to compensate for the global clock delay or not; the same timing is achieved either way. One could, by the way, argue that the PLL’s clock timing compensation is an alternative way to minimize the clock path’s role in the timing calculations. My own attempts to go down that road have, however, led to nothing but a lot of wasted time. Note that in order to make sense of the PLL’s timing compensation, the commonplace create_clock definition must be used for main_clk, so the PLL’s own delay is included (it’s compensated for further down the road), and this leads to a total lack of control of what’s timed and what is not.

derive_pll_clocks and derive_clock_uncertainty

derive_pll_clocks is applied even though main_clk is defined explicitly with a create_clock constraint, which overrides the clock generated by derive_pll_clocks. But since the create_clock statement for main_clk is ignored by the synthesizer as well as the fitter (because the relevant pin isn’t found), derive_pll_clocks is necessary during these stages to ensure that the relevant paths are timed; in particular, so that the fitter makes sure that register-to-register paths meet timing.

If the clock period given in the create_clock constraint is shorter than the one derived from the PLL (which is recommended for reasons mentioned in this post), there might be a situation where timing fails because the fitter didn’t attempt to meet a constraint it was blind to. Or at least theoretically. I’ve never encountered anything like this, partly because it’s quite difficult to fail on a 60 MHz clock.

derive_clock_uncertainty is used, as with any proper set of constraints.

set_max_delay

Finally, the delay constraints themselves. set_max_delay is used rather than set_input_delay and set_output_delay, mainly because set_max_delay expresses the element we want to constrain: A segment between the register and a port. As outlined in this other post of mine, set_input_delay and set_output_delay are tailored to allow copying numbers from the counterpart device’s datasheet directly. However if we want to constrain the internal delay with these, the sampling clock’s period needs to be taken into account. So for the purpose of constraining the internal delay, set_input_delay and set_output_delay’s values must be adjusted if the clock’s frequency changes, and that’s an unnecessary headache.
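A rough calculation shows the headache. Ignoring clock path delays, uncertainty and setup time, the internal budget left by set_input_delay or set_output_delay is the clock period T minus the external delay D, so the D values equivalent to this post’s 0.7 ns and 2.7 ns budgets move with T (the 20.833 ns row is a hypothetical change of clock frequency):

```latex
\begin{aligned}
D_{\text{in}} &= T - 0.7\,\text{ns}, & D_{\text{out}} &= T - 2.7\,\text{ns} \\
T = 16.666\,\text{ns}: \quad D_{\text{in}} &= 15.966\,\text{ns}, & D_{\text{out}} &= 13.966\,\text{ns} \\
T = 20.833\,\text{ns}: \quad D_{\text{in}} &= 20.133\,\text{ns}, & D_{\text{out}} &= 18.133\,\text{ns}
\end{aligned}
```

With set_max_delay, the 0.7 and 2.7 stay put regardless of T.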

One could have hoped that there would be a way to constrain the plain combinatorial path between a port and a register. It seems, however, that there’s no way to do this, because Timing Analyzer is a bit too helpful: When either (or both) of the endpoints of a set_max_delay constraint is a register, the clock path delay is taken into consideration. In other words, if the source of the delay path is a register, the clock path delay is added to the constrained path to represent the fact that the data toggle from the source register is delayed by this path. Likewise, if the destination of the constraint is a register, the clock path delay is added to the timing requirement (relaxing the constraint) to represent that the destination register samples its input later.
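Put as a formula, and leaving out clock uncertainty (which doesn’t show up in the reports below), the check works out roughly as follows, with D the set_max_delay value, t_clk the clock path delay to the register, t_data the data path delay, and t_su the register’s micro setup time (uTsu):

```latex
\begin{aligned}
\text{register} \to \text{port:}\quad & t_{\text{clk,src}} + t_{\text{data}} \le D \\
\text{port} \to \text{register:}\quad & t_{\text{data}} \le D + t_{\text{clk,dst}} - t_{\text{su}}
\end{aligned}
```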

This holds true no matter how the register endpoint is given to the constraint command: Quite obviously, if get_registers was used to select the relevant endpoint, the clock path is included in the math. But it’s less obvious that if, for example, the endpoint was selected with get_pins on one of the register’s own pins (e.g. [ get_pins -hierarchical the_sample_reg|d ]), the clock path is still included. Bottom line: There’s no way to avoid having the clock path in the math. This is the reason for the manipulations with create_clock and set_annotated_delay above.

Examples of the timing obtained with set_max_delay are given below.
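To produce reports like these from the Timing Analyzer’s Tcl console, something along the lines of the following should do. report_timing and these options are standard, but the port and register name patterns are taken from this example design and will differ elsewhere:

```tcl
# Register-to-pin (output) path, with full detail including the clock path
report_timing -from [get_registers {video_adc:video_adc_ins|pixadc_clk[*]}] \
    -to [get_ports {pixadc_clk[*]}] -setup -npaths 1 -detail full_path \
    -panel_name {pixadc output}

# Pin-to-register (input) path
report_timing -from [get_ports {pixadc_da[*]}] \
    -to [get_registers {video_adc:video_adc_ins|samp_reg[*]}] -setup -npaths 1 \
    -detail full_path -panel_name {pixadc input}
```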

set_false_path -hold

These set_false_path constraints disable the timing calculation for the registers’ hold requirement (note the -hold flag). Without these two constraints, Timing Analyzer will mark the relevant I/O ports as (partly) unconstrained, even if they have related set_max_delay constraints. This has no practical implication except that the “TimeQuest Timing Analyzer” group in the GUI’s compilation report pane is marked red, indicating there’s a timing problem.

The sole purpose of these set_false_path constraints is hence to tell the tools not to bother about the hold paths, avoiding the said red color in the GUI.

As with any set_false_path constraint, care must be taken not to include any unintended paths.
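One way to keep these exceptions narrow is to name the I/O registers explicitly instead of using the catch-all [get_registers]. The register and port patterns below follow the names in this post’s timing reports and are illustrative only:

```tcl
# Hold-only false paths, limited to the specific I/O registers and ports
set_false_path -hold -from [get_ports {pixadc_da[*]}] \
    -to [get_registers {video_adc:video_adc_ins|samp_reg[*]}]
set_false_path -hold -from [get_registers {video_adc:video_adc_ins|pixadc_clk[*]}] \
    -to [get_ports {pixadc_clk[*]}]
```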

Hold timing is irrelevant for the purpose of ensuring I/O register packing. Nor does it have any significance when timing against the external device, as the device’s hold timing should be ensured by manual timing calculations. As for timing a loop from the FPGA to the device and back, hold timing is unnecessary there as well: Failing the receiving register’s hold timing in this case requires that the receiver’s hold time be longer than the clock-to-output delay (which involves driving a physical pin and its load capacitance) plus the external device’s response time to the toggling signal. So this is far from realistic.

One could think that rather than making a false path, a reasonable set_min_delay constraint would do the job. But no: Any set_min_delay, which in turn activates hold time constraining, leads to an “Input Pin to Input Register Delay” as shown in this other post, but for other reasons and with another behavior. In particular, with the constraints used in this post, this Input Pin delay is added even if that causes the set_max_delay constraint to fail.

The underlying reason is to compensate for the clock delay: The tools must ensure that the clock arrives at the input register before the data on its input port toggles. Otherwise, the data sampled in the minimal clock delay case is different from that of the maximal clock delay case (for which the data toggle is obviously after the clock toggle).

Given the delay in the clock path, this forces the tools to insert a delay before the input register that is at least the clock path delay. When clock delay compensation is enabled at the PLL (and the originating clock is external), the PLL is set to create a negative clock delay, hence eliminating the need for this Input Pin delay.

But it gets worse with a clock generated internally: It’s not completely clear why, but even if the clock path is set to zero with the set_annotated_delay statement as said above, the tools keep adding this delay, regardless of whether the PLL is set to compensate for the clock delay. One explanation can be found in set_annotated_delay’s help text, which says “This assignment is for timing analysis only, and is not considered during timing-driven compilation”. But this still doesn’t explain why the delay is inserted even with the PLL’s clock path compensation enabled. So the conclusion is that the tools weren’t really meant to handle this internally generated clock scheme.

Bottom line: Don’t make any set_min_delay constraints on this path, and surely not set_input_delay -min or set_output_delay -min (the latter two mess things up even worse; believe me on that).

Constraints for crossing clock domains

This is somewhat unrelated, but it’s another aspect of how set_max_delay works.

When crossing clock domains, it’s common to put two registers in series, so that the first register is a metastability guard, and the second samples the signal safely in the destination clock domain.

But since the paths crossing clock domains are not timed by the tools, they may in theory have an arbitrarily high propagation delay. This undermines the whole idea of the metastability guard. So to be extra safe, it makes sense to constrain these paths in order to ensure that the path delay is limited to something sensible.

Unfortunately, there is nothing better for this purpose than set_max_delay, which takes the clock delays into account. As these two clocks are unrelated, this makes no sense at all, but this is what Quartus offers. It would have been much better to constrain just the data path, and maybe creating a special clock and using set_annotated_delay as suggested above would do the trick.

But I’ll suggest the simple and crude method:

set_max_delay -from [ get_clocks *|some_ins|*|tx_clkout] \
    -to [ get_clocks *|some_ins|*|rx_clkout] 4
set_max_delay -from [ get_clocks *|some_ins|*|rx_clkout] \
    -to [ get_clocks *|some_ins|*|tx_clkout] 4

set_false_path -hold -from [ get_clocks *|some_ins|*|tx_clkout] \
    -to [ get_clocks *|some_ins|*|rx_clkout]
set_false_path -hold -from [ get_clocks *|some_ins|*|rx_clkout] \
    -to [ get_clocks *|some_ins|*|tx_clkout]

Choosing the delay as 4 ns as shown above keeps the delays sensibly small on a Cyclone 10, but this is something to verify separately on each design with the Timing Analyzer.
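Verifying this can be done by listing the worst paths between the two clocks, for example as below. report_timing’s -from_clock and -to_clock options are standard; the clock name patterns are the same placeholders as in the constraints above:

```tcl
# Worst setup paths from the tx clock domain to the rx clock domain
report_timing -from_clock [get_clocks {*|some_ins|*|tx_clkout}] \
    -to_clock [get_clocks {*|some_ins|*|rx_clkout}] \
    -setup -npaths 10 -detail summary -panel_name {CDC tx to rx}
```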

As for the two false path settings: Note that they are only for hold timing. This is sometimes necessary if the tools consider the clocks related, in which case the hold timing might fail because of the different clock delays. Since the clocks are treated as unrelated in the logic design, the hold timing is pointless.
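A somewhat tighter variant, not the one used above, would apply the same 4 ns limit only to the path that actually enters the metastability guard register, rather than to every path between the two clocks. The register names here are hypothetical, and this assumes the two clocks aren’t also cut with set_clock_groups or set_false_path, which would most likely take precedence over set_max_delay:

```tcl
# Hypothetical synchronizer: tx-domain source register -> first rx-domain stage
set_max_delay -from [get_registers {*|some_ins|*|tx_flag_reg}] \
    -to [get_registers {*|some_ins|*|rx_flag_meta_reg}] 4
set_false_path -hold -from [get_registers {*|some_ins|*|tx_flag_reg}] \
    -to [get_registers {*|some_ins|*|rx_flag_meta_reg}]
```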

Timing example: Register to pin (output)

+-------------------------------------------------------------+
; Path Summary                                                ;
+---------------------+---------------------------------------+
; Property            ; Value                                 ;
+---------------------+---------------------------------------+
; From Node           ; video_adc:video_adc_ins|pixadc_clk[1] ;
; To Node             ; pixadc_clk[1]                         ;
; Launch Clock        ; main_clk                              ;
; Latch Clock         ; n/a                                   ;
; Max Delay Exception ; 2.700                                 ;
; Data Arrival Time   ; 2.666                                 ;
; Data Required Time  ; 2.700                                 ;
; Slack               ; 0.034                                 ;
+---------------------+---------------------------------------+

+---------------------------------------------------------------------------------------------------------------------------------------------+
; Data Arrival Path                                                                                                                           ;
+---------+---------+----+------+--------+-----------------------+----------------------------------------------------------------------------+
; Total   ; Incr    ; RF ; Type ; Fanout ; Location              ; Element                                                                    ;
+---------+---------+----+------+--------+-----------------------+----------------------------------------------------------------------------+
; 0.000   ; 0.000   ;    ;      ;        ;                       ; launch edge time                                                           ;
; 0.559   ; 0.559   ;    ;      ;        ;                       ; clock path                                                                 ;
;   0.000 ;   0.000 ;    ;      ;        ;                       ; source latency                                                             ;
;   0.000 ;   0.000 ;    ;      ; 13     ; CLKCTRL_G13           ; clkrst_ins|altpll_component|auto_generated|wire_pll1_clk[0]~clkctrl|outclk ;
;   0.000 ;   0.000 ; RR ; IC   ; 1      ; DDIOOUTCELL_X0_Y10_N4 ; video_adc_ins|pixadc_clk[1]|clk                                            ;
;   0.559 ;   0.559 ; RR ; CELL ; 1      ; DDIOOUTCELL_X0_Y10_N4 ; video_adc:video_adc_ins|pixadc_clk[1]                                      ;
; 2.666   ; 2.107   ;    ;      ;        ;                       ; data path                                                                  ;
;   0.771 ;   0.212 ;    ; uTco ; 1      ; DDIOOUTCELL_X0_Y10_N4 ; video_adc:video_adc_ins|pixadc_clk[1]                                      ;
;   1.268 ;   0.497 ; RR ; CELL ; 1      ; DDIOOUTCELL_X0_Y10_N4 ; video_adc_ins|pixadc_clk[1]|q                                              ;
;   1.268 ;   0.000 ; RR ; IC   ; 2      ; IOOBUF_X0_Y10_N2      ; pixadc_clk[1]~output|i                                                     ;
;   2.666 ;   1.398 ; RR ; CELL ; 1      ; IOOBUF_X0_Y10_N2      ; pixadc_clk[1]~output|o                                                     ;
;   2.666 ;   0.000 ; RR ; CELL ; 0      ; PIN_R2                ; pixadc_clk[1]                                                              ;
+---------+---------+----+------+--------+-----------------------+----------------------------------------------------------------------------+

+-------------------------------------------------------------------------+
; Data Required Path                                                      ;
+---------+---------+----+------+--------+----------+---------------------+
; Total   ; Incr    ; RF ; Type ; Fanout ; Location ; Element             ;
+---------+---------+----+------+--------+----------+---------------------+
; 2.700   ; 2.700   ;    ;      ;        ;          ; latch edge time     ;
; 2.700   ; 0.000   ;    ;      ;        ;          ; clock path          ;
;   2.700 ;   0.000 ; R  ;      ;        ;          ; clock network delay ;
; 2.700   ; 0.000   ; R  ; oExt ; 0      ; PIN_R2   ; pixadc_clk[1]       ;
+---------+---------+----+------+--------+----------+---------------------+

The interconnect delay on the line after location CLKCTRL_G13 is the global clock’s delay, which the set_annotated_delay constraint forces to zero. Without that, it would have read 1.076 ns instead. Together with the create_clock assignment on the output pin, the only part left in the clock path is the 0.559 ns corresponding to the clock’s delay within the register itself (this is not the clock-to-output delay, which follows as uTco).
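As a cross-check, the figures in the report above add up as follows (all values in ns):

```latex
\begin{aligned}
t_{\text{arrival}} &= \underbrace{0.559}_{\text{clock path}} + \underbrace{0.212}_{\text{uTco}} + 0.497 + 0.000 + \underbrace{1.398}_{\text{output buffer}} = 2.666 \\
t_{\text{required}} &= 2.700 \\
\text{slack} &= 2.700 - 2.666 = 0.034
\end{aligned}
```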

A regular create_clock declaration would have yielded the following at the beginning of the data arrival path instead:

+---------+---------+----+------+--------+-----------------------+------------------------------------------------------------------------------+
; Total   ; Incr    ; RF ; Type ; Fanout ; Location              ; Element                                                                      ;
+---------+---------+----+------+--------+-----------------------+------------------------------------------------------------------------------+
; 0.000   ; 0.000   ;    ;      ;        ;                       ; launch edge time                                                             ;
; 2.716   ; 2.716   ;    ;      ;        ;                       ; clock path                                                                   ;
;   0.000 ;   0.000 ;    ;      ;        ;                       ; source latency                                                               ;
;   0.000 ;   0.000 ;    ;      ; 1      ; PLL_3                 ; clkrst_ins|altpll_component|auto_generated|pll1|clk[0]                       ;
;   2.157 ;   2.157 ; RR ; IC   ; 1      ; CLKCTRL_G13           ; clkrst_ins|altpll_component|auto_generated|wire_pll1_clk[0]~clkctrl|inclk[0] ;
;   2.157 ;   0.000 ; RR ; CELL ; 13     ; CLKCTRL_G13           ; clkrst_ins|altpll_component|auto_generated|wire_pll1_clk[0]~clkctrl|outclk   ;
;   2.157 ;   0.000 ; RR ; IC   ; 1      ; DDIOOUTCELL_X0_Y10_N4 ; video_adc_ins|pixadc_clk[1]|clk                                              ;
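;   2.716 ;   0.559 ; RR ; CELL ; 1      ; DDIOOUTCELL_X0_Y10_N4 ; video_adc:video_adc_ins|pixadc_clk[1]                                        ;
+---------+---------+----+------+--------+-----------------------+------------------------------------------------------------------------------+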

The above relates to a PLL without delay compensation.
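For comparison, and assuming the data path increments stay the same (2.107 ns in total), the regular declaration would have turned the same output path into a clear failure:

```latex
\begin{aligned}
t_{\text{arrival}} &= 2.716 + 2.107 = 4.823\,\text{ns} \\
\text{slack} &= 2.700 - 4.823 = -2.123\,\text{ns}
\end{aligned}
```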

Timing example: Pin to register (input)

+-----------------------------------------------------------+
; Path Summary                                              ;
+---------------------+-------------------------------------+
; Property            ; Value                               ;
+---------------------+-------------------------------------+
; From Node           ; pixadc_da[2]                        ;
; To Node             ; video_adc:video_adc_ins|samp_reg[2] ;
; Launch Clock        ; n/a                                 ;
; Latch Clock         ; main_clk                            ;
; Max Delay Exception ; 0.700                               ;
; Data Arrival Time   ; 0.992                               ;
; Data Required Time  ; 1.020                               ;
; Slack               ; 0.028                               ;
+---------------------+-------------------------------------+

+-------------------------------------------------------------------------------------------------+
; Data Arrival Path                                                                               ;
+---------+---------+----+------+--------+------------------+-------------------------------------+
; Total   ; Incr    ; RF ; Type ; Fanout ; Location         ; Element                             ;
+---------+---------+----+------+--------+------------------+-------------------------------------+
; 0.000   ; 0.000   ;    ;      ;        ;                  ; launch edge time                    ;
; 0.000   ; 0.000   ;    ;      ;        ;                  ; clock path                          ;
;   0.000 ;   0.000 ; R  ;      ;        ;                  ; clock network delay                 ;
; 0.000   ; 0.000   ; R  ; iExt ; 1      ; PIN_W2           ; pixadc_da[2]                        ;
; 0.992   ; 0.992   ;    ;      ;        ;                  ; data path                           ;
;   0.000 ;   0.000 ; RR ; IC   ; 1      ; IOIBUF_X0_Y7_N15 ; pixadc_da[2]~input|i                ;
;   0.748 ;   0.748 ; RR ; CELL ; 1      ; IOIBUF_X0_Y7_N15 ; pixadc_da[2]~input|o                ;
;   0.748 ;   0.000 ; RR ; IC   ; 1      ; FF_X0_Y7_N17     ; video_adc_ins|samp_reg[2]|d         ;
;   0.992 ;   0.244 ; RR ; CELL ; 1      ; FF_X0_Y7_N17     ; video_adc:video_adc_ins|samp_reg[2] ;
+---------+---------+----+------+--------+------------------+-------------------------------------+

+------------------------------------------------------------------------------------------------------------------------------------+
; Data Required Path                                                                                                                 ;
+---------+---------+----+------+--------+--------------+----------------------------------------------------------------------------+
; Total   ; Incr    ; RF ; Type ; Fanout ; Location     ; Element                                                                    ;
+---------+---------+----+------+--------+--------------+----------------------------------------------------------------------------+
; 0.700   ; 0.700   ;    ;      ;        ;              ; latch edge time                                                            ;
; 1.124   ; 0.424   ;    ;      ;        ;              ; clock path                                                                 ;
;   0.700 ;   0.000 ;    ;      ;        ;              ; source latency                                                             ;
;   0.700 ;   0.000 ;    ;      ; 13     ; CLKCTRL_G13  ; clkrst_ins|altpll_component|auto_generated|wire_pll1_clk[0]~clkctrl|outclk ;
;   0.700 ;   0.000 ; RR ; IC   ; 1      ; FF_X0_Y7_N17 ; video_adc_ins|samp_reg[2]|clk                                              ;
;   1.124 ;   0.424 ; RR ; CELL ; 1      ; FF_X0_Y7_N17 ; video_adc:video_adc_ins|samp_reg[2]                                        ;
; 1.020   ; -0.104  ;    ; uTsu ; 1      ; FF_X0_Y7_N17 ; video_adc:video_adc_ins|samp_reg[2]                                        ;
+---------+---------+----+------+--------+--------------+----------------------------------------------------------------------------+

First, note the zero time increment on the interconnect (IC) line leading to samp_reg[2]|d, right after the input buffer. It confirms that no Input Pin delay was inserted by the tools.

Once again, the zero interconnect (IC) increment on the line after CLKCTRL_G13 is the result of the set_annotated_delay constraint. It would have read 1.028 ns otherwise.

And again, a regular create_clock declaration would have yielded the following data required path instead:

+--------------------------------------------------------------------------------------------------------------------------------------+
; Data Required Path                                                                                                                   ;
+---------+---------+----+------+--------+--------------+------------------------------------------------------------------------------+
; Total   ; Incr    ; RF ; Type ; Fanout ; Location     ; Element                                                                      ;
+---------+---------+----+------+--------+--------------+------------------------------------------------------------------------------+
; 0.700   ; 0.700   ;    ;      ;        ;              ; latch edge time                                                              ;
; 3.194   ; 2.494   ;    ;      ;        ;              ; clock path                                                                   ;
;   0.700 ;   0.000 ;    ;      ;        ;              ; source latency                                                               ;
;   0.700 ;   0.000 ;    ;      ; 1      ; PLL_3        ; clkrst_ins|altpll_component|auto_generated|pll1|clk[0]                       ;
;   2.770 ;   2.070 ; RR ; IC   ; 1      ; CLKCTRL_G13  ; clkrst_ins|altpll_component|auto_generated|wire_pll1_clk[0]~clkctrl|inclk[0] ;
;   2.770 ;   0.000 ; RR ; CELL ; 13     ; CLKCTRL_G13  ; clkrst_ins|altpll_component|auto_generated|wire_pll1_clk[0]~clkctrl|outclk   ;
;   2.770 ;   0.000 ; RR ; IC   ; 1      ; FF_X0_Y7_N17 ; video_adc_ins|samp_reg[2]|clk                                                ;
;   3.194 ;   0.424 ; RR ; CELL ; 1      ; FF_X0_Y7_N17 ; video_adc:video_adc_ins|samp_reg[2]                                          ;
; 3.090   ; -0.104  ;    ; uTsu ; 1      ; FF_X0_Y7_N17 ; video_adc:video_adc_ins|samp_reg[2]                                          ;
+---------+---------+----+------+--------+--------------+------------------------------------------------------------------------------+

The figures differ from the corresponding figures for the output timing, because increments in the data required path relax the constraint, so the tools pick the minimal delays here.
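On the input side, the regular declaration would have relaxed the constraint instead of tightening it. Assuming the same data arrival time of 0.992 ns (placement could differ in practice), the required time from the table above gives:

```latex
\begin{aligned}
t_{\text{required}} &= 0.700 + 2.494 - 0.104 = 3.090\,\text{ns} \\
\text{slack} &= 3.090 - 0.992 = 2.098\,\text{ns}
\end{aligned}
```

So a pin-to-register path of up to about 3.09 ns would have passed a constraint nominally set to 0.7 ns.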

Loop timing budget

OK, so we have one constraint requiring the data on output ports to be valid 2.7 ns after main_clk. We have another constraint saying that the delay from an input pin to a register is no more than 0.7 ns. The clock period is 16.666 ns. Does it mean that the difference, 16.666 – (2.7 + 0.7) = 13.266 ns is the time allowed for the device to respond?

In other words, if the output signal is a clock that triggers the outputs of the external device, is it enough that the device’s clock-to-output plus the PCB trace delay add up to less than 13.266 ns?

The answer is almost yes. The only thing not taken into account is the skew between the clocks as they arrive at each of the two I/O registers, because the global clock delay was forced to zero. But the skew is typically less than a few hundred picoseconds. Everything else is covered.
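Written out, with t_co,ext the external device’s clock-to-output delay, t_pcb the trace delay to the device and back, and t_skew the global clock skew between the two I/O registers (the one term the constraints don’t cover), the condition is approximately:

```latex
t_{\text{co,ext}} + t_{\text{pcb}} < 16.666 - (2.7 + 0.7) - t_{\text{skew}} = 13.266 - t_{\text{skew}} \quad [\text{ns}]
```

With a hypothetical skew of 0.3 ns, for example, about 12.966 ns would remain for the external device and the PCB.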

Note in particular that in the input timing calculation, the data path (from the pin to the register) isn’t compared with the constrained time (0.7 ns), but rather with the constrained time plus the register’s internal clock delay, minus the register’s setup time. In other words, these small adjustments result in an accurate answer to whether 0.7 ns from the pin to the register is OK.
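With the numbers from the pin-to-register report above, that comparison reads (in ns):

```latex
\begin{aligned}
t_{\text{data}} &= 0.748 + 0.000 + 0.244 = 0.992 \\
t_{\text{required}} &= \underbrace{0.700}_{\text{set\_max\_delay}} + \underbrace{0.424}_{\text{clock delay}} - \underbrace{0.104}_{\text{uTsu}} = 1.020 \\
\text{slack} &= 1.020 - 0.992 = 0.028
\end{aligned}
```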

And because the delay calculations for the input and output delays begin at exactly the same global clock toggle at the register’s pins, the overall result is valid and accurate, except for the global clock skew, which isn’t taken into account.

Conclusion

It’s quite peculiar that the seemingly simple task of constraining the I/O timing turned out to be so difficult. It’s also unfortunate that it requires somewhat crippling the regular register-to-register calculations.

What makes this even more unfortunate is that this constraining is practically necessary to ensure that no input pin delay is inserted by the tools. It’s not just a safety mechanism to set off the alarm if the I/O registers slip away into the logic fabric.

One could argue that if timing is important, an external clock should have been used as a direct reference, in which case this whole issue would not have arisen. But the point is that even if the design doesn’t squeeze the best possible timing performance out of the FPGA, proper constraining is still required. It’s the designer’s prerogative to use the FPGA in a suboptimal way for ease and laziness, as long as the application’s requirements are met. It’s too bad that the punishment comes from the tools themselves, turning a straightforward task into a saga.

Reader Comments

Is this necessary if your pll is multiplying by a power of two, such as 4 let’s say?

It would seem so, because the pll is internal and has a bunch of delays that get added into the calculation incorrectly apparently.

#1 
Written By Kelly lindseth on December 13th, 2018 @ 08:41

Actually, the delays by the PLL and the clock network can be, and usually are, corrected by the PLL itself, as its reference can be set to be the global clock network.

#2 
Written By eli on December 13th, 2018 @ 09:40
