Quartus / Linux: Programming the FPGA with command-line

Command-line?

Yes, it much more convenient than the GUI programmer. Programming an FPGA is a repeated task, always the same file to the same FPGA on the same board connected to the computer. And somehow the GUI programming tools turn it into a daunting ceremony (and sometimes even a quiz, when it can’t tell exactly which device is connected, so I’m supposed to nail the exact one).

With command line its literally picking the command from bash history, and press Enter. And surprisingly enough, the command line tool doesn’t ask the silly questions that the GUI tool does.

First, some mucking about

Set up the environment:

$ /path/to/quartus/15.1/nios2eds/nios2_command_shell.sh

To list all devices found (cable auto-detected):

$ quartus_pgm --auto
Info: *******************************************************************
Info: Running Quartus Prime Programmer
    Info: Version 15.1.0 Build 185 10/21/2015 SJ Lite Edition
    Info: Copyright (C) 1991-2015 Altera Corporation. All rights reserved.
[ ... ]
    Info: agreement for further details.
    Info: Processing started: Sun May 27 15:06:22 2018
Info: Command: quartus_pgm --auto
Info (213045): Using programming cable "USB-BlasterII [2-5.1]"
1) USB-BlasterII [2-5.1]
  02B040DD   5CGTFD9(A5|C5|D5|E5)/..
  020A40DD   5M2210Z/EPM2210

[ ... ]

Note that listing the devices as shown above is not necessary for programming. It might be useful to tell the position of the FPGA in the JTAG chain, maybe. Really something that is done once to explore the board.

Programming

quartus_pgm displays most of its output in green. Generally speaking, if there’s no red text, all went fine.

$ quartus_pgm -m jtag -o "p;path/to/file.sof"

Or add the position in the JTAG explicitly (in particular if it’s not the first device). In this case it’s @1, meaning it’s the first device in the JTAG chain. If it’s the second device, pick @2 etc.

$ quartus_pgm -m jtag -o "p;path/to/file.sof@1"
Info: *******************************************************************
Info: Running Quartus Prime Programmer
    Info: Version 15.1.0 Build 185 10/21/2015 SJ Lite Edition
    Info: Copyright (C) 1991-2015 Altera Corporation. All rights reserved.
    Info: Your use of Altera Corporation's design tools, logic functions
    Info: and other software and tools, and its AMPP partner logic
    Info: functions, and any output files from any of the foregoing
    Info: (including device programming or simulation files), and any
    Info: associated documentation or information are expressly subject
    Info: to the terms and conditions of the Altera Program License
    Info: Subscription Agreement, the Altera Quartus Prime License Agreement,
    Info: the Altera MegaCore Function License Agreement, or other
    Info: applicable license agreement, including, without limitation,
    Info: that your use is for the sole purpose of programming logic
    Info: devices manufactured by Altera and sold by Altera or its
    Info: authorized distributors.  Please refer to the applicable
    Info: agreement for further details.
    Info: Processing started: Sun May 27 15:35:02 2018
Info: Command: quartus_pgm -m jtag -o p;path/to/file.sof@1
Info (213045): Using programming cable "USB-BlasterII [2-5.1]"
Info (213011): Using programming file p;path/to/file.sof@1 with checksum 0x061958E1 for device 5CGTFD9E5F35@1
Info (209060): Started Programmer operation at Sun May 27 15:35:05 2018
Info (209016): Configuring device index 1
Info (209017): Device 1 contains JTAG ID code 0x02B040DD
Info (209007): Configuration succeeded -- 1 device(s) configured
Info (209011): Successfully performed operation(s)
Info (209061): Ended Programmer operation at Sun May 27 15:35:09 2018
Info: Quartus Prime Programmer was successful. 0 errors, 0 warnings
    Info: Peak virtual memory: 432 megabytes
    Info: Processing ended: Sun May 27 15:35:09 2018
    Info: Elapsed time: 00:00:07
    Info: Total CPU time (on all processors): 00:00:03

If anything goes wrong — device mismatch, a failure to scan the JTAG chain or whatever, it will be hard to miss because of the errors written in red. The sweet thing with the command line interface is that every attempt starts from fresh, so just turn the board on (the usual reason for errors) and give it another go.

Quartus / sdc: Constraining I/O ports clocked by an internal clock

Introduction

This post is an expansion for another post of mine, which deals with register I/O packing. It’s recommended reading that one first.

Timing constraining of I/O ports is typically intended to ensure timing relations between an external clock and the timing of signals that are clocked by this clock (or derived from this clock, with the same frequency or a simple relation to it).

However in some cases the clock of the I/O registers is generated with an PLL within the FPGA, and is practically unrelated to the originating clock. There are still good reasons to constrain the timing of such ports, among others:

  • Even though the external clock source isn’t involved directly, the timing must still be under control. In particular, when the interface with an external device is bidirectional, the timing of the signals arriving from the device depend on those being generated by the FPGA going to it. Constraining the ports is part of ensuring that this timing loop is fast enough.
  • Ensuring that I/O registers are used. Tight constraints, which can only be met with I/O registers will fail if the tools don’t pack those registers as desired.
  • Ensuring that no delay is inserted by the tools between the input pad and the register.

Clearly, nobody at Altera thought that this kind of constraining was necessary. Consequently, getting this done in a fairly clean manner is nontrivial, to say the least (yours truly wasted a full week of work to figure this out). This post suggests a methodology which is hopefully the clean enough. This is the best I managed to work out, anyhow.

The relevant documentation:

The Intel / Altera toolset used is Quartus Prime 15.1 (Web Edition).

The goal

The naïve approach is to apply set_input_delay and set_output_delay to the output ports as usual, using the clock from the PLL in the -clock argument. Even so, the tools interpret this as constraining the timing relative to the external clock, which is inherently pointless, since this relation has no meaning. To make things worse, if the clock frequency relations aren’t a plain ratio, the timing requirements become unreal, as the closest clock edge relations between the two clocks are applied as the worst case. So this doesn’t work at all.

Ideally, we’d like to constrain only the path between the I/O pin and the register connected directly to it. It appears like there’s no way to do that exactly, as Quartus’ Timing Analyzer automatically mixes in the clock’s path delay when there’s a register involved.

So the goal is to define the delay between the external pin and the register that samples its state or vice versa, with as little interference as possible.

A sample set of constraints

This is the sdc file that worked for me. Each part is explained in detail afterwards.

create_clock -name root_clk -period 20.833 [get_ports {osc_clock}]

# The 60 MHz clock is defined on the global clock buffer's output pin:
create_clock -name main_clk -period 16.666 [get_pins {clkrst_ins|altpll_component|auto_generated|wire_pll1_clk[0]~clkctrl|outclk}]

set_clock_groups -asynchronous -group [ get_clocks root_clk ] \
    -group [ get_clocks main_clk ]

set_annotated_delay -from clkrst_ins|altpll_component|auto_generated|wire_pll1_clk[0]~clkctrl|outclk 0

derive_pll_clocks
derive_clock_uncertainty

set_false_path -hold -from [get_ports pixadc_*] -to [get_registers]
set_false_path -hold -from [get_registers] -to [get_ports pixadc_*]

set_max_delay -from [get_registers] -to [get_ports pixadc_*] 2.7
set_max_delay -from [get_ports pixadc_*] -to [get_registers] 0.7

create_clock assignments

In the relevant design, the external clock source runs at 48 MHz (20.833 ns period) and there’s a PLL on the FPGA generating a 60 MHz clock (16.666 ns period) based upon the external clock.

First and somewhat unrelated, note that neither the duty cycle nor the waveform attributes are given in these definitions. If the duty cycle is 50%, don’t add that rubbish. It’s the default anyhow, but is nevertheless added by the automatic constraint generator.

The definition of root_clk is quite standard. But pay attention to the way the derived clock, main_clk is defined. Not only isn’t it given as a derived clock from root_clk (or I could have relied on an automatic derivation made with “derive_pll_clocks”), but it’s assigned to the PLL’s output pin. It’s not a coincidence: That specific “get_pins” format is mandatory in the create_clock definition, or the PLL’s input-to-output delay is included (around 2 ns). For example, even

create_clock -name main_clk -period 16.666 clkrst_ins|altpll_component|auto_generated|wire_pll1_clk[0]

(which is what the tools would generate automatically with derive_pll_clocks) will include the delay from inclk[0], even though the identifier given in the constraint is the one of the PLL’s output net. This is also the case if the net is referred to with “get_nets”. Only being specific with get_pins on the PLL’s output pins clarifies that the clock delay should start at the output of the PLL.

A peculiar thing is that the fitter issues a warnings like

Warning (332049): Ignored create_clock at test.sdc: Argument <targets> is an empty collection File: ...
 Info (332050): create_clock -name main_clk -period 16.666 [get_pins {clkrst_ins|altpll_component|auto_generated|wire_pll1_clk[0]~clkctrl|outclk}] File: ...

and main_clk does indeed not appear in the list of clocks made by the fitter, even though the constraint’s effect is clear in the timing analysis made by the timing analyzer.

As shown below, the clock path is included in the timing calculation, no matter what. So this is one of the necessities for keeping that interference in the timing calculations minimal.

IMPORTANT: Setting the clock constraint on the PLL’s output pin as shown above, as well as the set_annotated_delay constraint, disrupt the constraining of register-to-register paths slightly, as the original, rigorous timing calculation assumes that the destination register receives the clock slightly earlier than the source register (presumably to represent a worst-case scenario involving global clock skew and other effects). It’s therefore recommended to compare the timing difference between the clock paths with the clock constraint made the classic way, and deduce this difference from the clock’s period, to ensure an accurate calculation. It should be no more than a few hundred picoseconds.

Ah, and another things I tried but lead nowhere: In the Handbook, it says that a virtual clock can be defined with something like

create_clock -name main_clk -period 16.666

In other words, this is a clock that can be mentioned in constraints, but has no signal in the FPGA related to it. My original thought that if this clock doesn’t relate to anything real, it won’t have any global clock delay in its timing calculations.

But trying it, constraining I/O pins with the virtual clock as the -clock argument, no timing calculations were made at all. The timing report ended up with “Nothing to report”. So the virtual clock concept didn’t help.

set_clock_groups

Nothing special about this. Just declaring that root_clk and main_clk should be considered unrelated (all paths between these clock domains are false).

set_annotated_delay

This command tells Timing Analyzer to consider the delay of the global clock buffer as zero. This is yet another necessity to keep unrelated delays out of the calculation.

Together with the definition of main_clk above, which relates the timed clock to the PLL’s output, the delay of the global clock network in the FPGA fabric is left out of the calculations. As shown below, there’s still a clock delay component that is counted in, but it’s presumably the delay of the clock within the logic element.

As the clock’s delay is left out, it doesn’t matter if the PLL is set to compensate for the global clock delay or not; the same timing is achieved either way. One could, by the way, argue that the PLL’s clock timing compensation is an alternative way to minimize the clock path’s role in the timing calculations. My own attempts to go down that road have however led to nothing else than a lot of wasted time. Note that in order to make sense of the PLL’s timing compensation, the commonplace create_clock definition must be used for main_clk, so the PLL’s own delay is included (it’s compensated for further down the road), and this leads to a total lack of control of what’s timed and what is not.

derive_pll_clocks and derive_clock_uncertainty

derive_pll_clocks is applied even though main_clk is defined explicitly with a create_clock constraint, and the latter overrides the clock generated by derive_pll_clocks. But since the create_clock statement for main_clk is ignored by the synthesizer as well as the fitter (because the relevant pin isn’t found), derive_pll_clocks is necessary during these stages to ensure that the relevant paths are timed. In particular, that the fitter makes sure that register-to-register paths meet timing.

If the clock period given in the create_clock constraint is shorter than the one derived from the PLL (which is recommended for reasons mentioned in this post), there might a situation where timing fails because the fitter didn’t attempt to meet a constraint it was blind to. Or at least theoretically. I’ve never encountered anything like this, partly because it’s quite difficult to fail on a 60 MHz clock.

derive_clock_uncertainty is used, as with any proper set of constraints.

set_max_delay

Finally, the delay constraints themselves. set_max_delay is used rather than set_input_delay and set_output_delay, mainly because set_max_delay expresses the element we want to constrain: A segment between the register and a port. As outlined in this other post of mine, set_input_delay and set_output_delay are tailored to allow copying numbers from the counterpart device’s datasheet directly. However if we want to constrain the internal delay with these, the sampling clock’s period needs to be taken into account. So for the purpose of constraining the internal delay, set_input_delay and set_output_delay’s values must be adjusted if the clock’s frequency changes, and that’s an unnecessary headache.

One could have hoped that there would be a way to constrain the plain combinatoric path between a port and a register. It seems however like there’s no way to do this, but that Timing Analyzer is being a bit too helpful: When any (or both) of the endpoints of a set_max_delay constraint is a register, the clock delay path is taken into consideration. In other words, if the source of the delay path is a register, the clock path delay is added to the constrained path to represent the fact that the data toggle from the source register is delayed by this path. Likewise, if the destination of the constraint is a register, the clock path is added to the timing requirement (relaxes the constraint) to represent that the destination register samples its input later.

This holds true no matter of how the register endpoint is given to the constraint command: Quite obviously, if get_regs was used to select the relevant endpoint, the clock path is included in the math. But it’s less obvious, for example, that if the source endpoint was selected with get_pins on the registers’ output pin (e.g. [ get_pins -hierarchical the_sample_reg|d ]), the clock path is still included. Bottom line: No way to avoid having the clock path in the math. This is the reason for the manipulations with create_clock and set_annotated_delay above.

Examples of the timing obtained with set_max_delay are given below.

set_false_path -hold

These set_false_path constraints disable the timing calculation for the registers’ hold requirement (note the -hold flag). Without these two constraints, Timing Analyzer will mark the relevant I/O ports (partly) unconstrained, even if they have related set_max_delay constraints. This has no practical implication except that the “TimeQuest Timing Analyzer” group in the GUI’s compilation report pane is marked red, indicating there’s a timing problem.

The sole purpose of these set_false_path constraints is hence to tell the tools not to bother about the hold paths, avoiding the said red color in the GUI.

As with any set_false_path constraint, care must be taken not to include any unintended paths.

Hold timing is irrelevant for the purpose of ensuring I/O register packing. Neither does it have any significance when timing against the external device, as its hold timing should be ensured by manual timing calculations. As for timing a loop from the FPGA to the device and back this is unnecessary as well: Failing the receiving register’s hold timing in this case requires that the receiver’s hold time is shorter than the clock-to-output (which involves driving a physical pin and its equivalent capacitor) plus the external device’s response time to the toggling signal. So this is by far unrealistic.

One could think that rather than making a false path, a reasonable set_min_delay constraint would do the job. But no: Any set_min_delay, which in turn activates hold time constraining, leads to an “Input Pin to Input Register Delay” as shown in this other post, but for other reasons and with another behavior. In particular, with the constraint setting of this post, this Input Pin delay is added even if that causes a failure of the set_max_delay constraint.

The underlying reason is to compensate for the clock delay: The tools must ensure that the clock arrives to the input register before the data on its input port toggles. Otherwise, the data sampled for the minimal clock case is different from the case of the maximal clock delay (for which the data toggle is obviously after the clock toggle).

Given the delay in the clock path, this forces the tools to insert a delay before the input register that is at least the clock time delay. When clock delay compensation is enabled at the PLL (and the originating clock is external), the PLL is set to create a negative clock delay, hence eliminating the need for this Input Pin delay.

But it gets worse with a clock generated internally: It’s not completely clear why, but even if the clock path is set to zero with the set_annotate_delay statement as said above, the tools keep adding this delay. Also regardless of whether the PLL is set to compensate for the clock delay. One explanation can be found in set_annotated_delay’s help text saying “This assignment is for timing analysis only, and is not considered during timing-driven compilation”. But this still doesn’t explain why it’s inserted even with the clock path compensation of the PLL enabled. So the conclusion is that the tools weren’t really meant to handle this internally generated clock scheme.

Bottom line: Don’t make any set_min_delay constraints on this path, and surely not set_input_delay -min or set_output_delay -min (the latter two will mess up things even worse. Believe me on that).

Timing example: Register to pin (output)

+-------------------------------------------------------------+
; Path Summary                                                ;
+---------------------+---------------------------------------+
; Property            ; Value                                 ;
+---------------------+---------------------------------------+
; From Node           ; video_adc:video_adc_ins|pixadc_clk[1] ;
; To Node             ; pixadc_clk[1]                         ;
; Launch Clock        ; main_clk                              ;
; Latch Clock         ; n/a                                   ;
; Max Delay Exception ; 2.700                                 ;
; Data Arrival Time   ; 2.666                                 ;
; Data Required Time  ; 2.700                                 ;
; Slack               ; 0.034                                 ;
+---------------------+---------------------------------------+

+---------------------------------------------------------------------------------------------------------------------------------------------+
; Data Arrival Path                                                                                                                           ;
+---------+---------+----+------+--------+-----------------------+----------------------------------------------------------------------------+
; Total   ; Incr    ; RF ; Type ; Fanout ; Location              ; Element                                                                    ;
+---------+---------+----+------+--------+-----------------------+----------------------------------------------------------------------------+
; 0.000   ; 0.000   ;    ;      ;        ;                       ; launch edge time                                                           ;
; 0.559   ; 0.559   ;    ;      ;        ;                       ; clock path                                                                 ;
;   0.000 ;   0.000 ;    ;      ;        ;                       ; source latency                                                             ;
;   0.000 ;   0.000 ;    ;      ; 13     ; CLKCTRL_G13           ; clkrst_ins|altpll_component|auto_generated|wire_pll1_clk[0]~clkctrl|outclk ;
;   0.000 ;   0.000 ; RR ; IC   ; 1      ; DDIOOUTCELL_X0_Y10_N4 ; video_adc_ins|pixadc_clk[1]|clk                                            ;
;   0.559 ;   0.559 ; RR ; CELL ; 1      ; DDIOOUTCELL_X0_Y10_N4 ; video_adc:video_adc_ins|pixadc_clk[1]                                      ;
; 2.666   ; 2.107   ;    ;      ;        ;                       ; data path                                                                  ;
;   0.771 ;   0.212 ;    ; uTco ; 1      ; DDIOOUTCELL_X0_Y10_N4 ; video_adc:video_adc_ins|pixadc_clk[1]                                      ;
;   1.268 ;   0.497 ; RR ; CELL ; 1      ; DDIOOUTCELL_X0_Y10_N4 ; video_adc_ins|pixadc_clk[1]|q                                              ;
;   1.268 ;   0.000 ; RR ; IC   ; 2      ; IOOBUF_X0_Y10_N2      ; pixadc_clk[1]~output|i                                                     ;
;   2.666 ;   1.398 ; RR ; CELL ; 1      ; IOOBUF_X0_Y10_N2      ; pixadc_clk[1]~output|o                                                     ;
;   2.666 ;   0.000 ; RR ; CELL ; 0      ; PIN_R2                ; pixadc_clk[1]                                                              ;
+---------+---------+----+------+--------+-----------------------+----------------------------------------------------------------------------+

+-------------------------------------------------------------------------+
; Data Required Path                                                      ;
+---------+---------+----+------+--------+----------+---------------------+
; Total   ; Incr    ; RF ; Type ; Fanout ; Location ; Element             ;
+---------+---------+----+------+--------+----------+---------------------+
; 2.700   ; 2.700   ;    ;      ;        ;          ; latch edge time     ;
; 2.700   ; 0.000   ;    ;      ;        ;          ; clock path          ;
;   2.700 ;   0.000 ; R  ;      ;        ;          ; clock network delay ;
; 2.700   ; 0.000   ; R  ; oExt ; 0      ; PIN_R2   ; pixadc_clk[1]       ;
+---------+---------+----+------+--------+----------+---------------------+

The interconnect delay on the line after location CLKCTRL_G13 is the global clock’s delay, which the set_annotate_delay constraint forces to zero. Without that, it would have read 1.076 ns instead. Together with the create_clock assignment on the output pin, the only part left in the clock path is the 0.559 ns corresponding to the clock’s delay within the register itself (it’s not the clock-to-output, that one follows as uTco).

A regular create_clock declaration would have yielded the following at the beginning of the datapath instead:

+---------+---------+----+------+--------+-----------------------+------------------------------------------------------------------------------+
; Total   ; Incr    ; RF ; Type ; Fanout ; Location              ; Element                                                                      ;
+---------+---------+----+------+--------+-----------------------+------------------------------------------------------------------------------+
; 0.000   ; 0.000   ;    ;      ;        ;                       ; launch edge time                                                             ;
; 2.716   ; 2.716   ;    ;      ;        ;                       ; clock path                                                                   ;
;   0.000 ;   0.000 ;    ;      ;        ;                       ; source latency                                                               ;
;   0.000 ;   0.000 ;    ;      ; 1      ; PLL_3                 ; clkrst_ins|altpll_component|auto_generated|pll1|clk[0]                       ;
;   2.157 ;   2.157 ; RR ; IC   ; 1      ; CLKCTRL_G13           ; clkrst_ins|altpll_component|auto_generated|wire_pll1_clk[0]~clkctrl|inclk[0] ;
;   2.157 ;   0.000 ; RR ; CELL ; 13     ; CLKCTRL_G13           ; clkrst_ins|altpll_component|auto_generated|wire_pll1_clk[0]~clkctrl|outclk   ;
;   2.157 ;   0.000 ; RR ; IC   ; 1      ; DDIOOUTCELL_X0_Y10_N4 ; video_adc_ins|pixadc_clk[1]|clk                                              ;
;   2.716 ;   0.559 ; RR ; CELL ; 1      ; DDIOOUTCELL_X0_Y10_N4 ; video_adc:video_adc_ins|pixadc_clk[1]

The above relates to a PLL without delay compensation.

Timing example: Pin to register (input)

+-----------------------------------------------------------+
; Path Summary                                              ;
+---------------------+-------------------------------------+
; Property            ; Value                               ;
+---------------------+-------------------------------------+
; From Node           ; pixadc_da[2]                        ;
; To Node             ; video_adc:video_adc_ins|samp_reg[2] ;
; Launch Clock        ; n/a                                 ;
; Latch Clock         ; main_clk                            ;
; Max Delay Exception ; 0.700                               ;
; Data Arrival Time   ; 0.992                               ;
; Data Required Time  ; 1.020                               ;
; Slack               ; 0.028                               ;
+---------------------+-------------------------------------+

+-------------------------------------------------------------------------------------------------+
; Data Arrival Path                                                                               ;
+---------+---------+----+------+--------+------------------+-------------------------------------+
; Total   ; Incr    ; RF ; Type ; Fanout ; Location         ; Element                             ;
+---------+---------+----+------+--------+------------------+-------------------------------------+
; 0.000   ; 0.000   ;    ;      ;        ;                  ; launch edge time                    ;
; 0.000   ; 0.000   ;    ;      ;        ;                  ; clock path                          ;
;   0.000 ;   0.000 ; R  ;      ;        ;                  ; clock network delay                 ;
; 0.000   ; 0.000   ; R  ; iExt ; 1      ; PIN_W2           ; pixadc_da[2]                        ;
; 0.992   ; 0.992   ;    ;      ;        ;                  ; data path                           ;
;   0.000 ;   0.000 ; RR ; IC   ; 1      ; IOIBUF_X0_Y7_N15 ; pixadc_da[2]~input|i                ;
;   0.748 ;   0.748 ; RR ; CELL ; 1      ; IOIBUF_X0_Y7_N15 ; pixadc_da[2]~input|o                ;
;   0.748 ;   0.000 ; RR ; IC   ; 1      ; FF_X0_Y7_N17     ; video_adc_ins|samp_reg[2]|d         ;
;   0.992 ;   0.244 ; RR ; CELL ; 1      ; FF_X0_Y7_N17     ; video_adc:video_adc_ins|samp_reg[2] ;
+---------+---------+----+------+--------+------------------+-------------------------------------+

+------------------------------------------------------------------------------------------------------------------------------------+
; Data Required Path                                                                                                                 ;
+---------+---------+----+------+--------+--------------+----------------------------------------------------------------------------+
; Total   ; Incr    ; RF ; Type ; Fanout ; Location     ; Element                                                                    ;
+---------+---------+----+------+--------+--------------+----------------------------------------------------------------------------+
; 0.700   ; 0.700   ;    ;      ;        ;              ; latch edge time                                                            ;
; 1.124   ; 0.424   ;    ;      ;        ;              ; clock path                                                                 ;
;   0.700 ;   0.000 ;    ;      ;        ;              ; source latency                                                             ;
;   0.700 ;   0.000 ;    ;      ; 13     ; CLKCTRL_G13  ; clkrst_ins|altpll_component|auto_generated|wire_pll1_clk[0]~clkctrl|outclk ;
;   0.700 ;   0.000 ; RR ; IC   ; 1      ; FF_X0_Y7_N17 ; video_adc_ins|samp_reg[2]|clk                                              ;
;   1.124 ;   0.424 ; RR ; CELL ; 1      ; FF_X0_Y7_N17 ; video_adc:video_adc_ins|samp_reg[2]                                        ;
; 1.020   ; -0.104  ;    ; uTsu ; 1      ; FF_X0_Y7_N17 ; video_adc:video_adc_ins|samp_reg[2]                                        ;
+---------+---------+----+------+--------+--------------+----------------------------------------------------------------------------+

First, note the zero time increment marked in green above. It just confirms that no Input Pin delay was inserted by the tools.

Once again, the zero increment in red is the result of the set_annotate_delay constraint. It would have read 1.028 ns otherwise.

And again, a regular create_clock declaration would have yielded the following at the beginning of the datapath instead:

+--------------------------------------------------------------------------------------------------------------------------------------+
; Data Required Path                                                                                                                   ;
+---------+---------+----+------+--------+--------------+------------------------------------------------------------------------------+
; Total   ; Incr    ; RF ; Type ; Fanout ; Location     ; Element                                                                      ;
+---------+---------+----+------+--------+--------------+------------------------------------------------------------------------------+
; 0.700   ; 0.700   ;    ;      ;        ;              ; latch edge time                                                              ;
; 3.194   ; 2.494   ;    ;      ;        ;              ; clock path                                                                   ;
;   0.700 ;   0.000 ;    ;      ;        ;              ; source latency                                                               ;
;   0.700 ;   0.000 ;    ;      ; 1      ; PLL_3        ; clkrst_ins|altpll_component|auto_generated|pll1|clk[0]                       ;
;   2.770 ;   2.070 ; RR ; IC   ; 1      ; CLKCTRL_G13  ; clkrst_ins|altpll_component|auto_generated|wire_pll1_clk[0]~clkctrl|inclk[0] ;
;   2.770 ;   0.000 ; RR ; CELL ; 13     ; CLKCTRL_G13  ; clkrst_ins|altpll_component|auto_generated|wire_pll1_clk[0]~clkctrl|outclk   ;
;   2.770 ;   0.000 ; RR ; IC   ; 1      ; FF_X0_Y7_N17 ; video_adc_ins|samp_reg[2]|clk                                                ;
;   3.194 ;   0.424 ; RR ; CELL ; 1      ; FF_X0_Y7_N17 ; video_adc:video_adc_ins|samp_reg[2]                                          ;
; 3.090   ; -0.104  ;    ; uTsu ; 1      ; FF_X0_Y7_N17 ; video_adc:video_adc_ins|samp_reg[2]                                          ;
+---------+---------+----+------+--------+--------------+------------------------------------------------------------------------------+

The figures differ from the corresponding figures for the output timing, because increments in the data required path relax the constraint, so the tools pick the minimal delays here.

Loop timing budget

OK, so we have one constraint requiring the data on output ports to be valid 2.7 ns after main_clk. We have another constraint saying that the delay from an input pin to a register is no more than 0.7 ns. The clock period is 16.666 ns. Does it mean that the difference, 16.666 – (2.7 + 0.7) = 13.266 ns is the time allowed for the device to respond?

In other words, if the output signal is a clock that triggers the outputs of the external device, is it enough that the device’s clock-to-output, plus the PCB trace delay, mount up to less than 13.266 ns?

The answer is almost yes. The only thing not taken into account is the skew between the clocks as they arrive to each of the two I/O registers, because the global clock delay was forced to zero. But the skew is typically less than a few hundred picoseconds. All the rest is covered.

Note in particular that in the input timing calculation, the data path (from the pin to the register) isn’t compared with the constrained time (0.7 ns), but rather with the constrained time, plus the register’s internal clock delay, minus the register’s setup time. In other words, these small adjustments result in an accurate answer to if 0.7 ns from the pin to the clock is OK.

And because the delay calculations for the input and output delays begin at exactly the same global clock toggle at the register’s pins, the overall result is valid and accurate, except for the global clock skew, which isn’t taken into account.

Conclusion

It’s quite peculiar that this seemingly simple task of constraining the I/O timing turned out to be as difficult. It’s also unfortunate that this requires some crippling of the regular register-to-register calculations.

What makes this even more unfortunate, is that this constraining is practically necessary to ensure that no input pin delay is inserted by the tools. It’s not just a safety mechanism to set the alarm if the I/O registers slip away into the logic fabric.

One could argue that if timing is important, an external clock should have been used as a direct reference, in which case this whole issue would not have risen. But the point is that even if the design doesn’t squeeze the best possible timing performance from the FPGA, proper constraining is still required. It’s the designers prerogative to use the FPGA in a suboptimal way for ease and laziness, as long as the application’s requirements are met. It’s too bad that the punishment comes from the tools themselves, turning a straightforward task into a saga.

UPS, Fedex or DHL: Will your neighbor get your package?

Is that for me?

I had some $1,200 worth package sent to me from an electronics vendor (Mouser) with UPS. Free shipping. Got an SMS saying when the courier was expected to arrive. Took a nap and didn’t hear the phone ringing nor the doorbell. Woke up to an SMS saying “thank you for choosing UPS” and a note on the door saying the package was delivered to me neighbor.

Needless to say, I didn’t give my consent to this. Actually, I didn’t know this option existed with large couriers. Don’t get me wrong: My neighbor is great. I just think it’s completely wrong that he should be bothered with my stuff.

This isn’t a rant post. It’s a note to future self, so I can make an informed choice of courier and shipping conditions. Like many other posts on this blog, I’m writing it for myself, but let others see and share their insights (in the comments).

Needless to say, all companies deliver the package to you if you’re at home. The question is what they do if you’re not. Will they get rid of the package as quickly as possible, or will they go on trying (which makes the delivery more expensive to them).

Written in August 2018.

UPS

Yes, the delivery to neighbor was legit, according to UPS’ own website: “Shipments that do not require a signature can be left in a safe place, out of sight and out of weather, at the driver’s discretion. This could include the front porch, side door, back porch, garage area, or with a neighbor or leasing office (which would be noted in a yellow UPS InfoNotice® left by the driver).”

From “UPS’ Tariffs / Terms and Conditions”, “Delivery”: “UPS does not limit Delivery of a Shipment to the person specified as the Receiver in the UPS Shipping System. Unless the Shipper uses Delivery Confirmation service requiring a signature, UPS reserves the right, in its sole and unlimited discretion, to make a Delivery without obtaining a signature.”

The “Signature Required” option adds $4.75 to the tariff, according to their pricing page. Mouser obviously opted this out. So much for “free shipping”.

Fedex

From Fedex’ Service Guide 2018, “FedEx Express Terms and Conditions”, in “Delivery Signature Options”, it says “someone at the delivery address” with respect to who is allowed to acknowledge the delivery. If the sender has chosen “Indirect Signature Required”, a neighbor is perfectly eligible to sign for the parcel. Actually, it gets better: “Shipments to residential addresses may be released without obtaining a signature. If you require a signature for a residential shipment, select one of the Delivery Signature Options.” Let’s hope that the sender does require a signature.

So it seems Fedex is flexible on this issue, requiring the sender to pick the option, possibly at a cost: For example, “Direct Signature Required” costs $4.75 extra if the package’s worth is under $500, according to Fedex’ Fees information leaflet. The Service Guide 2018 confirms this: “Direct Signature Required fees will apply only to those packages within the shipment with a declared value of less than $500″. In other words, they don’t give the shipper the option to be irresponsible.

Conclusion: No package above $500 will reach the neighbor. Or if that extra tariff has been paid.

DHL

DHL’s “Terms and Conditions” (which is remarkably short and concise) says under “Deliveries and Undeliverables”: “Shipments cannot be delivered to PO boxes or postal codes. Shipments are delivered to the Receiver’s address given by Shipper but not necessarily to the named Receiver personally. Shipments to addresses with a central receiving area will be delivered to that area. DHL may notify Receiver of an upcoming delivery or a missed delivery. Receiver may be offered alternative delivery options such as delivery on another day, no signature required, redirection or collection at a DHL Service Point. Shipper may exclude some delivery options on request”.

No neighbors mentioned, no alternative destinations. It’s either the destination address or nothing.

Their German site allows choosing a preferred neighbor or a preferred outdoors location for placing the parcel. This is an active choice made by the recipient, not an ad-hoc improvisation by the courier.

TNT?

Opted out, after I recently had to make several phone calls in order to get the invoice for the customs clearance tariffs. At least in Israel, they’re not up to it.

Bottom line

Judging by the official docs, DHL most careful about where the package ends, but Fedex isn’t so bad either (in particular when the declared worth is above $500, or if those extra $4.75 has been paid).

UPS, well, it seems like they offer good deals to the shippers.

Solved: netcat (nc) doesn’t terminate at end of transmission

Introduction

I often use netcat to transmit chunks of data between two Linux machines. I usually go something like

$ pv backup-image.iso | nc -l 1234

on one machine, and then maybe

# nc 10.1.2.3 1234 > /dev/sdb1

This is an example for using another machine to write data into a USB disk-on-key, because writing to any /dev/sdX on my main computer scares me too much.

But it doesn’t quit when it finishes

So after a couple of hours of operation it’s obviously finished, but with certain couples of computers, neither side quits the netcat program. So it’s not clear if the very last piece of data was fully written on the target.

Immediate thing to try

If you’re stuck like this after a long netcat transmission, this might save you: Press CTRL-D on the console of computer receiving data. If you’re lucky, this releases netcat on both sides.

Why this happens

This was written in July 2018, reflecting the netcat versions I’ve come across.

netcat opens a bidirectional TCP link, passing one side’s stdin to the other side’s stdout and vice versa. When netcat is faced with an EOF on its standard input, it may or may not close the sending part of its TCP connection, depending on which version of netcat it is. If it indeed closed the the sending connection, a FIN TCP packet is sent to the other side. The netcat program on the other side receives this FIN packet, and may or may not quit, once again, depending on which version of netcat it is. If it did quit, it returns a FIN packet, closing the connection altogether.

So we have two “may or may not”, leading to four possibilities of how it all behaves.

The example above works if (and only if) the sending side sends a FIN when it sees EOF on its stdin, and that causes the other side’s netcat to quit, closing the TCP connection completely (sending a FIN packet back), which causes the first netcat to quit as well. And all is good.

Well, no. Formally, this is actually wrong behavior: Considering netcat to be a bidirectional link (this isn’t used a lot, but still), closing one direction shouldn’t cause the closing of the other. Maybe there’s data for waiting for transmission in the opposite direction. It’s perfectly legal, and quite commonplace, to transmit data on a half-open TCP link.

This is probably why recent revisions of netcat will not quit on receiving a FIN packet, but only when there’s no data in either direction: After receiving a FIN on the incoming TCP line and an EOF on its stdin.

Also, recent netcat revisions ignore the EOF on its stdin until the FIN arrives, unless the -q flag is given (which is not supported by earlier versions, but neither is it needed). This causes a potential deadlock: Even if both sides have received an EOF, none will quit, because neither has sent the FIN packet. The -q flag solves this.

Does it matter which side is listening?

I haven’t read the sources (of which revision should I read?), but after quite some experiments I got the impression that the behavior is symmetric: Client and server behave exactly the same way. Doesn’t matter which side was listening and which was initiating the TCP connection.

So what to do

Since there are two different paradigms of netcat out there, there’s no catch-all solution. For each pair of machines, test your netcat pair before starting a heavy operation. Possibly on a short file, possibly by typing data on the console. Be sure both sides quit at the end.

One thing that helps is to change the receiving part to e.g.

# nc 10.1.2.3 1234 > /dev/sdb1 < /dev/null

/dev/null supplies an EOF right away. Older netcats will send a FIN on the TCP link immediately on its establishment, so if there’s an old netcat on the other side, both sides quit right away, possibly before any data is sent. But if there are old netcats on both sides, you’re probably not bothered by this issue at all.

Newer netcats do nothing because of this immediate EOF (they ignore it until a FIN arrives), but it allows them to quit when the other side does.

Another thing to do, is to add the -q flag on the sending netcat (supported only by the newer netcats). For example:

$ pv backup-image.dat | nc -q 0 -l 1234

The “-q 0″ part tells netcat to “quit” immediately after receiving an EOF (or so says the man page). My own anecdotal experiment shows that “-q 0″ doesn’t make netcat quit at all, but just to send the FIN packet when an EOF arrives. In other words, “-q 0″ means “send a FIN packet when EOF arrives on stdin”. Something old netcats do anyhow.

This is good enough to get out of the deadlock mentioned above: When the data stream ends, the sending part sends a FIN because of the “-q 0″ flag. The receiving part now has an EOF by virtue of /dev/null, and a FIN from the sending part, so it quits, sending a FIN back. Now the first side has an EOF and a FIN, and quits as well.

Note that the “-q 0″ is more important than the /dev/null trick: If the receiving side has quit, we know all data has been flushed. It therefore doesn’t matter so much that a CTRL-C is needed to release the sending side. Doesn’t matter, but doesn’t add a feeling of confidence when the transmission is really important.

And this bring me back to what I began with: Each pair of computers needs a test before attempting something heavy. Sadly.

Synplify Pro on Linux Mint 18.1: The cheat sheet

Introduction

I needed to run Synplify Pro for a short trial period on my Fedora 12 machine (yup, it’s 2018, and still). And I have a full Mint 18.1 as a chroot jail on that machine for installing contemporary software.

So these are my notes on the go. Consider everything below as run om Mint 18.1 x86_64 (which is an Ubuntu derivative), except for the licensing manager, which I eventually ran directly on the Fedora 12 oldie (x86_64 as well).

I should point out, that Synopsys officially supports only Red Hat based distributions (SUSE and RHEL), which explains why small tweaks were necessary. But once that is over with, all was fine.

Synopsys offers extensive documentation for all this, of course. As the title implies, this should be considered as a cheat sheet, nothing more.

Download the stuff

In essence, three parts are needed: Synopsys’ installer program, the tool to run and the licensing manager. In my case, I downloaded all inside the following directories (into separate directories on my own computer):

  • /rev/installer_v4.1
  • /rev/s_fpga_d_vN-2018.03-SP1
  • /rev/scl_v2018.06

These three directories happened to be everything under /rev/ in my case (this is what they prepared for me, I suppose). So I grabbed it all. This included some large files for Windows, which I surely didn’t need, but it’s easier to fetch all files and wait longer than to use my own brain, for example.

Make the system ready

C-shell is used by the installation scripts (and others, possibly):

# apt-get install csh

Synplify itself expects a LSB (Linux Standard Base), in particular a symlink in /lib64/ for the ELF loader.

# apt-get install lsb-core

Without this, the licensing related program go something like:

$ ./lmhostid
-bash: ./lmhostid: No such file or directory

And then you go “But what??? The file is there!” and you’re right, the file is there, but the ELF loader which the executable requests isn’t, because it’s /lib64/ld-lsb-x86-64.so.3. I’ve discussed this issue in another post.

Also, some scripts have their shebang to /bin/sh (!). Unfortunately, Debian’s standard symlink for /bin/sh goes to /bin/dash (because working out of the box is for the weak). So

# cd /bin
# mv sh old-sh
# ln -s bash sh

If you don’t change this symlink, the typical error goes “synplify-pro-exec/fpga/N-2018.03-SP1/bin/config/execute: Syntax error: “(” unexpected (expecting “;;”)”

Then create another symlink from /usr/tmp to /tmp, because the licensing deamon creates the lock file there. As root:

# cd /usr
# ln -s /tmp

Installing

Refer to the Synopsys’ installation guide for how to run through the installation. This is just a brief.

Installing doesn’t require being root, if the target directories are owned by the user.

First, install the installer. In the directory where the installer was downloaded to, go

$ ./SynopsysInstaller_v4.1.run

And extract the installer into some other directory.

Navigate to the directory to which the installer was installed, and go

$ ./setup.sh

for a GUI installer, or

$ ./installer

for the textual stuff (but the ssh’ers).

The installer should be run (at least) twice: Once to install the tool of interest, and a second time to install the licensing manager.

For each time tell the installer where Synopsys’ installation files were downloaded to, and then to where the installed program should go. Both are different directories for each of the two installations.

Editing the licensing file

Edit the SERVER line, replacing “hostname1″ with the actual host name (as returned by “uname -n”).

There is no need to change the VENDOR line. At least in my case, it worked fine as is.

Check the licensing file

Be sure it’s valid. Navigate to where the licensing manager was installed (e.g. scl/2018.06/linux64/bin), and go (below is a successful validation for a temporary key):

$ ./sssverify /path/to/license.txt 

Integrity check report for license file "/path/to/license.txt".
Report generated on 06-Jul-2018 (SCL_2018.06)
---------------------------------------------------------
Checking the integrity of the license file...
Valid SSST feature found.
Licensed to Temp Keys for New Customers
Siteid: 5.1, Server Hostid: 200247EDD334, Issued on: 7/4/2018
License file integrity check PASSED!
---------------------------------------------------------
You may now USE this license file to start your license server.
Please don't edit or manipulate the contents of this license file.

Or use the -pinfo flag for a list of licensed features:

$ ./sssverify -pinfo /path/to/license.txt
=============================================================================================
	PRODUCT TO FEATURE MAPPING REPORT GENERATED ON 6/7/2018
---------------------------------------------------------------------------------------------
License File: /path/to/license.txt
Site ID: NEWSITE
Host ID: 200247EDD334
Key File Date: 07/04/2018
SCL Version: SCL_2018.06
=============================================================================================

=============================================================================================
Product: *****			 Serial Number: (SN=0:0)
---------------------------------------------------------------------------------------------
Feature Name                     Expiry-Date  Daemon       Version Quantity        Start-Date
---------------------------------------------------------------------------------------------
SSST                             22-Jul-2018  snpslmd      1.0            1       04-Jul-2018
=============================================================================================

=============================================================================================
Product: *****			 Serial Number: (SN=4881-0:503161)
---------------------------------------------------------------------------------------------
Feature Name                     Expiry-Date  Daemon       Version Quantity        Start-Date
---------------------------------------------------------------------------------------------
synplifypro_altera               22-jul-2018  snpslmd      2018.03        1
=============================================================================================

Start the licensing manager

OK, this is the only place where I left my Mint chroot jail, because I got

18:07:13 (snpslmd) Cannot open daemon lock file
18:07:13 (snpslmd) EXITING DUE TO SIGNAL 41 Exit reason 9
18:07:13 (lmgrd) snpslmd exited with status 41 (Exited because another server was running)

and then it just worked on Fedora 12, so what the heck with that. There is a word that the licensing manager doesn’t work on reiserfs, and I also spotted with strace that this failure occurs immediately after getdents() system calls on the root directory, which was a fake root in my case. So maybe because of that, maybe something else I didn’t get right, or more precisely: Didn’t bother to get right.

Anyhow, root aren’t required, and neither is any environment variable.

$ cd /path/to/scl/2018.06/linux64/bin
$ ./lmgrd -c /path/to/license.txt

And of course, if you’re really into it, make a service for this on your machine.

Is there a licensing manager running?

Is it up?

 ./lmstat
lmstat - Copyright (c) 1989-2017 Flexera Software LLC. All Rights Reserved.
Flexible License Manager status on Fri 7/6/2018 20:59

License server status: 27020@myhost.localdomain
    License file(s) on myhost.localdomain: /path/to/license.txt:

myhost.localdomain: license server UP (MASTER) v11.14.1

Vendor daemon status (on myhost.localdomain):

   snpslmd: UP v11.14.1

What’s its process (for killing)?

$ ps aux | grep lmg
eli      27499  0.0  0.0  17760  1428 pts/15   S    17:41   0:00 ./lmgrd -c /path/to/license.txt

Shut down the licensing manager

$ ./lmdown -c /path/to/synplify-pro-exec/license.txt

This works however only if the licensing manager went up OK. Otherwise, it might say it shut down the daemon, but there’s still a process running.

Running Simplify Pro

Finally there.

Be sure that the licensing manager is up and running, and go:

$ SNPSLMD_LICENSE_FILE='27020@localhost' ./synplify_pro &

Synopsys’ docs tell us to set and export the environment variable, but this way works, and this is how I like it.

Cyclone V and some transceiver CDR/PLL parameters

Introduction

Connecting an Intel FPGA (Altera) Cyclone V’s Native Transceiver IP to a USB 3.0 channel (which involves a -5000 ppm Spread Spectrum modulation), I got a significant bit error rate and what appeared to be occasional losses of lock. Suspecting that the CDR didn’t catch up with the frequency modulation, I wanted to try out a larger PLL bandwidth = track more aggressively at the expense of higher jitter. That turned out to be not so trivial.

This post sums up my findings related to Quartus. As for solving the original problem (bit errors and that), changing the bandwidth made no difference.

Toolset: Quartus Lite 15.1 on Linux.

And by the way, the problem turned out to be unrelated to the PLL, but the lack of an  equalizer on Cyclone V’s receiver. Hence no canceling of the low-pass filtering effect of the USB 3.0 cable. I worked this around by setting XCVR_RX_LINEAR_EQUALIZER_CONTROL  to 2 in the QSF file and the errors were gone. However this just activates a constant compensation high-pass filter on the receiver’s input (see the Cyclone V Device Datasheet, CV-51002,  2018.05.07, Figure 4) and consequently works the problem around for a specific cable, not more.

Assignments in the QSF file

In order to change the CDR’s bandwidth, assignments in the QSF are due, as detailed in V-Series Transceiver PHY IP Core User Guide (UG-01080, 2017.07.06) in the section “Analog Settings for Cyclone V Devices” and on page 20-28. In principle, CDR_BANDWIDTH_PRESET should be set to High instead of its default “Auto”. In this post, I’ll also set PLL_BANDWIDTH_PRESET to High, even though I’m quite confident it has nothing to do with locking to data (rather, it controls locking to the reference clock). But it causes quite some confusion, as shown below.

So all that is left is to nail down the CDR’s instance name, and assign it these parameters.

Now first, what not to do: Using wildcards. This is quite tempting because the path to the CDR is very long. So at first, I went for this, which is wrong:

set_instance_assignment -name CDR_BANDWIDTH_PRESET High -to *|xcvr_inst|*rx_pma.rx_cdr
set_instance_assignment -name PLL_BANDWIDTH_PRESET High -to *|xcvr_inst|*rx_pma.rx_cdr

And nothing happened, except a small notice in some very important place of the fitter report:

+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
; Ignored Assignments                                                                                                                                                                                   ;
+--------------------------------------------------+---------------------------+--------------+------------------------------------------------------------+---------------+----------------------------+
; Name                                             ; Ignored Entity            ; Ignored From ; Ignored To                                                 ; Ignored Value ; Ignored Source             ;
+--------------------------------------------------+---------------------------+--------------+------------------------------------------------------------+---------------+----------------------------+
; Merge TX PLL driven by registers with same clear ; altera_xcvr_reset_control ;              ; alt_xcvr_reset_counter:g_pll.counter_pll_powerdown|r_reset ; ON            ; Compiler or HDL Assignment ;
; CDR Bandwidth Preset                             ; myproj                    ;              ; *|xcvr_inst|*rx_pma.rx_cdr                                 ; HIGH          ; QSF Assignment             ;
; PLL Bandwidth Preset                             ; myproj                    ;              ; *|xcvr_inst|*rx_pma.rx_cdr                                 ; HIGH          ; QSF Assignment             ;
+--------------------------------------------------+---------------------------+--------------+------------------------------------------------------------+---------------+----------------------------+

Ayeee. So it seems like there’s no choice but to spell out the entire path. I haven’t investigated this thoroughly, though. Maybe there is some form of wildcards that would work. I also discuss this topic briefly in another post of mine.

So this is more like it:

set_instance_assignment -name CDR_BANDWIDTH_PRESET High -to frontend_ins|xcvr_inst|xcvr_inst|gen_native_inst.av_xcvr_native_insts[0].gen_bonded_group_native.av_xcvr_native_inst|inst_av_pma|av_rx_pma|rx_pmas[0].rx_pma.rx_cdr
set_instance_assignment -name PLL_BANDWIDTH_PRESET High -to frontend_ins|xcvr_inst|xcvr_inst|gen_native_inst.av_xcvr_native_insts[0].gen_bonded_group_native.av_xcvr_native_inst|inst_av_pma|av_rx_pma|rx_pmas[0].rx_pma.rx_cdr

I guess this clarifies why wildcards are tempting.

Verifying something happened

This is where things get confusing. Looking at the fitter report, in the part on transceivers, this was the output before adding the QSF assignments above (pardon the wide line, this is what the Fitter produced):

;         -- Name                                                                                           ; frontend:frontend_ins|xcvr:xcvr_inst|altera_xcvr_native_av:xcvr_inst|av_xcvr_native:gen_native_inst.av_xcvr_native_insts[0].gen_bonded_group_native.av_xcvr_native_inst|av_pma:inst_av_pma|av_rx_pma:av_rx_pma|rx_pmas[0].rx_pma.rx_cdr                                                                                                                                                                                                                                                     ;
;         -- PLL Location                                                                                   ; CHANNELPLL_X0_Y49_N32                                                                                                                                                                                                                                                                                                                                                                                                                                                                       ;
;         -- PLL Type                                                                                       ; CDR PLL                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     ;
;         -- PLL Bandwidth Type                                                                             ; Auto (Medium)                                                                                                                                                                                                                                                                                                                                                                                                                                                                               ;
;         -- PLL Bandwidth Range                                                                            ; 2 to 4 MHz

And after adding the QSF assignments:

;         -- Name                                                                                           ; frontend:frontend_ins|xcvr:xcvr_inst|altera_xcvr_native_av:xcvr_inst|av_xcvr_native:gen_native_inst.av_xcvr_native_insts[0].gen_bonded_group_native.av_xcvr_native_inst|av_pma:inst_av_pma|av_rx_pma:av_rx_pma|rx_pmas[0].rx_pma.rx_cdr                                                                                                                                                                                                                                                     ;
;         -- PLL Location                                                                                   ; CHANNELPLL_X0_Y49_N32                                                                                                                                                                                                                                                                                                                                                                                                                                                                       ;
;         -- PLL Type                                                                                       ; CDR PLL                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     ;
;         -- PLL Bandwidth Type                                                                             ; High                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        ;
;         -- PLL Bandwidth Range                                                                            ; 4 to 8 MHz

Bingo, huh? Well, not really. Which of these two assignments made this happen? CDR_BANDWIDTH_PRESET or PLL_BANDWIDTH_PRESET? In other words: Does the fitter report tell us about the bandwidth of the PLL on the reference clock or the data?

The answer is PLL_BANDWIDTH_PRESET. Setting CDR_BANDWIDTH_PRESET doesn’t change anything in the Fitter report at all. I know it all too well (after spending some pleasant quality time trying to figure out why, before realizing it’s about PLL_BANDWIDTH_PRESET).

So where’s does CDR_BANDWIDTH_PRESET do its trick?

To find that, one needs to get down to the post-fitting properties of the rx_cdr instance. The following sequence applies to Quartus 15.1′s GUI:

After fitting, select Tools > Netlist Viewers > Technology Map Viewer (Post-Fitting). Locate the instance in the Find tab (to the left; it’s a plain substring search on the instance name given in the QSF assignment). Once found, click on the block in the graphics display so its bounding box becomes red, and then right-click this block. On the menu that shows up, select Locate in Resource Property Editor.

And that displays a list of properties (which can be exported into a CSV file). One of which is rxpll_pd_bw_ctrl. Changing CDR_BANDWIDTH_PRESET to High altered this property’s value from 300 to 600. Changing it to Low sets it to 240.

And by the way, a change in PLL_BANDWIDTH_PRESET to High has no impact on any of the properties listed in the Resource Property Editor for the said instance, but making it Low takes pfd_charge_pump_current_ctrl from 30 to 20, and rxpll_pfd_bw_ctrl from 4800 to 3200. Whatever that means.

It’s worth mentioning that the CDR is instantiated as an arriav_channel_pll primitive (yes, an Arria V primitive on a Cyclone V FPGA) in the av_rx_pma.sv module (generated automatically for the Transceiver Native PHY IP). One of the instantiation parameters is rxpll_pd_bw_ctrl, which is assigned 300 by default. The source file doesn’t change as a result of the said change in the QSF file. So the tools somehow change something post-synthesis. I guess.

There are however no instantiation parameters for neither pfd_charge_pump_current_ctrland nor rxpll_pfd_bw_ctrl. So the rxpll_pd_bw_ctrl naming match is probably more of a coincidence. Once again, I guess.

A closer look on the PLL

It’s quite clear from above that CDR_BANDWIDTH_PRESET influenced rxpll_pd_bw_ctrl (note the _pd_ part) and that PLL_BANDWIDTH_PRESET is related to a couple of parameters with pfd them. This terminology goes along with the one used in the documentation (see e.g. Figure 1-17, “Channel PLL Block Diagram” in Cyclone V Device Handbook Volume 2: Transceivers, cv_5v3.pdf, 2016.01.28): The displayed terminology is that PFD relates to the Lock-To-Reference loop, which locks on the reference clock, and PD relates to the Lock-To-Data loop, which is the CDR.

This isn’t just a curiosity, because the VCO’s output dividers, L, are assigned separately for the PD and PDF loops (see the fitter report as well as Table 1-9).

As for the numbers in the fitter report, they match the doc’s as shown in the two relevant segments below. The first relates to a Native PHY IP, and the second to a PCIe PHY, both on the same design, both targeted at 5 Gb/s (and hence having the same “Output Clock Frequency”).

;         -- Reference Clock Frequency                                                                      ; 100.0 MHz                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   ;
;         -- Output Clock Frequency                                                                         ; 2500.0 MHz                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  ;
;         -- L Counter PD Clock Disable                                                                     ; Off                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         ;
;         -- M Counter                                                                                      ; 25                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          ;
;         -- PCIE Frequency Control                                                                         ; pcie_100mhz                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 ;
;         -- PD L Counter                                                                                   ; 2                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           ;
;         -- PFD L Counter                                                                                  ; 2                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           ;
;         -- Powerdown                                                                                      ; Off                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         ;
;         -- Reference Clock Divider                                                                        ; 1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           ;

versus

;         -- Reference Clock Frequency                                                                      ; 100.0 MHz                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   ;
;         -- Output Clock Frequency                                                                         ; 2500.0 MHz                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  ;
;         -- L Counter PD Clock Disable                                                                     ; Off                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         ;
;         -- M Counter                                                                                      ; 25                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          ;
;         -- PCIE Frequency Control                                                                         ; pcie_100mhz                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 ;
;         -- PD L Counter                                                                                   ; 1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           ;
;         -- PFD L Counter                                                                                  ; 2                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           ;
;         -- Powerdown                                                                                      ; Off                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         ;
;         -- Reference Clock Divider                                                                        ; 2                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           ;

In both transceivers, a 2500 MHz clock is generated from a 100 MHz reference clock. It seems like the trick to understanding what’s going on is noting footnote (2) of Table 1-17, saying that the output of L_PD is the one that applies when the PLL is configured as a CDR.

In the first case, the reference clock is fed into the phase detector without division. Since the reference clock is not divided, 100 MHz reaches one input of the phase detector. As the output is divided by PFD_L = 2 and then by M=25, the VCO has to run at 5000 MHz so that its output divided by 50 matches the 100 MHz reference. That doesn’t seem very clever to me (why not pick L=1, and avoid 5 GHz, which I’m not even sure is possible on that silicon?). But at least the math adds up: The output is divided with PD_L = 2, and we have 2500 MHz.

Now to the second case (PCIe): The reference clock is divided by 2, so the phase detector is fed with a 50 MHz reference. The VCO’s clock is divided by PDF_L = 2 and then with M = 25, and hence the VCO runs at 2500 MHz. This way, the total division by 50 (again) matches the 50 MHz reference on the phase detector. PD_L = 1, so the VCO’s output is used undivided, hence an output clock of 2500 MHz, again.

I’m not sure that I’m buying this explanation myself, actually, but it’s the only way I found to make sense of these figures. At some point I tried to convince the tools to divide the reference clock by 2 on the Native PHY (first case above) by adding

set_instance_assignment -name PLL_PFD_CLOCK_FREQUENCY "50 MHz" -to "frontend:frontend_ins|xcvr:xcvr_inst|altera_xcvr_native_av:xcvr_inst|av_xcvr_native:gen_native_inst.av_xcvr_native_insts[0].gen_bonded_group_native.av_xcvr_native_inst|av_pma:inst_av_pma|av_rx_pma:av_rx_pma|rx_pmas[0].rx_pma.rx_cdr"

to the QSF file. This assignment was silently ignored. It wasn’t mentioned anywhere in the reports (not even in the Ignored Assignments) part, but the Divider remained at 1. I should mention that this assignment isn’t documented for Cyclone V, but Quartus Assignment Editor nevertheless agreed to generate it. And Quartus usually refuses to load a project if anything is fishy in the QSF file.

Quartus, timing closure: Obtaining a concise multi-corner timing path report

Introduction

The natural thing to do when an FPGA design fails timing is to take a detailed look at the critical paths, based upon a timing report showing the logic elements and their delays of this path.

If you’re not a heavy user of Intel’s FPGAs (a.k.a. Altera), it may not be so trivial to figure out how to obtain this report. And even worse, you might unknowingly be looking at the wrong report.

The first thing to have sorted out is the concept of multi-corner timing analysis. Without proving that this is sufficient and/or necessary (mainly because I don’t know. Can anyone present a solid proof?), the common practice is to verify an FPGA’s timing validity by ensuring that the timing constraints are met in four cases: The minimal and maximal temperature, and a “slow” and “fast” model, which makes four combinations, or as they are referred to, four corners.

When looking at the critical paths, it’s therefore important to look at the paths at all four corners. This is often overlooked: For example, just generating a timing report in TimeQuest, typically produces the report for a single corner.

So this post describes how to get the report that says something. It relates to Quartus Prime 17.1 Lite.

Everything the said below (including the scripts) works on Quartus Prime 15.1 Lite as well, except that this version (and earlier, I suppose) doesn’t generate any multi-corner reports in any form. This makes the HTML report generation option attractive, as these reports are easier to work with.

I should mention two other related posts in this blog: One taking a look on the relation between input / output constraints and the timing report, and another experimenting a bit with Tcl scripting with TimeQuest.

Getting a multi-corner report: Scripted & quick

First, copy the following Tcl script into a file, say, timing.tcl:

create_timing_netlist
read_sdc
update_timing_netlist

foreach_in_collection op [get_available_operating_conditions] {
  set_operating_conditions $op

  report_timing -setup -npaths 20 -detail full_path -multi_corner \
    -panel_name "Critical paths"
}

Don’t let the “multi_corner” flag confuse you: Each call to report_timing covers one corner. It’s not clear if this flag does anything.

Now to action:

  • In Quartus, expand the TimeQuest group in the Task pane, and open TimeQuest Timing Analyzer.
  • In TimeQuest Timing Analyzer, pick Script > Run Tcl Script… from the menu bar, and select the Tcl script (e.g. timing.tcl).
  • An entry named “Critical paths” is added to the TimeQuest Timing Analyzer’s Report pane. Click on Multi-Corner Summary. A list of paths and their details now fill the main panes.
  • To export all path information into a textual file, right-click Multi-Corner Summary, and select “Export…”. Choose a name for an output file with a .rpt suffix. HTML reports are not supported (they will be empty).

There will also be four separate reports in the same entry, one for each corner. On earlier versions of Quartus, only these will appear (i.e., no Multi-Corner Summary).

Generate HTML / text reports only

The tools can generate neat HTML reports, which are considerably more comfortable to read than TimeQuest’s own GUI. Alas, these reports only cover one corner each. This script generates four HTML reports (it’s a whole bunch of files, JQuery script files, CSS and whatnot. Bells and whistles, but not a multi-corner report).

Suppose the following script as timing-html.tcl

#project_open myproj
create_timing_netlist
read_sdc
update_timing_netlist

foreach_in_collection op [get_available_operating_conditions] {
  set_operating_conditions $op

  report_timing -setup -npaths 20 -detail full_path -multi_corner \
    -file "timing_paths_$op.html" \
    -panel_name "Critical paths for $op"
}

For a plain textual report, change the -file flag’s argument, so the suffix is .rpt or .txt instead of .html.

Note the “project_open” command which is commented out at the top of the script. If it’s uncommented and “myproj” is replaced with the actual project name, a plain shell command line can be used to generate the HTML reports with something like

$ /path/to/quartus/bin/quartus_sta -t timing-html.tcl

I haven’t however found a way to generate a multi-corner report like this.

In order to have these reports generated in each implementation (which is recommended), add a line like the following to the QSF file:

set_global_assignment -name TIMEQUEST_REPORT_SCRIPT relative/path/to/timing-html.tcl

When included in a QSF file, the said Tcl script should not call project_open (comment it out or delete it).

The GUI only method

A multi-corner report can be obtained with just pointing and clicking:

  • In Quartus, expand the TimeQuest group in the Task pane, and open TimeQuest Timing Analyzer.
  • Inside the Timing Analyzer’s Tasks pane, double-click “Update Timing Netlist” .
  • In the same pane, scroll down to “Custom Reports” and double-click “Report Timing…”
  • A dialog box opens. Accept the defaults, and click “Report Timing” below.
  • In the Report pane, an “Report Timing” entry will be added. Expand it and right-click it. In the menu that opens, click “Generate in All Corners”
  • Click on the “Multi Corner Summary” group and possibly export the report as outlined above.

Making any IP in the IP Catalog availabe in QSys

Introduction

I needed the Cyclone V Transceiver Native PHY IP Core inside QSys. Why? Actually, part of a failed attempt to find solve a compilation error.

The IP is available in Quartus 15.1′s IP Catalog, but inside the same toolkit’s QSys it doesn’t appear in the list of IPs. As discussed in this forum thread, this is intentional: Altera doesn’t support having it inside QSys, seemingly because it’s not “fully verified”. OK, so I’ll take the risk. How do I make QSys list this IP, so it can be included?

The fix

As mentioned in this guide, the thing is that IPs which are hidden from QSys have the INTERNAL property set to “true”. All that is left is hence to edit the relevant Tcl file, and update the IP database.

Mission number one is to find the correct Tcl file. The hints on the file’s name are:

  • It’s probably related to the IP’s name and functionality
  • It ends with *_hw.tcl
  • The FPGA family is denoted by “av”, “cv” “sv” etc

Eventually the file I was looking for was at /path/to/quartus/ip/altera/alt_xcvr/altera_xcvr_native_phy/cv/tcl/altera_xcvr_native_cv_hw.tcl. Unlike many other HW Tcl files, it doesn’t just assign parameters directly (in which case it’s easy to spot the assignment to INTERNAL), but it merely consists of adding a couple of directories to some search path, and then it goes:

::altera_xcvr_native_cv::module::declare_module

which refers to module.tcl, which has the following code snippet:

  namespace export \
    declare_module

  # Internal variables
  variable module {\
    {NAME                   VERSION                 INTERNAL  ANALYZE_HDL EDITABLE  ELABORATION_CALLBACK                        PARAMETER_UPGRADE_CALLBACK                    DISPLAY_NAME                        GROUP                                 AUTHOR                DESCRIPTION DATASHEET_URL                                           DESCRIPTION  }\
    {altera_xcvr_native_cv  15.1  true      false       false     ::altera_xcvr_native_cv::module::elaborate  ::altera_xcvr_native_cv::parameters::upgrade  "Cyclone V Transceiver Native PHY"  "Interface Protocols/Transceiver PHY" "Altera Corporation"  NOVAL       "http://www.altera.com/literature/ug/xcvr_user_guide.pdf" "Cyclone V Transceiver Native PHY."}\
  }
}

This is an assignment of multiple variables: The names of the variables are listed on the first curly brackets, and the values in the second. As the third variable is INTERNAL, that’s the one to fix. So the actual edit consist of changing the “true” marked in red above to “false”.

Updating the IP catalog

Only making the change above isn’t enough. The IP Catalog cache must be updated as well.

Change directory to something like /path/to/quartus/ip/altera/ and set up the environment variables:

$ ../../nios2eds/nios2_command_shell.sh

and then create an IP Catalog cache:

$ ip-make-ipx

Once done, overwrite the previous file (you may want to make a copy of it first):

$ mv components.ipx altera_components.ipx

And now restart Quartus. The said IP now appears in QSys’ IP Catalog.

Combining PCIe and Gigabit Transceiver on Cyclone V

Overview

The goal: Using one of the PCIe transceivers for something else on a Cyclone V GT FPGA Development Kit Board, while keeping the PCIe link (narrowing it down from 4x to 2x). This allows allocating some other logic to the transceiver that goes to the PCIe finger, and do the bifurcation with a PCIe extender cable (and some soldering).

In the relevant project, the PCIe block was implemented as a QSys unit, accommodating Altera’s reset and calibration IP blocks.

Use the same controller clock

The first attempt was to just create and instantiate a Cyclone V Transceiver Native PHY IP Core, and place its pins instead of one of the PCIe lanes. But adding a plain MGT transceiver to the design caused the fitter to fail with the following error messages, whether it was supposed to be placed in the same transceiver group of six or not:

Error (14566): The Fitter cannot place 2 periphery component(s) due to conflicts with existing constraints (2 HSSI PMA Aux. block(s)). Fix the errors described in the submessages, and then rerun the Fitter. The Altera Knowledge Database may also contain articles with information on how to resolve this periphery placement failure. Review the errors and then visit the Knowledge Database at https://www.altera.com/support/support-resources/knowledge-base/search.html and search for this specific error message number.
    Error (175001): The Fitter cannot place 1 HSSI PMA Aux. block, which is within Cyclone V Transceiver Native PHY native_xcvr.
        Info (14596): Information about the failing component(s):
            Info (175028): The HSSI PMA Aux. block name(s): native_xcvr:native_xcvr|altera_xcvr_native_av:native_xcvr_inst|av_xcvr_native:gen_native_inst.av_xcvr_native_insts[0].gen_bonded_group_native.av_xcvr_native_inst|av_pma:inst_av_pma|av_tx_pma:av_tx_pma|av_tx_pma_ch:tx_pma_insts[0].av_tx_pma_ch_inst|tx_pma_ch.tx_pma_buf.tx_pma_aux
        Error (178014): Partition assignments may be preventing transceiver placement - transceivers optimizations across partitions are not supported in this version of the Quartus Prime software. For more information, refer to the Release Notes.
    Error (175001): The Fitter cannot place 1 HSSI PMA Aux. block, which is within Cyclone V Transceiver Native PHY native_xcvr.
        Info (14596): Information about the failing component(s):
            Info (175028): The HSSI PMA Aux. block name(s): native_xcvr:native_xcvr|altera_xcvr_native_av:native_xcvr_inst|av_xcvr_native:gen_native_inst.av_xcvr_native_insts[0].gen_bonded_group_native.av_xcvr_native_inst|av_pma:inst_av_pma|av_rx_pma:av_rx_pma|rx_pmas[0].rx_pma.rx_pma_aux
        Error (178014): Partition assignments may be preventing transceiver placement - transceivers optimizations across partitions are not supported in this version of the Quartus Prime software. For more information, refer to the Release Notes.
Error (12289): An error occurred while applying the periphery constraints. Review the offending constraints and rerun the Fitter.

The error messages are horribly misleading, and made me try pushing the transceiver into the QSys project that defines the PCIe interface. Fortunately or unfortunately, I fixed the problem accidentally (or more like out of laziness) and got the idea that it was indeed a matter of structure and partition. See “Leftovers” section below for what I tried and eventually failed.

So the real reason it failed: There are these reconfiguration and reset generating IPs, which are driven by some clock (mgmt_clk on the reconfiguration block), which isn’t necessarily any of the transceiver’s data rate reference clocks. If the PCIe’s reconfiguration IP is driven by a clock that is different from the other transceiver’s, this fitter error occurs.

The solution was to use PCIe’s reference clock for driving the reconfiguration logic of the new transceiver. It could have been the other way around as well, I suppose. Either way, it’s just a utility clock.

Narrow down PCIe to 1x

The next obstacle was that both the PCIe and the general-purpose transceiver relied on two different dedicated reference clock inputs for their PLLs, but with different bit rates. This calls for two different CMUs (PLLs). And according to the Handbook (CV-5V3, figures 2-2 and 2-7), there’s a thing with only the CMUs of CH1 and CH4 being connected directly to these reference clocks. So without diving to deep into the clocking structure, it became apparent that the only way to solve this PLL placement problem was to narrow down the PCIe interface further down to 1x, so it didn’t occupy the CH1 transceiver. I suppose that freed its CMU, and allowed it to generate the clock for some other transceiver.

Also in CV-5V3, figure 4-5 shows that on a PCIe 2x or x4 configuration, CMU PLL is placed in CH4, which is inactive as a transmitter. This goes along with a comment made a page earlier, saying that “The Quartus II software automatically places the CMU PLL in a channel different from that of the data channels”.

And if this doesn’t sound very worked out from my side, it’s because I abandoned this direction and bought an HSMC breakout board instead. So I lost motivation to deal with this issue. Too much hassle.

Leftovers (or: some failed attempts)

Nothing really meaningful here: The jots below were written in my attempts to figure out how and why partitions made any difference (in hindsight: they don’t).

I first tried to put the transceiver as an IP inside the Qsys that holds the PCIe IPs. Which requires some acrobatics, as described in this post. That happened to work, because I used the same clock for both reconfiguration block (and accidentally hit gold without realizing it). Sometimes laziness is a blessing.

Encouraged by the false “I nailed it” feeling, I tried to include the Verilog and SystemVerilog sources of the QSys design, so they’ll be one chunk so to speak, but then the error came back.

One thing that I noted in the map report is that it said:

Info (16010): Generating hard_block partition "hard_block:auto_generated_inst"

So I tried this between synthesis and fitting (merging partitions should do it…?)

$ quartus_cdb theproj --merge=on

Not only did this not make any difference, but the merge report that was generated showed that all HSSI related logic (PMA / PCS elements etc.) was all in the hard_block:auto_generated_inst partition, and none outside it.

Apparently, this hard_block partition is some kind of container for the transceiver-related logic block, and nothing else. Harmless, it seems.

Add to the list of failing attempts: Export the entire design as a post-synthesis QXP, and then run it through quartus_map with only this QXP as a source, followed by the fitter. Which failed the same way. Actually, the same message on generating the hard_block partition appeared in the map report adopting the QXP file.

Also, when going (from shell, then Tcl shell):

$ quartus_sta -s
tcl> project_open theproj
tcl> create_timing_netlist
tcl> report_partitions

The report implied that only the “Top” partition exists (on a design without the added transceiver, or there would be no timing netlist to work with).

Which left me wondering what’s the magic about putting the transceiver inside the QSys design? The answer is of course, none whatsoever. It had nothing to do with partitioning in the first place.

QDB vs. QXP, Quartus Pro vs. Standard: Post-synthesis packaging of an IP core

Introduction

It’s often desired to package an piece of FPGA logic in a post-synthesis (netlist) format for later use in another project. IP core vendors often deliver their products as netlists, partly to protect themselves from unauthorized copying and use, and partly to ensure that possible bugs in the end-user’s synthesizer don’t influence the product they support.

Intel FPGA (formerly Altera) have recently released a new edition of Quartus, codenamed Pro. Intel’s publications indicate that all of their series-10 FPGAs (Cyclone-10, Arria-10, Stratix-10) are covered by the Pro edition only, with the exception of Arria-10 (covered by both editions) and Cyclone-10 LP (covered by the Standard edition).

Among other differences, Quartus Pro doesn’t support the QXP format, which was the netlist-like format used with Quartus Standard. As the Pro edition arrives with a different synthesizer (“quartus_syn” instead of “quartus_map”) and a different internal database structure, the QXP doesn’t fit in.

Aside from this, I can’t comment much on the differences between Standard and Pro, and just by using both, no other difference stands out. As for the new synthesizer: On my own anecdotal experiment with an Arria-10 design that failed timing with the Standard edition, the Pro edition’s ended up with a slightly worse timing result. So on the face of it, there is nothing new and blazing about Pro.

All said in this post relates to Quartus Prime Version 17.1.0 Build 240 SJ Pro Edition running on a Linux machine. I’d expect things to change in future revision (or maybe this is just wishful thinking). The HDL language in this post is Verilog.

The official source of information, followed below, is the Intel Quartus Prime Pro Edition Handbook Volume 1, Design and Compilation, section 7.4, titled “Design Block Reuse”. Intel also offers a free online 45-minute training lecture (Design Block Reuse in the Intel Quartus Prime Pro Software, slides + sound), which pretty much covers the topic.

Update, 8.7.18: Be sure to read the bottom of this post regarding the VQM format.

How to convert a QXP to a QDB

It’s not possible, seemingly because of the difference in the internal database structure.

Packaging a core as a QDB

The desired QDB file is a post-synthesis export of a partition which contains the IP core. Even if the design consists the core to package only, neither an export of the entire synthesized design, nor an export of the root partition will do the trick (or so the Handbook says).

It’s therefore required to generate a wrapper module for the core in question, and instantiate the core in it. This allows defining a dedicated partition at the core’s boundaries. It can be done with the GUI, or a line like the following can be added to the QSF file:

set_instance_assignment -name PARTITION my_core -to my_core_ins -entity top

Where “top” is the name of the toplevel module, which wraps my_core. my_core_ins is the instance name used in the instantiation of my_core.

With this line in place, run Analysis & Synthesis in Quartus, or launch quartus_syn as a command-line utility to synthesize the design.

Once the synthesis is finished, export the core’s Synthesized snapshot with Project > Export Design Partition, and select my_core as the Partition name (it will most likely be the only choice).

Alternatively, use the following command:

quartus_cdb my_core -c my_core --export_partition my_core --snapshot synthesized --file /path/to/my_core.qdb

Note that some information of the file path at which the core was compiled is stored in the qdb file (see e.g. qdb/qar_info.json and sdc.cdb after including the QDB file in the target project) as well as the operating system used (the same file).

Including the QDB file in a design

To include the core in a design, a partition is generated in the target design, and then the QDB file is plugged into that partition by virtue of a QDB_FILE_PARTITION partition. For example:

set_instance_assignment -name PARTITION pr_foo -to foo_ins|my_core_ins
set_instance_assignment -name QDB_FILE_PARTITION /path/to/my_core.qdb -to foo_ins|my_core_ins

In the example above, my_core_ins is the instance name of the core within a module, which is in turn instantiated with the name foo_ins in the toplevel module.

The name pr_foo has no special significance, expect that it’s the name given to the partition.

Unlike the use of QXP files in Quartus’ Standard edition, a black box file must be included in the project: It’s just like the core’s top level module, but with anything between the port declaration and the “endmodule” statement wiped out.

Failing to include a black box file in the project causes a few warnings by fitter like

Warning(13032): The following tri-state nodes are fed by constants
        Warning(13033): The node "foo_ins|my_core_ins" is fed by GND

which escalate to errors, also by the fitter:

Error(13076): The pin "foo_ins|my_core_ins.user_r_read_32_data_w" has multiple drivers due to the non-tri-state driver "foo_ins|my_core_ins"

As a side note, I’d also mention another QSF assignment, QDB_FILE, which may appear to replace QXP_FILE, but attempting to use it with QDB file yields an error:

Error(19507): QDB_FILE assignment "my_core.qdb" is not supported.

Quartus’ help on this error just says that the QDB_FILE assignment isn’t supported, and suggests using QDB_FILE_PARTITION. So it’s not really clear why the QSB_FILE assignment exists.

Matching target device / Quartus revision

Unlike common practice with netlist files, the FPGA part number for which the QDB file was synthesized must match the FPGA part number of the project in which it’s instantiated, or the fitter rejects it with an error like

Error(18097): Partition "|" contains assignment "DEVICE" with setting "10AX115S3F45E2SG", which is different from setting "10AX115S2F45I2VG" in partition "foo_ins|my_core_ins". Modify your design so all partitions use the same setting for the specified assignment.

This finding is contrary to Intel’s training lecture, which says that another FPGA device can be used if the exported snapshot was of post-synthesis type. My experience was different. Also, the Handbook (revision of 2017.11.06) says in Section 7.4.1.4: “Because the exported .qdb includes compiled netlist information, the Consumer project must target the same FPGA device part number and use the same Intel Quartus Prime version as the Developer project.”

I’ve also tried to get an answer on this in Altera’s forum.

So while QXP files can be used freely within an entire device family with Quartus Standard edition, Quartus Pro’s QDB is by far more limited, at least on revision 17.1. And there is no way to generate an EDIF file or anything of that sort in any Quartus edition.

My bet would be that the original intention was to make QDB a true netlist replacement, but then too much of the internal database quirks leaked into the QDB format, making it dependent on both software version and FPGA device. Whether this will be solved, time will tell.

All in all, it’s not clear if a reasonable substitute for a netlist exists at all in Quartus Pro as of writing this. Which is make the choice calling this edition “Pro” quite ironic.

The VQM format

It turns out that Quartus Standard Edition can generate a netlist in VQM format, which is the format used when importing netlists from third-party synthesizers into Quartus (Quartus Pro included).

So after a successful Quartus synthesis, just go

$ quartus_cdb my_core --vqm=my_core.vqm

which generates a post-synthesis netlist in Verilog. Note that a FAMILY or DEVICE  assignment must be present in the QSF file, or quartus_cdb complains quartus_map should be run with the –family flag assigned (which is wrong, it doesn’t help).

It can be imported into any design with the following line in the QSF file (or use the GUI to import it):

set_global_assignment -name VQM_FILE /path/to/my_core.vqm

and it works exactly like a QXP. Or so it did in my own anecdotal experiment: The entire compilation went through cleanly, and the design worked exactly the same on hardware.

Quartus can be instructed to resynthesize the VQM’s WYSIWYG primitives with the following assignment in the QSF file:

set_global_assignment -name ADV_NETLIST_OPT_SYNTH_WYSIWYG_REMAP ON

In Quartus’ context, the term WYSIWYG seems to mean that the instantiated modules turn into logic block as defined in the Verilog file (so what is instantiated, “seen” is “what you get”).

The reason I call my own experiment “anecdotal” is the following couple of lines at the very top of the VQM file generated by quartus_cdb:

// !!!!!! This generated VQM is intended for Academic use or Internal Altera use only !!!!!!
// Functionality may not be correct on the programmed device or in simulation

So despite this scary warning, it did work for me. How seriously this warning should be taken is still to find out. Comments are welcome.

Now, here’s the catch: Quartus Pro won’t generate VQM (according to the Intel Quartus Prime Pro Edition Handbook Volume 1, section 1.1, “Saving a node-level netlist as .vqm” is one of the features explicitly not supported).

To make things worse, there’s an obvious problem exporting a VQM made by Quartus Standard to Quartus Pro, because the VQM is a Verilog file, but the wire names contain all kind of Verilog-wise illegal characters, such as “.”, “:”, “~” and “|”. So even though Quartus Standard accepts its own VQM file, Quartus Pro spits out error messages like

Error(18303): Name "mod:mod_i|twentynm_lcell_comb:wr_data~20_I" is illegal. Avoid using '*', ':' and '|' in your naming scheme because these characters have special meaning in Quartus.
Error(18303): Name "mod:mod_i|twentynm_lcell_comb:wr_data~0_I" is illegal. Avoid using '*', ':' and '|' in your naming scheme because these characters have special meaning in Quartus.
Error(18303): Name "mod:mod_i|twentynm_lcell_comb:wr_data~14_I" is illegal. Avoid using '*', ':' and '|' in your naming scheme because these characters have special meaning in Quartus.

Not to mention that the only device family Standard and Pro shares is Arria 10 (and maybe a bit of Cyclone 10?).