Altera NIOS II jots

About this post

These are things I wrote down at different stages of introducing myself to Nios II and its environment. Nothing really consistent nor necessarily the right way to do things.

Jots

  • Open Qsys. Follow this post.
  • Went for Nios II classic, used Nios/e (no hardware multiplication, as the target device doesn’t have it). Set instruction cache to 2 kB, and no data cache.
  • Add 16 kB on-chip memory (Basic > On-Chip Memory > On-Chip Memory (RAM or ROM) ). Data width 32 bits, set hex file to raminit.hex (to be found at verilog/raminit.hex)
  • Attach memory to processor’s Avalon master
  • Attach peripherals
  • Connect clk_0’s clock output to all clock inputs (including the processor’s).
  • Same with reset
  • Assign base addresses automatically: System > Assign Base Addresses
  • Enter the CPU configuration, and assign the Reset and Exception Vectors to the onchip memory (this issues an offset to the addresses, per the peripheral’s offsets).
  • Build the Qsys project. Among all (Verilog) files, it generates a processor.sopcinfo file.

Software

  • Launch Nios II Software Build Tools for Eclipse (from Qsys or Quartus)
  • Pick a path for the workspace
  • Pick File > New > Nios II Application and BSP from Template. Assign the SOPC information file as processor.sopcinfo as generated before, and pick the “Hello World” template. There’s also a much smaller “Minimal Hello World” which allows communication with the JTAG UART.
  • Build the project. Eh, it failed. Not enough memory (printf is heavy. There’s a thinner version, but doesn’t matter now)
  • Go back to Qsys, and make on-chip memory 40960 bytes large (40kB, fitter fails if it’s 48 kB). Re-run Assign Base Addresses.
  • Build the Qsys project again
  • Regenerate the BSP: In Eclipse, right-click the BSP project, pick Nios II > Generate BSP (NOT from the top menu’s Nios II, there is no such option there!). Or alternatively, within a NIOS2 shell (see below), and from the BSP project’s home directory, go
    nios2-bsp-generate-files --settings settings.bsp --bsp-dir .
  • Rebuild: Project > Clean… and clean all, with the rebuild option set.
  • To add a lot of files to a project: Right-click the project, pick Import…, General > File System. Click Browse… and navigate to the directory where the files are and pick the directory. Then choose the desired files. Pick Advanced below, and pick “Add links” (it works).
  • To add an existing file to the project: Right-click the project, New > File > Advanced, check “Link to file in the file system” and pick the file. Then right-click the file (or several files) and pick “Add to Nios II build”
  • To remove a file, first right-click it, and pick “Remove from Nios II build”. Then right-click and delete. Failing to remove the file first will make the build system continue to look for it.
  • Creating a new application, based upon an existing BSP, and including the relevant source file sets it all up.
  • To compile manually, right-click the project, go to Nios > Nios command shell… (that opens a shell window) and type “make”
  • It’s also possible to copy the relevant elements in the PATH variable, and compile with “make” outside this shell window. Or set up the environment, as shown here.
  • I had a stubborn linking error with alt_main.c having an undefined reference to ‘main’ because I didn’t read my own note above about how to add a file to a project. It turned out that the Makefile doesn’t include any of the C source files (C_SRCS assigned to nothing in the Makefile). I ended up adding these entries manually. That allowed at least a manual build with the command shell, as mentioned in the bullet above.
  • The Eclipse project seems to consist of the Makefile, the .cproject XML file containing mostly useless mumbo-jumbo, and the .project XML file, which contains information about source files and build targets. There’s also .settings/language.settings.xml, which also seems not to contain anything relevant.
  • When creating a custom component, and an interrupt is required, be sure to associate the interrupt sender interface with an “Associated Addressable Interface” (e.g. associatedAddressablePoint set to avalon_slave_0 in the component’s tcl file). Otherwise, the interrupt will not be assigned an entry or a controller, so *_IRQ and *_IRQ_INTERRUPT_CONTROLLER_ID end up assigned with -1 in system.h.
  • For a shell prompt (“NIOS2 shell”) with all paths set up properly, go e.g.
    /bulk/software/altera/lite-15.1/nios2eds/nios2_command_shell.sh
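Regarding the custom component interrupt note above: a compile-time guard can catch the -1 assignment early. This is only a sketch. MY_COMPONENT_0_IRQ and MY_COMPONENT_0_IRQ_INTERRUPT_CONTROLLER_ID are hypothetical names standing in for whatever system.h generates for the component, and the fallback defines are there only so the snippet compiles outside a BSP.

```c
/* On the target, these two macros would come from <system.h>.
 * The names are hypothetical; the fallback values below exist only
 * to make this sketch compile without a BSP. */
#ifndef MY_COMPONENT_0_IRQ
#define MY_COMPONENT_0_IRQ 2
#define MY_COMPONENT_0_IRQ_INTERRUPT_CONTROLLER_ID 0
#endif

/* Refuse to build if the interrupt was left unassigned (-1). */
#if (MY_COMPONENT_0_IRQ < 0) || (MY_COMPONENT_0_IRQ_INTERRUPT_CONTROLLER_ID < 0)
#error "Interrupt not assigned: check the component's associatedAddressablePoint"
#endif

/* The same check as a run-time helper. */
static int irq_is_assigned(int irq, int controller_id)
{
  return (irq >= 0) && (controller_id >= 0);
}
```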

Running against hardware

Note: Quartus’ programmer and the “Run” environment on Eclipse are mutually exclusive, competing for the USB Blaster.

  • Make sure you’ve quit Quartus’ programmer (actually not necessary. Just be sure that the blue LED on the USB Blaster is off).
  • Also make sure to “terminate launch” on the Eclipse side before attempting to reprogram the FPGA (pressing the red stop-like button on the Nios Console is enough).
  • Pick the “hello” project (that is, not the BSP) and go to top menu: Run > Run configurations…, pick Target Connection tab. Both a processor and a byte stream device should be listed (the latter is the jtaguart). Refresh to make sure it’s actually there.
  • If it says “Connected system ID hash not found on target at the expected base address” at the top, select “Ignore mismatched system ID” and “Ignore mismatched system timestamp”. This happens when there’s no system ID peripheral in the Qsys design.
  • The “Hello world from NIOS!!” should appear in the Nios II console
  • The base addresses etc. are listed in system.h inside the BSP (hello_bsp in my case).
  • This program printed out “Hello world” as well as blinked the LEDs:
    #include <stdio.h>
    #include <unistd.h>
    #include <io.h>
    #include <system.h>
    #include <altera_avalon_pio_regs.h>
    
    int main()
    {
      int i = 0;
    
      printf("Hello from Nios II!\n");
    
      while (1) {
        IOWR_ALTERA_AVALON_PIO_DATA(PIO_BASE, ((-i) & 7));
        i++;
        usleep(100000);
      }
      return 0;
    }
  • To generate a hex file, right-click the project (“hello”) and pick Make Targets > Build…, choose mem_init_generate and click the Build button. The juicy part in the process was
    elf2hex hello.elf 0x00010000 0x00019fff --width=32 --little-endian-mem --create-lanes=0 mem_init/raminit.hex
  • Alternatively, go (skip to the “make” statement if already in a NIOS shell)
    /path/to/altera/15.1/nios2eds/nios2_command_shell.sh make mem_init_generate
  • It’s noteworthy that the tools spotted my choice of the file name, even though it’s not located where Quartus expects it.
  • Giving the hex file to Quartus resulted in a lot of lines saying
    Warning (113015): Width of data items in "raminit.hex" is greater than the memory width. Wrapping data items to subsequent addresses. Found 1280 warnings, reporting 10
        Warning (113009): Data at line (2) of memory initialization file "raminit.hex" is too wide to fit in one memory word. Wrapping data to subsequent addresses.
        Warning (113009): Data at line (3) of memory initialization file "raminit.hex" is too wide to fit in one memory word. Wrapping data to subsequent addresses.
        Warning (113009): Data at line (4) of memory initialization file "raminit.hex" is too wide to fit in one memory word. Wrapping data to subsequent addresses.

    etc.
    But this is most probably OK, as the processor worked immediately after FPGA configuration.

  • Redirect printf() and other stdout to UART: By default, the standard output goes to the JTAG UART. To change this, right-click the BSP project, pick Nios II > BSP Editor. Pick the “Main” tab, navigate to “hal > common” (it usually starts there anyhow) and change the stdout target to the desired UART. And regenerate the BSP.
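As a sanity check on the elf2hex command shown above: reading its two numeric arguments as the first and last byte addresses of the on-chip memory (my assumption, based on the numbers), the range should cover exactly the 40960 bytes set in Qsys. A minimal sketch of that arithmetic, with helper names of my own:

```c
#include <stdint.h>

/* Number of bytes in an inclusive address range [first, last]. */
static uint32_t range_bytes(uint32_t first, uint32_t last)
{
  return last - first + 1;
}

/* Number of memory words of the given width (in bits) in the same range. */
static uint32_t range_words(uint32_t first, uint32_t last, uint32_t width_bits)
{
  return range_bytes(first, last) / (width_bits / 8);
}
```

With 0x00010000 and 0x00019fff this gives 40960 bytes, or 10240 words of 32 bits.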

Remote Update from EPCQ flash on Altera Cyclone IV

Introduction

This post relates to Altera (or should I say Intel FPGA?) Cyclone IV FPGAs loaded from an EPCQ flash in Active Serial x 1 (AS x 1) mode. Things written below are probably relevant to other Altera FPGAs as well, but keep in mind that Cyclone IV FPGAs have several peculiarities you won’t find on other Altera device families.

“Remote Update” is the feature in some Altera FPGAs, which allows application logic / software to safely update the bitstream from which the FPGA is loaded. The trick is to always have a Factory (“Golden”) bitstream image on the flash, and update the “Application” image only. When power goes up, the FPGA ends up with the Application bitstream if it’s OK, or the Factory bitstream if it’s absent or corrupt.

Since the bitstreams carry a CRC, it’s guaranteed that only valid bitstreams are used. It’s therefore safe to overwrite a previous Application bitstream image: If something goes wrong in the middle of writing, it won’t be deemed a valid bitstream, so the FPGA will end up with the Factory bitstream.

The basics

To implement a remote update feature on an FPGA design, there are two functional elements needed:

  • The ability to write data into the configuration flash with user-designed logic / software. This is discussed in this post.
  • The logic / software that makes sure the FPGA ends up with the right configuration (and, in particular, prevents an endless configuration loop as explained next)

Note that the Remote Update IP Core has nothing to do with flash programming: Its function is merely to allow the FPGA’s logic to issue a reconfiguration, and offer some information on how and why the current bitstream was loaded.

When an FPGA powers up, it always configures from a constant address of the flash, which is zero on EPCQ flashes. In other words, the FPGA always powers up from the Factory bitstream, no matter what. It’s the user application logic / software’s duty to force the configuration of the Application bitstream when appropriate. This means that during normal operation, there are always two configurations of the FPGA at powerup, one for the Factory bitstream, and one for the Application. This doubles the configuration time, of course.

How it happens: The FPGA is powered up, and loads the Factory bitstream from a fixed address. Through the Remote Update IP Core, the logic / software in the FPGA sets the address of the Application image at the flash, from which the FPGA should configure itself. It can then trigger a reconfiguration of the FPGA.

The FPGA’s configuration state machine attempts to load a bitstream from the flash at the given address. If the bitstream’s magic words are in place and the CRC is OK, it starts running on the new bitstream. If not, it loads the Factory bitstream again as a fallback.

By virtue of a register of the Remote Update IP Core, the software / logic in the Factory bitstream detects that it was loaded due to a failure, and takes action (or no action) accordingly. It may try another address at the flash, or refrain from another reconfiguration altogether (i.e. stay with the “Golden Image”). The decision depends on the design requirements, but the crucial point here is to prevent an endless loop of configurations.

Some reading

This post is not a user guide or a substitute for these two must-read documents:

Spoiler

This Nios II code implements the loading of the Application bitstream. It’s written so it can be used on any bitstream, as it does nothing when run from an Application bitstream. It’s also safe for use with JTAG configuration (it won’t issue a reconfiguration in that case).

void do_remote_update(void) {
  alt_u32 app_bitstream_addr = 0x100000;

  alt_u32 mode = IORD_32DIRECT(REMOTE_UPDATE_0_BASE, 0) & 3;
  alt_u32 config_reason = IORD_32DIRECT(REMOTE_UPDATE_0_BASE, 0x64);

  if ((mode == 0) && (config_reason == 0)) {
    IOWR_32DIRECT(REMOTE_UPDATE_0_BASE, 0x30, 0); // Turn off watchdog
    IOWR_32DIRECT(REMOTE_UPDATE_0_BASE, 0x40, app_bitstream_addr);

    IOWR_32DIRECT(REMOTE_UPDATE_0_BASE, 0x74, 1); // Trigger reconfiguration

    while (1); // Wait briefly until configuration takes place
  }
}

do_remote_update() should be called first thing in the Nios II code entry. If the function returns, the FPGA is either running on the Application bitstream, or the Factory (“Golden”) bitstream with a good reason not to reconfigure (i.e. a previous failure to load the Application bitstream or after a JTAG bitstream load).

Please refer to the “Programming the flash with NIOS software” section in this post on how to generate the image of the Application bitstream.

The code above works with the following setting:

  • The FPGA’s NCONFIG pin is tied high. This will not work if the NCONFIG pin is driven by some power supply watch logic or alike, because config_reason won’t be zero if NCONFIG triggered the configuration.
  • REMOTE_UPDATE_0_BASE is the base address in NIOS’ address space of a Remote Update IP core, which has the “writing configuration parameters” option enabled.
  • The application bitstream image is loaded at flash address 0x100000 (i.e. can be read with epcs_read_buffer() using this address)
  • The Golden image is at address zero, of course.

If loading the application image fails once, no other attempts are made. This is the straightforward thing to do if there’s no additional image to try from. There’s no sensible reason to try the same image again, unless the PCB designer has done a really bad job.

How this function works, briefly:

  • It verifies that the configuration mode is 0, that is Factory mode. If we’re in Application mode, the function returns.
  • It verifies that the trigger for configuration was a powerup by checking config_reason, or it returns. This prevents an endless loop of configurations in the case of a fallback into the Factory bitstream in the event of a failed attempt to load the Application bitstream.
    Note that if the configuration was triggered as a result of an assertion of the FPGA’s NCONFIG pin, or on a JTAG configuration, config_reason will read 0x10.
  • The watchdog is disabled, so the Application bitstream doesn’t have to deal with it
  • The Application bitstream’s address is set
  • A configuration is forced by writing to the dedicated register
  • An endless while (1) loop is invoked for preventing the execution to go on — not that it would go anywhere far.
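The decision part of the steps above can be pulled out into a function with no hardware access, which makes it easy to check on a PC. This is just a sketch: should_load_application() and RU_MODE_FACTORY are names I made up, and the two arguments are the values read from offsets 0 and 0x64 in do_remote_update().

```c
#include <stdint.h>

#define RU_MODE_FACTORY 0u /* Mode 0 is Factory mode, as described above */

/* Returns 1 if the running image should trigger loading the Application
 * image, 0 otherwise. state_mode_reg is the raw value read from offset 0,
 * config_reason is the raw value read from offset 0x64. */
static int should_load_application(uint32_t state_mode_reg,
                                   uint32_t config_reason)
{
  uint32_t mode = state_mode_reg & 3;

  /* Only in Factory mode, and only if powerup caused the configuration */
  return (mode == RU_MODE_FACTORY) && (config_reason == 0);
}
```

Any nonzero config_reason (watchdog, failed Application load, nCONFIG / JTAG) keeps the FPGA on the current image, which is what prevents the endless loop.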

General notes

  • It’s important to observe that the terminology of Factory / Application configuration modes, which is used in the docs, isn’t just for the sake of clarity: The Remote Update IP Core exposes different registers, based upon whether it considers itself to be in either of the modes: In particular, when in Application mode, there is very little the logic can do, except for jumping back to Factory mode or to reset the watchdog.
  • When generating the Remote Update block (most likely in Qsys), be sure to check “Add support for writing configuration parameters”. Or you’ll keep wondering why writing to the NIOS registers has no effect at all.
  • Also be sure to set configuration mode to remote for the FPGA project. There should be a line as follows in the QSF file:
    set_global_assignment -name STRATIXIII_UPDATE_MODE REMOTE
  • When setting the boot address register, use the actual boot address with the two LSBs forced to zero. When it’s read back after a configuration as the previous boot address, it’s shifted two bits to the left. The docs are a bit confusing about this too. Go figure.
  • The watchdog is enabled by default, so unless it’s tended to in the application bitstream, it must be explicitly turned off before firing off reconfiguration.
  • The watchdog timer runs on the internal configuration clock, which is 10 MHz unless the external CLKUSR is applied.
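To illustrate the boot address note above, here is a small sketch of the two conversions as I understand them from my own tests: the value written has its two LSBs cleared, and the value read back later as the previous boot address is that value shifted left by two bits. The function names are mine, not from any Altera header.

```c
#include <stdint.h>

/* Value to write to the boot address register: the flash address
 * with the two LSBs forced to zero. */
static uint32_t boot_addr_to_write(uint32_t flash_addr)
{
  return flash_addr & ~(uint32_t) 3;
}

/* Value expected when the same address is read back after a
 * configuration as the "previous boot address". */
static uint32_t boot_addr_readback(uint32_t written)
{
  return written << 2;
}
```

For example, writing 0x100000 is expected to read back as 0x400000.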

Accessing registers

Put short, the registers map is a mess. Out of the long list given in Tables 20 and 21 in the Remote Update IP Core User Guide, only a handful have a meaning.

It’s important to realize that some registers are valid when the Remote Update IP core is in Factory mode, and others when it’s in Application mode. These two register sets are mutually exclusive (except for the CURRENT_STATE_MODE register). The test program shown further down this post demonstrates which registers are valid in each mode.

This is a list of things to keep in mind regarding these registers:

  • Reading from a Factory mode register in Application mode (and vice versa) returns meaningless (and rather confusing) data.
  • The way to make sense of the registers from the docs is to refer to tables 16 and 17 in the Remote Update IP Core User Guide to tell what you want to access in terms of which param and which read_source, and then find the address for them in table 21. Several registers in table 21 constitute combinations of param and read_source that aren’t listed in table 17, which probably renders them meaningless.
  • … except for RU_RESET_TIMER and RU_RECONFIG, which are interpreted in logic to generate a reset signal / reconfiguration signal respectively, and are therefore not listed in table 17.
  • To add more confusion, readbacks don’t work as one might expect. For example, the boot address for the next configuration is set at address offset 0x40, but reading back from the same address always yields the factory boot address. To get the boot address for the next configuration (i.e. the one written to 0x40), read it back at 0x4c.
  • More confusion: The translation from the param numbers to the Nios access register isn’t some arithmetic operation, but rather some lookup logic of the avl_controller_cycloneiii_iv module in Qsys_remote_update_0_remote_update_controller.sv, which is generated automatically by Qsys.
  • The registers listed in the BSP’s drivers/inc/altera_remote_update_regs.h are those of all Altera FPGAs except Cyclone IV. For example, the docs as well as the Qsys Verilog file place RU_WATCHDOG_TIMEOUT at address 0x08 (actually, addresses 0x08-0x0b), but the BSP’s altera_remote_update_regs.h doesn’t match this.
  • Note that compared with other FPGA families, Cyclone IV’s register interface is considerably more extensive, allowing the controller to query the status of two configuration cycles back in history. Seems like this feature was dropped on later FPGAs (due to lack of interest vs complication…?)

The RU_RECONFIG_TRIGGER_CONDITIONS register

This register is interesting in particular, as it tells us what caused the FPGA to configure the bitstream that is currently running:

IORD_32DIRECT(REMOTE_UPDATE_0_BASE, 0x64); // Register 0x19 in the guide

And to obtain the reason for the configuration before that:

IORD_32DIRECT(REMOTE_UPDATE_0_BASE, 0x68); // Register 0x1a in the guide

These read the remote config core’s param 3'b111 with read source 2'b01 and 2'b10 respectively. Note that the translation from the param number of 3'b111 to the Nios access register isn’t just a multiplication, but rather some lookup logic as mentioned (but not detailed) above.

Running some tests of my own (with the test program below), I got the following values. There’s nothing surprising about these results; they are exactly as documented.

  • On cold configuration: 0
  • When not disabling the watchdog (not handling it after configuration): 2 (bit 1 set, User watchdog timer timeout)
  • After failed application configuration due to lack of image: 4 (bit 2 set, nSTATUS asserted by an external device as the result of an error)
  • After failed application configuration due to damaged image: 8 (bit 3 set, CRC error during application configuration)
  • On configuration from JTAG: 0x10 (bit 4 set, External configuration reset (nCONFIG) assertion)
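The values listed above can be turned into readable text with something like the following sketch. It covers only the bits I actually observed (bits 1 to 4). Bit 0 and anything higher are reported as unknown, since they didn’t show up in my tests. The function names are made up for this example.

```c
#include <stdint.h>
#include <stddef.h>

/* Description of a single trigger bit, or NULL if not covered here. */
static const char *trigger_bit_desc(unsigned bit)
{
  switch (bit) {
  case 1: return "User watchdog timer timeout";
  case 2: return "nSTATUS asserted by an external device";
  case 3: return "CRC error during application configuration";
  case 4: return "External configuration reset (nCONFIG) assertion";
  default: return NULL;
  }
}

/* Description of the lowest set bit among bits 1 to 4 of the value
 * read from offset 0x64 (or 0x68). Zero means cold configuration. */
static const char *describe_trigger(uint32_t reg)
{
  unsigned bit;

  if (reg == 0)
    return "Cold configuration (no trigger bits set)";

  for (bit = 1; bit <= 4; bit++)
    if (reg & ((uint32_t) 1 << bit))
      return trigger_bit_desc(bit);

  return "Unknown trigger";
}
```

On the target, this could be called as describe_trigger(IORD_32DIRECT(REMOTE_UPDATE_0_BASE, 0x64)).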

A test program

On my way to understanding how the whole thing works, I wrote a small test program that ran on the Nios II processor, which dumps all registers that are relevant for each mode. As a bonus, it can be used as a register reference, as it lists all registers available for reading in Factory vs. Application mode in the respective structures.

#include <system.h>
#include <alt_types.h>
#include <io.h>
#include "sys/alt_stdio.h"
#include <unistd.h>

int main()
{
  int mode;

  struct regitem {
    int read_source;
    int param;
    const char *desc;
  };

  const struct regitem factoryparams[] = {
    { 0, 0x00, "Current Machine State Mode" },
    { 0, 0x10, "Factory Boot Address" },
    { 1, 0x10, "Previous Boot Address" },
    { 1, 0x18, "Previous reconfiguration trigger source" },
    { 2, 0x10, "One before previous Boot Address" },
    { 2, 0x18, "One before previous reconfiguration trigger source" },
    { 3, 0x04, "Early confdone check bits" },
    { 3, 0x08, "Watchdog timeout value" },
    { 3, 0x0c, "Watchdog enable bit" },
    { 3, 0x10, "Boot address" },
    { 3, 0x14, "Force internal oscillator" },
    {}
  };

  const struct regitem applicationparams[] = {
    { 0, 0x00, "Current Machine State Mode" },
    { 1, 0x08, "Watchdog timeout value" },
    { 1, 0x0c, "Watchdog enable bit" },
    { 2, 0x10, "Boot address" },
    {}
  };

  const struct regitem unknownparams[] = { {} };

  const struct {
    const struct regitem *list;
    const char *desc;
  } modetab[4] = {
    { factoryparams, "Factory mode" },
    { applicationparams, "Application mode" },
    { applicationparams, "Application mode with watchdog enabled" },
    { unknownparams, "Unknown mode" },
  };

  const struct regitem *item;

  alt_putstr("\r\n----------------   BASE IMAGE   ---------------------\r\n\r\n");

  mode = IORD_32DIRECT(REMOTE_UPDATE_0_BASE, 0) & 3;

  alt_printf("Remote update register dump\r\nMode: %s\r\n",
	     modetab[mode].desc);

  alt_putstr("\r\nParameters:\r\n");

  for (item = modetab[mode].list; item->desc; item++) {
    int addr = (item->param + item->read_source) * 4;
    alt_printf("%s (0x%x) = 0x%x\r\n", item->desc, addr,
	       IORD_32DIRECT(REMOTE_UPDATE_0_BASE, addr));
  }

  if (mode == 0) { // Factory mode only
    usleep(500000);
    IOWR_32DIRECT(REMOTE_UPDATE_0_BASE, 0x30, 0); // Turn off watchdog
    IOWR_32DIRECT(REMOTE_UPDATE_0_BASE, 0x40, 0x100000);
    IOWR_32DIRECT(REMOTE_UPDATE_0_BASE, 0x74, 1);
  }

  /* Event loop never exits. */
  while (1);

  return 0;
}
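The address calculation in the loop above boils down to one line. Here it is as a standalone helper (ru_reg_offset() is a name I made up), which is handy for checking the offsets used elsewhere in this post against the read_source / param pairs in the tables of the test program.

```c
#include <stdint.h>

/* Byte offset of a Remote Update register in the Nios II address space,
 * given the read_source and param values as listed in the tables of the
 * test program above. */
static uint32_t ru_reg_offset(uint32_t read_source, uint32_t param)
{
  return (param + read_source) * 4;
}
```

For example, read_source 1 with param 0x18 gives 0x64 (the previous reconfiguration trigger source), and read_source 3 with param 0x10 gives 0x4c (the boot address readback).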

Test results

The test program above was compiled and included in the bitstream that was loaded into flash address 0 (Factory image).

The first alt_putstr was then changed to say “Application Image”, and the compiled version of that was included in the bitstream loaded at address 0x100000 of the flash (Application Image).

Standard output was directed to a physical UART (instead of the JTAG UART) for the purpose of this test (Eclipse’s JTAG UART console didn’t like these games with configurations).

And then I powered on:

----------------   BASE IMAGE   ---------------------

Remote update register dump
Mode: Factory mode

Parameters:
Current Machine State Mode (0x0) = 0x0
Factory Boot Address (0x40) = 0x0
Previous Boot Address (0x44) = 0xc
Previous reconfiguration trigger source (0x64) = 0x0
One before previous Boot Address (0x48) = 0xc
One before previous reconfiguration trigger source (0x68) = 0x0
Early confdone check bits (0x1c) = 0x1
Watchdog timeout value (0x2c) = 0x0
Watchdog enable bit (0x3c) = 0x1
Boot address (0x4c) = 0x0
Force internal oscillator (0x5c) = 0x1

----------------   APPLICATION IMAGE   ---------------------

Remote update register dump
Mode: Application mode

Parameters:
Current Machine State Mode (0x0) = 0x1
Watchdog timeout value (0x24) = 0x1ffe0008
Watchdog enable bit (0x34) = 0x0
Boot address (0x48) = 0x400000

Note that if the register writes in the example are done before showing the registers, the following two lines would replace their respective outputs in the Base Image parameter list:

Watchdog enable bit (0x3c) = 0x0
Boot address (0x4c) = 0x100000

The same, with the application image wiped out (zeros):

----------------   BASE IMAGE   ---------------------

Remote update register dump
Mode: Factory mode

Parameters:
Current Machine State Mode (0x0) = 0x0
Factory Boot Address (0x40) = 0x0
Previous Boot Address (0x44) = 0xc
Previous reconfiguration trigger source (0x64) = 0x0
One before previous Boot Address (0x48) = 0xc
One before previous reconfiguration trigger source (0x68) = 0x0
Early confdone check bits (0x1c) = 0x1
Watchdog timeout value (0x2c) = 0x0
Watchdog enable bit (0x3c) = 0x1
Boot address (0x4c) = 0x0
Force internal oscillator (0x5c) = 0x1

----------------   BASE IMAGE   ---------------------

Remote update register dump
Mode: Factory mode

Parameters:
Current Machine State Mode (0x0) = 0x0
Factory Boot Address (0x40) = 0x0
Previous Boot Address (0x44) = 0x400000
Previous reconfiguration trigger source (0x64) = 0x4
One before previous Boot Address (0x48) = 0xc
One before previous reconfiguration trigger source (0x68) = 0x0
Early confdone check bits (0x1c) = 0x1
Watchdog timeout value (0x2c) = 0x0
Watchdog enable bit (0x3c) = 0x1
Boot address (0x4c) = 0x0
Force internal oscillator (0x5c) = 0x1

----------------   BASE IMAGE   ---------------------

Remote update register dump
Mode: Factory mode

Parameters:
Current Machine State Mode (0x0) = 0x0
Factory Boot Address (0x40) = 0x0
Previous Boot Address (0x44) = 0x400000
Previous reconfiguration trigger source (0x64) = 0x4
One before previous Boot Address (0x48) = 0x400000
One before previous reconfiguration trigger source (0x68) = 0x4
Early confdone check bits (0x1c) = 0x1
Watchdog timeout value (0x2c) = 0x0
Watchdog enable bit (0x3c) = 0x1
Boot address (0x4c) = 0x0
Force internal oscillator (0x5c) = 0x1

[ ... etc ... ]

The same, with the Application image loaded in place, but with a small error (changed a single bit):

(this caused a CRC error)

----------------   BASE IMAGE   ---------------------

Remote update register dump
Mode: Factory mode

Parameters:
Current Machine State Mode (0x0) = 0x0
Factory Boot Address (0x40) = 0x0
Previous Boot Address (0x44) = 0xc
Previous reconfiguration trigger source (0x64) = 0x0
One before previous Boot Address (0x48) = 0xc
One before previous reconfiguration trigger source (0x68) = 0x0
Early confdone check bits (0x1c) = 0x1
Watchdog timeout value (0x2c) = 0x0
Watchdog enable bit (0x3c) = 0x1
Boot address (0x4c) = 0x0
Force internal oscillator (0x5c) = 0x1

----------------   BASE IMAGE   ---------------------

Remote update register dump
Mode: Factory mode

Parameters:
Current Machine State Mode (0x0) = 0x0
Factory Boot Address (0x40) = 0x0
Previous Boot Address (0x44) = 0x400000
Previous reconfiguration trigger source (0x64) = 0x8
One before previous Boot Address (0x48) = 0xc
One before previous reconfiguration trigger source (0x68) = 0x0
Early confdone check bits (0x1c) = 0x1
Watchdog timeout value (0x2c) = 0x0
Watchdog enable bit (0x3c) = 0x1
Boot address (0x4c) = 0x0
Force internal oscillator (0x5c) = 0x1

----------------   BASE IMAGE   ---------------------

Remote update register dump
Mode: Factory mode

Parameters:
Current Machine State Mode (0x0) = 0x0
Factory Boot Address (0x40) = 0x0
Previous Boot Address (0x44) = 0x400000
Previous reconfiguration trigger source (0x64) = 0x8
One before previous Boot Address (0x48) = 0x400000
One before previous reconfiguration trigger source (0x68) = 0x8
Early confdone check bits (0x1c) = 0x1
Watchdog timeout value (0x2c) = 0x0
Watchdog enable bit (0x3c) = 0x1
Boot address (0x4c) = 0x0
Force internal oscillator (0x5c) = 0x1

[ ... etc ... ]

Loading with JTAG: I set up both flash images properly, powered up so the FPGA stayed on the Application Image. At that point, I loaded the SOF of the Factory bitstream into the FPGA through JTAG (with a USB Blaster). The JTAG operation yielded this:

----------------   BASE IMAGE   ---------------------

Remote update register dump
Mode: Factory mode

Parameters:
Current Machine State Mode (0x0) = 0x0
Factory Boot Address (0x40) = 0x0
Previous Boot Address (0x44) = 0x400000
Previous reconfiguration trigger source (0x64) = 0x10
One before previous Boot Address (0x48) = 0xc
One before previous reconfiguration trigger source (0x68) = 0x0
Early confdone check bits (0x1c) = 0x1
Watchdog timeout value (0x2c) = 0x0
Watchdog enable bit (0x3c) = 0x1
Boot address (0x4c) = 0x0
Force internal oscillator (0x5c) = 0x1

----------------   APPLICATION IMAGE   ---------------------

Remote update register dump
Mode: Application mode

Parameters:
Current Machine State Mode (0x0) = 0x1
Watchdog timeout value (0x24) = 0x1ffe0008
Watchdog enable bit (0x34) = 0x0
Boot address (0x48) = 0x400000

When loading the same bitstream through JTAG once again, the same result is obtained, only with “One before previous reconfiguration trigger source” set to 0x10 as well.

Quartus/Linux: Setting PATH and environment for command-line

The classic way:

$ export QUARTUS_ROOTDIR=/path/to/altera/15.1/quartus
$ . $QUARTUS_ROOTDIR/adm/qenv.sh

Or open a shell (will set path, but not a full environment):

$ /path/to/altera/15.1/nios2eds/nios2_command_shell.sh

This is good for compiling for NIOS etc.

VMplayer: Silencing excessive hard disk activity

For some unknown reason, possibly after a VMplayer upgrade, running any Windows virtual machine on my Linux machine with VMware Player caused some non-stop heavy hard disk activity, even when the guest machine was effectively idle and had no I/O activity of its own.

Except for being surprisingly annoying, it also made the mouse pointer non-responsive and the effect was adverse on the hosting machine as well.

So eventually I managed to get things normal by editing the virtual machine’s  .vmx file as described below.

I have VMplayer 6.0.2 on Fedora 12 (I suppose both are considered quite old).

Following this post, add

isolation.tools.unity.disable = "TRUE"
unity.allowCompositingInGuest = "FALSE"
unity.enableLaunchMenu = "FALSE"
unity.showBadges = "FALSE"
unity.showBorders = "FALSE"
unity.wasCapable = "FALSE"

(unity.wasCapable was already in the file, so remove it first)

That appeared to help somewhat. But what really gave the punch was also adding

MemTrimRate = "0"
sched.mem.pshare.enable = "FALSE"
MemAllowAutoScaleDown = "FALSE"

Don’t ask me what it means. Your guess is as good as mine.

Linux: Where the USB related kernel files are

A few notes on where to find USB related kernel files on a Linux system (kernel 3.12.20 in my case)

$ lsusb
[ ... ]
Bus 001 Device 059: ID 046d:c52b Logitech, Inc.

Now find the position in the tree. It should be device 59 under bus number 1:

$ lsusb -t
[ ... ]
/:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=ehci-pci/6p, 480M
    |__ Port 4: Dev 4, If 0, Class=hub, Driver=hub/4p, 480M
        |__ Port 1: Dev 59, If 0, Class=HID, Driver=usbhid, 12M
        |__ Port 1: Dev 59, If 1, Class=HID, Driver=usbhid, 12M
        |__ Port 1: Dev 59, If 2, Class=HID, Driver=usbhid, 12M
        |__ Port 3: Dev 98, If 0, Class=vend., Driver=pl2303, 12M
    |__ Port 6: Dev 94, If 0, Class=vend., Driver=rt2800usb, 480M

So it’s bus 1, hub on port 4 and then port 1. Verify by checking the IDs:

$ cat /sys/bus/usb/devices/usb1/1-4/1-4.1/idVendor
046d
$ cat /sys/bus/usb/devices/usb1/1-4/1-4.1/idProduct
c52b

or look at the individual interfaces:

$ cat /sys/bus/usb/devices/usb1/1-4/1-4.1/1-4.1\:1.2/bInterfaceClass
03

The device file accessed for raw userspace I/O with a USB device (with e.g. libusb) is in /dev/bus/usb/ followed by the bus number and address. For example, the Logitech device mentioned above is at bus 1, address 59, hence

$ ls -l /dev/bus/usb/001/059
crw-rw-r-- 1 root root 189, 58 2017-05-17 09:57 /dev/bus/usb/001/059

Note the permissions and major/minors. The major is 189 (usb_devices on my system, according to /proc/devices). The minor is ((bus_number - 1) * 128) + address - 1.
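The minor number formula and the device file’s path can be put into code as follows. This is a minimal sketch with function names of my own, based only on what is observed above.

```c
#include <stdio.h>

/* Minor number of the device file, from the bus number and address. */
static unsigned usb_devnode_minor(unsigned bus, unsigned address)
{
  return (bus - 1) * 128 + (address - 1);
}

/* Writes the device file's path into buf. Returns snprintf()'s result. */
static int usb_devnode_path(char *buf, size_t len,
                            unsigned bus, unsigned address)
{
  return snprintf(buf, len, "/dev/bus/usb/%03u/%03u", bus, address);
}
```

For the Logitech device at bus 1, address 59, this gives minor 58 and the path /dev/bus/usb/001/059, matching the ls output above.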

The permissions and ownership are those in effect for who’s allowed to access this device. This is the place to check if udev rules that allow wider access to a device have done their job.

Altera EPCQ flash access with a Nios II processor + programming bitfiles

Introduction

This post outlines some technical details on accessing an Altera EPCQ flash from a Nios II processor for read, write and erase. A non-OS (“bare metal”) setting is assumed.

And as a bonus (at the bottom of this post), how to program the flash based upon a SOF file, both with JTAG and by writing directly.

Remote Update is discussed in this post.

Hardware setup

In the Qsys project, there should be an instance of the Legacy EPCS/EPCQx1 Flash Controller, configured with the default parameters (not that there is much to configure). The peripheral’s epcs_control_port should be connected to the Nios II’s data master Avalon port (no point connecting it to the instruction master too).

In this example, we’ll assume that the name of Flash Controller in Qsys is epcs_flash_controller_0.

The interrupt signal isn’t used in the software setting given below, but since its connection to the Nios processor, as well as the interrupt number assignment, is automatic, let it be.

Clock and reset — like the other peripherals.

The external conduit is connected as follows to an EPCQ flash, for x1 access:

  • Flash pin DATA0 to epcs_flash_controller_0_sdo (FPGA pin ASDO)
  • Flash pin DCLK to epcs_flash_controller_0_dclk (FPGA pin DCLK)
  • Flash pin nCS to epcs_flash_controller_0_sce (FPGA pin NCSO)
  • Flash pin DATA1 to epcs_flash_controller_0_data (FPGA pin DATA0)

The FPGA pins above are dual-purpose configuration pins, which allow the FPGA to configure in Active Serial (AS) x1 mode. Once the configuration is done, these pins become general-purpose I/O (when so required by assignments), which allows regular access to the flash device.

Note that the flash pin DATA1 is connected to the FPGA pin DATA0 — this is not a mistake, but the correct wiring for AS x 1 interface.

It’s of course possible to connect the flash to regular I/O pins, but then the FPGA won’t be able to configure from the flash.

Software

Altera’s BSP includes drivers for flash operations with multiple layers of abstraction. This abstraction is not always necessary, and makes it somewhat difficult to figure out what’s going on (in particular when things go wrong). Notably, the higher-level drivers erase flash sectors automatically before writing, which can lead to counterintuitive behavior, for example if multiple write requests are made on the same sector.

I therefore prefer working with the lowest-level drivers, which merely translate the flash commands into SPI communication. It leaves the user with the responsibility to erase sectors before writing to them.

The rule is simple: The flash is divided into sectors of 64 kB each. An erase operation is performed on a whole 64 kB sector, leaving all its bits as 1 (all bytes are 0xff).

Writing can then be done to arbitrary addresses, but the data that ends up in the flash is the written data ANDed with the previous content of the memory cells. This amounts to a plain write if the region has been previously erased. It’s commonly believed that it’s unhealthy for the flash to write to a byte cell twice without an erase in between.

This is a simple program that runs on the Nios II processor, which demonstrates read, write and erase.

#include <system.h>
#include <alt_types.h>
#include <io.h>
#include "sys/alt_stdio.h"
#include "epcs_commands.h"

static void hexprint(alt_u8 *buf, int num) {
  int i;

  const char hexes[] = "0123456789abcdef";

  for (i = 0; i < num; i++) {
    alt_putchar(hexes[(buf[i] >> 4) & 0xf]);
    alt_putchar(hexes[buf[i] & 0xf]);
    if ((i & 0xf) == 0xf)
      alt_putchar(10); // "\n"
    else
      alt_putchar(32); // " "
  }
  alt_putchar(10); // "\n"
}

int main()
{
  alt_u32 register_base = EPCS_FLASH_CONTROLLER_0_BASE + EPCS_FLASH_CONTROLLER_0_REGISTER_OFFSET;
  alt_u32 silicon_id;

  alt_u8 buf[256];
  alt_u32 junk = 0x12345678;
  const alt_u32 flash_address = 0x100000;

  silicon_id = epcs_read_device_id(register_base);

  alt_printf("ID = %x\n", silicon_id);

  // epcs_read_buffer always returns the length of the buffer, so no
  // point checking its return value.

  alt_printf("Before doing anything:\n");

  epcs_read_buffer(register_base, flash_address, buf, sizeof(buf), 0);
  hexprint(buf, 16);

  // epcs_sector_erase erases the 64 kiB sector that contains the address
  // given as its second argument, and waits for the erasure to complete
  // by polling the status register and waiting for the WIP (write in progress)
  // bit to clear.

  epcs_sector_erase(register_base, flash_address, 0);

  alt_printf("After erasing\n");

  epcs_read_buffer(register_base, flash_address, buf, sizeof(buf), 0);
  hexprint(buf, 16);

  // epcs_write_buffer must be used on a region previously erased. The
  // command waits for the operation to complete by polling the status
  // register and waiting for the WIP (write in progress) bit to clear.
  epcs_write_buffer(register_base, flash_address, (void *) &junk, sizeof(junk), 0);

  alt_printf("After writing\n");

  epcs_read_buffer(register_base, flash_address, buf, sizeof(buf), 0);
  hexprint(buf, 16);

  /* Event loop never exits. */

  while (1);

  return 0;
}

The program reads 256 bytes each time, even though only 16 bytes are displayed. Any byte count is allowed in read and write. Needless to say, flash_address can be changed to any address in the device’s range. The choice of 0x100000 kept it off the configuration bitstream for the relevant FPGA.

This is the output of the program above running against an EPCQ16:

ID = 20ba15
Before doing anything:
78 56 34 12 ff ff ff ff ff ff ff ff ff ff ff ff

After erasing
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

After writing
78 56 34 12 ff ff ff ff ff ff ff ff ff ff ff ff

The data in the “Before doing anything” part can be anything that was left in the flash when the program ran. In the case above, it’s the results of the previous run of the same program.

As a side note, all EPCQ flashes also support erasing subsectors, each of 4 kiB size (hence 16 subsectors per sector). Altera’s low-level drivers don’t support subsector erase, but it’s quite easy to expand the code to do so.

Programming the flash with a SOF file

As promised, here’s the outline of how to program the EPCQ flash with a bitstream configuration file. Not as fancy as the topic above, but nevertheless useful. The flash needs to be connected as follows:

  • Flash pin DATA0 to FPGA pin ASDO
  • Flash pin DCLK to FPGA pin DCLK
  • Flash pin nCS to FPGA pin NCSO
  • Flash pin DATA1 to FPGA pin DATA0 (once again, this is not a mistake. DATA1 to DATA0 indeed)

First things first: Generate a JIC file. Command-line style, e.g.:

quartus_cpf -c -d EPCQ16 -s EP4CE15 projname.sof projname.jic

In the example above, the EPCQ16 argument is the flash device, and the EP4CE15 is the FPGA that will be used to program the flash, which is most likely the same FPGA the SOF targets.

Or do it with the GUI:

  • In Quartus, pick File > Convert Programming File…
  • Choose jic output file format, and set the output file name.
  • Set the configuration device to e.g. EPCQ16, Active Serial (not x4).
  • Pick the SOF Data row, Page_0, click Add File… and pick the SOF file.
  • Pick the Flash Loader and click Add Device…, and choose e.g. Cyclone IV E, and then the same device as listed for the SOF file.
  • If you want to write to the flash with your own utility, check “Create Config data RPD”
  • Click Generate. A window saying the JIC file has been generated successfully should appear.
  • Click Close to close this tool.

Programming the flash with JTAG:

  • Open the regular JTAG programmer in Quartus (not the one in Eclipse). The one used to configure the FPGA via JTAG with a bitstream, that is.
  • Click Add File… and select the JIC file created above.
  • The FPGA with its flash attached should appear in the diagram part of the window.
  • Select the Program/Configure checkbox on the flash’s (e.g. EPCQ16) row
  • Click Start.
  • This should take some 10 seconds or so (for EP4CE15’s bitfile), and end successfully.
  • The flash is now programmed.

Note that there’s an “Erase” checkbox on the flash’s row. There is no need to enable it along with Program/Configure: The Programmer gets the hint, and erases the flash before programming it.

Programming the flash with NIOS software (or similar)

Note that I have another post focusing on remote update.

To program the flash with your own utility, make sure that you’ve checked “Create Config data RPD” when generating the JIC. Then, using the flash API mentioned above, copy the RPD file into the flash from address 0 to make it load when the FPGA powers up, or to a higher address for using the bitstream with a Remote Update core (allowing configuration from higher addresses).

And note the following, which relates to my experience with using the EPCQ16 flash for AS configuration of a Cyclone IV E FPGA, and running Quartus Prime Version 15.1.0 Build 185 (YMMV):

  • Bit reversal is mandatory if epcs_write_buffer() is used for writing to the flash (or any other Nios API, I suppose). That means that for each byte in the RPD file, move bit 7 to bit 0, bit 6 to bit 1 etc. There are small hints of bit reversal spread out in the docs, for example, in the “Read Bytes Operation” section of the Quad-Serial Configuration (EPCQ) Devices Datasheet.
  • All my attempts to generate RBF or RPD files in other ways, including using the command line tool (quartus_cpf) to create an RBF from the SOF or an RPD from a POF, failed. That is, I got RBF and RPD files, but they were slightly different from the file that eventually worked. In particular, the RBF file obtained with
    quartus_cpf -c project.sof project.rbf

    was almost identical to the RPD file that worked properly, with a few bytes different in the 0x20-0x4f positions of the files. And that difference probably made the FPGA refuse to configure from it. Go figure.

  • If you’re really into generating the flash image with command line tools, generate a COF file (containing the configuration parameters) with the GUI, and use it with something like
    quartus_cpf -c project.cof

    The trick about this COF is that it should generate a proper JIC file, but have the <auto_create_rpd> part set to “1”.

And finally, just a few sources I found (slightly unrelated):

  • Srunner is a command line utility for programming an EPCS flash. Since its source code is given, it can give some insights, and so can its documentation.
  • The format of POF files is outlined in fmt_pof.pdf.

gcc: Solving “undefined reference” even when the required library is listed with -l

It worked all so nicely on my Fedora 12 machine, and then on Ubuntu 14.04.1 it failed colossally:

$ make
gcc -Wall  -O3 -g -lusb-1.0 -c  -o bulkread.o bulkread.c
gcc -Wall  -O3 -g -lusb-1.0 -c  -o usberrors.o usberrors.c
gcc -Wall  -O3 -g -lusb-1.0 bulkread.o usberrors.o -o bulkread
bulkread.o: In function `main':
bulkread.c:39: undefined reference to `libusb_init'
bulkread.c:46: undefined reference to `libusb_set_debug'
bulkread.c:48: undefined reference to `libusb_open_device_with_vid_pid'
[ ... ]

And it went on and on. Note that there was no complaint about not finding the library, and yet it failed to find the symbols.

The problem was the position of the -l flag. It turns out that Ubuntu silently adds an --as-needed flag to the linker, which effectively means that the -l flag must appear after the object file that needs the symbols, or it will be effectively ignored.

So the correct way is:

$ make
gcc -Wall  -O3 -g -c  -o bulkread.o bulkread.c
gcc -Wall  -O3 -g -c  -o usberrors.o usberrors.c
gcc -Wall  -O3 -g bulkread.o usberrors.o -o bulkread -lusb-1.0

It’s all about the flag’s position…

XEmacs / VHDL: Stop that annoying “assistance” while typing

Emacs’ (and hence XEmacs’) VHDL mode has an annoying habit of hopping in to “help me” with composing code. Type “if” and it tells me I need to add an expression. Thanks. I wouldn’t have figured it out myself.

So here’s how to disable this annoyance:

Add to the custom-set-variables clause in ~/.xemacs/custom.el:

'(vhdl-electric-mode nil)
'(vhdl-stutter-mode nil)

or turn off the respective options inside XEmacs, under VHDL > Options > Mode, and then VHDL > Options > Save Options

And enjoy the bliss of an editor doing what it’s supposed to do.

Quartus’ timing analysis on set_input_delay and set_output_delay constraints

OK, what’s this?

This page is the example part of another post, which explains the meaning of set_input_delay and set_output_delay in SDC timing constraints.

TimeQuest (Quartus’ timing analyzer) performs a four-corner check (max/min temperature, max/min voltage) and picks the worst slack. In the examples below, the worst case of these four corners is shown. It’s not exactly clear why a particular delay model turns out to be the worst case in each analysis.

As mentioned in the other post, the relevant timing constraints were:

create_clock -name theclk -period 20 [get_ports test_clk]
set_output_delay -clock theclk -max 8 [get_ports test_out]
set_output_delay -clock theclk -min -3 [get_ports test_out]
set_input_delay -clock theclk -max 4 [get_ports test_in]
set_input_delay -clock theclk -min 2 [get_ports test_in]

set_input_delay -max timing analysis (setup)

Delay Model:
    Slow 1100mV 0C Model

+------------------------------------------------------------------------------------------------------+
; Summary of Paths                                                                                     ;
+--------+-----------+-----------+--------------+-------------+--------------+------------+------------+
; Slack  ; From Node ; To Node   ; Launch Clock ; Latch Clock ; Relationship ; Clock Skew ; Data Delay ;
+--------+-----------+-----------+--------------+-------------+--------------+------------+------------+
; 12.341 ; test_in   ; test_samp ; theclk       ; theclk      ; 20.000       ; 3.940      ; 7.499      ;
+--------+-----------+-----------+--------------+-------------+--------------+------------+------------+

Path #1: Setup slack is 12.341
===============================================================================
+--------------------------------+
; Path Summary                   ;
+--------------------+-----------+
; Property           ; Value     ;
+--------------------+-----------+
; From Node          ; test_in   ;
; To Node            ; test_samp ;
; Launch Clock       ; theclk    ;
; Latch Clock        ; theclk    ;
; Data Arrival Time  ; 11.499    ;
; Data Required Time ; 23.840    ;
; Slack              ; 12.341    ;
+--------------------+-----------+

+---------------------------------------------------------------------------------------+
; Statistics                                                                            ;
+---------------------------+--------+-------+-------------+------------+-------+-------+
; Property                  ; Value  ; Count ; Total Delay ; % of Total ; Min   ; Max   ;
+---------------------------+--------+-------+-------------+------------+-------+-------+
; Setup Relationship        ; 20.000 ;       ;             ;            ;       ;       ;
; Clock Skew                ; 3.940  ;       ;             ;            ;       ;       ;
; Data Delay                ; 7.499  ;       ;             ;            ;       ;       ;
; Number of Logic Levels    ;        ; 1     ;             ;            ;       ;       ;
; Physical Delays           ;        ;       ;             ;            ;       ;       ;
;  Arrival Path             ;        ;       ;             ;            ;       ;       ;
;   Clock                   ;        ;       ;             ;            ;       ;       ;
;    Clock Network (Lumped) ;        ; 1     ; 0.000       ;            ; 0.000 ; 0.000 ;
;   Data                    ;        ;       ;             ;            ;       ;       ;
;    IC                     ;        ; 2     ; 2.447       ; 33         ; 0.000 ; 2.447 ;
;    Cell                   ;        ; 2     ; 5.052       ; 67         ; 0.652 ; 4.400 ;
;  Required Path            ;        ;       ;             ;            ;       ;       ;
;   Clock                   ;        ;       ;             ;            ;       ;       ;
;    Clock Network (Lumped) ;        ; 1     ; 3.940       ; 100        ; 3.940 ; 3.940 ;
+---------------------------+--------+-------+-------------+------------+-------+-------+
Note: Negative delays are omitted from totals when calculating percentages

+-----------------------------------------------------------------------------------+
; Data Arrival Path                                                                 ;
+----------+---------+----+------+--------+-------------------+---------------------+
; Total    ; Incr    ; RF ; Type ; Fanout ; Location          ; Element             ;
+----------+---------+----+------+--------+-------------------+---------------------+
; 0.000    ; 0.000   ;    ;      ;        ;                   ; launch edge time    ;
; 0.000    ; 0.000   ;    ;      ;        ;                   ; clock path          ;
;   0.000  ;   0.000 ; R  ;      ;        ;                   ; clock network delay ;
; 4.000    ; 4.000   ; F  ; iExt ; 1      ; PIN_AP17          ; test_in             ;
; 11.499   ; 7.499   ;    ;      ;        ;                   ; data path           ;
;   4.000  ;   0.000 ; FF ; IC   ; 1      ; IOIBUF_X48_Y0_N58 ; test_in~input|i     ;
;   8.400  ;   4.400 ; FF ; CELL ; 1      ; IOIBUF_X48_Y0_N58 ; test_in~input|o     ;
;   10.847 ;   2.447 ; FF ; IC   ; 1      ; FF_X48_Y2_N40     ; test_samp|asdata    ;
;   11.499 ;   0.652 ; FF ; CELL ; 1      ; FF_X48_Y2_N40     ; test_samp           ;
+----------+---------+----+------+--------+-------------------+---------------------+

+-------------------------------------------------------------------------------+
; Data Required Path                                                            ;
+----------+---------+----+------+--------+---------------+---------------------+
; Total    ; Incr    ; RF ; Type ; Fanout ; Location      ; Element             ;
+----------+---------+----+------+--------+---------------+---------------------+
; 20.000   ; 20.000  ;    ;      ;        ;               ; latch edge time     ;
; 23.940   ; 3.940   ;    ;      ;        ;               ; clock path          ;
;   23.940 ;   3.940 ; R  ;      ;        ;               ; clock network delay ;
; 23.840   ; -0.100  ;    ;      ;        ;               ; clock uncertainty   ;
; 23.840   ; 0.000   ;    ; uTsu ; 1      ; FF_X48_Y2_N40 ; test_samp           ;
+----------+---------+----+------+--------+---------------+---------------------+

This analysis starts in “Data Arrival Path” with setting the input port (test_in) at 4 ns as specified in the max input delay constraint, and continues that data path. Together with the FPGA’s own data path delay (7.499 ns), the total data path delay stands at 11.499 ns.

The clock path is then calculated in “Data Required Path”, starting from the following clock edge at 20 ns. The clock travels from the input pin to the flip-flop (with no clock network delay compensation, since no PLL is involved), and the clock uncertainty (the calculated jitter, 0.1 ns) is subtracted. All in all, the clock path ends at 23.840 ns, which is 12.341 ns after the data arrived at the flip-flop. This difference is the constraint’s slack.

It’s simple to see from this analysis that the max input delay is the maximal clock-to-output (+ board delay), as it’s the starting time of the data path.

set_input_delay -min timing analysis (hold)

Delay Model:
    Slow 1100mV 85C Model

+-----------------------------------------------------------------------------------------------------+
; Summary of Paths                                                                                    ;
+-------+-----------+-----------+--------------+-------------+--------------+------------+------------+
; Slack ; From Node ; To Node   ; Launch Clock ; Latch Clock ; Relationship ; Clock Skew ; Data Delay ;
+-------+-----------+-----------+--------------+-------------+--------------+------------+------------+
; 0.770 ; test_in   ; test_samp ; theclk       ; theclk      ; 0.000        ; 4.287      ; 3.057      ;
+-------+-----------+-----------+--------------+-------------+--------------+------------+------------+

Path #1: Hold slack is 0.770
===============================================================================
+--------------------------------+
; Path Summary                   ;
+--------------------+-----------+
; Property           ; Value     ;
+--------------------+-----------+
; From Node          ; test_in   ;
; To Node            ; test_samp ;
; Launch Clock       ; theclk    ;
; Latch Clock        ; theclk    ;
; Data Arrival Time  ; 5.057     ;
; Data Required Time ; 4.287     ;
; Slack              ; 0.770     ;
+--------------------+-----------+

+--------------------------------------------------------------------------------------+
; Statistics                                                                           ;
+---------------------------+-------+-------+-------------+------------+-------+-------+
; Property                  ; Value ; Count ; Total Delay ; % of Total ; Min   ; Max   ;
+---------------------------+-------+-------+-------------+------------+-------+-------+
; Hold Relationship         ; 0.000 ;       ;             ;            ;       ;       ;
; Clock Skew                ; 4.287 ;       ;             ;            ;       ;       ;
; Data Delay                ; 3.057 ;       ;             ;            ;       ;       ;
; Number of Logic Levels    ;       ; 1     ;             ;            ;       ;       ;
; Physical Delays           ;       ;       ;             ;            ;       ;       ;
;  Arrival Path             ;       ;       ;             ;            ;       ;       ;
;   Clock                   ;       ;       ;             ;            ;       ;       ;
;    Clock Network (Lumped) ;       ; 1     ; 0.000       ;            ; 0.000 ; 0.000 ;
;   Data                    ;       ;       ;             ;            ;       ;       ;
;    IC                     ;       ; 2     ; 2.028       ; 66         ; 0.000 ; 2.028 ;
;    Cell                   ;       ; 2     ; 1.029       ; 34         ; 0.290 ; 0.739 ;
;  Required Path            ;       ;       ;             ;            ;       ;       ;
;   Clock                   ;       ;       ;             ;            ;       ;       ;
;    Clock Network (Lumped) ;       ; 1     ; 4.287       ; 100        ; 4.287 ; 4.287 ;
+---------------------------+-------+-------+-------------+------------+-------+-------+
Note: Negative delays are omitted from totals when calculating percentages

+----------------------------------------------------------------------------------+
; Data Arrival Path                                                                ;
+---------+---------+----+------+--------+-------------------+---------------------+
; Total   ; Incr    ; RF ; Type ; Fanout ; Location          ; Element             ;
+---------+---------+----+------+--------+-------------------+---------------------+
; 0.000   ; 0.000   ;    ;      ;        ;                   ; launch edge time    ;
; 0.000   ; 0.000   ;    ;      ;        ;                   ; clock path          ;
;   0.000 ;   0.000 ; R  ;      ;        ;                   ; clock network delay ;
; 2.000   ; 2.000   ; R  ; iExt ; 1      ; PIN_AP17          ; test_in             ;
; 5.057   ; 3.057   ;    ;      ;        ;                   ; data path           ;
;   2.000 ;   0.000 ; RR ; IC   ; 1      ; IOIBUF_X48_Y0_N58 ; test_in~input|i     ;
;   2.739 ;   0.739 ; RR ; CELL ; 1      ; IOIBUF_X48_Y0_N58 ; test_in~input|o     ;
;   4.767 ;   2.028 ; RR ; IC   ; 1      ; FF_X48_Y2_N40     ; test_samp|asdata    ;
;   5.057 ;   0.290 ; RR ; CELL ; 1      ; FF_X48_Y2_N40     ; test_samp           ;
+---------+---------+----+------+--------+-------------------+---------------------+

+------------------------------------------------------------------------------+
; Data Required Path                                                           ;
+---------+---------+----+------+--------+---------------+---------------------+
; Total   ; Incr    ; RF ; Type ; Fanout ; Location      ; Element             ;
+---------+---------+----+------+--------+---------------+---------------------+
; 0.000   ; 0.000   ;    ;      ;        ;               ; latch edge time     ;
; 4.287   ; 4.287   ;    ;      ;        ;               ; clock path          ;
;   4.287 ;   4.287 ; R  ;      ;        ;               ; clock network delay ;
; 4.287   ; 0.000   ;    ;      ;        ;               ; clock uncertainty   ;
; 4.287   ; 0.000   ;    ; uTh  ; 1      ; FF_X48_Y2_N40 ; test_samp           ;
+---------+---------+----+------+--------+---------------+---------------------+

This analysis starts in “Data Arrival Path” with setting the input port (test_in) at 2 ns as specified in the min input delay constraint, and continues that data path. Together with the FPGA’s own data path delay (3.057 ns), the total data path delay stands at 5.057 ns.

The clock path is then calculated in “Data Required Path”, starting from the same clock edge at 0 ns. After all, this is a hold calculation, so the question is whether the data was swept from under the feet of the sampling flip-flop before it managed to sample it.

The clock travels from the input pin to the flip-flop (with no clock network delay compensation, since no PLL is involved), taking into account the calculated jitter. All in all, the clock path ends at 4.287 ns, which is 0.770 ns earlier than the data switching, which is also the slack.

It’s simple to see from this analysis that the min input delay is the minimal clock-to-output, as it’s the starting time of the data path.

set_output_delay -max timing analysis (setup)

Delay Model:
    Slow 1100mV 85C Model

+--------------------------------------------------------------------------------------------------------+
; Summary of Paths                                                                                       ;
+-------+---------------+----------+--------------+-------------+--------------+------------+------------+
; Slack ; From Node     ; To Node  ; Launch Clock ; Latch Clock ; Relationship ; Clock Skew ; Data Delay ;
+-------+---------------+----------+--------------+-------------+--------------+------------+------------+
; 2.651 ; test_out~reg0 ; test_out ; theclk       ; theclk      ; 20.000       ; -5.320     ; 3.929      ;
+-------+---------------+----------+--------------+-------------+--------------+------------+------------+

Path #1: Setup slack is 2.651
===============================================================================
+------------------------------------+
; Path Summary                       ;
+--------------------+---------------+
; Property           ; Value         ;
+--------------------+---------------+
; From Node          ; test_out~reg0 ;
; To Node            ; test_out      ;
; Launch Clock       ; theclk        ;
; Latch Clock        ; theclk        ;
; Data Arrival Time  ; 9.249         ;
; Data Required Time ; 11.900        ;
; Slack              ; 2.651         ;
+--------------------+---------------+

+---------------------------------------------------------------------------------------+
; Statistics                                                                            ;
+---------------------------+--------+-------+-------------+------------+-------+-------+
; Property                  ; Value  ; Count ; Total Delay ; % of Total ; Min   ; Max   ;
+---------------------------+--------+-------+-------------+------------+-------+-------+
; Setup Relationship        ; 20.000 ;       ;             ;            ;       ;       ;
; Clock Skew                ; -5.320 ;       ;             ;            ;       ;       ;
; Data Delay                ; 3.929  ;       ;             ;            ;       ;       ;
; Number of Logic Levels    ;        ; 0     ;             ;            ;       ;       ;
; Physical Delays           ;        ;       ;             ;            ;       ;       ;
;  Arrival Path             ;        ;       ;             ;            ;       ;       ;
;   Clock                   ;        ;       ;             ;            ;       ;       ;
;    Clock Network (Lumped) ;        ; 1     ; 5.320       ; 100        ; 5.320 ; 5.320 ;
;   Data                    ;        ;       ;             ;            ;       ;       ;
;    IC                     ;        ; 1     ; 0.000       ; 0          ; 0.000 ; 0.000 ;
;    Cell                   ;        ; 3     ; 3.929       ; 100        ; 0.000 ; 2.150 ;
;    uTco                   ;        ; 1     ; 0.000       ; 0          ; 0.000 ; 0.000 ;
;  Required Path            ;        ;       ;             ;            ;       ;       ;
;   Clock                   ;        ;       ;             ;            ;       ;       ;
;    Clock Network (Lumped) ;        ; 1     ; 0.000       ;            ; 0.000 ; 0.000 ;
+---------------------------+--------+-------+-------------+------------+-------+-------+
Note: Negative delays are omitted from totals when calculating percentages

+---------------------------------------------------------------------------------------+
; Data Arrival Path                                                                     ;
+---------+---------+----+------+--------+------------------------+---------------------+
; Total   ; Incr    ; RF ; Type ; Fanout ; Location               ; Element             ;
+---------+---------+----+------+--------+------------------------+---------------------+
; 0.000   ; 0.000   ;    ;      ;        ;                        ; launch edge time    ;
; 5.320   ; 5.320   ;    ;      ;        ;                        ; clock path          ;
;   5.320 ;   5.320 ; R  ;      ;        ;                        ; clock network delay ;
; 9.249   ; 3.929   ;    ;      ;        ;                        ; data path           ;
;   5.320 ;   0.000 ;    ; uTco ; 1      ; DDIOOUTCELL_X48_Y0_N50 ; test_out~reg0       ;
;   7.099 ;   1.779 ; FF ; CELL ; 1      ; DDIOOUTCELL_X48_Y0_N50 ; test_out~reg0|q     ;
;   7.099 ;   0.000 ; FF ; IC   ; 1      ; IOOBUF_X48_Y0_N42      ; test_out~output|i   ;
;   9.249 ;   2.150 ; FF ; CELL ; 1      ; IOOBUF_X48_Y0_N42      ; test_out~output|o   ;
;   9.249 ;   0.000 ; FF ; CELL ; 0      ; PIN_AN17               ; test_out            ;
+---------+---------+----+------+--------+------------------------+---------------------+

+--------------------------------------------------------------------------+
; Data Required Path                                                       ;
+----------+---------+----+------+--------+----------+---------------------+
; Total    ; Incr    ; RF ; Type ; Fanout ; Location ; Element             ;
+----------+---------+----+------+--------+----------+---------------------+
; 20.000   ; 20.000  ;    ;      ;        ;          ; latch edge time     ;
; 20.000   ; 0.000   ;    ;      ;        ;          ; clock path          ;
;   20.000 ;   0.000 ; R  ;      ;        ;          ; clock network delay ;
; 19.900   ; -0.100  ;    ;      ;        ;          ; clock uncertainty   ;
; 11.900   ; -8.000  ; F  ; oExt ; 0      ; PIN_AN17 ; test_out            ;
+----------+---------+----+------+--------+----------+---------------------+

Since the purpose of this analysis is to measure the output delay, it starts off in “Data Arrival Path” with the clock edge, adds the clock network delay to the flip-flop, and then goes along the data path until the physical output is stable, calculated at 9.249 ns.

This is compared with the time of the following clock edge at 20 ns, minus the output delay (8 ns) and minus the possible jitter (0.1 ns in the case above). Data arrived at 9.249 ns, and the moment that counts is at 11.9 ns, so there’s a 2.651 ns slack.

This demonstrates why set_output_delay -max is the setup time of the receiver: The output delay is subtracted from the following clock’s time position, and that’s the goal to meet. That’s exactly the definition of setup time: How long before the following clock the data must be stable.

set_output_delay -min timing analysis (hold)

Delay Model:
    Fast 1100mV 0C Model

+--------------------------------------------------------------------------------------------------------+
; Summary of Paths                                                                                       ;
+-------+---------------+----------+--------------+-------------+--------------+------------+------------+
; Slack ; From Node     ; To Node  ; Launch Clock ; Latch Clock ; Relationship ; Clock Skew ; Data Delay ;
+-------+---------------+----------+--------------+-------------+--------------+------------+------------+
; 1.275 ; test_out~reg0 ; test_out ; theclk       ; theclk      ; 0.000        ; -2.255     ; 2.020      ;
+-------+---------------+----------+--------------+-------------+--------------+------------+------------+

Path #1: Hold slack is 1.275
===============================================================================
+------------------------------------+
; Path Summary                       ;
+--------------------+---------------+
; Property           ; Value         ;
+--------------------+---------------+
; From Node          ; test_out~reg0 ;
; To Node            ; test_out      ;
; Launch Clock       ; theclk        ;
; Latch Clock        ; theclk        ;
; Data Arrival Time  ; 4.275         ;
; Data Required Time ; 3.000         ;
; Slack              ; 1.275         ;
+--------------------+---------------+

+---------------------------------------------------------------------------------------+
; Statistics                                                                            ;
+---------------------------+--------+-------+-------------+------------+-------+-------+
; Property                  ; Value  ; Count ; Total Delay ; % of Total ; Min   ; Max   ;
+---------------------------+--------+-------+-------------+------------+-------+-------+
; Hold Relationship         ; 0.000  ;       ;             ;            ;       ;       ;
; Clock Skew                ; -2.255 ;       ;             ;            ;       ;       ;
; Data Delay                ; 2.020  ;       ;             ;            ;       ;       ;
; Number of Logic Levels    ;        ; 0     ;             ;            ;       ;       ;
; Physical Delays           ;        ;       ;             ;            ;       ;       ;
;  Arrival Path             ;        ;       ;             ;            ;       ;       ;
;   Clock                   ;        ;       ;             ;            ;       ;       ;
;    Clock Network (Lumped) ;        ; 1     ; 2.255       ; 100        ; 2.255 ; 2.255 ;
;   Data                    ;        ;       ;             ;            ;       ;       ;
;    IC                     ;        ; 1     ; 0.000       ; 0          ; 0.000 ; 0.000 ;
;    Cell                   ;        ; 3     ; 2.020       ; 100        ; 0.000 ; 1.296 ;
;    uTco                   ;        ; 1     ; 0.000       ; 0          ; 0.000 ; 0.000 ;
;  Required Path            ;        ;       ;             ;            ;       ;       ;
;   Clock                   ;        ;       ;             ;            ;       ;       ;
;    Clock Network (Lumped) ;        ; 1     ; 0.000       ;            ; 0.000 ; 0.000 ;
+---------------------------+--------+-------+-------------+------------+-------+-------+
Note: Negative delays are omitted from totals when calculating percentages

+---------------------------------------------------------------------------------------+
; Data Arrival Path                                                                     ;
+---------+---------+----+------+--------+------------------------+---------------------+
; Total   ; Incr    ; RF ; Type ; Fanout ; Location               ; Element             ;
+---------+---------+----+------+--------+------------------------+---------------------+
; 0.000   ; 0.000   ;    ;      ;        ;                        ; launch edge time    ;
; 2.255   ; 2.255   ;    ;      ;        ;                        ; clock path          ;
;   2.255 ;   2.255 ; R  ;      ;        ;                        ; clock network delay ;
; 4.275   ; 2.020   ;    ;      ;        ;                        ; data path           ;
;   2.255 ;   0.000 ;    ; uTco ; 1      ; DDIOOUTCELL_X48_Y0_N50 ; test_out~reg0       ;
;   2.979 ;   0.724 ; RR ; CELL ; 1      ; DDIOOUTCELL_X48_Y0_N50 ; test_out~reg0|q     ;
;   2.979 ;   0.000 ; RR ; IC   ; 1      ; IOOBUF_X48_Y0_N42      ; test_out~output|i   ;
;   4.275 ;   1.296 ; RR ; CELL ; 1      ; IOOBUF_X48_Y0_N42      ; test_out~output|o   ;
;   4.275 ;   0.000 ; RR ; CELL ; 0      ; PIN_AN17               ; test_out            ;
+---------+---------+----+------+--------+------------------------+---------------------+

+-------------------------------------------------------------------------+
; Data Required Path                                                      ;
+---------+---------+----+------+--------+----------+---------------------+
; Total   ; Incr    ; RF ; Type ; Fanout ; Location ; Element             ;
+---------+---------+----+------+--------+----------+---------------------+
; 0.000   ; 0.000   ;    ;      ;        ;          ; latch edge time     ;
; 0.000   ; 0.000   ;    ;      ;        ;          ; clock path          ;
;   0.000 ;   0.000 ; R  ;      ;        ;          ; clock network delay ;
; 0.000   ; 0.000   ;    ;      ;        ;          ; clock uncertainty   ;
; 3.000   ; 3.000   ; R  ; oExt ; 0      ; PIN_AN17 ; test_out            ;
+---------+---------+----+------+--------+----------+---------------------+

This analysis is similar to the max output delay, only it’s calculated against the same clock edge (and not the following one).

As before, the data path continues the clock path until the physical output is stable, calculated at 4.275 ns.

This is compared with the time of the same clock at 0 ns, minus the output delay. Recall that the min output delay was negative (-3 ns), which is why it appears as a positive number in the calculation.

Conclusion: Data was stable until 4.275 ns, and needs to be stable until 3 ns. That’s fine, with a 1.275 ns slack.

This demonstrates why set_output_delay -min is minus the hold time of the receiver: The given output delay with reversed sign is used as the time which the data path delay must exceed. In other words, the data must be stable for that long after the clock. This is the definition of hold time.
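The same check can be written out for the hold case. Again a sketch with the numbers taken from the report above and hypothetical variable names; note how the negative constraint turns into a positive requirement.

```python
# Figures copied from the TimeQuest hold report above (all in ns).
latch_edge = 0.000          # same clock edge as the launch edge
output_delay_min = -3.000   # value given to set_output_delay -min
data_arrival = 4.275        # earliest moment the output changes

# Subtracting a negative output delay yields a positive required time:
# the old data must stay valid for at least this long after the edge.
data_required = latch_edge - output_delay_min
hold_slack = data_arrival - data_required

print(f"required {data_required:.3f} ns, slack {hold_slack:.3f} ns")
```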

Vivado’s timing analysis on set_input_delay and set_output_delay constraints

OK, what’s this?

This page is the example part of another post, which explains the meaning of set_input_delay and set_output_delay in SDC timing constraints.

As mentioned in the other post, the relevant timing constraints were:

create_clock -name theclk -period 20 [get_ports test_clk]
set_output_delay -clock theclk -max 8 [get_ports test_out]
set_output_delay -clock theclk -min -3 [get_ports test_out]
set_input_delay -clock theclk -max 4 [get_ports test_in]
set_input_delay -clock theclk -min 2 [get_ports test_in]

set_input_delay -max timing analysis (setup)

Slack (MET) :             15.664ns  (required time - arrival time)
  Source:                 test_in
                            (input port clocked by theclk  {rise@0.000ns fall@10.000ns period=20.000ns})
  Destination:            test_samp_reg/D
                            (rising edge-triggered cell FDRE clocked by theclk  {rise@0.000ns fall@10.000ns period=20.000ns})
  Path Group:             theclk
  Path Type:              Setup (Max at Fast Process Corner)
  Requirement:            20.000ns  (theclk rise@20.000ns - theclk rise@0.000ns)
  Data Path Delay:        2.465ns  (logic 0.291ns (11.797%)  route 2.175ns (88.203%))
  Logic Levels:           1  (IBUF=1)
  Input Delay:            4.000ns
  Clock Path Skew:        2.162ns (DCD - SCD + CPR)
    Destination Clock Delay (DCD):    2.162ns = ( 22.162 - 20.000 )
    Source Clock Delay      (SCD):    0.000ns
    Clock Pessimism Removal (CPR):    0.000ns
  Clock Uncertainty:      0.035ns  ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
    Total System Jitter     (TSJ):    0.071ns
    Total Input Jitter      (TIJ):    0.000ns
    Discrete Jitter          (DJ):    0.000ns
    Phase Error              (PE):    0.000ns

    Location             Delay type                Incr(ns)  Path(ns)    Netlist Resource(s)
  -------------------------------------------------------------------    -------------------
                         (clock theclk rise edge)     0.000     0.000 r
                         input delay                  4.000     4.000
    AE20                                              0.000     4.000 r  test_in (IN)
                         net (fo=0)                   0.000     4.000    test_in
    AE20                 IBUF (Prop_ibuf_I_O)         0.291     4.291 r  test_in_IBUF_inst/O
                         net (fo=1, routed)           2.175     6.465    test_in_IBUF
    SLICE_X0Y1           FDRE                                         r  test_samp_reg/D
  -------------------------------------------------------------------    -------------------

                         (clock theclk rise edge)    20.000    20.000 r
    AE23                                              0.000    20.000 r  test_clk (IN)
                         net (fo=0)                   0.000    20.000    test_clk
    AE23                 IBUF (Prop_ibuf_I_O)         0.077    20.077 r  test_clk_IBUF_inst/O
                         net (fo=1, routed)           1.278    21.355    test_clk_IBUF
    BUFGCTRL_X0Y4        BUFG (Prop_bufg_I_O)         0.026    21.381 r  test_clk_IBUF_BUFG_inst/O
                         net (fo=2, routed)           0.781    22.162    test_clk_IBUF_BUFG
    SLICE_X0Y1           FDRE                                         r  test_samp_reg/C
                         clock pessimism              0.000    22.162
                         clock uncertainty           -0.035    22.126
    SLICE_X0Y1           FDRE (Setup_fdre_C_D)        0.003    22.129    test_samp_reg
  -------------------------------------------------------------------
                         required time                         22.129
                         arrival time                          -6.465
  -------------------------------------------------------------------
                         slack                                 15.664

This analysis starts at time zero, adds the 4 ns (clock-to-output) that was specified in the max input delay constraint, and continues that data path at the fastest possible combination of process, voltage and temperature. Together with the FPGA’s own data path delay (2.465 ns), the total data path delay stands at 6.465 ns.

The clock path is then calculated, once again with the fastest possible combination, starting from the following clock edge at 20 ns. The clock travels from the input pin to the flip-flop (with no clock network delay compensation, since no PLL is involved), taking into account the calculated jitter. All in all, the clock path ends at 22.129 ns, which is 15.664 ns after the data arrived at the flip-flop, and that is this constraint’s slack.
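The “calculated jitter” is the Clock Uncertainty line in the report, which quotes its own formula. A minimal sketch that plugs the reported components into that formula (the report’s figures are rounded to three decimals, so the result may differ from the printed 0.035 ns in the last digit):

```python
import math

# Components as printed in the Vivado report above (ns).
tsj = 0.071   # Total System Jitter
tij = 0.000   # Total Input Jitter
dj = 0.000    # Discrete Jitter
pe = 0.000    # Phase Error

# Formula as quoted by the report: ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
clock_uncertainty = (math.sqrt(tsj**2 + tij**2) + dj) / 2 + pe

print(f"clock uncertainty: {clock_uncertainty:.4f} ns")
```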

It’s simple to see from this analysis that the max input delay is the clock-to-output ( + board delay), as it’s added to the data path. So it’s basically how late the data path started. Note the “Max” part in the Path Type above.
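To see how the pieces add up, the report’s summary figures can be combined by hand. This is a sketch with made-up variable names; since the report rounds each figure to a picosecond, the result may be off by 0.001 ns compared with the printed slack.

```python
# Summary figures from the Vivado setup report above (all in ns).
input_delay_max = 4.000     # set_input_delay -max
data_path_delay = 2.465     # IBUF plus routing inside the FPGA
latch_edge = 20.000         # the following clock edge
dest_clock_delay = 2.162    # DCD: clock pin to flip-flop, fast corner
clock_uncertainty = 0.035
setup_adjust = 0.003        # "Setup_fdre_C_D" row, added as in the report

# The data path starts late by the input delay.
arrival = input_delay_max + data_path_delay
required = latch_edge + dest_clock_delay - clock_uncertainty + setup_adjust
slack = required - arrival

print(f"arrival {arrival:.3f}, required {required:.3f}, slack {slack:.3f}")
```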

set_input_delay -min timing analysis (hold)

Min Delay Paths
--------------------------------------------------------------------------------------
Slack (VIOLATED) :        -0.045ns  (arrival time - required time)
  Source:                 test_in
                            (input port clocked by theclk  {rise@0.000ns fall@10.000ns period=20.000ns})
  Destination:            test_samp_reg/D
                            (rising edge-triggered cell FDRE clocked by theclk  {rise@0.000ns fall@10.000ns period=20.000ns})
  Path Group:             theclk
  Path Type:              Hold (Min at Slow Process Corner)
  Requirement:            0.000ns  (theclk rise@0.000ns - theclk rise@0.000ns)
  Data Path Delay:        3.443ns  (logic 0.626ns (18.194%)  route 2.817ns (81.806%))
  Logic Levels:           1  (IBUF=1)
  Input Delay:            2.000ns
  Clock Path Skew:        5.351ns (DCD - SCD - CPR)
    Destination Clock Delay (DCD):    5.351ns
    Source Clock Delay      (SCD):    0.000ns
    Clock Pessimism Removal (CPR):    -0.000ns
  Clock Uncertainty:      0.035ns  ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
    Total System Jitter     (TSJ):    0.071ns
    Total Input Jitter      (TIJ):    0.000ns
    Discrete Jitter          (DJ):    0.000ns
    Phase Error              (PE):    0.000ns

    Location             Delay type                Incr(ns)  Path(ns)    Netlist Resource(s)
  -------------------------------------------------------------------    -------------------
                         (clock theclk rise edge)     0.000     0.000 r
                         input delay                  2.000     2.000
    AE20                                              0.000     2.000 r  test_in (IN)
                         net (fo=0)                   0.000     2.000    test_in
    AE20                 IBUF (Prop_ibuf_I_O)         0.626     2.626 r  test_in_IBUF_inst/O
                         net (fo=1, routed)           2.817     5.443    test_in_IBUF
    SLICE_X0Y1           FDRE                                         r  test_samp_reg/D
  -------------------------------------------------------------------    -------------------

                         (clock theclk rise edge)     0.000     0.000 r
    AE23                                              0.000     0.000 r  test_clk (IN)
                         net (fo=0)                   0.000     0.000    test_clk
    AE23                 IBUF (Prop_ibuf_I_O)         0.734     0.734 r  test_clk_IBUF_inst/O
                         net (fo=1, routed)           2.651     3.385    test_clk_IBUF
    BUFGCTRL_X0Y4        BUFG (Prop_bufg_I_O)         0.093     3.478 r  test_clk_IBUF_BUFG_inst/O
                         net (fo=2, routed)           1.873     5.351    test_clk_IBUF_BUFG
    SLICE_X0Y1           FDRE                                         r  test_samp_reg/C
                         clock pessimism              0.000     5.351
                         clock uncertainty            0.035     5.387
    SLICE_X0Y1           FDRE (Hold_fdre_C_D)         0.101     5.488    test_samp_reg
  -------------------------------------------------------------------
                         required time                         -5.488
                         arrival time                           5.443
  -------------------------------------------------------------------
                         slack                                 -0.045

This analysis starts at time zero, adds the 2 ns (clock-to-output) that was specified in the min input delay constraint, and continues that data path at the slowest possible combination of process, voltage and temperature. Together with the FPGA’s own data path delay (3.443 ns), the total data path delay stands at 5.443 ns. It should be no surprise that the FPGA’s own delay is bigger compared with the fast analysis above.

The clock path is then calculated, now with the slowest possible combination, starting from the same clock edge at 0 ns. After all, this is a hold calculation, so the question is whether the rug was pulled from under the sampling flip-flop’s feet before it managed to sample the data.

The clock travels from the input pin to the flip-flop (with no clock network delay compensation, since no PLL is involved), taking into account the calculated jitter. All in all, the clock path ends at 5.488 ns, which is 0.045 ns after the data switched (at 5.443 ns). So the constraint was violated, with a slack of -0.045 ns.

It’s simple to see from this analysis that the min input delay is the minimal clock-to-output, as it’s added to the data path. So it’s basically how early the data path may start. Note the “Min” part in the Path Type above.
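The violation can be reproduced from the report’s summary figures. A sketch with hypothetical variable names; rounding in the report means the result may differ by 0.001 ns from the printed slack. The last line is just the shortfall, i.e. how much earlier the clock would have to reach the flip-flop (or how much later the data) for the check to pass.

```python
# Summary figures from the Vivado hold report above (all in ns).
input_delay_min = 2.000     # set_input_delay -min
data_path_delay = 3.443     # IBUF plus routing, slow corner
dest_clock_delay = 5.351    # DCD: clock pin to flip-flop, slow corner
clock_uncertainty = 0.035
hold_adjust = 0.101         # "Hold_fdre_C_D" row

# Same clock edge (0 ns) for launch and latch in a hold check.
arrival = input_delay_min + data_path_delay
required = dest_clock_delay + clock_uncertainty + hold_adjust
slack = arrival - required

# How much is missing for the check to pass.
shortfall = max(0.0, -slack)

print(f"slack {slack:.3f} ns, shortfall {shortfall:.3f} ns")
```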

It may come as a surprise that a 2 ns clock-to-output can violate a hold constraint. This shouldn’t be taken lightly — it can cause real problems.

The solution for this case would be to add a PLL to the clock path, which locks the global network’s clock to the input clock. This effectively means pulling it several nanoseconds earlier, far more than the 0.045 ns that is missing here.

set_output_delay -max timing analysis (setup)

Slack (MET) :             2.983ns  (required time - arrival time)
  Source:                 test_out_reg/C
                            (rising edge-triggered cell FDRE clocked by theclk  {rise@0.000ns fall@10.000ns period=20.000ns})
  Destination:            test_out
                            (output port clocked by theclk  {rise@0.000ns fall@10.000ns period=20.000ns})
  Path Group:             theclk
  Path Type:              Max at Slow Process Corner
  Requirement:            20.000ns  (theclk rise@20.000ns - theclk rise@0.000ns)
  Data Path Delay:        3.631ns  (logic 2.583ns (71.152%)  route 1.047ns (28.848%))
  Logic Levels:           1  (OBUF=1)
  Output Delay:           8.000ns
  Clock Path Skew:        -5.351ns (DCD - SCD + CPR)
    Destination Clock Delay (DCD):    0.000ns = ( 20.000 - 20.000 )
    Source Clock Delay      (SCD):    5.351ns
    Clock Pessimism Removal (CPR):    0.000ns
  Clock Uncertainty:      0.035ns  ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
    Total System Jitter     (TSJ):    0.071ns
    Total Input Jitter      (TIJ):    0.000ns
    Discrete Jitter          (DJ):    0.000ns
    Phase Error              (PE):    0.000ns

    Location             Delay type                Incr(ns)  Path(ns)    Netlist Resource(s)
  -------------------------------------------------------------------    -------------------
                         (clock theclk rise edge)     0.000     0.000 r
    AE23                                              0.000     0.000 r  test_clk (IN)
                         net (fo=0)                   0.000     0.000    test_clk
    AE23                 IBUF (Prop_ibuf_I_O)         0.734     0.734 r  test_clk_IBUF_inst/O
                         net (fo=1, routed)           2.651     3.385    test_clk_IBUF
    BUFGCTRL_X0Y4        BUFG (Prop_bufg_I_O)         0.093     3.478 r  test_clk_IBUF_BUFG_inst/O
                         net (fo=2, routed)           1.873     5.351    test_clk_IBUF_BUFG
    SLICE_X0Y1           FDRE                                         r  test_out_reg/C
  -------------------------------------------------------------------    -------------------
    SLICE_X0Y1           FDRE (Prop_fdre_C_Q)         0.223     5.574 r  test_out_reg/Q
                         net (fo=1, routed)           1.047     6.622    test_out_OBUF
    AK21                 OBUF (Prop_obuf_I_O)         2.360     8.982 r  test_out_OBUF_inst/O
                         net (fo=0)                   0.000     8.982    test_out
    AK21                                                              r  test_out (OUT)
  -------------------------------------------------------------------    -------------------

                         (clock theclk rise edge)    20.000    20.000 r
                         clock pessimism              0.000    20.000
                         clock uncertainty           -0.035    19.965
                         output delay                -8.000    11.965
  -------------------------------------------------------------------
                         required time                         11.965
                         arrival time                          -8.982
  -------------------------------------------------------------------
                         slack                                  2.983

Since the purpose of this analysis is to measure the output delay, it starts off with the clock edge, follows it towards the flip-flop, and then along the data path. That sums up to the overall delay. Note that the “Path Type” doesn’t say it’s a setup calculation (to avoid confusion?) even though it takes the following clock (at 20 ns) into consideration.

The calculation takes place at the slowest possible combination of process, voltage and temperature (recall that the input setup calculation took place with the fastest one). The clock path is the same as in the hold analysis for input delay (5.351 ns to the flip-flop in both), which is quite expected, as both are based upon the slow model.

The data path simply continues the clock path until the physical output is stable, calculated at 8.982 ns.

This is compared with the time of the following clock edge at 20 ns, minus the output delay (8 ns) and minus the clock uncertainty (0.035 ns in the case above). Data arrived at 8.982 ns, the moment that counts is at 11.965 ns, so there’s a 2.983 ns slack.

This demonstrates why set_output_delay -max is the setup time of the receiver: The output delay is subtracted from the following clock’s time position, and that’s the goal to meet. That’s exactly the definition of setup time: How long before the following clock the data must be stable.
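Spelled out with the figures from the Vivado report, as a sketch (the variable names are illustrative only):

```python
# Summary figures from the Vivado report above (all in ns).
source_clock_delay = 5.351   # SCD: clock pin to flip-flop, slow corner
data_path_delay = 3.631      # flip-flop clock-to-Q, routing and OBUF
latch_edge = 20.000          # the following clock edge
clock_uncertainty = 0.035
output_delay_max = 8.000     # set_output_delay -max

# The data path continues where the clock path ends.
arrival = source_clock_delay + data_path_delay
# The receiver wants stable data output_delay_max before the next edge.
required = latch_edge - clock_uncertainty - output_delay_max
slack = required - arrival

print(f"arrival {arrival:.3f}, required {required:.3f}, slack {slack:.3f}")
```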

set_output_delay -min timing analysis (hold)

Slack (MET) :             0.791ns  (arrival time - required time)
  Source:                 test_out_reg/C
                            (rising edge-triggered cell FDRE clocked by theclk  {rise@0.000ns fall@10.000ns period=20.000ns})
  Destination:            test_out
                            (output port clocked by theclk  {rise@0.000ns fall@10.000ns period=20.000ns})
  Path Group:             theclk
  Path Type:              Min at Fast Process Corner
  Requirement:            0.000ns  (theclk rise@0.000ns - theclk rise@0.000ns)
  Data Path Delay:        1.665ns  (logic 1.384ns (83.159%)  route 0.280ns (16.841%))
  Logic Levels:           1  (OBUF=1)
  Output Delay:           -3.000ns
  Clock Path Skew:        -2.162ns (DCD - SCD - CPR)
    Destination Clock Delay (DCD):    0.000ns
    Source Clock Delay      (SCD):    2.162ns
    Clock Pessimism Removal (CPR):    -0.000ns
  Clock Uncertainty:      0.035ns  ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
    Total System Jitter     (TSJ):    0.071ns
    Total Input Jitter      (TIJ):    0.000ns
    Discrete Jitter          (DJ):    0.000ns
    Phase Error              (PE):    0.000ns

    Location             Delay type                Incr(ns)  Path(ns)    Netlist Resource(s)
  -------------------------------------------------------------------    -------------------
                         (clock theclk rise edge)     0.000     0.000 r
    AE23                                              0.000     0.000 r  test_clk (IN)
                         net (fo=0)                   0.000     0.000    test_clk
    AE23                 IBUF (Prop_ibuf_I_O)         0.077     0.077 r  test_clk_IBUF_inst/O
                         net (fo=1, routed)           1.278     1.355    test_clk_IBUF
    BUFGCTRL_X0Y4        BUFG (Prop_bufg_I_O)         0.026     1.381 r  test_clk_IBUF_BUFG_inst/O
                         net (fo=2, routed)           0.781     2.162    test_clk_IBUF_BUFG
    SLICE_X0Y1           FDRE                                         r  test_out_reg/C
  -------------------------------------------------------------------    -------------------
    SLICE_X0Y1           FDRE (Prop_fdre_C_Q)         0.100     2.262 r  test_out_reg/Q
                         net (fo=1, routed)           0.280     2.542    test_out_OBUF
    AK21                 OBUF (Prop_obuf_I_O)         1.284     3.826 r  test_out_OBUF_inst/O
                         net (fo=0)                   0.000     3.826    test_out
    AK21                                                              r  test_out (OUT)
  -------------------------------------------------------------------    -------------------

                         (clock theclk rise edge)     0.000     0.000 r
                         clock pessimism              0.000     0.000
                         clock uncertainty            0.035     0.035
                         output delay                 3.000     3.035
  -------------------------------------------------------------------
                         required time                         -3.035
                         arrival time                           3.826
  -------------------------------------------------------------------
                         slack                                  0.791

This analysis is similar to the max output delay, only it’s calculated on the fastest possible combination of process, voltage and temperature, and against the same clock edge (and not the following one). So again, going from setup to hold, the corners are reversed. Once again, the clock path matches that of the setup analysis for input delay (2.162 ns to the flip-flop in both), which is quite expected, as both are based upon the fast model.

As before, the data path continues the clock path until the physical output is stable, calculated at 3.826 ns (note the difference with the slow path!).

This is compared with the time of the same clock edge at 0 ns, minus the output delay, plus the clock uncertainty (0.035 ns in the case above, not clear why it’s counted if it’s the same clock cycle, but anyhow). Recall that the min output delay was negative (-3 ns), which is why it appears as a positive number in the calculation.

Conclusion: Data was stable until 3.826 ns, and needs to be stable until 3.035 ns. That’s fine, with a 0.791 ns slack.
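The same conclusion in numbers, as a sketch based on the report’s summary figures (hypothetical variable names; the report’s rounding may cause a 0.001 ns difference from the printed values):

```python
# Summary figures from the Vivado hold report above (all in ns).
source_clock_delay = 2.162   # SCD: clock pin to flip-flop, fast corner
data_path_delay = 1.665      # flip-flop clock-to-Q, routing and OBUF
latch_edge = 0.000           # same clock edge as the launch edge
clock_uncertainty = 0.035    # added in the hold case, per the report
output_delay_min = -3.000    # set_output_delay -min

arrival = source_clock_delay + data_path_delay
# The negative output delay is subtracted, hence a positive requirement.
required = latch_edge + clock_uncertainty - output_delay_min
slack = arrival - required

print(f"arrival {arrival:.3f}, required {required:.3f}, slack {slack:.3f}")
```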

This demonstrates why set_output_delay -min is minus the hold time of the receiver: Jitter aside, the given output delay with reversed sign is used as the time which the data path delay must exceed. In other words, the data must be stable for that long after the clock. This is the definition of hold time.
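Going the other way, the two conclusions (-max is the receiver’s setup time, -min is minus its hold time) can be turned into a small helper that prints the constraints. This is only a sketch: the function name and its parameters are made up for this example, and the optional board delay terms are an assumption that isn’t part of the analysis above (they default to zero).

```python
def output_delay_constraints(t_setup, t_hold, clock, port,
                             board_max=0.0, board_min=0.0):
    """Return the two set_output_delay lines for one output port.

    t_setup and t_hold are the receiver's setup and hold times in ns.
    board_max and board_min are assumed board trace delays in ns.
    """
    delay_max = board_max + t_setup   # -max: setup time of the receiver
    delay_min = board_min - t_hold    # -min: minus the hold time
    template = ("set_output_delay -clock {clk} -{kind} {val:g} "
                "[get_ports {port}]")
    return [
        template.format(clk=clock, kind="max", val=delay_max, port=port),
        template.format(clk=clock, kind="min", val=delay_min, port=port),
    ]


# A receiver with 8 ns setup and 3 ns hold, as in the example above.
lines = output_delay_constraints(t_setup=8.0, t_hold=3.0,
                                 clock="theclk", port="test_out")
for line in lines:
    print(line)
```

With these inputs, the helper reproduces the two set_output_delay lines listed at the top of this page.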