VMplayer: Silencing excessive hard disk activity

For some unknown reason, possibly after a VMplayer upgrade, running any Windows virtual machine on my Linux machine with VMware Player caused non-stop heavy hard disk activity, even when the guest machine was effectively idle and had no I/O activity of its own.

Besides being surprisingly annoying, it also made the mouse pointer unresponsive, and the host machine suffered as well.

So eventually I managed to get things back to normal by editing the virtual machine’s .vmx file as described below.

I have VMplayer 6.0.2 on Fedora 12 (I suppose both are considered quite old).

Following this post, add

isolation.tools.unity.disable = "TRUE"
unity.allowCompositingInGuest = "FALSE"
unity.enableLaunchMenu = "FALSE"
unity.showBadges = "FALSE"
unity.showBorders = "FALSE"
unity.wasCapable = "FALSE"

(unity.wasCapable was already in the file, so remove the existing line first)

That appeared to help somewhat. But what really gave the punch was also adding

MemTrimRate = "0"
sched.mem.pshare.enable = "FALSE"
MemAllowAutoScaleDown = "FALSE"

Don’t ask me what it means. Your guess is as good as mine.
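For those who'd rather not edit the file by hand, here's a minimal sketch in Python of the same change. The function name apply_settings is made up for this example, and it assumes that the .vmx file consists of plain key = "value" lines. Keys are compared case-insensitively as a precaution (that's my assumption, not something I've verified in VMware's docs). Any existing line for one of the keys is dropped before the new one is appended, which takes care of the unity.wasCapable issue.

```python
# Sketch only: the keys and values are those listed in this post;
# the helper itself is hypothetical.
VMX_SETTINGS = {
    "isolation.tools.unity.disable": "TRUE",
    "unity.allowCompositingInGuest": "FALSE",
    "unity.enableLaunchMenu": "FALSE",
    "unity.showBadges": "FALSE",
    "unity.showBorders": "FALSE",
    "unity.wasCapable": "FALSE",
    "MemTrimRate": "0",
    "sched.mem.pshare.enable": "FALSE",
    "MemAllowAutoScaleDown": "FALSE",
}


def apply_settings(vmx_text, settings=VMX_SETTINGS):
    """Return vmx_text with old lines for the given keys removed
    and the new key = "value" lines appended at the end."""
    wanted = {key.lower() for key in settings}
    kept = []
    for line in vmx_text.splitlines():
        # Everything before the first "=" is taken as the key
        key = line.split("=", 1)[0].strip().lower()
        if key in wanted:
            continue  # Drop the existing line; it is re-added below
        kept.append(line)
    kept.extend('%s = "%s"' % (key, value) for key, value in settings.items())
    return "\n".join(kept) + "\n"
```

Read the .vmx file into a string, pass it through apply_settings() and write the result back, preferably on a backup copy first and with the virtual machine shut down.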

Linux: Where the USB related kernel files are

A few notes on where to find USB related kernel files on a Linux system (kernel 3.12.20 in my case)

$ lsusb
[ ... ]
Bus 001 Device 059: ID 046d:c52b Logitech, Inc.

Now find the position in the tree. It should be device 59 under bus number 1:

$ lsusb -t
[ ... ]
/:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=ehci-pci/6p, 480M
    |__ Port 4: Dev 4, If 0, Class=hub, Driver=hub/4p, 480M
        |__ Port 1: Dev 59, If 0, Class=HID, Driver=usbhid, 12M
        |__ Port 1: Dev 59, If 1, Class=HID, Driver=usbhid, 12M
        |__ Port 1: Dev 59, If 2, Class=HID, Driver=usbhid, 12M
        |__ Port 3: Dev 98, If 0, Class=vend., Driver=pl2303, 12M
    |__ Port 6: Dev 94, If 0, Class=vend., Driver=rt2800usb, 480M

So it’s bus 1, hub on port 4 and then port 1. Verify by checking the IDs:

$ cat /sys/bus/usb/devices/usb1/1-4/1-4.1/idVendor
046d
$ cat /sys/bus/usb/devices/usb1/1-4/1-4.1/idProduct
c52b

or look at the individual interfaces:

$ cat /sys/bus/usb/devices/usb1/1-4/1-4.1/1-4.1\:1.2/bInterfaceClass
03
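The naming pattern of these sysfs directories can be sketched in Python as follows. The function names are made up for this example, and the pattern is merely what I deduce from the paths above (bus number, a dash, then the port numbers separated by dots, and interfaces as configuration.interface after a colon). It assumes at least one port in the chain, so it doesn't cover the root hub itself.

```python
def usb_sysfs_path(bus, ports):
    """Build the sysfs directory of a USB device from its bus number
    and the chain of port numbers leading to it (e.g. [4, 1])."""
    path = "/sys/bus/usb/devices/usb%d" % bus
    name = str(bus)
    for i, port in enumerate(ports):
        # The first port follows a dash, deeper ports follow a dot
        name += ("-" if i == 0 else ".") + str(port)
        path += "/" + name
    return path


def usb_interface_dir(bus, ports, config, interface):
    """Directory of a single interface, e.g. .../1-4.1/1-4.1:1.2"""
    dev = usb_sysfs_path(bus, ports)
    return "%s/%s:%d.%d" % (dev, dev.rsplit("/", 1)[1], config, interface)


print(usb_sysfs_path(1, [4, 1]))
print(usb_interface_dir(1, [4, 1], 1, 2))
```

For the Logitech device above (bus 1, hub on port 4, then port 1), this prints the two directories used with the cat commands, except that the colon isn't escaped with a backslash.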

The device file accessed for raw userspace I/O with a USB device (with e.g. libusb) is in /dev/bus/usb/, followed by the bus number and address. For example, the Logitech device mentioned above is at bus 1, address 59, hence

$ ls -l /dev/bus/usb/001/059
crw-rw-r-- 1 root root 189, 58 2017-05-17 09:57 /dev/bus/usb/001/059

Note the permissions and major/minor numbers. The major is 189 (usb_device on my system, according to /proc/devices). The minor is ((bus_number - 1) * 128) + (address - 1), which gives 58 for bus 1, address 59, as shown above.
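As a quick sanity check of the minor number arithmetic, here's a small Python sketch. The function name is made up, and the major number 189 is just what /proc/devices says on my system.

```python
USB_DEVICE_MAJOR = 189  # As listed in /proc/devices on my system


def usb_devnode(bus, address):
    """Return the expected device file path and minor number for a
    USB device, given its bus number and address (both 1-based)."""
    minor = (bus - 1) * 128 + (address - 1)
    path = "/dev/bus/usb/%03d/%03d" % (bus, address)
    return path, minor


path, minor = usb_devnode(1, 59)
print("%s should be %d, %d" % (path, USB_DEVICE_MAJOR, minor))
```

For bus 1, address 59, this gives /dev/bus/usb/001/059 with minor 58, which matches the ls -l output.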

The permissions and ownership are those in effect for who’s allowed to access this device. This is the place to check if udev rules that allow wider access to a device have done their job.

Altera EPCQ flash access with a Nios II processor + programming bitfiles

Introduction

This post outlines some technical details on accessing an Altera EPCQ flash from a Nios II processor for read, write and erase. A non-OS (“bare metal”) setting is assumed.

And as a bonus (at the bottom of this post), how to program the flash based upon a SOF file, both with JTAG and by writing directly.

Remote Update is discussed in this post.

Hardware setup

In the Qsys project, there should be an instance of the Legacy EPCS/EPCQx1 Flash Controller, configured with the default parameters (not that there is much to configure). The peripheral’s epcs_control_port should be connected to the Nios II’s data master Avalon port (no point connecting it to the instruction master too).

In this example, we’ll assume that the name of the Flash Controller in Qsys is epcs_flash_controller_0.

The interrupt signal isn’t used in the software setting given below, but since the connection to the Nios processor, as well as the interrupt number assignment, is automatic, let it be.

Clock and reset — like the other peripherals.

The external conduit is connected as follows to an EPCQ flash, for x1 access:

  • Flash pin DATA0 to epcs_flash_controller_0_sdo (FPGA pin ASDO)
  • Flash pin DCLK to epcs_flash_controller_0_dclk (FPGA pin DCLK)
  • Flash pin nCS to epcs_flash_controller_0_sce (FPGA pin NCSO)
  • Flash pin DATA1 to epcs_flash_controller_0_data (FPGA pin DATA0)

The FPGA pins above are dual-purpose configuration pins, which allow the FPGA to configure in Active Serial (AS) x 1 mode. Once the configuration is done, these pins become general-purpose I/O (when so required by assignments), which allows regular access to the flash device.

Note that the flash pin DATA1 is connected to the FPGA pin DATA0 — this is not a mistake, but the correct wiring for AS x 1 interface.

It’s of course possible to connect the flash to regular I/O pins, but then the FPGA won’t be able to configure from the flash.

Software

Altera’s BSP includes drivers for flash operations with multiple layers of abstraction. This abstraction is not always necessary, and makes it somewhat difficult to figure out what’s going on (in particular when things go wrong). Notably, the higher-level drivers erase flash sectors automatically before writing, which can lead to some counterintuitive behavior, for example if multiple write requests are made on the same sector.

I therefore prefer working with the lowest-level drivers, which merely translate the flash commands into SPI communication. It leaves the user with the responsibility to erase sectors before writing to them.

The rule is simple: The flash is divided into sectors of 64 kB each. An erase operation is performed on one such 64 kB sector, leaving all its bits at 1 (all bytes are 0xff).

Writing can then be done to arbitrary addresses, but effectively the data in the flash is the written data ANDed with the previous content of the memory cells. This amounts to a plain write if the region has been previously erased. It’s commonly believed that it’s unhealthy for the flash to write to a byte cell twice without an erase in between.
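To make the erase / write rule concrete, here's a toy model in Python that runs on any computer (not on the Nios II). It has nothing to do with Altera's drivers: the class and its method names are made up, and it only mimics the behavior described above, starting from a fully erased device.

```python
SECTOR_SIZE = 64 * 1024  # Erase granularity, as described above


class FlashModel:
    """Toy model: erasing sets bits to 1, writing can only clear bits."""

    def __init__(self, size):
        self.mem = bytearray(b"\xff" * size)  # Assume an erased device

    def sector_erase(self, address):
        # Erase the whole 64 kB sector that contains the address
        base = address - (address % SECTOR_SIZE)
        self.mem[base:base + SECTOR_SIZE] = b"\xff" * SECTOR_SIZE

    def write(self, address, data):
        # The result is the written data ANDed with the old content
        for i, value in enumerate(data):
            self.mem[address + i] &= value

    def read(self, address, length):
        return bytes(self.mem[address:address + length])


flash = FlashModel(2 * 1024 * 1024)
flash.write(0x100000, bytes([0x78, 0x56, 0x34, 0x12]))
print(flash.read(0x100000, 4).hex())  # Plain write on an erased region

flash.write(0x100000, bytes([0x0f]))  # Second write without erasing
print(flash.read(0x100000, 4).hex())  # First byte is now 0x78 & 0x0f
```

The second write demonstrates why a sector erase is needed before rewriting: bits that were already 0 remain 0, so the first byte ends up as 0x08 rather than 0x0f.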

This is a simple program that runs on the Nios II processor, which demonstrates read, write and erase.

#include <system.h>
#include <alt_types.h>
#include <io.h>
#include "sys/alt_stdio.h"
#include "epcs_commands.h"

static void hexprint(alt_u8 *buf, int num) {
  int i;

  const char hexes[] = "0123456789abcdef";

  for (i = 0; i < num; i++) {
    alt_putchar(hexes[(buf[i] >> 4) & 0xf]);
    alt_putchar(hexes[buf[i] & 0xf]);
    if ((i & 0xf) == 0xf)
      alt_putchar(10); // "\n"
    else
      alt_putchar(32); // " "
  }
  alt_putchar(10); // "\n"
}

int main()
{
  alt_u32 register_base = EPCS_FLASH_CONTROLLER_0_BASE + EPCS_FLASH_CONTROLLER_0_REGISTER_OFFSET;
  alt_u32 silicon_id;

  alt_u8 buf[256];
  alt_u32 junk = 0x12345678;
  const alt_u32 flash_address = 0x100000;

  silicon_id = epcs_read_device_id(register_base);

  alt_printf("ID = %x\n", silicon_id);

  // epcs_read_buffer always returns the length of the buffer, so no
  // point checking its return value.

  alt_printf("Before doing anything:\n");

  epcs_read_buffer(register_base, flash_address, buf, sizeof(buf), 0);
  hexprint(buf, 16);

  // epcs_sector_erase erases the 64 kiB sector that contains the address
  // given as its second argument, and waits for the erasure to complete
  // by polling the status register and waiting for the WIP (write in progress)
  // bit to clear.

  epcs_sector_erase(register_base, flash_address, 0);

  alt_printf("After erasing\n");

  epcs_read_buffer(register_base, flash_address, buf, sizeof(buf), 0);
  hexprint(buf, 16);

  // epcs_write_buffer must be used on a region previously erased. The
  // command waits for the operation to complete by polling the status
  // register and waiting for the WIP (write in progress) bit to clear.
  epcs_write_buffer(register_base, flash_address, (void *) &junk, sizeof(junk), 0);

  alt_printf("After writing\n");

  epcs_read_buffer(register_base, flash_address, buf, sizeof(buf), 0);
  hexprint(buf, 16);

  /* Event loop never exits. */

  while (1);

  return 0;
}

The program reads 256 bytes each time, even though only 16 bytes are displayed. Any byte count is allowed in read and write. Needless to say, flash_address can be changed to any address in the device’s range. The choice of 0x100000 keeps it clear of the configuration bitstream for the relevant FPGA.

This is the output of the program above running against an EPCQ16:

ID = 20ba15
Before doing anything:
78 56 34 12 ff ff ff ff ff ff ff ff ff ff ff ff

After erasing
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

After writing
78 56 34 12 ff ff ff ff ff ff ff ff ff ff ff ff

The data in the “Before doing anything” part can be anything that was left in the flash when the program ran. In the case above, it’s the results of the previous run of the same program.

As a side note, all EPCQ flashes also support erasing subsectors, each of 4 kiB size (hence 16 subsectors per sector). Altera’s low-level drivers don’t support subsector erase, but it’s quite easy to expand the code to do so.

Programming the flash with a SOF file

As promised, here’s the outline of how to program the EPCQ flash with a bitstream configuration file. Not as fancy as the topic above, but nevertheless useful. The flash needs to be connected as follows:

  • Flash pin DATA0 to FPGA pin ASDO
  • Flash pin DCLK to FPGA pin DCLK
  • Flash pin nCS to FPGA pin NCSO
  • Flash pin DATA1 to FPGA pin DATA0 (once again, this is not a mistake. DATA1 to DATA0 indeed)

First things first: Generate a JIC file. Command-line style, e.g.:

quartus_cpf -c -d EPCQ16 -s EP4CE15 projname.sof projname.jic

In the example above, the EPCQ16 argument is the flash device, and the EP4CE15 is the FPGA that will be used to program the flash, which is most likely the same FPGA the SOF targets.

Or do it with the GUI:

  • In Quartus, pick File > Convert Programming File…
  • Choose jic output file format, and set the output file name.
  • Set the configuration device to e.g. EPCQ16, Active Serial (not x4).
  • Pick the SOF Data row, Page_0, click Add File… and pick the SOF file.
  • Pick the Flash Loader and click Add Device…, and choose e.g. Cyclone IV E, and then the same device as listed for the SOF file.
  • If you want to write to the flash with your own utility, check “Create Config data RPD”
  • Click Generate. A window saying the JIC file has been generated successfully should appear.
  • Click Close to close this tool.

Programming the flash with JTAG:

  • Open the regular JTAG programmer in Quartus (not the one in Eclipse). The one used to configure the FPGA via JTAG with a bitstream, that is.
  • Click Add File… and select the JIC file created above.
  • The FPGA with its flash attached should appear in the diagram part of the window.
  • Select the Program/Configure checkbox on the flash’s (e.g. EPCQ16) row
  • Click Start.
  • This should take some 10 seconds or so (for EP4CE15’s bitfile), and end successfully.
  • The flash is now programmed.

Note that there’s an “Erase” checkbox on the flash’s row. There is no need to enable it along with Program/Configure: The Programmer gets the hint, and erases the flash before programming it.

Programming the flash with NIOS software (or similar)

Note that I have another post focusing on remote update.

To program the flash with your own utility, make sure that you’ve checked “Create Config data RPD” when generating the JIC. Then, using the flash API mentioned above, copy the RPD file into the flash from address 0 to make it load when the FPGA powers up, or to a higher address for using the bitstream with a Remote Update core (allowing configuration from higher addresses).

And note the following, which relates to my experience with using the EPCQ16 flash for AS configuration of a Cyclone IV E FPGA, and running Quartus Prime Version 15.1.0 Build 185 (YMMV):

  • Bit reversal is mandatory if epcs_write_buffer() is used for writing to the flash (or any other Nios API, I suppose). That means that for each byte in the RPD file, move bit 7 to bit 0, bit 6 to bit 1 etc. There are small hints of bit reversal spread out in the docs, for example, in the “Read Bytes Operation” section of the Quad-Serial Configuration (EPCQ) Devices Datasheet.
  • All my attempts to generate RBF or RPD files in other ways, including using the command line tool (quartus_cpf) to create an RBF from the SOF or an RPD from a POF, failed. That is, I got RBF and RPD files, but they were slightly different from the file that eventually worked. In particular, the RBF file obtained with
    quartus_cpf -c project.sof project.rbf

    was almost identical to the RPD file that worked properly, with a few bytes different in the 0x20-0x4f positions of the files. And that difference probably made the FPGA refuse to configure from it. Go figure.

  • If you’re really into generating the flash image with command line tools, generate a COF file (containing the configuration parameters) with the GUI, and use it with something like
    quartus_cpf -c project.cof

    The trick about this COF is that it should generate a proper JIC file, but have the <auto_create_rpd> part set to “1”.
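The bit reversal mentioned in the first bullet can be sketched in Python like this, for preparing the image on a computer before handing it over to the Nios II. The function names are made up for this example. The lookup table just saves doing the bit fiddling for each and every byte of the file.

```python
def reverse_bits(byte):
    """Move bit 7 to bit 0, bit 6 to bit 1 etc. within one byte."""
    result = 0
    for _ in range(8):
        result = (result << 1) | (byte & 1)
        byte >>= 1
    return result


# One table entry for each possible byte value
REVERSE_TABLE = bytes(reverse_bits(value) for value in range(256))


def rpd_to_flash_image(rpd_data):
    """Return the RPD file's content with each byte bit-reversed,
    ready for writing with epcs_write_buffer() or alike."""
    return bytes(rpd_data).translate(REVERSE_TABLE)


print(rpd_to_flash_image(bytes([0x01, 0xff, 0x6a])).hex())
```

Applying the function twice gives back the original data, which is a handy self-test.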

And finally, just a few sources I found (slightly unrelated):

  • Srunner is a command line utility for programming an EPCS flash. Since source code is given, it can give some insights, as well as its documentation.
  • The format of POF files is outlined in fmt_pof.pdf.

gcc: Solving “undefined reference” even when the required library is listed with -l

It all worked so nicely on my Fedora 12 machine, and then on Ubuntu 14.04.1 it failed colossally:

$ make
gcc -Wall  -O3 -g -lusb-1.0 -c  -o bulkread.o bulkread.c
gcc -Wall  -O3 -g -lusb-1.0 -c  -o usberrors.o usberrors.c
gcc -Wall  -O3 -g -lusb-1.0 bulkread.o usberrors.o -o bulkread
bulkread.o: In function `main':
bulkread.c:39: undefined reference to `libusb_init'
bulkread.c:46: undefined reference to `libusb_set_debug'
bulkread.c:48: undefined reference to `libusb_open_device_with_vid_pid'
[ ... ]

And it went on and on. Note that there was no complaint about not finding the library, and yet it failed to find the symbols.

The problem was the position of the -l flag. It turns out that Ubuntu silently adds an --as-needed flag to the linker, which effectively means that the -l flag must appear after the object files that need its symbols, or it will be effectively ignored.

So the correct way is:

$ make
gcc -Wall  -O3 -g -c  -o bulkread.o bulkread.c
gcc -Wall  -O3 -g -c  -o usberrors.o usberrors.c
gcc -Wall  -O3 -g bulkread.o usberrors.o -o bulkread -lusb-1.0

It’s all about the flag’s position…

XEmacs / VHDL: Stop that annoying “assistance” while typing

Emacs’ (and hence XEmacs’) VHDL mode has an annoying habit of hopping in and “helping me” with composing code. Type “if” and it tells me I need to add an expression. Thanks. I wouldn’t have figured it out myself.

So here’s how to disable this annoyance:

Add in ~/.xemacs/custom.el, to the custom-set-variables clause

'(vhdl-electric-mode nil)
'(vhdl-stutter-mode nil)

or turn off the respective options inside XEmacs, under VHDL > Options > Mode, and then VHDL > Options > Save Options.

And enjoy the bliss of an editor doing what it’s supposed to do.

Quartus’ timing analysis on set_input_delay and set_output_delay constraints

OK, what’s this?

This page is the example part of another post, which explains the meaning of set_input_delay and set_output_delay in SDC timing constraints.

TimeQuest (Quartus’ timing analyzer) performs a four-corner check (max/min temperature, max/min voltage) and picks the worst slack. In the examples below, the worst case of these four corners is shown. It’s not exactly clear why a certain delay model turns out to be the worst case each time.

As mentioned in the other post, the relevant timing constraints were:

create_clock -name theclk -period 20 [get_ports test_clk]
set_output_delay -clock theclk -max 8 [get_ports test_out]
set_output_delay -clock theclk -min -3 [get_ports test_out]
set_input_delay -clock theclk -max 4 [get_ports test_in]
set_input_delay -clock theclk -min 2 [get_ports test_in]

set_input_delay -max timing analysis (setup)

Delay Model:
    Slow 1100mV 0C Model

+------------------------------------------------------------------------------------------------------+
; Summary of Paths                                                                                     ;
+--------+-----------+-----------+--------------+-------------+--------------+------------+------------+
; Slack  ; From Node ; To Node   ; Launch Clock ; Latch Clock ; Relationship ; Clock Skew ; Data Delay ;
+--------+-----------+-----------+--------------+-------------+--------------+------------+------------+
; 12.341 ; test_in   ; test_samp ; theclk       ; theclk      ; 20.000       ; 3.940      ; 7.499      ;
+--------+-----------+-----------+--------------+-------------+--------------+------------+------------+

Path #1: Setup slack is 12.341
===============================================================================
+--------------------------------+
; Path Summary                   ;
+--------------------+-----------+
; Property           ; Value     ;
+--------------------+-----------+
; From Node          ; test_in   ;
; To Node            ; test_samp ;
; Launch Clock       ; theclk    ;
; Latch Clock        ; theclk    ;
; Data Arrival Time  ; 11.499    ;
; Data Required Time ; 23.840    ;
; Slack              ; 12.341    ;
+--------------------+-----------+

+---------------------------------------------------------------------------------------+
; Statistics                                                                            ;
+---------------------------+--------+-------+-------------+------------+-------+-------+
; Property                  ; Value  ; Count ; Total Delay ; % of Total ; Min   ; Max   ;
+---------------------------+--------+-------+-------------+------------+-------+-------+
; Setup Relationship        ; 20.000 ;       ;             ;            ;       ;       ;
; Clock Skew                ; 3.940  ;       ;             ;            ;       ;       ;
; Data Delay                ; 7.499  ;       ;             ;            ;       ;       ;
; Number of Logic Levels    ;        ; 1     ;             ;            ;       ;       ;
; Physical Delays           ;        ;       ;             ;            ;       ;       ;
;  Arrival Path             ;        ;       ;             ;            ;       ;       ;
;   Clock                   ;        ;       ;             ;            ;       ;       ;
;    Clock Network (Lumped) ;        ; 1     ; 0.000       ;            ; 0.000 ; 0.000 ;
;   Data                    ;        ;       ;             ;            ;       ;       ;
;    IC                     ;        ; 2     ; 2.447       ; 33         ; 0.000 ; 2.447 ;
;    Cell                   ;        ; 2     ; 5.052       ; 67         ; 0.652 ; 4.400 ;
;  Required Path            ;        ;       ;             ;            ;       ;       ;
;   Clock                   ;        ;       ;             ;            ;       ;       ;
;    Clock Network (Lumped) ;        ; 1     ; 3.940       ; 100        ; 3.940 ; 3.940 ;
+---------------------------+--------+-------+-------------+------------+-------+-------+
Note: Negative delays are omitted from totals when calculating percentages

+-----------------------------------------------------------------------------------+
; Data Arrival Path                                                                 ;
+----------+---------+----+------+--------+-------------------+---------------------+
; Total    ; Incr    ; RF ; Type ; Fanout ; Location          ; Element             ;
+----------+---------+----+------+--------+-------------------+---------------------+
; 0.000    ; 0.000   ;    ;      ;        ;                   ; launch edge time    ;
; 0.000    ; 0.000   ;    ;      ;        ;                   ; clock path          ;
;   0.000  ;   0.000 ; R  ;      ;        ;                   ; clock network delay ;
; 4.000    ; 4.000   ; F  ; iExt ; 1      ; PIN_AP17          ; test_in             ;
; 11.499   ; 7.499   ;    ;      ;        ;                   ; data path           ;
;   4.000  ;   0.000 ; FF ; IC   ; 1      ; IOIBUF_X48_Y0_N58 ; test_in~input|i     ;
;   8.400  ;   4.400 ; FF ; CELL ; 1      ; IOIBUF_X48_Y0_N58 ; test_in~input|o     ;
;   10.847 ;   2.447 ; FF ; IC   ; 1      ; FF_X48_Y2_N40     ; test_samp|asdata    ;
;   11.499 ;   0.652 ; FF ; CELL ; 1      ; FF_X48_Y2_N40     ; test_samp           ;
+----------+---------+----+------+--------+-------------------+---------------------+

+-------------------------------------------------------------------------------+
; Data Required Path                                                            ;
+----------+---------+----+------+--------+---------------+---------------------+
; Total    ; Incr    ; RF ; Type ; Fanout ; Location      ; Element             ;
+----------+---------+----+------+--------+---------------+---------------------+
; 20.000   ; 20.000  ;    ;      ;        ;               ; latch edge time     ;
; 23.940   ; 3.940   ;    ;      ;        ;               ; clock path          ;
;   23.940 ;   3.940 ; R  ;      ;        ;               ; clock network delay ;
; 23.840   ; -0.100  ;    ;      ;        ;               ; clock uncertainty   ;
; 23.840   ; 0.000   ;    ; uTsu ; 1      ; FF_X48_Y2_N40 ; test_samp           ;
+----------+---------+----+------+--------+---------------+---------------------+

This analysis starts in “Data Arrival Path” with setting the input port (test_in) at 4 ns as specified in the max input delay constraint, and continues that data path. Together with the FPGA’s own data path delay (7.499 ns), the total data path delay stands at 11.499 ns.

The clock path is then calculated in “Data Required Path”, starting from the following clock edge at 20 ns. The clock travels from the input pin to the flip-flop (with no clock network delay compensation, since no PLL is involved), taking the clock uncertainty into account. All in all, the clock path ends at 23.840 ns, which is 12.341 ns after the data arrived at the flip-flop, which is this constraint’s slack.

It’s simple to see from this analysis that the max input delay is the clock-to-output ( + board delay), as it’s the starting time of the data path.
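The arithmetic of this report can be reproduced with a few lines of Python. This is just my reading of the tables above, not TimeQuest's actual algorithm, and the function and parameter names are made up. The flip-flop's setup time (uTsu) appears as 0.000 in the report, so it's an optional parameter here.

```python
def setup_slack(input_delay_max, data_delay, period, clock_delay,
                uncertainty, setup_time=0.0):
    """Return (arrival, required, slack) for an input setup check."""
    # Data path: launched at 0 ns, arrives at the pin after the max
    # input delay, then travels through the FPGA to the flip-flop
    arrival = input_delay_max + data_delay
    # Clock path: the following clock edge plus the clock network
    # delay, minus the uncertainty and the flip-flop's setup time
    required = period + clock_delay - uncertainty - setup_time
    return arrival, required, required - arrival


arrival, required, slack = setup_slack(4.0, 7.499, 20.0, 3.940, 0.100)
print("Arrival %.3f, required %.3f, slack %.3f" % (arrival, required, slack))
```

With the numbers taken from the report, this gives 11.499, 23.840 and 12.341 ns respectively.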

set_input_delay -min timing analysis (hold)

Delay Model:
    Slow 1100mV 85C Model

+-----------------------------------------------------------------------------------------------------+
; Summary of Paths                                                                                    ;
+-------+-----------+-----------+--------------+-------------+--------------+------------+------------+
; Slack ; From Node ; To Node   ; Launch Clock ; Latch Clock ; Relationship ; Clock Skew ; Data Delay ;
+-------+-----------+-----------+--------------+-------------+--------------+------------+------------+
; 0.770 ; test_in   ; test_samp ; theclk       ; theclk      ; 0.000        ; 4.287      ; 3.057      ;
+-------+-----------+-----------+--------------+-------------+--------------+------------+------------+

Path #1: Hold slack is 0.770
===============================================================================
+--------------------------------+
; Path Summary                   ;
+--------------------+-----------+
; Property           ; Value     ;
+--------------------+-----------+
; From Node          ; test_in   ;
; To Node            ; test_samp ;
; Launch Clock       ; theclk    ;
; Latch Clock        ; theclk    ;
; Data Arrival Time  ; 5.057     ;
; Data Required Time ; 4.287     ;
; Slack              ; 0.770     ;
+--------------------+-----------+

+--------------------------------------------------------------------------------------+
; Statistics                                                                           ;
+---------------------------+-------+-------+-------------+------------+-------+-------+
; Property                  ; Value ; Count ; Total Delay ; % of Total ; Min   ; Max   ;
+---------------------------+-------+-------+-------------+------------+-------+-------+
; Hold Relationship         ; 0.000 ;       ;             ;            ;       ;       ;
; Clock Skew                ; 4.287 ;       ;             ;            ;       ;       ;
; Data Delay                ; 3.057 ;       ;             ;            ;       ;       ;
; Number of Logic Levels    ;       ; 1     ;             ;            ;       ;       ;
; Physical Delays           ;       ;       ;             ;            ;       ;       ;
;  Arrival Path             ;       ;       ;             ;            ;       ;       ;
;   Clock                   ;       ;       ;             ;            ;       ;       ;
;    Clock Network (Lumped) ;       ; 1     ; 0.000       ;            ; 0.000 ; 0.000 ;
;   Data                    ;       ;       ;             ;            ;       ;       ;
;    IC                     ;       ; 2     ; 2.028       ; 66         ; 0.000 ; 2.028 ;
;    Cell                   ;       ; 2     ; 1.029       ; 34         ; 0.290 ; 0.739 ;
;  Required Path            ;       ;       ;             ;            ;       ;       ;
;   Clock                   ;       ;       ;             ;            ;       ;       ;
;    Clock Network (Lumped) ;       ; 1     ; 4.287       ; 100        ; 4.287 ; 4.287 ;
+---------------------------+-------+-------+-------------+------------+-------+-------+
Note: Negative delays are omitted from totals when calculating percentages

+----------------------------------------------------------------------------------+
; Data Arrival Path                                                                ;
+---------+---------+----+------+--------+-------------------+---------------------+
; Total   ; Incr    ; RF ; Type ; Fanout ; Location          ; Element             ;
+---------+---------+----+------+--------+-------------------+---------------------+
; 0.000   ; 0.000   ;    ;      ;        ;                   ; launch edge time    ;
; 0.000   ; 0.000   ;    ;      ;        ;                   ; clock path          ;
;   0.000 ;   0.000 ; R  ;      ;        ;                   ; clock network delay ;
; 2.000   ; 2.000   ; R  ; iExt ; 1      ; PIN_AP17          ; test_in             ;
; 5.057   ; 3.057   ;    ;      ;        ;                   ; data path           ;
;   2.000 ;   0.000 ; RR ; IC   ; 1      ; IOIBUF_X48_Y0_N58 ; test_in~input|i     ;
;   2.739 ;   0.739 ; RR ; CELL ; 1      ; IOIBUF_X48_Y0_N58 ; test_in~input|o     ;
;   4.767 ;   2.028 ; RR ; IC   ; 1      ; FF_X48_Y2_N40     ; test_samp|asdata    ;
;   5.057 ;   0.290 ; RR ; CELL ; 1      ; FF_X48_Y2_N40     ; test_samp           ;
+---------+---------+----+------+--------+-------------------+---------------------+

+------------------------------------------------------------------------------+
; Data Required Path                                                           ;
+---------+---------+----+------+--------+---------------+---------------------+
; Total   ; Incr    ; RF ; Type ; Fanout ; Location      ; Element             ;
+---------+---------+----+------+--------+---------------+---------------------+
; 0.000   ; 0.000   ;    ;      ;        ;               ; latch edge time     ;
; 4.287   ; 4.287   ;    ;      ;        ;               ; clock path          ;
;   4.287 ;   4.287 ; R  ;      ;        ;               ; clock network delay ;
; 4.287   ; 0.000   ;    ;      ;        ;               ; clock uncertainty   ;
; 4.287   ; 0.000   ;    ; uTh  ; 1      ; FF_X48_Y2_N40 ; test_samp           ;
+---------+---------+----+------+--------+---------------+---------------------+

This analysis starts in “Data Arrival Path” with setting the input port (test_in) at 2 ns as specified in the min input delay constraint, and continues that data path. Together with the FPGA’s own data path delay (3.057 ns), the total data path delay stands at 5.057 ns.

The clock path is then calculated in “Data Required Path”, starting from the same clock edge at 0 ns. After all, this is a hold calculation, so the question is whether the rug was pulled from under the feet of the sampling flip-flop before it managed to sample the data.

The clock travels from the input pin to the flip-flop (with no clock network delay compensation, since no PLL is involved), taking into account the calculated jitter. All in all, the clock path ends at 4.287 ns, which is 0.770 ns earlier than the data switching, which is also the slack.

It’s simple to see from this analysis that the min input delay is the minimal clock-to-output, as it’s the starting time of the data path.
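The same kind of back-of-the-envelope check works for the hold report, again in Python, and again based only on my reading of the tables above. The function and parameter names are made up. The clock uncertainty and the flip-flop's hold time (uTh) both appear as 0.000 in this report, hence the defaults.

```python
def hold_slack(input_delay_min, data_delay, clock_delay,
               uncertainty=0.0, hold_time=0.0):
    """Return (arrival, required, slack) for an input hold check."""
    # Data path: the earliest moment the data may change at the pin,
    # plus the FPGA's own (fastest) data path delay
    arrival = input_delay_min + data_delay
    # Clock path: the same clock edge (0 ns) plus the clock network
    # delay; the data must not change before this moment
    required = clock_delay + uncertainty + hold_time
    return arrival, required, arrival - required


arrival, required, slack = hold_slack(2.0, 3.057, 4.287)
print("Arrival %.3f, required %.3f, slack %.3f" % (arrival, required, slack))
```

With the numbers taken from the report, this gives 5.057, 4.287 and 0.770 ns respectively. Note that the slack is the arrival time minus the required time, the opposite of the setup case.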

set_output_delay -max timing analysis (setup)

Delay Model:
    Slow 1100mV 85C Model

+--------------------------------------------------------------------------------------------------------+
; Summary of Paths                                                                                       ;
+-------+---------------+----------+--------------+-------------+--------------+------------+------------+
; Slack ; From Node     ; To Node  ; Launch Clock ; Latch Clock ; Relationship ; Clock Skew ; Data Delay ;
+-------+---------------+----------+--------------+-------------+--------------+------------+------------+
; 2.651 ; test_out~reg0 ; test_out ; theclk       ; theclk      ; 20.000       ; -5.320     ; 3.929      ;
+-------+---------------+----------+--------------+-------------+--------------+------------+------------+

Path #1: Setup slack is 2.651
===============================================================================
+------------------------------------+
; Path Summary                       ;
+--------------------+---------------+
; Property           ; Value         ;
+--------------------+---------------+
; From Node          ; test_out~reg0 ;
; To Node            ; test_out      ;
; Launch Clock       ; theclk        ;
; Latch Clock        ; theclk        ;
; Data Arrival Time  ; 9.249         ;
; Data Required Time ; 11.900        ;
; Slack              ; 2.651         ;
+--------------------+---------------+

+---------------------------------------------------------------------------------------+
; Statistics                                                                            ;
+---------------------------+--------+-------+-------------+------------+-------+-------+
; Property                  ; Value  ; Count ; Total Delay ; % of Total ; Min   ; Max   ;
+---------------------------+--------+-------+-------------+------------+-------+-------+
; Setup Relationship        ; 20.000 ;       ;             ;            ;       ;       ;
; Clock Skew                ; -5.320 ;       ;             ;            ;       ;       ;
; Data Delay                ; 3.929  ;       ;             ;            ;       ;       ;
; Number of Logic Levels    ;        ; 0     ;             ;            ;       ;       ;
; Physical Delays           ;        ;       ;             ;            ;       ;       ;
;  Arrival Path             ;        ;       ;             ;            ;       ;       ;
;   Clock                   ;        ;       ;             ;            ;       ;       ;
;    Clock Network (Lumped) ;        ; 1     ; 5.320       ; 100        ; 5.320 ; 5.320 ;
;   Data                    ;        ;       ;             ;            ;       ;       ;
;    IC                     ;        ; 1     ; 0.000       ; 0          ; 0.000 ; 0.000 ;
;    Cell                   ;        ; 3     ; 3.929       ; 100        ; 0.000 ; 2.150 ;
;    uTco                   ;        ; 1     ; 0.000       ; 0          ; 0.000 ; 0.000 ;
;  Required Path            ;        ;       ;             ;            ;       ;       ;
;   Clock                   ;        ;       ;             ;            ;       ;       ;
;    Clock Network (Lumped) ;        ; 1     ; 0.000       ;            ; 0.000 ; 0.000 ;
+---------------------------+--------+-------+-------------+------------+-------+-------+
Note: Negative delays are omitted from totals when calculating percentages

+---------------------------------------------------------------------------------------+
; Data Arrival Path                                                                     ;
+---------+---------+----+------+--------+------------------------+---------------------+
; Total   ; Incr    ; RF ; Type ; Fanout ; Location               ; Element             ;
+---------+---------+----+------+--------+------------------------+---------------------+
; 0.000   ; 0.000   ;    ;      ;        ;                        ; launch edge time    ;
; 5.320   ; 5.320   ;    ;      ;        ;                        ; clock path          ;
;   5.320 ;   5.320 ; R  ;      ;        ;                        ; clock network delay ;
; 9.249   ; 3.929   ;    ;      ;        ;                        ; data path           ;
;   5.320 ;   0.000 ;    ; uTco ; 1      ; DDIOOUTCELL_X48_Y0_N50 ; test_out~reg0       ;
;   7.099 ;   1.779 ; FF ; CELL ; 1      ; DDIOOUTCELL_X48_Y0_N50 ; test_out~reg0|q     ;
;   7.099 ;   0.000 ; FF ; IC   ; 1      ; IOOBUF_X48_Y0_N42      ; test_out~output|i   ;
;   9.249 ;   2.150 ; FF ; CELL ; 1      ; IOOBUF_X48_Y0_N42      ; test_out~output|o   ;
;   9.249 ;   0.000 ; FF ; CELL ; 0      ; PIN_AN17               ; test_out            ;
+---------+---------+----+------+--------+------------------------+---------------------+

+--------------------------------------------------------------------------+
; Data Required Path                                                       ;
+----------+---------+----+------+--------+----------+---------------------+
; Total    ; Incr    ; RF ; Type ; Fanout ; Location ; Element             ;
+----------+---------+----+------+--------+----------+---------------------+
; 20.000   ; 20.000  ;    ;      ;        ;          ; latch edge time     ;
; 20.000   ; 0.000   ;    ;      ;        ;          ; clock path          ;
;   20.000 ;   0.000 ; R  ;      ;        ;          ; clock network delay ;
; 19.900   ; -0.100  ;    ;      ;        ;          ; clock uncertainty   ;
; 11.900   ; -8.000  ; F  ; oExt ; 0      ; PIN_AN17 ; test_out            ;
+----------+---------+----+------+--------+----------+---------------------+

Since the purpose of this analysis is to measure the output delay, it starts off in “Data Arrival Path” with the clock edge, adds the clock network delay to the flip-flop, and then goes along the data path until the physical output is stable, calculated at 9.249 ns.

This is compared with the time of the following clock at 20 ns, minus the output delay (8 ns) and minus the possible jitter (0.1 ns in the case above). Data arrived at 9.249 ns, the moment that counts is at 11.9 ns, so there’s a 2.651 ns slack.

This demonstrates why set_output_delay -max is the setup time of the receiver: The output delay is subtracted from the following clock’s time position, and that’s the goal to meet. That’s exactly the definition of setup time: How long before the following clock the data must be stable.
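The arithmetic of this report can be replayed in a few lines of Python. This is only a sketch for checking the numbers: the variable names are mine, and the values are copied from the Quartus report above.

```python
# Values copied from the Quartus setup report above (all in ns)
latch_edge = 20.000           # the following clock edge
clock_network_delay = 5.320   # launch clock's way to the flip-flop
data_delay = 3.929            # flip-flop output to the physical pin
clock_uncertainty = 0.100
max_output_delay = 8.000      # set_output_delay -max

data_arrival = clock_network_delay + data_delay
data_required = latch_edge - clock_uncertainty - max_output_delay
setup_slack = data_required - data_arrival

print(f"arrival={data_arrival:.3f} required={data_required:.3f} slack={setup_slack:.3f}")
# prints: arrival=9.249 required=11.900 slack=2.651
```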

set_output_delay -min timing analysis (hold)

Delay Model:
    Fast 1100mV 0C Model

+--------------------------------------------------------------------------------------------------------+
; Summary of Paths                                                                                       ;
+-------+---------------+----------+--------------+-------------+--------------+------------+------------+
; Slack ; From Node     ; To Node  ; Launch Clock ; Latch Clock ; Relationship ; Clock Skew ; Data Delay ;
+-------+---------------+----------+--------------+-------------+--------------+------------+------------+
; 1.275 ; test_out~reg0 ; test_out ; theclk       ; theclk      ; 0.000        ; -2.255     ; 2.020      ;
+-------+---------------+----------+--------------+-------------+--------------+------------+------------+

Path #1: Hold slack is 1.275
===============================================================================
+------------------------------------+
; Path Summary                       ;
+--------------------+---------------+
; Property           ; Value         ;
+--------------------+---------------+
; From Node          ; test_out~reg0 ;
; To Node            ; test_out      ;
; Launch Clock       ; theclk        ;
; Latch Clock        ; theclk        ;
; Data Arrival Time  ; 4.275         ;
; Data Required Time ; 3.000         ;
; Slack              ; 1.275         ;
+--------------------+---------------+

+---------------------------------------------------------------------------------------+
; Statistics                                                                            ;
+---------------------------+--------+-------+-------------+------------+-------+-------+
; Property                  ; Value  ; Count ; Total Delay ; % of Total ; Min   ; Max   ;
+---------------------------+--------+-------+-------------+------------+-------+-------+
; Hold Relationship         ; 0.000  ;       ;             ;            ;       ;       ;
; Clock Skew                ; -2.255 ;       ;             ;            ;       ;       ;
; Data Delay                ; 2.020  ;       ;             ;            ;       ;       ;
; Number of Logic Levels    ;        ; 0     ;             ;            ;       ;       ;
; Physical Delays           ;        ;       ;             ;            ;       ;       ;
;  Arrival Path             ;        ;       ;             ;            ;       ;       ;
;   Clock                   ;        ;       ;             ;            ;       ;       ;
;    Clock Network (Lumped) ;        ; 1     ; 2.255       ; 100        ; 2.255 ; 2.255 ;
;   Data                    ;        ;       ;             ;            ;       ;       ;
;    IC                     ;        ; 1     ; 0.000       ; 0          ; 0.000 ; 0.000 ;
;    Cell                   ;        ; 3     ; 2.020       ; 100        ; 0.000 ; 1.296 ;
;    uTco                   ;        ; 1     ; 0.000       ; 0          ; 0.000 ; 0.000 ;
;  Required Path            ;        ;       ;             ;            ;       ;       ;
;   Clock                   ;        ;       ;             ;            ;       ;       ;
;    Clock Network (Lumped) ;        ; 1     ; 0.000       ;            ; 0.000 ; 0.000 ;
+---------------------------+--------+-------+-------------+------------+-------+-------+
Note: Negative delays are omitted from totals when calculating percentages

+---------------------------------------------------------------------------------------+
; Data Arrival Path                                                                     ;
+---------+---------+----+------+--------+------------------------+---------------------+
; Total   ; Incr    ; RF ; Type ; Fanout ; Location               ; Element             ;
+---------+---------+----+------+--------+------------------------+---------------------+
; 0.000   ; 0.000   ;    ;      ;        ;                        ; launch edge time    ;
; 2.255   ; 2.255   ;    ;      ;        ;                        ; clock path          ;
;   2.255 ;   2.255 ; R  ;      ;        ;                        ; clock network delay ;
; 4.275   ; 2.020   ;    ;      ;        ;                        ; data path           ;
;   2.255 ;   0.000 ;    ; uTco ; 1      ; DDIOOUTCELL_X48_Y0_N50 ; test_out~reg0       ;
;   2.979 ;   0.724 ; RR ; CELL ; 1      ; DDIOOUTCELL_X48_Y0_N50 ; test_out~reg0|q     ;
;   2.979 ;   0.000 ; RR ; IC   ; 1      ; IOOBUF_X48_Y0_N42      ; test_out~output|i   ;
;   4.275 ;   1.296 ; RR ; CELL ; 1      ; IOOBUF_X48_Y0_N42      ; test_out~output|o   ;
;   4.275 ;   0.000 ; RR ; CELL ; 0      ; PIN_AN17               ; test_out            ;
+---------+---------+----+------+--------+------------------------+---------------------+

+-------------------------------------------------------------------------+
; Data Required Path                                                      ;
+---------+---------+----+------+--------+----------+---------------------+
; Total   ; Incr    ; RF ; Type ; Fanout ; Location ; Element             ;
+---------+---------+----+------+--------+----------+---------------------+
; 0.000   ; 0.000   ;    ;      ;        ;          ; latch edge time     ;
; 0.000   ; 0.000   ;    ;      ;        ;          ; clock path          ;
;   0.000 ;   0.000 ; R  ;      ;        ;          ; clock network delay ;
; 0.000   ; 0.000   ;    ;      ;        ;          ; clock uncertainty   ;
; 3.000   ; 3.000   ; R  ; oExt ; 0      ; PIN_AN17 ; test_out            ;
+---------+---------+----+------+--------+----------+---------------------+

This analysis is similar to the max output delay, only it’s calculated against the same clock edge (and not the following one).

As before, the data path continues the clock path until the physical output is stable, calculated at 4.275 ns.

This is compared with the time of the same clock at 0 ns, minus the output delay. Recall that the min output delay was negative (-3 ns), which is why it appears as a positive number in the calculation.

Conclusion: Data was stable until 4.275 ns, and needs to be stable until 3 ns. That’s fine, with a 1.275 ns slack.

This demonstrates why set_output_delay -min is minus the hold time of the receiver: The given output delay with reversed sign is used as the time which the data path delay must exceed. In other words, the data must be stable for that long after the clock. This is the definition of hold time.
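The same kind of sanity check works for the hold calculation. Again, just a sketch with my own variable names and the numbers taken from the Quartus report above (which shows no clock uncertainty for this path):

```python
# Values copied from the Quartus hold report above (all in ns)
launch_edge = 0.000           # hold is checked against the same clock edge
clock_network_delay = 2.255   # fast model, hence shorter than in the setup case
data_delay = 2.020
min_output_delay = -3.000     # set_output_delay -min

data_arrival = launch_edge + clock_network_delay + data_delay
data_required = launch_edge - min_output_delay   # the sign flips: 0 - (-3) = 3
hold_slack = data_arrival - data_required

print(f"arrival={data_arrival:.3f} required={data_required:.3f} slack={hold_slack:.3f}")
# prints: arrival=4.275 required=3.000 slack=1.275
```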

Vivado’s timing analysis on set_input_delay and set_output_delay constraints

OK, what’s this?

This page is the example part of another post, which explains the meaning of set_input_delay and set_output_delay in SDC timing constraints.

As mentioned in the other post, the relevant timing constraints were:

create_clock -name theclk -period 20 [get_ports test_clk]
set_output_delay -clock theclk -max 8 [get_ports test_out]
set_output_delay -clock theclk -min -3 [get_ports test_out]
set_input_delay -clock theclk -max 4 [get_ports test_in]
set_input_delay -clock theclk -min 2 [get_ports test_in]

set_input_delay -max timing analysis (setup)

Slack (MET) :             15.664ns  (required time - arrival time)
  Source:                 test_in
                            (input port clocked by theclk  {rise@0.000ns fall@10.000ns period=20.000ns})
  Destination:            test_samp_reg/D
                            (rising edge-triggered cell FDRE clocked by theclk  {rise@0.000ns fall@10.000ns period=20.000ns})
  Path Group:             theclk
  Path Type:              Setup (Max at Fast Process Corner)
  Requirement:            20.000ns  (theclk rise@20.000ns - theclk rise@0.000ns)
  Data Path Delay:        2.465ns  (logic 0.291ns (11.797%)  route 2.175ns (88.203%))
  Logic Levels:           1  (IBUF=1)
  Input Delay:            4.000ns
  Clock Path Skew:        2.162ns (DCD - SCD + CPR)
    Destination Clock Delay (DCD):    2.162ns = ( 22.162 - 20.000 )
    Source Clock Delay      (SCD):    0.000ns
    Clock Pessimism Removal (CPR):    0.000ns
  Clock Uncertainty:      0.035ns  ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
    Total System Jitter     (TSJ):    0.071ns
    Total Input Jitter      (TIJ):    0.000ns
    Discrete Jitter          (DJ):    0.000ns
    Phase Error              (PE):    0.000ns

    Location             Delay type                Incr(ns)  Path(ns)    Netlist Resource(s)
  -------------------------------------------------------------------    -------------------
                         (clock theclk rise edge)     0.000     0.000 r
                         input delay                  4.000     4.000
    AE20                                              0.000     4.000 r  test_in (IN)
                         net (fo=0)                   0.000     4.000    test_in
    AE20                 IBUF (Prop_ibuf_I_O)         0.291     4.291 r  test_in_IBUF_inst/O
                         net (fo=1, routed)           2.175     6.465    test_in_IBUF
    SLICE_X0Y1           FDRE                                         r  test_samp_reg/D
  -------------------------------------------------------------------    -------------------

                         (clock theclk rise edge)    20.000    20.000 r
    AE23                                              0.000    20.000 r  test_clk (IN)
                         net (fo=0)                   0.000    20.000    test_clk
    AE23                 IBUF (Prop_ibuf_I_O)         0.077    20.077 r  test_clk_IBUF_inst/O
                         net (fo=1, routed)           1.278    21.355    test_clk_IBUF
    BUFGCTRL_X0Y4        BUFG (Prop_bufg_I_O)         0.026    21.381 r  test_clk_IBUF_BUFG_inst/O
                         net (fo=2, routed)           0.781    22.162    test_clk_IBUF_BUFG
    SLICE_X0Y1           FDRE                                         r  test_samp_reg/C
                         clock pessimism              0.000    22.162
                         clock uncertainty           -0.035    22.126
    SLICE_X0Y1           FDRE (Setup_fdre_C_D)        0.003    22.129    test_samp_reg
  -------------------------------------------------------------------
                         required time                         22.129
                         arrival time                          -6.465
  -------------------------------------------------------------------
                         slack                                 15.664

This analysis starts at time zero, adds the 4 ns (clock-to-output) that was specified in the max input delay constraint, and continues that data path at the fastest possible combination of process, voltage and temperature. Together with the FPGA’s own data path delay (2.465 ns), the total data path delay stands at 6.465 ns.

The clock path is then calculated, once again with the fastest possible combination, starting from the following clock at 20 ns. The clock travels from the input pin to the flip-flop (with no clock network delay compensation, since no PLL is involved), taking into account the calculated jitter. All in all, the clock path ends at 22.129 ns, which is 15.664 ns after the data arrived at the flip-flop, and this is the constraint’s slack.

It’s simple to see from this analysis that the max input delay is the clock-to-output ( + board delay), as it’s added to the data path. So it’s basically how late the data path started. Note the “Max” part in the Path Type above.
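A quick way to convince oneself is to redo the sums. The following Python sketch uses my own variable names and the figures from the Vivado report above. Note that Vivado rounds each line separately, so the recalculated totals may differ from the printed ones in the last digit.

```python
# Values copied from the Vivado setup report above (all in ns)
input_delay = 4.000         # set_input_delay -max
fpga_data_delay = 2.465     # IBUF + routing to test_samp_reg/D
latch_edge = 20.000         # the following clock edge
dest_clock_delay = 2.162    # input pin to test_samp_reg/C
clock_uncertainty = 0.035
setup_term = 0.003          # the Setup_fdre_C_D line, added as printed in the report

arrival = input_delay + fpga_data_delay
required = latch_edge + dest_clock_delay - clock_uncertainty + setup_term
slack = required - arrival

print(f"arrival={arrival:.3f} required={required:.3f} slack={slack:.3f}")
```

The results land within about 0.001 ns of the report's 22.129 ns and 15.664 ns.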

set_input_delay -min timing analysis (hold)

Min Delay Paths
--------------------------------------------------------------------------------------
Slack (VIOLATED) :        -0.045ns  (arrival time - required time)
  Source:                 test_in
                            (input port clocked by theclk  {rise@0.000ns fall@10.000ns period=20.000ns})
  Destination:            test_samp_reg/D
                            (rising edge-triggered cell FDRE clocked by theclk  {rise@0.000ns fall@10.000ns period=20.000ns})
  Path Group:             theclk
  Path Type:              Hold (Min at Slow Process Corner)
  Requirement:            0.000ns  (theclk rise@0.000ns - theclk rise@0.000ns)
  Data Path Delay:        3.443ns  (logic 0.626ns (18.194%)  route 2.817ns (81.806%))
  Logic Levels:           1  (IBUF=1)
  Input Delay:            2.000ns
  Clock Path Skew:        5.351ns (DCD - SCD - CPR)
    Destination Clock Delay (DCD):    5.351ns
    Source Clock Delay      (SCD):    0.000ns
    Clock Pessimism Removal (CPR):    -0.000ns
  Clock Uncertainty:      0.035ns  ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
    Total System Jitter     (TSJ):    0.071ns
    Total Input Jitter      (TIJ):    0.000ns
    Discrete Jitter          (DJ):    0.000ns
    Phase Error              (PE):    0.000ns

    Location             Delay type                Incr(ns)  Path(ns)    Netlist Resource(s)
  -------------------------------------------------------------------    -------------------
                         (clock theclk rise edge)     0.000     0.000 r
                         input delay                  2.000     2.000
    AE20                                              0.000     2.000 r  test_in (IN)
                         net (fo=0)                   0.000     2.000    test_in
    AE20                 IBUF (Prop_ibuf_I_O)         0.626     2.626 r  test_in_IBUF_inst/O
                         net (fo=1, routed)           2.817     5.443    test_in_IBUF
    SLICE_X0Y1           FDRE                                         r  test_samp_reg/D
  -------------------------------------------------------------------    -------------------

                         (clock theclk rise edge)     0.000     0.000 r
    AE23                                              0.000     0.000 r  test_clk (IN)
                         net (fo=0)                   0.000     0.000    test_clk
    AE23                 IBUF (Prop_ibuf_I_O)         0.734     0.734 r  test_clk_IBUF_inst/O
                         net (fo=1, routed)           2.651     3.385    test_clk_IBUF
    BUFGCTRL_X0Y4        BUFG (Prop_bufg_I_O)         0.093     3.478 r  test_clk_IBUF_BUFG_inst/O
                         net (fo=2, routed)           1.873     5.351    test_clk_IBUF_BUFG
    SLICE_X0Y1           FDRE                                         r  test_samp_reg/C
                         clock pessimism              0.000     5.351
                         clock uncertainty            0.035     5.387
    SLICE_X0Y1           FDRE (Hold_fdre_C_D)         0.101     5.488    test_samp_reg
  -------------------------------------------------------------------
                         required time                         -5.488
                         arrival time                           5.443
  -------------------------------------------------------------------
                         slack                                 -0.045

This analysis starts at time zero, adds the 2 ns (clock-to-output) that was specified in the min input delay constraint, and continues that data path at the slowest possible combination of process, voltage and temperature. Together with the FPGA’s own data path delay (3.443 ns), the total data path delay stands at 5.443 ns. It should be no surprise that the FPGA’s own delay is bigger compared with the fast analysis above.

The clock path is then calculated, now with the slowest possible combination, starting from the same clock edge at 0 ns. After all, this is a hold calculation, so the question is whether the rug was pulled from under the sampling flip-flop’s feet before it managed to sample the data.

The clock travels from the input pin to the flip-flop (with no clock network delay compensation, since no PLL is involved), taking into account the calculated jitter. All in all, the clock path ends at 5.488 ns, which is 0.045 ns too late after the data switched. So the constraint was violated, with a negative slack of 0.045.

It’s simple to see from this analysis that the min input delay is the minimal clock-to-output, as it’s added to the data path. So it’s basically how early the data path may start. Note the “Min” part in the Path Type above.
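Here is the hold calculation redone in Python, as a sketch with my own variable names and the numbers from the Vivado report above. The last line is my own addition: it estimates the min input delay at which this particular path would just meet hold, everything else left as is. Because of the report's per-line rounding, expect differences of about 0.001 ns.

```python
# Values copied from the Vivado hold report above (all in ns)
input_delay = 2.000         # set_input_delay -min
fpga_data_delay = 3.443     # IBUF + routing, slow model
dest_clock_delay = 5.351    # input pin to test_samp_reg/C, slow model
clock_uncertainty = 0.035
hold_term = 0.101           # the Hold_fdre_C_D line in the report

arrival = input_delay + fpga_data_delay
required = dest_clock_delay + clock_uncertainty + hold_term
slack = arrival - required  # negative means violated

# Smallest min input delay that would bring the slack up to zero
min_delay_needed = required - fpga_data_delay

print(f"arrival={arrival:.3f} required={required:.3f} slack={slack:.3f}")
print(f"min input delay needed: {min_delay_needed:.3f}")
```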

It may come as a surprise that a 2 ns clock-to-output can violate a hold constraint. This shouldn’t be taken lightly — it can cause real problems.

The solution for this case would be to add a PLL to the clock path, which locks the global network’s clock to the input clock. This effectively means pulling it several nanoseconds earlier, which definitely solves the problem.

set_output_delay -max timing analysis (setup)

Slack (MET) :             2.983ns  (required time - arrival time)
  Source:                 test_out_reg/C
                            (rising edge-triggered cell FDRE clocked by theclk  {rise@0.000ns fall@10.000ns period=20.000ns})
  Destination:            test_out
                            (output port clocked by theclk  {rise@0.000ns fall@10.000ns period=20.000ns})
  Path Group:             theclk
  Path Type:              Max at Slow Process Corner
  Requirement:            20.000ns  (theclk rise@20.000ns - theclk rise@0.000ns)
  Data Path Delay:        3.631ns  (logic 2.583ns (71.152%)  route 1.047ns (28.848%))
  Logic Levels:           1  (OBUF=1)
  Output Delay:           8.000ns
  Clock Path Skew:        -5.351ns (DCD - SCD + CPR)
    Destination Clock Delay (DCD):    0.000ns = ( 20.000 - 20.000 )
    Source Clock Delay      (SCD):    5.351ns
    Clock Pessimism Removal (CPR):    0.000ns
  Clock Uncertainty:      0.035ns  ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
    Total System Jitter     (TSJ):    0.071ns
    Total Input Jitter      (TIJ):    0.000ns
    Discrete Jitter          (DJ):    0.000ns
    Phase Error              (PE):    0.000ns

    Location             Delay type                Incr(ns)  Path(ns)    Netlist Resource(s)
  -------------------------------------------------------------------    -------------------
                         (clock theclk rise edge)     0.000     0.000 r
    AE23                                              0.000     0.000 r  test_clk (IN)
                         net (fo=0)                   0.000     0.000    test_clk
    AE23                 IBUF (Prop_ibuf_I_O)         0.734     0.734 r  test_clk_IBUF_inst/O
                         net (fo=1, routed)           2.651     3.385    test_clk_IBUF
    BUFGCTRL_X0Y4        BUFG (Prop_bufg_I_O)         0.093     3.478 r  test_clk_IBUF_BUFG_inst/O
                         net (fo=2, routed)           1.873     5.351    test_clk_IBUF_BUFG
    SLICE_X0Y1           FDRE                                         r  test_out_reg/C
  -------------------------------------------------------------------    -------------------
    SLICE_X0Y1           FDRE (Prop_fdre_C_Q)         0.223     5.574 r  test_out_reg/Q
                         net (fo=1, routed)           1.047     6.622    test_out_OBUF
    AK21                 OBUF (Prop_obuf_I_O)         2.360     8.982 r  test_out_OBUF_inst/O
                         net (fo=0)                   0.000     8.982    test_out
    AK21                                                              r  test_out (OUT)
  -------------------------------------------------------------------    -------------------

                         (clock theclk rise edge)    20.000    20.000 r
                         clock pessimism              0.000    20.000
                         clock uncertainty           -0.035    19.965
                         output delay                -8.000    11.965
  -------------------------------------------------------------------
                         required time                         11.965
                         arrival time                          -8.982
  -------------------------------------------------------------------
                         slack                                  2.983

Since the purpose of this analysis is to measure the output delay, it starts off with the clock edge, follows it towards the flip-flop, and then along the data path. That sums up to the overall delay. Note that the “Path Type” doesn’t say it’s a setup calculation (to avoid confusion?) even though it takes the following clock (at 20 ns) into consideration.

The calculation takes place at the slowest possible combination of process, voltage and temperature (recall that the input setup calculation took place with the fastest one). Following the clock path, it’s evidently very similar to the clock path of the hold analysis for input delay, which is quite expected, as both are based upon the slow model.

The data path simply continues the clock path until the physical output is stable, calculated at 8.982 ns.

This is compared with the time of the following clock at 20 ns, minus the output delay (8 ns) and minus the possible jitter (0.035 ns in the case above). Data arrived at 8.982 ns, the moment that counts is at ~12 ns, so there’s almost 3 ns slack.

This demonstrates why set_output_delay -max is the setup time of the receiver: The output delay is subtracted from the following clock’s time position, and that’s the goal to meet. That’s exactly the definition of setup time: How long before the following clock the data must be stable.
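The numbers in the Vivado report add up as follows. This is a Python sketch with variable names of my own choice, and the values copied from the report above.

```python
# Values copied from the Vivado report above (all in ns)
source_clock_delay = 5.351   # input pin to test_out_reg/C, slow model
data_path_delay = 3.631      # flip-flop's clock-to-Q, routing and OBUF
latch_edge = 20.000          # the following clock edge
clock_uncertainty = 0.035
max_output_delay = 8.000     # set_output_delay -max

arrival = source_clock_delay + data_path_delay
required = latch_edge - clock_uncertainty - max_output_delay
slack = required - arrival

print(f"arrival={arrival:.3f} required={required:.3f} slack={slack:.3f}")
# prints: arrival=8.982 required=11.965 slack=2.983
```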

set_output_delay -min timing analysis (hold)

Slack (MET) :             0.791ns  (arrival time - required time)
  Source:                 test_out_reg/C
                            (rising edge-triggered cell FDRE clocked by theclk  {rise@0.000ns fall@10.000ns period=20.000ns})
  Destination:            test_out
                            (output port clocked by theclk  {rise@0.000ns fall@10.000ns period=20.000ns})
  Path Group:             theclk
  Path Type:              Min at Fast Process Corner
  Requirement:            0.000ns  (theclk rise@0.000ns - theclk rise@0.000ns)
  Data Path Delay:        1.665ns  (logic 1.384ns (83.159%)  route 0.280ns (16.841%))
  Logic Levels:           1  (OBUF=1)
  Output Delay:           -3.000ns
  Clock Path Skew:        -2.162ns (DCD - SCD - CPR)
    Destination Clock Delay (DCD):    0.000ns
    Source Clock Delay      (SCD):    2.162ns
    Clock Pessimism Removal (CPR):    -0.000ns
  Clock Uncertainty:      0.035ns  ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
    Total System Jitter     (TSJ):    0.071ns
    Total Input Jitter      (TIJ):    0.000ns
    Discrete Jitter          (DJ):    0.000ns
    Phase Error              (PE):    0.000ns

    Location             Delay type                Incr(ns)  Path(ns)    Netlist Resource(s)
  -------------------------------------------------------------------    -------------------
                         (clock theclk rise edge)     0.000     0.000 r
    AE23                                              0.000     0.000 r  test_clk (IN)
                         net (fo=0)                   0.000     0.000    test_clk
    AE23                 IBUF (Prop_ibuf_I_O)         0.077     0.077 r  test_clk_IBUF_inst/O
                         net (fo=1, routed)           1.278     1.355    test_clk_IBUF
    BUFGCTRL_X0Y4        BUFG (Prop_bufg_I_O)         0.026     1.381 r  test_clk_IBUF_BUFG_inst/O
                         net (fo=2, routed)           0.781     2.162    test_clk_IBUF_BUFG
    SLICE_X0Y1           FDRE                                         r  test_out_reg/C
  -------------------------------------------------------------------    -------------------
    SLICE_X0Y1           FDRE (Prop_fdre_C_Q)         0.100     2.262 r  test_out_reg/Q
                         net (fo=1, routed)           0.280     2.542    test_out_OBUF
    AK21                 OBUF (Prop_obuf_I_O)         1.284     3.826 r  test_out_OBUF_inst/O
                         net (fo=0)                   0.000     3.826    test_out
    AK21                                                              r  test_out (OUT)
  -------------------------------------------------------------------    -------------------

                         (clock theclk rise edge)     0.000     0.000 r
                         clock pessimism              0.000     0.000
                         clock uncertainty            0.035     0.035
                         output delay                 3.000     3.035
  -------------------------------------------------------------------
                         required time                         -3.035
                         arrival time                           3.826
  -------------------------------------------------------------------
                         slack                                  0.791

This analysis is similar to the max output delay, only it’s calculated on the fastest possible combination of process, voltage and temperature, and against the same clock edge (and not the following one). So again, going from setup to hold, these are reversed. Once again, the clock path is very similar to the clock path of the setup analysis for input delay, which is quite expected, as both are based upon the fast model.

As before, the data path continues the clock path until the physical output is stable, calculated at 3.826 ns (note the difference with the slow path!).

This is compared with the time of the same clock at 0 ns, minus the output delay, minus the possible jitter (0.035 ns in the case above, not clear why it’s counted if it’s the same clock cycle, but anyhow). Recall that the min output delay was negative (-3 ns), which is why it appears as a positive number in the calculation.

Conclusion: Data was stable until 3.826 ns, and needs to be stable until 3.035 ns. That’s fine, with a 0.791 ns slack.

This demonstrates why set_output_delay -min is minus the hold time of the receiver: Jitter aside, the given output delay with reversed sign is used as the time which the data path delay must exceed. In other words, the data must be stable for that long after the clock. This is the definition of hold time.
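And the hold counterpart, once again as a Python sketch with my own variable names and the report's figures. The report's "Data Path Delay" of 1.665 ns is itself a rounded sum, so the recalculated arrival time and slack come out about 0.001 ns off the printed ones.

```python
# Values copied from the Vivado report above (all in ns)
source_clock_delay = 2.162   # input pin to test_out_reg/C, fast model
data_path_delay = 1.665      # flip-flop's clock-to-Q, routing and OBUF
launch_edge = 0.000          # hold is checked against the same clock edge
clock_uncertainty = 0.035
min_output_delay = -3.000    # set_output_delay -min

arrival = source_clock_delay + data_path_delay
required = launch_edge + clock_uncertainty - min_output_delay  # 0 + 0.035 + 3
slack = arrival - required

print(f"arrival={arrival:.3f} required={required:.3f} slack={slack:.3f}")
```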

Meaning of set_input_delay and set_output_delay in SDC timing constraints

Introduction

Synopsys Design Constraints (SDC) has been adopted by Xilinx (in Vivado, as .xdc files), Altera (in Quartus, as .sdc files) and other FPGA vendors. Despite the wide use of this format, there seems to be some confusion regarding the constraints for defining I/O timing.

This post defines what they mean, and then shows the timing calculations made by Vivado and Quartus (in separate pages), demonstrating their meaning when implementing a very simple example design. So there’s no need to take my word for it, and this also gives a direction on how to check that your own constraints did what they were supposed to do.

There are several options to these constraints, but these are documented elsewhere. This post is about the basics.

And yes, it’s the same format with Xilinx and Altera. Compatibility. Unbelievable, but true.

What they mean

In short,

  • set_input_delay -clock … -max … : The maximal clock-to-output of the driving chip + board propagation delay
  • set_input_delay -clock … -min … : The minimal clock-to-output of the driving chip. If not given, choose zero (maybe a future revision of the driving chip will be manufactured with a really fast process)
  • set_output_delay -clock … -max … : The t_setup time of the receiving chip + board propagation delay
  • set_output_delay -clock … -min … : Minus the t_hold time of the receiving chip (e.g. set to -1 if the hold time is 1 ns).

Note that if neither -min nor -max is given, it’s like two assignments, one with -min and one with -max. In other words: Poor constraining.
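To make the mapping concrete, here is a small Python sketch that turns datasheet figures into the four constraints. The function name and its parameters are made up for this illustration (they're not part of any tool), and the datasheet numbers in the usage are hypothetical, chosen to match the constraint values used in this post's example.

```python
def io_constraints(clock, in_port, out_port,
                   tco_max, tco_min, t_setup, t_hold, board_delay=0.0):
    """Return SDC lines following the four rules listed above (times in ns)."""
    in_max = tco_max + board_delay    # max clock-to-output + board delay
    in_min = tco_min                  # min clock-to-output
    out_max = t_setup + board_delay   # receiver's setup time + board delay
    out_min = 0 - t_hold              # minus the receiver's hold time
    return [
        f"set_input_delay -clock {clock} -max {in_max:g} [get_ports {in_port}]",
        f"set_input_delay -clock {clock} -min {in_min:g} [get_ports {in_port}]",
        f"set_output_delay -clock {clock} -max {out_max:g} [get_ports {out_port}]",
        f"set_output_delay -clock {clock} -min {out_min:g} [get_ports {out_port}]",
    ]

# Hypothetical datasheet values: tco between 2 and 4 ns, setup 8 ns, hold 3 ns
lines = io_constraints("theclk", "test_in", "test_out",
                       tco_max=4, tco_min=2, t_setup=8, t_hold=3)
print("\n".join(lines))
```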

Always constrain both min and max

It may seem meaningless to use the min/max constraints. For example, using a catch-both single set_output_delay sets the setup time correctly, and the hold time to a negative value which is incorrect, but why bother? It allows the output port to toggle before the clock, but that couldn’t happen, could it?

Well, actually it can. For example, it’s quite common to let an FPGA PLL (or the like) generate the internal FPGA clock from the clock at some input pin (the “clock on the board”). This allows the PLL to align the clock on the FPGA’s internal clock network to the input clock, by time-shifting it slightly to compensate for the delay of the clock distribution network.

Actually, the implementation tools may feel free to shift the clock to slightly earlier than the clock input, in order to meet timing better: A slow path from logic to output may violate the maximal delay allowed from clock to output. Moving the clock earlier fixes this. But moving the internal clock to earlier than the clock on the board may switch other outputs that depend on the same clock to before the clock on the board toggles, leading to hold time violations on the receiver of these outputs. Nothing prevents this from happening, except a min output delay constraint.
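A toy illustration of this in Python. It's a simplified model of the hold check (clock uncertainty ignored), the function name is mine, and the toggle time is a hypothetical number: the output toggles 0.5 ns before the clock edge on the board, because the internal clock was pulled earlier.

```python
def hold_check_passes(output_toggle_time, min_output_delay):
    """Simplified hold check, ignoring clock uncertainty: the output must
    not toggle earlier than -min_output_delay (ns) relative to the clock
    edge on the board."""
    return output_toggle_time >= -min_output_delay

toggle_time = -0.5  # hypothetical: 0.5 ns BEFORE the board's clock edge

# A single catch-both "set_output_delay 8" also acts as -min 8
print(hold_check_passes(toggle_time, 8))    # True: the tool sees no problem
# An explicit -min -3 (receiver's hold time is 3 ns)
print(hold_check_passes(toggle_time, -3))   # False: the violation is caught
```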

Outline of example design

We’ll assume test_clk input clock, test_in input pin, and test_out output, with the following relationship:

   always @(posedge test_clk)
     begin
	test_samp <= test_in;
	test_out <= test_samp;
     end

No PLL is used to align the internal clock with the board’s test_clk, so there’s a significant clock delay.

And the following timing constraints applied in the SDC/XDC file:

create_clock -name theclk -period 20 [get_ports test_clk]
set_output_delay -clock theclk -max 8 [get_ports test_out]
set_output_delay -clock theclk -min -3 [get_ports test_out]
set_input_delay -clock theclk -max 4 [get_ports test_in]
set_input_delay -clock theclk -min 2 [get_ports test_in]
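To read these numbers the other way around, the following Python sketch translates the constraints above into what they demand at the pins, relative to the rising edge of test_clk on the board. The function and the dictionary keys are my own wording, not tool output.

```python
def pin_windows(period, out_max, out_min, in_max, in_min):
    """Translate the delay constraints into pin timing. All times in ns."""
    return {
        # test_out must be stable this long after the clock edge, at most
        "test_out stable by": period - out_max,
        # test_out must not change earlier than this after the clock edge
        "test_out unchanged until": -out_min,
        # test_in toggles somewhere inside this window after the clock edge
        "test_in changes from": in_min,
        "test_in changes until": in_max,
        # what is left for the FPGA to sample test_in before the next edge
        "test_in time left for setup": period - in_max,
    }


for name, value in pin_windows(period=20, out_max=8, out_min=-3,
                               in_max=4, in_min=2).items():
    print(f"{name}: {value} ns")
```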

As the tools’ timing calculations are rather long, they are on separate pages:

Quartus: Packing registers into I/O cells

Often I prefer to handle I/O timing simply by ensuring that all registers are pushed into the I/O cells. Where timing matters, that is.

It seems like I/O register packing isn’t the default in Quartus. Anyhow, here’s the lazy man’s recipe for this scenario.

First, disable timing checking on all I/Os. This will silence the unconstrained path warning during implementation, and in particular prevent the “TimeQuest Timing Analyzer” section in Quartus’ reports pane turning red:

So these two go to the SDC file:

set_false_path -from [get_ports]
set_false_path -to [get_ports]

And next, convince the fitter to push registers into the I/O block. In the QSF, add

set_instance_assignment -name FAST_OUTPUT_REGISTER ON -to *
set_instance_assignment -name FAST_INPUT_REGISTER ON -to *
set_instance_assignment -name FAST_OUTPUT_ENABLE_REGISTER ON -to *

So it’s somewhat aggressive to apply these assignments to absolutely everything, but it does the job. The fitter issues warnings for the I/O elements it fails to enforce these constraints on, which is actually a good thing.

To see how well it went, look in the “Resource Section” of the fitter report (possibly find it in Quartus’ reports pane) and look for “Input Registers” etc., whatever applies.

The difference is evident in timing reports of paths involving I/O cells. For example, compare this path which involves an I/O register:

+----------------------------------------------------------------------------------+
; Data Arrival Path                                                                ;
+---------+---------+----+------+--------+-----------------------+-----------------+
; Total   ; Incr    ; RF ; Type ; Fanout ; Location              ; Element         ;
+---------+---------+----+------+--------+-----------------------+-----------------+
; 2.918   ; 2.918   ;    ;      ;        ;                       ; data path       ;
;   0.000 ;   0.000 ;    ;      ; 1      ; DDIOOUTCELL_X3_Y0_N32 ; rst             ;
;   0.465 ;   0.465 ; RR ; CELL ; 1      ; DDIOOUTCELL_X3_Y0_N32 ; rst|q           ;
;   0.465 ;   0.000 ; RR ; IC   ; 1      ; IOOBUF_X3_Y0_N30      ; RESETB~output|i ;
;   2.918 ;   2.453 ; RR ; CELL ; 1      ; IOOBUF_X3_Y0_N30      ; RESETB~output|o ;
;   2.918 ;   0.000 ; RR ; CELL ; 0      ; PIN_P3                ; RESETB          ;
+---------+---------+----+------+--------+-----------------------+-----------------+

Note the DDIOOUTCELL element, and the zero increment in the routing between the register and the IOOBUF.

For comparison, here’s a path for which an I/O register wasn’t applied (prevented by logic):

+--------------------------------------------------------------------------------+
; Data Arrival Path                                                              ;
+---------+---------+----+------+--------+-----------------+---------------------+
; Total   ; Incr    ; RF ; Type ; Fanout ; Location        ; Element             ;
+---------+---------+----+------+--------+-----------------+---------------------+
; 8.284   ; 8.284   ;    ;      ;        ;                 ; data path           ;
;   0.000 ;   0.000 ;    ;      ; 1      ; FF_X3_Y0_N17    ; Dir_flop_sig        ;
;   0.496 ;   0.496 ; RR ; CELL ; 8      ; FF_X3_Y0_N17    ; Dir_flop_sig|q      ;
;   2.153 ;   1.657 ; RR ; IC   ; 1      ; IOOBUF_X3_Y0_N9 ; DATA[7]~output|oe   ;
;   8.284 ;   6.131 ; RF ; CELL ; 1      ; IOOBUF_X3_Y0_N9 ; DATA[7]~output|o    ;
;   8.284 ;   0.000 ; FF ; CELL ; 1      ; PIN_T3          ; DATA[7]             ;
+---------+---------+----+------+--------+-----------------+---------------------+

Here we see how a general-purpose flip-flop generates the signal, leading to a routing delay of 1.657 ns. The main problem is that this routing delay will differ from one implementation to the next, so if there’s a signal integrity issue with the board, the FPGA might be blamed for it, since different FPGA versions seem to fix the problem or make it reappear.
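When there are many outputs to check, the comparison made above can be automated. Below is a rough Python sketch that parses the rows of a “Data Arrival Path” table as pasted from the report, sums the routing (IC) increments and looks for the DDIOOUTCELL location. The function names are mine, and the parsing assumes the exact seven-column layout shown above; other report layouts will need adjustments.

```python
def parse_path(report):
    """Parse the rows of a 'Data Arrival Path' table into dictionaries."""
    rows = []
    for line in report.splitlines():
        cells = [c.strip() for c in line.strip().strip(";").split(";")]
        if len(cells) != 7:
            continue                # table borders and the title line
        try:
            total, incr = float(cells[0]), float(cells[1])
        except ValueError:
            continue                # the column header line
        rows.append({"total": total, "incr": incr, "type": cells[3],
                     "location": cells[5], "element": cells[6]})
    return rows


def routing_delay(rows):
    """Sum of the increments attributed to interconnect (IC)."""
    return sum(r["incr"] for r in rows if r["type"] == "IC")


def uses_io_register(rows):
    """True if the path goes through a DDIOOUTCELL, as in the first report."""
    return any(r["location"].startswith("DDIOOUTCELL") for r in rows)


# Rows copied from the two reports above
PACKED = """
;   0.465 ;   0.465 ; RR ; CELL ; 1      ; DDIOOUTCELL_X3_Y0_N32 ; rst|q           ;
;   0.465 ;   0.000 ; RR ; IC   ; 1      ; IOOBUF_X3_Y0_N30      ; RESETB~output|i ;
;   2.918 ;   2.453 ; RR ; CELL ; 1      ; IOOBUF_X3_Y0_N30      ; RESETB~output|o ;
"""
UNPACKED = """
;   0.496 ;   0.496 ; RR ; CELL ; 8      ; FF_X3_Y0_N17    ; Dir_flop_sig|q      ;
;   2.153 ;   1.657 ; RR ; IC   ; 1      ; IOOBUF_X3_Y0_N9 ; DATA[7]~output|oe   ;
;   8.284 ;   6.131 ; RF ; CELL ; 1      ; IOOBUF_X3_Y0_N9 ; DATA[7]~output|o    ;
"""

for report in (PACKED, UNPACKED):
    rows = parse_path(report)
    print(uses_io_register(rows), routing_delay(rows))
```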

Live TV / DVB on Linux demystified

Introduction

This is a not-so-short tutorial which is intended to make the setup of a Live TV media center on Linux a bit easier, walking through the processing chain from the digital transmission signal to the picture on the screen. Quite naturally, things go from “general knowledge” to a bit more hands-on. The approach here is to understand what you’re doing, something one can avoid when things just work by themselves. If you’re reading this, odds are they didn’t.

I suggest starting with the command line tools, as they give a lot of low-level information as they run. Once a TV channel can be viewed with these, I further suggest using Tvheadend as the backend, as it’s reasonably complicated to work with, and leaves a lot of control to the frontend software.

I also suggest a frontend (Kodi) for everyday TV viewing, and a way to configure it.

All in all, getting this to work is often a rather tedious process. This isn’t all that bad if one learns a few things on the way.

The DVB frontend (“adapter”)

Often with a USB interface, the DVB frontend receives the digital signal and turns it into a stream of bytes. Inside, it typically consists of a tuner, a demodulator and a USB interface chip. Often there’s a demux as well, which is discussed further on.

  • The tuner (the analog part) nails down a piece of the frequency spectrum with the digital signal in it, and makes it accessible to the demodulator. The signal may arrive from a simple UHF antenna, a satellite dish or from a cable network. In principle, there is no difference: It’s an analog signal that carries a bitstream of several Mb/s, and the tuner’s job is to bring the signal down to a known lower frequency, where the demodulator expects it to be. The tuner cares only about the signal’s center frequency, and possibly its bandwidth.
  • The demodulator (the digital communication part) turns the analog signal from the tuner into a stream of digital bits. This is the most sophisticated part, which includes a significant portion of signal processing and also decoding of error-correcting codes (and, of course, correcting bit errors if such are found and are correctable). All this is hidden from the end user, so all the demodulator tells us is typically if it’s locked on the signal or not. There are quite a few things that need to get synchronized properly, but all we get is something like “works” or “doesn’t work”. The demodulator may also supply information about the signal strength, the S/N ratio and the PostBER, which is an estimation of the bit error rate obtained after fixing bit errors by virtue of the error correction code. This estimation is possible because all but a fraction of the bits are recovered correctly, so the demodulator knows what its input signal would look like without noise and distortions, and so it can also tell how much noise comes in. And the S/N ratio is calculated accordingly.
  • The USB interface chip (the computer hardware part) is the less interesting part, but it’s what the computer sees. The driver is often named after it, even though it’s the other devices that are important. Its main functionalities are: Relaying the output of the demodulator to the computer via a bulk USB endpoint, and supplying means to control and read status from the demodulator and tuner, which is almost always done with an I2C/SMBus over USB kind of bridge. The I2C bus interface may also be used to download firmware to these two devices.

In a Linux system, the DVB adapter is represented with device files in /dev/dvb/adapter0/ (the index goes up if there are several of them). The notable file is /dev/dvb/adapter0/dvr0, from which the data stream is read, possibly with plain file I/O. In other words, when the adapter is set up, it’s possible to record a proper video clip with just “cat”, as shown in this post. /dev/dvb/adapter?/{frontend?,demux?} are used to control the device.
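The “plain file I/O” part means that a few lines of Python can do the same as “cat”, with a size limit added. This is a sketch only: the function name and the chunk size are arbitrary choices of mine, and it assumes that the adapter has already been tuned and locked by some other tool.

```python
import io


def record(src, dst, max_bytes, chunk_size=188 * 1024):
    """Copy up to max_bytes from src to dst, like a size-limited cat."""
    copied = 0
    while copied < max_bytes:
        chunk = src.read(min(chunk_size, max_bytes - copied))
        if not chunk:
            break               # end of stream
        dst.write(chunk)
        copied += len(chunk)
    return copied


# On a real system, something like this (not run here):
# with open("/dev/dvb/adapter0/dvr0", "rb") as src, \
#      open("clip.ts", "wb") as dst:
#     record(src, dst, 10 * 1024 * 1024)

# Stand-in for the device file, so the sketch runs anywhere
fake_dvr = io.BytesIO(bytes(1000))
clip = io.BytesIO()
print(record(fake_dvr, clip, 376))
```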

There may be other device files as well, such as net0 (for controlling network packet functionality) or ca0 for Conditional Access.

For more insight, this set of slides may come in handy.

What’s in the trunk?

Assuming that the DVB frontend is tuned and locked on a digital signal, there will be data in MPEG-TS format flowing out from /dev/dvb/adapter0/dvr0. Almost always, there are several TV/radio channels on a single digital transmission signal: The data stream is used to pass several types of packets, which may contain MPEG video or audio data, or other related data.

The packets that contain MPEG video or audio data are marked with an identifier, PID. In fact, watching a TV program consists of

  • Tuning the DVB adapter to a certain reception frequency, and locking its demodulator on the digital signal
  • Filtering out all packets belonging to a certain PID, and passing them on to an MPEG video decoder
  • Doing the same for another PID, and passing its packets on to an MPEG audio decoder
  • If there are subtitles, there’s another PID to filter out, and pass on to a subtitle rendering mechanism

So all in all, it’s a lot of packets multiplexed into a single stream of data, and it’s the receiver’s job to fish out those of interest. In order to make it easier, packets containing information that organizes the PIDs into services, i.e. TV and radio channels, are also transmitted on the same stream (in dedicated packets).
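Fishing out packets by PID is simple enough to sketch in Python. The details that aren’t mentioned above come from the MPEG-TS format itself: packets are 188 bytes long, start with the sync byte 0x47, and carry a 13-bit PID in the second and third bytes. The PID values and the make_packet() helper are made up for the demonstration.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


def packets(data):
    """Split a byte string into MPEG-TS packets, checking the sync byte."""
    for offset in range(0, len(data) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        packet = data[offset:offset + TS_PACKET_SIZE]
        if packet[0] != SYNC_BYTE:
            raise ValueError(f"lost sync at offset {offset}")
        yield packet


def pid_of(packet):
    """The 13-bit PID: low 5 bits of byte 1, followed by byte 2."""
    return ((packet[1] & 0x1F) << 8) | packet[2]


def select_pids(data, wanted):
    """Keep only the packets whose PID is in the wanted set."""
    return b"".join(p for p in packets(data) if pid_of(p) in wanted)


def make_packet(pid):
    """Hypothetical helper: a dummy packet with the given PID."""
    header = bytes([SYNC_BYTE, (pid >> 8) & 0x1F, pid & 0xFF, 0x10])
    return header + bytes(TS_PACKET_SIZE - len(header))


# Made-up PIDs: 0x100 for video, 0x101 for audio, 0x200 for another channel
stream = b"".join(make_packet(pid) for pid in (0x100, 0x200, 0x101, 0x100))
wanted = select_pids(stream, {0x100, 0x101})
print([hex(pid_of(p)) for p in packets(wanted)])
```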

The end of this post shows the output of a scan with the dvbv5-scan command-line utility. It lists the information obtained in a specific digital stream in an organized manner. One thing that may be surprising about this list, is that a single service (i.e. TV channel) may contain more than a single audio PID. Which isn’t all that odd, as some TV channels may have alternative sound tracks, e.g. in different languages.

There’s also the dvbsnoop utility, which shows and dissects the packets of an MPEG-TS stream. Only for those interested in the really gory details.

By the way, in Tvheadend’s terminology, the raw stream of data that arrives from the demodulator is called a mux. This is a highly confusing misnomer, which probably came from the idea that the packets in the stream are multiplexed. To the rest of the world, a “mux” is the machine that takes data from several sources and turns them into a stream. Which brings us to:

Demultiplexing

Assuming that the PIDs of the video and audio streams of the desired TV channel are known, there are two possibilities to filter them out:

  • Software demuxing: Tell the DVB adapter to send everything to the dvr0 device file, and fish out the matching packets with the media player (or some intermediate software). Actually, it simply means that the media player ignores all packets that don’t have the requested PIDs.
  • Hardware demuxing: Using e.g. the /dev/dvb/adapter0/demux0 device file to command the adapter to pass through only packets with certain PIDs to the /dev/dvb/adapter0/dvr0 output.

Software demuxing is the preferred choice for the typical domestic use, as it allows viewing more than one TV program at a time (assuming that both programs are on the same digital stream). Hardware demuxing is useful for viewing (or recording) TV with command-line utilities (see these two posts for examples of command line sessions).

Watching TV for real

Command-line utilities (see this, this, and this) can indeed be used to hack together some very basic kit for watching TV, and they are priceless for understanding why things went wrong with the fancier tools (hint: Somehow things always go wrong when video is involved).

But for everyday use, it’s best to let a TV streaming backend talk with the hardware. It allows fancy media center software as well as command line utilities to easily access TV channels. As of March 2017, I found Tvheadend + Kodi to be the best combination. In Kodi, I went for PVR IPTV Simple Client rather than Tvheadend’s own front end, as I’ll explain below.

So let’s first understand this backend / frontend business, and I’ll take Tvheadend as an example.

Tvheadend (formerly HTS TVheadend) is a TCP/IP server, which takes control of the DVB adapter(s). And it listens to two TCP/IP ports:

  • HTTP Port 9981: Plain browsers can connect to http://localhost:9981/ for administration and configuration + machine readable playlists and Electronic Program Guide on certain URLs (see below). But most important: Availability of all TV channels as MPEG-TS streams, in a protocol easily accessible by a lot of software, with a plain HTTP connection.
  • HTSP Port 9982: Home TV Streaming Protocol (invented for Tvheadend?), seems to be used only by a handful of Linux clients. It’s a one-stop shop for all TV related information, but my own experience was a bit lame. So I’ll leave this aside for now.

The installation of Tvheadend hence involves making it work with the DVB adapter (which is usually simple if everything works smoothly with the command line utilities) as well as setting up the server. It may not be all that easy (there are worse), but it’s worth the effort (maybe my own messy jots will help), because:

  • It allows simultaneous view of several TV channels, if they’re on the same digital stream (i.e. muxed on the same frequency channel). Watch one channel waiting for the commercials to end on another…
  • View TV from any computer on the (wireless?) LAN.
  • Virtually any media playing software supports its output format, MPEG-TS. Including stuff running on Windows.
  • There are a lot of other features (Electronic Program Guide via the web interface with a browser, recording), but I’m not sure if these belong to the backend. But they may be useful to some.

So let’s take a look at how Tvheadend conveys the TV channels to its clients. For this, I’ll assume that Tvheadend is properly installed, has been set up (through the web interface) to tune on some TV channels, and that it allows access without user/password from localhost (it’s a convenient setting, and it’s safe at least for 127.0.0.1/32). And that all access is done from the localhost (even though it can be any computer with HTTP access and due permissions. If so, replace “localhost” in the examples below with the IP or domain name of the server).

But first…

A word on IPTV / HLS (not really important for DVB, actually)

We’ll make a small detour to IPTV or HLS (HTTP Live Streaming), because Tvheadend does something similar. IPTV is the commonly used name for TV channels broadcast over the internet, whether it’s a live or a video-on-demand kind of broadcast.

An IPTV/HLS stream is essentially an MPEG-TS stream, similar to the DVB stream on air or cable. In order to make its broadcast over the web easier, it’s cut into chunks, each a few seconds long. The cuts are made on packet boundaries, so each chunk is a legal MPEG-TS segment by itself. A plain concatenation of several subsequent chunks (with e.g. “cat”) makes a perfectly playable MPEG-TS clip. Or stream.

Now to the IPTV client: To start off, the IPTV client is given an initial playlist (as a file or a URL to download this playlist from). That playlist is an M3U file, with one or several URLs, usually a TV channel for each. When the client accesses one of these URLs to start showing a channel, it often receives another playlist, which redirects it to other URLs, which in turn might redirect it further, and so on. These playlists are often set up to allow the client to choose different paths, depending on desired bit rate, display resolution, encoding format etc.

Eventually, possibly after a few redirection hops, the client ends with receiving a playlist containing a list of chunks, so it has the information on where to fetch chunks of MPEG-TS segments from. It starts fetching these chunks, concatenates them, and plays the video stream.

The complicated part of the HLS protocol is the traveling around playlists until the list of chunks is found. Once there, it’s just a matter of downloading those chunks, concatenating, and treating them as a DVB stream.
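The traveling between playlists can be sketched in Python as follows. It relies on two tags from the HLS playlist format: #EXT-X-STREAM-INF, which precedes the URL of another playlist, and #EXTINF, which precedes the URL of a chunk. The URLs and the playlists in FAKE_SERVER are invented, and fetch stands for whatever function downloads a URL’s content as text. A real client would also pick a variant by bit rate or resolution rather than just taking the first one.

```python
from urllib.parse import urljoin


def parse_hls(text, base_url):
    """Return ("master", playlist URLs) or ("media", chunk URLs)."""
    kind, urls, expect_uri = "media", [], False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#EXT-X-STREAM-INF"):
            kind, expect_uri = "master", True
        elif line.startswith("#EXTINF"):
            expect_uri = True
        elif not line.startswith("#") and expect_uri:
            urls.append(urljoin(base_url, line))   # resolve relative URLs
            expect_uri = False
    return kind, urls


def find_chunks(url, fetch, max_hops=5):
    """Follow playlists, starting at url, until a list of chunks is found."""
    for _ in range(max_hops):
        kind, urls = parse_hls(fetch(url), url)
        if kind == "media":
            return urls
        if not urls:
            break
        url = urls[0]       # simplistic: always take the first variant
    raise RuntimeError("no list of chunks found")


# Invented playlists standing in for a real server
FAKE_SERVER = {
    "http://example.com/tv/index.m3u8":
        "#EXTM3U\n"
        "#EXT-X-STREAM-INF:BANDWIDTH=800000\nlow/list.m3u8\n"
        "#EXT-X-STREAM-INF:BANDWIDTH=2500000\nhigh/list.m3u8\n",
    "http://example.com/tv/low/list.m3u8":
        "#EXTM3U\n#EXT-X-TARGETDURATION:6\n"
        "#EXTINF:6.0,\nchunk0.ts\n#EXTINF:6.0,\nchunk1.ts\n",
}

print(find_chunks("http://example.com/tv/index.m3u8", FAKE_SERVER.__getitem__))
```

Once the list of chunk URLs is at hand, downloading them in order and writing them one after the other into a file gives a playable MPEG-TS clip.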

Tvheadend’s IPTV-like interface

So M3U playlists are the name of the game. Tvheadend offers the TV and radio channels it exposes as an M3U playlist, available at http://localhost:9981/playlist . In my case (Israeli DVB-T), it starts like this:

#EXTM3U
#EXTINF:-1 tvg-id="38e914f04571f2a3f5c915872ba6e794",88FM

http://localhost:9981/stream/channelid/1880418616?ticket=B0E6E9AB06F41C13C0AEC87B7A88966BCBCCE8F4&profile=pass

#EXTINF:-1 tvg-id="219e62923848dac382ed7fcd35c4ed9e",Aleph

http://localhost:9981/stream/channelid/308452897?ticket=88F2FD731008F28454AB8FF7F75BF896FA1F9C7F&profile=pass

#EXTINF:-1 tvg-id="fd874e5286a13d161eb1fa011fb42731",Bet

http://localhost:9981/stream/channelid/1380878333?ticket=3BF6D86889B5DB3DFCAF2EF25D07894B653C3700&profile=pass

#EXTINF:-1 tvg-id="aae608ee725cad880781301f68592dbc",Ch 1

http://localhost:9981/stream/channelid/1846077098?ticket=116EEC70D4201D82BF2A0F1E9AB7387EC9E12D30&profile=pass

#EXTINF:-1 tvg-id="41d97066c9c97e348f83269a6b18e8e6",Ch 10

http://localhost:9981/stream/channelid/1718671681?ticket=757FAE1BA26561DAEDB659E707627366A5071B17&profile=pass

#EXTINF:-1 tvg-id="fc2a79daaca0afd9a39123d4d0305a1f",Ch 2

http://localhost:9981/stream/channelid/1517890300?ticket=BE2E496685F892A1036B3C982888D0778784BDB4&profile=pass

[ ... etc ... ]

After the #EXTM3U header, there is a pair of lines for each channel: The first line contains information about the channel (in particular the display name) and the second is the URL for accessing the channel. Unlike HLS/IPTV, this isn’t a go-find-another-playlist, but it directs immediately to where the video stream can be obtained.

The “tvg-id” tag is not common in playlist files in general, and it pairs the channel with its appearance in the EPG (more about that later). If you don’t have it, you probably have an old version of Tvheadend, which doesn’t support the EPG trickery I’m going to show below.
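Since the playlist is just pairs of lines, picking it apart takes little code. Here’s a Python sketch that turns it into a list of channels, with the display name, the tvg-id and the URL of each. The function name and the dictionary keys are my own, and the sample consists of two entries copied from the playlist above.

```python
import re


def parse_playlist(text):
    """Turn an M3U playlist into a list of channel dictionaries."""
    channels, info = [], None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#EXTINF"):
            info = line                     # the URL follows on a later line
        elif line and not line.startswith("#") and info is not None:
            tvg_id = re.search(r'tvg-id="([^"]*)"', info)
            channels.append({
                "name": info.partition(",")[2],   # text after the first comma
                "tvg_id": tvg_id.group(1) if tvg_id else None,
                "url": line,
            })
            info = None
    return channels


SAMPLE = """#EXTM3U
#EXTINF:-1 tvg-id="38e914f04571f2a3f5c915872ba6e794",88FM

http://localhost:9981/stream/channelid/1880418616?ticket=B0E6E9AB06F41C13C0AEC87B7A88966BCBCCE8F4&profile=pass

#EXTINF:-1 tvg-id="41d97066c9c97e348f83269a6b18e8e6",Ch 10

http://localhost:9981/stream/channelid/1718671681?ticket=757FAE1BA26561DAEDB659E707627366A5071B17&profile=pass
"""

for channel in parse_playlist(SAMPLE):
    print(channel["name"], channel["tvg_id"])
```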

As the URLs in the playlist are “for real”, a plain wget command can be used to record any of these channels. For example, recording from Channel 10:

$ wget -O mytvshow.ts 'http://localhost:9981/stream/channelid/1718671681?ticket=757FAE1BA26561DAEDB659E707627366A5071B17&profile=pass'
--2017-03-13 11:25:59--  http://localhost:9981/stream/channelid/1718671681?ticket=757FAE1BA26561DAEDB659E707627366A5071B17&profile=pass
Resolving localhost (localhost)... 127.0.0.1
Connecting to localhost (localhost)|127.0.0.1|:9981... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [video/mp2t]
Saving to: ‘mytvshow.ts’

mytvshow.ts           [           <=>      ]   3.10M   334KB/s

This will run in principle forever until stopped with CTRL-C. The mytvshow.ts file can be played with VLC, mplayer, ffplay or any other reasonable media player.

These URLs to the channels don’t change once Tvheadend has been set up. It’s therefore possible to download the playlist once, edit away unwanted channels, reorder the list (noticed the radio channels at the beginning of the playlist above?), possibly combine with “real” IPTV channels, and feed a media center player with the edited playlist file.

It’s also possible to give these URLs directly to VLC and other media players. Viewing multiple channels at once is as simple as opening several instances of VLC.

One word about what Tvheadend does behind the scenes. In response to the wget command above, the following went to /var/log/syslog:

Mar 13 11:25:59 tv tvheadend[6410]: mpegts: 538MHz in Idan Plus T - tuning on Realtek RTL2832 (DVB-T) : DVB-T #0
Mar 13 11:25:59 tv tvheadend[6410]: subscription: 0018: "HTTP" subscribing on channel "Ch 10", weight: 100, adapter: "Realtek RTL2832 (DVB-T) : DVB-T #0", network: "Idan Plus T", mux: "538MHz", provider: "Idan +", service: "Ch 10", profile="pass", hostname="127.0.0.1", client="Wget/1.17.1 (linux-gnu)"

Note that the HTTP connection resulted in a “subscription” to a certain channel within Tvheadend. This reflects the way Tvheadend mediates its resources, a single DVB adapter in this case, to fulfill requirements of subscribers requesting services.

Consequently, stopping the “recording” (pressing CTRL-C) resulted in

Mar 13 11:26:11 tv tvheadend[6410]: subscription: 0018: "HTTP" unsubscribing from "Ch 10", hostname="127.0.0.1", client="Wget/1.17.1 (linux-gnu)"

Needless to say, something similar happens when a media player opens a connection for streaming live TV.

EPG

A neat feature of DVB is that data for an Electronic Program Guide (EPG) is often embedded in the digital stream, so the name of the current program, along with a short description, is available when zapping to a new TV channel. As well as a TV guide to past and future programs, directly on the TV, shown neatly by the media center software.

There is probably no need to configure anything in Tvheadend to make this work. All those EPG grabbers available are tools for transferring information into Tvheadend, in its absence from the digital stream itself. In particular, if there’s satisfactory information in the “Electronic Program Guide” tab in Tvheadend’s web interface (http://localhost:9981/ with a browser), nothing needs to be fixed.

The common format for exchanging EPG information in Linux is XMLTV, which as its name implies, is in XML format. Tvheadend exports it at http://localhost:9981/xmltv or http://localhost:9981/xmltv/channels (accessing the former will cause an HTTP 302 redirection to the latter).

As of March 2017, this doesn’t work on Tvheadend versions available on the “stable” apt repositories. If attempting to access the URL for XMLTV from a browser results in “1 Unknown Code” appearing, an upgrade is required. Or no EPG will be available with the setup I suggest below.

An XMLTV file typically looks something like this:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE tv SYSTEM "xmltv.dtd">
<tv generator-info-name="TVHeadend-4.1-2405~geb495a0~xenial" source-info-name="tvh-Tvheadend">
<channel id="67a72084ee9a5ddb2fcd89129887bf78">
<display-name>Ch 99</display-name>
</channel>
<channel id="fa723385817605edc2b138d96c259b67">
<display-name>Ch 33</display-name>
</channel>
[ ... ]
<programme start="20170313113000 +0200" stop="20170313123000 +0200" channel="67a72084ee9a5ddb2fcd89129887bf78">
  <title lang="heb">ועדה ש.ח כת&apos; עב&apos; כת&apos; ער&apos;</title>
  <desc lang="heb">ועדה ש.ח כת&apos; עב&apos; כת&apos; ער&apos;
ועדה מיוחדת לפניות הציבור:
פניות ציבור בנושא התנהלות חברת החשמל בגביית תשלומים ומדיניות ניתוקי חשמל לצרכנים
</desc>
</programme>
<programme start="20170313121000 +0200" stop="20170313124500 +0200" channel="41d97066c9c97e348f83269a6b18e8e6">
  <title lang="heb">ראש בראש כת&apos; עב&apos; </title>
  <sub-title lang="heb">פרק 345</sub-title>
  <desc lang="heb">פרק 345
העיתונאי חגי סגל מארח ומתווכח. כת&apos; עב&apos;
</desc>
</programme>
<programme start="20170313124500 +0200" stop="20170313132000 +0200" channel="41d97066c9c97e348f83269a6b18e8e6">
  <title lang="heb">מעונן חלקית כת&apos; עב&apos; </title>
  <sub-title lang="heb">פרק 286</sub-title>
  <desc lang="heb">פרק 286
תחזית פוליטית: מבט אל השבוע הפוליטי והפרלמנטרי. מגישה: הדס לוי סצמסקי כת&apos; עב&apos;
</desc>
[ ... ]
</tv>

Note that the channel attribute of the last two programme entries above, 41d97066c9c97e348f83269a6b18e8e6, matches the tvg-id entry of Ch 10 in the playlist given above. This allows pairing between an MPEG-TS stream and its info in the XMLTV file, and hence displaying the current TV program info for its respective channel.
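The pairing can be tried out with a short Python sketch, which looks up the programme that is on at a given moment for a given tvg-id. The function name is mine, and the sample is a cut-down version of the XMLTV output above, with placeholder titles instead of the real ones.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

XMLTV_TIME = "%Y%m%d%H%M%S %z"      # e.g. "20170313121000 +0200"


def now_playing(xmltv_text, channel_id, when):
    """Return the title of the programme on channel_id at time when."""
    root = ET.fromstring(xmltv_text)
    for prog in root.iter("programme"):
        if prog.get("channel") != channel_id:
            continue
        start = datetime.strptime(prog.get("start"), XMLTV_TIME)
        stop = datetime.strptime(prog.get("stop"), XMLTV_TIME)
        if start <= when < stop:
            return prog.findtext("title")
    return None                     # nothing listed for that moment


# Times and channel ID as in the listing above; titles are placeholders
SAMPLE = """<tv>
<programme start="20170313121000 +0200" stop="20170313124500 +0200" channel="41d97066c9c97e348f83269a6b18e8e6">
  <title lang="heb">Programme A</title>
</programme>
<programme start="20170313124500 +0200" stop="20170313132000 +0200" channel="41d97066c9c97e348f83269a6b18e8e6">
  <title lang="heb">Programme B</title>
</programme>
</tv>"""

when = datetime.strptime("20170313125000 +0200", XMLTV_TIME)
print(now_playing(SAMPLE, "41d97066c9c97e348f83269a6b18e8e6", when))
```

The channel_id argument is the tvg-id taken from the playlist, which is how a front end ties the stream’s URL to its programme info.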

Using Kodi as the front end

Kodi is a convenient front end for viewing TV on a media center computer, living room style. I suggest using the PVR IPTV Simple Client with a local file playlist, in particular because of the simplicity of this solution. And that it works so well.

The setup is fairly straightforward. First, install the plugin:

$ sudo apt-get install kodi-pvr-iptvsimple

and then, after having Kodi up and running, enable and set up the IPTV Simple Client as follows:

  • Change setting level to Advanced
  • System > Settings > Enable TV
  • Enable and Configure PVR IPTV Simple Client (System > Settings > Add-ons > My add-ons > PVR Clients > PVR IPTV Simple Client). Set the playlist to local file, and pick the edited playlist file (as suggested above).
  • Moving on to the EPG Settings tab, set Location to Remote Path, and XMLTV URL to http://localhost:9981/xmltv. As mentioned above, this requires a version of Tvheadend that supports XMLTV export. Check it manually with a browser.

The EPG interface isn’t necessary to watch TV properly, but makes Kodi display what’s on each channel in a neat way. As far as I know, the only alternative way to have EPG working with Kodi and Tvheadend is the Tvheadend Kodi plugin, which gave me errors all the time with the 4.09 version of Tvheadend.

Summary

Use Kodi if you want it to look like a set-top-box, or vlc, ffplay or mplayer for a more computerish experience. Tvheadend gives a simple and robust interface to all of these, leaving the gory details to be forgotten. Once you’ve been through the setup, of course.

If Tvheadend doesn’t play ball, go for the command-line utilities.

And if all of this takes forever to complete, remember: TV is a waste of time either way.