A few notes on where to find USB related kernel files on a Linux system (kernel 3.12.20 in my case)
$ lsusb
[ ... ]
Bus 001 Device 059: ID 046d:c52b Logitech, Inc.
Now find the position in the tree. It should be device 59 under bus number 1:
$ lsusb -t
[ ... ]
/:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=ehci-pci/6p, 480M
    |__ Port 4: Dev 4, If 0, Class=hub, Driver=hub/4p, 480M
        |__ Port 1: Dev 59, If 0, Class=HID, Driver=usbhid, 12M
        |__ Port 1: Dev 59, If 1, Class=HID, Driver=usbhid, 12M
        |__ Port 1: Dev 59, If 2, Class=HID, Driver=usbhid, 12M
        |__ Port 3: Dev 98, If 0, Class=vend., Driver=pl2303, 12M
    |__ Port 6: Dev 94, If 0, Class=vend., Driver=rt2800usb, 480M
So it’s bus 1, hub on port 4 and then port 1. Verify by checking the IDs (the paths can be much shorter, see below):
$ cat /sys/bus/usb/devices/usb1/1-4/1-4.1/idVendor
046d
$ cat /sys/bus/usb/devices/usb1/1-4/1-4.1/idProduct
c52b
or look at the individual interfaces:
$ cat /sys/bus/usb/devices/usb1/1-4/1-4.1/1-4.1\:1.2/bInterfaceClass
03
or get everything in one go, with the “uevent” file:
$ cat /sys/bus/usb/devices/usb1/1-4/1-4.1/uevent
MAJOR=189
MINOR=56
DEVNAME=bus/usb/001/059
DEVTYPE=usb_device
DRIVER=usb
PRODUCT=46d/c52b/1209
TYPE=0/0/0
BUSNUM=001
DEVNUM=059
Even though “uevent” was originally intended for generating a udev event by writing to it, reading from it provides the variables supplied to the udev mechanism. The DRIVER entry, if present, contains the driver currently assigned to the device (or interface), and is absent if no such driver is assigned (e.g. after an rmmod of the relevant module). It will usually not contain anything interesting except when looking at directories of interfaces, because all other parts of the hierarchy are USB infrastructure, driven by drivers for such.
The device file accessed for raw userspace I/O with a USB device (with e.g. libusb) is in /dev/bus/usb/ followed by the bus number and address. For example, the Logitech device mentioned above is at bus 1, address 59 (note DEVNAME in the uevent file), hence
$ ls -l /dev/bus/usb/001/059
crw-rw-r-- 1 root root 189, 58 2017-05-17 09:57 /dev/bus/usb/001/059
Note the permissions and major/minor numbers. The major is 189 (usb_devices on my system, according to /proc/devices). The minor is ((bus_number - 1) * 128) + (address - 1).
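The arithmetic above is easy to check with a few lines of Python. This is just a sketch of the formula as stated in this post; the function names are made up for the illustration.

```python
def usb_minor(bus_number: int, address: int) -> int:
    """Minor number of the device file, using the formula given above."""
    return (bus_number - 1) * 128 + (address - 1)


def usb_dev_path(bus_number: int, address: int) -> str:
    """Path of the raw device file: bus and address, zero-padded to 3 digits."""
    return f"/dev/bus/usb/{bus_number:03d}/{address:03d}"
```

For the Logitech device (bus 1, address 59), usb_minor(1, 59) gives 58, which matches the ls -l output above.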
The permissions and ownership are those in effect for who’s allowed to access this device. This is the place to check if udev rules that allow wider access to a device have done their job.
/sys/bus/usb/devices/
This was mentioned briefly above, and now let’s do the deep dive. The sysfs structure for USB devices is rather tangled, because each device is referenced in several ways: through the host controller it’s connected to (typically a PCI/PCIe device on a PC), as the device itself, and as the interfaces it provides.
It helps to note that those numeric-dash-dot-colon directory names actually contain all information about the position in the USB bus hierarchy, and all of these are present directly in /sys/bus/usb/devices, each as a symbolic link.
Also in /sys/bus/usb/devices, there are usbN directories, each representing a USB root hub, with N being the USB bus number. One can travel down the USB bus hierarchy starting from the usbN directories, and find the directories that those symbolic links point at, in a directory hierarchy that mirrors the bus hierarchy.
So let’s look, for example, at a directory named 1-5.1.4:1.3.
- The “1-” part means bus number one.
- The “5.1.4” part describes the path through hubs until the device is reached: port number 5 of the root hub, port 1 of the hub connected to it, and port 4 of the hub connected to that one. Without any physical hubs, this part is just one digit, so one gets those short “1-4” names.
Note that the chain of ports is delimited by dots. It seems like there used to be dashes a long time ago, so it would read “5-1-4” instead. But that’s probably ancient history.
- Then we have the “:1.3” part, which means interface number 3 on the device running in configuration number 1.
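The naming scheme described in the list above can be taken apart mechanically. The following is a minimal sketch, assuming names of exactly the form discussed here (bus, dot-separated port chain, optional “:configuration.interface” suffix); the function name and the dictionary keys are my own choice. Root hub names like “usb1” are deliberately rejected.

```python
import re

# bus "-" port chain, then an optional ":config.interface" suffix
_NAME = re.compile(r"(\d+)-(\d+(?:\.\d+)*)(?::(\d+)\.(\d+))?")


def parse_usb_sysfs_name(name):
    """Split a name like '1-5.1.4:1.3' into its components."""
    m = _NAME.fullmatch(name)
    if m is None:
        raise ValueError("not a USB device/interface name: %r" % name)
    bus, ports, config, interface = m.groups()
    return {
        "bus": int(bus),
        "ports": [int(p) for p in ports.split(".")],
        # Both are None for a device directory (no colon suffix)
        "configuration": None if config is None else int(config),
        "interface": None if interface is None else int(interface),
    }
```

So “1-5.1.4:1.3” yields bus 1, ports [5, 1, 4], configuration 1 and interface 3, while “1-4” yields bus 1, ports [4] and no interface information.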
This specific directory can be found in /sys/bus/usb/devices/usb1/1-5/1-5.1/1-5.1.4/1-5.1.4:1.3/, where it appears to be a plain directory, or as /sys/bus/usb/devices/1-5.1.4:1.3, where it appears to be a symbolic link to ../../../devices/pci0000:00/0000:00:14.0/usb1/1-5/1-5.1/1-5.1.4/1-5.1.4:1.3/. But the symbolic link actually points at the former, because /sys/devices/pci0000:00/0000:00:14.0/usb1/ and /sys/bus/usb/devices/usb1/ are exactly the same. The bonus with having the symbolic link pointing at the PCI device is that we can tell which PCI/PCIe device it belongs to.
Part of the reason for this mess is that the sysfs directory tree is a representation of the references between device structures inside the Linux kernel. Since these structures point at each other in every possible direction, so does the directory structure, sometimes with symbolic links, and sometimes with identical entries.
Messy or not, this directory structure allows traveling down the USB bus tree quite easily. For example, starting from /sys/bus/usb/devices/usb1/, one can travel down all the way to 1-5.1.4:1.3, each directory providing product and vendor IDs in both numerical and string format. Except for the final leaf (with the name including a colon-suffix, e.g. :1.3) which represents an interface, so it carries different information (about endpoints, for example).
The numbers in the directories in sysfs relate to the physical topology, and should not be confused with the bus address that is assigned to each device. The only thing they have in common is the bus number, and I’m not sure that can be trusted either. But in reality, that initial “1” and the “usb1” part in the path actually represent the bus number of all devices in that hierarchy. Recall that all devices that are connected to a USB root port have the same bus number, even if there are hubs in between (unlike PCI/PCIe and switches).
Ah, and once again: “usb1” means USB bus 1. If you were tempted to interpret this as a USB protocol level, well, no.
To obtain the enumerated addresses (those that are used to talk with the device, and appear with a plain lsusb), read the “uevent” file, which even supplies the path in /dev. Or read the “busnum” and “devnum” files in each directory. Now one can ask if “busnum” is redundant, since it’s supposed to be known from the directory path itself. But one could likewise ask what “devpath” is doing there, as it consists of the part that comes after the dash in the directory name. Go figure.
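Since “uevent” is just a list of KEY=VALUE lines, pulling the address and the /dev path out of it takes very little code. Here is a sketch, assuming the file’s content has already been read into a string; the function names are hypothetical.

```python
def parse_uevent(text):
    """Turn the KEY=VALUE lines of a uevent file into a dictionary."""
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # Skip lines without an "=" sign
            result[key] = value
    return result


def dev_node(uevent):
    """Device file path, based upon the DEVNAME entry."""
    return "/dev/" + uevent["DEVNAME"]
```

Feeding this with the uevent output shown earlier in this post gives /dev/bus/usb/001/059, and BUSNUM and DEVNUM convert to 1 and 59 with int().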
/sys/bus/usb/drivers/
But hey, it’s not over yet. The USB devices are also divided according to their drivers. This happens in /sys/bus/usb/drivers, which is a nice place to start if you’re looking for a device doing a specific task, hence easily found by its driver.
Back to the example above, /sys/bus/usb/drivers/usb-storage has a symbolic link named 1-5.1.4:1.3, pointing at ../../../../devices/pci0000:00/0000:00:14.0/usb1/1-5/1-5.1/1-5.1.4/1-5.1.4:1.3/, which is exactly the same leaf directory as before. Not surprisingly, the driver is attached to an interface, and not to a device. So this provides the entire path to the functional part, going from the PCI entry to the USB bus, and all the way down. If we want the bus address, fetch it from the leaf directory’s parent directory, 1-5.1.4 in this case.
The dedicated usb-storage directory also has the “bind” and “unbind” files, which allow detaching the driver from the USB interface and re-attaching it. This may be equivalent to unplugging the device and plugging it back, but not necessarily: it will result in a certain level of re-initialization, but not as full as detaching the device completely (or unbinding the USB controller from the PCI bus).
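Using these two files boils down to writing the interface’s directory name (e.g. 1-5.1.4:1.3) into them. A minimal sketch follows, assuming it runs as root and that the driver directory is something like /sys/bus/usb/drivers/usb-storage; the function name is made up.

```python
from pathlib import Path


def rebind(driver_dir, interface):
    """Detach the driver from an interface, then attach it again.

    driver_dir: e.g. /sys/bus/usb/drivers/usb-storage (an assumption)
    interface:  the interface's directory name, e.g. 1-5.1.4:1.3
    """
    driver = Path(driver_dir)
    (driver / "unbind").write_text(interface)
    (driver / "bind").write_text(interface)
```

Calling rebind("/sys/bus/usb/drivers/usb-storage", "1-5.1.4:1.3") would then be the software counterpart of the partial re-initialization described above.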
Note that a device can have multiple interfaces, which are possibly handled by different drivers. For example, a camera can function as webcam for showing a live picture, but also as a mass storage device for exposing the SD card. It’s still one USB device, with one Vendor / Product ID, and with a single pool of endpoints. So a device may appear under several directories of /sys/bus/usb/drivers/.
Try lsusb -t and lsusb -vv. And now also appreciate what this utility does…
Introduction
This post outlines some technical details on accessing an Altera EPCQ flash from a Nios II processor for read, write and erase. A non-OS (“bare metal”) setting is assumed.
And as a bonus (at the bottom of this post), how to program the flash based upon a SOF file, both with JTAG and by writing directly.
Remote Update is discussed in another post.
Hardware setup
In the Qsys project, there should be an instance of the Legacy EPCS/EPCQx1 Flash Controller, configured with the default parameters (not that there is much to configure). The peripheral’s epcs_control_port should be connected to the Nios II’s data master Avalon port (no point connecting it to the instruction master too).
In this example, we’ll assume that the name of Flash Controller in Qsys is epcs_flash_controller_0.
The interrupt signal isn’t used in the software setting given below, but as the connection to the Nios processor, as well as the interrupt number assignment is automatic, let it be.
Clock and reset — like the other peripherals.
The external conduit is connected as follows to an EPCQ flash, for x1 access:
- Flash pin DATA0 to epcs_flash_controller_0_sdo (FPGA pin ASDO)
- Flash pin DCLK to epcs_flash_controller_0_dclk (FPGA pin DCLK)
- Flash pin nCS to epcs_flash_controller_0_sce (FPGA pin NCSO)
- Flash pin DATA1 to epcs_flash_controller_0_data (FPGA pin DATA0)
The FPGA pins above relate to the dual use of the configuration pins, which allows the FPGA to configure in Active Serial (AS) x 1 mode. Once the configuration is done, these pins become general-purpose I/O (when so required by assignments), which allows regular access to the flash device.
Note that the flash pin DATA1 is connected to the FPGA pin DATA0 — this is not a mistake, but the correct wiring for AS x 1 interface.
It’s of course possible to connect the flash to regular I/O pins, but then the FPGA won’t be able to configure from the flash.
Software
Altera’s BSP includes drivers for flash operations with multiple layers of abstraction. This abstraction is not always necessary, and makes it somewhat difficult to figure out what’s going on (in particular when things go wrong). Most notably, the higher-level drivers erase flash sectors automatically before writing, which can cause some counterintuitive behavior, for example if multiple write requests are made on the same sector.
I therefore prefer working with the lowest-level drivers, which merely translate the flash commands into SPI communication. It leaves the user with the responsibility to erase sectors before writing to them.
The rule is simple: The flash is divided into sectors of 64 kB each. An erase operation is performed on such a 64 kB sector, leaving all its bytes in all-1’s (all bytes are 0xff).
Writing can then be done to arbitrary addresses, but effectively the data in the flash is the written data ANDed with the previous content of the memory cells. Which means a plain write, if the region has been previously erased. It’s commonly believed that it’s unhealthy for the flash to write to a byte cell twice without an erase in the middle.
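To make the erase / write rule concrete, here is a toy model of it in Python. It’s only a simulation of the behavior described above (erase sets a whole 64 kB sector to 0xff, a write ANDs the new data with the existing content) and has nothing to do with Altera’s drivers; the class and method names are made up.

```python
SECTOR_SIZE = 64 * 1024


class FlashModel:
    """Toy model of the flash: erase to 0xff, writes AND into the cells."""

    def __init__(self, size):
        self.mem = bytearray(b"\xff" * size)

    def erase_sector(self, address):
        # Erase the whole sector that contains the given address
        base = address - address % SECTOR_SIZE
        self.mem[base:base + SECTOR_SIZE] = b"\xff" * SECTOR_SIZE

    def write(self, address, data):
        # A write can only clear bits, never set them
        for i, value in enumerate(data):
            self.mem[address + i] &= value

    def read(self, address, length):
        return bytes(self.mem[address:address + length])
```

Writing 0x78 to an erased byte gives 0x78, but writing 0x0f on top of it without an erase leaves 0x08 (0x78 AND 0x0f), not 0x0f.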
This is a simple program that runs on the Nios II processor, which demonstrates read, write and erase.
#include <system.h>
#include <alt_types.h>
#include <io.h>
#include "sys/alt_stdio.h"
#include "epcs_commands.h"

static void hexprint(alt_u8 *buf, int num) {
  int i;
  const char hexes[] = "0123456789abcdef";

  for (i = 0; i < num; i++) {
    alt_putchar(hexes[(buf[i] >> 4) & 0xf]);
    alt_putchar(hexes[buf[i] & 0xf]);
    if ((i & 0xf) == 0xf)
      alt_putchar(10); // "\n"
    else
      alt_putchar(32); // " "
  }
  alt_putchar(10); // "\n"
}

int main()
{
  alt_u32 register_base = EPCS_FLASH_CONTROLLER_0_BASE + EPCS_FLASH_CONTROLLER_0_REGISTER_OFFSET;
  alt_u32 silicon_id;
  alt_u8 buf[256];
  alt_u32 junk = 0x12345678;
  const alt_u32 flash_address = 0x100000;

  silicon_id = epcs_read_device_id(register_base);
  alt_printf("ID = %x\n", silicon_id);

  // epcs_read_buffer always returns the length of the buffer, so no
  // point checking its return value.
  alt_printf("Before doing anything:\n");
  epcs_read_buffer(register_base, flash_address, buf, sizeof(buf), 0);
  hexprint(buf, 16);

  // epcs_sector_erase erases the 64 kiB sector that contains the address
  // given as its second argument, and waits for the erasure to complete
  // by polling the status register and waiting for the WIP (write in progress)
  // bit to clear.
  epcs_sector_erase(register_base, flash_address, 0);
  alt_printf("After erasing\n");
  epcs_read_buffer(register_base, flash_address, buf, sizeof(buf), 0);
  hexprint(buf, 16);

  // epcs_write_buffer must be used on a region previously erased. The
  // command waits for the operation to complete by polling the status
  // register and waiting for the WIP (write in progress) bit to clear.
  epcs_write_buffer(register_base, flash_address, (void *) &junk, sizeof(junk), 0);
  alt_printf("After writing\n");
  epcs_read_buffer(register_base, flash_address, buf, sizeof(buf), 0);
  hexprint(buf, 16);

  /* Event loop never exits. */
  while (1);

  return 0;
}
The program reads 256 bytes each time, even though only 16 bytes are displayed. Any byte count is allowed in read and write. Needless to say, flash_address can be changed to any address in the device’s range. The choice of 0x100000 kept it off the configuration bitstream for the relevant FPGA.
This is the output of the program above running against an EPCQ16:
ID = 20ba15
Before doing anything:
78 56 34 12 ff ff ff ff ff ff ff ff ff ff ff ff
After erasing
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
After writing
78 56 34 12 ff ff ff ff ff ff ff ff ff ff ff ff
The data in the “Before doing anything” part can be anything that was left in the flash when the program ran. In the case above, it’s the results of the previous run of the same program.
As a side note, all EPCQ flashes also support erasing subsectors, each of 4 kiB size (hence 16 subsectors per sector). Altera’s low-level drivers don’t support subsector erase, but it’s quite easy to expand the code to do so.
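When planning which regions get erased, it helps to know which sector and subsector an address falls into. This is a small sketch of that calculation, based only on the 64 kiB and 4 kiB sizes mentioned above; the function name is made up.

```python
SECTOR_SIZE = 64 * 1024
SUBSECTOR_SIZE = 4 * 1024


def locate(address):
    """Return (sector index, subsector index within the sector,
    base address of the subsector) for a flash address."""
    sector = address // SECTOR_SIZE
    subsector = (address % SECTOR_SIZE) // SUBSECTOR_SIZE
    subsector_base = address - address % SUBSECTOR_SIZE
    return sector, subsector, subsector_base
```

The flash_address used in the program above, 0x100000, is the first byte of sector 16, so the erase in that program covers 0x100000 to 0x10ffff.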
Programming the flash with a SOF file
As promised, here’s the outline of how to program the EPCQ flash with a bitstream configuration file. Not as fancy as the topic above, but nevertheless useful. The flash needs to be connected as follows:
- Flash pin DATA0 to FPGA pin ASDO
- Flash pin DCLK to FPGA pin DCLK
- Flash pin nCS to FPGA pin NCSO
- Flash pin DATA1 to FPGA pin DATA0 (once again, this is not a mistake. DATA1 to DATA0 indeed)
First things first: Generate a JIC file. Command-line style, e.g.:
quartus_cpf -c -d EPCQ16 -s EP4CE15 projname.sof projname.jic
In the example above, the EPCQ16 argument is the flash device, and the EP4CE15 is the FPGA that will be used to program the flash, which is most likely the same FPGA the SOF targets.
Or do it with the GUI:
- In Quartus, pick File > Convert Programming File…
- Choose jic output file format, and set the output file name.
- Set the configuration device to e.g. EPCQ16, Active Serial (not x4).
- Pick the SOF Data row, Page_0, click Add File… and pick the SOF file.
- Pick the Flash Loader and click Add Device…, and choose e.g. Cyclone IV E, and then the same device as listed for the SOF file.
- If you want to write to the flash with your own utility, check “Create Config data RPD”
- Click Generate. A window saying the JIC file has been generated successfully should appear.
- Click Close to close this tool.
Programming the flash with JTAG:
- Open the regular JTAG programmer in Quartus (not the one in Eclipse). The one used to configure the FPGA via JTAG with a bitstream, that is.
- Click Add File… and select the JIC file created above.
- The FPGA with its flash attached should appear in the diagram part of the window.
- Select the Program/Configure checkbox on the flash’s (e.g. EPCQ16) row
- Click Start.
- This should take some 10 seconds or so (for EP4CE15’s bitfile), and end successfully.
- The flash is now programmed.
Note that there’s an “Erase” checkbox on the flash’s row. There is no need to enable it along with Program/Configure: The Programmer gets the hint, and erases the flash before programming it.
Programming the flash with NIOS software (or similar)
Note that I have another post focusing on remote update.
To program the flash with your own utility, make sure that you’ve checked “Create Config data RPD” when generating the JIC. Then, using the flash API mentioned above, copy the RPD file into the flash from address 0 to make it load when the FPGA powers up, or to a higher address for using the bitstream with a Remote Update core (allowing configuration from higher addresses).
And note the following, which relates to my experience with using the EPCQ16 flash for AS configuring a Cyclone IV E FPGA, and running Quartus Prime Version 15.1.0 Build 185 (YMMV):
- Bit reversal is mandatory if epcs_write_buffer() is used for writing to the flash (or any other Nios API, I suppose). That means that for each byte in the RPD file, move bit 7 to bit 0, bit 6 to bit 1 etc. There are small hints of bit reversal spread out in the docs, for example, in the “Read Bytes Operation” section of the Quad-Serial Configuration (EPCQ) Devices Datasheet.
- All my attempts to generate RBF or RPD files in other ways, including using the command line tool (quartus_cpf) to create an RBF from the SOF or an RPD from a POF, failed. That is, I got RBF and RPD files, but they were slightly different from the file that eventually worked. In particular, the RBF file obtained with
quartus_cpf -c project.sof project.rbf
was almost identical to the RPD file that worked properly, with a few bytes different in the 0x20-0x4f positions of the files. And that difference probably made the FPGA refuse to configure from it. Go figure.
- If you’re really into generating the flash image with command line tools, generate a COF file (containing the configuration parameters) with the GUI, and use it with something like
quartus_cpf -c project.cof
The trick about this COF is that it should generate a proper JIC file, but have the <auto_create_rpd> part set to “1”.
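The bit reversal mentioned in the list above can be done on the host before the data is sent to the Nios processor. Here is a sketch in Python, assuming the whole RPD file fits in memory; the function names are made up, and nothing here is taken from Altera’s tools.

```python
def reverse_bits(byte):
    """Move bit 7 to bit 0, bit 6 to bit 1 etc. in a single byte."""
    result = 0
    for _ in range(8):
        result = (result << 1) | (byte & 1)
        byte >>= 1
    return result


# Lookup table: entry N is N with its bits in reversed order
_REVERSED = bytes(reverse_bits(value) for value in range(256))


def reverse_rpd(data):
    """Return a copy of the data with the bits of each byte reversed."""
    return data.translate(_REVERSED)
```

With this, something like reverse_rpd(open("project.rpd", "rb").read()) would produce the image to hand over to epcs_write_buffer(), the file name being just an example.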
And finally, just a few sources I found (slightly unrelated):
- Srunner is a command line utility for programming an EPCS flash. Since source code is given, it can give some insights, as well as its documentation.
- The format of POF files is outlined in fmt_pof.pdf.
Using an EPCQ16A device instead
The EPCQ16 device is obsolete, and replaced with EPCQ16A. Unfortunately, the AN822 Migration Guide supplies confusing and rather discouraging information, but at the end of the day, it’s a drop-in replacement for all purposes mentioned above. Except that it replies with an ID = ef4015 instead of the 20ba15 shown above. Which is fine, because it’s only the lower 8 bits that Altera / Intel stand behind. The other 16 bits are considered junk data during “dummy clock cycles” according to the datasheet (even though they are taken seriously somewhere in Altera’s old drivers, don’t ask me where I saw it).
The Migration Guide lists different Altera IP cores related to the flash, and points at which are compatible and which are not. The Legacy EPCS/EPCQx1 Flash Controller isn’t mentioned at all in this list, but as this controller is merely an SPI controller, it’s the opcode compatibility that matters. According to the Migration Guide, the relevant opcodes remain the same, which is probably all that matters: The 4BYTEADDREN/4BYTEADDEX commands that are gone in EPCQA are never used (the flash writing application never requests 4-byte write), and the 0x0b / 0xeb (fast read commands) aren’t even listed in epcs_commands.h.
Bottom line: No problem using the “A” version in the usage scenarios shown above.
It worked all so nicely on my Fedora 12 machine, and then on Ubuntu 14.04.1 it failed colossally:
$ make
gcc -Wall -O3 -g -lusb-1.0 -c -o bulkread.o bulkread.c
gcc -Wall -O3 -g -lusb-1.0 -c -o usberrors.o usberrors.c
gcc -Wall -O3 -g -lusb-1.0 bulkread.o usberrors.o -o bulkread
bulkread.o: In function `main':
bulkread.c:39: undefined reference to `libusb_init'
bulkread.c:46: undefined reference to `libusb_set_debug'
bulkread.c:48: undefined reference to `libusb_open_device_with_vid_pid'
[ ... ]
And it went on and on. Note that there was no complaint about not finding the library, and yet it failed to find the symbols.
The problem was the position of the -l flag. It turns out that Ubuntu silently adds an --as-needed flag to the linker, which effectively means that the -l flag must appear after the object file that needs the symbols, or it will be effectively ignored.
So the correct way is:
$ make
gcc -Wall -O3 -g -c -o bulkread.o bulkread.c
gcc -Wall -O3 -g -c -o usberrors.o usberrors.c
gcc -Wall -O3 -g bulkread.o usberrors.o -o bulkread -lusb-1.0
It’s all about the flag’s position…
Emacs’ (and hence XEmacs’) VHDL mode has an annoying habit of hopping in and “helping me” with composing code. Type “if” and it tells me I need to add an expression. Thanks. I wouldn’t have figured it out myself.
So here’s how to disable this annoyance:
Add in ~/.xemacs/custom.el, to the custom-set-variables clause
'(vhdl-electric-mode nil)
'(vhdl-stutter-mode nil)
or turn off the respective options inside XEmacs, under VHDL > Options > Mode, and then VHDL > Options > Save Options
And enjoy the bliss of an editor doing what it’s supposed to do.
OK, what’s this?
This page is the example part of another post, which explains the meaning of set_input_delay and set_output_delay in SDC timing constraints.
TimeQuest (Quartus’ timing analyzer) performs a four-corner check (max/min temperature, max/min voltage) and picks the worst slack. In the examples below, the worst case of these four corners is shown. It’s not always clear why a particular delay model turns out to be the worst case.
Another post of mine discusses the generation of timing reports as shown below.
As mentioned in the other post, the relevant timing constraints were:
create_clock -name theclk -period 20 [get_ports test_clk]
set_output_delay -clock theclk -max 8 [get_ports test_out]
set_output_delay -clock theclk -min -3 [get_ports test_out]
set_input_delay -clock theclk -max 4 [get_ports test_in]
set_input_delay -clock theclk -min 2 [get_ports test_in]
set_input_delay -max timing analysis (setup)
Delay Model:
Slow 1100mV 0C Model
+------------------------------------------------------------------------------------------------------+
; Summary of Paths ;
+--------+-----------+-----------+--------------+-------------+--------------+------------+------------+
; Slack ; From Node ; To Node ; Launch Clock ; Latch Clock ; Relationship ; Clock Skew ; Data Delay ;
+--------+-----------+-----------+--------------+-------------+--------------+------------+------------+
; 12.341 ; test_in ; test_samp ; theclk ; theclk ; 20.000 ; 3.940 ; 7.499 ;
+--------+-----------+-----------+--------------+-------------+--------------+------------+------------+
Path #1: Setup slack is 12.341
===============================================================================
+--------------------------------+
; Path Summary ;
+--------------------+-----------+
; Property ; Value ;
+--------------------+-----------+
; From Node ; test_in ;
; To Node ; test_samp ;
; Launch Clock ; theclk ;
; Latch Clock ; theclk ;
; Data Arrival Time ; 11.499 ;
; Data Required Time ; 23.840 ;
; Slack ; 12.341 ;
+--------------------+-----------+
+---------------------------------------------------------------------------------------+
; Statistics ;
+---------------------------+--------+-------+-------------+------------+-------+-------+
; Property ; Value ; Count ; Total Delay ; % of Total ; Min ; Max ;
+---------------------------+--------+-------+-------------+------------+-------+-------+
; Setup Relationship ; 20.000 ; ; ; ; ; ;
; Clock Skew ; 3.940 ; ; ; ; ; ;
; Data Delay ; 7.499 ; ; ; ; ; ;
; Number of Logic Levels ; ; 1 ; ; ; ; ;
; Physical Delays ; ; ; ; ; ; ;
; Arrival Path ; ; ; ; ; ; ;
; Clock ; ; ; ; ; ; ;
; Clock Network (Lumped) ; ; 1 ; 0.000 ; ; 0.000 ; 0.000 ;
; Data ; ; ; ; ; ; ;
; IC ; ; 2 ; 2.447 ; 33 ; 0.000 ; 2.447 ;
; Cell ; ; 2 ; 5.052 ; 67 ; 0.652 ; 4.400 ;
; Required Path ; ; ; ; ; ; ;
; Clock ; ; ; ; ; ; ;
; Clock Network (Lumped) ; ; 1 ; 3.940 ; 100 ; 3.940 ; 3.940 ;
+---------------------------+--------+-------+-------------+------------+-------+-------+
Note: Negative delays are omitted from totals when calculating percentages
+-----------------------------------------------------------------------------------+
; Data Arrival Path ;
+----------+---------+----+------+--------+-------------------+---------------------+
; Total ; Incr ; RF ; Type ; Fanout ; Location ; Element ;
+----------+---------+----+------+--------+-------------------+---------------------+
; 0.000 ; 0.000 ; ; ; ; ; launch edge time ;
; 0.000 ; 0.000 ; ; ; ; ; clock path ;
; 0.000 ; 0.000 ; R ; ; ; ; clock network delay ;
; 4.000 ; 4.000 ; F ; iExt ; 1 ; PIN_AP17 ; test_in ;
; 11.499 ; 7.499 ; ; ; ; ; data path ;
; 4.000 ; 0.000 ; FF ; IC ; 1 ; IOIBUF_X48_Y0_N58 ; test_in~input|i ;
; 8.400 ; 4.400 ; FF ; CELL ; 1 ; IOIBUF_X48_Y0_N58 ; test_in~input|o ;
; 10.847 ; 2.447 ; FF ; IC ; 1 ; FF_X48_Y2_N40 ; test_samp|asdata ;
; 11.499 ; 0.652 ; FF ; CELL ; 1 ; FF_X48_Y2_N40 ; test_samp ;
+----------+---------+----+------+--------+-------------------+---------------------+
+-------------------------------------------------------------------------------+
; Data Required Path ;
+----------+---------+----+------+--------+---------------+---------------------+
; Total ; Incr ; RF ; Type ; Fanout ; Location ; Element ;
+----------+---------+----+------+--------+---------------+---------------------+
; 20.000 ; 20.000 ; ; ; ; ; latch edge time ;
; 23.940 ; 3.940 ; ; ; ; ; clock path ;
; 23.940 ; 3.940 ; R ; ; ; ; clock network delay ;
; 23.840 ; -0.100 ; ; ; ; ; clock uncertainty ;
; 23.840 ; 0.000 ; ; uTsu ; 1 ; FF_X48_Y2_N40 ; test_samp ;
+----------+---------+----+------+--------+---------------+---------------------+
This analysis starts in “Data Arrival Path” with setting the input port (test_in) at 4 ns as specified in the max input delay constraint, and continues that data path. Together with the FPGA’s own data path delay (7.499 ns), the total data path delay stands at 11.499 ns.
The clock path is then calculated in “Data Required Path”, starting from the following clock edge at 20 ns. The clock travels from the input pin to the flip-flop (with no clock network delay compensation, since no PLL is involved), taking into account the calculated jitter. All in all, the clock path ends at 23.840 ns, which is 12.341 ns after the data arrived at the flip-flop, and this is the constraint’s slack.
It’s simple to see from this analysis that the max input delay is the clock-to-output ( + board delay), as it’s the starting time of the data path.
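The arithmetic behind this setup report can be reproduced in a few lines. This is only a sketch that plugs in the numbers from the tables above; it says nothing about how TimeQuest works internally, and the function and parameter names are made up. Decimal is used to avoid floating point rounding noise.

```python
from decimal import Decimal as D


def setup_slack(input_delay_max, data_delay, latch_edge,
                clock_delay, clock_uncertainty):
    """Return (data arrival time, data required time, slack)."""
    arrival = input_delay_max + data_delay
    required = latch_edge + clock_delay - clock_uncertainty
    return arrival, required, required - arrival


arrival, required, slack = setup_slack(
    D("4.000"),   # set_input_delay -max
    D("7.499"),   # Data path delay inside the FPGA
    D("20.000"),  # Latch edge: one clock period after the launch edge
    D("3.940"),   # Clock network delay to the flip-flop
    D("0.100"),   # Clock uncertainty
)
```

This gives an arrival time of 11.499 ns, a required time of 23.840 ns and a slack of 12.341 ns, as in the report.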
set_input_delay -min timing analysis (hold)
Delay Model:
Slow 1100mV 85C Model
+-----------------------------------------------------------------------------------------------------+
; Summary of Paths ;
+-------+-----------+-----------+--------------+-------------+--------------+------------+------------+
; Slack ; From Node ; To Node ; Launch Clock ; Latch Clock ; Relationship ; Clock Skew ; Data Delay ;
+-------+-----------+-----------+--------------+-------------+--------------+------------+------------+
; 0.770 ; test_in ; test_samp ; theclk ; theclk ; 0.000 ; 4.287 ; 3.057 ;
+-------+-----------+-----------+--------------+-------------+--------------+------------+------------+
Path #1: Hold slack is 0.770
===============================================================================
+--------------------------------+
; Path Summary ;
+--------------------+-----------+
; Property ; Value ;
+--------------------+-----------+
; From Node ; test_in ;
; To Node ; test_samp ;
; Launch Clock ; theclk ;
; Latch Clock ; theclk ;
; Data Arrival Time ; 5.057 ;
; Data Required Time ; 4.287 ;
; Slack ; 0.770 ;
+--------------------+-----------+
+--------------------------------------------------------------------------------------+
; Statistics ;
+---------------------------+-------+-------+-------------+------------+-------+-------+
; Property ; Value ; Count ; Total Delay ; % of Total ; Min ; Max ;
+---------------------------+-------+-------+-------------+------------+-------+-------+
; Hold Relationship ; 0.000 ; ; ; ; ; ;
; Clock Skew ; 4.287 ; ; ; ; ; ;
; Data Delay ; 3.057 ; ; ; ; ; ;
; Number of Logic Levels ; ; 1 ; ; ; ; ;
; Physical Delays ; ; ; ; ; ; ;
; Arrival Path ; ; ; ; ; ; ;
; Clock ; ; ; ; ; ; ;
; Clock Network (Lumped) ; ; 1 ; 0.000 ; ; 0.000 ; 0.000 ;
; Data ; ; ; ; ; ; ;
; IC ; ; 2 ; 2.028 ; 66 ; 0.000 ; 2.028 ;
; Cell ; ; 2 ; 1.029 ; 34 ; 0.290 ; 0.739 ;
; Required Path ; ; ; ; ; ; ;
; Clock ; ; ; ; ; ; ;
; Clock Network (Lumped) ; ; 1 ; 4.287 ; 100 ; 4.287 ; 4.287 ;
+---------------------------+-------+-------+-------------+------------+-------+-------+
Note: Negative delays are omitted from totals when calculating percentages
+----------------------------------------------------------------------------------+
; Data Arrival Path ;
+---------+---------+----+------+--------+-------------------+---------------------+
; Total ; Incr ; RF ; Type ; Fanout ; Location ; Element ;
+---------+---------+----+------+--------+-------------------+---------------------+
; 0.000 ; 0.000 ; ; ; ; ; launch edge time ;
; 0.000 ; 0.000 ; ; ; ; ; clock path ;
; 0.000 ; 0.000 ; R ; ; ; ; clock network delay ;
; 2.000 ; 2.000 ; R ; iExt ; 1 ; PIN_AP17 ; test_in ;
; 5.057 ; 3.057 ; ; ; ; ; data path ;
; 2.000 ; 0.000 ; RR ; IC ; 1 ; IOIBUF_X48_Y0_N58 ; test_in~input|i ;
; 2.739 ; 0.739 ; RR ; CELL ; 1 ; IOIBUF_X48_Y0_N58 ; test_in~input|o ;
; 4.767 ; 2.028 ; RR ; IC ; 1 ; FF_X48_Y2_N40 ; test_samp|asdata ;
; 5.057 ; 0.290 ; RR ; CELL ; 1 ; FF_X48_Y2_N40 ; test_samp ;
+---------+---------+----+------+--------+-------------------+---------------------+
+------------------------------------------------------------------------------+
; Data Required Path ;
+---------+---------+----+------+--------+---------------+---------------------+
; Total ; Incr ; RF ; Type ; Fanout ; Location ; Element ;
+---------+---------+----+------+--------+---------------+---------------------+
; 0.000 ; 0.000 ; ; ; ; ; latch edge time ;
; 4.287 ; 4.287 ; ; ; ; ; clock path ;
; 4.287 ; 4.287 ; R ; ; ; ; clock network delay ;
; 4.287 ; 0.000 ; ; ; ; ; clock uncertainty ;
; 4.287 ; 0.000 ; ; uTh ; 1 ; FF_X48_Y2_N40 ; test_samp ;
+---------+---------+----+------+--------+---------------+---------------------+
This analysis starts in “Data Arrival Path” with setting the input port (test_in) at 2 ns as specified in the min input delay constraint, and continues that data path. Together with the FPGA’s own data path delay (3.057 ns), the total data path delay stands at 5.057 ns.
The clock path is then calculated in “Data Required Path”, starting from the same clock edge at 0 ns. After all, this is a hold calculation, so the question is whether the rug was pulled from under the sampling flip-flop’s feet before it managed to sample the data.
The clock travels from the input pin to the flip-flop (with no clock network delay compensation, since no PLL is involved), taking into account the calculated jitter. All in all, the clock path ends at 4.287 ns, which is 0.770 ns earlier than the data switching, which is also the slack.
It’s simple to see from this analysis that the min input delay is the minimal clock-to-output, as it’s the starting time of the data path.
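The same kind of check works for the hold report, except that the slack is the data arrival time minus the required time, and the latch edge is the same as the launch edge. Once again, this is just a sketch with the numbers from the tables above, with made-up function and parameter names.

```python
from decimal import Decimal as D


def hold_slack(input_delay_min, data_delay, latch_edge,
               clock_delay, clock_uncertainty):
    """Return (data arrival time, data required time, slack)."""
    arrival = input_delay_min + data_delay
    required = latch_edge + clock_delay + clock_uncertainty
    return arrival, required, arrival - required


arrival, required, slack = hold_slack(
    D("2.000"),  # set_input_delay -min
    D("3.057"),  # Data path delay inside the FPGA
    D("0.000"),  # Latch edge: same edge as the launch edge
    D("4.287"),  # Clock network delay to the flip-flop
    D("0.000"),  # Clock uncertainty, as it appears in this report
)
```

The result is an arrival time of 5.057 ns, a required time of 4.287 ns and a slack of 0.770 ns, matching the report.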
set_output_delay -max timing analysis (setup)
Delay Model:
Slow 1100mV 85C Model
+--------------------------------------------------------------------------------------------------------+
; Summary of Paths ;
+-------+---------------+----------+--------------+-------------+--------------+------------+------------+
; Slack ; From Node ; To Node ; Launch Clock ; Latch Clock ; Relationship ; Clock Skew ; Data Delay ;
+-------+---------------+----------+--------------+-------------+--------------+------------+------------+
; 2.651 ; test_out~reg0 ; test_out ; theclk ; theclk ; 20.000 ; -5.320 ; 3.929 ;
+-------+---------------+----------+--------------+-------------+--------------+------------+------------+
Path #1: Setup slack is 2.651
===============================================================================
+------------------------------------+
; Path Summary ;
+--------------------+---------------+
; Property ; Value ;
+--------------------+---------------+
; From Node ; test_out~reg0 ;
; To Node ; test_out ;
; Launch Clock ; theclk ;
; Latch Clock ; theclk ;
; Data Arrival Time ; 9.249 ;
; Data Required Time ; 11.900 ;
; Slack ; 2.651 ;
+--------------------+---------------+
+---------------------------------------------------------------------------------------+
; Statistics ;
+---------------------------+--------+-------+-------------+------------+-------+-------+
; Property ; Value ; Count ; Total Delay ; % of Total ; Min ; Max ;
+---------------------------+--------+-------+-------------+------------+-------+-------+
; Setup Relationship ; 20.000 ; ; ; ; ; ;
; Clock Skew ; -5.320 ; ; ; ; ; ;
; Data Delay ; 3.929 ; ; ; ; ; ;
; Number of Logic Levels ; ; 0 ; ; ; ; ;
; Physical Delays ; ; ; ; ; ; ;
; Arrival Path ; ; ; ; ; ; ;
; Clock ; ; ; ; ; ; ;
; Clock Network (Lumped) ; ; 1 ; 5.320 ; 100 ; 5.320 ; 5.320 ;
; Data ; ; ; ; ; ; ;
; IC ; ; 1 ; 0.000 ; 0 ; 0.000 ; 0.000 ;
; Cell ; ; 3 ; 3.929 ; 100 ; 0.000 ; 2.150 ;
; uTco ; ; 1 ; 0.000 ; 0 ; 0.000 ; 0.000 ;
; Required Path ; ; ; ; ; ; ;
; Clock ; ; ; ; ; ; ;
; Clock Network (Lumped) ; ; 1 ; 0.000 ; ; 0.000 ; 0.000 ;
+---------------------------+--------+-------+-------------+------------+-------+-------+
Note: Negative delays are omitted from totals when calculating percentages
+---------------------------------------------------------------------------------------+
; Data Arrival Path ;
+---------+---------+----+------+--------+------------------------+---------------------+
; Total ; Incr ; RF ; Type ; Fanout ; Location ; Element ;
+---------+---------+----+------+--------+------------------------+---------------------+
; 0.000 ; 0.000 ; ; ; ; ; launch edge time ;
; 5.320 ; 5.320 ; ; ; ; ; clock path ;
; 5.320 ; 5.320 ; R ; ; ; ; clock network delay ;
; 9.249 ; 3.929 ; ; ; ; ; data path ;
; 5.320 ; 0.000 ; ; uTco ; 1 ; DDIOOUTCELL_X48_Y0_N50 ; test_out~reg0 ;
; 7.099 ; 1.779 ; FF ; CELL ; 1 ; DDIOOUTCELL_X48_Y0_N50 ; test_out~reg0|q ;
; 7.099 ; 0.000 ; FF ; IC ; 1 ; IOOBUF_X48_Y0_N42 ; test_out~output|i ;
; 9.249 ; 2.150 ; FF ; CELL ; 1 ; IOOBUF_X48_Y0_N42 ; test_out~output|o ;
; 9.249 ; 0.000 ; FF ; CELL ; 0 ; PIN_AN17 ; test_out ;
+---------+---------+----+------+--------+------------------------+---------------------+
+--------------------------------------------------------------------------+
; Data Required Path ;
+----------+---------+----+------+--------+----------+---------------------+
; Total ; Incr ; RF ; Type ; Fanout ; Location ; Element ;
+----------+---------+----+------+--------+----------+---------------------+
; 20.000 ; 20.000 ; ; ; ; ; latch edge time ;
; 20.000 ; 0.000 ; ; ; ; ; clock path ;
; 20.000 ; 0.000 ; R ; ; ; ; clock network delay ;
; 19.900 ; -0.100 ; ; ; ; ; clock uncertainty ;
; 11.900 ; -8.000 ; F ; oExt ; 0 ; PIN_AN17 ; test_out ;
+----------+---------+----+------+--------+----------+---------------------+
Since the purpose of this analysis is to measure the output delay, it starts off in “Data Arrival Path” with the clock edge, adds the clock network delay to the flip-flop, and then goes along the data path until the physical output is stable, calculated at 9.249 ns.
This is compared with the time of the following clock edge at 20 ns, minus the output delay and minus the possible jitter (0.1 ns in the case above). Data arrived at 9.249 ns and the moment that counts is at 11.9 ns, so there’s a 2.651 ns slack.
This demonstrates why set_output_delay -max is the setup time of the receiver: The output delay is subtracted from the following clock’s time position, and that’s the goal to meet. That’s exactly the definition of setup time: How long before the following clock the data must be stable.
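As a sanity check, here is a minimal sketch that rebuilds the two sides of this setup comparison from the rows of the report above. The numbers are the report's; the variable names are made up for the illustration.

```python
# Setup check for test_out, rebuilt from the rows of the report above (ns).

clock_network_delay = 5.320  # clock pin to the output register
data_delay = 3.929           # register to the physical pin
latch_edge = 20.000          # the following clock edge
clock_uncertainty = 0.100
max_output_delay = 8.000     # from set_output_delay -max

data_arrival = clock_network_delay + data_delay
data_required = latch_edge - clock_uncertainty - max_output_delay
setup_slack = data_required - data_arrival

print(f"arrival {data_arrival:.3f}, required {data_required:.3f}, "
      f"slack {setup_slack:.3f}")
# prints: arrival 9.249, required 11.900, slack 2.651
```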
set_output_delay -min timing analysis (hold)
Delay Model:
Fast 1100mV 0C Model
+--------------------------------------------------------------------------------------------------------+
; Summary of Paths ;
+-------+---------------+----------+--------------+-------------+--------------+------------+------------+
; Slack ; From Node ; To Node ; Launch Clock ; Latch Clock ; Relationship ; Clock Skew ; Data Delay ;
+-------+---------------+----------+--------------+-------------+--------------+------------+------------+
; 1.275 ; test_out~reg0 ; test_out ; theclk ; theclk ; 0.000 ; -2.255 ; 2.020 ;
+-------+---------------+----------+--------------+-------------+--------------+------------+------------+
Path #1: Hold slack is 1.275
===============================================================================
+------------------------------------+
; Path Summary ;
+--------------------+---------------+
; Property ; Value ;
+--------------------+---------------+
; From Node ; test_out~reg0 ;
; To Node ; test_out ;
; Launch Clock ; theclk ;
; Latch Clock ; theclk ;
; Data Arrival Time ; 4.275 ;
; Data Required Time ; 3.000 ;
; Slack ; 1.275 ;
+--------------------+---------------+
+---------------------------------------------------------------------------------------+
; Statistics ;
+---------------------------+--------+-------+-------------+------------+-------+-------+
; Property ; Value ; Count ; Total Delay ; % of Total ; Min ; Max ;
+---------------------------+--------+-------+-------------+------------+-------+-------+
; Hold Relationship ; 0.000 ; ; ; ; ; ;
; Clock Skew ; -2.255 ; ; ; ; ; ;
; Data Delay ; 2.020 ; ; ; ; ; ;
; Number of Logic Levels ; ; 0 ; ; ; ; ;
; Physical Delays ; ; ; ; ; ; ;
; Arrival Path ; ; ; ; ; ; ;
; Clock ; ; ; ; ; ; ;
; Clock Network (Lumped) ; ; 1 ; 2.255 ; 100 ; 2.255 ; 2.255 ;
; Data ; ; ; ; ; ; ;
; IC ; ; 1 ; 0.000 ; 0 ; 0.000 ; 0.000 ;
; Cell ; ; 3 ; 2.020 ; 100 ; 0.000 ; 1.296 ;
; uTco ; ; 1 ; 0.000 ; 0 ; 0.000 ; 0.000 ;
; Required Path ; ; ; ; ; ; ;
; Clock ; ; ; ; ; ; ;
; Clock Network (Lumped) ; ; 1 ; 0.000 ; ; 0.000 ; 0.000 ;
+---------------------------+--------+-------+-------------+------------+-------+-------+
Note: Negative delays are omitted from totals when calculating percentages
+---------------------------------------------------------------------------------------+
; Data Arrival Path ;
+---------+---------+----+------+--------+------------------------+---------------------+
; Total ; Incr ; RF ; Type ; Fanout ; Location ; Element ;
+---------+---------+----+------+--------+------------------------+---------------------+
; 0.000 ; 0.000 ; ; ; ; ; launch edge time ;
; 2.255 ; 2.255 ; ; ; ; ; clock path ;
; 2.255 ; 2.255 ; R ; ; ; ; clock network delay ;
; 4.275 ; 2.020 ; ; ; ; ; data path ;
; 2.255 ; 0.000 ; ; uTco ; 1 ; DDIOOUTCELL_X48_Y0_N50 ; test_out~reg0 ;
; 2.979 ; 0.724 ; RR ; CELL ; 1 ; DDIOOUTCELL_X48_Y0_N50 ; test_out~reg0|q ;
; 2.979 ; 0.000 ; RR ; IC ; 1 ; IOOBUF_X48_Y0_N42 ; test_out~output|i ;
; 4.275 ; 1.296 ; RR ; CELL ; 1 ; IOOBUF_X48_Y0_N42 ; test_out~output|o ;
; 4.275 ; 0.000 ; RR ; CELL ; 0 ; PIN_AN17 ; test_out ;
+---------+---------+----+------+--------+------------------------+---------------------+
+-------------------------------------------------------------------------+
; Data Required Path ;
+---------+---------+----+------+--------+----------+---------------------+
; Total ; Incr ; RF ; Type ; Fanout ; Location ; Element ;
+---------+---------+----+------+--------+----------+---------------------+
; 0.000 ; 0.000 ; ; ; ; ; latch edge time ;
; 0.000 ; 0.000 ; ; ; ; ; clock path ;
; 0.000 ; 0.000 ; R ; ; ; ; clock network delay ;
; 0.000 ; 0.000 ; ; ; ; ; clock uncertainty ;
; 3.000 ; 3.000 ; R ; oExt ; 0 ; PIN_AN17 ; test_out ;
+---------+---------+----+------+--------+----------+---------------------+
This analysis is similar to the max output delay, only it’s calculated against the same clock edge (and not the following one).
As before, the data path continues the clock path until the physical output is stable, calculated at 4.275 ns.
This is compared with the time of the same clock at 0 ns, minus the output delay. Recall that the min output delay was negative (-3 ns), which is why it appears as a positive number in the calculation.
Conclusion: Data was stable until 4.275 ns, and needs to be stable until 3 ns. That’s fine, with a 1.275 ns slack.
This demonstrates why set_output_delay -min is minus the hold time of the receiver: The given output delay with reversed sign is used as the time which the data path delay must exceed. In other words, the data must be stable for that long after the clock. This is the definition of hold time.
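The sign reversal is easy to get wrong, so here is a short sketch of the same hold calculation. The values are taken from the report above; the names are mine.

```python
# Hold check for test_out, rebuilt from the rows of the report above (ns).

clock_network_delay = 2.255  # fast model
data_delay = 2.020           # register to the physical pin, fast model
latch_edge = 0.000           # same clock edge as the launch edge
clock_uncertainty = 0.000    # as listed in the Data Required Path
min_output_delay = -3.000    # from set_output_delay -min

data_arrival = clock_network_delay + data_delay
# The output delay is subtracted, so -3 ns turns into +3 ns here.
data_required = latch_edge + clock_uncertainty - min_output_delay
hold_slack = data_arrival - data_required

print(f"arrival {data_arrival:.3f}, required {data_required:.3f}, "
      f"slack {hold_slack:.3f}")
# prints: arrival 4.275, required 3.000, slack 1.275
```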
OK, what’s this?
This page is the example part of another post, which explains the meaning of set_input_delay and set_output_delay in SDC timing constraints.
As mentioned in the other post, the relevant timing constraints were:
create_clock -name theclk -period 20 [get_ports test_clk]
set_output_delay -clock theclk -max 8 [get_ports test_out]
set_output_delay -clock theclk -min -3 [get_ports test_out]
set_input_delay -clock theclk -max 4 [get_ports test_in]
set_input_delay -clock theclk -min 2 [get_ports test_in]
set_input_delay -max timing analysis (setup)
Slack (MET) : 15.664ns (required time - arrival time)
Source: test_in
(input port clocked by theclk {rise@0.000ns fall@10.000ns period=20.000ns})
Destination: test_samp_reg/D
(rising edge-triggered cell FDRE clocked by theclk {rise@0.000ns fall@10.000ns period=20.000ns})
Path Group: theclk
Path Type: Setup (Max at Fast Process Corner)
Requirement: 20.000ns (theclk rise@20.000ns - theclk rise@0.000ns)
Data Path Delay: 2.465ns (logic 0.291ns (11.797%) route 2.175ns (88.203%))
Logic Levels: 1 (IBUF=1)
Input Delay: 4.000ns
Clock Path Skew: 2.162ns (DCD - SCD + CPR)
Destination Clock Delay (DCD): 2.162ns = ( 22.162 - 20.000 )
Source Clock Delay (SCD): 0.000ns
Clock Pessimism Removal (CPR): 0.000ns
Clock Uncertainty: 0.035ns ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
Total System Jitter (TSJ): 0.071ns
Total Input Jitter (TIJ): 0.000ns
Discrete Jitter (DJ): 0.000ns
Phase Error (PE): 0.000ns
Location Delay type Incr(ns) Path(ns) Netlist Resource(s)
------------------------------------------------------------------- -------------------
(clock theclk rise edge) 0.000 0.000 r
input delay 4.000 4.000
AE20 0.000 4.000 r test_in (IN)
net (fo=0) 0.000 4.000 test_in
AE20 IBUF (Prop_ibuf_I_O) 0.291 4.291 r test_in_IBUF_inst/O
net (fo=1, routed) 2.175 6.465 test_in_IBUF
SLICE_X0Y1 FDRE r test_samp_reg/D
------------------------------------------------------------------- -------------------
(clock theclk rise edge) 20.000 20.000 r
AE23 0.000 20.000 r test_clk (IN)
net (fo=0) 0.000 20.000 test_clk
AE23 IBUF (Prop_ibuf_I_O) 0.077 20.077 r test_clk_IBUF_inst/O
net (fo=1, routed) 1.278 21.355 test_clk_IBUF
BUFGCTRL_X0Y4 BUFG (Prop_bufg_I_O) 0.026 21.381 r test_clk_IBUF_BUFG_inst/O
net (fo=2, routed) 0.781 22.162 test_clk_IBUF_BUFG
SLICE_X0Y1 FDRE r test_samp_reg/C
clock pessimism 0.000 22.162
clock uncertainty -0.035 22.126
SLICE_X0Y1 FDRE (Setup_fdre_C_D) 0.003 22.129 test_samp_reg
-------------------------------------------------------------------
required time 22.129
arrival time -6.465
-------------------------------------------------------------------
slack 15.664
This analysis starts at time zero, adds the 4 ns (clock-to-output) that was specified in the max input delay constraint, and continues that data path at the fastest possible combination of process, voltage and temperature. Together with the FPGA’s own data path delay (2.465 ns), the total data path delay stands at 6.465 ns.
The clock path is then calculated, once again with the fastest possible combination, starting from the following clock edge at 20 ns. The clock travels from the input pin to the flip-flop (with no clock network delay compensation, since no PLL is involved), taking into account the calculated jitter. All in all, the clock path ends at 22.129 ns, which is 15.664 ns after the data arrived at the flip-flop, and that is this constraint’s slack.
It’s simple to see from this analysis that the max input delay is the clock-to-output ( + board delay), as it’s added to the data path. So it’s basically how late the data path started. Note the “Max” part in the Path Type above.
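The report above can be replayed as follows, including the clock uncertainty formula from its header. This is a sketch under the assumption that the printed figures are all there is to it. Since the report rounds each printed value to three decimals, the result may differ from the report's slack by about 0.001 ns. The function and variable names are mine.

```python
import math

# Setup check for test_in, replayed from the Vivado report above (ns).

def clock_uncertainty(tsj: float, tij: float, dj: float, pe: float) -> float:
    """((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE, as printed in the report."""
    return (math.sqrt(tsj**2 + tij**2) + dj) / 2 + pe


uncertainty = clock_uncertainty(tsj=0.071, tij=0.0, dj=0.0, pe=0.0)

max_input_delay = 4.000   # from set_input_delay -max
data_path_delay = 2.465   # "Data Path Delay" in the report
data_arrival = max_input_delay + data_path_delay

latch_edge = 20.000       # the following clock edge
dest_clock_delay = 2.162  # "Destination Clock Delay (DCD)"
setup_fdre = 0.003        # the Setup_fdre_C_D row, added as in the report
data_required = latch_edge + dest_clock_delay - uncertainty + setup_fdre

setup_slack = data_required - data_arrival
print(f"uncertainty {uncertainty:.4f} ns, slack about {setup_slack:.2f} ns")
```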
set_input_delay -min timing analysis (hold)
Min Delay Paths
--------------------------------------------------------------------------------------
Slack (VIOLATED) : -0.045ns (arrival time - required time)
Source: test_in
(input port clocked by theclk {rise@0.000ns fall@10.000ns period=20.000ns})
Destination: test_samp_reg/D
(rising edge-triggered cell FDRE clocked by theclk {rise@0.000ns fall@10.000ns period=20.000ns})
Path Group: theclk
Path Type: Hold (Min at Slow Process Corner)
Requirement: 0.000ns (theclk rise@0.000ns - theclk rise@0.000ns)
Data Path Delay: 3.443ns (logic 0.626ns (18.194%) route 2.817ns (81.806%))
Logic Levels: 1 (IBUF=1)
Input Delay: 2.000ns
Clock Path Skew: 5.351ns (DCD - SCD - CPR)
Destination Clock Delay (DCD): 5.351ns
Source Clock Delay (SCD): 0.000ns
Clock Pessimism Removal (CPR): -0.000ns
Clock Uncertainty: 0.035ns ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
Total System Jitter (TSJ): 0.071ns
Total Input Jitter (TIJ): 0.000ns
Discrete Jitter (DJ): 0.000ns
Phase Error (PE): 0.000ns
Location Delay type Incr(ns) Path(ns) Netlist Resource(s)
------------------------------------------------------------------- -------------------
(clock theclk rise edge) 0.000 0.000 r
input delay 2.000 2.000
AE20 0.000 2.000 r test_in (IN)
net (fo=0) 0.000 2.000 test_in
AE20 IBUF (Prop_ibuf_I_O) 0.626 2.626 r test_in_IBUF_inst/O
net (fo=1, routed) 2.817 5.443 test_in_IBUF
SLICE_X0Y1 FDRE r test_samp_reg/D
------------------------------------------------------------------- -------------------
(clock theclk rise edge) 0.000 0.000 r
AE23 0.000 0.000 r test_clk (IN)
net (fo=0) 0.000 0.000 test_clk
AE23 IBUF (Prop_ibuf_I_O) 0.734 0.734 r test_clk_IBUF_inst/O
net (fo=1, routed) 2.651 3.385 test_clk_IBUF
BUFGCTRL_X0Y4 BUFG (Prop_bufg_I_O) 0.093 3.478 r test_clk_IBUF_BUFG_inst/O
net (fo=2, routed) 1.873 5.351 test_clk_IBUF_BUFG
SLICE_X0Y1 FDRE r test_samp_reg/C
clock pessimism 0.000 5.351
clock uncertainty 0.035 5.387
SLICE_X0Y1 FDRE (Hold_fdre_C_D) 0.101 5.488 test_samp_reg
-------------------------------------------------------------------
required time -5.488
arrival time 5.443
-------------------------------------------------------------------
slack -0.045
This analysis starts at time zero, adds the 2 ns (clock-to-output) that was specified in the min input delay constraint, and continues that data path at the slowest possible combination of process, voltage and temperature. Together with the FPGA’s own data path delay (3.443 ns), the total data path delay stands at 5.443 ns. It should be no surprise that the FPGA’s own delay is bigger compared with the fast analysis above.
The clock path is then calculated, now with the slowest possible combination, starting from the same clock edge at 0 ns. After all, this is a hold calculation, so the question is whether the rug was pulled from under the feet of the sampling flip-flop before it managed to sample the data.
The clock travels from the input pin to the flip-flop (with no clock network delay compensation, since no PLL is involved), taking into account the calculated jitter. All in all, the clock path ends at 5.488 ns, which is 0.045 ns too late after the data switched. So the constraint was violated, with a negative slack of 0.045.
It’s simple to see from this analysis that the min input delay is the minimal clock-to-output, as it’s added to the data path. So it’s basically how early the data path may start. Note the “Min” part in the Path Type above.
It may come as a surprise that a 2 ns clock-to-output can violate a hold constraint. This shouldn’t be taken lightly — it can cause real problems.
The solution for this case would be to add a PLL to the clock path, which locks the global network’s clock to the input clock. This effectively means pulling it several nanoseconds earlier, which definitely solves the problem.
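To see where the violation comes from, the hold report above can be replayed with its own figures. This is just a sketch: the names are mine, and because the report rounds its printed values, the computed slack may be off by about 0.001 ns relative to the reported -0.045 ns.

```python
# Hold check for test_in, replayed from the Vivado report above (ns).

min_input_delay = 2.000  # from set_input_delay -min
data_path_delay = 3.443  # "Data Path Delay", slow corner
data_arrival = min_input_delay + data_path_delay

# Clock path: IBUF, net, BUFG, net, as listed in the report.
clock_path = 0.734 + 2.651 + 0.093 + 1.873
clock_uncertainty = 0.035
hold_fdre = 0.101        # the Hold_fdre_C_D row
data_required = clock_path + clock_uncertainty + hold_fdre

hold_slack = data_arrival - data_required
print("VIOLATED" if hold_slack < 0 else "MET")
# prints: VIOLATED
```

Note that the clock path alone (5.351 ns) is almost as long as the entire data path, input delay included. That is why pulling the internal clock earlier with a PLL helps.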
set_output_delay -max timing analysis (setup)
Slack (MET) : 2.983ns (required time - arrival time)
Source: test_out_reg/C
(rising edge-triggered cell FDRE clocked by theclk {rise@0.000ns fall@10.000ns period=20.000ns})
Destination: test_out
(output port clocked by theclk {rise@0.000ns fall@10.000ns period=20.000ns})
Path Group: theclk
Path Type: Max at Slow Process Corner
Requirement: 20.000ns (theclk rise@20.000ns - theclk rise@0.000ns)
Data Path Delay: 3.631ns (logic 2.583ns (71.152%) route 1.047ns (28.848%))
Logic Levels: 1 (OBUF=1)
Output Delay: 8.000ns
Clock Path Skew: -5.351ns (DCD - SCD + CPR)
Destination Clock Delay (DCD): 0.000ns = ( 20.000 - 20.000 )
Source Clock Delay (SCD): 5.351ns
Clock Pessimism Removal (CPR): 0.000ns
Clock Uncertainty: 0.035ns ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
Total System Jitter (TSJ): 0.071ns
Total Input Jitter (TIJ): 0.000ns
Discrete Jitter (DJ): 0.000ns
Phase Error (PE): 0.000ns
Location Delay type Incr(ns) Path(ns) Netlist Resource(s)
------------------------------------------------------------------- -------------------
(clock theclk rise edge) 0.000 0.000 r
AE23 0.000 0.000 r test_clk (IN)
net (fo=0) 0.000 0.000 test_clk
AE23 IBUF (Prop_ibuf_I_O) 0.734 0.734 r test_clk_IBUF_inst/O
net (fo=1, routed) 2.651 3.385 test_clk_IBUF
BUFGCTRL_X0Y4 BUFG (Prop_bufg_I_O) 0.093 3.478 r test_clk_IBUF_BUFG_inst/O
net (fo=2, routed) 1.873 5.351 test_clk_IBUF_BUFG
SLICE_X0Y1 FDRE r test_out_reg/C
------------------------------------------------------------------- -------------------
SLICE_X0Y1 FDRE (Prop_fdre_C_Q) 0.223 5.574 r test_out_reg/Q
net (fo=1, routed) 1.047 6.622 test_out_OBUF
AK21 OBUF (Prop_obuf_I_O) 2.360 8.982 r test_out_OBUF_inst/O
net (fo=0) 0.000 8.982 test_out
AK21 r test_out (OUT)
------------------------------------------------------------------- -------------------
(clock theclk rise edge) 20.000 20.000 r
clock pessimism 0.000 20.000
clock uncertainty -0.035 19.965
output delay -8.000 11.965
-------------------------------------------------------------------
required time 11.965
arrival time -8.982
-------------------------------------------------------------------
slack 2.983
Since the purpose of this analysis is to measure the output delay, it starts off with the clock edge, follows it towards the flip-flop, and then along the data path. That sums up to the overall delay. Note that the “Path Type” doesn’t say it’s a setup calculation (to avoid confusion?) even though it takes the following clock (at 20 ns) into consideration.
The calculation takes place at the slowest possible combination of process, voltage and temperature (recall that the input setup calculation took place with the fastest one). Following the clock path, it’s evidently very similar to the clock path of the hold analysis for input delay, which is quite expected, as both are based upon the slow model.
The data path simply continues the clock path until the physical output is stable, calculated at 8.982 ns.
This is compared with the time of the following clock edge at 20 ns, minus the output delay and minus the possible jitter (0.035 ns in the case above). Data arrived at 8.982 ns and the moment that counts is at ~12 ns, so there’s almost 3 ns of slack.
This demonstrates why set_output_delay -max is the setup time of the receiver: The output delay is subtracted from the following clock’s time position, and that’s the goal to meet. That’s exactly the definition of setup time: How long before the following clock the data must be stable.
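Here is a minimal sketch of the same comparison, with the figures copied from the Vivado report above. The variable names are mine and carry no meaning in Vivado.

```python
# Setup check for test_out, replayed from the Vivado report above (ns).

source_clock_delay = 5.351  # "Source Clock Delay (SCD)", slow corner
data_path_delay = 3.631     # "Data Path Delay"
data_arrival = source_clock_delay + data_path_delay

latch_edge = 20.000         # the following clock edge
clock_uncertainty = 0.035
max_output_delay = 8.000    # from set_output_delay -max
data_required = latch_edge - clock_uncertainty - max_output_delay

setup_slack = data_required - data_arrival
print(f"arrival {data_arrival:.3f}, required {data_required:.3f}, "
      f"slack {setup_slack:.3f}")
# prints: arrival 8.982, required 11.965, slack 2.983
```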
set_output_delay -min timing analysis (hold)
Slack (MET) : 0.791ns (arrival time - required time)
Source: test_out_reg/C
(rising edge-triggered cell FDRE clocked by theclk {rise@0.000ns fall@10.000ns period=20.000ns})
Destination: test_out
(output port clocked by theclk {rise@0.000ns fall@10.000ns period=20.000ns})
Path Group: theclk
Path Type: Min at Fast Process Corner
Requirement: 0.000ns (theclk rise@0.000ns - theclk rise@0.000ns)
Data Path Delay: 1.665ns (logic 1.384ns (83.159%) route 0.280ns (16.841%))
Logic Levels: 1 (OBUF=1)
Output Delay: -3.000ns
Clock Path Skew: -2.162ns (DCD - SCD - CPR)
Destination Clock Delay (DCD): 0.000ns
Source Clock Delay (SCD): 2.162ns
Clock Pessimism Removal (CPR): -0.000ns
Clock Uncertainty: 0.035ns ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
Total System Jitter (TSJ): 0.071ns
Total Input Jitter (TIJ): 0.000ns
Discrete Jitter (DJ): 0.000ns
Phase Error (PE): 0.000ns
Location Delay type Incr(ns) Path(ns) Netlist Resource(s)
------------------------------------------------------------------- -------------------
(clock theclk rise edge) 0.000 0.000 r
AE23 0.000 0.000 r test_clk (IN)
net (fo=0) 0.000 0.000 test_clk
AE23 IBUF (Prop_ibuf_I_O) 0.077 0.077 r test_clk_IBUF_inst/O
net (fo=1, routed) 1.278 1.355 test_clk_IBUF
BUFGCTRL_X0Y4 BUFG (Prop_bufg_I_O) 0.026 1.381 r test_clk_IBUF_BUFG_inst/O
net (fo=2, routed) 0.781 2.162 test_clk_IBUF_BUFG
SLICE_X0Y1 FDRE r test_out_reg/C
------------------------------------------------------------------- -------------------
SLICE_X0Y1 FDRE (Prop_fdre_C_Q) 0.100 2.262 r test_out_reg/Q
net (fo=1, routed) 0.280 2.542 test_out_OBUF
AK21 OBUF (Prop_obuf_I_O) 1.284 3.826 r test_out_OBUF_inst/O
net (fo=0) 0.000 3.826 test_out
AK21 r test_out (OUT)
------------------------------------------------------------------- -------------------
(clock theclk rise edge) 0.000 0.000 r
clock pessimism 0.000 0.000
clock uncertainty 0.035 0.035
output delay 3.000 3.035
-------------------------------------------------------------------
required time -3.035
arrival time 3.826
-------------------------------------------------------------------
slack 0.791
This analysis is similar to the max output delay, only it’s calculated on the fastest possible combination of process, voltage and temperature, and against the same clock edge (and not the following one). So again, going from setup to hold, these are reversed. Once again, the clock path is very similar to the clock path of the setup analysis for input delay, which is quite expected, as both are based upon the fast model.
As before, the data path continues the clock path until the physical output is stable, calculated at 3.826 ns (note the difference with the slow path!).
This is compared with the time of the same clock edge at 0 ns, minus the output delay, plus the possible jitter (0.035 ns in the case above; it’s not clear why it’s counted when it’s the same clock cycle, but anyhow). Recall that the min output delay was negative (-3 ns), which is why it appears as a positive number in the calculation.
Conclusion: Data was stable until 3.826 ns, and needs to be stable until 3.035 ns. That’s fine, with a 0.791 ns slack.
This demonstrates why set_output_delay -min is minus the hold time of the receiver: Jitter aside, the given output delay with reversed sign is used as the time which the data path delay must exceed. In other words, the data must be stable for that long after the clock. This is the definition of hold time.
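The same hold calculation as a short sketch, summing the increments listed in the report above. The names are mine. Note that the uncertainty is added on the required side here, unlike in the setup case.

```python
# Hold check for test_out, replayed from the Vivado report above (ns).

source_clock_delay = 2.162          # clock pin to test_out_reg/C, fast corner
data_path = 0.100 + 0.280 + 1.284   # FDRE, net and OBUF increments
data_arrival = source_clock_delay + data_path

latch_edge = 0.000                  # same clock edge as the launch edge
clock_uncertainty = 0.035
min_output_delay = -3.000           # from set_output_delay -min
# The output delay is subtracted, so -3 ns turns into +3 ns here.
data_required = latch_edge + clock_uncertainty - min_output_delay

hold_slack = data_arrival - data_required
print(f"arrival {data_arrival:.3f}, required {data_required:.3f}, "
      f"slack {hold_slack:.3f}")
# prints: arrival 3.826, required 3.035, slack 0.791
```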
Introduction
Synopsys Design Constraints (SDC) has been adopted by Xilinx (in Vivado, as .xdc files), Altera (in Quartus, as .sdc files) and other FPGA vendors. Despite the wide use of this format, there seems to be some confusion regarding the constraints for defining I/O timing.
This post defines what they mean, and then shows the timing calculations made by Vivado and Quartus (in separate pages), demonstrating their meaning when implementing a very simple example design. So there’s no need to take my word for it, and this also gives a direction on how to check that your own constraints did what they were supposed to do.
There are several options to these constraints, but these are documented elsewhere. This post is about the basics.
And yes, it’s the same format with Xilinx and Altera. Compatibility. Unbelievable, but true.
What they mean
In short,
- set_input_delay -clock … -max … : The maximal clock-to-output of the driving chip + board propagation delay
- set_input_delay -clock … -min … : The minimal clock-to-output of the driving chip. If not given, choose zero (maybe a future revision of the driving chip will be manufactured with a really fast process)
- set_output_delay -clock … -max … : The t_setup time of the receiving chip + board propagation delay
- set_output_delay -clock … -min … : Minus the t_hold time of the receiving chip (e.g. set to -1 if the hold time is 1 ns).
Note that if neither -min nor -max is given, it’s like two assignments, one with -min and one with -max. In other words: Poor constraining.
The definitions are confusing: set_input_delay defines the allowed range of delays of the data toggle after a clock, but set_output_delay defines the range of delays of the clock after a data toggle. Presumably, the rationale behind this is to match datasheet figures of the device on the other end.
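The four definitions above boil down to a small mapping from datasheet figures to constraint values. Below is a sketch of that mapping. The datasheet numbers are hypothetical, picked so that the result matches the constraints used in the example design of this post, and the helper names are mine.

```python
def io_delay_values(tco_max, tco_min, t_setup, t_hold, board_delay):
    """Map datasheet figures (ns) of the external chips to SDC delay values."""
    return {
        "input_max": tco_max + board_delay,   # set_input_delay -max
        "input_min": tco_min,                 # set_input_delay -min
        "output_max": t_setup + board_delay,  # set_output_delay -max
        "output_min": -t_hold,                # set_output_delay -min
    }


def sdc_lines(v, clock="theclk", in_port="test_in", out_port="test_out"):
    """Format the values as SDC commands."""
    return [
        f"set_output_delay -clock {clock} -max {v['output_max']:g} [get_ports {out_port}]",
        f"set_output_delay -clock {clock} -min {v['output_min']:g} [get_ports {out_port}]",
        f"set_input_delay -clock {clock} -max {v['input_max']:g} [get_ports {in_port}]",
        f"set_input_delay -clock {clock} -min {v['input_min']:g} [get_ports {in_port}]",
    ]


# Hypothetical datasheet figures, for illustration only.
values = io_delay_values(tco_max=3.5, tco_min=2.0, t_setup=7.5, t_hold=3.0,
                         board_delay=0.5)
lines = sdc_lines(values)
print("\n".join(lines))
```

Note how the hold time is the only figure that gets its sign flipped, and how the board delay goes only into the -max values, following the definitions above.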
Always constrain both min and max
It may seem meaningless to use the min/max constraints. For example, using a catch-both single set_output_delay sets the setup time correctly, and the hold time to a negative value which is incorrect, but why bother? It allows the output port to toggle before the clock, but that couldn’t happen, could it?
Well, actually it can. For example, it’s quite common to let an FPGA PLL (or the like) generate the internal FPGA clock from the clock at some input pin (the “clock on the board”). This allows the PLL to align the clock on the FPGA’s internal clock network to the input clock, by time-shifting it slightly to compensate for the delay of the clock distribution network.
Actually, the implementation tools may feel free to shift the clock to slightly earlier than the clock input, in order to meet timing better: A slow path from logic to output may violate the maximal delay allowed from clock to output. Moving the clock earlier fixes this. But moving the internal clock to earlier than the clock on the board may switch other outputs that depend on the same clock to before the clock on the board toggles, leading to hold time violations on the receiver of these outputs. Nothing prevents this from happening, except a min output delay constraint.
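To make this concrete, here is a sketch of the hold check that the tools make on an output, as I understand it from the reports: the output must not change earlier than the clock edge minus the min output delay (jitter aside). The 0.5 ns figure is a made-up example of an output that toggles slightly before the board’s clock, and the function name is mine.

```python
def output_hold_check_passes(earliest_change: float,
                             min_output_delay: float) -> bool:
    """True if the output changes no earlier than -min_output_delay (ns),
    relative to the clock edge on the board. Jitter is ignored."""
    return earliest_change >= -min_output_delay


# Hypothetical: the internal clock was pulled early, so the output
# toggles 0.5 ns before the clock edge on the board.
earliest_change = -0.5

# A single set_output_delay of 8 ns also sets the min value to 8 ns,
# so anything later than -8 ns passes silently.
print(output_hold_check_passes(earliest_change, 8.0))   # True

# With a -min of -3 ns (a 3 ns hold time), the tools flag the path.
print(output_hold_check_passes(earliest_change, -3.0))  # False
```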
Outline of example design
We’ll assume test_clk input clock, test_in input pin, and test_out output, with the following relationship:
always @(posedge test_clk)
  begin
    test_samp <= test_in;
    test_out <= test_samp;
  end
No PLL is used to align the internal clock with the board’s test_clk, so there’s a significant clock delay.
And the following timing constraints applied in the SDC/XDC file:
create_clock -name theclk -period 20 [get_ports test_clk]
set_output_delay -clock theclk -max 8 [get_ports test_out]
set_output_delay -clock theclk -min -3 [get_ports test_out]
set_input_delay -clock theclk -max 4 [get_ports test_in]
set_input_delay -clock theclk -min 2 [get_ports test_in]
As the tools’ timing calculations are rather long, they are on separate pages:
Often I prefer to handle I/O timing simply by ensuring that all registers are pushed into the I/O cells. Where timing matters, that is.
It seems like I/O register packing isn’t the default in Quartus. Anyhow, here’s the lazy man’s recipe for this scenario.
In a previous version of this post, I suggested disabling timing checking on all I/Os. This silences the unconstrained path warning during implementation, and in particular prevents the “TimeQuest Timing Analyzer” section in Quartus’ reports pane from turning red:
set_false_path -from [get_ports]
set_false_path -to [get_ports]
This isn’t such a good idea, it turns out, in particular regarding input ports. This is elaborated further below.
Nevertheless, one needs to convince the fitter to push registers into the I/O block. In the QSF, add
set_instance_assignment -name FAST_OUTPUT_REGISTER ON -to *
set_instance_assignment -name FAST_INPUT_REGISTER ON -to *
set_instance_assignment -name FAST_OUTPUT_ENABLE_REGISTER ON -to *
It’s somewhat aggressive to apply these assignments to absolutely everything, but it does the job. The fitter issues warnings for the I/O elements it fails to enforce these constraints on, which is actually a good thing.
To see how well it went, look in the “Resource Section” of the fitter report (possibly find it in Quartus’ reports pane) and look for “Input Registers” etc., whatever applies.
The difference is evident in timing reports of paths involving I/O cells. For example, compare this path which involves an I/O register:
+----------------------------------------------------------------------------------+
; Data Arrival Path ;
+---------+---------+----+------+--------+-----------------------+-----------------+
; Total ; Incr ; RF ; Type ; Fanout ; Location ; Element ;
+---------+---------+----+------+--------+-----------------------+-----------------+
; 2.918 ; 2.918 ; ; ; ; ; data path ;
; 0.000 ; 0.000 ; ; ; 1 ; DDIOOUTCELL_X3_Y0_N32 ; rst ;
; 0.465 ; 0.465 ; RR ; CELL ; 1 ; DDIOOUTCELL_X3_Y0_N32 ; rst|q ;
; 0.465 ; 0.000 ; RR ; IC ; 1 ; IOOBUF_X3_Y0_N30 ; RESETB~output|i ;
; 2.918 ; 2.453 ; RR ; CELL ; 1 ; IOOBUF_X3_Y0_N30 ; RESETB~output|o ;
; 2.918 ; 0.000 ; RR ; CELL ; 0 ; PIN_P3 ; RESETB ;
+---------+---------+----+------+--------+-----------------------+-----------------+
Note the DDIOOUTCELL element, and the zero increment in the routing between the register and the IOOBUF.
For comparison, here’s a path for which an I/O register wasn’t applied (prevented by logic):
+--------------------------------------------------------------------------------+
; Data Arrival Path ;
+---------+---------+----+------+--------+-----------------+---------------------+
; Total ; Incr ; RF ; Type ; Fanout ; Location ; Element ;
+---------+---------+----+------+--------+-----------------+---------------------+
; 8.284 ; 8.284 ; ; ; ; ; data path ;
; 0.000 ; 0.000 ; ; ; 1 ; FF_X3_Y0_N17 ; Dir_flop_sig ;
; 0.496 ; 0.496 ; RR ; CELL ; 8 ; FF_X3_Y0_N17 ; Dir_flop_sig|q ;
; 2.153 ; 1.657 ; RR ; IC ; 1 ; IOOBUF_X3_Y0_N9 ; DATA[7]~output|oe ;
; 8.284 ; 6.131 ; RF ; CELL ; 1 ; IOOBUF_X3_Y0_N9 ; DATA[7]~output|o ;
; 8.284 ; 0.000 ; FF ; CELL ; 1 ; PIN_T3 ; DATA[7] ;
+---------+---------+----+------+--------+-----------------+---------------------+
Here we see how a general-purpose flip-flop generates the signal, leading to a routing delay of 1.657 ns. The main problem is that this routing delay will differ with each implementation, so if there’s a signal integrity issue with the board, the FPGA might be blamed for it, since different FPGA versions seem to fix the problem or make it reappear.
Timing constraints
Both input and output ports should be constrained so tightly that the constraints can’t be met other than by making the best of the I/O registers. Not only will this generate a timing failure if something goes wrong with the desired register packing, but it’s also necessary to achieve the minimal input-to-register timing, as explained next.
The discussion below applies only when the clock that drives the registers is directly related to an external clock (i.e. with a PLL that doesn’t multiply it with some exotic ratio). If the driving clock is practically unrelated to the external clock, things get significantly more complicated, as discussed in this post.
To demonstrate this issue, consider the following Verilog snippet:
module top
  (
   input clk,
   input in,
   output reg out
   );
   reg in_d, in_d2;
   wire pll_clk;
   always @(posedge pll_clk)
     begin
        in_d <= in;
        in_d2 <= in_d;
        out <= in_d2;
     end
   /* Here comes an instantiation of a phase-compensating PLL, which
      doesn't change the frequency */
endmodule
with the following constraint in the SDC file
create_clock -name main_clk -period 10 -waveform { 0 5 } [get_ports {clk}]
derive_pll_clocks
derive_clock_uncertainty
set_input_delay -clock main_clk -max 8.5 [get_ports in*]
set_input_delay -clock main_clk -min 0 [get_ports in*]
As explained in this post, set_input_delay is the maximal delay of the source of the signal, from clock to a valid logic state. Since the clock’s period is set to 10 ns, setting the delay constraint to 8.5 ns leaves 1.5 ns until the following clock arrives (at 10 ns). In other words, the setup time on the FPGA pin is constrained not to exceed 1.5 ns.
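The arithmetic behind this can be spelled out in a couple of lines of Python. This is just a sketch of the calculation; the function name is made up for the illustration and has nothing to do with Quartus.

```python
def max_setup_time(clock_period_ns, max_input_delay_ns):
    """Setup time budget left at the FPGA pin, in ns.

    The external device may take up to max_input_delay_ns after the clock
    edge to present valid data, so whatever remains of the clock period is
    the longest setup time the FPGA is allowed to require.
    """
    return clock_period_ns - max_input_delay_ns


# The constraint used here: 10 ns period, 8.5 ns maximal input delay
print(max_setup_time(10, 8.5))  # prints 1.5
```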
Note that set_max_delay can be used as well for this purpose (in some cases it’s the only way) as discussed in this post.
Compiling this (along with the FAST_INPUT_REGISTER ON QSF assignment shown above) yields the following segment in the timing report:
+----------------------------------------------------------------------------------+
; Data Arrival Path ;
+---------+---------+----+------+--------+-------------------+---------------------+
; Total ; Incr ; RF ; Type ; Fanout ; Location ; Element ;
+---------+---------+----+------+--------+-------------------+---------------------+
; 0.000 ; 0.000 ; ; ; ; ; launch edge time ;
; 0.000 ; 0.000 ; ; ; ; ; clock path ;
; 0.000 ; 0.000 ; R ; ; ; ; clock network delay ;
; 8.500 ; 8.500 ; F ; iExt ; 1 ; PIN_F2 ; in ;
; 9.550 ; 1.050 ; ; ; ; ; data path ;
; 8.500 ; 0.000 ; FF ; IC ; 1 ; IOIBUF_X0_Y22_N15 ; in~input|i ;
; 9.308 ; 0.808 ; FF ; CELL ; 1 ; IOIBUF_X0_Y22_N15 ; in~input|o ;
; 9.308 ; 0.000 ; FF ; IC ; 1 ; FF_X0_Y22_N17 ; in_d|d ;
; 9.550 ; 0.242 ; FF ; CELL ; 1 ; FF_X0_Y22_N17 ; in_d ;
+---------+---------+----+------+--------+-------------------+---------------------+
Unlike the output register, there is no “DDIOINCELL” flip-flop listed, but what appears to be a regular flip-flop. However note that the interconnect to this flip-flop has zero delay (the IC row ending at in_d|d), which is a clear indication that the flip-flop and input buffer are fused together.
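Looking for this zero-delay interconnect can be automated if there are many pins to check. The following is a rough sketch, assuming that the report is available as text with “;” between the cells, as in the listing above. The function names are my own invention, not part of any Quartus tool.

```python
def parse_path_rows(report_text):
    """Return (total, incr, type, location, element) tuples from a path table."""
    rows = []
    for line in report_text.splitlines():
        line = line.strip()
        if not line.startswith(";"):
            continue  # border lines start with "+"
        cells = [cell.strip() for cell in line.strip(";").split(";")]
        if len(cells) != 7:
            continue  # e.g. the one-cell title row
        try:
            total, incr = float(cells[0]), float(cells[1])
        except ValueError:
            continue  # the column header row
        rows.append((total, incr, cells[3], cells[5], cells[6]))
    return rows


def interconnect_delay_to(rows, element):
    """Delay (ns) of the IC row that ends at the given element, or None."""
    for total, incr, kind, location, name in rows:
        if kind == "IC" and name == element:
            return incr
    return None


sample = """
; 9.308 ; 0.808 ; FF ; CELL ; 1 ; IOIBUF_X0_Y22_N15 ; in~input|o ;
; 9.308 ; 0.000 ; FF ; IC ; 1 ; FF_X0_Y22_N17 ; in_d|d ;
"""
print(interconnect_delay_to(parse_path_rows(sample), "in_d|d"))  # prints 0.0
```

A result of zero suggests that the register sits in the I/O cell. Anything else means that there is a delay between the input buffer and the register, whatever its reason is.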
The datasheet report for this input goes:
+---------------------------------------------------------------------------------------------------+
; Setup Times ;
+-----------+------------+-------+-------+------------+---------------------------------------------+
; Data Port ; Clock Port ; Rise ; Fall ; Clock Edge ; Clock Reference ;
+-----------+------------+-------+-------+------------+---------------------------------------------+
; in ; main_clk ; 1.282 ; 1.461 ; Rise ; altpll_component|auto_generated|pll1|clk[0] ;
+-----------+------------+-------+-------+------------+---------------------------------------------+
+-----------------------------------------------------------------------------------------------------+
; Hold Times ;
+-----------+------------+--------+--------+------------+---------------------------------------------+
; Data Port ; Clock Port ; Rise ; Fall ; Clock Edge ; Clock Reference ;
+-----------+------------+--------+--------+------------+---------------------------------------------+
; in ; main_clk ; -0.683 ; -0.862 ; Rise ; altpll_component|auto_generated|pll1|clk[0] ;
+-----------+------------+--------+--------+------------+---------------------------------------------+
As required, the setup time required by the FPGA is lower than the 1.5 ns limit set by the constraint.
Now let’s loosen the input setup delay by 2 ns, leave everything else as it was, and rerun the compilation:
set_input_delay -clock main_clk -max 6.5 [get_ports in*]
set_input_delay -clock main_clk -min 0 [get_ports in*]
The segment in the timing report is now:
+----------------------------------------------------------------------------------+
; Data Arrival Path ;
+---------+---------+----+------+--------+-------------------+---------------------+
; Total ; Incr ; RF ; Type ; Fanout ; Location ; Element ;
+---------+---------+----+------+--------+-------------------+---------------------+
; 0.000 ; 0.000 ; ; ; ; ; launch edge time ;
; 0.000 ; 0.000 ; ; ; ; ; clock path ;
; 0.000 ; 0.000 ; R ; ; ; ; clock network delay ;
; 6.500 ; 6.500 ; F ; iExt ; 1 ; PIN_F2 ; in ;
; 8.612 ; 2.112 ; ; ; ; ; data path ;
; 6.500 ; 0.000 ; FF ; IC ; 1 ; IOIBUF_X0_Y22_N15 ; in~input|i ;
; 7.308 ; 0.808 ; FF ; CELL ; 1 ; IOIBUF_X0_Y22_N15 ; in~input|o ;
; 8.370 ; 1.062 ; FF ; IC ; 1 ; FF_X0_Y22_N17 ; in_d|d ;
; 8.612 ; 0.242 ; FF ; CELL ; 1 ; FF_X0_Y22_N17 ; in_d ;
+---------+---------+----+------+--------+-------------------+---------------------+
Huh? The interconnect suddenly rose to 1.062 ns?! Note that the placement of the register didn’t change, so there’s no doubt that in_d is an I/O register. So where did this delay come from?
To answer this, a closer look at the design is required. After a full compilation, selecting Tools > Netlist Viewers > Technology Map Viewer (Post-Fitting) displays a diagram of the fitted design.
Right-clicking in_d (the register) in this diagram and selecting Locate Node > Locate in Resource Property Editor opens a drawing of the register along with its properties.
To the right of this drawing (not shown above), the property “Input Pin to Input Register Delay” is set to 2. This is the reason for the delay. Before the constraint was loosened up, it was set to 0. The immediate lesson is:
If the setup constraint isn’t set to the technology’s best possible value, Quartus may add a delay at the expense of the setup time.
But why, Quartus, why?
So one may wonder why Quartus inserts this delay between the input pad and the register. Wasn’t the whole point to sample as soon as possible? To answer this, let’s look at the updated datasheet report:
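+---------------------------------------------------------------------------------------------------+
; Setup Times ;
+-----------+------------+-------+-------+------------+---------------------------------------------+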
; Data Port ; Clock Port ; Rise ; Fall ; Clock Edge ; Clock Reference ;
+-----------+------------+-------+-------+------------+---------------------------------------------+
; in ; main_clk ; 2.205 ; 2.523 ; Rise ; altpll_component|auto_generated|pll1|clk[0] ;
+-----------+------------+-------+-------+------------+---------------------------------------------+
+-----------------------------------------------------------------------------------------------------+
; Hold Times ;
+-----------+------------+--------+--------+------------+---------------------------------------------+
; Data Port ; Clock Port ; Rise ; Fall ; Clock Edge ; Clock Reference ;
+-----------+------------+--------+--------+------------+---------------------------------------------+
; in ; main_clk ; -1.570 ; -1.882 ; Rise ; altpll_component|auto_generated|pll1|clk[0] ;
+-----------+------------+--------+--------+------------+---------------------------------------------+
Recall that 2 ns were reduced from the delay constraint, hence the maximal allowed setup time went up from 1.5 ns to 3.5 ns. It’s easy to see that this requirement is met, with a slack of almost 1 ns.
So what Quartus did was to say “I can meet the setup requirement easily, with a spare of 2 ns. Let’s give 1 ns extra to the setup time, and 1 ns to the hold time requirement (which is 0 ns)”. And indeed, by adding this 1.062 ns delay, the hold time improved from -0.683 ns to -1.570 ns (and please don’t pick on me on why the difference isn’t exact).
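For those who want to see the numbers side by side, here is the same comparison as a short Python calculation. All figures are copied from the two datasheet reports above; the variable names are mine.

```python
# (rise, fall) figures in ns, copied from the datasheet reports
setup_tight, setup_loose = (1.282, 1.461), (2.205, 2.523)
hold_tight, hold_loose = (-0.683, -0.862), (-1.570, -1.882)


def deltas(before, after):
    """Change of each figure, rounded to the reports' 1 ps resolution."""
    return tuple(round(a - b, 3) for b, a in zip(before, after))


setup_shift = deltas(setup_tight, setup_loose)  # how much setup time grew
hold_shift = deltas(hold_tight, hold_loose)     # how much hold time dropped

# Slack against the loosened requirement: 10 ns period - 6.5 ns input delay
setup_slack = round((10 - 6.5) - max(setup_loose), 3)

print(setup_shift, hold_shift, setup_slack)
```

The setup time on the falling transition went up by exactly the added 1.062 ns, and the worst-case setup slack is 0.977 ns.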
Bottom line: Quartus widened the margin for both setup and hold, making the input more robust to jitter. While this is a rather sensible thing to do, it’s often neither desired nor expected.
Conclusion: If you want to get the absolutely minimal delay from the input to the register, run a compilation with a delay constraint that fails, and then loosen the constraint just enough to resolve this failure. This ensures Quartus won’t try to “improve” the timing by adding this input delay for the sake of a better hold time.
Using DDR primitives
Intel’s FPGAs have dedicated logic on or near the I/O cells to allow for DDR output and sampling, as detailed in the relevant user guide, ug_altddio.pdf. Instantiating such (or using the ALTDDIO_BIDIR megafunction) is an appealing way to force the tools into pushing the register(s) into the I/O cells. Spoiler: It’s not necessarily a good idea.
For example, instantiating something like
altddio_bidir ioddr
(
.padio(pin),
.aclr (1'b0),
.datain_h(datain_h),
.datain_l(datain_l),
.inclock(clk),
.oe(oe),
.outclock(clk),
.dataout_h(dataout_h),
.dataout_l(dataout_l),
.oe_out (),
.aset (1'b0),
.combout(),
.dqsundelayedout(),
.inclocken(1'b1),
.outclocken(1'b1),
.sclr(1'b0),
.sset(1'b0));
defparam
ioddr.extend_oe_disable = "OFF",
ioddr.implement_input_in_lcell = "OFF",
ioddr.intended_device_family = "Cyclone IV E",
ioddr.invert_output = "OFF",
ioddr.lpm_hint = "UNUSED",
ioddr.lpm_type = "altddio_bidir",
ioddr.oe_reg = "REGISTERED",
ioddr.power_up_high = "OFF",
ioddr.width = 1;
indeed results in logic that implements a bidirectional DDR interface, but it’s a partial success as far as timing is concerned, at least on Cyclone IV: While the clock-to-output timing is exactly the same as with a plain output register that is packed into the I/O cell, the delay on the input path is actually worse with the instantiation above. YMMV with other Intel FPGA families.
Note that in order to mimic plain SDR registers with a DDR primitive, its datain_h and datain_l ports must be connected to the same wire, so the clock’s falling edge doesn’t change anything. Likewise, the dataout_l port’s value should be ignored, as it’s sampled on the falling edge. Also note that the output enable port (oe) is an SDR input — as far as I can understand, it’s not possible to go on and off high-Z in DDR rate with Intel FPGAs. At least not natively.
Now to why it worked nicely on the output registers, and not with the input: The hint is in the timing reports above: Even for a plain I/O cell register, a DDIOOUTCELL_Xn_Ym_Nk component is the register used. In other words, the DDR output register is used even for single-rate outputs, but only with one clock edge. As for the input path, the timing reports above show that a logic fabric register (FF_Xn_Ym_Nk) is used. And here’s the crux: The DDR input logic is implemented in fabric as well, and to make it worse, combinatoric blocks are squeezed between the I/O cell and the flip-flop in the DDR case. Frankly, I don’t understand why, because these combinatoric blocks are just single-input-single-output passthroughs.
These observations are backed by timing reports as well as the drawings displayed by Quartus’ Post-Fit Technology Map Viewer. In particular those useless combinatoric blocks.
This entire issue most likely varies from one FPGA family to another. As for Cyclone IV, it only makes sense to use DDR primitives for outputs.
Even more important, the fact that a DDR primitive output uses the same logic as a packed output register allows producing an output clock that is aligned with the other outputs: Feed a DDR output primitive with constant '1' and '0' on the datain_h and datain_l ports, respectively, and apply plain output register packing for the other outputs. The toggling of the other outputs is aligned to the rising edge of the clock that comes from the DDR output.
Well, almost. The timing analysis of an output clock is different, because the clock toggles a mux that selects which of the two output registers feeds the output (scroll horizontally for the details):
+------------------------------------------------------------------------------------------------------------------------------------+
; Data Arrival Path ;
+---------+---------+----+------+--------+-------------------------+-----------------------------------------------------------------+
; Total ; Incr ; RF ; Type ; Fanout ; Location ; Element ;
+---------+---------+----+------+--------+-------------------------+-----------------------------------------------------------------+
; 0.000 ; 0.000 ; ; ; ; ; launch edge time ;
; 0.000 ; 0.000 ; ; ; ; ; clock path ;
; 0.000 ; 0.000 ; R ; ; ; ; clock network delay ;
; 0.000 ; 0.000 ; R ; ; 1 ; PIN_B12 ; osc_clock ;
; 5.610 ; 5.610 ; ; ; ; ; data path ;
; 0.000 ; 0.000 ; RR ; IC ; 1 ; IOIBUF_X19_Y29_N8 ; osc_clock~input|i ;
; 0.667 ; 0.667 ; RR ; CELL ; 2 ; IOIBUF_X19_Y29_N8 ; osc_clock~input|o ;
; 0.853 ; 0.186 ; RR ; IC ; 1 ; CLKCTRL_G12 ; osc_clock~inputclkctrl|inclk[0] ;
; 0.853 ; 0.000 ; RR ; CELL ; 165 ; CLKCTRL_G12 ; osc_clock~inputclkctrl|outclk ;
; 1.971 ; 1.118 ; RR ; IC ; 1 ; DDIOOUTCELL_X16_Y29_N11 ; sram_controller_ins|ddr_clk|auto_generated|ddio_outa[0]|muxsel ;
; 3.137 ; 1.166 ; RR ; CELL ; 1 ; DDIOOUTCELL_X16_Y29_N11 ; sram_controller_ins|ddr_clk|auto_generated|ddio_outa[0]|dataout ;
; 3.137 ; 0.000 ; RR ; IC ; 1 ; IOOBUF_X16_Y29_N9 ; sram_clk~output|i ;
; 5.610 ; 2.473 ; RR ; CELL ; 1 ; IOOBUF_X16_Y29_N9 ; sram_clk~output|o ;
; 5.610 ; 0.000 ; RR ; CELL ; 0 ; PIN_E10 ; sram_clk ;
+---------+---------+----+------+--------+-------------------------+-----------------------------------------------------------------+
Note that this isn’t a register-to-pin analysis, but clock-to-pin. A set_output_delay constraint will include this path nevertheless. However a set_max_delay constraint from registers to ports, if used, won’t include this path, so it has to be handled separately. In other words, if set_max_delay is used, it has to be of the form:
set_max_delay -from [get_clocks main_clk] -to [get_ports sram_clk] 3.8
Now, compare this with another pin with the same voltage standard etc., only driven by a register:
+----------------------------------------------------------------------------------------------------------------------+
; Data Arrival Path ;
+---------+---------+----+------+--------+-------------------------+---------------------------------------------------+
; Total ; Incr ; RF ; Type ; Fanout ; Location ; Element ;
+---------+---------+----+------+--------+-------------------------+---------------------------------------------------+
; 0.000 ; 0.000 ; ; ; ; ; launch edge time ;
; 2.507 ; 2.507 ; ; ; ; ; clock path ;
; 0.000 ; 0.000 ; ; ; ; ; source latency ;
; 0.000 ; 0.000 ; ; ; 1 ; PIN_B12 ; osc_clock ;
; 0.000 ; 0.000 ; RR ; IC ; 1 ; IOIBUF_X19_Y29_N8 ; osc_clock~input|i ;
; 0.667 ; 0.667 ; RR ; CELL ; 2 ; IOIBUF_X19_Y29_N8 ; osc_clock~input|o ;
; 0.853 ; 0.186 ; RR ; IC ; 1 ; CLKCTRL_G12 ; osc_clock~inputclkctrl|inclk[0] ;
; 0.853 ; 0.000 ; RR ; CELL ; 165 ; CLKCTRL_G12 ; osc_clock~inputclkctrl|outclk ;
; 1.970 ; 1.117 ; RR ; IC ; 1 ; DDIOOUTCELL_X37_Y29_N11 ; sram_controller_ins|dq_wr_data[6]|clk ;
; 2.507 ; 0.537 ; RR ; CELL ; 1 ; DDIOOUTCELL_X37_Y29_N11 ; sram_controller:sram_controller_ins|dq_wr_data[6] ;
; 5.645 ; 3.138 ; ; ; ; ; data path ;
; 2.717 ; 0.210 ; ; uTco ; 1 ; DDIOOUTCELL_X37_Y29_N11 ; sram_controller:sram_controller_ins|dq_wr_data[6] ;
; 3.182 ; 0.465 ; RR ; CELL ; 1 ; DDIOOUTCELL_X37_Y29_N11 ; sram_controller_ins|dq_wr_data[6]|q ;
; 3.182 ; 0.000 ; RR ; IC ; 1 ; IOOBUF_X37_Y29_N9 ; sram_dq[6]~output|i ;
; 5.645 ; 2.463 ; RR ; CELL ; 1 ; IOOBUF_X37_Y29_N9 ; sram_dq[6]~output|o ;
; 5.645 ; 0.000 ; RR ; CELL ; 1 ; PIN_G14 ; sram_dq[6] ;
+---------+---------+----+------+--------+-------------------------+---------------------------------------------------+
The total clock-to-output time differs by no more than 35 ps, even though the latter path is completely different on the face of it. This isn’t a coincidence. The FPGA is clearly designed to produce this similarity. Specifically, the timing analysis above is slow 1200 mV at 100 degrees, but this small difference is consistent in the other analyzed conditions as well.
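As a sanity check, the increments of both paths can be summed up and compared. This is nothing but the “Incr” column of the two reports above, typed into Python lists (the list names are mine):

```python
# "Incr" column of the DDR clock output path (osc_clock to sram_clk), in ns
ddr_clock_path = [0.000, 0.667, 0.186, 0.000, 1.118, 1.166, 0.000, 2.473, 0.000]

# "Incr" column of the plain register path: clock path, then data path
reg_clock_path = [0.000, 0.667, 0.186, 0.000, 1.117, 0.537]
reg_data_path = [0.210, 0.465, 0.000, 2.463, 0.000]

ddr_total = round(sum(ddr_clock_path), 3)
reg_total = round(sum(reg_clock_path) + sum(reg_data_path), 3)
diff_ps = round((reg_total - ddr_total) * 1000)

print(ddr_total, reg_total, diff_ps)
```

The sums come out as 5.61 ns and 5.645 ns, the same as the totals in the reports, so the difference is 35 ps.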
Introduction
This is a not-so-short tutorial which is intended to make the setup of a Live TV media center on Linux a bit easier, walking through the processing chain from the digital transmission signal to the picture on the screen. Quite naturally, things go from “general knowledge” to a bit more hands-on. The approach here is to understand what you’re doing, something one can avoid when things just work by themselves. If you’re reading this, odds are they didn’t.
I suggest starting with the command line tools, as they give a lot of low-level information as they run. Once a TV channel can be viewed with these, I further suggest using Tvheadend as the backend, as it’s reasonably complicated to work with, and leaves a lot of control to the frontend software.
I also suggest a frontend (Kodi) for everyday TV viewing, and a way to configure it.
All in all, getting this to work is often a rather tedious process. This isn’t all that bad if one learns a few things on the way.
The DVB frontend (“adapter”)
Often with a USB interface, the DVB frontend receives the digital signal and turns it into a stream of bytes. Inside, it typically consists of a tuner, a demodulator and a USB interface chip. Often there’s a demux as well, which is discussed further on.
- The tuner (the analog part) nails down a piece of the frequency spectrum with the digital signal in it, and makes it accessible to the demodulator. The signal may arrive from a simple UHF antenna, a satellite dish or from a cable network. In principle, there is no difference: It’s an analog signal that carries a bitstream of several Mb/s, and the tuner’s job is to bring the signal down to a known lower frequency, where the demodulator expects it to be. The tuner cares only about the signal’s center frequency, and possibly its bandwidth.
- The demodulator (the digital communication part) turns the analog signal from the tuner into a stream of digital bits. This is the most sophisticated part, which includes a significant portion of signal processing and also decoding of error-correcting codes (and, of course, correcting bit errors if such are found and are correctable). All this is hidden from the end user, so all the demodulator tells us is typically if it’s locked on the signal or not. There are quite a few things that need to get synchronized properly, but all we get is something like “works” or “doesn’t work”. The demodulator may also supply information about the signal strength, the S/N ratio and the PostBER, which is an estimation of the bit error rate obtained after fixing bit errors by virtue of the error correction code. This estimation is possible because all but a fraction of the bits are recovered correctly, so the demodulator knows what its input signal would look like without noise and distortions, and so it can also tell how much noise comes in. And the S/N ratio is calculated accordingly.
- The USB interface chip (the computer hardware part) is the less interesting part, but it’s what the computer sees. The driver is often named after it, even though it’s the other devices that are important. Its main functionalities are: Relaying the output of the demodulator to the computer via a bulk USB endpoint, and supplying means to control and read status from the demodulator and tuner, which is almost always done with an I2C/SMBus over USB kind of bridge. The I2C bus interface may also be used to download firmware to these two devices.
In a Linux system, the DVB adapter is represented with device files in /dev/dvb/adapter0/ (the index goes up if there are several of them). The notable file is /dev/dvb/adapter0/dvr0, from which the data stream is read, possibly with plain file I/O. In other words, when the adapter is set up, it’s possible to record a proper video clip with just “cat”, as shown in this post. /dev/dvb/adapter?/{frontend?,demux?} are used to control the device.
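To illustrate how plain this file I/O is, here is a minimal Python sketch that does more or less what “cat” does, except that it stops after a given number of bytes. It assumes that the adapter has already been tuned and set up by some other tool, and the function name and output file name are made up for the example.

```python
def record(source, destination, max_bytes, chunk_size=188 * 1024):
    """Copy up to max_bytes from source to destination.

    Both arguments are binary file objects. The chunk size is a multiple
    of 188 bytes, which is the size of an MPEG-TS packet.
    Returns the number of bytes actually copied.
    """
    copied = 0
    while copied < max_bytes:
        chunk = source.read(min(chunk_size, max_bytes - copied))
        if not chunk:
            break  # end of stream
        destination.write(chunk)
        copied += len(chunk)
    return copied


# Possible usage, recording 10 MiB (requires a tuned adapter):
#
# with open("/dev/dvb/adapter0/dvr0", "rb") as dvr, open("clip.ts", "wb") as out:
#     record(dvr, out, 10 * 1024 * 1024)
```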
There may be other device files as well, such as net0 (for controlling network packet functionality) or ca0 for Conditional Access.
For more insight, this set of slides may come in handy.
What’s in the trunk?
Assuming that the DVB frontend is tuned and locked on a digital signal, there will be data in MPEG-TS format flowing out from /dev/dvb/adapter0/dvr0. Almost always, there are several TV/radio channels on a single digital transmission signal: The data stream is used to pass several types of packets, which may contain MPEG video or audio data, or other related data.
The packets that contain MPEG video or audio data are marked with an identifier, PID. In fact, watching a TV program consists of
- Tuning the DVB adapter to a certain reception frequency, and locking its demodulator on the digital signal
- Picking out all packets marked with a certain PID, and passing them on to an MPEG video decoder
- Doing the same for another PID, and passing those packets on to an MPEG audio decoder
- If there are subtitles, picking out yet another PID, and passing its packets on to a subtitle rendering mechanism
So all in all, it’s a lot of packets multiplexed into a single stream of data, and it’s the receiver’s job to fish out those of interest. In order to make it easier, packets containing information that organizes the PIDs into services, i.e. TV and radio channels, are also transmitted on the same stream (in dedicated packets).
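To make the fishing a bit more concrete: An MPEG-TS stream is a sequence of 188-byte packets, each starting with the sync byte 0x47, and with a 13-bit PID in the second and third bytes. The following Python sketch picks out the packets with given PIDs. It assumes that the data starts on a packet boundary and doesn’t attempt to resynchronize, and the function names are my own.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


def packet_pid(packet):
    """Return the 13-bit PID of one 188-byte MPEG-TS packet."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not an aligned MPEG-TS packet")
    # The PID is the low 5 bits of byte 1 followed by all of byte 2
    return ((packet[1] & 0x1F) << 8) | packet[2]


def filter_pids(stream_bytes, wanted_pids):
    """Yield only the packets whose PID is in wanted_pids."""
    last_start = len(stream_bytes) - TS_PACKET_SIZE
    for offset in range(0, last_start + 1, TS_PACKET_SIZE):
        packet = stream_bytes[offset:offset + TS_PACKET_SIZE]
        if packet_pid(packet) in wanted_pids:
            yield packet
```

Feeding a decoder with the output of filter_pids() for the video PID is, in principle, what the second step in the list above is about.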
The end of this post shows the output of a scan with the dvbv5-scan command-line utility. It lists the information obtained in a specific digital stream in an organized manner. One thing that may be surprising about this list, is that a single service (i.e. TV channel) may contain more than a single audio PID. Which isn’t all that odd, as some TV channels may have alternative sound tracks, e.g. in different languages.
There’s also the dvbsnoop utility, which shows and dissects the packets of an MPEG-TS stream. Only for those interested in the really gory details.
By the way, in Tvheadend’s terminology, the raw stream of data that arrives from the demodulator is called a mux. This is a highly confusing misnomer, which probably came from the idea that the packets in the stream are multiplexed. To the rest of the world, a “mux” is the machine that takes data from several sources and turns them into a stream. Which brings us to:
Demultiplexing
Assuming that the PIDs of the video and audio streams of the desired TV channel are known, there are two possibilities to filter them out:
- Software demuxing: Tell the DVB adapter to send everything to the dvr0 device file, and fish out the matching packets with the media player (or some intermediate software). Actually, it simply means that the media player ignores all packets that don’t have the requested PIDs.
- Hardware demuxing: Using e.g. the /dev/dvb/adapter0/demux0 device file to command the adapter to pass through only packets with certain PIDs to the /dev/dvb/adapter0/dvr0 output.
Software demuxing is the preferred choice for the typical domestic use, as it allows viewing more than one TV program at a time (assuming that both programs are on the same digital stream). Hardware demuxing is useful for viewing (or recording) TV with command-line utilities (see these two posts for examples of command line sessions).
Watching TV for real
Command-line utilities (see this, this, and this) can indeed be used to hack together some very basic kit for watching TV, and they are priceless for understanding why things went wrong with the fancier tools (hint: Somehow things always go wrong when video is involved).
But for everyday use, it’s best to let a TV streaming backend talk with the hardware. It allows fancy media center software as well as command line utilities to easily access TV channels. As of March 2017, I found Tvheadend + Kodi to be the best combination. In Kodi, I went for PVR IPTV Simple Client rather than Tvheadend’s own front end, as I’ll explain below.
So let’s first understand this backend / frontend business, and I’ll take Tvheadend as an example.
Tvheadend (formerly HTS TVheadend) is a TCP/IP server, which takes control of the DVB adapter(s). And it listens to two TCP/IP ports:
- HTTP Port 9981: Plain browsers can connect to http://localhost:9981/ for administration and configuration + machine readable playlists and Electronic Program Guide on certain URLs (see below). But most important: Availability of all TV channels as MPEG-TS streams, in a protocol easily accessible by a lot of software, with a plain HTTP connection.
- HTSP Port 9982: Home TV Streaming Protocol (invented for Tvheadend?), seems to be used only by a handful of Linux clients. It’s a one-stop shop for all TV related information, but my own experience was a bit lame. So I’ll leave this aside for now.
The installation of Tvheadend hence involves making it work with the DVB adapter (which is usually simple if everything works smoothly with the command line utilities) as well as setting up the server. It may not be all that easy (there are worse), but it’s worth the effort (maybe my own messy jots will help), because:
- It allows simultaneous view of several TV channels, if they’re on the same digital stream (i.e. muxed on the same frequency channel). Watch one channel waiting for the commercials to end on another…
- View TV from any computer on the (wireless?) LAN.
- Virtually any media playing software supports its output format, MPEG-TS. Including stuff running on Windows.
- There are a lot of other features (Electronic Program Guide via the web interface with a browser, recording), but I’m not sure if these belong to the backend. But they may be useful to some.
So let’s take a look at how Tvheadend conveys the TV channels to its clients. For this, I’ll assume that Tvheadend is properly installed, has been set up (through the web interface) to tune on some TV channels, and that it allows access without user/password from localhost (it’s a convenient setting, and it’s safe at least for 127.0.0.1/32). And that all access is done from the localhost (even though it can be any computer with HTTP access and due permissions. If so, replace “localhost” in the examples below with the IP or domain name of the server).
But first…
A word on IPTV / HLS (not really important for DVB, actually)
We’ll make a small detour to IPTV or HLS (HTTP Live Streaming), because Tvheadend does something similar. IPTV is the commonly used name for TV channels broadcast over the internet, whether these are live or video-on-demand-like broadcasts.
An IPTV/HLS stream is essentially an MPEG-TS stream, similar to the DVB stream on air or cable. In order to make its broadcast over the web easier, it’s cut into chunks, each a few seconds long. The cuts are made on packet boundaries, so each chunk is a legal MPEG-TS segment by itself. A plain concatenation of several subsequent chunks (with e.g. “cat”) makes a perfectly playable MPEG-TS clip. Or stream.
Now to the IPTV client: To start off, the IPTV client is given an initial playlist (as a file or a URL to download this playlist from). That playlist is an M3U file, with one or several URLs, usually a TV channel for each. When the client accesses one of these URLs to start showing a channel, it often receives another playlist, which redirects it to other URLs, which in turn might redirect it further, and so on. These playlists are often set up to allow the client to choose different paths, depending on desired bit rate, display resolution, encoding format etc.
Eventually, possibly after a few redirection hops, the client ends up receiving a playlist containing a list of chunks, so it has the information on where to fetch chunks of MPEG-TS segments from. It starts fetching these chunks, concatenates them, and plays the video stream.
The complicated part of the HLS protocol is the traveling around playlists until the list of chunks is found. Once there, it’s just a matter of downloading those chunks, concatenating, and treating them as a DVB stream.
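The last step, getting the list of URLs out of a playlist, can be sketched in Python as follows. This is a simplification on purpose: It treats every line that isn’t empty and doesn’t start with “#” as a URL, possibly relative to the playlist’s own URL, and it ignores all the tags that a real HLS client must handle. The function name is made up.

```python
from urllib.parse import urljoin


def playlist_entries(playlist_text, playlist_url):
    """Return absolute URLs of the entries listed in an M3U playlist."""
    urls = []
    for line in playlist_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # tags and comments
        # Relative entries are resolved against the playlist's own URL
        urls.append(urljoin(playlist_url, line))
    return urls
```

Whether these URLs point at chunks or at further playlists is something the client has to figure out from the tags, or from what it gets when fetching them.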
Tvheadend’s IPTV-like interface
So M3U playlists are the name of the game. Tvheadend offers the TV and radio channels it exposes as an M3U playlist, available at http://localhost:9981/playlist . In my case (Israeli DVB-T), it starts like this:
#EXTM3U
#EXTINF:-1 tvg-id="38e914f04571f2a3f5c915872ba6e794",88FM
http://localhost:9981/stream/channelid/1880418616?ticket=B0E6E9AB06F41C13C0AEC87B7A88966BCBCCE8F4&profile=pass
#EXTINF:-1 tvg-id="219e62923848dac382ed7fcd35c4ed9e",Aleph
http://localhost:9981/stream/channelid/308452897?ticket=88F2FD731008F28454AB8FF7F75BF896FA1F9C7F&profile=pass
#EXTINF:-1 tvg-id="fd874e5286a13d161eb1fa011fb42731",Bet
http://localhost:9981/stream/channelid/1380878333?ticket=3BF6D86889B5DB3DFCAF2EF25D07894B653C3700&profile=pass
#EXTINF:-1 tvg-id="aae608ee725cad880781301f68592dbc",Ch 1
http://localhost:9981/stream/channelid/1846077098?ticket=116EEC70D4201D82BF2A0F1E9AB7387EC9E12D30&profile=pass
#EXTINF:-1 tvg-id="41d97066c9c97e348f83269a6b18e8e6",Ch 10
http://localhost:9981/stream/channelid/1718671681?ticket=757FAE1BA26561DAEDB659E707627366A5071B17&profile=pass
#EXTINF:-1 tvg-id="fc2a79daaca0afd9a39123d4d0305a1f",Ch 2
http://localhost:9981/stream/channelid/1517890300?ticket=BE2E496685F892A1036B3C982888D0778784BDB4&profile=pass
[ ... etc ... ]
After the #EXTM3U header, there is a pair of lines for each channel: The first line contains information about the channel (in particular the display name) and the second is the URL for accessing the channel. Unlike HLS/IPTV, this isn’t a go-find-another-playlist, but it directs immediately to where the video stream can be obtained.
The “tvg-id” tag is not common in playlist files in general, and it pairs the channel with its appearance in the EPG (more about that later). If you don’t have it, you probably have an old version of Tvheadend, which doesn’t support the EPG trickery I’m going to show below.
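For those who prefer processing the playlist with a script, here is a Python sketch that turns it into a list of channels. It’s written against the format shown above and nothing more: It assumes that attribute values don’t contain commas or double quotes, and the function name and dictionary keys are my own choice.

```python
import re

# "#EXTINF:<duration> <attributes>,<display name>"
EXTINF = re.compile(r'^#EXTINF:[^ ,]*(?P<attrs>[^,]*),(?P<name>.*)$')
ATTR = re.compile(r'([\w-]+)="([^"]*)"')


def parse_playlist(text):
    """Return a list of dicts with the keys "name", "tvg-id" and "url"."""
    channels = []
    pending = None
    for line in text.splitlines():
        line = line.strip()
        match = EXTINF.match(line)
        if match:
            attrs = dict(ATTR.findall(match.group("attrs")))
            pending = {"name": match.group("name"),
                       "tvg-id": attrs.get("tvg-id")}
        elif line and not line.startswith("#") and pending is not None:
            # The first non-tag line after #EXTINF is the channel's URL
            pending["url"] = line
            channels.append(pending)
            pending = None
    return channels
```

Dropping unwanted channels or changing their order then boils down to plain list manipulation.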
As the URLs in the playlist are “for real”, a plain wget command can be used to record any of these channels. For example, recording from Channel 10:
$ wget -O mytvshow.ts 'http://localhost:9981/stream/channelid/1718671681?ticket=757FAE1BA26561DAEDB659E707627366A5071B17&profile=pass'
--2017-03-13 11:25:59-- http://localhost:9981/stream/channelid/1718671681?ticket=757FAE1BA26561DAEDB659E707627366A5071B17&profile=pass
Resolving localhost (localhost)... 127.0.0.1
Connecting to localhost (localhost)|127.0.0.1|:9981... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [video/mp2t]
Saving to: ‘mytvshow.ts’
mytvshow.ts [ <=> ] 3.10M 334KB/s
This will run in principle forever until stopped with CTRL-C. The mytvshow.ts file can be played with VLC, mplayer, ffplay or any other reasonable media player.
These URLs to the channels don’t change once Tvheadend has been set up. It’s therefore possible to download the playlist once, edit away unwanted channels, reorder the list (noticed the radio channels at the beginning of the playlist above?), possibly combine with “real” IPTV channels, and feed a media center player with the edited playlist file.
It’s also possible to give these URLs directly to VLC and other media players. Viewing multiple channels at once is as simple as opening several instances of VLC.
One word about what Tvheadend does behind the scenes. In response to the wget command above, the following went to /var/log/syslog:
Mar 13 11:25:59 tv tvheadend[6410]: mpegts: 538MHz in Idan Plus T - tuning on Realtek RTL2832 (DVB-T) : DVB-T #0
Mar 13 11:25:59 tv tvheadend[6410]: subscription: 0018: "HTTP" subscribing on channel "Ch 10", weight: 100, adapter: "Realtek RTL2832 (DVB-T) : DVB-T #0", network: "Idan Plus T", mux: "538MHz", provider: "Idan +", service: "Ch 10", profile="pass", hostname="127.0.0.1", client="Wget/1.17.1 (linux-gnu)"
Note that the HTTP connection resulted in a “subscription” to a certain channel within Tvheadend. This reflects the way Tvheadend mediates its resources, a single DVB adapter in this case, to fulfill requirements of subscribers requesting services.
Consequently, stopping the “recording” (pressing CTRL-C) resulted in
Mar 13 11:26:11 tv tvheadend[6410]: subscription: 0018: "HTTP" unsubscribing from "Ch 10", hostname="127.0.0.1", client="Wget/1.17.1 (linux-gnu)"
Needless to say, something similar happens when a media player opens a connection for streaming live TV.
EPG
A neat feature of DVB is that data for an Electronic Program Guide (EPG) is often embedded in the digital stream, so the name of the current program, along with a short description, is available when zapping to a new TV channel. The same data also provides a TV guide to past and future programs, directly on the TV, shown neatly by the media center software.
There is probably no need to configure anything in Tvheadend to make this work. The various EPG grabbers available are tools for feeding this information into Tvheadend when it’s absent from the digital stream itself. In particular, if there’s satisfactory information in the “Electronic Program Guide” tab in Tvheadend’s web interface (http://localhost:9981/ with a browser), nothing needs to be fixed.
The common format for exchanging EPG information in Linux is XMLTV, which as its name implies, is in XML format. Tvheadend exports it at http://localhost:9981/xmltv or http://localhost:9981/xmltv/channels (accessing the former will cause an HTTP 302 redirection to the latter).
As of March 2017, this doesn’t work on Tvheadend versions available on the “stable” apt repositories. If attempting to access the URL for XMLTV from a browser results in “1 Unknown Code” appearing, an upgrade is required. Otherwise, no EPG will be available with the setup I suggest below.
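The browser check can also be done from a script. The Python sketch below fetches the XMLTV URL and merely tests whether the response parses as XML with a tv root element, which is what an XMLTV file should have. It assumes that Tvheadend listens on localhost:9981 and that no login is required; both function names are mine.

```python
import urllib.request
import xml.etree.ElementTree as ET


def looks_like_xmltv(body):
    """Return True if body parses as XML with <tv> as its root element."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # Plain-text replies such as "1 Unknown Code" end up here
        return False
    return root.tag == "tv"


def check_xmltv(url="http://localhost:9981/xmltv/channels", timeout=10):
    # urlopen() follows redirects, and raises an exception on HTTP
    # errors or if the server can't be reached at all
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return looks_like_xmltv(response.read())
```

If check_xmltv() returns False or raises an exception, it’s worth looking at the URL with a browser to see what the server actually says.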
An XMLTV file typically looks something like this:
<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE tv SYSTEM "xmltv.dtd">
<tv generator-info-name="TVHeadend-4.1-2405~geb495a0~xenial" source-info-name="tvh-Tvheadend">
<channel id="67a72084ee9a5ddb2fcd89129887bf78">
<display-name>Ch 99</display-name>
</channel>
<channel id="fa723385817605edc2b138d96c259b67">
<display-name>Ch 33</display-name>
</channel>
[ ... ]
<programme start="20170313113000 +0200" stop="20170313123000 +0200" channel="67a72084ee9a5ddb2fcd89129887bf78">
<title lang="heb">ועדה ש.ח כת' עב' כת' ער'</title>
<desc lang="heb">ועדה ש.ח כת' עב' כת' ער'
ועדה מיוחדת לפניות הציבור:
פניות ציבור בנושא התנהלות חברת החשמל בגביית תשלומים ומדיניות ניתוקי חשמל לצרכנים
</desc>
</programme>
<programme start="20170313121000 +0200" stop="20170313124500 +0200" channel="41d97066c9c97e348f83269a6b18e8e6">
<title lang="heb">ראש בראש כת' עב' </title>
<sub-title lang="heb">פרק 345</sub-title>
<desc lang="heb">פרק 345
העיתונאי חגי סגל מארח ומתווכח. כת' עב'
</desc>
</programme>
<programme start="20170313124500 +0200" stop="20170313132000 +0200" channel="41d97066c9c97e348f83269a6b18e8e6">
<title lang="heb">מעונן חלקית כת' עב' </title>
<sub-title lang="heb">פרק 286</sub-title>
<desc lang="heb">פרק 286
תחזית פוליטית: מבט אל השבוע הפוליטי והפרלמנטרי. מגישה: הדס לוי סצמסקי כת' עב'
</desc>
[ ... ]
</tv>
Note that the long hex blob marked red above matches the tvg-id entry of Channel 10 in the playlist given above. This allows pairing between an MPEG-TS stream and its info in the XMLTV file, and hence displaying the current TV program info for its respective channel.
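To illustrate this pairing, here is a rough Python sketch that takes the tvg-id from a playlist entry and looks up which program is on at a given time. It assumes that the playlist’s #EXTINF line carries a tvg-id="..." attribute, and that the start and stop attributes are formatted as in the XMLTV sample above. The function names are made up for this example.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime

# Timestamp format as it appears in the sample: "20170313113000 +0200"
XMLTV_TIME = "%Y%m%d%H%M%S %z"


def tvg_id(extinf_line):
    """Extract the tvg-id attribute from an #EXTINF line, or None."""
    m = re.search(r'tvg-id="([^"]*)"', extinf_line)
    return m.group(1) if m else None


def now_playing(xmltv_data, channel_id, when):
    """Return the title of the programme on channel_id at 'when'.

    'when' must be a timezone-aware datetime. Returns None if the
    XMLTV data has no matching programme.
    """
    root = ET.fromstring(xmltv_data)
    for prog in root.iter("programme"):
        if prog.get("channel") != channel_id:
            continue
        start = datetime.strptime(prog.get("start"), XMLTV_TIME)
        stop = datetime.strptime(prog.get("stop"), XMLTV_TIME)
        if start <= when < stop:
            return prog.findtext("title")
    return None
```

This is more or less what a media center player does when it shows the current program next to each channel, except that it typically loads the XMLTV data once and keeps it in memory.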
Using Kodi as the front end
Kodi is a convenient front end for viewing TV on a media center computer, living room style. I suggest using the PVR IPTV Simple Client with a local file playlist, mainly because this solution is so simple, and because it works so well.
The setup is fairly straightforward. First, install the plugin:
$ sudo apt-get install kodi-pvr-iptvsimple
and then, after having Kodi up and running, enable and set up the IPTV Simple Client as follows:
- Change setting level to Advanced
- System > Settings > Enable TV
- Enable and Configure PVR IPTV Simple Client (System > Settings > Add-ons > My add-ons > PVR Clients > PVR IPTV Simple Client). Set the playlist to local file, and pick an edited playlist file (as suggested above).
- Moving on to the EPG Settings tab, set Location to Remote Path, and XMLTV URL to http://localhost:9981/xmltv. As mentioned above, this requires a version of Tvheadend that supports XMLTV export. Check it manually with a browser.
The EPG interface isn’t necessary to watch TV properly, but makes Kodi display what’s on each channel in a neat way. As far as I know, the only alternative way to have EPG working with Kodi and Tvheadend is the Tvheadend Kodi plugin, which gave me errors all the time with version 4.0.9 of Tvheadend.
Kodi has a screensaver enabled by default, which causes the screen to appear darker after a few minutes. It’s possible to turn it off under System > Settings > Appearance > Screensaver.
Summary
Use Kodi if you want it to look like a set-top-box, or vlc, ffplay or mplayer for a more computerish experience. Tvheadend gives a simple and robust interface to all of these, leaving the gory details to be forgotten. Once you’ve been through the setup, of course.
If Tvheadend doesn’t play ball, go for the command-line utilities.
And if all of this takes forever to complete, remember: TV is a waste of time either way.
Intro
These are my jottings as I installed Linux Mint 18.1 on a Gigabyte GB-BACE-3160 Compact PC with a 240 GB SSD hard disk and 8 GB RAM, for the purpose of driving my TV in the living room. Not all issues are solved yet.
General notes
Installation flow
- BIOS is invoked by pressing the “Delete” button (repeatedly) during powerup
- Changed the OS in the BIOS to Linux (not clear why it matters)
- Always power on when AC power is applied: In the BIOS menu, Chipset > Restore AC Power Loss set to “Power on”.
- Plugged in a USB stick with Linux Mint 18.1 Cinnamon. Booted with no issues
- Installed ssh for remote access (ssh daemon starts immediately after installation)
# apt-get update
# apt-get install ssh
# passwd mint
- Set partition table manually: First partition for boot, 250 MB. Second partition for LVM, 120 GB. Leaving half the disk unpartitioned. LVM can easily use disk space from another partition in the future, if that’s needed. This is what fdisk showed in the end:
Command (m for help): p
Disk /dev/sda: 223.6 GiB, 240057409536 bytes, 468862128 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x34449fd0
Device     Boot  Start       End   Sectors  Size Id Type
/dev/sda1         2048    514047    512000  250M 83 Linux
/dev/sda2       514048 252172287 251658240  120G 8e Linux LVM
- Created one big logical volume on /dev/sda2:
# pvcreate /dev/sda2
Physical volume "/dev/sda2" successfully created
# vgcreate vg_ssd2 /dev/sda2
Volume group "vg_ssd2" successfully created
# lvcreate vg_ssd2 -l 100%FREE -n lv_root
Logical volume "lv_root" created.
- Launched the “Install Linux Mint” application
- During installation, enabled installation of third party software (mp3 and media stuff). Installation type: “Something else” to pick what goes where. No swap partition. Installation went through cleanly
- Installed git (“apt-get install git”) and created a repository in root
- Installed the ssh daemon (apt-get install ssh)
- Stop the Linux splash. Give me some real boot text output, and this probably also solves some weird issues with the graphics card (mouse pointer traces and sluggish mouse response?):
Edit /etc/default/grub, changing GRUB_CMDLINE_LINUX_DEFAULT="quiet splash" to GRUB_CMDLINE_LINUX_DEFAULT="", and then regenerate /boot/grub/grub.cfg from the scripts in /etc/grub.d and the file just edited:
# update-grub
- Also, make GRUB’s menu always appear, since pressing Shift at bootup doesn’t work on this platform. So edit /etc/default/grub once again, commenting out (explained in Ubuntu’s GRUB2 doc):
GRUB_HIDDEN_TIMEOUT=0
and setting
GRUB_TIMEOUT=2
GRUB_RECORDFAIL_TIMEOUT=2
and once again, run update-grub. The GRUB_RECORDFAIL_TIMEOUT part is required for all the times the computer is turned off abruptly (which is fine with an overlayfs). Otherwise GRUB sticks to 30 seconds all the time.
For more advanced GRUB trickery, see this post.
- DIDN’T: The kernel console boot output is given with some weird blocks instead of characters (The Matrix style) until the framebuffer module is initialized, and the screen goes into graphical mode, after which all looks OK. This can be fixed by uncommenting
#GRUB_TERMINAL=console
in /etc/default/grub, and running update-grub. The result is that these two seconds of kernel messages look OK, but GRUB’s own menu is drawn painfully slowly. So I remain with the Matrix kernel.
- Ditch Apparmor, because it messes things up with overlayroot, and would probably cause all kinds of weirdness regardless, exactly like SELinux did in the past (no chkconfig in Xenial).
$ sudo update-rc.d apparmor remove
- Fix /bin/sh symlink: By default, it goes to /bin/dash (someone had a bad sense of humor). Change it to /bin/bash. Won’t make any difference in the matters of this post, but it will bite sometime in the long term.
- Add LSB (Linux Standard Base), among other things for getting ld symlinks in /lib64/. Once again, won’t make any difference right now, but it should be there:
# apt-get install lsb-core
- Add group eli manually:
# addgroup --gid nnn eli
- And user:
# adduser --gid nnn --uid nnnn eli
Adding user `eli' ...
Adding new user `eli' (nnnn) with group `eli' ...
Creating home directory `/home/eli' ...
Copying files from `/etc/skel' ...
- Allow passwordless sudo for all members of the sudo group (only added the NOPASSWD part). From /etc/sudoers:
%sudo ALL=(ALL:ALL) NOPASSWD: ALL
- Enable auto login: Pick the “Login Window” GUI application
- Assign a fixed address on the DHCP daemon and a host name in /etc/hosts
- Install NFS client:
# apt-get install nfs-common
- Copy my own .bashrc stuff into “eli” and “root” users
- Install Xemacs:
# apt-get install xemacs21
- Allow X sessions over ssh for the root user, following this post. As root,
# xauth merge /home/eli/.Xauthority
xauth: file /root/.Xauthority does not exist
# cd ~/
# ln -s /home/eli/.Xauthority
Note that I symlinked the file only because it didn’t exist before, and copying it does the trick only until the next reboot.
- Set the power options (never turn off screen) and screensaver (ditto, it’s done separately)
- Install kodi, MythTV and mplayer (vlc was already installed):
$ sudo apt-get install --install-suggests kodi
$ sudo apt-get install --install-suggests mythtv
$ sudo apt-get install --install-suggests mplayer
- Install PVRs (mythtv and dvbviewer were probably redundant)
# apt-get install kodi-pvr-iptvsimple
# apt-get install --install-suggests kodi-pvr-dvbviewer
# apt-get install kodi-pvr-mythtv
# apt-get install kodi-pvr-hts
- Install DVB command-line utilities, which didn’t prove very useful (previously dvb-utils)
# apt-get install dvb-apps
- Install w_scan, useful for DVB channel scanning
# apt-get install w-scan
- After changing the resolution to 1920x1080 the menu panel at the bottom was too low down, making it almost invisible. This was the Samsung monitor’s fault, which needed to be adjusted to show the full screen.
- (No need to install kernel headers nor Make / compiler. All included out of the box)
- Install apt-file:
$ sudo apt-get install apt-file
$ apt-file update
- Installed intel-graphics-update-tool_2.0.2 (I messed up a bit with it, see below. The problem was that I attempted to install 2.0.3, which is intended for Ubuntu 16.10, but Mint 18.1 is based on Ubuntu 16.04). It refused, as it looked at /etc/lsb-release, finding out that the distro is Mint. So I faked it.
$ sudo dpkg -i intel-graphics-update-tool_2.0.2_amd64.deb
$ sudo apt-get -f install
$ sudo intel-graphics-update-tool
- Some DVB command line tests (see separate post)
- Overlay root (see separate post)
- Turn off ureadahead service (for faster boot, but does it make sense on an SSD device?)
# systemctl disable ureadahead
- Install Tvheadend (see separate post)
- Add Kodi and Terminal as Startup Applications (using the desktop’s Menu > Startup Applications). Don’t. It’s important to see the desktop screen so the computer’s mode is clear.
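The edits of plain KEY=value settings in /etc/default/grub in the installation flow above (GRUB_CMDLINE_LINUX_DEFAULT, GRUB_TIMEOUT and GRUB_RECORDFAIL_TIMEOUT) can also be scripted, which is handy if the installation is ever repeated. This is a Python sketch of one possible way to do it, not how I actually did it: it works on the file’s content as a string, and the function name is hypothetical. Writing the result back to the file and running update-grub is still required.

```python
import re


def set_option(text, key, value):
    """Set KEY=value in shell-style config text.

    An existing assignment of KEY (commented out or not) is replaced
    in place. If there is none, a new line is appended at the end.
    """
    pattern = re.compile(
        r"^#?[ \t]*" + re.escape(key) + r"=.*$", re.MULTILINE
    )
    new_line = f"{key}={value}"
    if pattern.search(text):
        # A function is used as the replacement, so that backslashes
        # in the value are taken literally
        return pattern.sub(lambda m: new_line, text, count=1)
    if text and not text.endswith("\n"):
        text += "\n"
    return text + new_line + "\n"


def apply_boot_settings(text):
    # The values are those chosen in the installation flow above
    text = set_option(text, "GRUB_CMDLINE_LINUX_DEFAULT", '""')
    text = set_option(text, "GRUB_TIMEOUT", "2")
    text = set_option(text, "GRUB_RECORDFAIL_TIMEOUT", "2")
    return text
```

Typical usage would be to read /etc/default/grub into a string, pass it through apply_boot_settings(), write it back as root, and then run update-grub.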
Kodi setup
General note: Kodi remembers the navigation position in sub-menus when a category is entered. If things get too confusing, just restart Kodi, so the navigation paths in the docs match.
- Change setting level to Advanced
- System > Settings > Enable TV
- Enable and Configure PVR IPTV Simple Client with a local file playlist (System > Settings > Add-ons > My add-ons > PVR Clients > PVR IPTV Simple Client). Grab a playlist from http://localhost:9981/playlist and append hand-picked entries. In the EPG Settings tab, set Location to Remote Path, and XMLTV URL to http://localhost:9981/xmltv. This requires a recent (post-4.0.9) version of Tvheadend.
- On exit, use Ctrl-Alt-F1 and then Ctrl-Alt-F7 to get back from the blank screen it leaves (if necessary)
- Don’t install a Tvheadend plugin. It got the original Tvheadend stuck, and the newer one requires picking a channel twice to view it. Besides, the Simple Client covers all needs with a single playlist.
Fixing graphics issues
Downloaded intel-graphics-update-tool_2.0.3_amd64.deb from Intel Graphics for Linux and ran
$ sudo dpkg -i intel-graphics-update-tool_2.0.3_amd64.deb
but that failed due to a dependency problem:
Selecting previously unselected package intel-graphics-update-tool.
(Reading database ... 207385 files and directories currently installed.)
Preparing to unpack intel-graphics-update-tool_2.0.3_amd64.deb ...
Unpacking intel-graphics-update-tool (2.0.3) ...
dpkg: dependency problems prevent configuration of intel-graphics-update-tool:
intel-graphics-update-tool depends on libpackagekit-glib2-18 (>= 0.9.4); however:
Package libpackagekit-glib2-18 is not installed.
intel-graphics-update-tool depends on fonts-ancient-scripts; however:
Package fonts-ancient-scripts is not installed.
dpkg: error processing package intel-graphics-update-tool (--install):
dependency problems - leaving unconfigured
Processing triggers for gnome-menus (3.13.3-6ubuntu3.1) ...
Processing triggers for desktop-file-utils (0.22-1ubuntu5) ...
Processing triggers for mime-support (3.59ubuntu1) ...
Errors were encountered while processing:
intel-graphics-update-tool
Rumor has it that apt-get -f install can fix that, but it said:
$ sudo apt-get -f install
Reading package lists... Done
Building dependency tree
Reading state information... Done
Correcting dependencies... Done
The following packages will be REMOVED:
intel-graphics-update-tool
0 upgraded, 0 newly installed, 1 to remove and 128 not upgraded.
Remove? Why? What have I done? How about this:
$ sudo apt-get install --fix-missing
Reading package lists... Done
Building dependency tree
Reading state information... Done
You might want to run 'apt-get -f install' to correct these.
The following packages have unmet dependencies:
intel-graphics-update-tool : Depends: libpackagekit-glib2-18 (>= 0.9.4) but it is not installable
Depends: fonts-ancient-scripts but it is not installed
So I ran apt-get -f install and dropped intel-graphics-update-tool.
Directories that change
In the event of restoring the entire root filesystem from backup, try to retain these:
- /.git (the main git repository)
- /home/eli (of course)
- /home/eli/.kodi (included in /home/eli, but this is where Kodi keeps its info)
- /home/hts/ (where Tvheadend keeps its settings and logs)