Raspberry Pi 3 notes

These are messy, random notes that I took while setting up my little Raspberry Pi 3. Odds are that by the time you read this, I’ve replaced it with a mini-PC. So if you ask a question, my answer will probably be “I don’t remember”.

Even though the Pi is cool, it didn’t deliver what I really wanted from it, which is simultaneous output on SDTV and HDMI. It also turns out that it’s unable to handle a large portion of the video streams and apps out there on the web, seemingly because its processing power falls short of the resolution of these streams (running Kodi, which I suppose is the best-optimized application for the Pi). So as a catch-all media center attached to a TV set, it’s rather useless.


  • Used the 2016-11-25-raspbian-jessie.zip image
  • Raspbian: To get remote access over ssh, do “service ssh start” and log in as “pi” with password “raspberry”. Best to remove these default credentials before really working. To make ssh permanent, go
    # systemctl enable ssh
  • A cheap USB charger from eBay couldn’t keep the system up: it rebooted every time it attempted to boot. The original LG G4 charger is strong enough, though.
  • Kodi (and VLC) installed cleanly with
    # apt-get update
    # apt-get install --install-suggests kodi
    # apt-get install --install-suggests vlc
  • … but it seems like VLC doesn’t use video acceleration, and despite a lot of attempts, I couldn’t make it work. So it’s quite useless.
  • Enabling composite output: Use a four-lead 3.5mm plug (a stereo plug doesn’t work). The Samsung screen refused to work, but Radiance detected the signal OK.
    I used an old Canon Powershot’s video cable, but with the RED plug for video, not the yellow one.
    In /boot/config.txt, uncomment 


    However, composite video is disabled when an HDMI monitor is detected, and Q&As on the web seem to suggest that simultaneous output is not possible. Following this page, I tried setting (so that the HDMI output matches)


    and got 576i (PAL) on the HDMI output, but the signals on the composite output were dead (checked with a scope).

  • Added “eli” as a user:
    # adduser --gid 500 --uid 1010 eli
  • Add “eli” as a sudoer: Add the file /etc/sudoers.d/010_eli-nopasswd saying
  • Manually edit /etc/group, find all the places it says “pi” and add “eli”, so that “eli” is a member of the same groups. Compare the “id” outputs of both users.
  • Add ssh keys for password-less access (use ssh-copy-id)
  • Change the timezone
    $ sudo raspi-config

    pick “4 Internationalisation Options” and change the timezone to Jerusalem

  • Set “eli” as the default login: One possibility would have been to change the config script (/usr/bin/raspi-config) as suggested on this page. Or change /etc/lightdm/lightdm.conf so it says

    As for console login, the key line in raspi-config is

    ln -fs /etc/systemd/system/autologin@.service /etc/systemd/system/getty.target.wants/getty@tty1.service

    so the change is to edit /etc/systemd/system/autologin@.service so it says

    ExecStart=-/sbin/agetty --autologin eli --noclear %I $TERM
  • Turn off screensaver / blanking: First check the current situation (from an ssh session, hence the explicit -display argument)
    $ xset -display :0 q
    Keyboard Control:
      auto repeat:  on    key click percent:  0    LED mask:  00000000
      XKB indicators:
        00: Caps Lock:   off    01: Num Lock:    off    02: Scroll Lock: off
        03: Compose:     off    04: Kana:        off    05: Sleep:       off
        06: Suspend:     off    07: Mute:        off    08: Misc:        off
        09: Mail:        off    10: Charging:    off    11: Shift Lock:  off
        12: Group 2:     off    13: Mouse Keys:  off
      auto repeat delay:  500    repeat rate:  33
      auto repeating keys:  00ffffffdffffbbf
      bell percent:  50    bell pitch:  400    bell duration:  100
    Pointer Control:
      acceleration:  20/10    threshold:  10
    Screen Saver:
      prefer blanking:  yes    allow exposures:  yes
      timeout:  600    cycle:  600
      default colormap:  0x20    BlackPixel:  0x0    WhitePixel:  0xffffff
    Font Path:
    DPMS (Energy Star):
      Standby: 600    Suspend: 600    Off: 600
      DPMS is Enabled
      Monitor is On

    So turn it off, following this thread. Edit /etc/kbd/config to say (in different places of the file)


    and then append these lines to ~/.config/lxsession/LXDE-pi/autostart (this is a per-user thing):

    @xset s noblank
    @xset s off
    @xset -dpms

More jots

  • The script that warns against the unchanged password for “pi” user is at /etc/xdg/lxsession/LXDE-pi/sshpwd.sh, and is launched by /etc/xdg/autostart/pprompt.desktop
  • For some info on how to run a power-fail-safe system by mounting most of the filesystem as read-only, see this page.
  • Obtaining the .config file (from the kernel build guide):
    On Raspberry’s official kernel, check out git ID 4eb9a81002485a7abfa53a334dde5bc10328079f (as 4.4.34), and go 

    $ make ARCH=arm bcm2709_defconfig

Kodi setup

  • Change setting level to Advanced
  • System > Settings > Enable TV
  • System > Settings > System > Power savings, set Shutdown function to Minimise (actually, it didn’t help regarding the blackout of the screen on exit)
  • Enable and Configure PVR IPTV Simple Client
  • On exit, use Ctrl-Alt-F1 and then Ctrl-Alt-F7 to get back from the blank screen it leaves (/bin/chvt should do this as well?)

Video issues

I wanted to get simultaneous SDTV / HDMI output. Everyone says it’s impossible, but I wanted to give it a try. I mean, it’s the drivers that say no-no, but maybe there’s a combination of registers that gets it working. The alternative is an external HDMI splitter, followed by an HDMI to CVBS converter. Spoiler: I gave up in the end. Not saying it’s impossible, only that it’s not worth the bother. So:

Broadcom implements the OpenMAX API, which seems to have a limited set of GPGPU capabilities. For example, see firmware/opt/vc/src/hello_pi/hello_fft/ in Raspberry’s official git repo. The QPU is documented in VideoCoreIV-AG100-R.pdf, and there’s an open source assembler for it, vc4asm (and possibly this one, mentioned on this page, is better). Also look at this blog.

This page details the VideoCore interface for Raspberry.

A utility for switching between HDMI/SDTV outputs (in hindsight, I would go for the official tvservice instead, but this is what I did):

$ git clone https://github.com/adammw/rpi-output-swapper.git

But that didn’t work:

eli@raspberrypi:~/rpi-output-swapper $ make
cc -Wall -DHAVE_LIBBCM_HOST -DUSE_EXTERNAL_LIBBCM_HOST -DUSE_VCHIQ_ARM -I/opt/vc/include/ -I/opt/vc/include/interface/vcos/pthreads -I./ -g -c video_swap.c -o video_swap.o -Wno-deprecated-declarations
cc -o video_swap.bin -Wl,--whole-archive video_swap.o -L/opt/vc/lib/ -lbcm_host -lvcos -lvchiq_arm -Wl,--no-whole-archive -rdynamic
rm video_swap.o
eli@raspberrypi:~/rpi-output-swapper $ sudo ./video_swap.bin --status
failed to connect to tvservice

which comes from this part in tvservice_init():

    if ( vc_vchi_tv_init( vchi_instance, &vchi_connections, 1) != 0) {
        fprintf(stderr, "failed to connect to tvservice\n");

which is implemented in userland/interface/vmcs_host/vc_vchi_tvservice.c, with the header file vc_tvservice.h in the same directory (Raspberry’s official git repo).

After a lot of back and forth, I compared it with the official repo’s tvservice utility and discovered that the latter doesn’t check vc_vchi_tv_init()’s return value. So I ditched the check in video_swap as well, and it worked. But the results on the screen were so messy that I didn’t want to pursue this direction.

What follows are some things I found out while trying to solve the problem: The program opens /dev/vchiq in bcm_host_init(), and performs a lot of ioctl() calls on it. The rest of tvservice_init(), up to the error message, causes no system calls at all!

/dev/vchiq had major/minor 248/0 on my system. According to /proc/devices, it belongs to the vchiq module (not a big surprise…). The driver is at drivers/misc/vc04_services/interface/vchiq_arm/, seemingly with vchiq_arm.c as the top-level file, and is enabled with CONFIG_BCM2708_VCHIQ.

There’s a utility, vcgencmd, for setting a lot of different things, log levels among them, but I didn’t manage to figure out where the log messages go.

Vivado: Finding the “maximal frequency” after synthesis


Somewhere at the bottom of the report of ISE’s xst synthesizer, it says what the maximal frequency is, along with an outline of the slowest path. This is a rather nice feature, in particular when attempting to optimize a specific module. There is no such figure given after a regular Vivado synthesis, possibly because the guys at Xilinx thought this “maximal frequency” could be misleading. If so, they had two good reasons for that:

  • There is no such thing as a “maximal frequency”: The tools pay attention to the timing constraints and do their best accordingly. Put shortly, you might not get frequency X unless you ask for it.
  • In a typical design, there are many clocks with different frequencies. The slowest path might belong to a clock that’s slow anyhow.

And still, it’s sometimes useful to get an idea of where things stand before the Via Dolorosa of a full implementation.

How to do it in Vivado

First and foremost: Set the timing constraints according to your expectations. Or at least, in a way that makes it clear which clock is important, and which can be slow. Then synthesize the design.

After the synthesis has completed successfully, open the synthesized design (clicking “Open Synthesized Design” on the left bar or with the Tcl command “open_run synth_1”).

In the Tcl window, issue the command

report_timing_summary -file mytiming.rpt

which writes a full post-synthesis timing report into mytiming.rpt. Just “report_timing_summary” prints it out to the console.

There’s also a “Report Timing Summary” option under “Synthesized Design” on the left bar, but I find it difficult to find my way to the information I want in the GUI representation of the report.

Reading the report

RULE #1: The synthesis report is no more than a rough estimate. The routing delays are guesses. It might report timing failures that the implementation will manage to fix, and it might say all is fine where the implementation will fail colossally (in particular when the FPGA’s logic usage gets close to 100%).

Now to action: The first thing to look at is the Clock Summary and the Intra Clock Table, to learn which name Vivado has given to each clock. For example,

| Clock Summary
| -------------

Clock                  Waveform(ns)         Period(ns)      Frequency(MHz)
-----                  ------------         ----------      --------------
clk_fpga_1             {0.000 5.000}        10.000          100.000
gclk                   {0.000 4.000}        8.000           125.000
  audio_mclk_OBUF      {0.000 41.667}       83.333          12.000
  clk_fb               {0.000 20.000}       40.000          25.000
  vga_clk_ins/clk_fb   {0.000 20.000}       40.000          25.000
  vga_clk_ins/clkout0  {0.000 1.538}        3.077           325.000
  vga_clk_ins/clkout1  {0.000 7.692}        15.385          65.000
  vga_clk_ins/clkout2  {0.000 7.692}        15.385          65.000          

| Intra Clock Table
| -----------------

Clock                      WNS(ns)      TNS(ns)  TNS Failing Endpoints  TNS Total Endpoints      WHS(ns)      THS(ns)  THS Failing Endpoints  THS Total Endpoints     WPWS(ns)     TPWS(ns)  TPWS Failing Endpoints  TPWS Total Endpoints
-----                      -------      -------  ---------------------  -------------------      -------      -------  ---------------------  -------------------     --------     --------  ----------------------  --------------------
clk_fpga_1                   3.791        0.000                      0                12474        0.135        0.000                      0                12474        3.750        0.000                       0                  5021
gclk                                                                                                                                                                     6.751        0.000                       0                     2
  audio_mclk_OBUF                                                                                                                                                       76.667        0.000                       0                     1
  clk_fb                                                                                                                                                                12.633        0.000                       0                     2
  vga_clk_ins/clk_fb                                                                                                                                                    38.751        0.000                       0                     2
  vga_clk_ins/clkout0                                                                                                                                                    1.410        0.000                       0                    10
  vga_clk_ins/clkout1       10.747        0.000                      0                  215       -0.029       -0.229                      8                  215        6.712        0.000                       0                   195
  vga_clk_ins/clkout2        3.990        0.000                      0                  415        0.135        0.000                      0                  415        7.192        0.000                       0                   211

If the clock frequencies listed in the Clock Summary (which are derived from the timing constraints) don’t make it clear which name belongs to which clock, the TNS Total Endpoints column of the Intra Clock Table can help tell them apart, since it shows how many timing endpoints each clock has. So once the name of the clock of interest is nailed down, search for it in the file, and find something like this:

Max Delay Paths
Slack (MET) :             3.791ns  (required time - arrival time)
  Source:                 xillybus_ins/xillybus_core_ins/unitw_1_ins/unitw_1_offset_limit_1/C
                            (rising edge-triggered cell FDRE clocked by clk_fpga_1  {rise@0.000ns fall@5.000ns period=10.000ns})
  Destination:            xillybus_ins/xillybus_core_ins/unitw_1_ins/unitw_1_end_offset_0/D
                            (rising edge-triggered cell FDRE clocked by clk_fpga_1  {rise@0.000ns fall@5.000ns period=10.000ns})
  Path Group:             clk_fpga_1
  Path Type:              Setup (Max at Slow Process Corner)
  Requirement:            10.000ns  (clk_fpga_1 rise@10.000ns - clk_fpga_1 rise@0.000ns)
  Data Path Delay:        6.077ns  (logic 2.346ns (38.605%)  route 3.731ns (61.395%))
  Logic Levels:           8  (CARRY4=3 LUT3=1 LUT4=1 LUT6=3)
  Clock Path Skew:        -0.040ns (DCD - SCD + CPR)
    Destination Clock Delay (DCD):    0.851ns = ( 10.851 - 10.000 )
    Source Clock Delay      (SCD):    0.901ns
    Clock Pessimism Removal (CPR):    0.010ns
  Clock Uncertainty:      0.154ns  ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
    Total System Jitter     (TSJ):    0.071ns
    Total Input Jitter      (TIJ):    0.300ns
    Discrete Jitter          (DJ):    0.000ns
    Phase Error              (PE):    0.000ns

    Location             Delay type                Incr(ns)  Path(ns)    Netlist Resource(s)
  -------------------------------------------------------------------    -------------------
                         (clock clk_fpga_1 rise edge)
                                                      0.000     0.000 r
                         PS7                          0.000     0.000 r  xillybus_ins/system_i/vivado_system_i/processing_system7_0/inst/PS7_i/FCLKCLK[1]
                         net (fo=1, unplaced)         0.000     0.000    xillybus_ins/system_i/vivado_system_i/processing_system7_0/inst/n_707_PS7_i
                         BUFG (Prop_bufg_I_O)         0.101     0.101 r  xillybus_ins/system_i/vivado_system_i/processing_system7_0/inst/buffer_fclk_clk_1.FCLK_CLK_1_BUFG/O
                         net (fo=5023, unplaced)      0.800     0.901    xillybus_ins/xillybus_core_ins/bus_clk_w
                                                                      r  xillybus_ins/xillybus_core_ins/unitw_1_ins/unitw_1_offset_limit_1/C
  -------------------------------------------------------------------    -------------------
                         FDRE (Prop_fdre_C_Q)         0.496     1.397 f  xillybus_ins/xillybus_core_ins/unitw_1_ins/unitw_1_offset_limit_1/Q
                         net (fo=5, unplaced)         0.834     2.231    xillybus_ins/xillybus_core_ins/unitw_1_ins/unitw_1_offset_limit[1]
                         LUT4 (Prop_lut4_I0_O)        0.289     2.520 r  xillybus_ins/xillybus_core_ins/unitw_1_ins/Mcompar_n0037_lutdi/O
                         net (fo=1, unplaced)         0.000     2.520    xillybus_ins/xillybus_core_ins/unitw_1_ins/Mcompar_n0037_lutdi
                         CARRY4 (Prop_carry4_DI[0]_CO[3])
                                                      0.553     3.073 r  xillybus_ins/xillybus_core_ins/unitw_1_ins/Mcompar_n0037_cy[0]_CARRY4/CO[3]
                         net (fo=1, unplaced)         0.000     3.073    xillybus_ins/xillybus_core_ins/unitw_1_ins/Mcompar_n0037_cy[3]
                         CARRY4 (Prop_carry4_CI_CO[3])
                                                      0.114     3.187 r  xillybus_ins/xillybus_core_ins/unitw_1_ins/Mcompar_n0037_cy[4]_CARRY4/CO[3]
                         net (fo=3, unplaced)         0.936     4.123    xillybus_ins/xillybus_core_ins/unitw_1_ins/Mcompar_n0037_cy[7]
                         LUT6 (Prop_lut6_I4_O)        0.124     4.247 f  xillybus_ins/xillybus_core_ins/unitw_1_ins/unitw_1_wr_request_condition/O
                         net (fo=7, unplaced)         0.480     4.727    xillybus_ins/xillybus_core_ins/unitw_1_ins/unitw_1_wr_request_condition
                         LUT3 (Prop_lut3_I2_O)        0.124     4.851 r  xillybus_ins/xillybus_core_ins/unitw_1_ins/unitw_1_flush_condition_unitw_1_wr_request_condition_AND_179_o3_lut/O
                         net (fo=1, unplaced)         0.000     4.851    xillybus_ins/xillybus_core_ins/unitw_1_ins/unitw_1_flush_condition_unitw_1_wr_request_condition_AND_179_o3_lut
                         CARRY4 (Prop_carry4_S[2]_CO[3])
                                                      0.398     5.249 f  xillybus_ins/xillybus_core_ins/unitw_1_ins/unitw_1_flush_condition_unitw_1_wr_request_condition_AND_179_o2_cy_CARRY4/CO[3]
                         net (fo=21, unplaced)        0.979     6.228    xillybus_ins/xillybus_core_ins/unitw_1_ins/unitw_1_flush_condition_unitw_1_wr_request_condition_AND_179_o
                         LUT6 (Prop_lut6_I5_O)        0.124     6.352 r  xillybus_ins/xillybus_core_ins/unitw_1_ins/_n03401/O
                         net (fo=15, unplaced)        0.502     6.854    xillybus_ins/xillybus_core_ins/unitw_1_ins/_n0340
                         LUT6 (Prop_lut6_I5_O)        0.124     6.978 r  xillybus_ins/xillybus_core_ins/unitw_1_ins/unitw_1_end_offset_0_rstpot/O
                         net (fo=1, unplaced)         0.000     6.978    xillybus_ins/xillybus_core_ins/unitw_1_ins/unitw_1_end_offset_0_rstpot
                         FDRE                                         r  xillybus_ins/xillybus_core_ins/unitw_1_ins/unitw_1_end_offset_0/D
  -------------------------------------------------------------------    -------------------

                         (clock clk_fpga_1 rise edge)
                                                     10.000    10.000 r
                         PS7                          0.000    10.000 r  xillybus_ins/system_i/vivado_system_i/processing_system7_0/inst/PS7_i/FCLKCLK[1]
                         net (fo=1, unplaced)         0.000    10.000    xillybus_ins/system_i/vivado_system_i/processing_system7_0/inst/n_707_PS7_i
                         BUFG (Prop_bufg_I_O)         0.091    10.091 r  xillybus_ins/system_i/vivado_system_i/processing_system7_0/inst/buffer_fclk_clk_1.FCLK_CLK_1_BUFG/O
                         net (fo=5023, unplaced)      0.760    10.851    xillybus_ins/xillybus_core_ins/bus_clk_w
                                                                      r  xillybus_ins/xillybus_core_ins/unitw_1_ins/unitw_1_end_offset_0/C
                         clock pessimism              0.010    10.861
                         clock uncertainty           -0.154    10.707
                         FDRE (Setup_fdre_C_D)        0.062    10.769    xillybus_ins/xillybus_core_ins/unitw_1_ins/unitw_1_end_offset_0
                         required time                         10.769
                         arrival time                          -6.978
                         slack                                  3.791

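This is a rather messy piece of text. The key elements are the section title (Max Delay Paths), the Slack, the clocks of the Source and Destination, the Requirement and the Data Path Delay.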

Before drawing any conclusions, make sure it’s the right part you’re looking at:

  • It’s the Max Delay Paths section. The Min Delay Paths section is useful for spotting hold time violations, and has no bearing on the maximal frequency.
  • It’s the right clock. In the example above, it’s clk_fpga_1. The Requirement line states not only the constraint given for this clock (10 ns = 100 MHz), but also that it goes from one rising edge of clk_fpga_1 to another.

Once that’s done, let’s see what we’ve got: The requirement was 10 ns, and the slack 3.791 ns (note that it’s positive), which means that we could have asked for a clock period 3.791 ns shorter and it would still be OK. So it could have been 10 - 3.791 = 6.209 ns, which is some 161 MHz.

So the short answer to the “maximal clock” question for clk_fpga_1 is 161 MHz. But remember that this figure might change if the constraints change.

And a final note: The Data Path Delay tells us something about what made this worst path slow or fast: how much of the delay went on logic, and how much on the (estimated) routing. So does the detailed delay report that follows. For a more detailed report, consider the -max_paths option (possibly along with -nworst) when requesting the timing report, so that several worst-case paths are listed. This can help in solving timing problems.

Vivado HLS and the “no function body” error: Using a C++ function in plain C code

It’s quite well known that in order to call a function from C code, when that function has been compiled in a C++ source file, it must be declared with extern “C”.

So if this appears on a C++ source file:

void my_puts(const char *str) { /* ... */ }

and there’s an attempt to call my_puts() from a plain C file, this will fail with a normal compiler (at link time, because of C++ name mangling) as well as with HLS.

In HLS, specifically, the function call in the C file will yield an error like

ERROR: [SYNCHK 200-71] myproject/example/src/main.c:20: function 'my_puts' has no function body.

The thing is that just adding an extern “C” in the .cpp file is not enough: the “no function body” error doesn’t go away. What’s required is putting the function in the hls namespace as well. Something like this:

namespace hls {

extern "C" {

void my_puts(const char *str) {
  // ... function body as before ...
}

} // extern "C"

} // namespace hls

The inspiration for this solution came from the source files of the HLS suite itself. I don’t know if this is really a good idea, only that it works.

Using exiftool to manually create a Google Map / Waze link from a JPG’s GPS position

If you’re into Linux, and you ever find yourself in a place you’d like to return to with Waze (in the middle of some road, in some not-so-well-mapped village, on a campus etc.), just take a photo with your phone, assuming that it stores the GPS info.

Alternatively, the “My GPS Coordinates” Android app can be useful to obtain, SMS and share the coordinates. But I’ll stick to the photo method.

Use exiftool to extract the coordinates from the image. The -c flag sets the coordinate format, here plain decimal degrees:

$ exiftool -c "%.6f degrees" 20160328_160309.jpg 
ExifTool Version Number         : 8.00
File Name                       : 20160328_160309.jpg
Directory                       : .
File Size                       : 5.2 MB
File Modification Date/Time     : 2016:06:14 15:43:43+03:00
File Type                       : JPEG
MIME Type                       : image/jpeg

[ ... ]

GPS Altitude                    : 0 m Above Sea Level
GPS Date/Time                   : 2016:03:28 13:02:58Z
GPS Latitude                    : 32.777351 degrees N
GPS Longitude                   : 35.024139 degrees E
GPS Position                    : 32.777351 degrees N, 35.024139 degrees E
Image Size                      : 5312x2988
Shutter Speed                   : 1/50

[ ... ]

Aha! Now manually copy the coordinates from the GPS Position line into a link. The link is


I’m lucky enough to live in the north-east part of the world (north of the equator and east of Greenwich). For places in the south or west, just put negative numbers.

Or, create a Waze link, which can be tapped on the phone to get me to that place:


This opens the web browser, which in turn opens Waze, which started telling me how to get there…

The following link can also be used to open Waze directly; however, it has to be part of a link on a web page, like this:


As plain text in a mail message, an SMS or in Keep, it didn’t work on my LG G4 Android, because these apps didn’t turn the “waze:” prefix into a link. It’s still useful within a website (or an HTML-formatted message?)


My golden Makefile for compiling single-file C programs

This is the Makefile I use for compiling a lot of simple utility programs, one .c file per utility:

CC=    gcc
FLAGS=  -Wall -O3 -g

ALL=    broadclient broadserver multicastclient multicastserver
all:    $(ALL)

clean:
	rm -f $(ALL)
	rm -f `find . -name "*~"`

%:    %.c Makefile
	$(CC) $(FLAGS) $< -o $@

The ALL variable contains the list of output files, each having a corresponding *.c file. The names above are just an example.

The last rule (%: %.c) is a pattern rule that tells Make how to create an extension-less executable from a *.c file. It’s almost redundant, since Make attempts to compile the corresponding C file anyhow if it sees a target file with no extension (try “make --debug=v”). If the rule is removed, and the CFLAGS variable is set to the current value of FLAGS, it will work the same, except that the executables won’t depend on the Makefile itself (so editing the Makefile won’t trigger a rebuild).

PCIe over fiber optics notes (using SFP+)


As part of a larger project, I was required to set up a PCIe link between a host and some FPGAs through a fiber link, in order to ensure medical-grade electrical isolation of a high-bandwidth video data link, and to allow for control over the same link.

These are a few jots on carrying a 1x Gen2 PCI Express link over a plain SFP+ fiber optics interface. A 1x PCIe link is, after all, just one GTX lane going in each direction, so it’s quite natural to carry each Gigabit Transceiver lane on an optical link.

When a general-purpose host computer is used, at least one PCIe switch is required in order to ensure that the optical link is based upon a steady, non-spread spectrum clock. If an FPGA is used as a single endpoint at the other side of the link, it can be connected directly to the SFP+ adapter, with the condition that the FPGA’s PCIe block is set to asynchronous clock mode.

Since my project involved more than one endpoint on the far end (an FPGA and USB 3.0 chip), I went for the solution of one PCIe switch on each end. Avago’s PEX 8606, to be specific.

All in all, there are two issues that really require attention:

  • Clocking: Making sure that the clocks on both sides are within the required range (and it doesn’t hurt if they’re free of jitter)
  • Handling the receiver detect issue, detailed below

How each signal is handled

  • Tx/Rx lanes: Passed through the fiber. Each differential pair is simply connected to the respective data input or output of the SFP+.
  • PERST: Signaled by turning off the laser on the upstream side. The downstream side issues PERST to everything on a (debounced) LOS (Loss of Signal).
  • Clock: Not carried over the fiber. Keep both clocks clean, and within 250 ppm.
  • PRSNT: Generated locally, if this is at all relevant
  • All other PCIe signals are not mandatory

Some insights

  • It’s as easy (or difficult) as setting up a PCIe switch on both sides. The optical link itself is not adding any particular difficulty.
  • Dual clock mode on the PCIe switches is mandatory (hence only certain devices are suitable). The isolated clock goes to a specific lane (pair?), and not all configurations are possible (e.g. not all 1x on PEX8606).
  • According to PCIe spec, the LTSSM goes to Polling if a receiver has been detected (that is, a load is sensed), but Polling returns to Detect if there is no proper training sequence received from the other end. So apparently there is no problem with a fiber optic transceiver, even though it presents itself as a false load in the absence of a link partner at the other side of the fiber: The LTSSM will just keep looping between Detect and Polling until such partner appears.
  • The SFP+ RD pins are transmitters on the PCIe wire pair, and the TD pins are receivers. Don’t get confused.
  • AC coupling: All lane wires must have a 100 nF capacitor in series. External connectors (e.g. PCIe fingers) must have the capacitor on the PET (transmitter) side, but must not have one on the incoming signal.
  • Turn off ASPM wherever possible. Most BIOSes and many Linux kernels volunteer to do that automatically, but it’s worth making sure ASPM is never turned on in any usage scenario. A lot of errors are related to the L0s state (which ASPM invokes), in both switches and endpoints.

PEX 86xx notes

  • PEX_NT_RESETn is an output signal (but shouldn’t be used anyhow)
  • It seems like the PLX device doesn’t care about anything that happened before the reset: A lousy voltage ramp-up or the absence of clock. All is forgotten and forgiven.
  • A fairly new chipset and BIOS are required on the motherboard, say from year 2012 and on, or the switch isn’t handled properly by the host.
  • On a Gigabyte Technology Co., Ltd. G31M-ES2L/G31M-ES2L, BIOS FH 04/30/2010, the motherboard’s BIOS stopped the clock shortly after powering up (it gave up, probably), and that probably left the PEX clockless, leading to completely weird behavior.
  • There’s a difference between lane numbering and port numbering (the latter is used in the function numbers of the “virtual” endpoints created for each port). For example, on the 8606 running a 2x-1x-1x-1x-1x configuration, lanes 0-1, 4, 5, 6 and 7 are mapped to ports 0, 1, 5, 7 and 9 respectively. Port 4 is lane 1 in an all-1x configuration (with the other ports mapped the same).
  • The PEX doesn’t detect an SFP+ transceiver as a receiver on the respective PET lane, which prevents bringup of the fiber lane, unless the SerDes X Mask Receiver Not Detected bit is enabled in the relevant register (e.g. bit 16 at address 0x204). The lane still produces its receiver detection pattern, but ignores the fact that it didn’t sense any receiver at the other end. See below.
  • In dual-clock mode, the switch works even if the main REFCLK is idle, given that the respective lane is unused (needless to say, the other clock must work).
  • Read the errata of the device before picking one. It’s available on PLX’s site, on the same page the Data Book is downloaded from.
  • Connect an EEPROM on custom board designs, and be prepared to use it. It’s a lifesaver.

Why receiver detect is an issue

Before attempting to train a lane, the PCIe spec requires the transmitter to check if there is any receiver on the other side. The spec requires the receiver to have a single-ended impedance of 40-60 Ohms on each of the P/N wires at DC (and a differential impedance of 80-120 Ohms, but that’s not relevant). The transmitter’s single-ended impedance isn’t specified, only its differential impedance, which must be 80-120 Ohms. The coupling capacitor may range from 75 to 200 nF, and is always on the transmitter’s side (this is relevant only when there’s a plug connection between Tx and Rx).

The transmitter performs a receiver detect by creating an upward common-mode pulse of up to 600 mV on both lane wires, and measuring the voltage on them. This pulse lasts 100 us or so. As the time constant of 50 Ohms combined with 100 nF is 5 us, the voltage pattern of a charging capacitor is expected when a receiver is present. Note that the common-mode impedance of the transmitter is not defined by the spec, but the transmitter’s designer knows it. Either way, if a flat pulse is observed on the lane wires, no receiver is sensed.
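The time-constant arithmetic above, extended to the ends of the spec ranges quoted earlier (40-60 Ohm termination, 75-200 nF coupling capacitor). This is a sketch that treats the path as a simple first-order RC circuit, which ignores the transmitter’s own (unspecified) common-mode impedance; it only shows that even the slowest combination charges well within a pulse of about 100 us.

```python
import math


def time_constant(r_ohm, c_farad):
    """RC time constant in seconds."""
    return r_ohm * c_farad


def charged_fraction(t_s, tau_s):
    """Fraction of the final voltage an RC step response reaches after t_s seconds."""
    return 1.0 - math.exp(-t_s / tau_s)


nominal = time_constant(50, 100e-9)   # the 5 us figure from the text
fastest = time_constant(40, 75e-9)    # lower ends of both spec ranges
slowest = time_constant(60, 200e-9)   # upper ends of both spec ranges

PULSE_S = 100e-6  # "100 us or so"
for name, tau in (("nominal", nominal), ("fastest", fastest), ("slowest", slowest)):
    print(f"{name}: tau = {tau * 1e6:.1f} us, "
          f"charged by end of pulse: {charged_fraction(PULSE_S, tau):.4f}")
```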

Now to SFP+ modules: The SFP+ specification requires a nominal 100 Ohm differential impedance on its receivers, but “does not require any common mode termination at the receiver. If common mode terminations are provided, it may reduce common mode voltage and EMI” (SFF-8431, section 3.4). Also, it requires DC-blocking capacitors on both transmitter and receiver lane wires, so there’s some extra capacitance on the PCIe-to-SFP+ direction (where the SFP+ is the PCIe receiver) which is not expected. But the latter issue is negligible compared with the possible absence of common mode termination.

As the common-mode termination on the receiver is optional, some modules may be detected by the PCIe transmitter, and some may not.

This is what one of the PCIe lane’s wires looks like when the PEX8606 switch is set to ignore the absence of a receiver (with the SerDes X Mask Receiver Not Detected bit): It still runs the receiver detect test (the large pulse), but then goes to link training even though no load was detected (that’s the noisy part after the pulse). In the case shown, the training kept failing (no response from the other side), so it goes back and forth between detection and training.

Oscilloscope plot of receiver detect of PLX8606

This capture was done with a plain digital oscilloscope (~ 200 MHz bandwidth).

Vivado’s component.xml: IP-XACT dissection jots

These are a few jots I wrote down while writing code that generates component.xml files automatically. The XML format of this file is IP-XACT, a specification by the SPIRIT Consortium which can be downloaded free from IEEE. The “spirit:” prefixes all over the XML file indicate that the keywords are defined in the IP-XACT spec.

Block design files (*.bd), which are the only essential source Vivado needs for defining a block design, also follow the IP-XACT convention; however, they serve a different purpose and have a different format.

An IP-XACT file can be opened directly in Vivado (File -> Open IP-XACT File… or the ipx::open_ipxact_file Tcl command on earlier Vivados), and there are plenty of Tcl commands for handling it (try “help ipx::” at the Tcl prompt; yes, with two colons).


Everything is under the <spirit:component> entry.

  • Vendor, library version etc
  • busInterfaces: Each busInterface groups ports (listed later on) into interfaces such as AXI, AXI Streaming etc. These interfaces must be of types known to Vivado; it seems like there’s no sensible way to add a custom interface.
  • model: views and ports, see below
  • fileSets: Each fileset lists the files that are relevant for one particular view. The pairing is done by matching the view’s fileSetRef attribute with the fileset’s name attribute.
  • description: Text that is displayed to the user. It can be long.
  • parameters
  • vendorExtensions (Xilinx taxonomy, basic stuff, note the supportedFamilies entry)

The “model” entry has two subentries:

  • views: Different ways to consume the files of the IP: Synthesis in Verilog, synthesis in VHDL, synthesis in any language, files for describing GUI etc. It seems like Vivado is looking at the envIdentifier attribute in particular, and the fileSetRef for linking with a fileset.
    A view doesn’t have to contain all the files required; rather, several views are used together for a given scenario. For example, when synthesizing in Verilog, the fileset linked to the view identified with “verilogSource:vivado.xilinx.com:synthesis” (typically named “xilinx_verilogsynthesis”) will probably contain the Verilog files. But if there’s also a fileset linked with a view identified with “:vivado.xilinx.com:synthesis” (typically named “xilinx_anylanguagesynthesis”), its files will be used as well. The latter fileset may contain netlists (ngc, edif), which the language-specific fileset may not.
  • ports: Lists the top-level module’s ports. Input ports may have a defaultValue attribute, which defines the value in case nothing is connected to it. All ports appearing in the busInterfaces section must appear here, in which case Vivado includes them in a group. If a port doesn’t belong to any bus interface, it’s exposed as a wire on the block.
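A minimal Python sketch of the view-to-fileset pairing described above, using only the standard library’s xml.etree. It assumes the IP-XACT 1685-2009 namespace URI that Vivado-generated component.xml files declare, and that a view points at its fileset through a spirit:fileSetRef element holding a spirit:localName. The module, fileset and file names in the sample are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the IP-XACT 1685-2009 namespace, as declared in Vivado's component.xml
NS = {"spirit": "http://www.spiritconsortium.org/XMLSchema/SPIRIT/1685-2009"}


def files_per_view(xml_text):
    """Map each view's envIdentifier to the file names of its linked fileset."""
    root = ET.fromstring(xml_text)  # the <spirit:component> element
    filesets = {}
    for fs in root.findall("spirit:fileSets/spirit:fileSet", NS):
        name = fs.findtext("spirit:name", namespaces=NS)
        filesets[name] = [f.findtext("spirit:name", namespaces=NS)
                          for f in fs.findall("spirit:file", NS)]
    views = {}
    for view in root.findall("spirit:model/spirit:views/spirit:view", NS):
        env = view.findtext("spirit:envIdentifier", namespaces=NS)
        ref = view.findtext("spirit:fileSetRef/spirit:localName", namespaces=NS)
        views[env] = filesets.get(ref, [])  # a view without a fileset gets []
    return views


def verilog_synthesis_files(xml_text):
    """Files used for Verilog synthesis: language-specific view + any-language view."""
    views = files_per_view(xml_text)
    return (views.get("verilogSource:vivado.xilinx.com:synthesis", [])
            + views.get(":vivado.xilinx.com:synthesis", []))


SAMPLE = """<spirit:component
  xmlns:spirit="http://www.spiritconsortium.org/XMLSchema/SPIRIT/1685-2009">
  <spirit:model><spirit:views>
    <spirit:view>
      <spirit:name>xilinx_verilogsynthesis</spirit:name>
      <spirit:envIdentifier>verilogSource:vivado.xilinx.com:synthesis</spirit:envIdentifier>
      <spirit:modelName>my_ip</spirit:modelName>
      <spirit:fileSetRef><spirit:localName>verilog_fs</spirit:localName></spirit:fileSetRef>
    </spirit:view>
    <spirit:view>
      <spirit:name>xilinx_anylanguagesynthesis</spirit:name>
      <spirit:envIdentifier>:vivado.xilinx.com:synthesis</spirit:envIdentifier>
      <spirit:modelName>my_ip</spirit:modelName>
      <spirit:fileSetRef><spirit:localName>anylang_fs</spirit:localName></spirit:fileSetRef>
    </spirit:view>
  </spirit:views></spirit:model>
  <spirit:fileSets>
    <spirit:fileSet><spirit:name>verilog_fs</spirit:name>
      <spirit:file><spirit:name>src/my_ip.v</spirit:name><spirit:fileType>verilogSource</spirit:fileType></spirit:file>
    </spirit:fileSet>
    <spirit:fileSet><spirit:name>anylang_fs</spirit:name>
      <spirit:file><spirit:name>netlist/core.ngc</spirit:name><spirit:userFileType>ngc</spirit:userFileType></spirit:file>
    </spirit:fileSet>
  </spirit:fileSets>
</spirit:component>"""

print(verilog_synthesis_files(SAMPLE))  # ['src/my_ip.v', 'netlist/core.ngc']
```

Indexing the filesets by name first keeps the lookup independent of the order in which views and filesets appear in the file.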


  • In the files listed in a fileset, each one is given a fileType attribute. This attribute has to be one listed in the IP-XACT standard section C.8.2 (e.g. verilogSource, vhdlSource, tclSource etc.). Other strings will be rejected by Vivado. For xci, ngc, edif etc, Vivado expects a userFileType attribute instead. One of fileType or userFileType must be present.
  • When instantiating the IP in a block design, Vivado expects the top-level module to have the name given in the modelName attribute in the relevant view. This is typically the name of one of the modules of the fileset.
  • The entries in vendorExtensions -> taxonomies determine where the IP appears in Vivado’s IP Catalog when it’s listed by groups. The path is given as a directory path with slashes (hence the leading slash, marking the “root”). It’s fine to invent a name for a new root entry, in which case a new group is created in the IP Catalog: Vivado accepts taxonomies it doesn’t know of.
  • Sub-cores’ XCI files may go into a Verilog/VHDL synthesis fileset, but the last file in the fileset must be a Verilog/VHDL file.
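Two of the fileset rules above lend themselves to a quick sanity check before handing a generated component.xml to Vivado. A minimal Python sketch, assuming the same IP-XACT 1685-2009 namespace and that XCI files are marked with a userFileType of “xci”. It checks only that each file has a fileType or a userFileType, and that a fileset containing XCI files ends with a verilogSource or vhdlSource file; validating fileType strings against section C.8.2 is not attempted.

```python
import xml.etree.ElementTree as ET

# Assumption: IP-XACT 1685-2009 namespace, as in Vivado's component.xml
NS = {"spirit": "http://www.spiritconsortium.org/XMLSchema/SPIRIT/1685-2009"}
HDL_FILE_TYPES = {"verilogSource", "vhdlSource"}


def fileset_problems(xml_text):
    """Check two of the rules above; return a list of problem descriptions."""
    root = ET.fromstring(xml_text)
    problems = []
    for fs in root.findall("spirit:fileSets/spirit:fileSet", NS):
        fs_name = fs.findtext("spirit:name", namespaces=NS)
        files = fs.findall("spirit:file", NS)
        # Rule 1: every file needs a fileType or a userFileType
        for f in files:
            if (f.find("spirit:fileType", NS) is None
                    and f.find("spirit:userFileType", NS) is None):
                fname = f.findtext("spirit:name", namespaces=NS)
                problems.append(f"{fs_name}: {fname} has no fileType/userFileType")
        # Rule 2: if the fileset holds XCI files, its last file must be Verilog/VHDL
        has_xci = any(f.findtext("spirit:userFileType", namespaces=NS) == "xci"
                      for f in files)
        if has_xci and files[-1].findtext("spirit:fileType", namespaces=NS) not in HDL_FILE_TYPES:
            problems.append(f"{fs_name}: last file must be Verilog/VHDL")
    return problems
```

An empty list means both checks passed; it says nothing about whether Vivado will like the rest of the file.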

Using a shell account as a manual sendmail relay

So the situation is like this: An email I attempted to send got rejected by the recipient’s mail server because my ISP (Netvision) has a poor spam reputation. And it so happens that I have a shell account (with root, possibly) on a server with an excellent reputation. So how do I use this advantage?

On my Thunderbird oldie, save the message with “Save As…” from the “Sent” folder into an .eml file.  Or from “Unsent Mail” folder, if it’s a fresh message which I haven’t even tried to send the normal way (using the “Send Later” feature).

Copy this .eml file to the server with good mail reputation.

On that server, go

$ sendmail -v -t < test.eml
"eli@picky.server.com" <eli@picky.server.com>... Connecting to [] via relay...
220 theserver.org ESMTP Sendmail 8.14.4/8.14.4; Sat, 18 Jun 2016 11:05:26 +0300
>>> EHLO theserver.org
250-theserver.org Hello localhost.localdomain [], pleased to meet you
250 HELP
>>> MAIL From:<eli@theserver.org> SIZE=864
250 2.1.0 <eli@theserver.org>... Sender ok
>>> RCPT To:<eli@picky.server.com>
>>> DATA
250 2.1.5 <eli@picky.server.com>... Recipient ok
354 Enter mail, end with "." on a line by itself
>>> .
250 2.0.0 u5I85QQq030607 Message accepted for delivery
"eli@picky.server.com" <eli@picky.server.com>... Sent (u5I85QQq030607 Message accepted for delivery)
Closing connection to []
>>> QUIT
221 2.0.0 theserver.org closing connection

The -v flag produces all the verbose output, and the -t flag makes sendmail read the recipients from the message’s headers (To:, Cc: and Bcc:). If there’s a Bcc: header, it’s removed before sending.
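As a rough illustration of what -t means, here is a Python sketch using the standard email package: it collects the recipients from To:, Cc: and Bcc:, and strips Bcc: from the text that would be transmitted. This models the behavior described above; it is not sendmail’s own logic, and the addresses in the test message are hypothetical.

```python
from email import message_from_string
from email.utils import getaddresses


def recipients_from_headers(eml_text):
    """Collect recipient addresses from To, Cc and Bcc, roughly as sendmail -t does."""
    msg = message_from_string(eml_text)
    values = []
    for header in ("To", "Cc", "Bcc"):
        values.extend(msg.get_all(header, []))
    return [addr for _display_name, addr in getaddresses(values)]


def without_bcc(eml_text):
    """Return the message text with every Bcc header removed before transmission."""
    msg = message_from_string(eml_text)
    del msg["Bcc"]  # removes all Bcc headers; no error if there are none
    return msg.as_string()
```

Feeding it the saved .eml file’s contents gives the envelope recipients and the text that actually goes out.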

To make sure it went fine, look in /var/log/maillog. A successful transmission leaves an entry like this:

Jun 18 11:06:17 theserver sendmail[30611]: u5I85QQq030607: to=<eli@picky.server.com>, ctladdr=<eli@theserver.org> (500/123), delay=00:00:51, xdelay=00:00:48, mailer=esmtp, pri=120985, relay=picky.server.com. [], dsn=2.0.0, stat=Sent (OK id=1bEBGd-0007kL-DB)

Note the mail ID given by sendmail (u5I85QQq030607 in this case). All related log messages can be found simply with e.g. (as root)

# grep u5I85QQq030607 /var/log/maillog
Jun 18 11:05:26 theserver sendmail[30607]: u5I85QQq030607: from=<eli@theserver.org>, size=985,, nrcpts=1, msgid=<57650054.90002@picky.server.com>, proto=ESMTP, daemon=MTA, relay=localhost.localdomain []
Jun 18 11:05:29 theserver sendmail[30604]: u5I85NZV030604: to="eli@picky.server.com" <eli@picky.server.com>, ctladdr=eli (500/123), delay=00:00:06, xdelay=00:00:03, mailer=relay, pri=30864, relay=[] [], dsn=2.0.0, stat=Sent (u5I85QQq030607 Message accepted for delivery)
Jun 18 11:06:17 theserver sendmail[30611]: u5I85QQq030607: to=<eli@picky.server.com>, ctladdr=<eli@theserver.org> (500/123), delay=00:00:51, xdelay=00:00:48, mailer=esmtp, pri=120985, relay=picky.server.com. [], dsn=2.0.0, stat=Sent (OK id=1bEBGd-0007kL-DB)

(the successful finale is the last message)
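The grep above also catches the local relay’s line (queue ID u5I85NZV030604), because that line mentions the final queue ID in its stat= field. A small Python sketch that separates the two: related_lines() behaves like the grep, while delivery_result() looks only at lines logged under the queue ID itself and returns the relay and status of the last one. The regular expressions are assumptions based on the log lines shown here, not on a documented log format.

```python
import re

# Queue ID that follows "sendmail[PID]: " at the start of each message
QID_RE = re.compile(r"sendmail\[\d+\]: (\w+): ")


def related_lines(log_lines, qid):
    """Same as grep: every line that mentions the queue ID anywhere."""
    return [line for line in log_lines if qid in line]


def delivery_result(log_lines, qid):
    """(relay, stat) of the last delivery attempt logged under this queue ID."""
    result = None
    for line in log_lines:
        m = QID_RE.search(line)
        if not m or m.group(1) != qid or "stat=" not in line:
            continue  # another queue ID, or not a delivery attempt
        relay = re.search(r"relay=(\S+)", line)
        stat = re.search(r"stat=(\w+)", line)
        result = (relay.group(1) if relay else None, stat.group(1))
    return result
```

For the three lines above, delivery_result() picks only the last one, i.e. the handoff to picky.server.com with stat=Sent.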

Using cgroups to force RAM swapping for implementing an Arria 10 design

The problem

I needed to implement an FPGA design for an Arria 10 chip with Quartus 15 on a Linux machine. According to Altera’s requirement page (“Memory recommendations” tab), the computer should have 28-48 GB of RAM. Or, as it says on that page, one can fake it with virtual memory. It turns out that the fitter (quartus_fit) is the process that requires this much memory.

Since I have a desktop with 16 GB and a laptop with 8 GB, I set up a large swap partition on the desktop (see below) and fired off the implementation. For a reason I can’t figure out, the memory just ran out: quartus_fit ate up GB after GB until 15.7 GB of physical RAM was in use, and the computer froze. The kernel was still responsive (the computer answered pings), but it seemed like no process was able to run (for example, attempts to connect with ssh got no response whatsoever: The TCP link was established, but no data ran through it). After several minutes of looking at a completely frozen screen, and a hard disk doing almost nothing, I reset the computer.

As for the swap partition, only a few hundred MBs of it were used. Why pages weren’t rushed into swap to avoid this freeze is beyond me. This happened on the desktop running kernel v3.12.20 as well as on the laptop with 3.13.0-35 (Ubuntu 14.04.1).

The solution

Since the swapping mechanism didn’t kick in fast enough to prevent quartus_fit from eating up all physical RAM, let cgroups do the job instead. The idea is that one can limit the amount of physical memory used. Everything else goes to swap. Since I didn’t want to mess with my desktop again, I went for a 6 GB limit on my laptop (out of the existing 8 GB). Details follow.

Setting up swap

First things first: set up a large swap partition. I’m using LVM on the machine, so it was quite easy.

In retrospect, 64 GB is much more than needed (10 GB would have been enough), but I was lucky enough to have this much spare room in the physical volume.

So to create a new logical volume and format it for swap, it was (vg_main is the volume group):

# lvcreate --size 64G vg_main -n lv_bigswap
# mkswap /dev/mapper/vg_main-lv_bigswap
# lvdisplay

Turn off old swap, and enable the new one only:

# swapoff -a
# swapon /dev/mapper/vg_main-lv_bigswap

And that’s it. The swap is enabled.


Now to the interesting part. First I needed to install the cgroup tools:

# apt-get install cgroup-bin

(there was no need to reboot, as suggested elsewhere)

Following this guide: Create a group, owned by myself (eli), but this has to be done as root:

# cgcreate -a eli:eli -g memory:quartus

This creates the /sys/fs/cgroup/memory/quartus/ subdirectory, owned by user “eli” (and everything below too, so I don’t have to be root to control anything related to it).

Note that the name “quartus” is just a name and has nothing to do with the target executable. Which is never “quartus” in my case, because I implement the project by kicking off “make” from “xemacs”.

I could have used cgexec to start a new process, for example (as root, because changing a group isn’t allowed as plain user)

# cgexec -g memory:quartus xemacs

but I went for changing the group for an existing process (root required, again. 4550 happens to be the PID of xemacs):

# cgclassify -g memory:quartus 4550

It could also make sense to target a shell process, which would limit anything executed from it.

Now drop the root privileges. They won’t be required anymore.

And indeed, the process has joined the group (as non-root):

$ cat /sys/fs/cgroup/memory/quartus/cgroup.procs

Set the memory limit to 6 GiB:

$ cat /sys/fs/cgroup/memory/quartus/memory.limit_in_bytes
$ echo 6442450944 > /sys/fs/cgroup/memory/quartus/memory.limit_in_bytes
$ cat /sys/fs/cgroup/memory/quartus/memory.limit_in_bytes
6442450944
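The same steps (setting the limit and listing the member processes) can be done from Python by writing to and reading from the cgroup v1 files used above. A minimal sketch: the directory is a parameter, so it can be tried on a scratch directory; on the machine above it would be /sys/fs/cgroup/memory/quartus. This is the cgroup v1 interface only; under cgroup v2 the limit file is called memory.max instead.

```python
from pathlib import Path

GIB = 1024 ** 3  # 6 * GIB == 6442450944, the value written above


def set_memory_limit(cgroup_dir, limit_bytes):
    """Write memory.limit_in_bytes (cgroup v1) and return the value read back."""
    path = Path(cgroup_dir) / "memory.limit_in_bytes"
    path.write_text(f"{limit_bytes}\n")
    return int(path.read_text())


def member_pids(cgroup_dir):
    """PIDs currently in the group, one per line in cgroup.procs."""
    text = (Path(cgroup_dir) / "cgroup.procs").read_text()
    return [int(tok) for tok in text.split()]
```

For example, set_memory_limit("/sys/fs/cgroup/memory/quartus", 6 * GIB) does what the echo above does, and calling member_pids() on the same directory in a loop is the equivalent of the watch command below.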

And now launch the implementation from that xemacs process (the “Compile” button).

For amusement, follow the joining processes with

$ watch cat /sys/fs/cgroup/memory/quartus/cgroup.procs

Needless to say(?) any process that forks from the originating process joins the group automatically, so the limit applies to all processes. And indeed, when the memory use reaches 6 GB, it goes to swap.

This probably made the whole process considerably slower (CPU usage went down to almost zero for some periods of time, waiting for disk I/O), but it took some 35 minutes to finish a simple implementation, which is all I needed.

Fedora 12: Displaying Emojis in Firefox / Chrome

A quick summary on how to get my old Fedora 12 to display Emojis when browsing the web (Instagram, for example).

Download the EmojiOneColor font from its GitHub repo.

Untar the bundle. Don’t run the installation script (maybe it works, but I prefer messing up things myself).

Create a directory named “emoji” (or any other name) in /usr/share/fonts/ and copy EmojiOneColor-SVGinOT.ttf into that directory.

Clear the font cache (as root):

# fc-cache -f

Check that Emoji One is listed as a font (non-root):

$ fc-list | grep -i emoji
Emoji One Color:style=Regular

That’s it, on my machine. There was no need to add a font configuration script: After restarting Firefox and Google Chrome, both started displaying Emojis instead of those empty boxes (Chrome shows them only in black and white, Firefox in color).

A font configuration file is needed if the browser sticks to the text font even for Emoji characters, ending up with rubbish or empty boxes. In that case, the font configuration file is required to set the Emoji font as the default, falling back on text fonts. All this according to the comments in the bundle I downloaded. I didn’t need this myself, so why mess with the font settings?