Executing user-space programs from a different Linux distro

While trying to use executables from one ARM-based distribution on another, I found that they failed to run, even before any libraries were loaded. The ARM architectures were compatible (armhf in both cases), so it wasn’t like I was trying to run an Intel binary on an ARM. I could always cross-compile from sources, but copying binaries is much easier…

I’ll demonstrate this issue with the “ls” program. Of course, what I was really trying to port was something more worthwhile.

It went like this (the “ls” in the current directory being the binary from the other distro):

# ./ls
-bash: ./ls: No such file or directory

or sometimes (depending on the distribution) it says

$ ./ls
-sh: ./ls: not found

or when attempting to run with bash:

$ bash ./ls
./ls: ./ls: cannot execute binary file

Setting LD_DEBUG=all was pointless, because the error occurred before the loader ever got involved. strace gave a clue:

$ strace ./ls
execve("./ls", ["./ls"], [/* 13 vars */]) = -1 ENOENT (No such file or directory)
dup(2)                                  = 3
fcntl64(3, F_GETFL)                     = 0x2 (flags O_RDWR)
fstat64(3, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aac9000
_llseek(3, 0, 0x7efca940, SEEK_CUR)     = -1 ESPIPE (Illegal seek)
write(3, "strace: exec: No such file or di"..., 40strace: exec: No such file or directory
) = 40
close(3)                                = 0
munmap(0x2aac9000, 4096)                = 0
exit_group(1)                           = ?

So execve() returns ENOENT even though the file exists. In this case, that means the file is there, but the kernel refuses to run it.

The reason

The crucial difference between the alien “ls” and the native one is where they expect to find their loader:

$ readelf -l /bin/ls

Elf file type is EXEC (Executable file)
Entry point 0xcb84
There are 7 program headers, starting at offset 52

Program Headers:
 Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
 EXIDX          0x093b4c 0x0009bb4c 0x0009bb4c 0x00110 0x00110 R   0x4
 PHDR           0x000034 0x00008034 0x00008034 0x000e0 0x000e0 R E 0x4
 INTERP         0x000114 0x00008114 0x00008114 0x00013 0x00013 R   0x1
 [Requesting program interpreter: /lib/ld-linux.so.3]
 LOAD           0x000000 0x00008000 0x00008000 0x93c60 0x93c60 R E 0x8000
 LOAD           0x094000 0x000a4000 0x000a4000 0x007bd 0x02a88 RW  0x8000
 DYNAMIC        0x09400c 0x000a400c 0x000a400c 0x000f0 0x000f0 RW  0x4
 GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x4

 Section to Segment mapping:
 Segment Sections...
 00     .ARM.exidx
 01    
 02     .interp
 03     .interp .hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .rodata .ARM.extab .ARM.exidx .eh_frame
 04     .init_array .fini_array .jcr .dynamic .got .data .bss
 05     .dynamic
 06    
$ readelf -l ./ls

Elf file type is EXEC (Executable file)
Entry point 0xb6d9
There are 9 program headers, starting at offset 52

Program Headers:
 Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
 EXIDX          0x00fce8 0x00017ce8 0x00017ce8 0x00030 0x00030 R   0x4
 PHDR           0x000034 0x00008034 0x00008034 0x00120 0x00120 R E 0x4
 INTERP         0x000154 0x00008154 0x00008154 0x00027 0x00027 R   0x1
 [Requesting program interpreter: /lib/arm-linux-gnueabihf/ld-linux.so.3]
 LOAD           0x000000 0x00008000 0x00008000 0x0fd1c 0x0fd1c R E 0x8000
 LOAD           0x00fee4 0x0001fee4 0x0001fee4 0x003e4 0x01050 RW  0x8000
 DYNAMIC        0x00fef0 0x0001fef0 0x0001fef0 0x00110 0x00110 RW  0x4
 NOTE           0x00017c 0x0000817c 0x0000817c 0x00044 0x00044 R   0x4
 GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x4
 GNU_RELRO      0x00fee4 0x0001fee4 0x0001fee4 0x0011c 0x0011c R   0x1

 Section to Segment mapping:
 Segment Sections...
 00     .ARM.exidx
 01    
 02     .interp
 03     .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .rodata .ARM.exidx .eh_frame
 04     .init_array .fini_array .jcr .dynamic .got .data .bss
 05     .dynamic
 06     .note.ABI-tag .note.gnu.build-id
 07    
 08     .init_array .fini_array .jcr .dynamic

Aha! When the native “ls” is executed, the kernel loads /lib/ld-linux.so.3, which in turn loads the required libraries and runs the executable. When the alien “ls” was executed, the kernel went for /lib/arm-linux-gnueabihf/ld-linux.so.3, couldn’t find it, and returned “No such file or directory”. So the missing file was actually the interpreter (i.e. the glibc dynamic loader), not the executable itself.

The Solution

Create a symlink from where the executable expects the loader to where it actually is. In this case

# mkdir /lib/arm-linux-gnueabihf
# cd /lib/arm-linux-gnueabihf
# ln -s /lib/ld-linux.so.3

It’s of course quite likely that some libraries will need to be copied along with the executable. LD_DEBUG or ldd may help here, as may “readelf -d” if ldd isn’t available.

Changing the dynamic linker when compiling

Sometimes it’s possible to go the other way around: Tell gcc to pick a certain dynamic linker.

But first, to see which loader a program compiled with gcc will expect, add the -v flag in the compilation command, e.g.

$ gcc -v -O3 -Wall tryexec.c -o tryexec

and look for the -dynamic-linker flag on the collect2 (linker) command line that gcc prints (it could be, for example, /lib64/ld-linux-x86-64.so.2).

To change the choice of linker, pass an argument to the linker through gcc with the -Wl flag:

$ gcc -O3 -Wl,-I/lib/ld-linux.so.3 -Wall tryexec.c -o tryexec

What comes after the comma of the -Wl flag goes to the linker, so -Wl,-I/lib/ld-linux.so.3 passes “-I/lib/ld-linux.so.3” to ld, which does the job (-I is ld’s short form of --dynamic-linker).

Those using Eclipse (Xilinx SDK included) can add the flag in the project C/C++ Build Settings > Tool Settings > ARM Linux gcc linker > Miscellaneous > Linker Flags (write e.g. “-Wl,-I/lib/myloader.so”, without the quotes, in the text box).

Wifi Access Point on my desktop with USB dongles

Introduction

These are my rather messy notes as I set up a wireless access point on my desktop (Fedora 12) running a home-compiled 3.12.20 Linux kernel. Somewhere below (see “Rubbish starts here”) I’ve added things that I tried out but that led nowhere. Beware.

I began with two USB dongles, 8188EU and 8192CU. I got the 8188EU up and running with Realtek’s hostapd and driver, but only on the 2.4 GHz band. So I bought a RaLink-based dual-band USB dongle and ran it with the kernel’s built-in driver and an updated version of hostapd (which is hardware-neutral). If you want one, search eBay for “300m USB Wifi dual band”. It should look like this, and cost some $15 or so:

[Photo: dual-band Wifi USB dongle]

This dongle is what I ended up using. You may skip to “Dual-band dongle” below if you don’t care about the other things I tried out before I chose this one.

The purpose is a manual setup for occasional use. There are plenty of similar write-ups, like this one.

It’s very easy to get mixed up with all those do-this-do-that howtos and forget one simple fact: A wireless NIC is just another Ethernet card that happens not to have a cable. The authentication of a wireless link takes place with plain Ethernet packets, and once the two sides agree to talk with each other, it’s back to two Ethernet cards connected with a crossover cable.

To make a machine serve as an access point, the NIC must support Master mode, and there must be software running that authenticates clients and sets up encryption. But at the end of the day, that’s all there is to it. On Linux, the daemon for doing this is hostapd.

The Swiss army knives are “iw”, “iwconfig” and “iwlist”. Try “iw help” in particular.

In short

  1. Plug in the device; the driver loads automatically
  2. Bring up the device with ifconfig (assigning an IP address)
  3. Switch regulation region, if the 5 GHz band is required (and the device reports old and over-restrictive regulation rules):
    # iw reg set GD
  4. Restart dhcpd, so that it listens for requests on wlan0
  5. Start hostapd

Realtek vs. community

There are two completely different takes on getting the Wifi working. One is to use the tools that are maintained by the community: The hostapd that arrives along with distributions, and the drivers compiled in the kernel. Well, as of June 2014, that’s a no-go with Realtek’s USB Wifi dongles.

The thing is that the typical distribution’s hostapd expects to find the kernel’s native interface, which is implemented in the cfg80211 and mac80211 kernel modules. These modules are supposed to talk with the low-level hardware drivers. Very structured and nice. Only hi-tech companies don’t always play ball with the kernel community.

Realtek, in this case, chose to compile everything together, including the higher-level frontend source code, and make a single kernel module out of it. That kind of makes sense when all you need is a single driver for your specific hardware (a bit like static linking of a program), but not when that hardware is just one of many to be supported.

For example, the kernel’s 8192CU driver (which appears as rtl8192cu in lsmod, at ~79 kB) relies on other kernel modules (mac80211, cfg80211, rtl8192c_common, rtl_usb and rtlwifi), but the Realtek driver has everything in a single module, which appears as 8192cu and takes ~526 kB.

Now to hostapd: The distribution’s version is geared toward the kernel’s native interface (“driver=nl80211”), with only partial support for Realtek’s drivers (“driver=rtl871x”). So all in all, if you use Realtek’s kernel drivers, use their hostapd as well.

My chosen solution (well, my no-other-choice solution) was to compile Realtek’s kernel modules and hostapd, with slight variations.

So first comes a summary of the commands once things finally work, and then the battlefield (compiling from sources etc.).

ifconfig

This is necessary for the already-running DHCP daemon to answer requests from wireless clients. This ifconfig command is also the moment the firmware is loaded (and not when the driver loads, as one might expect).

Important: Remember that routing rules apply just as with any Ethernet card, so don’t pick an IP address range that is already covered by the access point’s routing table. Making that mistake will not just make pings fail; the access point will also ignore ARP requests (see below).

# ifconfig wlan0 10.10.0.1 netmask 255.255.255.0
# service dhcpd restart

Starting hostapd

# service hostapd start

or, to run it in the foreground with a lot of debug output:

# hostapd -dd /etc/hostapd/hostapd.conf

Note that when hostapd is running in the foreground and is stopped with CTRL-C, unplugging and replugging the device may be necessary before re-attempting to work with it.

What happens if you pick a bad IP address

For some reason, I had the silly idea that since my internal LAN’s subnet is 10.1.0.0/16, I should assign my wlan0 card the address 10.1.1.123, so it would natively belong to the LAN. What I didn’t realize was that another NIC was already assigned to handle 10.1.0.0/16, so wlan0 would never get packets routed to it.

Even worse, the wireless adapter will not answer ARP requests, which kind of makes sense: the wireless adapter “knows” that it can’t work with the IP address it has, so it might as well not announce any IP connectivity. Interestingly, ping requests were ignored completely as well. It’s not like the replies went out on the NIC to which the IP subnet belongs; there was no reply packet at all. Again, this makes sense, because pings are not supposed to go out on another NIC. That could potentially mislead someone into thinking that the link is OK (in case there was a way for the reply to reach the requester).

Below is the description of the problem as I saw it before I solved it (originally shown in grey and struck through). I’ve kept it in case someone is stuck in the same situation.

At this point, I can connect to the access point from Windows XP (even with a client having poor WPA support) as well as from Linux, with seemingly no problem. But there’s no real internet access. The reason seems to be that the USB dongle isn’t connected to its IP protocol layer. Ethernet packets go through well, as can be seen in sniff dumps on both sides, and the client manages to acquire an address with DHCP, because that depends only on plain MAC packets.

Despite setting an address with ifconfig (or “ip address add” for that matter), the dongle doesn’t respond to ARP requests asking for the address it has, and doesn’t respond to pings.

ARP packets are sent properly from the dongle (acting as AP), and the responses from the client arrive fine as well (both when asking for the address of the client’s Wifi NIC and for that of its wired Ethernet NIC).

# arping -I wlan0 10.1.1.166
ARPING 10.1.1.166 from 10.1.1.123 wlan0
Unicast reply from 10.1.1.166 [00:0E:2E:40:5B:11]  48.329ms
Unicast reply from 10.1.1.166 [00:0E:2E:40:5B:11]  80.612ms
Unicast reply from 10.1.1.166 [00:0E:2E:40:5B:11]  104.531ms

but not the other way around (from the client):

# arping -I wlan0 10.1.1.123

(nothing happens)

Now, if the access point sends a gratuitous ARP to the client:

# arping -A -I wlan0 10.1.1.123

the client can send ping packets to the access point. These ICMP packets appear in the sniff dump of wlan0 on both sides, but the access point doesn’t reply. The same goes for pinging the broadcast address: the packets were seen in the access point’s sniff dumps with an all-0xff MAC address, but with no response:

# ping -b 10.1.255.255

This is not a firewall issue. The problem remains with the firewall taken down. Both USB dongles have this same problem.

Compiling Realtek’s driver for RTL8188EU

Possible reason why this is necessary: The USB device is V2.0 according to the package, and the newer version contains firmware. Anyhow,

$ git clone https://github.com/lwfinger/rtl8188eu.git

A plain “make” compiled the code cleanly on kernel 3.12.20 (using commit ID 63fe7cda86c2830d66335026efde7472c10bc5c2). Copy the firmware (also in the Git repository):

# cp rtl8188eufw.bin /lib/firmware/rtlwifi/

(Actually, I ended up running “make install”, after removing the existing driver from the staging subdirectory.)

Compiling Realtek’s driver for RTL8192CU

Following this guide, I went to Realtek’s site, downloaded something like RTL8188C_8192C_USB_linux_v4.0.2_9000.20130911.zip (ZIP??!), and untarred wpa_supplicant_hostapd-0.8_rtw_r7475.20130812.tar.gz from it.

I tried to compile the driver from this zip file (under “driver”). Compilation failed against my kernel (3.12) because of the change in the “create_proc_entry” API. So instead, I went for

$ git clone https://github.com/pvaret/rtl8192cu-fixes.git

and compiled cleanly from commit ID f0dfbb46a891820b27942ba3e213af83f2452957.

Compiling and running Realtek’s hostapd

From the zip file I downloaded from Realtek, I went to the hostapd subdirectory in wpa_supplicant_hostapd/ and typed “make”. It compiled cleanly and generated the “hostapd” and “hostapd_cli” executables. Yey.

And that actually worked! Note that the rtl871x driver is picked even though “driver=” isn’t set at all in hostapd.conf.

# hostapd -d /etc/hostapd/hostapd.conf
random: Trying to read entropy from /dev/random
Configuration file: /etc/hostapd/hostapd.conf
ctrl_interface_group=0
eapol_version=1
drv->ifindex=35
l2_sock_recv==l2_sock_xmit=0x0x1203be0
BSS count 1, BSSID mask 00:00:00:00:00:00 (0 bits)
Completing interface initialization
Mode: IEEE 802.11g  Channel: 4  Frequency: 2427 MHz
RATE[0] rate=10 flags=0x1
RATE[1] rate=20 flags=0x1
RATE[2] rate=55 flags=0x1
RATE[3] rate=110 flags=0x1
RATE[4] rate=60 flags=0x0
RATE[5] rate=90 flags=0x0
RATE[6] rate=120 flags=0x0
RATE[7] rate=180 flags=0x0
RATE[8] rate=240 flags=0x0
RATE[9] rate=360 flags=0x0
RATE[10] rate=480 flags=0x0
RATE[11] rate=540 flags=0x0
Flushing old station entries
Deauthenticate all stations
+rtl871x_sta_deauth_ops, ff:ff:ff:ff:ff:ff is deauth, reason=2
rtl871x_set_key_ops
rtl871x_set_key_ops
rtl871x_set_key_ops
rtl871x_set_key_ops
Using interface wlan0 with hwaddr c0:4a:00:18:ef:21 and ssid 'ocho'
Deriving WPA PSK based on passphrase
SSID - hexdump_ascii(len=4):
 6f 63 68 6f                                       ocho           
PSK (ASCII passphrase) - hexdump_ascii(len=9): [REMOVED]
PSK (from passphrase) - hexdump(len=32): [REMOVED]
rtl871x_set_wps_assoc_resp_ie
rtl871x_set_wps_beacon_ie
rtl871x_set_wps_probe_resp_ie
urandom: Got 20/20 bytes from /dev/urandom
GMK - hexdump(len=32): [REMOVED]
Key Counter - hexdump(len=32): [REMOVED]
WPA: group state machine entering state GTK_INIT (VLAN-ID 0)
GTK - hexdump(len=32): [REMOVED]
WPA: group state machine entering state SETKEYSDONE (VLAN-ID 0)
rtl871x_set_key_ops
rtl871x_set_beacon_ops
rtl871x_set_hidden_ssid ignore_broadcast_ssid:0, ocho,4
rtl871x_set_acl
wlan0: Setup of interface done.

But with WPA authentication enabled, I got a lot of

hostapd: wlan0: STA 00:0e:2e:40:5b:94 IEEE 802.11: associated
hostapd: wlan0: STA 00:0e:2e:40:5b:94 IEEE 802.11: deauthenticated due to local deauth request
hostapd: wlan0: STA 00:0e:2e:40:5b:94 IEEE 802.11: disassociated

Sniffing wlan0 also showed that EAPOL WPA key (254) frames were sent to the client but didn’t get answered, which is probably the reason for the whole thing, as mentioned on this page.

The solution was to restrict the EAPOL protocol to version 1 with

eapol_version=1

in hostapd.conf. This problem occurred only when I used the RT2500 utility on the Windows laptop. Windows XP’s native wireless selection tool connected well either way.

8192CU is single band. Really.

I tried to work with the 8192CU dongle because it supposedly supports the 5 GHz band as well, and the 2.4 GHz band is heavily crowded. I don’t know why I got the impression that it’s dual-band. Anyhow,

# cp 8192cu.ko /lib/modules/$(uname -r)/kernel/drivers/net/wireless/
# depmod -a

and also blacklist the kernel’s native driver by adding the following lines to /etc/modprobe.d/blacklist.conf

# Native Wifi drivers not usable as access points
blacklist rtl8192cu
blacklist rtl8192c_common

To see the list of channels:

$ iwlist wlan0 freq

Darn, only 2.4 GHz! It even says so on Realtek’s site: “Complete 802.11n MIMO solution for 2.4GHz band” and “Single-Band 11n (2x2) WLAN USB Dongle”.

Besides, the signal it transmits appears to be really lousy. I got really bad link quality (but hey, this is a cheapo dongle from eBay).

Compiling hostapd from the sources

First, install libnl-devel, which is required for compiling hostapd:

# yum install libnl-devel

Download from hostapd’s main page, copy the config file and compile:

$ git clone git://w1.fi/srv/git/hostap.git
$ cd hostap/hostapd
$ git checkout hostap_2_2
$ cp defconfig .config
$ make

Dual-band dongle

I plugged a MediaTek (formerly RaLink) RT5572-based no-brand dongle (0x148f/0x5572) into my computer with kernel 3.12. It was detected right away. “iw list” gave a long answer, so I reverted to the original hostapd and picked driver=nl80211. The driver handling it was rt2800usb, along with its dependencies: rt2800lib, rt2x00usb, rt2x00lib, mac80211 and cfg80211.

The Linux drivers on MediaTek’s site were last updated in 2010, supporting kernel 2.4.0, but the rt2800usb driver seems to be properly maintained, with occasional patches. So it looks like the kernel’s built-in driver is the best choice. Support for the RT5572 was added to kernel 3.10 in March 2013.

When I attempted to run hostapd, it said:

# hostapd -dd /etc/hostapd/hostapd.conf
Configuration file: /etc/hostapd/hostapd.conf
ctrl_interface_group=0
eapol_version=1
ioctl[SIOCSIFFLAGS]: No such file or directory
nl80211 driver initialization failed.
wlan1: Unable to setup interface.
rmdir[ctrl_interface]: No such file or directory

That wasn’t very helpful, but the system log was:

ieee80211 phy0: rt2x00lib_request_firmware: Info - Loading firmware file 'rt2870.bin'
ieee80211 phy0: rt2x00lib_request_firmware: Error - Failed to request Firmware

Ah, yes. A firmware file. Taken from the Linux Firmware Git repo,

# cp rt2870.bin /lib/firmware/

(Note that it goes directly in /lib/firmware, NOT in rtlwifi/. This is RaLink, not Realtek.)

At which point I got a lot of output from hostapd -dd, but it ended with

Could not set DTIM period for kernel driver

This seems to be a hostapd issue (I ran 0.6.9), as the driver is stable. Compiling hostapd-2.2 (see above) solved this, and the dongle works nicely as an access point.

Access point at 5 GHz

The whole point with this dual-band dongle was to run the access point at 5 GHz, and avoid all the noise from my neighbors. But alas, requesting a 5 GHz channel with hostapd -dd, says, somewhere in the middle:

channel [40] (157) is disabled for use in AP mode, flags: 0x1
wlan1: IEEE 802.11 Configured channel (157) not found from the channel list of current mode (2) IEEE 802.11a
wlan1: IEEE 802.11 Hardware does not support configured channel
Could not select hw_mode and channel. (-3)
wlan1: interface state UNINITIALIZED->DISABLED
wlan1: AP-DISABLED
wlan1: Unable to setup interface.

Hmmm… I failed twice here. The frequency isn’t allowed in Israel, and the 5 GHz band is blocked altogether.

Indeed,

$ iw list
Wiphy phy2
 Band 1:
 Capabilities: 0x2f2
 [...]
 Frequencies:
 * 2412 MHz [1] (20.0 dBm)
 * 2417 MHz [2] (20.0 dBm)
 * 2422 MHz [3] (20.0 dBm)
 * 2427 MHz [4] (20.0 dBm)
 * 2432 MHz [5] (20.0 dBm)
 * 2437 MHz [6] (20.0 dBm)
 * 2442 MHz [7] (20.0 dBm)
 * 2447 MHz [8] (20.0 dBm)
 * 2452 MHz [9] (20.0 dBm)
 * 2457 MHz [10] (20.0 dBm)
 * 2462 MHz [11] (20.0 dBm)
 * 2467 MHz [12] (20.0 dBm)
 * 2472 MHz [13] (20.0 dBm)
 * 2484 MHz [14] (disabled)
 Bitrates (non-HT):
 * 1.0 Mbps
 * 2.0 Mbps (short preamble supported)
 * 5.5 Mbps (short preamble supported)
 * 11.0 Mbps (short preamble supported)
 * 6.0 Mbps
 * 9.0 Mbps
 * 12.0 Mbps
 * 18.0 Mbps
 * 24.0 Mbps
 * 36.0 Mbps
 * 48.0 Mbps
 * 54.0 Mbps
 Band 2:
 Capabilities: 0x2f2
 HT20/HT40
 [...]
 Frequencies:
 * 5180 MHz [36] (disabled)
 * 5190 MHz [38] (disabled)
 * 5200 MHz [40] (disabled)
 * 5210 MHz [42] (disabled)
 * 5220 MHz [44] (disabled)
 * 5230 MHz [46] (disabled)
 * 5240 MHz [48] (disabled)
 * 5250 MHz [50] (disabled)
 * 5260 MHz [52] (disabled)
 * 5270 MHz [54] (disabled)
 * 5280 MHz [56] (disabled)
 * 5290 MHz [58] (disabled)
 * 5300 MHz [60] (disabled)
 * 5310 MHz [62] (disabled)
 * 5320 MHz [64] (disabled)
 * 5500 MHz [100] (disabled)
 * 5510 MHz [102] (disabled)
 * 5520 MHz [104] (disabled)
 * 5530 MHz [106] (disabled)
 * 5540 MHz [108] (disabled)
 * 5550 MHz [110] (disabled)
 * 5560 MHz [112] (disabled)
 * 5570 MHz [114] (disabled)
 * 5580 MHz [116] (disabled)
 * 5590 MHz [118] (disabled)
 * 5600 MHz [120] (disabled)
 * 5610 MHz [122] (disabled)
 * 5620 MHz [124] (disabled)
 * 5630 MHz [126] (disabled)
 * 5640 MHz [128] (disabled)
 * 5650 MHz [130] (disabled)
 * 5660 MHz [132] (disabled)
 * 5670 MHz [134] (disabled)
 * 5680 MHz [136] (disabled)
 * 5690 MHz [138] (disabled)
 * 5700 MHz [140] (disabled)
 * 5745 MHz [149] (disabled)
 * 5755 MHz [151] (disabled)
 * 5765 MHz [153] (disabled)
 * 5775 MHz [155] (disabled)
 * 5785 MHz [157] (disabled)
 * 5795 MHz [159] (disabled)
 * 5805 MHz [161] (disabled)
 * 5825 MHz [165] (disabled)
 * 4920 MHz [-16] (disabled)
 * 4940 MHz [-12] (disabled)
 * 4960 MHz [-8] (disabled)
 * 4980 MHz [-4] (disabled)
 Bitrates (non-HT):
 * 6.0 Mbps
 * 9.0 Mbps
 * 12.0 Mbps
 * 18.0 Mbps
 * 24.0 Mbps
 * 36.0 Mbps
 * 48.0 Mbps
 * 54.0 Mbps
 [...]

Are you kidding me? Disabled? Well, no wonder. The kernel thinks 5 GHz is disallowed in Israel:

$ iw reg get
country IL:
 (2402 - 2482 @ 40), (N/A, 20)

Where did it get that from? A peek on dmesg reveals the answer:

cfg80211: Calling CRDA to update world regulatory domain
cfg80211: World regulatory domain updated:
cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
cfg80211:   (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
cfg80211:   (2457000 KHz - 2482000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
cfg80211:   (2474000 KHz - 2494000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
cfg80211:   (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
cfg80211:   (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
usb 5-1.4: reset full-speed USB device number 9 using uhci_hcd
ieee80211 phy0: rt2x00_set_rt: Info - RT chipset 5592, rev 0222 detected
ieee80211 phy0: rt2x00_set_rf: Info - RF chipset 000f detected
ieee80211 phy0: Selected rate control algorithm 'minstrel_ht'
usbcore: registered new interface driver rt2800usb
cfg80211: Calling CRDA for country: IL
cfg80211: Regulatory domain changed to country: IL
cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
cfg80211:   (2402000 KHz - 2482000 KHz @ 40000 KHz), (N/A, 2000 mBm)

The thing is that according to Israel’s local regulations, the lower 5 GHz band is allowed for indoor use. My initial choice of channel 157 is probably illegal in Israel (see Wikipedia’s list). But hey, some channels are still open on the 5 GHz band! It’s also interesting to note that some of the 5 GHz channels that are banned for Wifi are allowed for amateur radio (also see this and this).

The per-country rules are fetched through CRDA (as the log above shows), and the database behind them is probably outdated.

The ugly solution is to switch the regulatory country. For example, Grenada has relatively relaxed settings:

# iw reg set GD

A full list of these country codes can be found here. “BO” (for Bolivia) is also worth a try.

Now the responsibility is on me to pick a legal frequency, for example any of channels 36 to 48.


Rubbish starts here

From this point on, it’s just random stuff that I tried out and that didn’t lead anywhere. But since I write as I work, why delete it? Maybe it helps someone as is.

Plugging in a TL-WN725N before switching to Realtek’s drivers

usb 2-2.2: Product: 802.11n NIC
usb 2-2.2: Manufacturer: Realtek
usb 2-2.2: SerialNumber: 00E04C0001
r8188eu: module is from the staging directory, the quality is unknown, you have been warned.
Chip Version Info: CHIP_8188E_Normal_Chip_TSMC_D_CUT_1T1R_RomVer(0)
usbcore: registered new interface driver r8188eu

Check if it’s ready to be an access point:

# iwconfig wlan0 mode master
# iwconfig wlan0
wlan0     unassociated  Nickname:"<WIFI@REALTEK>"
 Mode:Master  Frequency=2.412 GHz  Access Point: Not-Associated  
 Sensitivity:0/0 
 Retry:off   RTS thr:off   Fragment thr:off
 Encryption key:off
 Power Management:off
 Link Quality:0  Signal level:0  Noise level:0
 Rx invalid nwid:0  Rx invalid crypt:0  Rx invalid frag:0
 Tx excessive retries:0  Invalid misc:0   Missed beacon:0

OK, so it is. :)

But this doesn’t seem very good:

# iw list
nl80211 not found.

And here comes a bit of nonsense that was fixed by compiling software from sources, as shown above.

Fixed with

# modprobe mac80211

Installing the access point daemon:

# yum install hostapd

Running manually for a test:

 

# hostapd -dd /etc/hostapd/hostapd.conf
Configuration file: /etc/hostapd/hostapd.conf
ctrl_interface_group=10 (from group name 'wheel')
nl80211 not found.
nl80211 driver initialization failed.
wlan0: Unable to setup interface.

Tried the second dongle (the one I bought cheaply on eBay)

usb 2-2.2: New USB device found, idVendor=0bda, idProduct=8176
usb 2-2.2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
usb 2-2.2: Product: 802.11n WLAN Adapter
usb 2-2.2: Manufacturer: Realtek
usb 2-2.2: SerialNumber: 00e04c000001
rtl8192cu: Chip version 0x10
rtl8192cu: MAC address: 00:13:ef:40:08:98
rtl8192cu: Board Type 0
rtl_usb: rx_max_size 15360, rx_urb_num 8, in_ep 1
rtl8192cu: Loading firmware rtlwifi/rtl8192cufw_TMSC.bin
usbcore: registered new interface driver rtl8192cu
rtlwifi: Loading alternative firmware rtlwifi/rtl8192cufw.bin
rtlwifi: Firmware rtlwifi/rtl8192cufw_TMSC.bin not available

OK, OK, take the firmware!

# mkdir /lib/firmware/rtlwifi
# cp rtl8192cufw.bin /lib/firmware/rtlwifi/

Unplug-replug. This one went much better:

usb 2-2.2: New USB device found, idVendor=0bda, idProduct=8176
usb 2-2.2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
usb 2-2.2: Product: 802.11n WLAN Adapter
usb 2-2.2: Manufacturer: Realtek
usb 2-2.2: SerialNumber: 00e04c000001
rtl8192cu: Chip version 0x10
rtl8192cu: MAC address: 00:13:ef:40:08:98
rtl8192cu: Board Type 0
rtl_usb: rx_max_size 15360, rx_urb_num 8, in_ep 1
rtl8192cu: Loading firmware rtlwifi/rtl8192cufw_TMSC.bin
rtlwifi: Loading alternative firmware rtlwifi/rtl8192cufw.bin
ieee80211 phy1: Selected rate control algorithm 'rtl_rc'
rtlwifi: wireless switch is on
cfg80211: Calling CRDA for country: IL
cfg80211: Regulatory domain changed to country: IL
cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
cfg80211:   (2402000 KHz - 2482000 KHz @ 40000 KHz), (N/A, 2000 mBm)

but

# hostapd /etc/hostapd/hostapd.conf
ioctl[SIOCSIFFLAGS]: Unknown error 132
nl80211 driver initialization failed.
rmdir[ctrl_interface]: No such file or directory

Newer hostapd

Stole the binaries from Fedora 20, including a set of necessary libraries, and created a chroot for that as follows:

# chroot . /hostapd -d /hostapd.conf

With the eBay dongle, the AP was visible from my laptop, but I failed to connect. Nothing appeared when sniffing wlan1, and strace showed that nothing happened during these connection attempts, so the conclusion must be that the problem is with the dongle.

So I found the first firmware the driver was checking for,

usb 2-2.3: New USB device found, idVendor=0bda, idProduct=8176
usb 2-2.3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
usb 2-2.3: Product: 802.11n WLAN Adapter
usb 2-2.3: Manufacturer: Realtek
usb 2-2.3: SerialNumber: 00e04c000001
rtl8192cu: Chip version 0x10
rtl8192cu: MAC address: 00:13:ef:40:08:98
rtl8192cu: Board Type 0
rtl_usb: rx_max_size 15360, rx_urb_num 8, in_ep 1
rtl8192cu: Loading firmware rtlwifi/rtl8192cufw_TMSC.bin
ieee80211 phy7: Selected rate control algorithm 'rtl_rc'
rtlwifi: wireless switch is on
rtl8192cu: MAC auto ON okay!
rtl8192cu: Tx queue select: 0x05

Didn’t make any difference.

Creating a bridge

This is the really manual route, based upon this page.

Basically,

# brctl addbr br0
# brctl setfd br0 0
# brctl addif br0 eth0
# brctl addif br0 wlan0
# ifconfig br0 10.1.1.123 netmask 255.255.255.0
# ifconfig br0 up

The second command sets the forward delay to zero, to prevent problems on the first connection, as mentioned on this page.

One can take a look at the status with

# brctl show
bridge name    bridge id        STP enabled    interfaces
br0        8000.00241dd37e38    no        eth0
                                          wlan0

To remove the bridge:

# ifconfig br0 down
# brctl delbr br0

Plain-text mail from Thunderbird (under Linux)

Introduction

I’ve been annoyed for quite a while by Thunderbird’s strong inclination towards HTML mail, to the extent that if I don’t really, really verify that a mail goes out in plain text, it’s probably going to slip out in HTML. This is bad in particular when sending mails to Linux-related mailing lists. They don’t like it. And the truth is that I’m not very fond of HTML mails either, but I usually don’t care.

There’s an add-on for this, Outgoing Message Format, but I run a version of Thunderbird that is too old for that, and trying to fool Thunderbird into installing it by changing the add-on’s version requirement field ended up with an add-on that does nothing.

Upgrading was not an attractive direction: If I’m happy with a tool except for one thing, I’ll fix that thing. Upgrading tends to fix that thing but create a new problem. On a good day.

It turned out to be extremely difficult to convince Thunderbird to stop doing that. My notes from the attempt are below.

Note to self: To find the entire hack history, search your “Sent” box for “Thunderbird plain text hacks” in the subject.

Remove the HTML composition capability completely

This method makes it impossible for a certain mail identity to compose HTML mails. Go to Preferences > General > Config Editor… and agree to be careful.

mail.identity.id1.compose_html: Set from true to false.

In internal JavaScript code, these preferences are fetched with getPref() commands.

Fixing Thunderbird from within

After wasting a lot of time on this, I reached the conclusion that the problem was that quite a few components in Thunderbird’s script environment push the HTML format for various reasons. These are apparently ugly hacks that solved a problem for someone in the distant past, and remained there because no one noticed them or understood exactly what they do, possibly including whoever wrote them in the first place.

The solution was a counter-hack. Basically, hide the relevant menu’s IDs from other scripts and set the default to “Plain text”. This requires opening a JAR, making a few fixes in a couple of files, and packing it up again.

So let’s get to it. In a fresh directory,

$ jar xf /usr/lib64/thunderbird-3.0/chrome/messenger.jar

and edit ./content/messenger/messengercompose/messengercompose.xul, in the part saying

<menu id="outputFormatMenu" label="&outputFormatMenu.label;" accesskey="&outputFormatMenu.accesskey;" oncommand="OutputFormatMenuSelect(event.target)">
 <menupopup id="outputFormatMenuPopup">
 <menuitem type="radio" name="output_format" label="&autoFormatCmd.label;" accesskey="&autoFormatCmd.accesskey;" id="format_auto" checked="true"/>
 <menuitem type="radio" name="output_format" label="&plainTextFormatCmd.label;" accesskey="&plainTextFormatCmd.accesskey;" id="format_plain"/>
 <menuitem type="radio" name="output_format" label="&htmlFormatCmd.label;" accesskey="&htmlFormatCmd.accesskey;" id="format_html"/>
 <menuitem type="radio" name="output_format" label="&bothFormatCmd.label;" accesskey="&bothFormatCmd.accesskey;" id="format_both"/>
 </menupopup>
 </menu>

The idea is to hide the elements from any script, except the one that responds to changes in this menu. Also, change the default from “Auto detect” to “plain text”. After the change we have

<menu id="my_outputFormatMenu" label="&outputFormatMenu.label;" accesskey="&outputFormatMenu.accesskey;" oncommand="OutputFormatMenuSelect(event.target)">
 <menupopup id="outputFormatMenuPopup">
 <menuitem type="radio" name="output_format" label="&autoFormatCmd.label;" accesskey="&autoFormatCmd.accesskey;" id="my_format_auto"/>
 <menuitem type="radio" name="output_format" label="&plainTextFormatCmd.label;" accesskey="&plainTextFormatCmd.accesskey;" id="my_format_plain" checked="true"/>
 <menuitem type="radio" name="output_format" label="&htmlFormatCmd.label;" accesskey="&htmlFormatCmd.accesskey;" id="my_format_html"/>
 <menuitem type="radio" name="output_format" label="&bothFormatCmd.label;" accesskey="&bothFormatCmd.accesskey;" id="my_format_both"/>
 </menupopup>
 </menu>

Note the “my_” prefixes on the IDs, and that the “checked” attribute has moved.

This leaves a few changes in the only script that should deal with this, ./content/messenger/messengercompose/MsgComposeCommands.js.

In ComposeStartup(),

document.getElementById("outputFormatMenu").setAttribute("hidden", true);

is replaced with

document.getElementById("my_outputFormatMenu").setAttribute("hidden", true);

and likewise, in OutputFormatMenuSelect()

if (msgCompFields)
 switch (target.getAttribute('id'))
 {
 case "format_auto":  gSendFormat = nsIMsgCompSendFormat.AskUser;     break;
 case "format_plain": gSendFormat = nsIMsgCompSendFormat.PlainText;   break;
 case "format_html":  gSendFormat = nsIMsgCompSendFormat.HTML;        break;
 case "format_both":  gSendFormat = nsIMsgCompSendFormat.Both;        break;
 }

is replaced with

if (msgCompFields)
 switch (target.getAttribute('id'))
 {
 case "my_format_auto":  gSendFormat = nsIMsgCompSendFormat.AskUser;     break;
 case "my_format_plain": gSendFormat = nsIMsgCompSendFormat.PlainText;   break;
 case "my_format_html":  gSendFormat = nsIMsgCompSendFormat.HTML;        break;
 case "my_format_both":  gSendFormat = nsIMsgCompSendFormat.Both;        break;
 }

Finally remove a single line that fiddles with the default (harmless now, but why leave it there…). In the definition of gComposeRecyclingListener, remove this line

document.getElementById("format_auto").setAttribute("checked", "true");

That’s it for the edits. Then repackage the JAR archive:

$ jar cf messenger.jar content

Close Thunderbird, overwrite the original Jar file with the amended one (make a backup copy first, of course) and restart Thunderbird.

I should add that there are several reasons to be surprised that this is enough. For example, while working on this, I noticed several direct calls to OutputFormatMenuSelect() that attempt to fake a click on one of the HTML-enabling radio buttons. In practice, plain text messages are generated even though these calls aren’t addressed directly.

Other stuff

During the process of figuring out how to solve this issue, I found a few tricks that may be useful in the future. So here they are

Open all jars you can find

$ find /usr/lib64/thunderbird-3.0/ -iname '*.jar' | while read i ; do ( mkdir "${i##*/}" && cd "${i##*/}" && jar xf "$i" ; ) done

This opens each jar in a directory holding its name (including the .jar suffix)

Set the default HTML format

mail.default_html_action: Set from 3 to 1. Seems not to have a significant effect.

Enabling the dump() command

dump() is used in internal JavaScript code to produce debug messages, which are printed to stdout. Seeing them requires running Thunderbird from the command line.

In the Config Editor mentioned above, add the boolean browser.dom.window.dump.enabled and set it to true. Otherwise nothing is printed.

Creating stack traces

function DumpTrace()
{
 var err = new Error();

 dump("\nStack trace:\n" + err.stack + "\n\n");
}

The stack trace is pretty ugly, and contains a DumpTrace() too, but it’s good enough to find out why a certain function is called.

 

Wine: Picasa failed to start on Fedora 12 after a kernel upgrade

I upgraded my kernel from 2.6.35 to 3.12, and Picasa 2.7 failed to start. Instead of starting, it spawned tons of winedbg processes at a rapid rate. If I didn’t kill everything related to Picasa within a minute or so (that is, all winedbg processes and any process with “picasa” in its line of “ps ax”), a reboot became inevitable, as the system’s process table filled up.

Since I’m not very fond of upgrades in general (Fedora 12, anyone?), and Google has ceased to support Picasa for Linux anyhow, I worked a bit on this.

It turned out that running

$ wine "/opt/picasa/wine/drive_c/Program Files/Picasa2/Picasa2.exe"

did actually run Picasa, but with all my settings gone. No wonder. By default, Wine keeps its fake Windows environment under $HOME/.wine but $HOME/.picasa is the place used by Picasa.

So what about

$ WINEPREFIX=$HOME/.picasa wine "/opt/picasa/wine/drive_c/Program Files/Picasa2/Picasa2.exe"

YEY, it worked! But there were a few error messages that may or may not be an issue:

wine: cannot find L"C:\\windows\\system32\\wineboot.exe"
err:process:start_wineboot failed to start wineboot, err 2
fixme:actctx:parse_depend_manifests Could not find dependent assembly L"Microsoft.Windows.Common-Controls" (6.0.0.0)
fixme:ntdll:find_reg_tz_info Can't find matching timezone information in the registry for bias -120, std (d/m/y): 26/10/2014, dlt (d/m/y): 28/03/2014
fixme:ole:CoResumeClassObjects stub
fixme:win:FlashWindowEx 0x329e54
Bogus message code 1006
Bogus message code 1006
Bogus message code 1006
Bogus message code 1006
Not a JPEG file: starts with 0x91 0x6b
Not a JPEG file: starts with 0xb5 0xea

[ ... tons of these ... ]

fixme:wininet:InternetSetOptionW Option INTERNET_OPTION_CONNECT_TIMEOUT (10000): STUB
fixme:wininet:InternetSetOptionW INTERNET_OPTION_SEND/RECEIVE_TIMEOUT 10000
fixme:wininet:set_cookie httponly not handled (L"HttpOnly")
fixme:wininet:set_cookie httponly not handled (L"HttpOnly")
fixme:wininet:InternetSetOptionW Option INTERNET_OPTION_CONNECT_TIMEOUT (10000): STUB
fixme:wininet:InternetSetOptionW INTERNET_OPTION_SEND/RECEIVE_TIMEOUT 10000
err:wininet:NETCON_secure_connect SSL_connect failed: 12045
fixme:wininet:INET_QueryOption Stub for 32
fixme:file:MoveFileWithProgressW MOVEFILE_WRITE_THROUGH unimplemented
fixme:file:MoveFileWithProgressW MOVEFILE_WRITE_THROUGH unimplemented
fixme:file:MoveFileWithProgressW MOVEFILE_WRITE_THROUGH unimplemented

[ .. quite a few of these too ... ]

And still, I wondered why it failed before.

Picasa’s log file

When Picasa is executed as in the default installation (running /opt/picasa/bin/picasa), a diagnostic log is created in $HOME/.picasa/picasa.log, which said

modify_ldt: Invalid argument
modify_ldt: Invalid argument
modify_ldt: Invalid argument
modify_ldt: Invalid argument
modify_ldt: Invalid argument
wine: Unhandled page fault on write access to 0x00000010 at address 0x7ee95bd2 (thread 001d), starting debugger...

An error and then a debugger starting. Sounds familiar…

It turns out that there is indeed an issue with recent kernels, as one can see here, here and here (the last two links are to the same thread in LKML). Indeed, there’s a patch called “x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels” by H. Peter Anvin, which was targeted for 3.13 kernels, but also made its way to 3.12 stable, which I’m using. So much for “stable”. It’s a security patch, I suppose, and yet I would have lived well without it.

What the offending patch is about

The problem is that the IRET x86 machine instruction only changes the lower 16 bits of the stack pointer if it returns to a 16-bit execution context. Since IRET is used to switch back from kernel mode to any user space program, a user space program that sets up a 16-bit segment with the modify_ldt system call could learn where the kernel’s stack is mapped (except for the lower 16 bits). This information could be a significant piece of the puzzle for a kernel exploit (but doesn’t pose a threat by itself).

This is an x86 processor bug, which has a workaround for 32-bit systems. But this workaround can’t be applied to 64-bit machines, which is why this patch was made. Later, Linus offered a patch that allows users to choose at runtime if they want to allow modify_ldt on 64-bit machines.

Automatic mount stops after kernel upgrade and sysfs

I really have this thing about backward compatibility, which is why I chose to enable the CONFIG_SYSFS_DEPRECATED and CONFIG_SYSFS_DEPRECATED_V2 kernel flags when compiling kernel 3.12 for Fedora 12. After all, an old distribution with a new kernel.

This turned out to be wrong: The distribution isn’t all that old, and automounting stopped working for USB SD cards (and possibly other stuff) as a result. This was despite the fact that running the HAL daemon (an oldie, yes…) with debug info

# hald --daemon=no --verbose=yes

showed that it detected the insertion of the USB device. And still, no automount.

On my system, two things happen when CONFIG_SYSFS_DEPRECATED and CONFIG_SYSFS_DEPRECATED_V2 are enabled:

The directories for block devices for hard disks disappear, e.g. /sys/devices/pci0000:00/0000:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/block/sda and also those for non-disk devices, e.g. /sys/devices/virtual/block/md0/ and /sys/devices/virtual/block/ram0.

Instead, they appear as e.g. /sys/block/sda, with a completely different layout of files. According to the Kconfig help, the parameters that are exposed are more prone to changes over time. In the non-deprecated format, there’s a symbolic link to the respective directory in /sys/devices/ instead of the full-blown directories that appear there in the deprecated mode.

So enabling CONFIG_SYSFS_DEPRECATED_V2, which turns the deprecated layout on by default, was a mistake. To spare myself the kernel recompilation, I added sysfs.deprecated=0 to the kernel command line, and things went back to normal.

VMware Player or Workstation: Patching for Linux kernel 3.12 (or so)

For a reason not so clear to me, VMware doesn’t keep its drivers up to date with newer kernels, so they fail to compile against newer kernels. Consequently, there’s an insane race to patch them up. It starts with a compilation failure at the GUI level, and sooner or later it becomes clear that there’s no choice but to get down to work.

The procedure, if you don’t want to install VMware on the computer you’re working on, is to first extract the files with something like (as a non-root user)

$ bash VMware-Player-6.0.2-1744117.x86_64.bundle --extract newplayer

which just writes the files into a new directory newplayer/.

Next step is to untar the relevant directories into a fresh working directory. May I suggest:

$ for i in /path/to/newplayer/vmware-vmx/lib/modules/source/*.tar ; do tar -xvf "$i" ; done

This creates five new directories, each containing a Makefile and kernel module sources. In principle, the goal is to type “make” in all five and not have an error.

That’s where the headache is. I’ve packed up a set of patches that took me from Player 6.0.2 (or Workstation 10.0.2) to kernel 3.12 (in a rather messy way, but hey, nothing about it is really in order): Here they are as a tarball.

Once that is done, wrap it all up with

$ for i in vmblock vmci vmmon vmnet vsock ; do tar -cf $i.tar $i-only ; done

and copy the *.tar files into /usr/lib/vmware/modules/source/ (actually, replace the existing files) in an existing installation.

And then just run VMware as usual, and behave nicely when it wants to install modules.

Or if you want to see it with your own eyes (as root):

# vmware-modconfig --console --install-all

VMCI issues

As if it wasn’t enough as is, there was a problem with running the VMCI:

Starting VMware services:
 Virtual machine monitor                                 [  OK  ]
 Virtual machine communication interface                 [FAILED]
 VM communication interface socket family                [  OK  ]
 Blocking file system                                    [  OK  ]
 Virtual ethernet                                        [  OK  ]
 VMware Authentication Daemon                            [  OK  ]

And of course, no virtual machine would agree to run this way.

After a lot of messing around, it turned out that for some reason, the module wasn’t installed. Go figure.

The solution was to go back to the vmci-only directory (one of those untarred above), compile it with “make” (it should work by now, after all), and then copy the result to the running kernel’s module repository:

# cp vmci.ko /lib/modules/$(uname -r)/kernel/drivers/misc/
# depmod -a

Or maybe just create a kernel/drivers/vmware directory and copy all *.ko files that were compiled, and depmod.

I. Never. Want. To hear. About. This. Again.

Kernel compilation without extra “+” or other markers in the version string

So there’s this “+” sign added to the kernel version (as displayed with uname -r) when the kernel is compiled from a git tree that doesn’t sit on an official version (or more precisely, not on an annotated tag). That kinda makes sense, as it tells the kernel’s users that the kernel isn’t exactly the vanilla thing. Only it annoys me. Partly because modules that are compiled against the kernel headers extracted from this kernel (if stored separately) will not match: their version will lack the “+” sign, since there is no git repo attached to the headers.

For a while I used to move the .git directory to .gitt before compiling, and then back again afterwards to prevent this plus sign from appearing on my kernels. And then I decided to really fix it.

I’d also mention that assigning KERNELVERSION on the make command doesn’t prevent the “+” sign.

So this is what to do: Edit scripts/setlocalversion and add the “return” line shown below, somewhere around line 38.

[ ... ]

scm_version()
{
 local short
 short=false

 cd "$srctree"

 return; # Do nothing.

  [ ... ]
}

That’s it. The git state is ignored, and no more “+”. The trick is to return immediately from the function that is supposed to add these extra version markers as necessary, so it doesn’t get the chance.

Checking my mouse wheel on Linux

I had some problem with my mouse wheel (on Microsoft Wireless Mobile mouse 3500, by the way): The scrolling would sometimes be jerky. I was turning the wheel to scroll down, and there were small jumps upwards. How annoying is that?

But how could I know if it’s the mouse’s fault, or maybe a software issue? Look at the raw mice events, of course (as root).

# hexdump -v -e '3/1 "%02x " "\n"' /dev/input/mice

But this is no good. The data that represents the mouse movements appears to be useful, but turning the mouse wheel it says

08 00 00
08 00 00
08 00 00
08 00 00

regardless of whether it’s rolled up or down. That’s because the three-byte PS/2 protocol doesn’t support a mouse wheel. Hmmm…

So the EVDEV interface has to be used.

Program follows, suppose it’s saved as event.c. Note that it filters out anything but mouse wheel events.

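#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <linux/input.h>

int main(int argc, char *argv[])
{
  int fd;
  struct input_event event_data;

  if (argc != 2) {
    fprintf(stderr, "Usage: %s /dev/input/eventN\n", argv[0]);
    exit(EXIT_FAILURE);
  }

  if ((fd = open(argv[1], O_RDONLY)) == -1) {
    perror("opening device");
    exit(EXIT_FAILURE);
  }

  /* Each read() delivers one struct input_event. Stop on EOF or error. */
  while (read(fd, &event_data, sizeof(event_data)) == (ssize_t) sizeof(event_data)) {
    /* Keep only single-step wheel events: type EV_REL (2), code REL_WHEEL (8) */
    if ((event_data.type == EV_REL) && (event_data.code == REL_WHEEL) &&
        ((event_data.value == 1) || (event_data.value == -1)))
      printf("time %ld.%06ld: Value=%d\n",
             (long) event_data.time.tv_sec, (long) event_data.time.tv_usec,
             event_data.value);
  }

  close(fd);
  return 0;
}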

Compile it with (say)

$ gcc -o event -O3 -Wall event.c

And then, as root, first find which of the event files is related to the mouse by using hexdump and moving the mouse. When choosing the right one, a lot of text appears. In my case, it was

# hexdump -C /dev/input/event6

So now run with the right argument:

# ./event /dev/input/event6

and in my case I rolled the wheel downwards getting

time 1400332306.969728: Value=-1
time 1400332306.985722: Value=-1
time 1400332307.001774: Value=-1
time 1400332307.018793: Value=-1
time 1400332307.057738: Value=1
time 1400332307.465760: Value=-1
time 1400332307.506717: Value=-1
time 1400332307.521738: Value=-1
time 1400332307.545747: Value=-1
time 1400332307.570738: Value=-1

Aha! Time to buy a new mouse!

Vivado: An ISE guy’s exploration notes

Just a few things I wrote down for myself, as I got acquainted with Vivado 2014.1 (having worked with ISE until now).

Kicking off in Linux

$ cd trashcan-directory
$ . /path/to/Vivado_2014.1/Vivado/2014.1/settings64.sh
$ vivado &

or running Vivado with a certain Tcl script

$ vivado -mode batch -source /path/to/project.tcl

or implementing a project without GUI:

$ vivado -mode tcl

and then at Vivado’s Tcl prompt:

open_project /path/to/example.xpr
launch_runs -jobs 8 impl_1 -to_step write_bitstream
wait_on_run impl_1
exit

Of course, this can be the content of a file, in which case it’s run with -mode batch -source, as shown above (the “exit” at the end is then redundant).

In batch mode, Tcl code is printed out as it’s executed. The -notrace flag on vivado’s command line shuts this off. This flag also applies to the “source” command inside a Tcl script.

Jots

  • Vivado is very sensitive to which directory it’s run from. Not just journal files, but also Tcl script executions. New projects are created in that directory by default.
  • One can’t just move a Vivado project. Many file dependencies are internally stated as absolute paths (IP in particular)
  • “reset_project” in Vivado’s Tcl console is probably the parallel to “Cleanup Project files”. Not really, but close.
  • The instantiation template of a Block Design (e.g. “system”) is given as e.g. hdl/system_wrapper.v in the block design’s dedicated directory.
  • Make sure that the flow navigator is at the left. Or restore it with “View > Show Flow Navigator”. Otherwise it gets confusing.
  • Vivado has no problem importing NGC files (synthesized netlists from ISE) and EDIFs. As a matter of fact, NGC files are translated into EDIFs by Vivado automatically as the Vivado project is implemented.
  • DCP = Design checkpoint. It’s a file blob resulting from the partial implementation of the project or its sub-blocks (e.g. IPs). It’s a plain ZIP file, so you can change its suffix to .zip and open it as usual. Some of the information inside is readable, e.g. an EDIF representation of the design (in post-synthesis DCPs), XDC files containing the constraints as interpreted by Vivado, stub files etc.
  • For the optimistic: To import some settings from an XPS .xml file: Right-click the Zynq block, pick Customize block, and pick Import XPS Settings (at the top), and pick .xml file. Use the file in system/data, because the exported file is empty(!). Only some attributes are imported. This doesn’t work very well.
  • To import an IP from another path (packed as IP), go to IP Catalog, right-click > IP Settings… > Add Repository
  • In an IP, all information is stored in component.xml. Interfaces, ports, file names. Everything. More about it in this post.
  • But the file to be included in a project for defining the IP is the XCI file. The component.xml file is generated as Vivado “unpacks” the project.
  • Vivado’s IP core definitions are at /path/to/Vivado_2014.1/Vivado/2014.1/data/ip/xilinx/
  • When packaging an existing core, use “package your project” and not “Create a new AXI4 peripheral”, even if a new AXI peripheral is created. The latter creates an empty core.
  • Definitions of bus interfaces (“conduits”) can be found in /path/to/Vivado_2014.1/Vivado/2014.1/data/ip/interfaces/. According to this thread and this one, there is probably no sane way to make custom interfaces.
  • Unlike ISE, don’t include a black box HDL file in the sources when an IP is used. Just add the IP’s XCI file. There should be no system.v, no fifo32x512.v etc.
  • In XCI/XML files, $PPRDIR can be used to represent the project’s root directory (where the .xpr is), possibly in conjunction with ../../-kind of paths for directories above it. This is true for paths given without quotes as well (e.g. USER_REPO_PATHS in XCI files).
  • In a block design involving a Zynq processor, protocol converters are found in the .bd file as “axi_protocol_converter” and with names going e.g. “system_auto_pc_19”. In general, “pc” and “auto_pc” are related to protocol conversion.
  • Unlike ISE, there is no such thing as “unrelated clocks” in Vivado. If two clocks are from origins with unknown relations, they are considered “Timed” and “unsafe” in the Clock Interaction report. The relevant paths will be calculated, and are likely to fail timing. To handle this, either make a false path between each pair of such clocks, or use set_clock_groups to set up groups of clocks that are mutually unrelated.
  • There was a problem with connecting a KC705 to Vivado 2015.2 over JTAG. Xilinx’ JTAG cable worked fine, and so did the KCU105. Comparing with my laptop computer, which worked fine with the same Vivado version, it turned out that when things work properly, /dev/ttyUSB0 appears briefly and is soon removed, and only /dev/ttyUSB1 is left when the KC705 card is connected. On the computer it didn’t work on, both /dev/ttyUSB0 and /dev/ttyUSB1 were left. The dmesg log showed that in the proper sequence, the first USB interface is “disconnected” almost immediately after connection. This sounds very much like a Cypress EZ-USB re-enumeration that didn’t complete properly on the faulty computer. My hunch is that it’s a firmware loading issue, but I didn’t follow this up. On both computers the permissions were the same (rw for root user and group, the latter being dialout), so that’s not the issue.

Some Tcl insights

There’s a post about setting up a Vivado project from scratch using Tcl. And another post on XDC constraints.

  • File > Write Project tcl… is very useful for getting the project in a nutshell. There’s also File > Export… > Export Block Design, which creates a Tcl file that sets up the block design (command write_bd_tcl).
  • The real Tcl documentation is with “help” commands at Tcl prompt, e.g. “help startgroup”. Or UG835.
  • Run a Tcl script:
    source {/path/to/script.tcl}
  • To test “get_ports” and such, open “Elaborated Design” (on the Flow navigator) or use “open_run synth_1 -name netlist_1”. get_ports and get_pins etc. work on that set. Or all properties with
    set x [get_port "DDR_WEB"]; foreach p [list_property $x] { puts "$p = [ get_property $p $x ]" }
  • The Tcl script for a certain run is given as e.g. project.runs/impl_1/project.tcl. There’s also a runme.sh file in the same directory, which can run standalone for that run.
  • Using launch_runs in batch mode will cause the shell prompt to be returned before the implementation is done. Use wait_on_run to block until the run is done.
  • Obtain a timing report on paths (50 worst) going through a certain set of nets:
    report_timing -name my_timing_report -nworst 50 -through [get_nets -hier -filter {name=~*/wr_client_0_ins/*}]

    or from a set of cells:

    report_timing -name my_timing_report -nworst 50 -from [get_cells -hier -filter {name=~*/wr_client_0_ins/*}]

    Note that the -name parameter causes the output to go to a timing tab in the GUI. Without it, a textual report goes to the console. Or if -file is specified, to a file.
    And report all I/O ports’ timing:

    report_timing -delay_type min_max -max_paths 100 -name timing_1 -to [get_ports -filter {direction==out}]
    report_timing -delay_type min_max -max_paths 100 -name timing_2 -from [get_ports -filter {direction==in}]
  • Search for nets matching a name pattern (where “*” may substitute slashes):
    join [ get_nets -match_style ucf */sys_clk* ] "\n"
  • Get the cells touching a net (including the hierarchy) or the BELs:
    join [get_cells -of_objects [ get_nets -match_style ucf */pcie_ref_clk* ] ] "\n"
    join [get_bels -of_objects [ get_nets -match_style ucf */pcie_ref_clk* ] ] "\n"

The PS pins bug

The ARM processor in a Zynq architecture (“PS”) has many pins that are connected directly to the processor’s hard IP core, without any relation to the FPGA (“PL”) logic fabric. The obvious evidence for this fact is that the processor is able to boot before the PL part is configured. The attributes of these pins are controlled completely by the ARM processor’s registers, regardless of the PL configuration, if any.

And still, Xilinx has chosen to include these pins in the processor’s model in the logic design. In sample HDL designs, these pins are wired from the processor’s instance to the toplevel ports, as if it mattered. This wiring is completely meaningless to the bitstream that is compiled from the HDL. In fact, it’s probably meaningless in any respect.

As for Vivado, it seems like four signals, PS_CLK, PS_PORB, PS_SRSTB, and DDR_WEB (plus sometimes DDR_CLK), are treated at some stages as general I/Os and not as direct connections to the PS7. This is evident in the toplevel schematic view of the project, where IBUFs and OBUFs are added to the PS_ ports, and DDR_CLK’s port is left floating. This causes Vivado to allocate “real” PL I/O pins to these signals, and then complain about them being unplaceable and/or unroutable. The simple solution is to disconnect these signals from the toplevel ports, possibly by turning them into wires, rather than ports, on the toplevel module. Vivado responds by treating these as useless signals, and therefore doesn’t bother itself with them any more. Except for complaining that the placement constraints in the processor-dedicated XDC constraint file can’t be applied.

In fact, there is no reason to connect any of these PS signals to real ports in Vivado. Having said that, ISE related tools may complain if these signals aren’t connected to ports, in particular when the toplevel module is in VHDL.

So the name of the game is to connect those PS signals to the toplevel port, or disconnect them, whatever makes the tools happy. It doesn’t make any real difference anyhow.

Getting the frequency of a clock

I wanted to be sure that the DDR memory controller’s user interface frequency was what I thought it was. That turned out not to be so simple. I implemented the example design, and tried to find what stands behind c0_clk or aclk or c0_ui_clk or any of the names it has. But in Vivado, nets tend to have many names (“aliases”), depending on their names in different modules. Someone must have thought it’s a great idea.

True, it’s possible to get a list of clocks in the timing report, or use the Tcl command

report_clocks

but there was no sign of anything resembling the known names mentioned above. Now, there must be a better way than what I did, but this is what I can suggest: Pick a register that is clocked by the desired clock. For example, in mig_7series_v2_0_tg.v, it says

  always @(posedge clk) begin
    if (tg_state[TG_IDLE] | tg_state[TG_UPDT_CNTR])
      curr_rd_ptr <= 1'b0;
    else if (cmd_rd_en)
      curr_rd_ptr <= ~curr_rd_ptr;
  end

and clk is known to be the AXI clock.

Now look it up. Method #1 is to find the clocks attached to a cell: In the Tcl console

get_clocks -of_objects [get_cells -hierarchical -filter {name=~*cmd_rd_en*}]

Assuming that only one clock is given, look it up in the list of clocks of the Timing Summary Report.

To be absolutely sure, let’s go to method #2: Produce a mini timing report for the cell. Again, in the Tcl console (the get_cell query is equivalent, given here for variety):

join [ get_cells -match_style ucf *cmd_rd_en* ] "\n"
c0_u_axi4_tg_inst/traffic_gen_inst/cmd_rd_en_i_1
c0_u_axi4_tg_inst/traffic_gen_inst/cmd_rd_en_reg
c1_u_axi4_tg_inst/traffic_gen_inst/cmd_rd_en_i_1__0
c1_u_axi4_tg_inst/traffic_gen_inst/cmd_rd_en_reg

OK nice, there’s one register (“_reg”) matching that name in each of the two traffic generators, c0_ and c1_. It’s a bit hard to be sure we’re exactly on the spot, but that is soon resolved.

To get the clock information, get specific timing info for that specific cell:

report_timing -from [ get_cells -match_style ucf *cmd_rd_en* ]

[ ... ]

Timing Report

Slack (MET) :             5.327ns  (required time - arrival time)
  Source:                 c1_u_axi4_tg_inst/traffic_gen_inst/cmd_rd_en_reg/C
                            (rising edge-triggered cell FDRE clocked by clk_pll_i_1  {rise@0.000ns fall@3.000ns period=6.000ns})
  Destination:            c1_u_axi4_tg_inst/traffic_gen_inst/wr_proc_reg/D
                            (rising edge-triggered cell FDRE clocked by clk_pll_i_1  {rise@0.000ns fall@3.000ns period=6.000ns})
  Path Group:             clk_pll_i_1
  Path Type:              Setup (Max at Slow Process Corner)
  Requirement:            6.000ns  (clk_pll_i_1 rise@6.000ns - clk_pll_i_1 rise@0.000ns)
  Data Path Delay:        0.633ns  (logic 0.266ns (42.005%)  route 0.367ns (57.995%))
  Logic Levels:           1  (LUT4=1)

[ ... ]

The full output also showed the worst timing path, but that’s not the point.

What we have here is the source and destination points in human-readable format, so there’s no confusion about which register we’re talking about. And there’s the definition of the clock, and its period, 6 ns (scroll to the far right). So there we have it. The AXI clock, in this case, runs at about 166.67 MHz.

Getting a list of paths between clocks

Knowing which clock is which, this information can be used to list paths that cross clock domains. This is useful to verify that no mixed-clock path has sneaked in inadvertently. For example,

join [get_timing_paths -filter {name=~*/rd_client_7_ins/*} -max_paths 100000 -from [get_clocks clk_pll_i] -to [get_clocks clkout2]] "\n"
WARNING: [Timing 38-164] This design has multiple clocks. Inter clock paths are considered valid unless explicitly excluded by timing constraints such as set_clock_groups or set_false_path.
{mem_top_ins/rd_client_7_ins/localreset_reg/C --> mem_top_ins/rd_client_7_ins/localreset_sync_reg/D}
{mem_top_ins/rd_client_7_ins/localreset_reg/C --> mem_top_ins/rd_client_7_ins/fifo_wide_rd/U0/inst_fifo_gen/gconvfifo.rf/grf.rf/rstblk/ngwrdrst.grst.g7serrst.rd_rst_asreg_reg/PRE}
{mem_top_ins/rd_client_7_ins/fifo_wide_rd/U0/inst_fifo_gen/gconvfifo.rf/grf.rf/gntv_or_sync_fifo.gcx.clkx/wr_pntr_gc_reg[2]/C --> mem_top_ins/rd_client_7_ins/fifo_wide_rd/U0/inst_fifo_gen/gconvfifo.rf/grf.rf/gntv_or_sync_fifo.gcx.clkx/gsync_stage[1].rd_stg_inst/Q_reg_reg[2]/D}
{mem_top_ins/rd_client_7_ins/fifo_wide_rd/U0/inst_fifo_gen/gconvfifo.rf/grf.rf/gntv_or_sync_fifo.gcx.clkx/wr_pntr_gc_reg[4]/C --> mem_top_ins/rd_client_7_ins/fifo_wide_rd/U0/inst_fifo_gen/gconvfifo.rf/grf.rf/gntv_or_sync_fifo.gcx.clkx/gsync_stage[1].rd_stg_inst/Q_reg_reg[4]/D}
{mem_top_ins/rd_client_7_ins/fifo_wide_rd/U0/inst_fifo_gen/gconvfifo.rf/grf.rf/gntv_or_sync_fifo.gcx.clkx/wr_pntr_gc_reg[6]/C --> mem_top_ins/rd_client_7_ins/fifo_wide_rd/U0/inst_fifo_gen/gconvfifo.rf/grf.rf/gntv_or_sync_fifo.gcx.clkx/gsync_stage[1].rd_stg_inst/Q_reg_reg[6]/D}
{mem_top_ins/rd_client_7_ins/fifo_wide_rd/U0/inst_fifo_gen/gconvfifo.rf/grf.rf/gntv_or_sync_fifo.gcx.clkx/wr_pntr_gc_reg[7]/C --> mem_top_ins/rd_client_7_ins/fifo_wide_rd/U0/inst_fifo_gen/gconvfifo.rf/grf.rf/gntv_or_sync_fifo.gcx.clkx/gsync_stage[1].rd_stg_inst/Q_reg_reg[7]/D}
{mem_top_ins/rd_client_7_ins/fifo_wide_rd/U0/inst_fifo_gen/gconvfifo.rf/grf.rf/gntv_or_sync_fifo.gcx.clkx/wr_pntr_gc_reg[1]/C --> mem_top_ins/rd_client_7_ins/fifo_wide_rd/U0/inst_fifo_gen/gconvfifo.rf/grf.rf/gntv_or_sync_fifo.gcx.clkx/gsync_stage[1].rd_stg_inst/Q_reg_reg[1]/D}
{mem_top_ins/rd_client_7_ins/fifo_wide_rd/U0/inst_fifo_gen/gconvfifo.rf/grf.rf/gntv_or_sync_fifo.gcx.clkx/wr_pntr_gc_reg[0]/C --> mem_top_ins/rd_client_7_ins/fifo_wide_rd/U0/inst_fifo_gen/gconvfifo.rf/grf.rf/gntv_or_sync_fifo.gcx.clkx/gsync_stage[1].rd_stg_inst/Q_reg_reg[0]/D}
{mem_top_ins/rd_client_7_ins/fifo_wide_rd/U0/inst_fifo_gen/gconvfifo.rf/grf.rf/gntv_or_sync_fifo.gcx.clkx/wr_pntr_gc_reg[3]/C --> mem_top_ins/rd_client_7_ins/fifo_wide_rd/U0/inst_fifo_gen/gconvfifo.rf/grf.rf/gntv_or_sync_fifo.gcx.clkx/gsync_stage[1].rd_stg_inst/Q_reg_reg[3]/D}
{mem_top_ins/rd_client_7_ins/fifo_wide_rd/U0/inst_fifo_gen/gconvfifo.rf/grf.rf/gntv_or_sync_fifo.gcx.clkx/wr_pntr_gc_reg[5]/C --> mem_top_ins/rd_client_7_ins/fifo_wide_rd/U0/inst_fifo_gen/gconvfifo.rf/grf.rf/gntv_or_sync_fifo.gcx.clkx/gsync_stage[1].rd_stg_inst/Q_reg_reg[5]/D}
{mem_top_ins/rd_client_7_ins/fifo_wide_rd/U0/inst_fifo_gen/gconvfifo.rf/grf.rf/gntv_or_sync_fifo.gcx.clkx/wr_pntr_gc_reg[8]/C --> mem_top_ins/rd_client_7_ins/fifo_wide_rd/U0/inst_fifo_gen/gconvfifo.rf/grf.rf/gntv_or_sync_fifo.gcx.clkx/gsync_stage[1].rd_stg_inst/Q_reg_reg[8]/D}
join [get_timing_paths -filter {name=~*/rd_client_7_ins/*} -max_paths 100000 -to [get_clocks clk_pll_i] -from [get_clocks clkout2]] "\n"
WARNING: [Timing 38-164] This design has multiple clocks. Inter clock paths are considered valid unless explicitly excluded by timing constraints such as set_clock_groups or set_false_path.
{mem_top_ins/rd_client_7_ins/consumed_toggle_reg/C --> mem_top_ins/rd_client_7_ins/consumed_toggle_sync_reg[0]/D}
{mem_top_ins/rd_client_7_ins/fifo_wide_rd/U0/inst_fifo_gen/gconvfifo.rf/grf.rf/gntv_or_sync_fifo.gcx.clkx/rd_pntr_gc_reg[8]/C --> mem_top_ins/rd_client_7_ins/fifo_wide_rd/U0/inst_fifo_gen/gconvfifo.rf/grf.rf/gntv_or_sync_fifo.gcx.clkx/gsync_stage[1].wr_stg_inst/Q_reg_reg[8]/D}
{mem_top_ins/rd_client_7_ins/fifo_wide_rd/U0/inst_fifo_gen/gconvfifo.rf/grf.rf/gntv_or_sync_fifo.gcx.clkx/rd_pntr_gc_reg[5]/C --> mem_top_ins/rd_client_7_ins/fifo_wide_rd/U0/inst_fifo_gen/gconvfifo.rf/grf.rf/gntv_or_sync_fifo.gcx.clkx/gsync_stage[1].wr_stg_inst/Q_reg_reg[5]/D}
{mem_top_ins/rd_client_7_ins/fifo_wide_rd/U0/inst_fifo_gen/gconvfifo.rf/grf.rf/gntv_or_sync_fifo.gcx.clkx/rd_pntr_gc_reg[6]/C --> mem_top_ins/rd_client_7_ins/fifo_wide_rd/U0/inst_fifo_gen/gconvfifo.rf/grf.rf/gntv_or_sync_fifo.gcx.clkx/gsync_stage[1].wr_stg_inst/Q_reg_reg[6]/D}
{mem_top_ins/reposition_7_ins/allow_new_frame_reg/C --> mem_top_ins/rd_client_7_ins/allow_new_frame_d_reg/D}
{mem_top_ins/rd_client_7_ins/fifo_wide_rd/U0/inst_fifo_gen/gconvfifo.rf/grf.rf/gntv_or_sync_fifo.gcx.clkx/rd_pntr_gc_reg[4]/C --> mem_top_ins/rd_client_7_ins/fifo_wide_rd/U0/inst_fifo_gen/gconvfifo.rf/grf.rf/gntv_or_sync_fifo.gcx.clkx/gsync_stage[1].wr_stg_inst/Q_reg_reg[4]/D}
{mem_top_ins/rd_client_7_ins/fifo_wide_rd/U0/inst_fifo_gen/gconvfifo.rf/grf.rf/gntv_or_sync_fifo.gcx.clkx/rd_pntr_gc_reg[1]/C --> mem_top_ins/rd_client_7_ins/fifo_wide_rd/U0/inst_fifo_gen/gconvfifo.rf/grf.rf/gntv_or_sync_fifo.gcx.clkx/gsync_stage[1].wr_stg_inst/Q_reg_reg[1]/D}
{mem_top_ins/rd_client_7_ins/fifo_wide_rd/U0/inst_fifo_gen/gconvfifo.rf/grf.rf/gntv_or_sync_fifo.gcx.clkx/rd_pntr_gc_reg[3]/C --> mem_top_ins/rd_client_7_ins/fifo_wide_rd/U0/inst_fifo_gen/gconvfifo.rf/grf.rf/gntv_or_sync_fifo.gcx.clkx/gsync_stage[1].wr_stg_inst/Q_reg_reg[3]/D}
{mem_top_ins/rd_client_7_ins/fifo_wide_rd/U0/inst_fifo_gen/gconvfifo.rf/grf.rf/gntv_or_sync_fifo.gcx.clkx/rd_pntr_gc_reg[7]/C --> mem_top_ins/rd_client_7_ins/fifo_wide_rd/U0/inst_fifo_gen/gconvfifo.rf/grf.rf/gntv_or_sync_fifo.gcx.clkx/gsync_stage[1].wr_stg_inst/Q_reg_reg[7]/D}
{mem_top_ins/rd_client_7_ins/fifo_wide_rd/U0/inst_fifo_gen/gconvfifo.rf/grf.rf/gntv_or_sync_fifo.gcx.clkx/rd_pntr_gc_reg[0]/C --> mem_top_ins/rd_client_7_ins/fifo_wide_rd/U0/inst_fifo_gen/gconvfifo.rf/grf.rf/gntv_or_sync_fifo.gcx.clkx/gsync_stage[1].wr_stg_inst/Q_reg_reg[0]/D}
{mem_top_ins/rd_client_7_ins/fifo_wide_rd/U0/inst_fifo_gen/gconvfifo.rf/grf.rf/gntv_or_sync_fifo.gcx.clkx/rd_pntr_gc_reg[2]/C --> mem_top_ins/rd_client_7_ins/fifo_wide_rd/U0/inst_fifo_gen/gconvfifo.rf/grf.rf/gntv_or_sync_fifo.gcx.clkx/gsync_stage[1].wr_stg_inst/Q_reg_reg[2]/D}

Important: It seems like -max_paths counts paths prior to filtering (on Vivado 2014.1), which is why several of the paths shown above didn’t appear when it was set to 20. So to be sure all paths are covered, give a very large number, as shown above.

Note that an implicit -hier flag is in effect, so * matches across hierarchies. As seen in the example above, it’s enough that one of the endpoints matches the string. The paths shown above are ignored for timing, so this also shows that all paths appear, even those that aren’t timed.
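Such lists can get long, so checking them by eye is error-prone. A hedged C sketch of a filter for the join output above: it reports whether a path’s destination matches one of a few synchronizer register naming patterns. The patterns in sync_patterns[] are assumptions based on the register names in this particular example (e.g. localreset_sync_reg, gsync_stage[1]), and path_is_synchronized() is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical naming patterns of registers meant to receive
 * clock-domain-crossing signals. Adjust to the design's conventions. */
static const char *const sync_patterns[] = {
    "_sync_reg",      /* e.g. localreset_sync_reg, consumed_toggle_sync_reg[0] */
    "/gsync_stage[",  /* FIFO Gray-code pointer synchronizer stages */
};

/* For a line of the form "{src --> dst}", as printed by the join command:
 * returns 1 if dst matches one of the patterns, 0 if it doesn't,
 * and -1 if the line isn't a path at all (e.g. a WARNING line). */
int path_is_synchronized(const char *line)
{
    const char *arrow;

    assert(line != NULL);
    arrow = strstr(line, " --> ");
    if (line[0] != '{' || arrow == NULL)
        return -1;

    const char *dst = arrow + strlen(" --> ");
    for (size_t i = 0; i < sizeof sync_patterns / sizeof sync_patterns[0]; i++)
        if (strstr(dst, sync_patterns[i]) != NULL)
            return 1;
    return 0;
}
```

Applied to the lists above, it would flag the paths ending at rd_rst_asreg_reg/PRE and allow_new_frame_d_reg/D. Being flagged doesn’t make a path wrong; it only means it deserves a manual look.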

Programming the FPGA on Linux

Vivado interfaces with hardware cables through two servers: vcse_serv, which listens on TCP port 60001, and hw_server, which listens on TCP port 3121. These are usually started automatically when Vivado is launched, but they can also be started manually with the hw_server command, in particular when the server needs to be accessed from a remote computer.

A common problem is that the server runs with the privileges of the user who started Vivado, but accessing the device files that are related to the configuration cable requires root access. Vivado will say that “There is no valid target connected to the server”. In other words, no card was detected — because the server wasn’t granted access.

Recent development boards come with a USB plug for direct connection to the PC, so a separate platform cable isn’t required. For the KC705 board, the USB support originates from Digilent.

There is an explanation of how to install the drivers on this page. It tells me to change directory to {some path}/cable_drivers/lin64/digilent/ and run ./install_digilent.sh as root. I don’t run strangers’ scripts as root, but in this case it isn’t too bad. It just says

cp -f 52-xilinx-digilent-usb.rules /etc/udev/rules.d/52-xilinx-digilent-usb.rules
chmod 644 /etc/udev/rules.d/52-xilinx-digilent-usb.rules

The udev file in question merely grants read-write access to everyone (mode 0666) to the device files of devices with vendor ID 0x1443, or with vendor ID 0x0403 and the Manufacturer string set to “Digilent”. Seems harmless enough to me.

After installing the udev rule, disconnect and reconnect the cable, and kill the two server processes. After this, Vivado detects the card OK. In fact, running

$ killall hw_server

seems to be necessary every now and then, when the card has gone off and on again, and isn’t detected.

Or maybe… I should have run the all-driver installation script to begin with? First, install fxload and libusb (is it necessary?):

# apt-get install fxload libusb-dev

Under Vivado/2014.1/data/xicom/cable_drivers/lin64/install_script/install_drivers/, as root:

# ./install_drivers
--Install log = /.xinstall/install.log
--Installing cable drivers.
--Script name = ./install_drivers
--HostName = .....
--Current working dir = ...
--Kernel version = 3.13.0-35-generic.
--Arch = x86_64.
--Installer version = 1101
--Unsetting ARCH environment variable.
--User has root permission.
--Installing USB drivers------------------------------------------
--File /etc/hotplug/usb/xusbdfwu.fw/xusbdfwu.hex does not exist.
--File version of /etc/hotplug/usb/xusbdfwu.fw/xusbdfwu.hex = 0000.
--Updating xusbdfwu.hex file.
--File /etc/hotplug/usb/xusbdfwu.fw/xusb_xlp.hex does not exist.
--File version of /etc/hotplug/usb/xusbdfwu.fw/xusb_xlp.hex = 0000.
--Updating xusb_xlp.hex file.
--File /etc/hotplug/usb/xusbdfwu.fw/xusb_emb.hex does not exist.
--File version of /etc/hotplug/usb/xusbdfwu.fw/xusb_emb.hex = 0000.
--Updating xusb_emb.hex file.
--File /etc/hotplug/usb/xusbdfwu.fw/xusb_xpr.hex does not exist.
--File version of /etc/hotplug/usb/xusbdfwu.fw/xusb_xpr.hex = 0000.
--Updating xusb_xpr.hex file.
--File /etc/hotplug/usb/xusbdfwu.fw/xusb_xup.hex does not exist.
--File version of /etc/hotplug/usb/xusbdfwu.fw/xusb_xup.hex = 0000.
--Updating xusb_xup.hex file.
--File /etc/hotplug/usb/xusbdfwu.fw/xusb_xp2.hex does not exist.
--File version of /etc/hotplug/usb/xusbdfwu.fw/xusb_xp2.hex = 0000.
--Updating xusb_xp2.hex file.
--File /etc/hotplug/usb/xusbdfwu.fw/xusb_xse.hex does not exist.
--File version of /etc/hotplug/usb/xusbdfwu.fw/xusb_xse.hex = 0000.
--Updating xusb_xse.hex file.
cat: /etc/hotplug/usb.usermap: No such file or directory
--Adding Product ID 0007 to the usermap.
--Adding Product ID 0009 to the usermap.
--Adding Product ID 000d to the usermap.
--Adding Product ID 000f to the usermap.
--Adding Product ID 0013 to the usermap.
--Adding Product ID 0015 to the usermap.
--Adding Product ID 0008 to the usermap.

--Digilent Return code = 0
--Xilinx Return code = 0
--Return code = 0
--Driver installation successful.

But that’s based upon /etc/hotplug, which Ubuntu doesn’t seem to care much about. Still, reading through /etc/hotplug/usb/xusbdfwu makes it quite evident which firmware file goes with which device ID, leading to the conclusion that my red JTAG platform cable should be loaded with its firmware using

# fxload -v -t fx2 -D /dev/bus/usb/002/010 -I /etc/hotplug/usb/xusbdfwu.fw/xusb_xp2.hex

The bus path 002/010 was derived from its appearance in lsusb:

Bus 002 Device 010: ID 03fd:0013 Xilinx, Inc.
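The translation from an lsusb line to the path given to fxload’s -D option can be sketched in C as follows. lsusb_to_devpath() is a hypothetical helper; the /dev/bus/usb/BBB/DDD layout is the one used in the fxload command above:

```c
#include <assert.h>
#include <stdio.h>

/* Parse a line printed by lsusb, e.g.
 *   "Bus 002 Device 010: ID 03fd:0013 Xilinx, Inc."
 * into the device file path for fxload -D, e.g. "/dev/bus/usb/002/010",
 * and extract the vendor and product IDs (hex).
 * Returns 0 on success, -1 if the line has an unexpected format
 * or the path doesn't fit in 'len' bytes. */
int lsusb_to_devpath(const char *line, char *path, size_t len,
                     unsigned *vendor, unsigned *product)
{
    unsigned bus, dev;

    assert(line != NULL && path != NULL && vendor != NULL && product != NULL);
    /* %u reads "010" as decimal 10, which is what lsusb means */
    if (sscanf(line, "Bus %u Device %u: ID %x:%x", &bus, &dev, vendor, product) != 4)
        return -1;
    /* Bus and device numbers are zero-padded to three digits */
    if (snprintf(path, len, "/dev/bus/usb/%03u/%03u", bus, dev) >= (int)len)
        return -1;
    return 0;
}
```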

Interestingly, the device’s LED goes on only after the firmware has been loaded.

So let’s make a udev rule file out of this, based upon this post. Copy-paste the following into 52-xilinx-jtag-platform.rules (in /etc/udev/rules.d/, like the Digilent rule above):

# udev rules for programming Xilinx' JTAG platform cables
ATTRS{idVendor}=="03fd", ATTRS{idProduct}=="0008", MODE="666"
SUBSYSTEM=="usb", ACTION=="add", ATTRS{idVendor}=="03fd", ATTRS{idProduct}=="0007", RUN+="/sbin/fxload -t fx2 -I /etc/hotplug/usb/xusbdfwu.fw/xusbdfwu.hex -D $tempnode"
SUBSYSTEM=="usb", ACTION=="add", ATTRS{idVendor}=="03fd", ATTRS{idProduct}=="0009", RUN+="/sbin/fxload -t fx2 -I /etc/hotplug/usb/xusbdfwu.fw/xusb_xup.hex -D $tempnode"
SUBSYSTEM=="usb", ACTION=="add", ATTRS{idVendor}=="03fd", ATTRS{idProduct}=="000d", RUN+="/sbin/fxload -t fx2 -I /etc/hotplug/usb/xusbdfwu.fw/xusb_emb.hex -D $tempnode"
SUBSYSTEM=="usb", ACTION=="add", ATTRS{idVendor}=="03fd", ATTRS{idProduct}=="000f", RUN+="/sbin/fxload -t fx2 -I /etc/hotplug/usb/xusbdfwu.fw/xusb_xlp.hex -D $tempnode"
SUBSYSTEM=="usb", ACTION=="add", ATTRS{idVendor}=="03fd", ATTRS{idProduct}=="0013", RUN+="/sbin/fxload -t fx2 -I /etc/hotplug/usb/xusbdfwu.fw/xusb_xp2.hex -D $tempnode"
SUBSYSTEM=="usb", ACTION=="add", ATTRS{idVendor}=="03fd", ATTRS{idProduct}=="0015", RUN+="/sbin/fxload -t fx2 -I /etc/hotplug/usb/xusbdfwu.fw/xusb_xse.hex -D $tempnode"
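The rules above boil down to a lookup table from the product ID that a cable presents before its firmware is loaded to the firmware file it needs. The same table as a C sketch, e.g. for a small tool that runs fxload itself (all names are hypothetical):

```c
#include <assert.h>
#include <stddef.h>

#define FW_DIR "/etc/hotplug/usb/xusbdfwu.fw/"

/* Product IDs (vendor 0x03fd) and firmware files, as in the udev rules.
 * 0x0008 is deliberately absent: the rules only grant access to it,
 * presumably because it's the ID the cable shows once its firmware runs. */
struct cable_fw {
    unsigned product_id;
    const char *hexfile;
};

static const struct cable_fw cable_fw_table[] = {
    { 0x0007, FW_DIR "xusbdfwu.hex" },
    { 0x0009, FW_DIR "xusb_xup.hex" },
    { 0x000d, FW_DIR "xusb_emb.hex" },
    { 0x000f, FW_DIR "xusb_xlp.hex" },
    { 0x0013, FW_DIR "xusb_xp2.hex" },
    { 0x0015, FW_DIR "xusb_xse.hex" },
};

/* Returns the firmware file for a product ID, or NULL if none applies */
const char *firmware_for_product(unsigned product_id)
{
    for (size_t i = 0; i < sizeof cable_fw_table / sizeof cable_fw_table[0]; i++)
        if (cable_fw_table[i].product_id == product_id)
            return cable_fw_table[i].hexfile;
    return NULL;
}
```

For the red platform cable above (product ID 0x0013), this yields xusb_xp2.hex, matching the manual fxload command.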

The platform cable’s LED should go on soon after it’s plugged into the computer. This should work with any of the platform cables listed in the rules.

If this doesn’t work, check that loading the firmware manually with fxload works.

Programming the QSPI flash

Starting with Vivado 2014.1, the MCS file can be created with Vivado. Before this version, ISE’s promgen had to be used. Let’s assume a QSPI flash of 128M (e.g. N25Q128) connected to a Virtex device.

The command for creating an MCS file is:

write_cfgmem -format mcs -interface SPIx4 -size 128 -loadbit {up 0x0 /path/to/virtex.bit} starter-virtex.mcs
Creating config memory files...
Creating bitstream load up from address 0x00000000
Loading bitfile /path/to/virtex.bit
Writing file /path/to/starter-virtex.mcs
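An MCS file is a text file in Intel HEX format: each line is a record (“:”, byte count, 16-bit address, record type, data, checksum, all as hex byte pairs), and all of a record’s bytes, checksum included, add up to zero modulo 256. A quick sanity check of a generated file could verify each line. A minimal C sketch; mcs_record_ok() is a hypothetical name, and each line must be passed without its line terminator:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Decode one hex digit; returns -1 if c isn't a hex digit */
static int hexval(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/* Returns 1 if 'rec' is a well-formed Intel HEX record, 0 otherwise */
int mcs_record_ok(const char *rec)
{
    unsigned sum = 0, count = 0;
    size_t nbytes = 0;

    assert(rec != NULL);
    if (rec[0] != ':')
        return 0;
    for (const char *p = rec + 1; *p != '\0'; p += 2) {
        int hi = hexval((unsigned char)p[0]);
        int lo = (p[1] != '\0') ? hexval((unsigned char)p[1]) : -1;
        if (hi < 0 || lo < 0)
            return 0;               /* bad digit, or odd number of digits */
        unsigned byte = (unsigned)(hi << 4 | lo);
        if (nbytes == 0)
            count = byte;           /* first byte: number of data bytes */
        sum += byte;
        nbytes++;
    }
    /* byte count + 2 address bytes + record type + data + checksum */
    if (nbytes != count + 5)
        return 0;
    return (sum & 0xff) == 0;
}
```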

As mentioned in the second post of this thread, the following lines should be added to a constraint file (.xdc) before implementing the design:

set_property BITSTREAM.Config.SPI_BUSWIDTH 4 [current_design]
set_property BITSTREAM.CONFIG.CONFIGRATE 50 [current_design]

The second line sets the configuration rate to 50 MHz, which is probably desirable for a quick load. The first line is required for creating the MCS file. Without it, the following error occurs when attempting to build the MCS file:

write_cfgmem -format mcs -interface SPIx4 -size 128 -loadbit {up 0x0 /path/to/virtex.bit} starter-virtex.mcs
Creating config memory files...
Creating bitstream load up from address 0x00000000
Loading bitfile /path/to/virtex.bit
ERROR: [Vivado 12-3619] Cannot create SPIX4 PROM for bitfile /path/to/virtex.bit with SPI_buswidth setting of "None".
ERROR: [Common 17-39] 'write_cfgmem' failed due to earlier errors.

Those who forgot to do this, and don’t feel like re-implementing the entire design, can open the implemented design and create the bitfile with the properties set, using these Tcl commands:

set_property BITSTREAM.Config.SPI_BUSWIDTH 4 [current_design]
set_property BITSTREAM.CONFIG.CONFIGRATE 50 [current_design]
write_bitstream /path/to/virtex.bit

Note that using the GUI button for this purpose will not work.

It’s also possible to set the SPI_BUSWIDTH and CONFIGRATE properties by opening Tools > Edit Device Properties… while an implemented design is open (otherwise this menu item doesn’t appear).
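To get a feel for what the 50 MHz CONFIGRATE and the x4 bus width buy: four data bits are clocked in per configuration clock cycle, so the load time is roughly the bitstream size divided by 4 × 50 Mbit/s. A rough C sketch that ignores flash read latency and startup overhead; the 100 Mbit bitstream size in the usage note is a made-up round number, not the size of any particular device’s bitstream:

```c
#include <assert.h>

/* Rough lower bound on the configuration time in seconds:
 * the bitstream is clocked in over 'bus_width' data lines at
 * 'cclk_mhz' MHz. Flash latency and startup overhead are ignored. */
double config_load_seconds(double bitstream_bits, unsigned bus_width, double cclk_mhz)
{
    assert(bus_width > 0 && cclk_mhz > 0.0);
    return bitstream_bits / (bus_width * cclk_mhz * 1e6);
}
```

With a hypothetical 100 Mbit bitstream, config_load_seconds(100e6, 4, 50.0) gives 0.5 s, while a single data line at the same clock would take 2 s.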

Open the hardware manager, and open the target, just like for programming a bitfile. Then pick “Add Configuration Memory Device” at the bottom of the Flow Navigator on the left, and choose the FPGA whose flash should be programmed. The next popup window suggests programming the device immediately. Go for it. In the following window, pick the MCS file generated above, and select the “Erase”, “Write” and “Verify” options (“Blank Check” can be left unselected).

The programming runs in four phases:

  • Programming the FPGA with a bitfile that turns it into a programmer for its flash
  • “Step 1”: Erasing the flash
  • “Step 2”: Writing to the flash
  • “Step 3”: Verifying

All in all, 15 minutes for a 7V330 device isn’t unusual.

 

ALSA’s file plugin for playing back a raw pipe file

Motivation

On an embedded system, I have a device file /dev/xillybus_audio, which can be opened for read and/or write. Raw samples (signed 16 bit, little endian, 48000 Hz, stereo) written to this file are played on a “headphones out” plug, and samples of the same format, captured from a “mic input” plug, can be read from it. Clean and simple. Now let’s use that as an ALSA sound interface. This is where it doesn’t get all that simple.

How about a kernel driver for that interface? Nice idea, but the stream interface is already there. Besides, this is useful for piping with programs etc.
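The raw format of /dev/xillybus_audio corresponds to ALSA’s S16_LE stereo: each frame is four bytes, each sample with its low byte first. That makes 48000 × 4 = 192000 bytes per second. A C sketch of packing samples into this format, assuming the usual interleaving with the left channel first; pack_s16le() is a hypothetical helper, and its output buffer could then be passed to write() on the device file:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RATE            48000           /* frames per second */
#define CHANNELS        2
#define BYTES_PER_FRAME (CHANNELS * 2)  /* 16 bits per sample */

/* Bytes per second the device file consumes (or produces): 192000 */
enum { BYTES_PER_SECOND = RATE * BYTES_PER_FRAME };

/* Pack interleaved samples (left, right, left, right, ...) as signed
 * 16-bit little endian. Bytes are packed explicitly, so the output is
 * little endian regardless of the host's byte order. 'out' must hold
 * frames * BYTES_PER_FRAME bytes. Returns the number of bytes written. */
size_t pack_s16le(const int16_t *samples, size_t frames, uint8_t *out)
{
    assert(samples != NULL && out != NULL);
    for (size_t i = 0; i < frames * CHANNELS; i++) {
        uint16_t u = (uint16_t)samples[i];   /* two's complement bit pattern */
        out[2 * i]     = (uint8_t)(u & 0xff); /* low byte first */
        out[2 * i + 1] = (uint8_t)(u >> 8);   /* then high byte */
    }
    return frames * BYTES_PER_FRAME;
}
```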

Attempt I

To make a long story short, with /etc/asound.conf as follows, playback works (but capture doesn’t!):

pcm.xillybus {
    type asym
    playback.pcm {
        type plug
        slave {
            pcm {
                type file
                file "/dev/xillybus_audio"
                slave.pcm null
                format raw
            }
            rate 48000
            format s16_le
            channels 2
        }
    }
    capture.pcm {
        type plug
        slave {
            pcm {
                type file
                file "/dev/null"
                infile "/dev/xillybus_audio"
                slave.pcm null
            }
            rate 48000
            format s16_le
            channels 2
        }
    }
}

Playback works with rates other than 48000 Hz (and other formats), because of the wrapping with the “plug” plugin.

# aplay -D "xillybus" rate8000.wav
Playing WAVE 'rate8000.wav' : Signed 16 bit Little Endian, Rate 8000 Hz, Stereo
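For a rate mismatch like this one, the “plug” plugin inserts a sample rate converter. The simplest form of such a converter is linear interpolation between neighboring input samples. The C sketch below shows that idea for mono 16-bit samples, going from 8000 Hz to 48000 Hz as in this example; it is not alsa-lib’s code, and resample_linear() is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Resample mono 16-bit samples from in_rate to out_rate by linear
 * interpolation. Writes at most out_max samples; returns the count. */
size_t resample_linear(const int16_t *in, size_t n_in, unsigned in_rate,
                       int16_t *out, size_t out_max, unsigned out_rate)
{
    size_t n_out;

    assert(in_rate > 0 && out_rate > 0);
    for (n_out = 0; n_out < out_max; n_out++) {
        /* Output sample n_out sits at input position
         * n_out * in_rate / out_rate = idx + frac / out_rate */
        uint64_t pos = (uint64_t)n_out * in_rate;
        size_t idx = (size_t)(pos / out_rate);
        int64_t frac = (int64_t)(pos % out_rate);

        if (idx + 1 >= n_in)
            break;  /* both neighbors in[idx] and in[idx + 1] are needed */
        int64_t a = in[idx], b = in[idx + 1];
        out[n_out] = (int16_t)(a + (b - a) * frac / (int64_t)out_rate);
    }
    return n_out;
}
```

With input samples {0, 600} at 8000 Hz, it produces the six output samples 0, 100, 200, 300, 400, 500 at 48000 Hz: one output sample every 1/6 of an input sample.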

Note that “file”, which defines the output file, must be defined, or else arecord (or whatever program is used) quits with a segmentation fault. Not very polished.

Capture doesn’t work at all, however. The result is just a file of silence, which grows way too fast. It has been said that the slave of the capture device shouldn’t be null, and indeed this is probably the issue.

Diving into it

The problem seems to lie in the implementation of the file capture routine. Taken from alsa-lib-1.0.27.2/src/pcm/pcm_file.c:

static snd_pcm_sframes_t snd_pcm_file_readi(snd_pcm_t *pcm, void *buffer, snd_pcm_uframes_t size)
{
    snd_pcm_file_t *file = pcm->private_data;
    snd_pcm_channel_area_t areas[pcm->channels];
    snd_pcm_sframes_t n;

    n = snd_pcm_readi(file->gen.slave, buffer, size);
    if (n <= 0)
         return n;
    if (file->ifd >= 0) {
        n = read(file->ifd, buffer, n * pcm->frame_bits / 8);
        if (n < 0)
             return n;
        return n * 8 / pcm->frame_bits;
    }
    snd_pcm_areas_from_buf(pcm, areas, buffer);
    snd_pcm_file_add_frames(pcm, areas, 0, n);
    return n;
}

This is the method the plugin exposes for reading samples. Note that it first reads the desired number of frames from the slave, and then reads from the file as many frames as it got from the slave (into the same buffer, overwriting the slave’s data). This probably makes sense when reading from a plain file, which would otherwise be slurped in no time: the slave is used as a data rate controller. Great.
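The unit conversions in snd_pcm_file_readi() rely on pcm->frame_bits, the number of bits per frame (bits per sample times channels), which is 32 for the S16_LE stereo used here. Spelled out as a small C sketch with hypothetical helper names:

```c
#include <assert.h>
#include <stddef.h>

/* Frames requested -> bytes to read(), as in the read() call above */
size_t frames_to_bytes(size_t frames, unsigned frame_bits)
{
    return frames * frame_bits / 8;
}

/* Bytes returned by read() -> frames reported back to the caller.
 * Integer division: bytes that don't make up a whole frame aren't
 * counted, even though read() has already consumed them. */
size_t bytes_to_frames(size_t bytes, unsigned frame_bits)
{
    assert(frame_bits > 0);
    return bytes * 8 / frame_bits;
}
```

So a request for 1024 frames becomes a 4096-byte read(), and 4098 bytes returned count as only 1024 frames; the two extra bytes go unaccounted for.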

Attempt II

To get around this, I changed /etc/asound.conf to this:

pcm.xillybus_raw {
                type file
                file "/dev/xillybus_audio"
                slave.pcm null
                format raw
}
pcm.xillybus_play {
        type plug
        slave {
            pcm "xillybus_raw"
            rate 48000
            format s16_le
            channels 2
        }
}

pcm.xillybus {
    type asym
    playback.pcm "xillybus_play"
    capture.pcm {
        type plug
        slave {
            pcm {
                type file
                file "/dev/null"
                infile "/dev/xillybus_audio"
                slave.pcm "xillybus_play"
            }
            rate 48000
            format s16_le
            channels 2
        }
    }
}

This isn’t perfect either. When attempting

# arecord -D "xillybus" --rate 48000 --channels 2 --format s16_le try.wav

sound is indeed recorded into try.wav. The captured sound is also echoed in the headphones (with a delay), so the output interface is now busy and noisy. But worst of all, this only works if the parameters exactly match the sound interface’s. So I might as well have read directly from /dev/xillybus_audio.

Changes in pcm_file.c

Based upon alsa-lib-1.0.25, the following functions were changed in pcm_file.c. The intention of these changes is to detach the I/O operations from the slave, which is null in the setting of Attempt I above.

static int snd_pcm_file_drop(snd_pcm_t *pcm)
{
	/* No-op: don't pass drop on to the (null) slave */
	return 0;
}

static int snd_pcm_file_drain(snd_pcm_t *pcm)
{
	/* No-op: don't pass drain on to the (null) slave */
	return 0;
}

static snd_pcm_sframes_t snd_pcm_file_readi(snd_pcm_t *pcm, void *buffer, snd_pcm_uframes_t size)
{
	snd_pcm_file_t *file = pcm->private_data;
	snd_pcm_sframes_t n;

	/* Read straight from the input file, without consulting the slave */
	n = read(file->ifd, buffer, size * pcm->frame_bits / 8);
	if (n < 0)
		return -errno;	/* ALSA expects a negative error code */
	/* Convert bytes to frames; a trailing partial frame isn't counted */
	return n * 8 / pcm->frame_bits;
}

static snd_pcm_sframes_t snd_pcm_file_readn(snd_pcm_t *pcm, void **bufs, snd_pcm_uframes_t size)
{
	SNDERR("DEBUG: Noninterleaved read not yet implemented.\n");
	return 0;	/* TODO: Noninterleaved read */
}

(these functions don’t appear one after the other in the source file)
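The modified snd_pcm_file_readi() returns whatever a single read() delivers. On a pipe or device file, read() may return fewer bytes than requested, and nothing guarantees that the count ends on a frame boundary; the bytes of a trailing partial frame would then be consumed but not counted, misaligning all subsequent samples. A hedged C sketch of a helper that keeps reading until a frame boundary is reached; read_whole_frames() is a hypothetical name, not an alsa-lib function:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Read up to 'frames' frames of 'frame_bytes' bytes each from fd into
 * buf. If read() stops in the middle of a frame, keep reading until the
 * frame is complete (or end of file). Retries on EINTR.
 * Returns the number of whole frames read (0 at end of file),
 * or a negative errno value, ALSA style. */
long read_whole_frames(int fd, void *buf, size_t frames, size_t frame_bytes)
{
    char *p = buf;
    size_t want = frames * frame_bytes;
    size_t got = 0;

    assert(buf != NULL && frame_bytes > 0);
    while (got < want) {
        ssize_t n = read(fd, p + got, want - got);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -errno;
        }
        if (n == 0)
            break;              /* end of file: a partial frame is dropped */
        got += (size_t)n;
        if (got % frame_bytes == 0)
            break;              /* on a frame boundary: don't wait for more */
    }
    return (long)(got / frame_bytes);
}
```

Inside snd_pcm_file_readi(), it could take the place of the single read() call, with frame_bytes set to pcm->frame_bits / 8.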

Compiling to obtain libasound.so

After making the changes in pcm_file.c, compile natively on the embedded board:

# ./configure
# make -j 2

and then copy the result to the library directory:

# cp src/.libs/libasound.so.2.0.0 /usr/lib/arm-linux-gnueabihf/

This overwrites the previous file.

Plugin library issue

When attempting to use a sound interface with the new libasound, the following error occurs:

# aplay -D "xillybus" snip.wav
ALSA lib conf.c:3314:(snd_config_hooks_call) Cannot open shared library libasound_module_conf_pulse.so
ALSA lib pcm.c:2217:(snd_pcm_open_noupdate) Unknown PCM xillybus
aplay: main:682: audio open error: No such file or directory

This is because the plugin loader looks in the wrong directories: it does look in /usr/lib/arm-linux-gnueabihf/, but not in /usr/lib/arm-linux-gnueabihf/alsa-lib/.

Trying configure parameters didn’t help:

# ./configure --libdir=/usr/lib/arm-linux-gnueabihf/
# ./configure --with-plugindir=/usr/lib/arm-linux-gnueabihf/alsa-lib/

The dirty solution was to create symbolic links to all files in alsa-lib/ that aren’t symbolic links themselves with

# cd /usr/lib/arm-linux-gnueabihf
# for i in `find alsa-lib/ -type f -a ! -type l` ; do ln -s "$i" ; done

Not the most elegant solution, but after spending a couple of hours trying to figure this out, at least it works.

This is the list of files that were symlinked:

alsa-lib/libasound_module_pcm_speex.so
alsa-lib/libasound_module_ctl_oss.so
alsa-lib/libasound_module_ctl_pulse.so
alsa-lib/libasound_module_pcm_usb_stream.so
alsa-lib/libasound_module_pcm_pulse.so
alsa-lib/libasound_module_rate_samplerate.so
alsa-lib/libasound_module_ctl_bluetooth.so
alsa-lib/libasound_module_pcm_jack.so
alsa-lib/libasound_module_pcm_upmix.so
alsa-lib/libasound_module_pcm_bluetooth.so
alsa-lib/libasound_module_conf_pulse.so
alsa-lib/libasound_module_pcm_oss.so
alsa-lib/libasound_module_rate_speexrate.so
alsa-lib/libasound_module_ctl_arcam_av.so
alsa-lib/libasound_module_pcm_vdownmix.so
alsa-lib/smixer/smixer-sbase.so
alsa-lib/smixer/smixer-ac97.so
alsa-lib/smixer/smixer-hda.so

Current position

Both recording and playback work with the first asound.conf above (a.k.a. Attempt I), as long as the recording parameters match the interface’s. That is, recording must be done at 48000 Hz, s16_le, but both mono and stereo work. If other parameters are attempted, a huge file filled with click sounds is created.

So

# arecord -D "xillybus" --rate 48000 --format s16_le --channels 2 good.wav
Recording WAVE 'good.wav' : Signed 16 bit Little Endian, Rate 48000 Hz, Stereo

makes a nice sound file, but

# arecord -D "xillybus" --format s16_le --channels 2 justclicks.wav
Recording WAVE 'justclicks.wav' : Signed 16 bit Little Endian, Rate 8000 Hz, Stereo

creates a huge file with just clicks. The intriguing thing about these failing click-only scenarios is that snd_pcm_file_readi() is never called when they happen. It looks like something in the data flow goes wrong when a rate resampler is pushed into the chain. When the rate is the same, snd_pcm_file_readi() has been observed to be called steadily.

Both of the following are OK:

# aplay -D "xillybus" good.wav
Playing WAVE 'good.wav' : Signed 16 bit Little Endian, Rate 48000 Hz, Stereo
# aplay -D "xillybus" rate8000.wav
Playing WAVE 'rate8000.wav' : Signed 16 bit Little Endian, Rate 8000 Hz, Stereo