SOLVED: Lenovo Yoga 2 13″ with “hardware-disabled” Wifi

Overview

Having a Lenovo Yoga 2 13″ (non-pro) running Ubuntu 14.04.1, I couldn’t get Wireless LAN up and running, as the WLAN NIC appeared to be “hardware locked”. This is the summary of how I solved this issue. If you’re not interested in the gory details, you may jump right to the bottom, where I offer a replacement module that fixes it. At least for me.

Environment details: Distribution kernel 3.13.0-32-generic on an Intel i5-4210U CPU @ 1.70GHz. The Wifi device is an Intel Dual Band Wireless-AC 7260 (8086:08b1) connected to the PCIe bus, taken care of by the iwlwifi driver.

The problem

Laptops have a mechanism for working in “flight mode” which means turning off any device that could emit RF power, so that the airplane can crash for whatever different reason. Apparently, some laptops have a physical on-off switch to request this, but on Lenovo Yoga 13, the arrangement is to press a button on the keyboard with an airplane drawn on it. The one shared with F7.

It seems that on the Lenovo Yoga 13, the ACPI interface, which is responsible for reporting the state of the Wifi button, always reports that it’s in flight mode. So Linux turns off Wifi, and the desktop’s Gnome network applet says “Wi-Fi is disabled by hardware switch”.

In the dmesg log one can tell the problem with a line like

iwlwifi 0000:01:00.0: RF_KILL bit toggled to disable radio.

which is issued by the interrupt request handler defined in drivers/net/wireless/iwlwifi/pcie/rx.c, which responds to an interrupt from the device that informs the host that the hardware RF kill bit is set. So the iwlwifi module is not to blame here — it just responds to a request from the ACPI subsystem.

rfkill

The management of RF-related devices is handled by the rfkill subsystem. On my laptop, before solving the problem, a typical output went

$ rfkill list all
0: ideapad_wlan: Wireless LAN
        Soft blocked: yes
        Hard blocked: yes
1: ideapad_bluetooth: Bluetooth
        Soft blocked: no
        Hard blocked: yes
6: hci0: Bluetooth
        Soft blocked: no
        Hard blocked: no
7: phy1: Wireless LAN
        Soft blocked: yes
        Hard blocked: yes

So there are different entities that can be controlled with rfkill, each enumerated and assigned soft and hard blocks. Each of these relates to a directory in /sys/class/rfkill/. For example, the last device, “phy1”, enumerated as 7, corresponds to /sys/class/rfkill/rfkill7, where the “hard” and “soft” pseudo-files signify the status with “0” or “1” values.

The soft block can be changed by “rfkill unblock 0” or “rfkill unblock 7”, but this doesn’t really help with the hardware block. Both blocks have to be off to use the device.
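As a minimal sketch of reading these pseudo-files directly, the function below prints the name and the soft and hard state of every rfkill entry. The function name and the optional directory argument are made up for illustration (the argument just makes it possible to try the function on a copy of the directory tree); it assumes that each entry also has a “name” pseudo-file next to “soft” and “hard”.

```shell
# Print one line per rfkill entry: directory name, device name, soft and hard state.
# rfkill_summary is a hypothetical helper name; the directory defaults to sysfs.
rfkill_summary() {
    base=${1:-/sys/class/rfkill}
    for dev in "$base"/rfkill*; do
        [ -d "$dev" ] || continue      # skip if the glob matched nothing
        printf '%s: %s soft=%s hard=%s\n' \
            "${dev##*/}" "$(cat "$dev/name")" \
            "$(cat "$dev/soft")" "$(cat "$dev/hard")"
    done
}
```

Calling rfkill_summary without arguments should list the same entries as “rfkill list all”, in a one-line-per-device form.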

As can be seen easily from the rfkill list above, each of the physical devices is registered twice as an rfkill device: once by its driver, and a second time by the ideapad_laptop driver. This will be used in the solution below.

The ideapad_laptop module

The ideapad-laptop module is responsible for talking with the ACPI layer on machines that match “VPC2004″ as a platform (as in /sys/devices/platform/VPC2004:00, or /sys/bus/acpi/devices/VPC2004:00, but doesn’t fit anything found in /sys/class/dmi/id/).

Blacklisting this module has been suggested for Yoga laptops all over the web. In particular this post suggests to insmod the module once with a hack that forces the Wifi on, and then blacklist it.

But by blacklisting ideapad-laptop, the computer loses some precious functionality, including disabling Wifi and the touchpad by pressing a button. So this is not an appealing solution.

Ideapad’s two debugfs output files go:

# cat /sys/kernel/debug/ideapad/cfg
cfg: 0x017DE014

Capability: Bluetooth Wireless Camera
Graphic:
# cat /sys/kernel/debug/ideapad/status
Backlight max:	16
Backlight now:	9
BL power value:	On
=====================
Radio status:	Off(0)
Wifi status:	Off(0)
BT status:	On(1)
3G status:	Off(0)
=====================
Touchpad status:Off(0)
Camera status:	On(1)

So the Radio and Wifi statuses, which are read from the ACPI registers, are off. This makes the ideapad_laptop module conclude that everything should go off.

The solution

In essence, the solution for the problem is to take the ideapad_laptop’s hands off the Wifi hardware, except for turning the hardware block off when it’s loaded. It consists of making the following changes in drivers/platform/x86/ideapad-laptop.c:

  • First, remove the driver’s rfkill registration. Somewhere at the beginning of the file, change
    #define IDEAPAD_RFKILL_DEV_NUM	(3)

    to

    #define IDEAPAD_RFKILL_DEV_NUM	(2)

    and in the definition of ideapad_rfk_data[], remove the line saying

    { "ideapad_wlan", CFG_WIFI_BIT, VPCCMD_W_WIFI, RFKILL_TYPE_WLAN }

    This prevents the driver from presenting an rfkill interface, so it keeps its hands off.

  • There is however a chance that the relevant bit in the ACPI layer already has the hardware block on. So let’s turn it off every time the driver loads. In ideapad_acpi_add(), after the call to ideapad_sync_rfk_state(), more or less, add the following two lines:
    pr_warn("Hack: Forcing WLAN hardware block off\n");
    write_ec_cmd(priv->adev->handle, VPCCMD_W_WIFI, 1);
  • And finally, solve a rather bizarre phenomenon: when the RF state is read with a VPCCMD_R_RF command, the Wifi interface becomes hardware blocked for some reason. Note that radio is always in off mode, so it’s a meaningless register on Yoga 2. This is handled in two places. First, empty ideapad_sync_rfk_state() completely, by turning it into
    static void ideapad_sync_rfk_state(struct ideapad_private *priv)
    {
    }

    This function reads VPCCMD_R_RF and calls rfkill_set_hw_state() accordingly, but on Yoga 2 it will always block everything, so what’s the point?
    Next, in debugfs_status_show() which prints out /sys/kernel/debug/ideapad/status, remove the following three lines:

    if (!read_ec_data(priv->adev->handle, VPCCMD_R_RF, &value))
      seq_printf(s, "Radio status:\t%s(%lu)\n",
        value ? "On" : "Off", value);

With these changes made, the Wifi works properly, regardless of whether it was previously reported as hardware blocked.

This can’t be submitted as a patch to the kernel, because presumably some laptops need the rfkill interface for Wifi through ideapad_laptop (or else, why was it put there in the first place?).

Also, maybe I should have done this for Bluetooth too? Don’t know. I don’t use Bluetooth right now, and the desktop applet seems to say all is fine with it anyhow.

Download the driver fix

For the lazy ones, I’ve prepared a little kit for compiling the relevant driver. I’ve taken the driver as it appears in kernel 3.16, more or less, and applied the changes above. And I then added a Makefile to make it compile easily. Since the kernel API changes rather rapidly, this will probably work well for kernels around 3.16 (that includes 3.13), and then you’ll have to apply the changes manually. If it isn’t fixed in the kernel itself by then.

Download it from here, unzip it, change directory, and compile it by typing “make”. This works only if you have the kernel headers and gcc compiler installed, which is usually the case in recent distributions. So a session like this is expected:

$ make
make -C /lib/modules/3.13.0-32-generic/build SUBDIRS=/home/eli/yoga-wifi-fix modules
make[1]: Entering directory `/usr/src/linux-headers-3.13.0-32-generic'
  CC [M]  /home/eli/yoga-wifi-fix/ideapad-laptop.o
  Building modules, stage 2.
  MODPOST 1 modules
  CC      /home/eli/yoga-wifi-fix/ideapad-laptop.mod.o
  LD [M]  /home/eli/yoga-wifi-fix/ideapad-laptop.ko
make[1]: Leaving directory `/usr/src/linux-headers-3.13.0-32-generic'

Then replace the ideapad-laptop.ko that the kernel uses with the fresh one. First, let’s figure out where it is. The modinfo command helps here:

$ modinfo ideapad_laptop
filename:       /lib/modules/3.13.0-32-generic/kernel/drivers/platform/x86/ideapad-laptop.ko
license:        GPL
description:    IdeaPad ACPI Extras
author:         David Woodhouse <dwmw2@infradead.org>
srcversion:     BA339D663FA3B10105A1DC0
alias:          acpi*:VPC2004:*
depends:        sparse-keymap
vermagic:       3.13.0-32-generic SMP mod_unload modversions
parm:           no_bt_rfkill:No rfkill for bluetooth. (bool)

So the directory is now known: it’s the path on the “filename” line. This leaves us with copying the module into the right place:

$ sudo cp ideapad-laptop.ko /lib/modules/3.13.0-32-generic/kernel/drivers/platform/x86/
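For those who prefer to keep the distribution’s module around, here is a small sketch of the same copy with a backup step. The function name and the backup directory argument are hypothetical, and it’s meant to be run as root.

```shell
# Copy the module the kernel currently uses into a backup directory,
# then overwrite it with the freshly built one.
# $1: freshly built .ko   $2: module file the kernel uses   $3: backup directory
replace_module() {
    cp "$2" "$3/" && cp "$1" "$2"
}
```

With the path from the modinfo output above, this would be invoked as “replace_module ideapad-laptop.ko /lib/modules/3.13.0-32-generic/kernel/drivers/platform/x86/ideapad-laptop.ko /root”.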

The new module takes effect on the next reboot. Or on the next insmod/modprobe, if you have the same allergy as myself regarding rebooting a Linux system.

Thunderbird / Linux: Re-sending a sent mail

The idea is to take a mail that has already been sent (and is hence in the “Sent” folder) and send it again with sendmail. Why? In my case, Thunderbird and sendmail connect to different relay servers, and the one used by Thunderbird 3.0.7 is blacklisted by the destination (I got a reject message).

It’s simple: Find the message in Thunderbird’s “Sent” folder, and save it as an .eml file, say, Trying.eml.

Possibly edit the file, and remove the first three lines (even though there’s probably no problem leaving them there):

X-Mozilla-Status: 0001
X-Mozilla-Status2: 00800000
X-Mozilla-Keys:

Possibly add yourself as a Bcc: after the From: line with

Bcc: Myself <myself@example.com>

And then send the message with

$ sendmail -t < Trying.eml

The -t flag tells sendmail to take the recipients’ addresses from the message’s headers (To:, Cc: and Bcc:), which is usually what we want.
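The two manual edits can also be done with a filter. This is only a sketch: the function name is made up, and it assumes that the From: header fits on a single line (a folded From: header would be broken by inserting a line right after it). Only the header part of the message is touched.

```shell
# Print the .eml file with the three X-Mozilla-* header lines dropped and
# a Bcc: line added right after the From: line.
# $1: the .eml file   $2: Bcc address   (prepare_eml is a hypothetical name)
prepare_eml() {
    awk -v bcc="$2" '
        BEGIN { in_header = 1; added = 0 }
        in_header && /^\r?$/ { in_header = 0 }   # first empty line ends the header
        in_header && /^X-Mozilla-(Status|Status2|Keys):/ { next }
        { print }
        in_header && !added && /^From:/ { print "Bcc: " bcc; added = 1 }
    ' "$1"
}
```

The output can be piped directly, as in: prepare_eml Trying.eml "Myself <myself@example.com>" | sendmail -t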

i.MX: SDMA not working? Strange things happen? Maybe it’s all about power management.

I ran into a weird problem while attempting to enable SDMA for UARTs on an i.MX53 processor running Freescale’s 2.6.35.3 Linux kernel: To begin with, the UART would only transmit 48 bytes, which is probably a result of only one watermark event arriving (the initial kickoff filled the UART’s FIFO with 32 bytes, and then one SDMA event occurred when the FIFO reached 16 bytes’ fill, so another 16 bytes were sent).

So it seemed like the SDMA core missed the UART’s watermark events. More careful experiments with my own test scripts revealed a variety of weird behaviors, including what appeared to be preemption of the SDMA script’s process, even though the reference manual is quite clear about it: Context switching of SDMA scripts is voluntary. And still, the flow of data on the UART’s tx lines was stopped for 5-6 ms periods randomly, even when I ran a busy-wait loop in the SDMA script, polling the “not full” flag of the UART’s transmission FIFO.

So it looked like something stopped the SDMA script from running in the middle of the loop (which included no “yield” nor “done” command). Or maybe a completely different issue? Maybe the peripheral bus wasn’t completely coherent? Anything seemed possible at some point.

As the title implies, the problem was power management, and poor settings of the SDMA’s behavior during low power modes.

It goes like this: Every time the Linux kernel’s scheduler has no process to run, it executes a WFI ARM processor command, halting the processor until an interrupt arrives (from a peripheral or just the scheduler’s tick clock). But before doing that, the kernel calls an architecture-dependent function, arch_idle(), which possibly shuts down or slows down clocks in order to increase power savings.

The kernel I used didn’t configure the SDMA’s behavior in the lower-power WAIT mode correctly, causing it to halt and miss events while the processor was in this mode. The word is that to overcome this, the CCM_CCGR bits for SDMA clocks should be set to 11 (bits 31-30 in CCM_CCGR4). There is probably also a need to enable aips_tz1_clk to keep the SDMA and aips_tz1 clocks running. But since the application I worked on didn’t have any power restrictions, I decided to avoid these power mode switches altogether.
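To make the bit arithmetic concrete: setting bits 31-30 to binary 11 means ORing the register’s current value with 3 shifted left by 30. The helper below only computes the new value and doesn’t touch any hardware; its name is made up for this illustration.

```shell
# Print a 32-bit register value with a two-bit field set to binary 11.
# $1: current register value   $2: position of the field's low bit
set_2bit_field_on() {
    printf '0x%08X\n' $(( $1 | (3 << $2) ))
}
```

For example, “set_2bit_field_on 0x00000000 30” prints 0xC0000000, which is the mask for bits 31-30.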

This was done by editing arch/arm/mach-mx5/system.c in the kernel tree, where it said:

void arch_idle(void)
{
 if (likely(!mxc_jtag_enabled)) {
   if (ddr_clk == NULL)
     ddr_clk = clk_get(NULL, "ddr_clk");
   if (gpc_dvfs_clk == NULL)
     gpc_dvfs_clk = clk_get(NULL, "gpc_dvfs_clk");
   /* gpc clock is needed for SRPG */
   clk_enable(gpc_dvfs_clk);
   mxc_cpu_lp_set(arch_idle_mode);

and delete the last line in the listing above — the call to mxc_cpu_lp_set(), which changes the processor’s power mode.

This solved the SDMA problem for me.

As a matter of fact, I would suggest commenting out this line during the development phase of any i.MX-based system, and restoring it once everything works. True, this shouldn’t be an issue if the clocks are properly configured. But if they’re not, something will fail, and the natural tendency is to focus on the drivers of the failing functionality, rather than look for power management issues.

When the power reduction function is re-enabled at some later point, it’s quite evident what the problem is, if something fails then. So even if the target product is battery-driven, do yourself a favor, and drop that line in system.c until you’re finished struggling with other things.

Simple GPIO on Zybo using command-line on Linux

Running Xillinux on the Zybo board, this is how I toggled a GPIO pin from a plain one-liner bash script in Linux. The same technique can be used for other Zynq-7000 boards (Zedboard in particular) to easily control GPIO pins.

First, I looked up which GPIO pin it is. The pin assignments can be found in the FPGA bundle, in xillydemo.ucf (or in xillydemo.sdc, if Vivado was used to build the project).

So I chose to connect to PMOD header JB, first pin, and the PMOD’s GND.

In the UCF file there’s a line saying

## Pmod Header JB
NET PS_GPIO[32]       LOC=T20 | IOSTANDARD=LVCMOS33; #IO_L15P_T2_DQS_34

and its counterpart in the SDC file is

## Pmod Header JB
set_property -dict "PACKAGE_PIN T20 IOSTANDARD LVCMOS33" [get_ports "PS_GPIO[32]"]

So it’s quite clear-cut that the PS_GPIO[32] signal is connected to PMOD B. It doesn’t hurt to take a look at the board’s schematics as well, if you’re comfortable with those drawings, and see that the Zynq device’s pin T20 indeed goes to PMOD B, and to which pin.

Hooked up as shown in this pic:

The offset between PS_GPIO numbers and those designated by Linux is 54. So this pin is found as number 32+54=86.

Hence

# echo 86 > /sys/class/gpio/export
# echo out > /sys/class/gpio/gpio86/direction
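The same arithmetic as a pair of tiny helpers, in case several pins are involved. The function names are made up; the offset of 54 is the one mentioned above.

```shell
# Offset between PS_GPIO numbers and Linux GPIO numbers, as stated above.
PS_GPIO_OFFSET=54

# Print the Linux GPIO number for a given PS_GPIO index.
ps_gpio_to_linux() {
    echo $(( $1 + PS_GPIO_OFFSET ))
}

# Print the sysfs "value" file for a given PS_GPIO index (after it was exported).
gpio_value_path() {
    echo "/sys/class/gpio/gpio$(ps_gpio_to_linux "$1")/value"
}
```

So “ps_gpio_to_linux 32” prints 86, and “gpio_value_path 32” prints /sys/class/gpio/gpio86/value.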

And then poor man’s oscillator:

# while [ 1 ] ; do echo 1 > /sys/class/gpio/gpio86/value ; echo 0 > /sys/class/gpio/gpio86/value ; done

This runs at a staggering 2.9 kHz. Pretty impressive for the slowest form of programming one can think of.

Manually installing launcher icons for Xilinx tools on a Gnome desktop

So I installed Vivado on my CentOS 6.5 64-bit Linux machine, and even though it promised to install icons on my desktop, it didn’t. This is how I installed them manually. There is surely a simpler way, as the special launch bash scripts I created must be somewhere. But I didn’t bother looking.

So it consists of generating four files, all in all, as follows.

First, as root, create these two files, and make them executable by all:

/usr/local/bin/run-vivado as follows:

#!/bin/bash
. /opt/Xilinx/Vivado/2014.1/settings64.sh
vivado &

And /usr/local/bin/run-sdk:

#!/bin/bash
. /opt/Xilinx/SDK/2014.1/settings64.sh
xsdk &

The path to Xilinx’ installation is /opt/Xilinx, of course. Adjust this to where your installation was made, and you should pick the settings32.sh file if you’re running on a 32-bit machine.

And next, we have the launchers, both to be placed in the Desktop directory of the ordinary user who should have these on the desktop.

The file named “Vivado 2014.1.desktop” goes

[Desktop Entry]
Version=1.0
Type=Application
Terminal=false
Icon=/opt/Xilinx/Vivado/2014.1/doc/images/vivado_logo.ico
Name[en_US]=Vivado 2014.1
Exec=/usr/local/bin/run-vivado
Path=/home/myself/vivado-outputs/
Name=Vivado 2014.1
StartupNotify=true

and “Xilinx SDK.desktop” is

[Desktop Entry]
Version=1.0
Type=Application
Terminal=false
Icon=/opt/Xilinx/SDK/2014.1/data/sdk/images/sdk_logo.ico
Name[en_US]=Xilinx SDK
Exec=/usr/local/bin/run-sdk
Name=Xilinx SDK
StartupNotify=true

Note the StartupNotify assignment: this is what makes the mouse pointer turn into “busy” when the program is launched, until the splash window appears. It’s important for Vivado in particular, which takes some time to start up.

Also, the Path assignment in the Vivado launcher sets the directory at which Vivado runs, which should be changed to a directory that exists, and is a convenient place to dump all log files that Vivado generates.
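If more launchers of this kind are needed, the file can be generated rather than typed. The following is just a sketch with a made-up function name, which prints a launcher with the same assignments as above (except for Path and the localized name).

```shell
# Print a desktop launcher file to standard output.
# $1: display name   $2: command to run   $3: icon file
make_launcher() {
    cat <<EOF
[Desktop Entry]
Version=1.0
Type=Application
Terminal=false
Icon=$3
Name=$1
Exec=$2
StartupNotify=true
EOF
}
```

For example: make_launcher "Xilinx SDK" /usr/local/bin/run-sdk /opt/Xilinx/SDK/2014.1/data/sdk/images/sdk_logo.ico > "Xilinx SDK.desktop"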

A list of possible assignments in desktop launchers can be found on this page.

Booting Vivado / EDK mixed FSBL on Zynq-7000

Background

This is yet another war story about making the FSBL boot on a Zynq processor.

I had prepared an FSBL for a certain target using SDK 14.6, and then someone needed it in a Vivado package, using the SDK attached to Vivado 2014.1. In a perfect world, I would have exported the system’s configuration from XPS 14.6 to Vivado as an XML file, and generated the FSBL there. But experience shows that nothing really guarantees that the processor’s configuration will be adopted correctly in Vivado. As a matter of fact, I’ve seen that Vivado imports some parameters, and others are ignored.

But hey, I could just copy the existing FSBL source files to a new workspace in the target SDK? After all, it’s just C code!

This is in fact possible, going File > Import… > General > Existing Projects into Workspace. Then navigate to the path of the original project’s workspace. And don’t forget marking “Copy projects into workspace” so that the old one can be moved or deleted. A popup will allow selecting which projects to import, and it’s done!

Well, not. Selecting the three projects in an FSBL source set (fsbl, fsbl_bsp and system_hw_platform) will indeed create a fresh FSBL project, but it fails compiling (saying that it can’t find libxilffs as required by the -lxilffs or something like that).

To work around this, I imported only the system_hw_platform project, and generated the FSBL project in Vivado’s SDK, as usual: File > New > Application Project. Set the name to “fsbl”, make sure that the underlying hardware project is system_hw_platform. Click “Next” and pick “Zynq FSBL” as the template.

This makes sense, because the FSBL project relies on the C sources that were generated when XPS exported the project to SDK. So the hardware configuration remains correct, and the FSBL is new. No reason why this shouldn’t work, in theory.

The project compiled right away, and an fsbl.elf was ready for mixing into a boot.bin file.

Hurray! Not. It didn’t boot.

Despair not

The immediate measure in these cases is compiling the FSBL with the -DFSBL_DEBUG compilation parameter (which defines the FSBL_DEBUG preprocessor macro, turning on debug messages). With some luck, something informative will show up on the serial console, even if it appeared dead before.

I was one of those lucky bas#$%*s. I got:

PS7_INIT_FAIL : PS7 initialization successful
FSBL Status = 0xA012

Hmmm… That sounds like a mixed-up error message. It failed because it was successful? Well, in fact, the message itself represents the confusion causing the problem.

The FSBL status 0xA012 is returned when the call to ps7_init() fails in main.c. Or more precisely, when the returned value isn’t FSBL_PS7_INIT_SUCCESS. By the way, the FSBL generated by SDK 14.6 doesn’t even bother to check the return value of ps7_init(), but that’s irrelevant here.

Anyhow, note that ps7_init() is defined in the system_hw_platform, which consists of sources generated by XPS 14.6, but called by the FSBL, which was generated by Vivado.

This is a bit delicate, because ps7_init() returns PS7_INIT_SUCCESS when successful (see ps7_init.c), which happens to be defined in ps7_init.h as

#define PS7_INIT_SUCCESS   (0)    // 0 is success in good old C

with non-zero values meaning failure. This is the classic UNIX convention.

For some reason, this is what one finds in fsbl.h:

#ifdef NEW_PS7_ERR_CODE
#define FSBL_PS7_INIT_SUCCESS	PS7_INIT_SUCCESS
#else
#define FSBL_PS7_INIT_SUCCESS	(1)
#endif

In short: FSBL_PS7_INIT_SUCCESS=1, PS7_INIT_SUCCESS=0. A problem indeed.

So this is a direct consequence of mixing an old hardware project with a new FSBL. They changed the error code values somewhere in the middle.

Solution

The clean way to fix this is defining NEW_PS7_ERR_CODE during compilation (e.g. with a -DNEW_PS7_ERR_CODE compilation parameter). The less clean method is to just remove this #ifdef statement and leave it as

#define FSBL_PS7_INIT_SUCCESS	PS7_INIT_SUCCESS

And with this, the FSBL booted correctly and all was well.

I know that getting the FSBL to boot is a recurring problem. Please don’t turn to me for help if your board doesn’t boot — there’s no secret trick, just good old debugging that takes time and effort.

Executing user-space programs from a different Linux distro

While trying to run an executable from one ARM-based distribution on another, I found that it failed to run, even before trying to load any libraries. The ARM architectures were compatible (armhf in both cases) so it wasn’t like I was trying to run an Intel binary on an ARM. I could always cross-compile from sources, but copying binaries is much easier…

I’ll demonstrate this issue with the “ls” program. Of course I tried to adopt something more worthy.

It was just like (where the current directory’s “ls” is the binary belonging to the other distro)

# ./ls
-bash: ./ls: No such file or directory

or sometimes (depends on the distribution) it says

$ ./ls
-sh: ./ls: not found

or when attempting to run with bash:

$ bash ./ls
./ls: ./ls: cannot execute binary file

Attempting to set LD_DEBUG=all was pointless, because the error was earlier on. Strace gave an idea:

$ strace ./ls
execve("./ls", ["./ls"], [/* 13 vars */]) = -1 ENOENT (No such file or directory)
dup(2)                                  = 3
fcntl64(3, F_GETFL)                     = 0x2 (flags O_RDWR)
fstat64(3, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aac9000
_llseek(3, 0, 0x7efca940, SEEK_CUR)     = -1 ESPIPE (Illegal seek)
write(3, "strace: exec: No such file or di"..., 40strace: exec: No such file or directory
) = 40
close(3)                                = 0
munmap(0x2aac9000, 4096)                = 0
exit_group(1)                           = ?

So execve() returns ENOENT even though the file exists. Which means, in this case, that the file is there but the kernel refuses to run it.

The reason

The crucial difference between the alien “ls” and the native one is where they expect to find their loader:

$ readelf -l /bin/ls

Elf file type is EXEC (Executable file)
Entry point 0xcb84
There are 7 program headers, starting at offset 52

Program Headers:
 Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
 EXIDX          0x093b4c 0x0009bb4c 0x0009bb4c 0x00110 0x00110 R   0x4
 PHDR           0x000034 0x00008034 0x00008034 0x000e0 0x000e0 R E 0x4
 INTERP         0x000114 0x00008114 0x00008114 0x00013 0x00013 R   0x1
 [Requesting program interpreter: /lib/ld-linux.so.3]
 LOAD           0x000000 0x00008000 0x00008000 0x93c60 0x93c60 R E 0x8000
 LOAD           0x094000 0x000a4000 0x000a4000 0x007bd 0x02a88 RW  0x8000
 DYNAMIC        0x09400c 0x000a400c 0x000a400c 0x000f0 0x000f0 RW  0x4
 GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x4

 Section to Segment mapping:
 Segment Sections...
 00     .ARM.exidx
 01    
 02     .interp
 03     .interp .hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .rodata .ARM.extab .ARM.exidx .eh_frame
 04     .init_array .fini_array .jcr .dynamic .got .data .bss
 05     .dynamic
 06    
$ readelf -l ./ls

Elf file type is EXEC (Executable file)
Entry point 0xb6d9
There are 9 program headers, starting at offset 52

Program Headers:
 Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
 EXIDX          0x00fce8 0x00017ce8 0x00017ce8 0x00030 0x00030 R   0x4
 PHDR           0x000034 0x00008034 0x00008034 0x00120 0x00120 R E 0x4
 INTERP         0x000154 0x00008154 0x00008154 0x00027 0x00027 R   0x1
 [Requesting program interpreter: /lib/arm-linux-gnueabihf/ld-linux.so.3]
 LOAD           0x000000 0x00008000 0x00008000 0x0fd1c 0x0fd1c R E 0x8000
 LOAD           0x00fee4 0x0001fee4 0x0001fee4 0x003e4 0x01050 RW  0x8000
 DYNAMIC        0x00fef0 0x0001fef0 0x0001fef0 0x00110 0x00110 RW  0x4
 NOTE           0x00017c 0x0000817c 0x0000817c 0x00044 0x00044 R   0x4
 GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x4
 GNU_RELRO      0x00fee4 0x0001fee4 0x0001fee4 0x0011c 0x0011c R   0x1

 Section to Segment mapping:
 Segment Sections...
 00     .ARM.exidx
 01    
 02     .interp
 03     .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .rodata .ARM.exidx .eh_frame
 04     .init_array .fini_array .jcr .dynamic .got .data .bss
 05     .dynamic
 06     .note.ABI-tag .note.gnu.build-id
 07    
 08     .init_array .fini_array .jcr .dynamic

Aha! When the native “ls” is executed, the kernel loads /lib/ld-linux.so.3 which in turn executes the required executable. When the alien “ls” was attempted, the kernel went for /lib/arm-linux-gnueabihf/ld-linux.so.3, couldn’t find it and returned “no such file”. It actually means that it didn’t find the interpreter binary (i.e. the glibc dynamic library loader).
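This check can be scripted. The sketch below extracts the requested interpreter from the output of “readelf -l” and tells whether that file exists on the machine. Both function names are made up, and it assumes that readelf is available on the target.

```shell
# Read "readelf -l" output on standard input, print the requested interpreter.
interp_from_readelf() {
    sed -n 's/.*\[Requesting program interpreter: \(.*\)\].*/\1/p'
}

# Tell whether the loader that the executable asks for is present.
# $1: the executable to examine
check_interp() {
    interp=$(readelf -l "$1" | interp_from_readelf)
    if [ -z "$interp" ]; then
        echo "$1: no program interpreter requested"
    elif [ -e "$interp" ]; then
        echo "$1: loader $interp found"
    else
        echo "$1: loader $interp is missing"
    fi
}
```

Running “check_interp ./ls” on the alien binary would then report the missing /lib/arm-linux-gnueabihf/ld-linux.so.3 directly.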

The Solution

Create a symlink from where the executable expects the loader to where it actually is. In this case

# mkdir /lib/arm-linux-gnueabihf
# cd /lib/arm-linux-gnueabihf
# ln -s /lib/ld-linux.so.3

It’s of course quite likely that some library binaries will need to be copied along with the executable. LD_DEBUG or ldd may be helpful here, as well as “readelf -d” if there’s no ldd.
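When there’s no ldd, the list of required libraries can be pulled out of the “readelf -d” output with a short filter. A sketch, with a made-up function name, assuming the usual “(NEEDED) Shared library: [name]” line format:

```shell
# Read "readelf -d" output on standard input, print one needed library per line.
needed_libs() {
    sed -n 's/.*(NEEDED).*\[\(.*\)\].*/\1/p'
}
```

Usage: readelf -d ./ls | needed_libs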

Changing the dynamic linker when compiling

Sometimes it’s possible to go the other way around: Tell gcc to pick a certain dynamic linker.

But first, to see which loader a program compiled with gcc will expect, add the -v flag in the compilation command, e.g.

$ gcc -v -O3 -Wall tryexec.c -o tryexec

and look for the -dynamic-linker flag in the collect2 command line, which appears after the COLLECT_GCC_OPTIONS lines (could be, for example, /lib64/ld-linux-x86-64.so.2).

To change the choice of linker, pass an argument to the linker through gcc with the -Wl flag:

$ gcc -O3 -Wl,-I/lib/ld-linux.so.3 -Wall tryexec.c -o tryexec

What comes after the comma of the -Wl flag goes to the linker, so -Wl,-I/lib/ld-linux.so.3 passes “-I/lib/ld-linux.so.3″ to ld, which does the job.

Those using Eclipse (Xilinx SDK included) can add the flag in the project C/C++ Build Settings > Tool Settings > ARM Linux gcc linker > Miscellaneous > Linker Flags (write e.g. “-Wl,-I/lib/myloader.so”, without the quotes, in the text box).

Wifi Access Point on my desktop with USB dongles

Introduction

These are my rather messy notes as I set up a wireless access point on my desktop (Fedora 12) running a home-compiled 3.12.20 Linux kernel. Somewhere below (see “Rubbish starts here”) I’ve added things that I tried out but led nowhere. Beware.

I began with two USB dongles, 8188EU and 8192CU. I got 8188EU up and running with Realtek’s hostapd and driver, but only for the 2.4 GHz band. So I bought a RaLink-based dual-band USB dongle, and ran it with the kernel’s built-in driver and an updated version of hostapd (it’s hardware neutral however). If you want it, search eBay for “300m USB Wifi dual band”. It should look like this, and cost some $15 or so:

Dual band Wifi USB dongle

This dongle is what I ended up using. You may skip to “Dual-band dongle” below if you don’t care about the other things I tried out before I chose this one.

The purpose is a manual setup for occasional use. There are plenty of similar write-ups, like this one.

It’s very easy to get mixed up with all those do-this-do-that howtos, and forget one simple fact: A wireless NIC is just another Ethernet card that happens not to have a cable. The authentication of a wireless link takes place with plain Ethernet packets, and once the two sides agree on talking with each other, it’s back to two Ethernet cards with a cross cable.

To make a machine serve as an access point, the NIC must support Master mode, and there must be software running that plays the role of authenticating clients and setting up encryption. But at the end of the day, that’s all there is to it. Linux’ daemon for doing this is hostapd.

The Swiss army knives are “iw”, “iwconfig” and “iwlist”. Try “iw help” in particular.

In short

  1. Plug in device — driver autoloads
  2. Bring up the device with ifconfig (assign an IP address)
  3. Switch regulation region, if the 5 GHz band is required (and the device reports old and over-restrictive regulation rules):
    # iw reg set GD
  4. Restart dhcpd, so that it listens for requests on wlan0
  5. Start hostapd
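The software steps above can be collected into one function, to be run as root after the device has been plugged in. This is only a sketch: the function names and the DRY_RUN variable are made up, and the address and region are the ones used in these notes. With DRY_RUN set to anything, the commands are printed instead of executed.

```shell
# Execute a command, or just print it if DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Bring up the access point.
# $1: interface (defaults to wlan0)   $2: region for "iw reg set" (optional)
ap_up() {
    iface=${1:-wlan0}
    run ifconfig "$iface" 10.10.0.1 netmask 255.255.255.0
    if [ -n "${2:-}" ]; then
        run iw reg set "$2"
    fi
    run service dhcpd restart
    run service hostapd start
}
```

For example, “ap_up wlan0 GD” for the 5 GHz band, or just “ap_up” when the region doesn’t need to be switched.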

Realtek vs. community

There are two completely different takes on getting the Wifi working. One is to use the tools that are maintained by the community: The hostapd that arrives along with distributions, and the drivers compiled in the kernel. Well, as of June 2014, that’s not a go with Realtek’s USB Wifi dongles.

The thing is that the typical distribution hostapd expects to find the kernel’s native interface, which is implemented in the cfg80211 and mac80211 kernel modules. These modules are supposed to talk with the low-level hardware drivers. Very structured and nice. Only hi-tech companies don’t always play ball with the kernel community.

Realtek, in this case, chose to compile together everything, including the higher level frontend source code, and make a single kernel module of that. Kinda makes sense when all you need is a single driver for your specific hardware (a bit like static linking of a program), but not when that hardware is just one of many to be supported.

For example, the kernel’s 8192CU driver (appears as rtl8192cu on lsmod with ~79kB) relies on the kernel’s low-level modules (which are mac80211, cfg80211, rtl8192c_common, rtl_usb, rtlwifi), but the Realtek driver has everything in a single module, which appears as 8192cu and takes ~526kB.

Now to hostapd: The distributions’ versions are geared towards the kernel’s native interface (“driver=nl80211”) with some partial support for Realtek’s drivers (“driver=rtl871x”), so all in all, if you use Realtek’s kernel drivers, use their hostapd as well.

My chosen solution (well, no-other-choice solution) was to compile Realtek’s kernel modules and hostapd. With slight variations.

So first is a summary of commands when things finally work, and then the battle field (compilations from sources etc.).

ifconfig

This is necessary for the already running DHCP daemon to answer requests from wireless clients. This ifconfig command is also the moment at which the firmware is loaded (and not when the driver loads, as one could expect).

Important: Remember that routing rules apply like any Ethernet card, so don’t pick an IP address space that is already accounted for in the access point’s routing table. Making that mistake will not just make pings fail, but the access point will also ignore ARP requests (see below).

# ifconfig wlan0 10.10.0.1 netmask 255.255.255.0
# service dhcpd restart

Starting hostapd

# service hostapd start

or running in the foreground, with a lot of debug output

# hostapd -dd /etc/hostapd/hostapd.conf

Note that when hostapd is running in the foreground and is stopped with CTRL-C, unplugging and replugging the device may be necessary before re-attempting to work with it.

What happens if you pick a bad IP address

For some reason, I had the silly idea that since my internal LAN’s subnet is 10.1.0.0/16, I should assign my wlan0 card the address 10.1.1.123, so it will natively belong to the LAN. What I didn’t realize was that another NIC is already assigned for handling 10.1.0.0/16, so wlan0 will never get packets routed to it.

Even worse, the wireless adapter will not answer ARP requests, which kinda makes sense: the wireless adapter “knows” that it can’t work with the IP address it has, so it might as well not announce any IP connectivity. The interesting thing was that ping requests were ignored completely as well. It’s not like the replies went out on the NIC to which the IP subnet belongs. There was no reply packet at all. Which, again, makes sense, because ping replies are not supposed to go out on another NIC. That could potentially confuse someone into thinking that the link is OK (in case there was a way for the reply to reach the requester).
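A mistake like this is easy to catch before touching ifconfig. The following is just a minimal sketch in Python (not something I used at the time): it checks a candidate address for wlan0 against the subnets that other NICs already handle. The function name and the list of subnets are made up for the illustration; fill in whatever your routing table actually says.

```python
import ipaddress

def overlapping_subnets(candidate, existing):
    """Return the subnets in `existing` that overlap the candidate's network.

    candidate: interface address with prefix length, e.g. "10.10.0.1/24"
    existing:  subnets already routed through other NICs (strings)
    """
    cand_net = ipaddress.ip_interface(candidate).network
    return [net for net in existing
            if ipaddress.ip_network(net).overlaps(cand_net)]

# The subnet already handled by another NIC in the story above
existing = ["10.1.0.0/16"]

print(overlapping_subnets("10.1.1.123/24", existing))  # the bad choice
print(overlapping_subnets("10.10.0.1/24", existing))   # the choice that worked
```

A non-empty list means that the address is inside a range that is already routed elsewhere, so pick another one.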

What follows is the description of the problem as I saw it before I solved it (shown in grey, with a line-over, in the original post). Just in case someone is stuck in the same situation.

At this point, I can connect to the Access Point from Windows XP (even with a client having poor WPA support) as well as Linux with seemingly no problem. But there’s no real internet access. The reason seems to be, that the USB dongle doesn’t seem to be connected with its IP protocol layer. Ethernet packets go through well, as can be seen in sniff dumps on both sides, and the client manages to acquire an address with DHCP, because it depends only on plain MAC packets.

Despite setting an address with ifconfig (or “ip address add” for that matter), the dongle doesn’t respond to ARP requests asking for the address it has, and doesn’t respond to pings.

ARP packets are sent properly from the dongle (acting as AP) and the responses from the client arrive fine as well (when asking for the address of the client’s Wifi NIC as well as another wired Ethernet NIC, both are answered).

# arping -I wlan0 10.1.1.166
ARPING 10.1.1.166 from 10.1.1.123 wlan0
Unicast reply from 10.1.1.166 [00:0E:2E:40:5B:11]  48.329ms
Unicast reply from 10.1.1.166 [00:0E:2E:40:5B:11]  80.612ms
Unicast reply from 10.1.1.166 [00:0E:2E:40:5B:11]  104.531ms

but not the other way (from the client):

# arping -I wlan0 10.1.1.123

(nothing happens)

Now, if the access point sends a gratuitous ARP to the client:

# arping -A -I wlan0 10.1.1.123

the client can send ping packets to the access point. These ICMP packets appear in the sniff dump of wlan0 on both sides, but the access point doesn’t reply. The same went for pinging the broadcast address: The packets were seen in the access point’s sniff dumps with an all-0xff MAC address, but with no response:

# ping -b 10.1.255.255

This is not a firewall issue. The problem remains with the firewall taken down. Both USB dongles have this same problem.

Compiling Realtek’s driver for RTL8188EU

Possible reason why this is necessary: The USB device is V2.0 according to the package, and the newer version contains firmware. Anyhow,

$ git clone https://github.com/lwfinger/rtl8188eu.git

A plain “make” compiled the code cleanly on kernel 3.12.20 (using commit ID 63fe7cda86c2830d66335026efde7472c10bc5c2). Copy firmware (also in Git bundle):

# cp rtl8188eufw.bin /lib/firmware/rtlwifi/

(well, I ended up doing “make install”. After removing the existing driver from the staging subdirectory).

Compiling Realtek’s driver for RTL8192CU

Following this guide, I went to Realtek’s site, downloaded something like RTL8188C_8192C_USB_linux_v4.0.2_9000.20130911.zip (ZIP??!), and untarred wpa_supplicant_hostapd-0.8_rtw_r7475.20130812.tar.gz.

Tried to compile from this zip file (under “driver”). Compilation failed against my kernel (3.12) on the change of the “create_proc_entry” API. So instead, I went for

$ git clone https://github.com/pvaret/rtl8192cu-fixes.git

and compiled cleanly from commit ID f0dfbb46a891820b27942ba3e213af83f2452957.

Compiling and running Realtek’s hostapd

From the zip file that I downloaded from Realtek, I went to the hostapd subdirectory in wpa_supplicant_hostapd/, and typed “make”. It compiled cleanly, and generated the “hostapd” and “hostapd_cli” executables. Yey.

And that actually worked! Note that the rtl871x driver is picked even though the “driver=” isn’t assigned at all in hostapd.conf.

# hostapd -d /etc/hostapd/hostapd.conf
random: Trying to read entropy from /dev/random
Configuration file: /etc/hostapd/hostapd.conf
ctrl_interface_group=0
eapol_version=1
drv->ifindex=35
l2_sock_recv==l2_sock_xmit=0x0x1203be0
BSS count 1, BSSID mask 00:00:00:00:00:00 (0 bits)
Completing interface initialization
Mode: IEEE 802.11g  Channel: 4  Frequency: 2427 MHz
RATE[0] rate=10 flags=0x1
RATE[1] rate=20 flags=0x1
RATE[2] rate=55 flags=0x1
RATE[3] rate=110 flags=0x1
RATE[4] rate=60 flags=0x0
RATE[5] rate=90 flags=0x0
RATE[6] rate=120 flags=0x0
RATE[7] rate=180 flags=0x0
RATE[8] rate=240 flags=0x0
RATE[9] rate=360 flags=0x0
RATE[10] rate=480 flags=0x0
RATE[11] rate=540 flags=0x0
Flushing old station entries
Deauthenticate all stations
+rtl871x_sta_deauth_ops, ff:ff:ff:ff:ff:ff is deauth, reason=2
rtl871x_set_key_ops
rtl871x_set_key_ops
rtl871x_set_key_ops
rtl871x_set_key_ops
Using interface wlan0 with hwaddr c0:4a:00:18:ef:21 and ssid 'ocho'
Deriving WPA PSK based on passphrase
SSID - hexdump_ascii(len=4):
 6f 63 68 6f                                       ocho           
PSK (ASCII passphrase) - hexdump_ascii(len=9): [REMOVED]
PSK (from passphrase) - hexdump(len=32): [REMOVED]
rtl871x_set_wps_assoc_resp_ie
rtl871x_set_wps_beacon_ie
rtl871x_set_wps_probe_resp_ie
urandom: Got 20/20 bytes from /dev/urandom
GMK - hexdump(len=32): [REMOVED]
Key Counter - hexdump(len=32): [REMOVED]
WPA: group state machine entering state GTK_INIT (VLAN-ID 0)
GTK - hexdump(len=32): [REMOVED]
WPA: group state machine entering state SETKEYSDONE (VLAN-ID 0)
rtl871x_set_key_ops
rtl871x_set_beacon_ops
rtl871x_set_hidden_ssid ignore_broadcast_ssid:0, ocho,4
rtl871x_set_acl
wlan0: Setup of interface done.
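A side note on the “Deriving WPA PSK based on passphrase” line in the log above: For WPA-PSK, the 32-byte key is derived from the passphrase with PBKDF2-HMAC-SHA1, using the SSID as the salt and 4096 iterations. Here is a small Python sketch of that calculation. The passphrase below is made up (the real one stays [REMOVED]), and the function name is mine.

```python
import hashlib

def wpa_psk(passphrase, ssid):
    """Derive the 256-bit WPA pre-shared key as a hex string.

    PBKDF2-HMAC-SHA1, salted with the SSID, 4096 iterations, 32 bytes out.
    """
    key = hashlib.pbkdf2_hmac(
        "sha1",
        passphrase.encode("utf-8"),
        ssid.encode("utf-8"),
        4096,
        32,
    )
    return key.hex()

# "ocho" is the SSID from the log; the passphrase is a placeholder
print(wpa_psk("123456789", "ocho"))
```

Since the SSID is part of the calculation, the same passphrase gives a different key on a network with another name.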

But with WPA authentication enabled, I got a lot of

hostapd: wlan0: STA 00:0e:2e:40:5b:94 IEEE 802.11: associated
hostapd: wlan0: STA 00:0e:2e:40:5b:94 IEEE 802.11: deauthenticated due to local deauth request
hostapd: wlan0: STA 00:0e:2e:40:5b:94 IEEE 802.11: disassociated

It was also evident from sniffing wlan0 that EAPOL WPA key (254) frames were sent to the client, but they didn’t get answered, which is probably the reason for the whole thing, as mentioned on this page.

The solution was to restrict the protocol to version 1 with

eapol_version=1

in hostapd.conf. This problem occurred only when I used the RT2500 utility on the Windows laptop. Using Windows XP’s native wireless selection tool connected well either way.

8192CU is single band. Really.

I tried to work with the 8192CU dongle, because it supposedly supports the 5 GHz band as well. The 2.4 GHz is heavily crowded. I don’t know why I got the impression that it’s dual-band. Anyhow,

# cp 8192cu.ko /lib/modules/$(uname -r)/kernel/drivers/net/wireless/
# depmod -a

and also blacklist the kernel’s native driver by adding the following lines to /etc/modprobe.d/blacklist.conf

# Native Wifi drivers not usable as access points
blacklist rtl8192cu
blacklist rtl8192c_common

To see the list of channels:

$ iwlist wlan0 freq

Darn, only 2.4 GHz! It even says so on Realtek’s site: “Complete 802.11n MIMO solution for 2.4GHz band” and “Single-Band 11n (2x2) WLAN USB Dongle”.

Besides, the signal it transmits appears to be really lousy. I got a really bad link quality (but hey, this is a cheapo dongle from Ebay).

Compiling hostapd from the sources

First, install libnl-devel, which is required for compiling hostapd:

# yum install libnl-devel

Download from the hostapd’s main page, copy the config file and compile:

$ git clone git://w1.fi/srv/git/hostap.git
$ cd hostap/hostapd
$ git checkout hostap_2_2
$ cp defconfig .config
$ make

Dual-band dongle

Plugged a MediaTek (formerly RaLink) RT5572-based no-brand dongle (0x148f/0x5572) into my computer with kernel 3.12. It was detected right away. “iw list” gave a long answer, so I reverted to the original hostapd and picked driver=nl80211. The driver handling it was rt2800usb, along with its dependencies: rt2800lib, rt2x00usb, rt2x00lib, mac80211 and cfg80211.

The Linux drivers on MediaTek’s site were last updated in 2010, supporting kernel 2.4.0, but the kernel’s rt2800usb driver seems to be maintained properly with occasional patches. So it looks like the kernel’s built-in driver is the best choice. Support for the RT5572 was added in March 2013, in kernel 3.10.

Attempted to run hostapd, it said

# hostapd -dd /etc/hostapd/hostapd.conf
Configuration file: /etc/hostapd/hostapd.conf
ctrl_interface_group=0
eapol_version=1
ioctl[SIOCSIFFLAGS]: No such file or directory
nl80211 driver initialization failed.
wlan1: Unable to setup interface.
rmdir[ctrl_interface]: No such file or directory

That wasn’t very helpful, but looking at the system log was:

ieee80211 phy0: rt2x00lib_request_firmware: Info - Loading firmware file 'rt2870.bin'
ieee80211 phy0: rt2x00lib_request_firmware: Error - Failed to request Firmware

Ah, yes. A firmware file. Taken from the Linux Firmware Git repo,

# cp rt2870.bin /lib/firmware/

(note that it’s NOT copied to rtlwifi. This is RaLink, not Realtek).

At which point I got a lot of output from hostapd -dd, but it ended with

Could not set DTIM period for kernel driver

This seems to be a hostapd issue (I ran 0.6.9), as the driver is stable. Compiling hostapd-2.2 solved this (see just above), and the dongle works nicely as an access point.

Access point at 5 GHz

The whole point with this dual-band dongle was to run the access point at 5 GHz, and avoid all the noise from my neighbors. But alas, when requesting a 5 GHz channel, hostapd -dd says, somewhere in the middle:

channel [40] (157) is disabled for use in AP mode, flags: 0x1
wlan1: IEEE 802.11 Configured channel (157) not found from the channel list of current mode (2) IEEE 802.11a
wlan1: IEEE 802.11 Hardware does not support configured channel
Could not select hw_mode and channel. (-3)
wlan1: interface state UNINITIALIZED->DISABLED
wlan1: AP-DISABLED
wlan1: Unable to setup interface.

Hmmm… I failed twice here. The frequency isn’t allowed in Israel, and the 5 GHz band is blocked altogether.

Indeed,

$ iw list
Wiphy phy2
 Band 1:
 Capabilities: 0x2f2
 [...]
 Frequencies:
 * 2412 MHz [1] (20.0 dBm)
 * 2417 MHz [2] (20.0 dBm)
 * 2422 MHz [3] (20.0 dBm)
 * 2427 MHz [4] (20.0 dBm)
 * 2432 MHz [5] (20.0 dBm)
 * 2437 MHz [6] (20.0 dBm)
 * 2442 MHz [7] (20.0 dBm)
 * 2447 MHz [8] (20.0 dBm)
 * 2452 MHz [9] (20.0 dBm)
 * 2457 MHz [10] (20.0 dBm)
 * 2462 MHz [11] (20.0 dBm)
 * 2467 MHz [12] (20.0 dBm)
 * 2472 MHz [13] (20.0 dBm)
 * 2484 MHz [14] (disabled)
 Bitrates (non-HT):
 * 1.0 Mbps
 * 2.0 Mbps (short preamble supported)
 * 5.5 Mbps (short preamble supported)
 * 11.0 Mbps (short preamble supported)
 * 6.0 Mbps
 * 9.0 Mbps
 * 12.0 Mbps
 * 18.0 Mbps
 * 24.0 Mbps
 * 36.0 Mbps
 * 48.0 Mbps
 * 54.0 Mbps
 Band 2:
 Capabilities: 0x2f2
 HT20/HT40
 [...]
 Frequencies:
 * 5180 MHz [36] (disabled)
 * 5190 MHz [38] (disabled)
 * 5200 MHz [40] (disabled)
 * 5210 MHz [42] (disabled)
 * 5220 MHz [44] (disabled)
 * 5230 MHz [46] (disabled)
 * 5240 MHz [48] (disabled)
 * 5250 MHz [50] (disabled)
 * 5260 MHz [52] (disabled)
 * 5270 MHz [54] (disabled)
 * 5280 MHz [56] (disabled)
 * 5290 MHz [58] (disabled)
 * 5300 MHz [60] (disabled)
 * 5310 MHz [62] (disabled)
 * 5320 MHz [64] (disabled)
 * 5500 MHz [100] (disabled)
 * 5510 MHz [102] (disabled)
 * 5520 MHz [104] (disabled)
 * 5530 MHz [106] (disabled)
 * 5540 MHz [108] (disabled)
 * 5550 MHz [110] (disabled)
 * 5560 MHz [112] (disabled)
 * 5570 MHz [114] (disabled)
 * 5580 MHz [116] (disabled)
 * 5590 MHz [118] (disabled)
 * 5600 MHz [120] (disabled)
 * 5610 MHz [122] (disabled)
 * 5620 MHz [124] (disabled)
 * 5630 MHz [126] (disabled)
 * 5640 MHz [128] (disabled)
 * 5650 MHz [130] (disabled)
 * 5660 MHz [132] (disabled)
 * 5670 MHz [134] (disabled)
 * 5680 MHz [136] (disabled)
 * 5690 MHz [138] (disabled)
 * 5700 MHz [140] (disabled)
 * 5745 MHz [149] (disabled)
 * 5755 MHz [151] (disabled)
 * 5765 MHz [153] (disabled)
 * 5775 MHz [155] (disabled)
 * 5785 MHz [157] (disabled)
 * 5795 MHz [159] (disabled)
 * 5805 MHz [161] (disabled)
 * 5825 MHz [165] (disabled)
 * 4920 MHz [-16] (disabled)
 * 4940 MHz [-12] (disabled)
 * 4960 MHz [-8] (disabled)
 * 4980 MHz [-4] (disabled)
 Bitrates (non-HT):
 * 6.0 Mbps
 * 9.0 Mbps
 * 12.0 Mbps
 * 18.0 Mbps
 * 24.0 Mbps
 * 36.0 Mbps
 * 48.0 Mbps
 * 54.0 Mbps
 [...]

Are you kidding me? Disabled? Well, no wonder. The kernel thinks 5 GHz is disallowed in Israel:

$ iw reg get
country IL:
 (2402 - 2482 @ 40), (N/A, 20)

Where did it get that from? A peek at dmesg reveals the answer:

cfg80211: Calling CRDA to update world regulatory domain
cfg80211: World regulatory domain updated:
cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
cfg80211:   (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
cfg80211:   (2457000 KHz - 2482000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
cfg80211:   (2474000 KHz - 2494000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
cfg80211:   (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
cfg80211:   (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
usb 5-1.4: reset full-speed USB device number 9 using uhci_hcd
ieee80211 phy0: rt2x00_set_rt: Info - RT chipset 5592, rev 0222 detected
ieee80211 phy0: rt2x00_set_rf: Info - RF chipset 000f detected
ieee80211 phy0: Selected rate control algorithm 'minstrel_ht'
usbcore: registered new interface driver rt2800usb
cfg80211: Calling CRDA for country: IL
cfg80211: Regulatory domain changed to country: IL
cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
cfg80211:   (2402000 KHz - 2482000 KHz @ 40000 KHz), (N/A, 2000 mBm)
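To make these rule lines easier to read: the frequencies are given in kHz, and the power limit in mBm, which is 1/100 of a dBm (so 2000 mBm is 20 dBm). Below is a rough Python sketch that parses such lines and tells whether a 20 MHz channel fits into any of the ranges. It’s a simplification of what the kernel really checks (I only test that the whole channel, center ± 10 MHz, lies inside one range), and the function names are mine.

```python
import re

# Matches e.g. "(2402000 KHz - 2482000 KHz @ 40000 KHz), (N/A, 2000 mBm)"
RULE_RE = re.compile(
    r"\((\d+) KHz - (\d+) KHz @ (\d+) KHz\), \((N/A|\d+ mBi), (\d+) mBm\)")

def parse_rules(dmesg_text):
    """Return a list of (start MHz, end MHz, max bandwidth MHz, max EIRP dBm)."""
    rules = []
    for start, end, bw, _gain, eirp in RULE_RE.findall(dmesg_text):
        rules.append((int(start) / 1000, int(end) / 1000,
                      int(bw) / 1000, int(eirp) / 100))
    return rules

def channel_allowed(center_mhz, rules, width_mhz=20):
    """True if the whole channel fits inside at least one frequency range."""
    low = center_mhz - width_mhz / 2
    high = center_mhz + width_mhz / 2
    return any(start <= low and high <= end
               for start, end, _bw, _eirp in rules)

# The single rule that the kernel applied for country IL in the log above
il_rules = parse_rules(
    "cfg80211:   (2402000 KHz - 2482000 KHz @ 40000 KHz), (N/A, 2000 mBm)")

print(il_rules)
for freq in (2437, 2484, 5180):
    print(freq, channel_allowed(freq, il_rules))
```

With the IL rule, channel 6 (2437 MHz) passes, while channel 14 (2484 MHz) and channel 36 (5180 MHz) don’t, which is consistent with what “iw list” reported.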

The thing is that according to Israel’s local regulations, the lower 5 GHz band is allowed for indoor use. My initial choice of channel 157 is probably illegal in Israel (see Wikipedia’s list). But hey, some channels are still open on the 5 GHz band! It’s also interesting to note that some of 5 GHz channels that are banned for Wifi are allowed for amateur radio (also see this and this).

As the regulations for each country are taken from some ROM on the hardware device itself, they’re probably outdated.

The ugly solution is to switch the regulatory country. For example, Grenada has a relatively relaxed setting:

# iw reg set GD

A full list of these country codes can be found here. “BO” (for Bolivia) is also worth a try.

Now the responsibility is on me to pick a legal frequency. For example, any channel between 36 and 48.
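For reference, the relation between channel numbers and center frequencies, as it appears in the “iw list” output above, is simple enough to put in a few lines of Python. This is only a sketch with a function name of my own: on 2.4 GHz it’s 2407 + 5 × channel (with channel 14 as a special case at 2484 MHz), and on 5 GHz it’s 5000 + 5 × channel.

```python
def channel_to_mhz(channel, band):
    """Center frequency in MHz for a Wifi channel number.

    band is "2.4" or "5". Negative channel numbers on the 5 GHz band
    (the 4.9 GHz entries listed by iw) follow the same formula.
    """
    if band == "2.4":
        if channel == 14:
            return 2484
        if 1 <= channel <= 13:
            return 2407 + 5 * channel
        raise ValueError("no such 2.4 GHz channel: %d" % channel)
    if band == "5":
        return 5000 + 5 * channel
    raise ValueError("unknown band: %r" % band)

for ch in (36, 40, 44, 48):
    print(ch, channel_to_mhz(ch, "5"))
```

So channels 36 to 48 cover the center frequencies 5180 to 5240 MHz.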


Rubbish starts here

From this point on, it’s just random stuff that I tried out, and didn’t lead anywhere. But since I write as I work, why delete it? Maybe it helps someone as is.

Plugging in a TL-WN725N before switching to Realtek’s drivers

usb 2-2.2: Product: 802.11n NIC
usb 2-2.2: Manufacturer: Realtek
usb 2-2.2: SerialNumber: 00E04C0001
r8188eu: module is from the staging directory, the quality is unknown, you have been warned.
Chip Version Info: CHIP_8188E_Normal_Chip_TSMC_D_CUT_1T1R_RomVer(0)
usbcore: registered new interface driver r8188eu

Check if it’s ready to be an access point:

# iwconfig wlan0 mode master
# iwconfig wlan0
wlan0     unassociated  Nickname:"<WIFI@REALTEK>"
 Mode:Master  Frequency=2.412 GHz  Access Point: Not-Associated  
 Sensitivity:0/0 
 Retry:off   RTS thr:off   Fragment thr:off
 Encryption key:off
 Power Management:off
 Link Quality:0  Signal level:0  Noise level:0
 Rx invalid nwid:0  Rx invalid crypt:0  Rx invalid frag:0
 Tx excessive retries:0  Invalid misc:0   Missed beacon:0

OK, so it is. :)

But this doesn’t seem very good:

# iw list
nl80211 not found.

And here comes a bit of nonsense that was fixed by compiling software from sources, as shown below.

Fixed with

# modprobe mac80211

Installing the access point daemon:

# yum install hostapd

Running manually for a test:

 

# hostapd -dd /etc/hostapd/hostapd.conf
Configuration file: /etc/hostapd/hostapd.conf
ctrl_interface_group=10 (from group name 'wheel')
nl80211 not found.
nl80211 driver initialization failed.
wlan0: Unable to setup interface.

Tried the second dongle (the one I bought cheap from Ebay)

usb 2-2.2: New USB device found, idVendor=0bda, idProduct=8176
usb 2-2.2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
usb 2-2.2: Product: 802.11n WLAN Adapter
usb 2-2.2: Manufacturer: Realtek
usb 2-2.2: SerialNumber: 00e04c000001
rtl8192cu: Chip version 0x10
rtl8192cu: MAC address: 00:13:ef:40:08:98
rtl8192cu: Board Type 0
rtl_usb: rx_max_size 15360, rx_urb_num 8, in_ep 1
rtl8192cu: Loading firmware rtlwifi/rtl8192cufw_TMSC.bin
usbcore: registered new interface driver rtl8192cu
rtlwifi: Loading alternative firmware rtlwifi/rtl8192cufw.bin
rtlwifi: Firmware rtlwifi/rtl8192cufw_TMSC.bin not available

OK, OK, take the firmware!

# mkdir /lib/firmware/rtlwifi
# cp rtl8192cufw.bin /lib/firmware/rtlwifi/

Unplug-replug. This one went much better:

usb 2-2.2: New USB device found, idVendor=0bda, idProduct=8176
usb 2-2.2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
usb 2-2.2: Product: 802.11n WLAN Adapter
usb 2-2.2: Manufacturer: Realtek
usb 2-2.2: SerialNumber: 00e04c000001
rtl8192cu: Chip version 0x10
rtl8192cu: MAC address: 00:13:ef:40:08:98
rtl8192cu: Board Type 0
rtl_usb: rx_max_size 15360, rx_urb_num 8, in_ep 1
rtl8192cu: Loading firmware rtlwifi/rtl8192cufw_TMSC.bin
rtlwifi: Loading alternative firmware rtlwifi/rtl8192cufw.bin
ieee80211 phy1: Selected rate control algorithm 'rtl_rc'
rtlwifi: wireless switch is on
cfg80211: Calling CRDA for country: IL
cfg80211: Regulatory domain changed to country: IL
cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
cfg80211:   (2402000 KHz - 2482000 KHz @ 40000 KHz), (N/A, 2000 mBm)

but

# hostapd /etc/hostapd/hostapd.conf
ioctl[SIOCSIFFLAGS]: Unknown error 132
nl80211 driver initialization failed.
rmdir[ctrl_interface]: No such file or directory

Newer hostapd

Stole the binaries from Fedora 20, including a set of necessary libraries, and created a chroot for that as follows:

# chroot . /hostapd -d /hostapd.conf

With the Ebay dongle, the AP was visible from my laptop, but I failed to connect. Nothing appears on sniffing wlan1, and strace shows nothing happens during these connection attempts, so the conclusion must be that the problem is with the dongle.

So I found the first firmware the driver was checking for,

usb 2-2.3: New USB device found, idVendor=0bda, idProduct=8176
usb 2-2.3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
usb 2-2.3: Product: 802.11n WLAN Adapter
usb 2-2.3: Manufacturer: Realtek
usb 2-2.3: SerialNumber: 00e04c000001
rtl8192cu: Chip version 0x10
rtl8192cu: MAC address: 00:13:ef:40:08:98
rtl8192cu: Board Type 0
rtl_usb: rx_max_size 15360, rx_urb_num 8, in_ep 1
rtl8192cu: Loading firmware rtlwifi/rtl8192cufw_TMSC.bin
ieee80211 phy7: Selected rate control algorithm 'rtl_rc'
rtlwifi: wireless switch is on
rtl8192cu: MAC auto ON okay!
rtl8192cu: Tx queue select: 0x05

Didn’t make any difference.

Creating a bridge

This is the really manual route, based upon this page.

Basically,

# brctl addbr br0
# brctl setfd br0 0
# brctl addif br0 eth0
# brctl addif br0 wlan0
# ifconfig br0 10.1.1.123 netmask 255.255.255.0
# ifconfig br0 up

The second command sets the forward delay to zero, to prevent problems on the first connection, as mentioned on this page.

One can take a look at the status with

# brctl show
bridge name    bridge id        STP enabled    interfaces
br0        8000.00241dd37e38    no        eth0
                                          wlan0

To remove the bridge:

# ifconfig br0 down
# brctl delbr br0

Plain-text mail from Thunderbird (under Linux)

Introduction

I’ve been annoyed for quite a while by Thunderbird’s strong inclination towards HTML mail. To the extent that if I don’t really, really verify that a mail goes out in plain text, it’s probably going to slip out in HTML. This is bad in particular when sending mails to Linux-related mailing lists. They don’t like HTML mails. And the truth is that I’m not very fond of HTML mails either, but I usually don’t care.

There’s an add-on for this, Outgoing Message Format, but I run a version of Thunderbird that is too old for that, and trying to fool Thunderbird into installing it by changing the add-on’s version requirement field ended up with an add-on that does nothing.

Upgrading was not an attractive direction: If I’m happy with a tool except for one thing, I’ll fix that thing. Upgrading tends to fix that thing but create a new problem. On a good day.

It turned out to be extremely difficult to convince Thunderbird to stop doing that. My notes while trying are below.

Note to self: To find the entire hack history, search your “Sent” box for “Thunderbird plain text hacks” in the subject.

Remove the HTML composition capability completely

This method makes it impossible for a certain mail identity to compose HTML mails. Go to Preferences > General > Config Editor… and agree to be careful.

mail.identity.id1.compose_html: Set from true to false.

In internal JavaScript code, these preferences are fetched with getPref() commands.

Fixing Thunderbird from within

After wasting a lot of time on this, I reached the conclusion that the problem was that quite a few components in Thunderbird’s script environment push the HTML format for various reasons. These are apparently ugly hacks that solved a problem for someone in the far past, and remained there, because no one noticed them or understood exactly what they do, possibly including whoever wrote them in the first place.

The solution was a counter-hack. Basically, hide the relevant menu’s IDs from other scripts and set the default to “Plain text”. This requires opening a JAR, making a few fixes in a couple of files, and packing it up again.

So let’s get to it. In a fresh directory,

$ jar xf /usr/lib64/thunderbird-3.0/chrome/messenger.jar

and edit ./content/messenger/messengercompose/messengercompose.xul, in the part saying

<menu id="outputFormatMenu" label="&outputFormatMenu.label;" accesskey="&outputFormatMenu.accesskey;" oncommand="OutputFormatMenuSelect(event.target)">
 <menupopup id="outputFormatMenuPopup">
 <menuitem type="radio" name="output_format" label="&autoFormatCmd.label;" accesskey="&autoFormatCmd.accesskey;" id="format_auto" checked="true"/>
 <menuitem type="radio" name="output_format" label="&plainTextFormatCmd.label;" accesskey="&plainTextFormatCmd.accesskey;" id="format_plain"/>
 <menuitem type="radio" name="output_format" label="&htmlFormatCmd.label;" accesskey="&htmlFormatCmd.accesskey;" id="format_html"/>
 <menuitem type="radio" name="output_format" label="&bothFormatCmd.label;" accesskey="&bothFormatCmd.accesskey;" id="format_both"/>
 </menupopup>
 </menu>

The idea is to hide the elements from any script, except the one that responds to changes in this menu. Also, change the default from “Auto detect” to “plain text”. After the change we have

<menu id="my_outputFormatMenu" label="&outputFormatMenu.label;" accesskey="&outputFormatMenu.accesskey;" oncommand="OutputFormatMenuSelect(event.target)">
 <menupopup id="outputFormatMenuPopup">
 <menuitem type="radio" name="output_format" label="&autoFormatCmd.label;" accesskey="&autoFormatCmd.accesskey;" id="my_format_auto"/>
 <menuitem type="radio" name="output_format" label="&plainTextFormatCmd.label;" accesskey="&plainTextFormatCmd.accesskey;" id="my_format_plain" checked="true"/>
 <menuitem type="radio" name="output_format" label="&htmlFormatCmd.label;" accesskey="&htmlFormatCmd.accesskey;" id="my_format_html"/>
 <menuitem type="radio" name="output_format" label="&bothFormatCmd.label;" accesskey="&bothFormatCmd.accesskey;" id="my_format_both"/>
 </menupopup>
 </menu>

Note the “my_” prefixes on the IDs, and that the “checked” attribute has moved.
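I made this change by hand in a text editor, but the same edit to the XUL can be sketched as a few lines of Python, which may be handy if the JAR needs patching again. The function name is made up, and the sample below is a shortened version of the real menu (labels and access keys dropped). The regular expression only touches the IDs of the menu and its four items, so outputFormatMenuPopup is left alone.

```python
import re

# The IDs that other scripts look up, and that should be hidden from them
ID_RE = re.compile(r'id="(outputFormatMenu|format_(?:auto|plain|html|both))"')

def patch_output_format_menu(xul):
    """Move the default to plain text and add a my_ prefix to the menu IDs."""
    # Take the default away from "Auto detect" ...
    xul = xul.replace(' id="format_auto" checked="true"', ' id="format_auto"')
    # ... and give it to "Plain text"
    xul = xul.replace(' id="format_plain"', ' id="format_plain" checked="true"')
    # Finally, rename the IDs
    return ID_RE.sub(r'id="my_\1"', xul)

sample = (
    '<menu id="outputFormatMenu">\n'
    ' <menupopup id="outputFormatMenuPopup">\n'
    ' <menuitem type="radio" id="format_auto" checked="true"/>\n'
    ' <menuitem type="radio" id="format_plain"/>\n'
    ' </menupopup>\n'
    '</menu>\n'
)
print(patch_output_format_menu(sample))
```

To use it for real, read messengercompose.xul into a string, pass it through the function and write the result back.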

This leaves a few changes in the only script that should deal with this, ./content/messenger/messengercompose/MsgComposeCommands.js:

In ComposeStartup(),

document.getElementById("outputFormatMenu").setAttribute("hidden", true);

is replaced with

document.getElementById("my_outputFormatMenu").setAttribute("hidden", true);

and likewise, in OutputFormatMenuSelect()

if (msgCompFields)
 switch (target.getAttribute('id'))
 {
 case "format_auto":  gSendFormat = nsIMsgCompSendFormat.AskUser;     break;
 case "format_plain": gSendFormat = nsIMsgCompSendFormat.PlainText;   break;
 case "format_html":  gSendFormat = nsIMsgCompSendFormat.HTML;        break;
 case "format_both":  gSendFormat = nsIMsgCompSendFormat.Both;        break;
 }

is replaced with

if (msgCompFields)
 switch (target.getAttribute('id'))
 {
 case "my_format_auto":  gSendFormat = nsIMsgCompSendFormat.AskUser;     break;
 case "my_format_plain": gSendFormat = nsIMsgCompSendFormat.PlainText;   break;
 case "my_format_html":  gSendFormat = nsIMsgCompSendFormat.HTML;        break;
 case "my_format_both":  gSendFormat = nsIMsgCompSendFormat.Both;        break;
 }

Finally remove a single line that fiddles with the default (harmless now, but why leave it there…). In the definition of gComposeRecyclingListener, remove this line

document.getElementById("format_auto").setAttribute("checked", "true");

And that’s it for the edits. Then repackage the Jar archive:

$ jar cf messenger.jar content

Close Thunderbird, overwrite the original Jar file with the amended one (make a backup copy first, of course) and restart Thunderbird.

I should add, that there are several reasons to be surprised that this is enough. For example, while working on this, I noted that there are several direct calls to OutputFormatMenuSelect(), that attempt to fake a click on one of the HTML-enabling radio buttons. In the aftermath, plain text messages are generated even if this isn’t addressed directly.

Other stuff

During the process of figuring out how to solve this issue, I found a few tricks that may be useful in the future. So here they are.

Open all jars you can find

$ find /usr/lib64/thunderbird-3.0/ -iname '*.jar' | while read i ; do ( mkdir "${i##*/}" && cd "${i##*/}" && jar xf "$i" ; ) done

This opens each jar in a directory named after it (including the .jar suffix).
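Since JAR files are plain ZIP archives, the same thing can be done without the jar utility. This is a minimal Python sketch, assuming only the standard zipfile module; the function name is mine. Like the shell one-liner, it skips a jar if a directory with its name already exists.

```python
import zipfile
from pathlib import Path

def extract_all_jars(root, dest):
    """Extract every *.jar found under root into dest/<jar file name>/.

    Returns the list of directories that were created.
    """
    extracted = []
    for jar in sorted(Path(root).rglob("*")):
        if not (jar.is_file() and jar.suffix.lower() == ".jar"):
            continue
        target = Path(dest) / jar.name
        if target.exists():
            continue  # same as mkdir failing in the shell version
        target.mkdir(parents=True)
        with zipfile.ZipFile(jar) as zf:
            zf.extractall(target)
        extracted.append(target)
    return extracted

# Example call (the path is the one from the shell command above):
# extract_all_jars("/usr/lib64/thunderbird-3.0/", ".")
```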

Set the default HTML format

mail.default_html_action: Set from 3 to 1. Seems not to have a significant effect.

Enabling the dump() command

dump() is used in internal JavaScript code to produce debug messages, which are printed to stdout. This requires running Thunderbird from the command line.

In the Config Editor mentioned above, add the boolean browser.dom.window.dump.enabled and set it to true. Otherwise nothing is printed.

Creating stack traces

function DumpTrace()
{
 var err = new Error();

 dump("\nStack trace:\n" + err.stack + "\n\n");
}

The stack trace is pretty ugly, and contains a DumpTrace() too, but it’s good enough to find out why a certain function is called.

 

Wine: Picasa failed to start on Fedora 12 after a kernel upgrade

I upgraded my kernel from 2.6.35 to 3.12, and Picasa 2.7 failed to start. Instead of starting, tons of winedbg processes were created at a rapid speed. If I didn’t kill everything related to Picasa within a minute or so (that is, all winedbg processes and any process having “picasa” in the string of “ps ax”) a reboot became imminent, as the system’s process table became full.

Since I’m not very fond of upgrades in general (Fedora 12, anyone?), and Google has ceased to support Picasa for Linux anyhow, I worked a bit on this.

It turned out, that going

$ wine "/opt/picasa/wine/drive_c/Program Files/Picasa2/Picasa2.exe"

did actually run Picasa, but with all my settings gone. No wonder. By default, Wine keeps its fake Windows environment under $HOME/.wine but $HOME/.picasa is the place used by Picasa.

So what about

$ WINEPREFIX=$HOME/.picasa wine "/opt/picasa/wine/drive_c/Program Files/Picasa2/Picasa2.exe"

YEY, it worked! But there were a few error messages, that may or may not be an issue:

wine: cannot find L"C:\\windows\\system32\\wineboot.exe"
err:process:start_wineboot failed to start wineboot, err 2
fixme:actctx:parse_depend_manifests Could not find dependent assembly L"Microsoft.Windows.Common-Controls" (6.0.0.0)
fixme:ntdll:find_reg_tz_info Can't find matching timezone information in the registry for bias -120, std (d/m/y): 26/10/2014, dlt (d/m/y): 28/03/2014
fixme:ole:CoResumeClassObjects stub
fixme:win:FlashWindowEx 0x329e54
Bogus message code 1006
Bogus message code 1006
Bogus message code 1006
Bogus message code 1006
Not a JPEG file: starts with 0x91 0x6b
Not a JPEG file: starts with 0xb5 0xea

[ ... tons of these ... ]

fixme:wininet:InternetSetOptionW Option INTERNET_OPTION_CONNECT_TIMEOUT (10000): STUB
fixme:wininet:InternetSetOptionW INTERNET_OPTION_SEND/RECEIVE_TIMEOUT 10000
fixme:wininet:set_cookie httponly not handled (L"HttpOnly")
fixme:wininet:set_cookie httponly not handled (L"HttpOnly")
fixme:wininet:InternetSetOptionW Option INTERNET_OPTION_CONNECT_TIMEOUT (10000): STUB
fixme:wininet:InternetSetOptionW INTERNET_OPTION_SEND/RECEIVE_TIMEOUT 10000
err:wininet:NETCON_secure_connect SSL_connect failed: 12045
fixme:wininet:INET_QueryOption Stub for 32
fixme:file:MoveFileWithProgressW MOVEFILE_WRITE_THROUGH unimplemented
fixme:file:MoveFileWithProgressW MOVEFILE_WRITE_THROUGH unimplemented
fixme:file:MoveFileWithProgressW MOVEFILE_WRITE_THROUGH unimplemented

[ .. quite a few of these too ... ]

And still, I wondered why it failed before.

Picasa’s log file

When Picasa is executed as in the default installation (running /opt/picasa/bin/picasa), a diagnostic log is created in $HOME/.picasa/picasa.log which said

modify_ldt: Invalid argument
modify_ldt: Invalid argument
modify_ldt: Invalid argument
modify_ldt: Invalid argument
modify_ldt: Invalid argument
wine: Unhandled page fault on write access to 0x00000010 at address 0x7ee95bd2 (thread 001d), starting debugger...

An error and then a debugger starting. Sounds familiar…

It turns out that there is indeed an issue with recent kernels, as one can see here, here and here (the two last links are to the same thread in LKML). Indeed, there’s a patch called “x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels” by H. Peter Anvin, which was targeted for 3.13 kernels, but also made its way to 3.12 stable, which I’m using. So much for “stable”. It’s a security patch, I suppose, and yet I would have lived well without it.

What the offending patch is about

The problem is that the IRET x86 machine instruction only changes the lower 16 bits of the stack pointer, if it returns to a 16-bit execution context. Since IRET is used to switch back from kernel mode to any user space program, any user space program running in 16 bit mode could get to know where the kernel’s stack is mapped (except for the lower 16 bits) just by calling the modify_ldt system call. This information could be a significant piece in the puzzle for a kernel injection exploit (but doesn’t pose a threat by itself).
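To see what leaks, here’s the arithmetic as a tiny Python sketch. The two addresses are made-up values for illustration, not taken from a real system: after an IRET to a 16-bit stack segment, only the lower 16 bits of the stack pointer are replaced with the user’s value, so the upper 48 bits of the kernel’s stack pointer remain visible to the user space program.

```python
MASK16 = 0xFFFF

def rsp_after_iret_to_16bit(kernel_rsp, user_sp):
    """Stack pointer as seen in user space after IRET to a 16-bit context.

    Only the lower 16 bits are restored from the user's value. The upper
    bits are left over from the kernel's stack pointer.
    """
    return (kernel_rsp & ~MASK16) | (user_sp & MASK16)

kernel_rsp = 0xFFFF880012345F58  # hypothetical kernel stack address
user_sp = 0x1000                 # hypothetical 16-bit user stack pointer

leaked = rsp_after_iret_to_16bit(kernel_rsp, user_sp)
print(hex(leaked))        # what the user space program can read
print(hex(leaked >> 16))  # the part that belongs to the kernel's stack
```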

This is an x86 processor bug, which has a workaround for 32-bit systems. But this workaround can’t be applied to 64-bit machines, which is why this patch was made. Later, Linus offered a patch that allows users to choose at runtime if they want to allow modify_ldt on 64-bit machines.