Introduction
Needing to remove superfluous memory barriers from a Linux kernel device driver, I wondered what they actually do. The issue is discussed in painful detail in Documentation/memory-barriers.txt, but somehow it’s quite difficult to figure out whether they’re really needed, and where. Most drivers rely on consecutive iowrite32() (or writel()) calls reaching the hardware in the same order they appear in the code, and this is backed up by the following clause in memory-barriers.txt:
Inside of the Linux kernel, I/O should be done through the appropriate accessor routines – such as inb() or writel() – which know how to make such accesses appropriately sequential. Whilst this, for the most part, renders the explicit use of memory barriers unnecessary, there are a couple of situations where they might be needed:
- On some systems, I/O stores are not strongly ordered across all CPUs, and so for _all_ general drivers locks should be used and mmiowb() must be issued prior to unlocking the critical section.
- If the accessor functions are used to refer to an I/O memory window with relaxed memory access properties, then _mandatory_ memory barriers are required to enforce ordering.
See Documentation/DocBook/deviceiobook.tmpl for more information.
So what they’re saying is that a memory barrier should be used before releasing a lock (spinlock? mutex? both? The examples show only a spinlock) and when prefetching is allowed by hardware.
Nice. Are they doing anything?
April 2020 update: I’ve written a new post on a similar topic. Also, on top of memory-barriers.txt mentioned above, there are some excellent explanations in the kernel tree’s tools/memory-model/Documentation/explanation.txt and tools/memory-model/Documentation/recipes.txt. These are relatively new (from v4.17, beginning of 2018).
May 2021 update: I’ve also written the parallel post for Windows device driver coding, which occasionally brings up Linux.
The practical take
Since I care most about x86 and ARM, I decided to figure out what the memory barriers actually do. The driver’s code should be formally correct, but in the end, if I remove a memory barrier and then test the driver — have I really made a difference? Have I really tested anything?
Ah, and in case you wonder why I didn’t check ioread32() and readl(): I don’t use them in my driver. Odd as it may sound.
The kernel sources in this post are ~3.12, but how often does anyone dare to touch those basic functions?
Spoiler
For the lazy ones, here are my conclusions:
- On x86 platforms, iowrite32() and writel() are translated to just a “mov” into memory.
- On ARM, the same functions translate into a full write synchronization barrier (stop execution until all previous writes are done), and then an “str” into memory.
- On x86_64, the following functions translate into nothing: mmiowb(), smp_wmb() and smp_rmb(). wmb() and rmb() translate into “sfence” and “lfence” respectively. The 32-bit build differs a bit, as shown below.
- On ARM, mmiowb() translates into nothing. The other barriers translate into sensible opcodes.
Trying memory barriers with iowrite32()
I wrote the following kernel module as minimodule.c. Obviously, it won’t do anything good except for being disassembled after compilation.
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/io.h>
void try_iowrite32(void) {
/* Bogus address: this function is only compiled and disassembled, never called */
void __iomem *p = (void __iomem *) 0x12345678;
iowrite32(0xabcd0001, p);
iowrite32(0xabcd0001, p);
iowrite32(0xabcd0002, p);
mmiowb();
iowrite32(0xabcd0003, p);
wmb();
iowrite32(0xabcd0004, p);
rmb();
iowrite32(0xabcd0005, p);
smp_wmb();
iowrite32(0xabcd0006, p);
smp_rmb();
}
EXPORT_SYMBOL(try_iowrite32);
The idea: First repeat exactly the same write to see how that’s handled, and then add barriers to see what they turn into.
The related sources for iowrite32() on x86
I have to admit that I was surprised to find out that iowrite32() is a function in itself, as shown later in the disassembly. I had assumed it was just an alias for writel(), by virtue of a #define statement. But since CONFIG_GENERIC_IOMAP is defined on my kernel, it’s not defined in include/asm-generic/io.h; there’s only a declaration for it in include/asm-generic/iomap.h. It’s defined as a function in lib/iomap.c as follows:
void iowrite32(u32 val, void __iomem *addr)
{
IO_COND(addr, outl(val,port), writel(val, addr));
}
where IO_COND is previously defined in the same file as follows (the comment is in the sources):
/*
* Ugly macros are a way of life.
*/
#define IO_COND(addr, is_pio, is_mmio) do { \
unsigned long port = (unsigned long __force)addr; \
if (port >= PIO_RESERVED) { \
is_mmio; \
} else if (port > PIO_OFFSET) { \
port &= PIO_MASK; \
is_pio; \
} else \
bad_io_access(port, #is_pio ); \
} while (0)
So there we have it. iowrite32() isn’t just an alias for writel(), but it checks the address and interprets it as port I/O if that makes sense.
To be sure, iowrite32() was disassembled as follows from the kernel’s object code (32-bit version):
0020f79f <iowrite32>:
20f79f: 81 fa ff ff 03 00 cmp $0x3ffff,%edx
20f7a5: 89 d1 mov %edx,%ecx
20f7a7: 76 03 jbe 20f7ac <iowrite32+0xd>
20f7a9: 89 02 mov %eax,(%edx)
20f7ab: c3 ret
20f7ac: 81 fa 00 00 01 00 cmp $0x10000,%edx
20f7b2: 76 08 jbe 20f7bc <iowrite32+0x1d>
20f7b4: 81 e2 ff ff 00 00 and $0xffff,%edx
20f7ba: ef out %eax,(%dx)
20f7bb: c3 ret
20f7bc: ba f2 56 03 00 mov $0x356f2,%edx
20f7c1: 89 c8 mov %ecx,%eax
20f7c3: e9 41 fe ff ff jmp 20f609 <bad_io_access>
Results on x86_64
Compiled on Intel x86/64 bit:
$ objdump -d minimodule.ko
minimodule.ko: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <try_iowrite32>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: e8 00 00 00 00 callq 9 <try_iowrite32+0x9>
9: be 78 56 34 12 mov $0x12345678,%esi
e: bf 01 00 cd ab mov $0xabcd0001,%edi
13: e8 00 00 00 00 callq 18 <try_iowrite32+0x18>
18: be 78 56 34 12 mov $0x12345678,%esi
1d: bf 01 00 cd ab mov $0xabcd0001,%edi
22: e8 00 00 00 00 callq 27 <try_iowrite32+0x27>
27: be 78 56 34 12 mov $0x12345678,%esi
2c: bf 02 00 cd ab mov $0xabcd0002,%edi
31: e8 00 00 00 00 callq 36 <try_iowrite32+0x36>
36: be 78 56 34 12 mov $0x12345678,%esi
3b: bf 03 00 cd ab mov $0xabcd0003,%edi
40: e8 00 00 00 00 callq 45 <try_iowrite32+0x45>
45: 0f ae f8 sfence
48: be 78 56 34 12 mov $0x12345678,%esi
4d: bf 04 00 cd ab mov $0xabcd0004,%edi
52: e8 00 00 00 00 callq 57 <try_iowrite32+0x57>
57: 0f ae e8 lfence
5a: be 78 56 34 12 mov $0x12345678,%esi
5f: bf 05 00 cd ab mov $0xabcd0005,%edi
64: e8 00 00 00 00 callq 69 <try_iowrite32+0x69>
69: be 78 56 34 12 mov $0x12345678,%esi
6e: bf 06 00 cd ab mov $0xabcd0006,%edi
73: e8 00 00 00 00 callq 78 <try_iowrite32+0x78>
78: c9 leaveq
79: c3 retq
...
Those “callq” instructions are patched when the module is loaded and relocated. To resolve what they call, run
$ readelf -r minimodule.ko
Relocation section '.rela.text' at offset 0xa9b0 contains 8 entries:
Offset Info Type Sym. Value Sym. Name + Addend
000000000005 002300000002 R_X86_64_PC32 0000000000000000 mcount - 4
000000000014 002000000002 R_X86_64_PC32 0000000000000000 iowrite32 - 4
000000000023 002000000002 R_X86_64_PC32 0000000000000000 iowrite32 - 4
000000000032 002000000002 R_X86_64_PC32 0000000000000000 iowrite32 - 4
000000000041 002000000002 R_X86_64_PC32 0000000000000000 iowrite32 - 4
000000000053 002000000002 R_X86_64_PC32 0000000000000000 iowrite32 - 4
000000000065 002000000002 R_X86_64_PC32 0000000000000000 iowrite32 - 4
000000000074 002000000002 R_X86_64_PC32 0000000000000000 iowrite32 - 4
(the output continues with relocation information for debug variables).
It’s quite easy to work this out: The “Offset” column tells us the offset in the object code. For example, a callq statement begins at 0x13, but the address to call starts at 0x14. The second entry in the relocation section points at offset 0x14, and says that the target is iowrite32().
So from this output we learn that all callq’s are to iowrite32(), except the first one, which goes to mcount() (which is intended for kernel call tracing).
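The relocation arithmetic itself is simple. For R_X86_64_PC32, the x86-64 psABI defines the stored value as S + A - P: symbol address, plus addend, minus the address of the field being patched. Since the addend is -4 and the field starts one byte into the 5-byte callq, the result is the target relative to the next instruction, which is what callq expects. A sketch with a hypothetical helper name and made-up addresses:

```c
#include <assert.h>
#include <stdint.h>

/* R_X86_64_PC32: value = S + A - P, stored as a signed 32-bit field.
 * sym = S (symbol address), addend = A, place = P (address of the field). */
static int32_t pc32_value(uint64_t sym, int64_t addend, uint64_t place)
{
	/* Unsigned arithmetic wraps modulo 2^64; the cast recovers the sign. */
	int64_t v = (int64_t)(sym + (uint64_t)addend - place);

	assert(v >= INT32_MIN && v <= INT32_MAX); /* must fit in a rel32 */
	return (int32_t)v;
}
```

For the callq at offset 0x13, the field is at 0x14, so with the module’s .text at some base address, the stored displacement equals iowrite32’s address minus (base + 0x18), the address of the following instruction.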
Now to conclusions: There are no memory barriers in the code, except those generated by wmb() and rmb(), which added sfence and lfence respectively. sfence is defined as
Performs a serializing operation on all store instructions that were issued prior the SFENCE instruction. This serializing operation guarantees that every store instruction that precedes in program order the SFENCE instruction is globally visible before any store instruction that follows the SFENCE instruction is globally visible. The SFENCE instruction is ordered with respect store instructions, other SFENCE instructions, any MFENCE instructions, and any serializing instructions (such as the CPUID instruction). It is not ordered with respect to load instructions or the LFENCE instruction.
and lfence as
Performs a serializing operation on all load-from-memory instructions that were issued prior the LFENCE instruction. This serializing operation guarantees that every load instruction that precedes in program order the LFENCE instruction is globally visible before any load instruction that follows the LFENCE instruction is globally visible. The LFENCE instruction is ordered with respect to load instructions, other LFENCE instructions, any MFENCE instructions, and any serializing instructions (such as the CPUID instruction). It is not ordered with respect to store instructions or the SFENCE instruction.
One can feel the Intel-headache just reading this.
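For intuition on what such fences are for, here’s a user-space analog using C11 fences, which is a swapped-in technique and not the kernel’s API: a release fence keeps the payload store ahead of the flag store, like wmb() on the writing side, and an acquire fence keeps the payload load behind the flag load, like rmb() on the reading side. It says nothing about MMIO ordering, which is what wmb() is really about in a driver. publish(), try_consume(), payload and ready are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical shared state: a payload plus a "ready" flag. */
static int payload;
static atomic_int ready;

/* Writer: the release fence plays the role of wmb()/smp_wmb(),
 * ordering the payload store before the flag store. */
static void publish(int value)
{
	payload = value;
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Reader: the acquire fence plays the role of rmb()/smp_rmb(),
 * ordering the payload load after the flag load.
 * Returns 1 and fills *out once the payload has been published. */
static int try_consume(int *out)
{
	if (!atomic_load_explicit(&ready, memory_order_relaxed))
		return 0;
	atomic_thread_fence(memory_order_acquire);
	*out = payload;
	return 1;
}
```

On x86_64, compilers typically emit no instruction at all for these release/acquire fences, since ordinary stores and loads are already ordered that way among themselves. That’s in line with smp_wmb() and smp_rmb() compiling into nothing above.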
Results on x86 (32 bit)
Compiling this against a 32-bit kernel, with a slightly different configuration:
$ objdump -d minimodule.ko
minimodule.ko: file format elf32-i386
Disassembly of section .text:
00000000 <try_iowrite32>:
0: ba 78 56 34 12 mov $0x12345678,%edx
5: b8 01 00 cd ab mov $0xabcd0001,%eax
a: e8 fc ff ff ff call b <try_iowrite32+0xb>
f: ba 78 56 34 12 mov $0x12345678,%edx
14: b8 01 00 cd ab mov $0xabcd0001,%eax
19: e8 fc ff ff ff call 1a <try_iowrite32+0x1a>
1e: ba 78 56 34 12 mov $0x12345678,%edx
23: b8 02 00 cd ab mov $0xabcd0002,%eax
28: e8 fc ff ff ff call 29 <try_iowrite32+0x29>
2d: ba 78 56 34 12 mov $0x12345678,%edx
32: b8 03 00 cd ab mov $0xabcd0003,%eax
37: e8 fc ff ff ff call 38 <try_iowrite32+0x38>
3c: f0 83 04 24 00 lock addl $0x0,(%esp)
41: ba 78 56 34 12 mov $0x12345678,%edx
46: b8 04 00 cd ab mov $0xabcd0004,%eax
4b: e8 fc ff ff ff call 4c <try_iowrite32+0x4c>
50: f0 83 04 24 00 lock addl $0x0,(%esp)
55: ba 78 56 34 12 mov $0x12345678,%edx
5a: b8 05 00 cd ab mov $0xabcd0005,%eax
5f: e8 fc ff ff ff call 60 <try_iowrite32+0x60>
64: ba 78 56 34 12 mov $0x12345678,%edx
69: b8 06 00 cd ab mov $0xabcd0006,%eax
6e: e8 fc ff ff ff call 6f <try_iowrite32+0x6f>
73: f0 83 04 24 00 lock addl $0x0,(%esp)
78: c3 ret
79: 00 00 add %al,(%eax)
...
Disassembly of section .altinstr_replacement:
00000000 <.altinstr_replacement>:
0: 0f ae f8 sfence
3: 0f ae e8 lfence
6: 0f ae e8 lfence
$ readelf -r minimodule.ko
Relocation section '.rel.text' at offset 0xc3e0 contains 7 entries:
Offset Info Type Sym.Value Sym. Name
0000000b 00002402 R_386_PC32 00000000 iowrite32
0000001a 00002402 R_386_PC32 00000000 iowrite32
00000029 00002402 R_386_PC32 00000000 iowrite32
00000038 00002402 R_386_PC32 00000000 iowrite32
0000004c 00002402 R_386_PC32 00000000 iowrite32
00000060 00002402 R_386_PC32 00000000 iowrite32
0000006f 00002402 R_386_PC32 00000000 iowrite32
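So it’s essentially the same, except that the mcount() call at the beginning is missing (function tracing is presumably not enabled in this configuration). The barriers look different, though: each appears as “lock addl $0x0,(%esp)”, and the sfence/lfence opcodes sit in the .altinstr_replacement section, from which the kernel’s alternatives mechanism can patch them in at boot on CPUs that support them. Note also that in this configuration, smp_rmb() generated a barrier too (the one at 0x73).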
The related sources for iowrite32() on ARM
These are the key excerpts from arch/arm/include/asm/io.h:
static inline void __raw_writel(u32 val, volatile void __iomem *addr)
{
asm volatile("str %1, %0"
: "+Qo" (*(volatile u32 __force *)addr)
: "r" (val));
}
...
#define writel_relaxed(v,c) __raw_writel((__force u32) cpu_to_le32(v),c)
...
#define writel(v,c) ({ __iowmb(); writel_relaxed(v,c); })
...
#define iowrite32(v,p) ({ __iowmb(); __raw_writel((__force __u32)cpu_to_le32(v), p); })
As for __iowmb(), it goes
/* IO barriers */
#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE
#include <asm/barrier.h>
#define __iormb() rmb()
#define __iowmb() wmb()
#else
#define __iormb() do { } while (0)
#define __iowmb() do { } while (0)
#endif
so whether __iowmb() does anything depends on the configuration. To complete the picture, these are snippets from arch/arm/include/asm/barrier.h:
#if __LINUX_ARM_ARCH__ >= 7
#define isb(option) __asm__ __volatile__ ("isb " #option : : : "memory")
#define dsb(option) __asm__ __volatile__ ("dsb " #option : : : "memory")
#define dmb(option) __asm__ __volatile__ ("dmb " #option : : : "memory")
...
#ifdef CONFIG_ARCH_HAS_BARRIERS
#include <mach/barriers.h>
#elif defined(CONFIG_ARM_DMA_MEM_BUFFERABLE) || defined(CONFIG_SMP)
#define mb() do { dsb(); outer_sync(); } while (0)
#define rmb() dsb()
#define wmb() do { dsb(st); outer_sync(); } while (0)
#else
#define mb() barrier()
#define rmb() barrier()
#define wmb() barrier()
#endif
Results on ARM
This is what the same module compiled for ARM Cortex A9, Little Endian gives (I’ve added extra newlines in the middle for clarity):
minimodule.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <try_iowrite32>:
0: e92d4038 push {r3, r4, r5, lr}
4: f57ff04e dsb st
8: e59f2118 ldr r2, [pc, #280] ; 128 <try_iowrite32+0x128>
c: e1a04002 mov r4, r2
10: e5923018 ldr r3, [r2, #24]
14: e3530000 cmp r3, #0
18: 0a000000 beq 20 <try_iowrite32+0x20>
1c: e12fff33 blx r3
20: e59f3104 ldr r3, [pc, #260] ; 12c <try_iowrite32+0x12c>
24: e59f1104 ldr r1, [pc, #260] ; 130 <try_iowrite32+0x130>
28: e5831678 str r1, [r3, #1656] ; 0x678
2c: f57ff04e dsb st
30: e5942018 ldr r2, [r4, #24]
34: e1a05001 mov r5, r1
38: e1a04003 mov r4, r3
3c: e3520000 cmp r2, #0
40: 0a000000 beq 48 <try_iowrite32+0x48>
44: e12fff32 blx r2
48: e5845678 str r5, [r4, #1656] ; 0x678
4c: f57ff04e dsb st
50: e59f20d0 ldr r2, [pc, #208] ; 128 <try_iowrite32+0x128>
54: e1a04002 mov r4, r2
58: e5923018 ldr r3, [r2, #24]
5c: e3530000 cmp r3, #0
60: 0a000000 beq 68 <try_iowrite32+0x68>
64: e12fff33 blx r3
68: e59f30bc ldr r3, [pc, #188] ; 12c <try_iowrite32+0x12c>
6c: e59f20c0 ldr r2, [pc, #192] ; 134 <try_iowrite32+0x134>
70: e5832678 str r2, [r3, #1656] ; 0x678
74: f57ff04e dsb st
78: e5942018 ldr r2, [r4, #24]
7c: e1a04003 mov r4, r3
80: e3520000 cmp r2, #0
84: 0a000000 beq 8c <try_iowrite32+0x8c>
88: e12fff32 blx r2
8c: e59f30a4 ldr r3, [pc, #164] ; 138 <try_iowrite32+0x138>
90: e5843678 str r3, [r4, #1656] ; 0x678
94: f57ff04e dsb st
98: e59f2088 ldr r2, [pc, #136] ; 128 <try_iowrite32+0x128>
9c: e1a04002 mov r4, r2
a0: e5923018 ldr r3, [r2, #24]
a4: e3530000 cmp r3, #0
a8: 0a000000 beq b0 <try_iowrite32+0xb0>
ac: e12fff33 blx r3
b0: f57ff04e dsb st
b4: e5943018 ldr r3, [r4, #24]
b8: e3530000 cmp r3, #0
bc: 0a000000 beq c4 <try_iowrite32+0xc4>
c0: e12fff33 blx r3
c4: e59f3060 ldr r3, [pc, #96] ; 12c <try_iowrite32+0x12c>
c8: e59f206c ldr r2, [pc, #108] ; 13c <try_iowrite32+0x13c>
cc: e5832678 str r2, [r3, #1656] ; 0x678
d0: f57ff04f dsb sy
d4: f57ff04e dsb st
d8: e59f1048 ldr r1, [pc, #72] ; 128 <try_iowrite32+0x128>
dc: e1a04003 mov r4, r3
e0: e1a05001 mov r5, r1
e4: e5912018 ldr r2, [r1, #24]
e8: e3520000 cmp r2, #0
ec: 0a000000 beq f4 <try_iowrite32+0xf4>
f0: e12fff32 blx r2
f4: e59f3044 ldr r3, [pc, #68] ; 140 <try_iowrite32+0x140>
f8: e5843678 str r3, [r4, #1656] ; 0x678
fc: f57ff05a dmb ishst
100: f57ff04e dsb st
104: e5953018 ldr r3, [r5, #24]
108: e3530000 cmp r3, #0
10c: 0a000000 beq 114 <try_iowrite32+0x114>
110: e12fff33 blx r3
114: e59f3010 ldr r3, [pc, #16] ; 12c <try_iowrite32+0x12c>
118: e59f2024 ldr r2, [pc, #36] ; 144 <try_iowrite32+0x144>
11c: e5832678 str r2, [r3, #1656] ; 0x678
120: f57ff05b dmb ish
124: e8bd8038 pop {r3, r4, r5, pc}
128: 00000000 .word 0x00000000
12c: 12345000 .word 0x12345000
130: abcd0001 .word 0xabcd0001
134: abcd0002 .word 0xabcd0002
138: abcd0003 .word 0xabcd0003
13c: abcd0004 .word 0xabcd0004
140: abcd0005 .word 0xabcd0005
144: abcd0006 .word 0xabcd0006
This was a lot of code (somehow that’s what you get with ARM). There are no calls to iowrite32(), so this is done inline for ARM (consistent with the sources).
This requires some translation from ARM opcodes to human language (taken from this page):
- DSB SY: Data Synchronization Barrier. No instruction in program order after this instruction executes until all explicit memory accesses before this instruction complete, as well as all cache, branch predictor and TLB maintenance operations before this instruction complete.
- DSB ST: Like DSB SY, but waits only for data writes to complete.
- DMB ISHST: Data Memory Barrier that orders only stores, and only with respect to the inner shareable domain (whatever that “inner shareable domain” is).
- DMB ISH: Data Memory Barrier for both loads and stores, again only with respect to the inner shareable domain. Unlike DSB, a DMB only orders memory accesses relative to each other; it doesn’t stall execution until they complete.
Now let’s decipher the assembly code, which is quite tangled. Luckily, it’s easy to spot the seven write operations as the seven “str” instructions. It’s also easy to see that each iowrite32() starts with a “dsb st”, which forces waiting until previous writes have completed. So each iowrite32() spans from a “dsb st” to a “str”. This matches the definition of iowrite32() as __iowmb() followed by __raw_writel(…). The recurring “ldr, cmp, blx” sequence right after each “dsb st” appears to be the outer_sync() part of wmb() (see the barrier.h excerpt above), which calls the outer cache controller’s sync function if one is registered.
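Checking this pattern by eye is tedious, so here’s a sketch of a hypothetical helper that does it on the mnemonic column of objdump’s output (extracting that column is left out). It counts “str” instructions that aren’t preceded by a “dsb st” since the previous “str”; zero means every write was fenced, as the sources predict.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* mnem[] holds disassembled instructions such as "dsb st" or
 * "str r1, [r3, #1656]". Returns how many stores lack a preceding
 * "dsb st" since the previous store (or since the start). */
static size_t count_unfenced_stores(const char *const *mnem, size_t n)
{
	int fenced = 0;
	size_t bad = 0;

	assert(mnem != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (strcmp(mnem[i], "dsb st") == 0) {
			fenced = 1;
		} else if (strncmp(mnem[i], "str ", 4) == 0) {
			if (!fenced)
				bad++;
			fenced = 0; /* the next store needs its own barrier */
		}
	}
	return bad;
}
```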
The memory barriers are quite clear too:
- wmb() becomes “dsb st”, the full synchronization barrier for writes (which is also issued automatically before each iowrite32).
- rmb() becomes “dsb sy”, the full synchronization barrier for reads and writes
- smp_wmb() becomes “dmb ishst”, the “inner shareable domain” memory barrier for writes
- smp_rmb() becomes “dmb ish”, the “inner shareable domain” memory barrier for reads and writes
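To sum up the observations in one place, here they are as a small lookup table. The data is only what this post observed (x86_64 and the ARM Cortex-A9 build above, each with its particular configuration); other configurations may differ, and lookup_barrier() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* What each barrier turned into in the builds disassembled in this post.
 * An empty string means no instruction was emitted. On ARM, wmb() was
 * also followed by the outer_sync() call. */
struct barrier_obs {
	const char *macro;
	const char *x86_64;
	const char *arm;
};

static const struct barrier_obs observed[] = {
	{ "mmiowb()",  "",       ""          },
	{ "wmb()",     "sfence", "dsb st"    },
	{ "rmb()",     "lfence", "dsb sy"    },
	{ "smp_wmb()", "",       "dmb ishst" },
	{ "smp_rmb()", "",       "dmb ish"   },
};

/* Returns the entry for a macro name, or NULL if it wasn't tested here. */
static const struct barrier_obs *lookup_barrier(const char *macro)
{
	assert(macro != NULL);
	for (size_t i = 0; i < sizeof(observed) / sizeof(observed[0]); i++)
		if (strcmp(observed[i].macro, macro) == 0)
			return &observed[i];
	return NULL;
}
```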
Now with writel()
So I thought it would be nice to repeat all this with writel(). Spoiler: nothing thrilling happens here.
Module code (includes omitted):
void try_writel(void) {
/* Bogus address: this function is only compiled and disassembled, never called */
void __iomem *p = (void __iomem *) 0x12345678;
writel(0xabcd0001, p);
writel(0xabcd0001, p);
writel(0xabcd0002, p);
mmiowb();
writel(0xabcd0003, p);
wmb();
writel(0xabcd0004, p);
rmb();
writel(0xabcd0005, p);
smp_wmb();
writel(0xabcd0006, p);
smp_rmb();
}
EXPORT_SYMBOL(try_writel);
Assembly on 64-bit Intel:
minimodule.ko: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <try_writel>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: e8 00 00 00 00 callq 9 <try_writel+0x9>
9: b8 01 00 cd ab mov $0xabcd0001,%eax
e: 89 04 25 78 56 34 12 mov %eax,0x12345678
15: 89 04 25 78 56 34 12 mov %eax,0x12345678
1c: b8 02 00 cd ab mov $0xabcd0002,%eax
21: 89 04 25 78 56 34 12 mov %eax,0x12345678
28: b8 03 00 cd ab mov $0xabcd0003,%eax
2d: 89 04 25 78 56 34 12 mov %eax,0x12345678
34: 0f ae f8 sfence
37: b8 04 00 cd ab mov $0xabcd0004,%eax
3c: 89 04 25 78 56 34 12 mov %eax,0x12345678
43: 0f ae e8 lfence
46: b8 05 00 cd ab mov $0xabcd0005,%eax
4b: 89 04 25 78 56 34 12 mov %eax,0x12345678
52: b8 06 00 cd ab mov $0xabcd0006,%eax
57: 89 04 25 78 56 34 12 mov %eax,0x12345678
5e: c9 leaveq
5f: c3 retq
OK, so writel() just translated into a couple of inline “mov” opcodes. There’s even an optimization between the first and second move, so %eax isn’t set twice. Hi-tech, I’m telling you.
And on 32-bit Intel:
minimodule.ko: file format elf32-i386
Disassembly of section .text:
00000000 <try_writel>:
0: b8 01 00 cd ab mov $0xabcd0001,%eax
5: a3 78 56 34 12 mov %eax,0x12345678
a: a3 78 56 34 12 mov %eax,0x12345678
f: b0 02 mov $0x2,%al
11: a3 78 56 34 12 mov %eax,0x12345678
16: b0 03 mov $0x3,%al
18: a3 78 56 34 12 mov %eax,0x12345678
1d: f0 83 04 24 00 lock addl $0x0,(%esp)
22: b0 04 mov $0x4,%al
24: a3 78 56 34 12 mov %eax,0x12345678
29: f0 83 04 24 00 lock addl $0x0,(%esp)
2e: b0 05 mov $0x5,%al
30: a3 78 56 34 12 mov %eax,0x12345678
35: b0 06 mov $0x6,%al
37: a3 78 56 34 12 mov %eax,0x12345678
3c: f0 83 04 24 00 lock addl $0x0,(%esp)
41: c3 ret
...
Disassembly of section .altinstr_replacement:
00000000 <.altinstr_replacement>:
0: 0f ae f8 sfence
3: 0f ae e8 lfence
6: 0f ae e8 lfence
And for ARM, it’s exactly the same code (to the byte), since on ARM both iowrite32() and writel() expand to __iowmb() followed by __raw_writel(). But I listed it here anyhow for those who don’t take my word for it:
minimodule.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <try_writel>:
0: e92d4038 push {r3, r4, r5, lr}
4: f57ff04e dsb st
8: e59f2118 ldr r2, [pc, #280] ; 128 <try_writel+0x128>
c: e1a04002 mov r4, r2
10: e5923018 ldr r3, [r2, #24]
14: e3530000 cmp r3, #0
18: 0a000000 beq 20 <try_writel+0x20>
1c: e12fff33 blx r3
20: e59f3104 ldr r3, [pc, #260] ; 12c <try_writel+0x12c>
24: e59f1104 ldr r1, [pc, #260] ; 130 <try_writel+0x130>
28: e5831678 str r1, [r3, #1656] ; 0x678
2c: f57ff04e dsb st
30: e5942018 ldr r2, [r4, #24]
34: e1a05001 mov r5, r1
38: e1a04003 mov r4, r3
3c: e3520000 cmp r2, #0
40: 0a000000 beq 48 <try_writel+0x48>
44: e12fff32 blx r2
48: e5845678 str r5, [r4, #1656] ; 0x678
4c: f57ff04e dsb st
50: e59f20d0 ldr r2, [pc, #208] ; 128 <try_writel+0x128>
54: e1a04002 mov r4, r2
58: e5923018 ldr r3, [r2, #24]
5c: e3530000 cmp r3, #0
60: 0a000000 beq 68 <try_writel+0x68>
64: e12fff33 blx r3
68: e59f30bc ldr r3, [pc, #188] ; 12c <try_writel+0x12c>
6c: e59f20c0 ldr r2, [pc, #192] ; 134 <try_writel+0x134>
70: e5832678 str r2, [r3, #1656] ; 0x678
74: f57ff04e dsb st
78: e5942018 ldr r2, [r4, #24]
7c: e1a04003 mov r4, r3
80: e3520000 cmp r2, #0
84: 0a000000 beq 8c <try_writel+0x8c>
88: e12fff32 blx r2
8c: e59f30a4 ldr r3, [pc, #164] ; 138 <try_writel+0x138>
90: e5843678 str r3, [r4, #1656] ; 0x678
94: f57ff04e dsb st
98: e59f2088 ldr r2, [pc, #136] ; 128 <try_writel+0x128>
9c: e1a04002 mov r4, r2
a0: e5923018 ldr r3, [r2, #24]
a4: e3530000 cmp r3, #0
a8: 0a000000 beq b0 <try_writel+0xb0>
ac: e12fff33 blx r3
b0: f57ff04e dsb st
b4: e5943018 ldr r3, [r4, #24]
b8: e3530000 cmp r3, #0
bc: 0a000000 beq c4 <try_writel+0xc4>
c0: e12fff33 blx r3
c4: e59f3060 ldr r3, [pc, #96] ; 12c <try_writel+0x12c>
c8: e59f206c ldr r2, [pc, #108] ; 13c <try_writel+0x13c>
cc: e5832678 str r2, [r3, #1656] ; 0x678
d0: f57ff04f dsb sy
d4: f57ff04e dsb st
d8: e59f1048 ldr r1, [pc, #72] ; 128 <try_writel+0x128>
dc: e1a04003 mov r4, r3
e0: e1a05001 mov r5, r1
e4: e5912018 ldr r2, [r1, #24]
e8: e3520000 cmp r2, #0
ec: 0a000000 beq f4 <try_writel+0xf4>
f0: e12fff32 blx r2
f4: e59f3044 ldr r3, [pc, #68] ; 140 <try_writel+0x140>
f8: e5843678 str r3, [r4, #1656] ; 0x678
fc: f57ff05a dmb ishst
100: f57ff04e dsb st
104: e5953018 ldr r3, [r5, #24]
108: e3530000 cmp r3, #0
10c: 0a000000 beq 114 <try_writel+0x114>
110: e12fff33 blx r3
114: e59f3010 ldr r3, [pc, #16] ; 12c <try_writel+0x12c>
118: e59f2024 ldr r2, [pc, #36] ; 144 <try_writel+0x144>
11c: e5832678 str r2, [r3, #1656] ; 0x678
120: f57ff05b dmb ish
124: e8bd8038 pop {r3, r4, r5, pc}
128: 00000000 .word 0x00000000
12c: 12345000 .word 0x12345000
130: abcd0001 .word 0xabcd0001
134: abcd0002 .word 0xabcd0002
138: abcd0003 .word 0xabcd0003
13c: abcd0004 .word 0xabcd0004
140: abcd0005 .word 0xabcd0005
144: abcd0006 .word 0xabcd0006
Overview
Having a Lenovo Yoga 2 13″ (non-pro) running Ubuntu 14.04.1, I couldn’t get Wireless LAN up and running, as the WLAN NIC appeared to be “hardware locked”. This is the summary of how I solved this issue. If you’re not interested in the gory details, you may jump right to the bottom, where I offer a replacement module that fixes it. At least for me.
Environment details: Distribution kernel 3.13.0-32-generic on an Intel i5-4210U CPU @ 1.70GHz. The Wifi device is an Intel Dual Band Wireless-AC 7260 (8086:08b1) connected to the PCIe bus, taken care of by the iwlwifi driver.
The problem
Laptops have a mechanism for working in “flight mode”, which means turning off any device that could emit RF power, so that the airplane can crash for whatever other reason. Apparently, some laptops have a physical on-off switch to request this, but on the Lenovo Yoga 13, the arrangement is to press a keyboard button with an airplane drawn on it (the one shared with F7).
It seems that on the Lenovo Yoga 13, the ACPI interface, which is responsible for reporting the state of the Wifi button, always reports that it’s in flight mode. So Linux turns off Wifi, and the desktop’s Gnome network applet says “Wi-Fi is disabled by hardware switch”.
In the dmesg log one can tell the problem with a line like
iwlwifi 0000:01:00.0: RF_KILL bit toggled to disable radio.
which is issued by the interrupt handler defined in drivers/net/wireless/iwlwifi/pcie/rx.c, in response to an interrupt from the device informing the host that the hardware RF kill bit is set. So the iwlwifi module is not to blame here: it merely reacts to a block that originates in the ACPI subsystem.
rfkill
The management of RF-related devices is handled by the rfkill subsystem. On my laptop, before solving the problem, a typical output went
$ rfkill list all
0: ideapad_wlan: Wireless LAN
Soft blocked: yes
Hard blocked: yes
1: ideapad_bluetooth: Bluetooth
Soft blocked: no
Hard blocked: yes
6: hci0: Bluetooth
Soft blocked: no
Hard blocked: no
7: phy1: Wireless LAN
Soft blocked: yes
Hard blocked: yes
So there are different entities that can be controlled with rfkill, each enumerated and assigned soft and hard blocks. Each of these corresponds to a directory in /sys/class/rfkill/. For example, the last device, “phy1”, enumerated as 7, corresponds to /sys/class/rfkill/rfkill7, where the “hard” and “soft” pseudo-files signify the status with “0” or “1” values.
The soft block can be changed with “rfkill unblock 0” or “rfkill unblock 7”, but this doesn’t help with the hardware block. Both have to be off to use the device.
As can easily be seen from the rfkill list above, each of the physical devices is registered twice as an rfkill device: once by its own driver, and a second time by the ideapad_laptop driver. This will be used in the solution below.
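Since the block states are just 0/1 pseudo-files, they’re easy to check programmatically. A minimal sketch, assuming the /sys/class/rfkill/rfkillN layout described above; rfkill_read_flag() and rfkill_usable() are hypothetical names:

```c
#include <assert.h>
#include <stdio.h>

/* Reads a 0/1 flag such as /sys/class/rfkill/rfkill7/hard.
 * Returns 0 or 1, or -1 if the file can't be read or parsed. */
static int rfkill_read_flag(const char *path)
{
	FILE *f;
	int v;

	assert(path != NULL);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%d", &v) != 1 || (v != 0 && v != 1))
		v = -1;
	fclose(f);
	return v;
}

/* Returns 1 if neither the soft nor the hard block is set in the given
 * rfkill directory, 0 if either is set, -1 on error. */
static int rfkill_usable(const char *dir)
{
	char path[256];
	int soft, hard;

	snprintf(path, sizeof(path), "%s/soft", dir);
	soft = rfkill_read_flag(path);
	snprintf(path, sizeof(path), "%s/hard", dir);
	hard = rfkill_read_flag(path);
	if (soft < 0 || hard < 0)
		return -1;
	return !soft && !hard;
}
```

Going by the rfkill list output above, rfkill_usable("/sys/class/rfkill/rfkill7") would have returned 0 on my laptop before the fix.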
The ideapad_laptop module
The ideapad-laptop module is responsible for talking with the ACPI layer on machines that match “VPC2004” as a platform (as in /sys/devices/platform/VPC2004:00, or /sys/bus/acpi/devices/VPC2004:00, but it doesn’t fit anything found in /sys/class/dmi/id/).
Blacklisting this module has been suggested for Yoga laptops all over the web. In particular, this post suggests insmod’ing the module once with a hack that forces the Wifi on, and then blacklisting it.
But by blacklisting ideapad-laptop, the computer loses some precious functionality, including disabling Wifi and the touchpad by pressing a button. So this is not an appealing solution.
Ideapad’s two debugfs output files go:
# cat /sys/kernel/debug/ideapad/cfg
cfg: 0x017DE014
Capability: Bluetooth Wireless Camera
Graphic:
# cat /sys/kernel/debug/ideapad/status
Backlight max: 16
Backlight now: 9
BL power value: On
=====================
Radio status: Off(0)
Wifi status: Off(0)
BT status: On(1)
3G status: Off(0)
=====================
Touchpad status:Off(0)
Camera status: On(1)
So the Radio and Wifi statuses, which are read from the ACPI registers, are off. This makes the ideapad_laptop module conclude that everything should go off.
The solution
In essence, the solution for the problem is to take the ideapad_laptop’s hands off the Wifi hardware, except for turning the hardware block off when it’s loaded. It consists of making the following changes in drivers/platform/x86/ideapad-laptop.c:
- First, remove the driver’s rfkill registration. Somewhere at the beginning of the file, change
#define IDEAPAD_RFKILL_DEV_NUM (3)
to
#define IDEAPAD_RFKILL_DEV_NUM (2)
and in the definition of ideapad_rfk_data[], remove the line saying
{ "ideapad_wlan", CFG_WIFI_BIT, VPCCMD_W_WIFI, RFKILL_TYPE_WLAN }
This prevents the driver from presenting an rfkill interface, so it keeps its hands off.
- There is, however, a chance that the relevant bit in the ACPI layer already has the hardware block on. So let’s turn it off every time the driver loads. In ideapad_acpi_add(), more or less right after the call to ideapad_sync_rfk_state(), add the following two lines:
pr_warn("Hack: Forcing WLAN hardware block off\n");
write_ec_cmd(priv->adev->handle, VPCCMD_W_WIFI, 1);
- And finally, work around a rather bizarre phenomenon: when reading the RF state with a VPCCMD_R_RF command, the Wifi interface becomes hardware blocked for some reason. Note that the radio is always reported as off, so it’s a meaningless register on the Yoga 2. This is handled in two places. First, empty ideapad_sync_rfk_state() completely, by turning it into
static void ideapad_sync_rfk_state(struct ideapad_private *priv)
{
}
This function reads VPCCMD_R_RF and calls rfkill_set_hw_state() accordingly, but on Yoga 2 it will always block everything, so what’s the point?
Next, in debugfs_status_show() which prints out /sys/kernel/debug/ideapad/status, remove the following three lines:
if (!read_ec_data(priv->adev->handle, VPCCMD_R_RF, &value))
seq_printf(s, "Radio status:\t%s(%lu)\n",
value ? "On" : "Off", value);
With these changes made, the Wifi works properly, regardless of whether it was previously reported as hardware blocked.
This can’t be submitted as a patch to the kernel, because presumably some laptops need the rfkill interface for Wifi through ideapad_laptop (or else, why was it put there in the first place?).
Also, maybe I should have done this for Bluetooth too? Don’t know. I don’t use Bluetooth right now, and the desktop applet seems to say all is fine with it anyhow.
Download the driver fix
For the lazy ones, I’ve prepared a little kit for compiling the relevant driver. I took the driver as it appears in kernel 3.16, more or less, applied the changes above, and added a Makefile so it compiles easily. Since the kernel API changes rather rapidly, this will probably work well for kernels around 3.16 (including 3.13); beyond that, you’ll have to apply the changes manually, unless it’s fixed in the kernel itself by then.
Download it from here, unzip it, change directory, and compile it by typing “make”. This works only if you have the kernel headers and the gcc compiler installed, which is usually the case in recent distributions. So a session like this is expected:
$ make
make -C /lib/modules/3.13.0-32-generic/build SUBDIRS=/home/eli/yoga-wifi-fix modules
make[1]: Entering directory `/usr/src/linux-headers-3.13.0-32-generic'
CC [M] /home/eli/yoga-wifi-fix/ideapad-laptop.o
Building modules, stage 2.
MODPOST 1 modules
CC /home/eli/yoga-wifi-fix/ideapad-laptop.mod.o
LD [M] /home/eli/yoga-wifi-fix/ideapad-laptop.ko
make[1]: Leaving directory `/usr/src/linux-headers-3.13.0-32-generic'
Then replace the ideapad-laptop.ko the kernel uses with the freshly compiled one. First, let’s figure out where the kernel’s copy is. The modinfo command helps here:
$ modinfo ideapad_laptop
filename: /lib/modules/3.13.0-32-generic/kernel/drivers/platform/x86/ideapad-laptop.ko
license: GPL
description: IdeaPad ACPI Extras
author: David Woodhouse <dwmw2@infradead.org>
srcversion: BA339D663FA3B10105A1DC0
alias: acpi*:VPC2004:*
depends: sparse-keymap
vermagic: 3.13.0-32-generic SMP mod_unload modversions
parm: no_bt_rfkill:No rfkill for bluetooth. (bool)
So the directory is now known (see the “filename:” line). This leaves us with copying the module into the right place:
$ sudo cp ideapad-laptop.ko /lib/modules/3.13.0-32-generic/kernel/drivers/platform/x86/
The new module takes effect on the next reboot, or on the next insmod/modprobe, if you have the same allergy as I do regarding rebooting a Linux system.
The idea is to take a mail that has already been sent (and is hence in the “Sent” folder), and send it again with sendmail. Why? In my case, Thunderbird and sendmail connect to different relay servers, and the one used by Thunderbird 3.0.7 is blacklisted by the destination (I got a reject message).
It’s simple: Find the message in Thunderbird’s “Sent” folder, and save it as an .eml file, say, Trying.eml.
Possibly edit the file, and remove the first three lines (even though there’s probably no problem leaving them there):
X-Mozilla-Status: 0001
X-Mozilla-Status2: 00800000
X-Mozilla-Keys:
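If this has to be done for more than one message, the edit can be automated. A sketch that copies an .eml file while dropping X-Mozilla-* lines from the header section only (everything up to the first empty line); strip_mozilla_headers() is a hypothetical name, and lines are assumed to be shorter than the buffer:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Nonzero for Thunderbird's bookkeeping headers, e.g. X-Mozilla-Status: */
static int is_mozilla_header(const char *line)
{
	return strncmp(line, "X-Mozilla-", strlen("X-Mozilla-")) == 0;
}

/* Copies in to out, skipping X-Mozilla-* lines before the first blank
 * line. Assumes each line fits in the buffer. */
static void strip_mozilla_headers(FILE *in, FILE *out)
{
	char line[4096];
	int in_headers = 1;

	assert(in != NULL && out != NULL);
	while (fgets(line, sizeof(line), in)) {
		if (in_headers && (strcmp(line, "\n") == 0 || strcmp(line, "\r\n") == 0))
			in_headers = 0; /* the body starts after this line */
		if (in_headers && is_mozilla_header(line))
			continue;
		fputs(line, out);
	}
}
```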
Possibly add yourself as a Bcc: after the From: line with
Bcc: Myself <myself@example.com>
And then send the message with
$ sendmail -t < Trying.eml
The -t flag tells sendmail to take the recipients’ addresses from the message’s headers (To:, Cc: and Bcc:), which is usually what we want. The Bcc: line is removed before the message is transmitted.
I ran into a weird problem while attempting to enable SDMA for UARTs on an i.MX53 processor running Freescale’s 2.6.35.3 Linux kernel: To begin with, the UART would only transmit 48 bytes, which is probably a result of only one watermark event arriving (the initial kickoff filled the UART’s FIFO with 32 bytes, and then one SDMA event occurred when the FIFO reached 16 bytes’ fill, so another 16 bytes were sent).
So it seemed like the SDMA core was missing the UART’s watermark events. More careful experiments with my own test scripts revealed a variety of weird behaviors, including what appeared to be preemption of the SDMA script’s process, even though the reference manual is quite clear about it: Context switching of SDMA scripts is voluntary. And still, the flow of data on the UART’s tx line stopped randomly for periods of 5-6 ms, even when I ran a busy-wait loop in the SDMA script, polling the “not full” flag of the UART’s transmission FIFO.
So it looked like something stopped the SDMA script from running in the middle of the loop (which contained neither a “yield” nor a “done” command). Or maybe it was a completely different issue? Maybe the peripheral bus wasn’t completely coherent? Anything seemed possible at some point.
As the title implies, the problem was power management, and poor settings of the SDMA’s behavior during low power modes.
It goes like this: Every time the Linux kernel’s scheduler has no process to run, it executes a WFI (Wait For Interrupt) ARM instruction, halting the processor until an interrupt arrives (from a peripheral or just the scheduler’s tick clock). But before doing that, the kernel calls an architecture-dependent function, arch_idle(), which may shut down or slow down clocks in order to save power.
The kernel I used didn’t configure the SDMA’s behavior in the low-power WAIT mode correctly, causing it to halt and miss events while the processor was in this mode. The word is that to overcome this, the CCM_CCGR bits for the SDMA clock should be set to 11 (bits 31-30 in CCM_CCGR4). There is probably also a need to enable aips_tz1_clk to keep the SDMA and aips_tz1 clocks running. But since the application I worked on didn’t have any power restrictions, I decided to avoid these power mode switches altogether.
This was done by editing arch/arm/mach-mx5/system.c in the kernel tree, where it said:
void arch_idle(void)
{
	if (likely(!mxc_jtag_enabled)) {
		if (ddr_clk == NULL)
			ddr_clk = clk_get(NULL, "ddr_clk");
		if (gpc_dvfs_clk == NULL)
			gpc_dvfs_clk = clk_get(NULL, "gpc_dvfs_clk");
		/* gpc clock is needed for SRPG */
		clk_enable(gpc_dvfs_clk);
		mxc_cpu_lp_set(arch_idle_mode);
and delete the last line in the listing above — the call to mxc_cpu_lp_set(), which changes the processor’s power mode.
This solved the SDMA problem for me.
As a matter of fact, I would suggest commenting out this line during the development phase of any i.MX-based system, and restoring it once everything works. True, this shouldn’t be an issue if the clocks are properly configured. But if they’re not, something will fail, and the natural tendency is to focus on the drivers of the failing functionality rather than look for power management issues.
If something fails after the power reduction function is re-enabled at some later point, it’s quite evident what the problem is. So even if the target product is battery-driven, do yourself a favor and drop that line in system.c until you’re finished struggling with other things.
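For anyone who would rather keep the power savings and fix the clock gating instead, this is what setting those bits amounts to. It is only a sketch of the bit manipulation on a 32-bit register value, and it assumes the bit positions quoted above (bits 31-30 of CCM_CCGR4, set to 11). The register’s address and the kernel’s clock framework are not covered. In a kernel, the value would be read and written through the ioremapped register (e.g. with readl()/writel()). The function names are hypothetical:

```c
#include <stdint.h>

/* SDMA clock-gate field: bits 31-30 of CCM_CCGR4 (as quoted in the text) */
#define SDMA_CG_SHIFT 30u
#define SDMA_CG_MASK  (3u << SDMA_CG_SHIFT)

/* Return the CCGR4 value with the SDMA gate field set to 0b11,
 * leaving all the other 2-bit clock-gate fields untouched. */
uint32_t ccgr4_set_sdma_gate(uint32_t ccgr4)
{
    return (ccgr4 & ~SDMA_CG_MASK) | (3u << SDMA_CG_SHIFT);
}

/* Extract the 2-bit SDMA gate field, e.g. to check the current setting */
unsigned ccgr4_sdma_gate(uint32_t ccgr4)
{
    return (ccgr4 & SDMA_CG_MASK) >> SDMA_CG_SHIFT;
}
```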
Running Xillinux on the Zybo board, this is how I toggled a GPIO pin from a plain one-liner bash script in Linux. The same technique can be used for other Zynq-7000 boards (Zedboard in particular) to easily control GPIO pins.
First, I looked up which GPIO pin it is. The pin assignments can be found in the FPGA bundle, in xillydemo.ucf (or in xillydemo.sdc, if Vivado was used to build the project).
So I chose to connect to the first pin of PMOD header JB, and to the PMOD’s GND.
In the UCF file there’s a line saying
## Pmod Header JB
NET PS_GPIO[32] LOC=T20 | IOSTANDARD=LVCMOS33; #IO_L15P_T2_DQS_34
and its counterpart in the SDC file is
## Pmod Header JB
set_property -dict "PACKAGE_PIN T20 IOSTANDARD LVCMOS33" [get_ports "PS_GPIO[32]"]
So it’s clear-cut that the PS_GPIO[32] signal is connected to PMOD B. It doesn’t hurt to take a look at the board’s schematics as well, if you’re comfortable with those drawings, to verify that the Zynq device’s pin T20 indeed goes to PMOD B, and to which of its pins.
Hooked up as shown in this pic:
The offset between PS_GPIO numbers and those designated by Linux is 54. So this pin is found as number 32+54=86.
Hence
# echo 86 > /sys/class/gpio/export
# echo out > /sys/class/gpio/gpio86/direction
And then a poor man’s oscillator:
# while [ 1 ] ; do echo 1 > /sys/class/gpio/gpio86/value ; echo 0 > /sys/class/gpio/gpio86/value ; done
This runs at a staggering 2.9 kHz. Pretty impressive for the slowest form of programming one can think about.
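The same export-and-toggle sequence can be done from C through the same sysfs files. Below is a minimal sketch. It assumes the legacy /sys/class/gpio interface used above and the PS_GPIO-to-Linux offset of 54 observed on this setup. The helper names are made up, and the sysfs base directory is a parameter only so that the code can be tried out on an ordinary directory:

```c
#include <stdio.h>

/* Offset between Zynq PS_GPIO[n] indices and Linux GPIO numbers, as found
 * on Xillinux for the Zybo. It depends on how the kernel numbers its
 * GPIOs, so don't take it for granted on other setups. */
#define PS_GPIO_LINUX_OFFSET 54

int ps_gpio_to_linux(int ps_gpio)
{
    return ps_gpio + PS_GPIO_LINUX_OFFSET;
}

/* Write a string to a sysfs attribute, like `echo value > path`.
 * Returns 0 on success, -1 on failure. */
int sysfs_write(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (!f)
        return -1;
    ok = fputs(value, f) != EOF;
    /* fclose() flushes, so sysfs rejecting the value is reported here */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* The two setup commands: export the GPIO and make it an output.
 * `base` is normally "/sys/class/gpio". */
int gpio_export_output(const char *base, int gpio)
{
    char path[256], num[16];

    snprintf(num, sizeof num, "%d", gpio);
    snprintf(path, sizeof path, "%s/export", base);
    if (sysfs_write(path, num) != 0)
        return -1;
    snprintf(path, sizeof path, "%s/gpio%d/direction", base, gpio);
    return sysfs_write(path, "out");
}

/* The poor man's oscillator: toggle the pin `cycles` times. Each write
 * reopens the value file, just like each echo in the bash loop. */
int gpio_toggle(const char *base, int gpio, long cycles)
{
    char path[256];

    snprintf(path, sizeof path, "%s/gpio%d/value", base, gpio);
    for (long i = 0; i < cycles; i++) {
        if (sysfs_write(path, "1") != 0 || sysfs_write(path, "0") != 0)
            return -1;
    }
    return 0;
}
```

On the board, gpio_export_output("/sys/class/gpio", ps_gpio_to_linux(32)) followed by gpio_toggle() with the same base and GPIO number reproduces the two echo commands and the loop.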
So I installed Vivado on my CentOS 6.5 64-bit Linux machine, and even though it promised to install icons on my desktop, it didn’t. This is how I installed them manually. There is surely a simpler way, as launch scripts like the ones I created must exist somewhere in the installation. But I didn’t bother looking.
All in all, it boils down to creating four files, as follows.
First, as root, create these two files, and make them executable by all:
/usr/local/bin/run-vivado as follows:
#!/bin/bash
. /opt/Xilinx/Vivado/2014.1/settings64.sh
vivado &
And /usr/local/bin/run-sdk:
#!/bin/bash
. /opt/Xilinx/SDK/2014.1/settings64.sh
xsdk &
The Xilinx installation path here is /opt/Xilinx. Adjust this to where your installation was made, and pick the settings32.sh file instead if you’re running on a 32-bit machine.
Next come the launchers, both to be placed in the Desktop directory of the ordinary user who should have the icons on the desktop.
The file named “Vivado 2014.1.desktop” reads
[Desktop Entry]
Version=1.0
Type=Application
Terminal=false
Icon=/opt/Xilinx/Vivado/2014.1/doc/images/vivado_logo.ico
Name[en_US]=Vivado 2014.1
Exec=/usr/local/bin/run-vivado
Path=/home/myself/vivado-outputs/
Name=Vivado 2014.1
StartupNotify=true
and “Xilinx SDK.desktop” is
[Desktop Entry]
Version=1.0
Type=Application
Terminal=false
Icon=/opt/Xilinx/SDK/2014.1/data/sdk/images/sdk_logo.ico
Name[en_US]=Xilinx SDK
Exec=/usr/local/bin/run-sdk
Name=Xilinx SDK
StartupNotify=true
The StartupNotify assignment is worth noting, because it is what turns the mouse pointer into “busy” when the program is launched, until the splash window appears. This is important for Vivado in particular, which takes some time to start up.
Also, the Path assignment in the Vivado launcher sets the directory at which Vivado runs, which should be changed to a directory that exists, and is a convenient place to dump all log files that Vivado generates.
A list of possible assignments in desktop launchers can be found on this page.
Background
This is yet another war story about making the FSBL boot on a Zynq processor.
I had prepared an FSBL for a certain target using SDK 14.6, and then someone needed it in a Vivado package, using the SDK attached to Vivado 2014.1. In a perfect world, I would have exported the system’s configuration from XPS 14.6 to Vivado as an XML file, and generated the FSBL there. But experience shows that nothing really guarantees that the processor’s configuration will be adopted correctly in Vivado. As a matter of fact, I’ve seen that Vivado imports some parameters, and others are ignored.
But hey, couldn’t I just copy the existing FSBL source files to a new workspace in the target SDK? After all, it’s just C code!
This is in fact possible: Go File > Import… > General > Existing Projects into Workspace, then navigate to the path of the original project’s workspace. And don’t forget to check “Copy projects into workspace”, so that the old one can be moved or deleted. A popup will allow selecting which projects to import, and it’s done!
Well, not quite. Selecting the three projects in an FSBL source set (fsbl, fsbl_bsp and system_hw_platform) will indeed create a fresh FSBL project, but it fails to compile (complaining that it can’t find libxilffs, as required by -lxilffs or something like that).
To work around this, I imported only the system_hw_platform project, and generated the FSBL project in Vivado’s SDK, as usual: File > New > Application Project. Set the name to “fsbl”, and make sure that the underlying hardware project is system_hw_platform. Click “Next” and pick “Zynq FSBL” as the template.
This makes sense, because the FSBL project relies on the C sources that were generated when XPS exported the project to SDK. So the hardware configuration remains correct, and the FSBL is new. No reason why this shouldn’t work, in theory.
The project compiled right away, and an fsbl.elf was ready for mixing into a boot.bin file.
Hurray! Not. It didn’t boot.
Despair not
The immediate measure in these cases is compiling the FSBL with the -DFSBL_DEBUG compilation flag (which defines the FSBL_DEBUG preprocessor macro, turning on debug messages). With some luck, something informative will show up on the serial console, even if the board appeared dead before.
I was one of those lucky bas#$%*s. I got:
PS7_INIT_FAIL : PS7 initialization successful
FSBL Status = 0xA012
Hmmm… That sounds like a mixed-up error message. It failed because it was successful? Well, in fact, the message itself represents the confusion causing the problem.
The FSBL status 0xA012 is returned when the call to ps7_init() fails in main.c. Or more precisely, when the returned value isn’t FSBL_PS7_INIT_SUCCESS. By the way, the FSBL generated by SDK 14.6 doesn’t even bother to check the return value of ps7_init(), but that’s irrelevant here.
Anyhow, note that ps7_init() is defined in the system_hw_platform, which consists of sources generated by XPS 14.6, but called by the FSBL, which was generated by Vivado.
This is a bit delicate, because ps7_init() returns PS7_INIT_SUCCESS when successful (see ps7_init.c), which happens to be defined in ps7_init.h as
#define PS7_INIT_SUCCESS (0) // 0 is success in good old C
and non-zero values meaning failure. This is the classic UNIX convention.
For some reason, this is what one finds in fsbl.h:
#ifdef NEW_PS7_ERR_CODE
#define FSBL_PS7_INIT_SUCCESS PS7_INIT_SUCCESS
#else
#define FSBL_PS7_INIT_SUCCESS (1)
#endif
In short: FSBL_PS7_INIT_SUCCESS=1, PS7_INIT_SUCCESS=0. A problem indeed.
So this is a direct consequence of mixing an old hardware project with a new FSBL. They changed the error code values somewhere in the middle.
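To make the clash concrete, here is a stripped-down C sketch that combines the two definitions above, as happens when the old ps7_init.h meets the new fsbl.h. The real ps7_init() is replaced by a hypothetical stub that always succeeds, and the comparison imitates how the FSBL checks its return value:

```c
/* The two definitions that meet when an XPS 14.6 hardware project is used
 * with a Vivado 2014.1 FSBL (copied from the listings above). */
#define PS7_INIT_SUCCESS (0)       /* ps7_init.h: 0 is success */

#ifdef NEW_PS7_ERR_CODE
#define FSBL_PS7_INIT_SUCCESS PS7_INIT_SUCCESS
#else
#define FSBL_PS7_INIT_SUCCESS (1)  /* fsbl.h fallback when the macro is unset */
#endif

/* Hypothetical stand-in for ps7_init(): it always succeeds, returning the
 * value that the old ps7_init.c returns on success. */
static int fake_ps7_init(void)
{
    return PS7_INIT_SUCCESS;
}

/* Imitates the FSBL's check: anything other than FSBL_PS7_INIT_SUCCESS is
 * treated as a failure, which ends with FSBL Status = 0xA012. */
int fsbl_ps7_init_ok(void)
{
    int status = fake_ps7_init();

    return status == FSBL_PS7_INIT_SUCCESS;
}
```

Without NEW_PS7_ERR_CODE, fsbl_ps7_init_ok() returns 0 even though the stub succeeded. Compiling with -DNEW_PS7_ERR_CODE makes it return 1.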
Solution
The clean way to fix this is defining NEW_PS7_ERR_CODE during compilation (e.g. with -DNEW_PS7_ERR_CODE). The less clean method is to just remove this #ifdef statement and leave it as
#define FSBL_PS7_INIT_SUCCESS PS7_INIT_SUCCESS
And with this, the FSBL booted correctly and all was well.
I know that getting the FSBL to boot is a recurring problem. Please don’t turn to me for help if your board doesn’t boot — there’s no secret trick, just good old debugging that takes time and effort.
When I tried running an executable taken from one ARM-based distribution on another, it failed to run, even before trying to load any libraries. The ARM architectures were compatible (armhf in both cases), so it wasn’t like I was trying to run an Intel binary on an ARM. I could always cross-compile from sources, but copying binaries is much easier…
I’ll demonstrate this issue with the “ls” program. Of course, what I actually wanted to adopt was something more worthwhile.
It went like this (the “ls” in the current directory is the binary from the other distro):
# ./ls
-bash: ./ls: No such file or directory
or sometimes (depends on the distribution) it says
$ ./ls
-sh: ./ls: not found
or when attempting to run with bash:
$ bash ./ls
./ls: ./ls: cannot execute binary file
Setting LD_DEBUG=all was pointless, because the error occurred before the loader even got to run. Strace gave a hint:
$ strace ./ls
execve("./ls", ["./ls"], [/* 13 vars */]) = -1 ENOENT (No such file or directory)
dup(2) = 3
fcntl64(3, F_GETFL) = 0x2 (flags O_RDWR)
fstat64(3, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aac9000
_llseek(3, 0, 0x7efca940, SEEK_CUR) = -1 ESPIPE (Illegal seek)
write(3, "strace: exec: No such file or di"..., 40strace: exec: No such file or directory
) = 40
close(3) = 0
munmap(0x2aac9000, 4096) = 0
exit_group(1) = ?
So execve() returns ENOENT even though the file exists. In this case, it means that the file is there, but something the kernel needs in order to run it is missing.
The reason
The crucial difference between the alien “ls” and the native one is where they expect to find their loader:
$ readelf -l /bin/ls
Elf file type is EXEC (Executable file)
Entry point 0xcb84
There are 7 program headers, starting at offset 52
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
EXIDX 0x093b4c 0x0009bb4c 0x0009bb4c 0x00110 0x00110 R 0x4
PHDR 0x000034 0x00008034 0x00008034 0x000e0 0x000e0 R E 0x4
INTERP 0x000114 0x00008114 0x00008114 0x00013 0x00013 R 0x1
[Requesting program interpreter: /lib/ld-linux.so.3]
LOAD 0x000000 0x00008000 0x00008000 0x93c60 0x93c60 R E 0x8000
LOAD 0x094000 0x000a4000 0x000a4000 0x007bd 0x02a88 RW 0x8000
DYNAMIC 0x09400c 0x000a400c 0x000a400c 0x000f0 0x000f0 RW 0x4
GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RW 0x4
Section to Segment mapping:
Segment Sections...
00 .ARM.exidx
01
02 .interp
03 .interp .hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .rodata .ARM.extab .ARM.exidx .eh_frame
04 .init_array .fini_array .jcr .dynamic .got .data .bss
05 .dynamic
06
$ readelf -l ./ls
Elf file type is EXEC (Executable file)
Entry point 0xb6d9
There are 9 program headers, starting at offset 52
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
EXIDX 0x00fce8 0x00017ce8 0x00017ce8 0x00030 0x00030 R 0x4
PHDR 0x000034 0x00008034 0x00008034 0x00120 0x00120 R E 0x4
INTERP 0x000154 0x00008154 0x00008154 0x00027 0x00027 R 0x1
[Requesting program interpreter: /lib/arm-linux-gnueabihf/ld-linux.so.3]
LOAD 0x000000 0x00008000 0x00008000 0x0fd1c 0x0fd1c R E 0x8000
LOAD 0x00fee4 0x0001fee4 0x0001fee4 0x003e4 0x01050 RW 0x8000
DYNAMIC 0x00fef0 0x0001fef0 0x0001fef0 0x00110 0x00110 RW 0x4
NOTE 0x00017c 0x0000817c 0x0000817c 0x00044 0x00044 R 0x4
GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RW 0x4
GNU_RELRO 0x00fee4 0x0001fee4 0x0001fee4 0x0011c 0x0011c R 0x1
Section to Segment mapping:
Segment Sections...
00 .ARM.exidx
01
02 .interp
03 .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .rodata .ARM.exidx .eh_frame
04 .init_array .fini_array .jcr .dynamic .got .data .bss
05 .dynamic
06 .note.ABI-tag .note.gnu.build-id
07
08 .init_array .fini_array .jcr .dynamic
Aha! When the native “ls” is executed, the kernel loads /lib/ld-linux.so.3 which in turn executes the required executable. When the alien “ls” was attempted, the kernel went for /lib/arm-linux-gnueabihf/ld-linux.so.3, couldn’t find it and returned “no such file”. It actually means that it didn’t find the interpreter binary (i.e. the glibc dynamic library loader).
The Solution
Create a symlink from where the executable expects the loader to where it actually is. In this case
# mkdir /lib/arm-linux-gnueabihf
# cd /lib/arm-linux-gnueabihf
# ln -s /lib/ld-linux.so.3
It’s of course quite likely that some library binaries will need to be copied along with the executable. LD_DEBUG or ldd may be helpful here, as well as “readelf -d” if there’s no ldd.
Changing the dynamic linker when compiling
Sometimes it’s possible to go the other way around: Tell gcc to pick a certain dynamic linker.
But first, to see which loader a program compiled with gcc will expect, add the -v flag in the compilation command, e.g.
$ gcc -v -O3 -Wall tryexec.c -o tryexec
and look for the -dynamic-linker flag in COLLECT_GCC_OPTIONS (could be, for example, /lib64/ld-linux-x86-64.so.2).
To change the choice of linker, pass an argument to the linker through gcc with the -Wl flag:
$ gcc -O3 -Wl,-I/lib/ld-linux.so.3 -Wall tryexec.c -o tryexec
What comes after the comma of the -Wl flag goes to the linker, so -Wl,-I/lib/ld-linux.so.3 passes “-I/lib/ld-linux.so.3” to ld, which does the job.
Those using Eclipse (Xilinx SDK included) can add the flag in the project C/C++ Build Settings > Tool Settings > ARM Linux gcc linker > Miscellaneous > Linker Flags (write e.g. “-Wl,-I/lib/myloader.so”, without the quotes, in the text box).
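The tryexec.c source isn’t shown here, so here is a hypothetical stand-in: a C sketch that lets a program report which dynamic linker it was linked to request. This is a quick way to confirm that -Wl,-I took effect. It relies on glibc’s dl_iterate_phdr(), whose first reported object is the main program. own_interpreter() is an illustrative name:

```c
#define _GNU_SOURCE
#include <link.h>
#include <stddef.h>

/* dl_iterate_phdr() callback. The first object it reports is the main
 * program; scan that one's program headers for PT_INTERP and store the
 * address of the interpreter string in *data. */
static int find_interp(struct dl_phdr_info *info, size_t size, void *data)
{
    (void)size;
    for (int i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type == PT_INTERP) {
            /* dlpi_addr is the load bias: 0 for non-PIE executables */
            *(const char **)data =
                (const char *)(info->dlpi_addr + ph->p_vaddr);
            break;
        }
    }
    return 1;   /* nonzero stops the iteration after the main program */
}

/* Return the dynamic linker path this executable asks for, or NULL if it
 * has none (e.g. it was linked statically). */
const char *own_interpreter(void)
{
    const char *interp = NULL;

    dl_iterate_phdr(find_interp, &interp);
    return interp;
}
```

Print own_interpreter() from main(), then build with each of the two gcc commands above. The first binary should show the default loader and the second /lib/ld-linux.so.3, though the second one runs only if that loader actually exists on the machine.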
Introduction
These are my rather messy notes as I set up a wireless access point on my desktop (Fedora 12) running a home-compiled 3.12.20 Linux kernel. Somewhere below (see “Rubbish starts here”) I’ve added things that I tried out but led nowhere. Beware.
I began with two USB dongles, 8188EU and 8192CU. I got 8188EU up and running with Realtek’s hostapd and driver, but only for the 2.4 GHz band. So I bought a RaLink-based dual-band USB dongle, and ran it with the kernel’s built-in driver and an updated version of hostapd (which is hardware neutral, however). If you want it, search eBay for “300m USB Wifi dual band”. It should look like this, and cost some $15 or so:
This dongle is what I ended up using. You may skip to “Dual-band dongle” below if you don’t care about the other things I tried out before I chose this one.
The purpose is a manual setup for occasional use. There are plenty of similar writeups, like this one.
It’s very easy to get mixed up with all those do-this-do-that howtos, and forget one simple fact: A wireless NIC is just another Ethernet card that happens not to have a cable. The authentication of a wireless link takes place with plain Ethernet packets, and once the two sides agree on talking with each other, it’s back to two Ethernet cards with a cross cable.
To make a machine serve as an access point, the NIC must support Master mode, and there must be software running that plays the role of authenticating clients and setting up encryption. But at the end of the day, that’s all there is to it. Linux’ daemon for doing this is hostapd.
The Swiss army knives are “iw”, “iwconfig” and “iwlist”. Try “iw help” in particular.
In short
- Plug in device — driver autoloads
- Bring up the device with ifconfig (assign an IP address)
- Switch the regulatory domain, if the 5 GHz band is required (and the device reports old and over-restrictive regulatory rules):
# iw reg set GD
- Restart dhcpd, so that it listens for requests on wlan0
- Start hostapd
Realtek vs. community
There are two completely different takes on getting the Wifi working. One is to use the tools that are maintained by the community: The hostapd that arrives along with distributions, and the drivers compiled in the kernel. Well, as of June 2014, that’s not a go with Realtek’s USB Wifi dongles.
The thing is that the typical distribution hostapd expects to find the kernel’s native interface, which is implemented in the cfg80211 and mac80211 kernel modules. These modules are supposed to talk with the low-level hardware drivers. Very structured and nice. Only hi-tech companies don’t always play ball with the kernel community.
Realtek, in this case, chose to compile together everything, including the higher level frontend source code, and make a single kernel module of that. Kinda makes sense when all you need is a single driver for your specific hardware (a bit like static linking of a program), but not when that hardware is just one of many to be supported.
For example, the kernel’s 8192CU driver (appears as rtl8192cu on lsmod with ~79kB) relies on the kernel’s low-level modules (which are mac80211, cfg80211, rtl8192c_common, rtl_usb, rtlwifi), but the Realtek driver has everything in a single module, which appears as 8192cu and takes ~526kB.
Now to hostapd: The distribution’s version is geared toward the kernel’s native interface (“driver=nl80211”) with some partial support for Realtek’s drivers (“driver=rtl871x”), so all in all, if you use Realtek’s kernel drivers, use their hostapd as well.
My chosen solution (well, no-other-choice solution) was to compile Realtek’s kernel modules and hostapd. With slight variations.
So first is a summary of commands when things finally work, and then the battle field (compilations from sources etc.).
ifconfig
This is necessary for the already running DHCP daemon to answer requests from wireless clients. This ifconfig command is also the moment at which the firmware is loaded (and not when the driver loads, as one could expect).
Important: Remember that routing rules apply just like with any Ethernet card, so don’t pick an IP address space that is already accounted for in the access point’s routing table. Making that mistake will not just make pings fail: The access point will also ignore ARP requests (see below).
# ifconfig wlan0 10.10.0.1 netmask 255.255.255.0
# service dhcpd restart
Starting hostapd
# service hostapd start
or running in the foreground, with a lot of debug output
# hostapd -dd /etc/hostapd/hostapd.conf
Note that when hostapd is running in the foreground and is stopped with CTRL-C, unplugging and replugging the device may be necessary before re-attempting to work with it.
What happens if you pick a bad IP address
For some reason, I had the silly idea that since my internal LAN’s subnet is 10.1.0.0/16, I should assign my wlan0 card the address 10.1.1.123, so it will natively belong to the LAN. What I didn’t realize was that another NIC is already assigned for handling 10.1.0.0/16, so wlan0 will never get packets routed to it.
Even worse, the wireless adapter will not answer ARP requests, which kinda makes sense: The wireless adapter “knows” that it can’t work with the IP address it has, so it might as well not announce any IP connectivity. Interestingly, ping requests were ignored completely as well. It’s not like the replies went out on the NIC to which the IP subnet belongs; there was no reply packet at all. Which again makes sense, because the replies are not supposed to go out on another NIC. That could potentially mislead someone into thinking that the link is OK (in case there was a way for the reply to reach the requester).
What follows is my description of the problem as I saw it before I solved it, just in case someone is stuck in the same situation.
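In hindsight, I should have checked before picking wlan0’s address whether it falls inside a subnet that is already routed to another NIC. 10.1.1.123 is inside 10.1.0.0/16, whereas 10.10.0.1 (used in the ifconfig command above) is not. Below is a minimal C sketch of that check. It assumes the subnet and prefix length are taken by hand from the routing table (e.g. from the output of ip route). ipv4_in_subnet() is a made-up helper:

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <sys/socket.h>

/* Return 1 if IPv4 address `addr` lies within `net`/`prefix`, 0 if not,
 * and -1 if an address doesn't parse or the prefix is out of range. */
int ipv4_in_subnet(const char *addr, const char *net, int prefix)
{
    struct in_addr a, n;
    uint32_t mask;

    if (prefix < 0 || prefix > 32 ||
        inet_pton(AF_INET, addr, &a) != 1 ||
        inet_pton(AF_INET, net, &n) != 1)
        return -1;

    /* Netmask in host byte order. A /0 prefix matches everything; it's
     * special-cased because shifting a 32-bit value by 32 is undefined. */
    mask = prefix ? 0xFFFFFFFFu << (32 - prefix) : 0;
    return (ntohl(a.s_addr) & mask) == (ntohl(n.s_addr) & mask);
}
```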
At this point, I can connect to the Access Point from Windows XP (even with a client having poor WPA support) as well as Linux with seemingly no problem. But there’s no real internet access. The reason seems to be, that the USB dongle doesn’t seem to be connected with its IP protocol layer. Ethernet packets go through well, as can be seen in sniff dumps on both sides, and the client manages to acquire an address with DHCP, because it depends only on plain MAC packets.
Despite setting an address with ifconfig (or “ip address add” for that matter), the dongle doesn’t respond to ARP requests asking for the address it has, and doesn’t respond to pings.
ARP packets are sent properly from the dongle (acting as AP) and the responses from the client arrive fine as well (when asking for the address of the client’s Wifi NIC as well as another wired Ethernet NIC, both are answered).
# arping -I wlan0 10.1.1.166
ARPING 10.1.1.166 from 10.1.1.123 wlan0
Unicast reply from 10.1.1.166 [00:0E:2E:40:5B:11] 48.329ms
Unicast reply from 10.1.1.166 [00:0E:2E:40:5B:11] 80.612ms
Unicast reply from 10.1.1.166 [00:0E:2E:40:5B:11] 104.531ms
but not on the other way (from the client):
# arping -I wlan0 10.1.1.123
(nothing happens)
Now, if the access point sends a gratuitous ARP to the client:
# arping -A -I wlan0 10.1.1.123
the client can send ping packets to the access point. These ICMP packets appear in the sniff dump of wlan0 on both sides, but the access point doesn’t reply. The same happened when pinging the broadcast address: The packets were seen in the access point’s sniff dumps with an all-0xff MAC address, but got no response:
# ping -b 10.1.255.255
This is not a firewall issue. The problem remains with the firewall taken down. Both USB dongles have this same problem.
Compiling Realtek’s driver for RTL8188EU
Possible reason why this is necessary: The USB device is V2.0 according to the package, and the newer version contains firmware. Anyhow,
$ git clone https://github.com/lwfinger/rtl8188eu.git
A plain “make” compiled the code cleanly on kernel 3.12.20 (using commit ID 63fe7cda86c2830d66335026efde7472c10bc5c2). Copy firmware (also in Git bundle):
# cp rtl8188eufw.bin /lib/firmware/rtlwifi/
(Well, I ended up doing “make install”, after removing the existing driver from the staging subdirectory.)
Compiling Realtek’s driver for RTL8192CU
Following this guide, I went to Realtek’s site, downloaded something like RTL8188C_8192C_USB_linux_v4.0.2_9000.20130911.zip (ZIP??!), and untarred wpa_supplicant_hostapd-0.8_rtw_r7475.20130812.tar.gz.
Tried to compile from this zip file (under “driver”). Compilation failed against my kernel (3.12) on the change of the “create_proc_entry” API. So instead, I went for
$ git clone https://github.com/pvaret/rtl8192cu-fixes.git
and compiled cleanly from commit ID f0dfbb46a891820b27942ba3e213af83f2452957.
Compiling and running Realtek’s hostapd
From the zip file that I downloaded from Realtek, I went to the hostapd subdirectory in wpa_supplicant_hostapd/, and typed “make”. It compiled cleanly, and generated the “hostapd” and “hostapd_cli” executables. Yey.
And that actually worked! Note that the rtl871x driver is picked even though the “driver=” isn’t assigned at all in hostapd.conf.
# hostapd -d /etc/hostapd/hostapd.conf
random: Trying to read entropy from /dev/random
Configuration file: /etc/hostapd/hostapd.conf
ctrl_interface_group=0
eapol_version=1
drv->ifindex=35
l2_sock_recv==l2_sock_xmit=0x0x1203be0
BSS count 1, BSSID mask 00:00:00:00:00:00 (0 bits)
Completing interface initialization
Mode: IEEE 802.11g Channel: 4 Frequency: 2427 MHz
RATE[0] rate=10 flags=0x1
RATE[1] rate=20 flags=0x1
RATE[2] rate=55 flags=0x1
RATE[3] rate=110 flags=0x1
RATE[4] rate=60 flags=0x0
RATE[5] rate=90 flags=0x0
RATE[6] rate=120 flags=0x0
RATE[7] rate=180 flags=0x0
RATE[8] rate=240 flags=0x0
RATE[9] rate=360 flags=0x0
RATE[10] rate=480 flags=0x0
RATE[11] rate=540 flags=0x0
Flushing old station entries
Deauthenticate all stations
+rtl871x_sta_deauth_ops, ff:ff:ff:ff:ff:ff is deauth, reason=2
rtl871x_set_key_ops
rtl871x_set_key_ops
rtl871x_set_key_ops
rtl871x_set_key_ops
Using interface wlan0 with hwaddr c0:4a:00:18:ef:21 and ssid 'ocho'
Deriving WPA PSK based on passphrase
SSID - hexdump_ascii(len=4):
6f 63 68 6f ocho
PSK (ASCII passphrase) - hexdump_ascii(len=9): [REMOVED]
PSK (from passphrase) - hexdump(len=32): [REMOVED]
rtl871x_set_wps_assoc_resp_ie
rtl871x_set_wps_beacon_ie
rtl871x_set_wps_probe_resp_ie
urandom: Got 20/20 bytes from /dev/urandom
GMK - hexdump(len=32): [REMOVED]
Key Counter - hexdump(len=32): [REMOVED]
WPA: group state machine entering state GTK_INIT (VLAN-ID 0)
GTK - hexdump(len=32): [REMOVED]
WPA: group state machine entering state SETKEYSDONE (VLAN-ID 0)
rtl871x_set_key_ops
rtl871x_set_beacon_ops
rtl871x_set_hidden_ssid ignore_broadcast_ssid:0, ocho,4
rtl871x_set_acl
wlan0: Setup of interface done.
But with WPA authentication enabled, I got a lot of
hostapd: wlan0: STA 00:0e:2e:40:5b:94 IEEE 802.11: associated
hostapd: wlan0: STA 00:0e:2e:40:5b:94 IEEE 802.11: deauthenticated due to local deauth request
hostapd: wlan0: STA 00:0e:2e:40:5b:94 IEEE 802.11: disassociated
It was also evident sniffing wlan0 that EAPOL WPA key (254) frames were sent to the client, but they didn’t get answered, which is probably the reason for the whole thing, as mentioned on this page.
The solution was to restrict the protocol to version 1 with
eapol_version=1
in hostapd.conf. This problem occurred only when I used the RT2500 utility on the Windows laptop. Using Windows XP’s native wireless selection tool connected well either way.
8192CU is single band. Really.
I tried to work with the 8192CU dongle, because it supposedly supports the 5 GHz band as well. The 2.4 GHz is heavily crowded. I don’t know why I got the impression that it’s dual-band. Anyhow,
# cp 8192cu.ko /lib/modules/$(uname -r)/kernel/drivers/net/wireless/
# depmod -a
and also blacklist the kernel’s native driver by adding the following lines to /etc/modprobe.d/blacklist.conf
# Native Wifi drivers not usable as access points
blacklist rtl8192cu
blacklist rtl8192c_common
To see the list of channels:
$ iwlist wlan0 freq
Darn, only 2.4 GHz! It even says so on Realtek’s site: “Complete 802.11n MIMO solution for 2.4GHz band” and “Single-Band 11n (2x2) WLAN USB Dongle”.
Besides, the signal it transmits appears to be really lousy. I got a really bad link quality (but hey, this is a cheapo dongle from Ebay).
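When reading channel lists like the one from iwlist, or hostapd’s “Channel: 4 Frequency: 2427 MHz” above, keep in mind that the mapping between channel numbers and center frequencies is simple arithmetic. Here is a small C sketch of the usual formulas: 2407 + 5*ch in the 2.4 GHz band (channel 14 is the odd one out, at 2484 MHz) and 5000 + 5*ch in the 5 GHz band. The enum and function names are illustrative:

```c
enum wifi_band { BAND_2GHZ, BAND_5GHZ };

/* Center frequency in MHz of an IEEE 802.11 channel. Returns -1 for a
 * channel number outside the 2.4 GHz range (1-14). In the 5 GHz band the
 * formula is applied without range checks; iw shows the 4.9 GHz channels
 * as negative numbers, which the same formula handles. */
int channel_to_mhz(enum wifi_band band, int ch)
{
    switch (band) {
    case BAND_2GHZ:
        if (ch == 14)
            return 2484;            /* the odd one out */
        if (ch >= 1 && ch <= 13)
            return 2407 + 5 * ch;   /* 5 MHz spacing, channel 1 = 2412 */
        return -1;
    case BAND_5GHZ:
        return 5000 + 5 * ch;
    }
    return -1;
}
```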
Compiling hostapd from the sources
First, install libnl-devel, which is required for compiling hostapd:
# yum install libnl-devel
Download from the hostapd’s main page, copy the config file and compile:
$ git clone git://w1.fi/srv/git/hostap.git
$ cd hostap/hostapd
$ git checkout hostap_2_2
$ cp defconfig .config
$ make
Dual-band dongle
I plugged a MediaTek (formerly RaLink) RT5572-based no-brand dongle (0x148f/0x5572) into my computer with kernel 3.12. It was detected right away. “iw list” gave a long answer, so I reverted to the original hostapd, and picked driver=nl80211. The driver handling it was rt2800usb, along with its dependencies: rt2800lib, rt2x00usb, rt2x00lib, mac80211 and cfg80211.
The Linux drivers on MediaTek’s site were last updated in 2010, supporting kernel 2.4.0, but the rt2800usb driver seems to be maintained properly with occasional patches. So it looks like the kernel’s built-in driver is the best choice. The RT5572 was added in March 2013 to kernel 3.10.
When I attempted to run hostapd, it said
# hostapd -dd /etc/hostapd/hostapd.conf
Configuration file: /etc/hostapd/hostapd.conf
ctrl_interface_group=0
eapol_version=1
ioctl[SIOCSIFFLAGS]: No such file or directory
nl80211 driver initialization failed.
wlan1: Unable to setup interface.
rmdir[ctrl_interface]: No such file or directory
That wasn’t very helpful, but looking at the system log was:
ieee80211 phy0: rt2x00lib_request_firmware: Info - Loading firmware file 'rt2870.bin'
ieee80211 phy0: rt2x00lib_request_firmware: Error - Failed to request Firmware
Ah, yes. A firmware file. Taken from the Linux Firmware Git repo,
# cp rt2870.bin /lib/firmware/
(Note that it’s NOT copied to rtlwifi. This is RaLink, not Realtek.)
At which point I got a lot of output from hostapd -dd, but it ended with
Could not set DTIM period for kernel driver
This seems to be a hostapd issue (I ran 0.6.9), as the driver is stable. Compiling hostapd-2.2 solved this (see above), and the dongle works nicely as an access point.
Access point at 5 GHz
The whole point with this dual-band dongle was to run the access point at 5 GHz, and avoid all the noise from my neighbors. But alas, requesting a 5 GHz channel with hostapd -dd, says, somewhere in the middle:
channel [40] (157) is disabled for use in AP mode, flags: 0x1
wlan1: IEEE 802.11 Configured channel (157) not found from the channel list of current mode (2) IEEE 802.11a
wlan1: IEEE 802.11 Hardware does not support configured channel
Could not select hw_mode and channel. (-3)
wlan1: interface state UNINITIALIZED->DISABLED
wlan1: AP-DISABLED
wlan1: Unable to setup interface.
Hmmm… I failed twice here. The frequency isn’t allowed in Israel, and the 5 GHz band is blocked altogether.
Indeed,
$ iw list
Wiphy phy2
Band 1:
Capabilities: 0x2f2
[...]
Frequencies:
* 2412 MHz [1] (20.0 dBm)
* 2417 MHz [2] (20.0 dBm)
* 2422 MHz [3] (20.0 dBm)
* 2427 MHz [4] (20.0 dBm)
* 2432 MHz [5] (20.0 dBm)
* 2437 MHz [6] (20.0 dBm)
* 2442 MHz [7] (20.0 dBm)
* 2447 MHz [8] (20.0 dBm)
* 2452 MHz [9] (20.0 dBm)
* 2457 MHz [10] (20.0 dBm)
* 2462 MHz [11] (20.0 dBm)
* 2467 MHz [12] (20.0 dBm)
* 2472 MHz [13] (20.0 dBm)
* 2484 MHz [14] (disabled)
Bitrates (non-HT):
* 1.0 Mbps
* 2.0 Mbps (short preamble supported)
* 5.5 Mbps (short preamble supported)
* 11.0 Mbps (short preamble supported)
* 6.0 Mbps
* 9.0 Mbps
* 12.0 Mbps
* 18.0 Mbps
* 24.0 Mbps
* 36.0 Mbps
* 48.0 Mbps
* 54.0 Mbps
Band 2:
Capabilities: 0x2f2
HT20/HT40
[...]
Frequencies:
* 5180 MHz [36] (disabled)
* 5190 MHz [38] (disabled)
* 5200 MHz [40] (disabled)
* 5210 MHz [42] (disabled)
* 5220 MHz [44] (disabled)
* 5230 MHz [46] (disabled)
* 5240 MHz [48] (disabled)
* 5250 MHz [50] (disabled)
* 5260 MHz [52] (disabled)
* 5270 MHz [54] (disabled)
* 5280 MHz [56] (disabled)
* 5290 MHz [58] (disabled)
* 5300 MHz [60] (disabled)
* 5310 MHz [62] (disabled)
* 5320 MHz [64] (disabled)
* 5500 MHz [100] (disabled)
* 5510 MHz [102] (disabled)
* 5520 MHz [104] (disabled)
* 5530 MHz [106] (disabled)
* 5540 MHz [108] (disabled)
* 5550 MHz [110] (disabled)
* 5560 MHz [112] (disabled)
* 5570 MHz [114] (disabled)
* 5580 MHz [116] (disabled)
* 5590 MHz [118] (disabled)
* 5600 MHz [120] (disabled)
* 5610 MHz [122] (disabled)
* 5620 MHz [124] (disabled)
* 5630 MHz [126] (disabled)
* 5640 MHz [128] (disabled)
* 5650 MHz [130] (disabled)
* 5660 MHz [132] (disabled)
* 5670 MHz [134] (disabled)
* 5680 MHz [136] (disabled)
* 5690 MHz [138] (disabled)
* 5700 MHz [140] (disabled)
* 5745 MHz [149] (disabled)
* 5755 MHz [151] (disabled)
* 5765 MHz [153] (disabled)
* 5775 MHz [155] (disabled)
* 5785 MHz [157] (disabled)
* 5795 MHz [159] (disabled)
* 5805 MHz [161] (disabled)
* 5825 MHz [165] (disabled)
* 4920 MHz [-16] (disabled)
* 4940 MHz [-12] (disabled)
* 4960 MHz [-8] (disabled)
* 4980 MHz [-4] (disabled)
Bitrates (non-HT):
* 6.0 Mbps
* 9.0 Mbps
* 12.0 Mbps
* 18.0 Mbps
* 24.0 Mbps
* 36.0 Mbps
* 48.0 Mbps
* 54.0 Mbps
[...]
Are you kidding me? Disabled? Well, no wonder. The kernel thinks 5 GHz is disallowed in Israel:
$ iw reg get
country IL:
(2402 - 2482 @ 40), (N/A, 20)
Where did it get that from? A peek at dmesg reveals the answer:
cfg80211: Calling CRDA to update world regulatory domain
cfg80211: World regulatory domain updated:
cfg80211: (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
cfg80211: (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
cfg80211: (2457000 KHz - 2482000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
cfg80211: (2474000 KHz - 2494000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
cfg80211: (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
cfg80211: (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
usb 5-1.4: reset full-speed USB device number 9 using uhci_hcd
ieee80211 phy0: rt2x00_set_rt: Info - RT chipset 5592, rev 0222 detected
ieee80211 phy0: rt2x00_set_rf: Info - RF chipset 000f detected
ieee80211 phy0: Selected rate control algorithm 'minstrel_ht'
usbcore: registered new interface driver rt2800usb
cfg80211: Calling CRDA for country: IL
cfg80211: Regulatory domain changed to country: IL
cfg80211: (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
cfg80211: (2402000 KHz - 2482000 KHz @ 40000 KHz), (N/A, 2000 mBm)
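The kernel prints each rule in KHz, mBi and mBm, while iw reg get shows the same rule in MHz, dBi and dBm (an mBi or mBm is a hundredth of a dBi or dBm, so 2000 mBm is the 20 dBm that iw reports). To make the correspondence explicit, here's a small JavaScript sketch that converts one of these log lines. parseCfg80211Rule is a made-up name, and it assumes exactly the line format seen in this log, nothing more.

```javascript
// Hypothetical helper: parse one regulatory rule as printed by cfg80211 in
// the log above, e.g.
//   "cfg80211: (2402000 KHz - 2482000 KHz @ 40000 KHz), (N/A, 2000 mBm)"
// and convert it to the units `iw reg get` uses (MHz, dBi, dBm).
// Assumes exactly this log format; returns null for any other line.
function parseCfg80211Rule(line) {
  const m = line.match(
    /\((\d+) KHz - (\d+) KHz @ (\d+) KHz\), \((?:N\/A|(\d+) mBi), (\d+) mBm\)/
  );
  if (!m) return null;
  return {
    startMHz: Number(m[1]) / 1000,          // KHz -> MHz
    endMHz: Number(m[2]) / 1000,
    maxBandwidthMHz: Number(m[3]) / 1000,
    // mBi is 1/100 dBi; "N/A" in the log means no value was given
    maxAntennaGainDbi: m[4] === undefined ? null : Number(m[4]) / 100,
    maxEirpDbm: Number(m[5]) / 100,         // mBm is 1/100 dBm
  };
}

// The IL rule from the log, which `iw reg get` printed as
// (2402 - 2482 @ 40), (N/A, 20):
console.log(parseCfg80211Rule(
  "cfg80211: (2402000 KHz - 2482000 KHz @ 40000 KHz), (N/A, 2000 mBm)"));
```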
The thing is that according to Israel’s local regulations, the lower 5 GHz band is allowed for indoor use. My initial choice of channel 157 is probably illegal in Israel (see Wikipedia’s list). But hey, some channels on the 5 GHz band are still open! It’s also interesting to note that some of the 5 GHz channels that are banned for Wifi are allowed for amateur radio (also see this and this).
As the regulations for each country are taken from some ROM on the hardware device itself, they’re probably outdated.
The ugly solution is to switch the regulatory country. For example, Grenada has a relatively relaxed setting:
# iw reg set GD
A full list of these country codes can be found here. “BO” (for Bolivia) is also worth a try.
Now the responsibility is on me to pick a legal frequency. For example, any channel between 36 and 48.
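Picking a channel boils down to arithmetic: a 5 GHz channel n is centered at 5000 + 5n MHz (2407 + 5n MHz for channels 1 to 13 on 2.4 GHz, with channel 14 at 2484 MHz), and a 20 MHz channel has to fit entirely inside an allowed range. Here's a rough JavaScript sketch of that check, with hypothetical helper names, using the ranges printed in the logs above. It deliberately ignores any flags a real regulatory rule may carry (DFS, indoor-only and such), so it's no substitute for reading the actual rules.

```javascript
// Center frequency in MHz of an 802.11 channel number. Matches the
// "iw list" output above: [1] = 2412, [36] = 5180, [157] = 5785,
// and even the negative entries, e.g. [-16] = 4920.
function channelToMHz(ch) {
  if (ch === 14) return 2484;                 // channel 14 is off the 5 MHz grid
  if (ch >= 1 && ch <= 13) return 2407 + 5 * ch;
  return 5000 + 5 * ch;                       // 5 GHz numbering
}

// True if a channel of the given width lies completely inside one of the
// rules. Only frequency ranges and bandwidth are checked; rule flags are ignored.
function channelAllowed(ch, widthMHz, rules) {
  const center = channelToMHz(ch);
  const lo = center - widthMHz / 2;
  const hi = center + widthMHz / 2;
  return rules.some(r =>
    lo >= r.startMHz && hi <= r.endMHz && widthMHz <= r.maxBandwidthMHz);
}

// Ranges copied from the logs above.
const IL = [{ startMHz: 2402, endMHz: 2482, maxBandwidthMHz: 40 }];
const WORLD_5GHZ = [
  { startMHz: 5170, endMHz: 5250, maxBandwidthMHz: 40 },
  { startMHz: 5735, endMHz: 5835, maxBandwidthMHz: 40 },
];

for (const ch of [36, 40, 44, 48, 52]) {
  console.log(ch, channelToMHz(ch), channelAllowed(ch, 20, WORLD_5GHZ));
}
console.log(157, channelToMHz(157), channelAllowed(157, 20, IL));
```

With the world domain's 5170-5250 MHz rule, channels 36 to 48 pass and channel 52 doesn't, while channel 157 (5785 MHz) is clearly out under the IL rule, which ends at 2482 MHz.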
Rubbish starts here
From this point on, it’s just random stuff that I tried out and that didn’t lead anywhere. But since I write as I work, why delete it? Maybe it helps someone as is.
Plugging in a TL-WN725N before switching to Realtek’s drivers
usb 2-2.2: Product: 802.11n NIC
usb 2-2.2: Manufacturer: Realtek
usb 2-2.2: SerialNumber: 00E04C0001
r8188eu: module is from the staging directory, the quality is unknown, you have been warned.
Chip Version Info: CHIP_8188E_Normal_Chip_TSMC_D_CUT_1T1R_RomVer(0)
usbcore: registered new interface driver r8188eu
Check if it’s ready to be an access point:
# iwconfig wlan0 mode master
# iwconfig wlan0
wlan0 unassociated Nickname:"<WIFI@REALTEK>"
Mode:Master Frequency=2.412 GHz Access Point: Not-Associated
Sensitivity:0/0
Retry:off RTS thr:off Fragment thr:off
Encryption key:off
Power Management:off
Link Quality:0 Signal level:0 Noise level:0
Rx invalid nwid:0 Rx invalid crypt:0 Rx invalid frag:0
Tx excessive retries:0 Invalid misc:0 Missed beacon:0
OK, so it is. :)
But this doesn’t seem very good:
# iw list
nl80211 not found.
And here comes a bit of nonsense that was fixed by compiling software from sources, as shown below.
Fixed with
# modprobe mac80211
Installing the access point daemon:
# yum install hostapd
Running manually for a test:
# hostapd -dd /etc/hostapd/hostapd.conf
Configuration file: /etc/hostapd/hostapd.conf
ctrl_interface_group=10 (from group name 'wheel')
nl80211 not found.
nl80211 driver initialization failed.
wlan0: Unable to setup interface.
Tried the second dongle (the one I bought cheap on eBay)
usb 2-2.2: New USB device found, idVendor=0bda, idProduct=8176
usb 2-2.2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
usb 2-2.2: Product: 802.11n WLAN Adapter
usb 2-2.2: Manufacturer: Realtek
usb 2-2.2: SerialNumber: 00e04c000001
rtl8192cu: Chip version 0x10
rtl8192cu: MAC address: 00:13:ef:40:08:98
rtl8192cu: Board Type 0
rtl_usb: rx_max_size 15360, rx_urb_num 8, in_ep 1
rtl8192cu: Loading firmware rtlwifi/rtl8192cufw_TMSC.bin
usbcore: registered new interface driver rtl8192cu
rtlwifi: Loading alternative firmware rtlwifi/rtl8192cufw.bin
rtlwifi: Firmware rtlwifi/rtl8192cufw_TMSC.bin not available
OK, OK, take the firmware!
# mkdir /lib/firmware/rtlwifi
# cp rtl8192cufw.bin /lib/firmware/rtlwifi/
Unplug-replug. This one went much better:
usb 2-2.2: New USB device found, idVendor=0bda, idProduct=8176
usb 2-2.2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
usb 2-2.2: Product: 802.11n WLAN Adapter
usb 2-2.2: Manufacturer: Realtek
usb 2-2.2: SerialNumber: 00e04c000001
rtl8192cu: Chip version 0x10
rtl8192cu: MAC address: 00:13:ef:40:08:98
rtl8192cu: Board Type 0
rtl_usb: rx_max_size 15360, rx_urb_num 8, in_ep 1
rtl8192cu: Loading firmware rtlwifi/rtl8192cufw_TMSC.bin
rtlwifi: Loading alternative firmware rtlwifi/rtl8192cufw.bin
ieee80211 phy1: Selected rate control algorithm 'rtl_rc'
rtlwifi: wireless switch is on
cfg80211: Calling CRDA for country: IL
cfg80211: Regulatory domain changed to country: IL
cfg80211: (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
cfg80211: (2402000 KHz - 2482000 KHz @ 40000 KHz), (N/A, 2000 mBm)
but
# hostapd /etc/hostapd/hostapd.conf
ioctl[SIOCSIFFLAGS]: Unknown error 132
nl80211 driver initialization failed.
rmdir[ctrl_interface]: No such file or directory
Newer hostapd
Stole the binaries from Fedora 20, including a set of necessary libraries, and created a chroot for that as follows:
# chroot . /hostapd -d /hostapd.conf
With the eBay dongle, the AP was visible from my laptop, but I failed to connect. Nothing appeared when sniffing wlan1, and strace showed no activity during these connection attempts, so the conclusion must be that the problem is with the dongle.
So I found the firmware file the driver checks for first, rtl8192cufw_TMSC.bin:
usb 2-2.3: New USB device found, idVendor=0bda, idProduct=8176
usb 2-2.3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
usb 2-2.3: Product: 802.11n WLAN Adapter
usb 2-2.3: Manufacturer: Realtek
usb 2-2.3: SerialNumber: 00e04c000001
rtl8192cu: Chip version 0x10
rtl8192cu: MAC address: 00:13:ef:40:08:98
rtl8192cu: Board Type 0
rtl_usb: rx_max_size 15360, rx_urb_num 8, in_ep 1
rtl8192cu: Loading firmware rtlwifi/rtl8192cufw_TMSC.bin
ieee80211 phy7: Selected rate control algorithm 'rtl_rc'
rtlwifi: wireless switch is on
rtl8192cu: MAC auto ON okay!
rtl8192cu: Tx queue select: 0x05
Didn’t make any difference.
Creating a bridge
This is the really manual route, based upon this page.
Basically,
# brctl addbr br0
# brctl setfd br0 0
# brctl addif br0 eth0
# brctl addif br0 wlan0
# ifconfig br0 10.1.1.123 netmask 255.255.255.0
# ifconfig br0 up
The second command sets the forward delay to zero, to prevent problems on the first connection, as mentioned on this page.
One can take a look at the status with
# brctl show
bridge name     bridge id               STP enabled     interfaces
br0             8000.00241dd37e38       no              eth0
                                                        wlan0
To remove the bridge:
# ifconfig br0 down
# brctl delbr br0
Introduction
I’ve been annoyed for quite a while by Thunderbird’s strong inclination towards HTML mail, to the extent that if I don’t really, really verify that a mail goes out in plain text, it’s probably going to slip out in HTML. This is bad in particular when sending mails to Linux-related mailing lists. They don’t like HTML mail. And the truth is that I’m not very fond of it either, but I usually don’t care.
There’s an add-on for this, Outgoing Message Format, but I run a version of Thunderbird that is too old for that, and trying to fool Thunderbird into installing it by changing the add-on’s version requirement field ended up with an add-on that does nothing.
Upgrading was not an attractive direction: If I’m happy with a tool except for one thing, I’ll fix that thing. Upgrading tends to fix that thing but create a new problem. On a good day.
It turned out to be extremely difficult to convince Thunderbird to stop doing that. My notes from the attempt are below.
Note to self: To find the entire hack history, search your “Sent” box for “Thunderbird plain text hacks” in the subject.
Remove the HTML composition capability completely
This method makes it impossible for a certain mail identity to compose HTML mails. Go to Preferences > General > Config Editor… and agree to be careful.
mail.identity.id1.compose_html: Set from true to false.
In internal JavaScript code, these preferences are fetched with getPref() commands.
Fixing Thunderbird from within
After wasting a lot of time on this, I reached the conclusion that the problem was that quite a few components in Thunderbird’s script environment push the HTML format for various reasons. These are apparently ugly hacks that solved a problem for someone in the distant past and remained there because no one noticed them or understood exactly what they do, possibly including whoever wrote them in the first place.
The solution was a counter-hack. Basically, hide the relevant menu’s IDs from other scripts and set the default to “Plain text”. This requires opening a JAR, making a few fixes in a couple of files, and packing it up again.
So let’s get to it. In a fresh directory,
$ jar xf /usr/lib64/thunderbird-3.0/chrome/messenger.jar
and edit ./content/messenger/messengercompose/messengercompose.xul, in the part saying
<menu id="outputFormatMenu" label="&outputFormatMenu.label;" accesskey="&outputFormatMenu.accesskey;" oncommand="OutputFormatMenuSelect(event.target)">
<menupopup id="outputFormatMenuPopup">
<menuitem type="radio" name="output_format" label="&autoFormatCmd.label;" accesskey="&autoFormatCmd.accesskey;" id="format_auto" checked="true"/>
<menuitem type="radio" name="output_format" label="&plainTextFormatCmd.label;" accesskey="&plainTextFormatCmd.accesskey;" id="format_plain"/>
<menuitem type="radio" name="output_format" label="&htmlFormatCmd.label;" accesskey="&htmlFormatCmd.accesskey;" id="format_html"/>
<menuitem type="radio" name="output_format" label="&bothFormatCmd.label;" accesskey="&bothFormatCmd.accesskey;" id="format_both"/>
</menupopup>
</menu>
The idea is to hide the elements from any script, except the one that responds to changes in this menu. Also, change the default from “Auto detect” to “plain text”. After the change we have
<menu id="my_outputFormatMenu" label="&outputFormatMenu.label;" accesskey="&outputFormatMenu.accesskey;" oncommand="OutputFormatMenuSelect(event.target)">
<menupopup id="outputFormatMenuPopup">
<menuitem type="radio" name="output_format" label="&autoFormatCmd.label;" accesskey="&autoFormatCmd.accesskey;" id="my_format_auto"/>
<menuitem type="radio" name="output_format" label="&plainTextFormatCmd.label;" accesskey="&plainTextFormatCmd.accesskey;" id="my_format_plain" checked="true"/>
<menuitem type="radio" name="output_format" label="&htmlFormatCmd.label;" accesskey="&htmlFormatCmd.accesskey;" id="my_format_html"/>
<menuitem type="radio" name="output_format" label="&bothFormatCmd.label;" accesskey="&bothFormatCmd.accesskey;" id="my_format_both"/>
</menupopup>
</menu>
Note the “my_” prefixes on the IDs, and that the “checked” attribute has moved to the plain text item.
This leaves a few changes in the only script that should deal with this, ./content/messenger/messengercompose/MsgComposeCommands.js.
In ComposeStartup(),
document.getElementById("outputFormatMenu").setAttribute("hidden", true);
is replaced with
document.getElementById("my_outputFormatMenu").setAttribute("hidden", true);
and likewise, in OutputFormatMenuSelect()
if (msgCompFields)
switch (target.getAttribute('id'))
{
case "format_auto": gSendFormat = nsIMsgCompSendFormat.AskUser; break;
case "format_plain": gSendFormat = nsIMsgCompSendFormat.PlainText; break;
case "format_html": gSendFormat = nsIMsgCompSendFormat.HTML; break;
case "format_both": gSendFormat = nsIMsgCompSendFormat.Both; break;
}
is replaced with
if (msgCompFields)
switch (target.getAttribute('id'))
{
case "my_format_auto": gSendFormat = nsIMsgCompSendFormat.AskUser; break;
case "my_format_plain": gSendFormat = nsIMsgCompSendFormat.PlainText; break;
case "my_format_html": gSendFormat = nsIMsgCompSendFormat.HTML; break;
case "my_format_both": gSendFormat = nsIMsgCompSendFormat.Both; break;
}
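Keeping that switch in sync with the renamed IDs in the XUL file is the fiddly part. Purely as an illustration, and not Thunderbird code, the same mapping could live in a single lookup table. In this standalone sketch, SendFormat is a hypothetical stand-in for nsIMsgCompSendFormat (with made-up string values so it runs outside Thunderbird), and sendFormatForMenuId is an invented name. As with the original switch, an ID that isn’t listed leaves the current format untouched.

```javascript
// Hypothetical stand-in for Thunderbird's nsIMsgCompSendFormat constants;
// the string values are placeholders, not the real interface values.
const SendFormat = Object.freeze({
  AskUser: "AskUser",
  PlainText: "PlainText",
  HTML: "HTML",
  Both: "Both",
});

// One table instead of a switch: each renamed menu item ID and the send
// format it selects.
const FORMAT_BY_MENU_ID = Object.freeze({
  my_format_auto: SendFormat.AskUser,
  my_format_plain: SendFormat.PlainText,
  my_format_html: SendFormat.HTML,
  my_format_both: SendFormat.Both,
});

// Returns the send format for a clicked menu item ID. Like the switch in
// OutputFormatMenuSelect(), an unknown ID (e.g. the old "format_html")
// leaves the current format as it was.
function sendFormatForMenuId(id, currentFormat) {
  return Object.prototype.hasOwnProperty.call(FORMAT_BY_MENU_ID, id)
    ? FORMAT_BY_MENU_ID[id]
    : currentFormat;
}

let gSendFormat = SendFormat.PlainText;
gSendFormat = sendFormatForMenuId("format_html", gSendFormat);    // old ID: ignored
gSendFormat = sendFormatForMenuId("my_format_both", gSendFormat); // now Both
```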
Finally remove a single line that fiddles with the default (harmless now, but why leave it there…). In the definition of gComposeRecyclingListener, remove this line
document.getElementById("format_auto").setAttribute("checked", "true");
And that’s it for the edits. Then repackage the JAR archive:
$ jar cf messenger.jar content
Close Thunderbird, overwrite the original Jar file with the amended one (make a backup copy first, of course) and restart Thunderbird.
I should add that there are several reasons to be surprised that this is enough. For example, while working on this, I noticed several direct calls to OutputFormatMenuSelect() that attempt to fake a click on one of the HTML-enabling radio buttons. In practice, plain text messages are generated even though these calls aren’t addressed directly.
Other stuff
While figuring out how to solve this issue, I found a few tricks that may be useful in the future. So here they are:
Open all jars you can find
$ find /usr/lib64/thunderbird-3.0/ -iname '*.jar' | while read i ; do ( mkdir "${i##*/}" && cd "${i##*/}" && jar xf "$i" ; ) done
This opens each jar into a directory named after it (including the .jar suffix).
Set the default HTML format
mail.default_html_action: Set from 3 to 1. Seems not to have a significant effect.
Enabling the dump() command
dump() is used in internal JavaScript code to produce debug messages, which are printed to stdout, so Thunderbird has to be run from the command line to see them.
In the Config Editor mentioned above, add the boolean browser.dom.window.dump.enabled and set it to true. Otherwise nothing is printed.
Creating stack traces
function DumpTrace()
{
var err = new Error();
dump("\nStack trace:\n" + err.stack + "\n\n");
}
The stack trace is pretty ugly, and contains a DumpTrace() too, but it’s good enough to find out why a certain function is called.
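If the extra DumpTrace() frame gets in the way, a variant can filter it out before printing. This is a minimal sketch, assuming only that Error().stack is a newline-separated list of frames (true for both SpiderMonkey and V8, although the frame format differs). It falls back to console.log where dump() doesn’t exist, which also makes it testable outside Thunderbird, and it returns the remaining frames for convenience.

```javascript
// Like DumpTrace() above, but drops the helper's own frame (and V8's leading
// "Error" line), so the first frame printed is the caller's.
function DumpTrace() {
  const frames = (new Error().stack || "")
    .split("\n")
    .filter(line => line.trim() !== "" && line.trim() !== "Error")
    .filter(line => !line.includes("DumpTrace"));
  const text = "\nStack trace:\n" + frames.join("\n") + "\n\n";
  // dump() exists in Thunderbird's internal JavaScript; use console.log elsewhere.
  if (typeof dump === "function") {
    dump(text);
  } else {
    console.log(text);
  }
  return frames;
}
```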