iowrite32(), writel() and memory barriers taken apart
Introduction
Needing to remove superfluous memory barriers from a Linux kernel device driver, I wondered what they actually do. The issue is discussed in painful detail in Documentation/memory-barriers.txt, but somehow it’s quite difficult to figure out whether they’re really needed, and where. Most drivers rely on subsequent iowrite32()’s (or writel()’s) arriving at the hardware in the same order they appear in the code, and this is backed up by the following clause in memory-barriers.txt:
Inside of the Linux kernel, I/O should be done through the appropriate accessor routines – such as inb() or writel() – which know how to make such accesses appropriately sequential. Whilst this, for the most part, renders the explicit use of memory barriers unnecessary, there are a couple of situations where they might be needed:
- On some systems, I/O stores are not strongly ordered across all CPUs, and so for _all_ general drivers locks should be used and mmiowb() must be issued prior to unlocking the critical section.
- If the accessor functions are used to refer to an I/O memory window with relaxed memory access properties, then _mandatory_ memory barriers are required to enforce ordering.
See Documentation/DocBook/deviceiobook.tmpl for more information.
So what they’re saying is that a memory barrier should be used before releasing a lock (spinlock? mutex? both? The examples show only a spinlock) and when prefetching is allowed by hardware.
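The first of the two cases boils down to a pattern like the one below. This is only a sketch: so that it compiles outside the kernel, spin_lock(), spin_unlock(), writel() and mmiowb() are replaced with userspace stand-ins of mine that merely record the order in which they were called. mydev_kick() is a hypothetical driver function, not taken from any real driver.

```c
#include <stddef.h>

/*
 * Userspace stand-ins (NOT the kernel's implementations): each one only
 * records that it was called, so the order of operations can be inspected.
 */
enum event { EV_LOCK, EV_WRITEL, EV_MMIOWB, EV_UNLOCK };

#define TRACE_MAX 16
static enum event trace[TRACE_MAX];
static size_t trace_len;

static void record(enum event e)
{
    if (trace_len < TRACE_MAX)
        trace[trace_len++] = e;
}

typedef struct { int dummy; } spinlock_t;

static void spin_lock(spinlock_t *lock)   { (void)lock; record(EV_LOCK); }
static void spin_unlock(spinlock_t *lock) { (void)lock; record(EV_UNLOCK); }
static void mmiowb(void)                  { record(EV_MMIOWB); }

static void writel(unsigned int val, volatile void *addr)
{
    (void)val;
    (void)addr;
    record(EV_WRITEL);
}

/* Accessors for the recorded trace. */
size_t trace_count(void) { return trace_len; }
enum event trace_at(size_t i) { return trace[i]; }

/* Hypothetical driver function: two register writes under a spinlock. */
void mydev_kick(spinlock_t *lock, volatile void *regs)
{
    spin_lock(lock);
    writel(0xabcd0001, regs);
    writel(0xabcd0002, regs);
    mmiowb();           /* keep these MMIO writes ahead of the unlock */
    spin_unlock(lock);
}
```

The point is just the position of mmiowb(): after the last I/O write and before the lock is released, so that the next CPU taking the lock can’t get its writes to the device first.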
Nice. Are they doing anything?
April 2020 update: I’ve written a new post on a similar topic. Also, on top of memory-barriers.txt mentioned above, there are some excellent explanations in the kernel tree’s tools/memory-model/Documentation/explanation.txt and tools/memory-model/Documentation/recipes.txt. These are relatively new (from v4.17, beginning of 2018).
May 2021 update: I’ve also written the parallel post for Windows device driver coding, which occasionally brings up Linux.
The practical take
Since I care most about x86 and ARM, I decided to figure out what the memory barriers actually do. The driver’s code should be formally correct, but in the end, if I remove a memory barrier and then test the driver — have I really made a difference? Have I really tested anything?
Ah, and in case you wonder why I didn’t check ioread32() and readl(): I don’t use them in my driver. Odd as it may sound.
The kernel sources in this post are ~3.12, but how often does anyone dare to touch those basic functions?
Spoiler
For the lazy ones, here are my conclusions:
- On x86 platforms, iowrite32() and writel() are translated to just a “mov” into memory.
- On ARM, the same functions translate into a full write synchronization barrier (stop execution until all previous writes are done), and then an “str” into memory.
- On x86, the following functions translate into nothing: mmiowb(), smp_wmb() and smp_rmb(). wmb() and rmb() translate into “sfence” and “lfence” respectively.
- On ARM, mmiowb() translates into nothing. The other barriers translate into sensible opcodes.
Trying memory barriers with iowrite32()
I wrote the following kernel module as minimodule.c. Obviously, it won’t do anything good except for being disassembled after compilation.
    #include <linux/module.h>
    #include <linux/slab.h>
    #include <linux/io.h>

    void try_iowrite32(void)
    {
      void __iomem *p = (void *) 0x12345678;

      iowrite32(0xabcd0001, p);
      iowrite32(0xabcd0001, p);
      iowrite32(0xabcd0002, p);
      mmiowb();
      iowrite32(0xabcd0003, p);
      wmb();
      iowrite32(0xabcd0004, p);
      rmb();
      iowrite32(0xabcd0005, p);
      smp_wmb();
      iowrite32(0xabcd0006, p);
      smp_rmb();
    }

    EXPORT_SYMBOL(try_iowrite32);
The idea: First repeat exactly the same write to see how that’s handled, and then add barriers to see what they turn into.
The related sources for iowrite32() on x86
I have to admit that I was surprised to find out that iowrite32() is a function in itself, as is shown later in the disassembly. My best understanding was that it’s just an alias for writel(), by virtue of a define statement. But since CONFIG_GENERIC_IOMAP is defined on my kernel, iowrite32() isn’t defined in include/asm-generic/io.h; there’s only a prototype for it in include/asm-generic/iomap.h. It’s defined as a function in lib/iomap.c as follows:
    void iowrite32(u32 val, void __iomem *addr)
    {
            IO_COND(addr, outl(val,port), writel(val, addr));
    }
where IO_COND is previously defined in the same file as follows (the comment is in the sources):
    /*
     * Ugly macros are a way of life.
     */
    #define IO_COND(addr, is_pio, is_mmio) do {                     \
            unsigned long port = (unsigned long __force)addr;       \
            if (port >= PIO_RESERVED) {                             \
                    is_mmio;                                        \
            } else if (port > PIO_OFFSET) {                         \
                    port &= PIO_MASK;                               \
                    is_pio;                                         \
            } else                                                  \
                    bad_io_access(port, #is_pio );                  \
    } while (0)
So there we have it. iowrite32() isn’t just an alias for writel(), but it checks the address and interprets it as port I/O if that makes sense.
To be sure, iowrite32() was disassembled as follows from the kernel’s object code (32-bit version):
    0020f79f <iowrite32>:
      20f79f:  81 fa ff ff 03 00     cmp    $0x3ffff,%edx
      20f7a5:  89 d1                 mov    %edx,%ecx
      20f7a7:  76 03                 jbe    20f7ac <iowrite32+0xd>
      20f7a9:  89 02                 mov    %eax,(%edx)
      20f7ab:  c3                    ret
      20f7ac:  81 fa 00 00 01 00     cmp    $0x10000,%edx
      20f7b2:  76 08                 jbe    20f7bc <iowrite32+0x1d>
      20f7b4:  81 e2 ff ff 00 00     and    $0xffff,%edx
      20f7ba:  ef                    out    %eax,(%dx)
      20f7bb:  c3                    ret
      20f7bc:  ba f2 56 03 00        mov    $0x356f2,%edx
      20f7c1:  89 c8                 mov    %ecx,%eax
      20f7c3:  e9 41 fe ff ff        jmp    20f609 <bad_io_access>
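The branch structure is easier to see when pulled out of the macro. Below is a minimal userspace sketch of the same decision, assuming the thresholds that appear as immediates in the disassembly above (0x3ffff and 0x10000 in the comparisons, 0xffff as the mask). The enum and the function name are made up for illustration; they aren’t kernel API.

```c
/* Thresholds as read off the iowrite32() disassembly. */
#define PIO_OFFSET   0x10000UL
#define PIO_MASK     0x0ffffUL
#define PIO_RESERVED 0x40000UL

/* Hypothetical names, for illustration only. */
enum io_kind { IO_KIND_MMIO, IO_KIND_PIO, IO_KIND_BAD };

/*
 * Mirrors the if / else if / else of IO_COND: tells which path a given
 * address cookie would take. For port I/O, *port gets the masked port number.
 */
enum io_kind classify_io_cookie(unsigned long cookie, unsigned long *port)
{
    if (cookie >= PIO_RESERVED)
        return IO_KIND_MMIO;            /* writel(val, addr) */

    if (cookie > PIO_OFFSET) {
        *port = cookie & PIO_MASK;      /* outl(val, port) */
        return IO_KIND_PIO;
    }

    return IO_KIND_BAD;                 /* bad_io_access() */
}
```

So the address 0x12345678 used in the test module is way above the port I/O range, and ends up as a plain memory write.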
Results on x86_64
Compiled on Intel x86/64 bit:
    $ objdump -d minimodule.ko

    minimodule.ko:     file format elf64-x86-64

    Disassembly of section .text:

    0000000000000000 <try_iowrite32>:
       0:  55                    push   %rbp
       1:  48 89 e5              mov    %rsp,%rbp
       4:  e8 00 00 00 00        callq  9 <try_iowrite32+0x9>
       9:  be 78 56 34 12        mov    $0x12345678,%esi
       e:  bf 01 00 cd ab        mov    $0xabcd0001,%edi
      13:  e8 00 00 00 00        callq  18 <try_iowrite32+0x18>
      18:  be 78 56 34 12        mov    $0x12345678,%esi
      1d:  bf 01 00 cd ab        mov    $0xabcd0001,%edi
      22:  e8 00 00 00 00        callq  27 <try_iowrite32+0x27>
      27:  be 78 56 34 12        mov    $0x12345678,%esi
      2c:  bf 02 00 cd ab        mov    $0xabcd0002,%edi
      31:  e8 00 00 00 00        callq  36 <try_iowrite32+0x36>
      36:  be 78 56 34 12        mov    $0x12345678,%esi
      3b:  bf 03 00 cd ab        mov    $0xabcd0003,%edi
      40:  e8 00 00 00 00        callq  45 <try_iowrite32+0x45>
      45:  0f ae f8              sfence
      48:  be 78 56 34 12        mov    $0x12345678,%esi
      4d:  bf 04 00 cd ab        mov    $0xabcd0004,%edi
      52:  e8 00 00 00 00        callq  57 <try_iowrite32+0x57>
      57:  0f ae e8              lfence
      5a:  be 78 56 34 12        mov    $0x12345678,%esi
      5f:  bf 05 00 cd ab        mov    $0xabcd0005,%edi
      64:  e8 00 00 00 00        callq  69 <try_iowrite32+0x69>
      69:  be 78 56 34 12        mov    $0x12345678,%esi
      6e:  bf 06 00 cd ab        mov    $0xabcd0006,%edi
      73:  e8 00 00 00 00        callq  78 <try_iowrite32+0x78>
      78:  c9                    leaveq
      79:  c3                    retq
            ...
Those “callq” statements are modified upon linking. To find out what they call, run
    $ readelf -r minimodule.ko

    Relocation section '.rela.text' at offset 0xa9b0 contains 8 entries:
      Offset          Info           Type           Sym. Value    Sym. Name + Addend
    000000000005  002300000002 R_X86_64_PC32     0000000000000000 mcount - 4
    000000000014  002000000002 R_X86_64_PC32     0000000000000000 iowrite32 - 4
    000000000023  002000000002 R_X86_64_PC32     0000000000000000 iowrite32 - 4
    000000000032  002000000002 R_X86_64_PC32     0000000000000000 iowrite32 - 4
    000000000041  002000000002 R_X86_64_PC32     0000000000000000 iowrite32 - 4
    000000000053  002000000002 R_X86_64_PC32     0000000000000000 iowrite32 - 4
    000000000065  002000000002 R_X86_64_PC32     0000000000000000 iowrite32 - 4
    000000000074  002000000002 R_X86_64_PC32     0000000000000000 iowrite32 - 4
(the output continues with relocation information for debug variables).
It’s quite easy to work this out: The “Offset” column tells us the offset in the object code. For example, a callq statement begins at 0x13, but the address to call starts at 0x14. The second entry in the relocation section points at offset 0x14, and says that the target is iowrite32().
So from this output we learn that all callq’s are to iowrite32(), except the first one, which goes to mcount() (which is intended for kernel call tracing).
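The arithmetic behind the “Offset” column and the “- 4” addend can be written down in a few lines. This is a sketch with function names of my own, based on the standard meaning of R_X86_64_PC32 (the linker stores S + A - P in the 32-bit field, where S is the symbol’s address, A the addend and P the address of the field itself). The symbol addresses used with it are arbitrary examples, not taken from a real kernel.

```c
#include <stdint.h>

/*
 * A "call rel32" is one opcode byte (0xe8) followed by a 32-bit
 * displacement, so the field to patch starts one byte into the instruction.
 * This is the number readelf shows under "Offset".
 */
uint64_t call_operand_offset(uint64_t insn_off)
{
    return insn_off + 1;
}

/* What the linker writes into the field: S + A - P, truncated to 32 bits. */
int32_t pc32_value(uint64_t sym, int64_t addend, uint64_t place)
{
    return (int32_t)(sym + (uint64_t)addend - place);
}

/*
 * What the CPU does with it: the displacement is relative to the address
 * of the NEXT instruction, which is 4 bytes past the start of the field.
 */
uint64_t call_target(uint64_t place, int32_t disp)
{
    return place + 4 + (uint64_t)(int64_t)disp;
}
```

For example, if iowrite32() ended up at address 0x2000, the field at 0x14 would get 0x2000 - 4 - 0x14 = 0x1fe8, and the CPU would land at 0x14 + 4 + 0x1fe8 = 0x2000. The addend of -4 is there exactly to cancel out those four bytes.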
Now to conclusions: There are no memory barriers in the code, except those generated by wmb() and rmb(), which added sfence and lfence respectively. sfence is defined as
Performs a serializing operation on all store instructions that were issued prior the SFENCE instruction. This serializing operation guarantees that every store instruction that precedes in program order the SFENCE instruction is globally visible before any store instruction that follows the SFENCE instruction is globally visible. The SFENCE instruction is ordered with respect store instructions, other SFENCE instructions, any MFENCE instructions, and any serializing instructions (such as the CPUID instruction). It is not ordered with respect to load instructions or the LFENCE instruction.
and lfence as
Performs a serializing operation on all load-from-memory instructions that were issued prior the LFENCE instruction. This serializing operation guarantees that every load instruction that precedes in program order the LFENCE instruction is globally visible before any load instruction that follows the LFENCE instruction is globally visible. The LFENCE instruction is ordered with respect to load instructions, other LFENCE instructions, any MFENCE instructions, and any serializing instructions (such as the CPUID instruction). It is not ordered with respect to store instructions or the SFENCE instruction.
One can feel the Intel-headache just reading this.
Results on x86 (32 bit)
Compiling this against a 32-bit kernel, with a slightly different configuration:
    $ objdump -d minimodule.ko

    minimodule.ko:     file format elf32-i386

    Disassembly of section .text:

    00000000 <try_iowrite32>:
       0:  ba 78 56 34 12        mov    $0x12345678,%edx
       5:  b8 01 00 cd ab        mov    $0xabcd0001,%eax
       a:  e8 fc ff ff ff        call   b <try_iowrite32+0xb>
       f:  ba 78 56 34 12        mov    $0x12345678,%edx
      14:  b8 01 00 cd ab        mov    $0xabcd0001,%eax
      19:  e8 fc ff ff ff        call   1a <try_iowrite32+0x1a>
      1e:  ba 78 56 34 12        mov    $0x12345678,%edx
      23:  b8 02 00 cd ab        mov    $0xabcd0002,%eax
      28:  e8 fc ff ff ff        call   29 <try_iowrite32+0x29>
      2d:  ba 78 56 34 12        mov    $0x12345678,%edx
      32:  b8 03 00 cd ab        mov    $0xabcd0003,%eax
      37:  e8 fc ff ff ff        call   38 <try_iowrite32+0x38>
      3c:  f0 83 04 24 00        lock addl $0x0,(%esp)
      41:  ba 78 56 34 12        mov    $0x12345678,%edx
      46:  b8 04 00 cd ab        mov    $0xabcd0004,%eax
      4b:  e8 fc ff ff ff        call   4c <try_iowrite32+0x4c>
      50:  f0 83 04 24 00        lock addl $0x0,(%esp)
      55:  ba 78 56 34 12        mov    $0x12345678,%edx
      5a:  b8 05 00 cd ab        mov    $0xabcd0005,%eax
      5f:  e8 fc ff ff ff        call   60 <try_iowrite32+0x60>
      64:  ba 78 56 34 12        mov    $0x12345678,%edx
      69:  b8 06 00 cd ab        mov    $0xabcd0006,%eax
      6e:  e8 fc ff ff ff        call   6f <try_iowrite32+0x6f>
      73:  f0 83 04 24 00        lock addl $0x0,(%esp)
      78:  c3                    ret
      79:  00 00                 add    %al,(%eax)
            ...

    Disassembly of section .altinstr_replacement:

    00000000 <.altinstr_replacement>:
       0:  0f ae f8              sfence
       3:  0f ae e8              lfence
       6:  0f ae e8              lfence

    $ readelf -r minimodule.ko

    Relocation section '.rel.text' at offset 0xc3e0 contains 7 entries:
     Offset     Info    Type            Sym.Value  Sym. Name
    0000000b  00002402 R_386_PC32        00000000   iowrite32
    0000001a  00002402 R_386_PC32        00000000   iowrite32
    00000029  00002402 R_386_PC32        00000000   iowrite32
    00000038  00002402 R_386_PC32        00000000   iowrite32
    0000004c  00002402 R_386_PC32        00000000   iowrite32
    00000060  00002402 R_386_PC32        00000000   iowrite32
    0000006f  00002402 R_386_PC32        00000000   iowrite32
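So it’s in essence the same, only the mcount() call in the beginning was skipped. The barriers show up as “lock addl $0x0,(%esp)” in the .text section; the sfence and lfence opcodes listed under .altinstr_replacement are what the kernel patches in instead when the code is loaded on a CPU that supports them. Note that there are three barriers in this listing, not two: with this kernel’s configuration, smp_rmb() (after the last write) isn’t empty.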
The related sources for iowrite32() on ARM
These are the key excerpts from arch/arm/include/asm/io.h:
    static inline void __raw_writel(u32 val, volatile void __iomem *addr)
    {
            asm volatile("str %1, %0"
                         : "+Qo" (*(volatile u32 __force *)addr)
                         : "r" (val));
    }

    ...

    #define writel_relaxed(v,c)     __raw_writel((__force u32) cpu_to_le32(v),c)

    ...

    #define writel(v,c)             ({ __iowmb(); writel_relaxed(v,c); })

    ...

    #define iowrite32(v,p)  ({ __iowmb(); __raw_writel((__force __u32)cpu_to_le32(v), p); })
As for __iowmb(), it goes
    /* IO barriers */
    #ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE
    #include <asm/barrier.h>
    #define __iormb()               rmb()
    #define __iowmb()               wmb()
    #else
    #define __iormb()               do { } while (0)
    #define __iowmb()               do { } while (0)
    #endif
so it’s down to the configuration whether __iowmb() does anything. And to get the full picture, these are snips from arch/arm/include/asm/barrier.h:
    #if __LINUX_ARM_ARCH__ >= 7
    #define isb(option) __asm__ __volatile__ ("isb " #option : : : "memory")
    #define dsb(option) __asm__ __volatile__ ("dsb " #option : : : "memory")
    #define dmb(option) __asm__ __volatile__ ("dmb " #option : : : "memory")

    ...

    #ifdef CONFIG_ARCH_HAS_BARRIERS
    #include <mach/barriers.h>
    #elif defined(CONFIG_ARM_DMA_MEM_BUFFERABLE) || defined(CONFIG_SMP)
    #define mb()            do { dsb(); outer_sync(); } while (0)
    #define rmb()           dsb()
    #define wmb()           do { dsb(st); outer_sync(); } while (0)
    #else
    #define mb()            barrier()
    #define rmb()           barrier()
    #define wmb()           barrier()
    #endif
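To see what this chain of macros boils down to for a given configuration, one trick is to let the preprocessor expand writel() and turn the result into a string. The sketch below does that in userspace with trimmed copies of the definitions above. It assumes that CONFIG_ARM_DMA_MEM_BUFFERABLE is set, drops the __force annotations, and leaves dsb(), outer_sync(), __raw_writel() and cpu_to_le32() undefined on purpose, so they survive as plain text. STR, XSTR and the two functions are helper names of mine.

```c
#include <string.h>

/* Assumed configuration for this sketch. */
#define CONFIG_ARM_DMA_MEM_BUFFERABLE 1

/* Trimmed copy of the wmb() selection in asm/barrier.h. */
#if defined(CONFIG_ARM_DMA_MEM_BUFFERABLE) || defined(CONFIG_SMP)
#define wmb()           do { dsb(st); outer_sync(); } while (0)
#else
#define wmb()           barrier()
#endif

/* Trimmed copy of the I/O barrier selection in asm/io.h. */
#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE
#define __iowmb()       wmb()
#else
#define __iowmb()       do { } while (0)
#endif

#define writel_relaxed(v,c)     __raw_writel(cpu_to_le32(v),c)
#define writel(v,c)             ({ __iowmb(); writel_relaxed(v,c); })

/* Two-level stringification: XSTR() expands its argument first. */
#define STR(x)  #x
#define XSTR(x) STR(x)

/* The text that writel(val, addr) expands to under this configuration. */
const char *writel_expansion(void)
{
    return XSTR(writel(val, addr));
}

/* Nonzero if the expansion contains the "dsb st" write barrier. */
int writel_has_write_barrier(void)
{
    return strstr(writel_expansion(), "dsb(st)") != NULL;
}
```

Printing writel_expansion() shows the barrier followed by the raw write. Removing the CONFIG_ARM_DMA_MEM_BUFFERABLE define turns the barrier part into an empty do { } while (0), which is the configuration dependence mentioned above.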
Results on ARM
This is what the same module compiled for ARM Cortex A9, Little Endian gives (I’ve added extra newlines in the middle for clarity):
    minimodule.o:     file format elf32-littlearm

    Disassembly of section .text:

    00000000 <try_iowrite32>:
       0:  e92d4038   push    {r3, r4, r5, lr}

       4:  f57ff04e   dsb     st
       8:  e59f2118   ldr     r2, [pc, #280]  ; 128 <try_iowrite32+0x128>
       c:  e1a04002   mov     r4, r2
      10:  e5923018   ldr     r3, [r2, #24]
      14:  e3530000   cmp     r3, #0
      18:  0a000000   beq     20 <try_iowrite32+0x20>
      1c:  e12fff33   blx     r3
      20:  e59f3104   ldr     r3, [pc, #260]  ; 12c <try_iowrite32+0x12c>
      24:  e59f1104   ldr     r1, [pc, #260]  ; 130 <try_iowrite32+0x130>
      28:  e5831678   str     r1, [r3, #1656] ; 0x678

      2c:  f57ff04e   dsb     st
      30:  e5942018   ldr     r2, [r4, #24]
      34:  e1a05001   mov     r5, r1
      38:  e1a04003   mov     r4, r3
      3c:  e3520000   cmp     r2, #0
      40:  0a000000   beq     48 <try_iowrite32+0x48>
      44:  e12fff32   blx     r2
      48:  e5845678   str     r5, [r4, #1656] ; 0x678

      4c:  f57ff04e   dsb     st
      50:  e59f20d0   ldr     r2, [pc, #208]  ; 128 <try_iowrite32+0x128>
      54:  e1a04002   mov     r4, r2
      58:  e5923018   ldr     r3, [r2, #24]
      5c:  e3530000   cmp     r3, #0
      60:  0a000000   beq     68 <try_iowrite32+0x68>
      64:  e12fff33   blx     r3
      68:  e59f30bc   ldr     r3, [pc, #188]  ; 12c <try_iowrite32+0x12c>
      6c:  e59f20c0   ldr     r2, [pc, #192]  ; 134 <try_iowrite32+0x134>
      70:  e5832678   str     r2, [r3, #1656] ; 0x678

      74:  f57ff04e   dsb     st
      78:  e5942018   ldr     r2, [r4, #24]
      7c:  e1a04003   mov     r4, r3
      80:  e3520000   cmp     r2, #0
      84:  0a000000   beq     8c <try_iowrite32+0x8c>
      88:  e12fff32   blx     r2
      8c:  e59f30a4   ldr     r3, [pc, #164]  ; 138 <try_iowrite32+0x138>
      90:  e5843678   str     r3, [r4, #1656] ; 0x678

      94:  f57ff04e   dsb     st
      98:  e59f2088   ldr     r2, [pc, #136]  ; 128 <try_iowrite32+0x128>
      9c:  e1a04002   mov     r4, r2
      a0:  e5923018   ldr     r3, [r2, #24]
      a4:  e3530000   cmp     r3, #0
      a8:  0a000000   beq     b0 <try_iowrite32+0xb0>
      ac:  e12fff33   blx     r3

      b0:  f57ff04e   dsb     st
      b4:  e5943018   ldr     r3, [r4, #24]
      b8:  e3530000   cmp     r3, #0
      bc:  0a000000   beq     c4 <try_iowrite32+0xc4>
      c0:  e12fff33   blx     r3
      c4:  e59f3060   ldr     r3, [pc, #96]   ; 12c <try_iowrite32+0x12c>
      c8:  e59f206c   ldr     r2, [pc, #108]  ; 13c <try_iowrite32+0x13c>
      cc:  e5832678   str     r2, [r3, #1656] ; 0x678

      d0:  f57ff04f   dsb     sy

      d4:  f57ff04e   dsb     st
      d8:  e59f1048   ldr     r1, [pc, #72]   ; 128 <try_iowrite32+0x128>
      dc:  e1a04003   mov     r4, r3
      e0:  e1a05001   mov     r5, r1
      e4:  e5912018   ldr     r2, [r1, #24]
      e8:  e3520000   cmp     r2, #0
      ec:  0a000000   beq     f4 <try_iowrite32+0xf4>
      f0:  e12fff32   blx     r2
      f4:  e59f3044   ldr     r3, [pc, #68]   ; 140 <try_iowrite32+0x140>
      f8:  e5843678   str     r3, [r4, #1656] ; 0x678

      fc:  f57ff05a   dmb     ishst

     100:  f57ff04e   dsb     st
     104:  e5953018   ldr     r3, [r5, #24]
     108:  e3530000   cmp     r3, #0
     10c:  0a000000   beq     114 <try_iowrite32+0x114>
     110:  e12fff33   blx     r3
     114:  e59f3010   ldr     r3, [pc, #16]   ; 12c <try_iowrite32+0x12c>
     118:  e59f2024   ldr     r2, [pc, #36]   ; 144 <try_iowrite32+0x144>
     11c:  e5832678   str     r2, [r3, #1656] ; 0x678

     120:  f57ff05b   dmb     ish

     124:  e8bd8038   pop     {r3, r4, r5, pc}
     128:  00000000   .word   0x00000000
     12c:  12345000   .word   0x12345000
     130:  abcd0001   .word   0xabcd0001
     134:  abcd0002   .word   0xabcd0002
     138:  abcd0003   .word   0xabcd0003
     13c:  abcd0004   .word   0xabcd0004
     140:  abcd0005   .word   0xabcd0005
     144:  abcd0006   .word   0xabcd0006
This was a lot of code (somehow that’s what you get with ARM). There are no calls to iowrite32(), so this is done inline for ARM (consistent with the sources).
This requires some translation from ARM opcodes to human language (taken from this page):
- DSB SY — Data Synchronization Barrier: No instruction in program order after this instruction executes until all explicit memory accesses before this instruction complete, as well as all cache, branch predictor and TLB maintenance operations before this instruction complete.
- DSB ST — Like DSB SY, but waits only for data writes to complete.
- DMB ISHST — Data Memory Barrier, operation that waits only for stores to complete, and only to the inner shareable domain (whatever that “inner shareable domain” is).
- DMB ISH — Data Memory Barrier, operation that waits only to the inner shareable domain.
Now let’s decipher the assembly code, which is quite tangled. Luckily, it’s easy to spot the seven write operations as the seven “str” instructions in the assembly code. It’s also easy to see that each iowrite32() starts with a “dsb st”, which forces waiting until previous writes have completed. So each iowrite32() spans from a “dsb st” to a “str”. This matches the definition of iowrite32() as __iowmb() and then __raw_writel(…). The ldr / cmp / beq / blx sequence that follows each “dsb st” is presumably the outer_sync() part of wmb(): a call through a function pointer, which is skipped if the pointer is NULL.
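A side note for those looking for 0x12345678 in the listing: it isn’t there. The literal pool at the end holds 0x12345000, and each “str” adds an immediate offset of 1656, which is 0x678. My guess is that the compiler split the address this way because the immediate offset of a word store in ARM mode is limited to 12 bits. A small sketch of that split, with function names of my own:

```c
#include <stdint.h>

/* ARM-mode word load/store immediate offsets cover 0..4095 (12 bits). */
#define ARM_IMM_OFFSET_BITS 12u
#define ARM_IMM_OFFSET_MASK ((1u << ARM_IMM_OFFSET_BITS) - 1u)

/* The part of the address that goes into the literal pool. */
uint32_t arm_split_base(uint32_t addr)
{
    return addr & ~ARM_IMM_OFFSET_MASK;
}

/* The part of the address that fits into the str instruction itself. */
uint32_t arm_split_offset(uint32_t addr)
{
    return addr & ARM_IMM_OFFSET_MASK;
}
```

With addr = 0x12345678, this gives the base 0x12345000 and the offset 0x678, matching the “.word 0x12345000” entry and the “#1656” in the listing.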
The memory barriers are quite clear too:
- wmb() becomes “dsb st”, the full synchronization barrier for writes (which is also issued automatically before each iowrite32).
- rmb() becomes “dsb sy”, the full synchronization barrier for reads and writes
- smp_wmb() becomes “dmb ishst”, the “inner shareable domain” memory barrier for writes
- smp_rmb() becomes “dmb ish”, the “inner shareable domain” memory barrier for reads and writes
Now with writel()
So I thought it would be nice to repeat all this with writel(). Spoiler: Nothing thrilling happens here.
Module code (includes omitted):
    void try_writel(void)
    {
      void __iomem *p = (void *) 0x12345678;

      writel(0xabcd0001, p);
      writel(0xabcd0001, p);
      writel(0xabcd0002, p);
      mmiowb();
      writel(0xabcd0003, p);
      wmb();
      writel(0xabcd0004, p);
      rmb();
      writel(0xabcd0005, p);
      smp_wmb();
      writel(0xabcd0006, p);
      smp_rmb();
    }

    EXPORT_SYMBOL(try_writel);
Assembly on 64-bit Intel:
    minimodule.ko:     file format elf64-x86-64

    Disassembly of section .text:

    0000000000000000 <try_writel>:
       0:  55                    push   %rbp
       1:  48 89 e5              mov    %rsp,%rbp
       4:  e8 00 00 00 00        callq  9 <try_writel+0x9>
       9:  b8 01 00 cd ab        mov    $0xabcd0001,%eax
       e:  89 04 25 78 56 34 12  mov    %eax,0x12345678
      15:  89 04 25 78 56 34 12  mov    %eax,0x12345678
      1c:  b8 02 00 cd ab        mov    $0xabcd0002,%eax
      21:  89 04 25 78 56 34 12  mov    %eax,0x12345678
      28:  b8 03 00 cd ab        mov    $0xabcd0003,%eax
      2d:  89 04 25 78 56 34 12  mov    %eax,0x12345678
      34:  0f ae f8              sfence
      37:  b8 04 00 cd ab        mov    $0xabcd0004,%eax
      3c:  89 04 25 78 56 34 12  mov    %eax,0x12345678
      43:  0f ae e8              lfence
      46:  b8 05 00 cd ab        mov    $0xabcd0005,%eax
      4b:  89 04 25 78 56 34 12  mov    %eax,0x12345678
      52:  b8 06 00 cd ab        mov    $0xabcd0006,%eax
      57:  89 04 25 78 56 34 12  mov    %eax,0x12345678
      5e:  c9                    leaveq
      5f:  c3                    retq
OK, so writel() just translated into a couple of inline “mov” opcodes. There’s even an optimization between the first and second mov, so %eax isn’t set twice. Hi-tech, I’m telling you.
And on 32-bit Intel:
    minimodule.ko:     file format elf32-i386

    Disassembly of section .text:

    00000000 <try_writel>:
       0:  b8 01 00 cd ab        mov    $0xabcd0001,%eax
       5:  a3 78 56 34 12        mov    %eax,0x12345678
       a:  a3 78 56 34 12        mov    %eax,0x12345678
       f:  b0 02                 mov    $0x2,%al
      11:  a3 78 56 34 12        mov    %eax,0x12345678
      16:  b0 03                 mov    $0x3,%al
      18:  a3 78 56 34 12        mov    %eax,0x12345678
      1d:  f0 83 04 24 00        lock addl $0x0,(%esp)
      22:  b0 04                 mov    $0x4,%al
      24:  a3 78 56 34 12        mov    %eax,0x12345678
      29:  f0 83 04 24 00        lock addl $0x0,(%esp)
      2e:  b0 05                 mov    $0x5,%al
      30:  a3 78 56 34 12        mov    %eax,0x12345678
      35:  b0 06                 mov    $0x6,%al
      37:  a3 78 56 34 12        mov    %eax,0x12345678
      3c:  f0 83 04 24 00        lock addl $0x0,(%esp)
      41:  c3                    ret
            ...

    Disassembly of section .altinstr_replacement:

    00000000 <.altinstr_replacement>:
       0:  0f ae f8              sfence
       3:  0f ae e8              lfence
       6:  0f ae e8              lfence
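The 32-bit compiler goes one step further than the 64-bit one: since consecutive values differ only in their lowest byte, it just rewrites %al (the low 8 bits of %eax) between the writes. A tiny sketch of what “mov $0x2,%al” does to the register, with a made-up function name:

```c
#include <stdint.h>

/* Replace the low 8 bits of a 32-bit register value, keep the upper 24. */
uint32_t set_low_byte(uint32_t eax, uint8_t al)
{
    return (eax & 0xffffff00u) | al;
}
```

Starting from 0xabcd0001, setting the low byte to 0x02 gives 0xabcd0002, which is the value of the third write.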
And for ARM, it’s exactly the same code (to the byte), since iowrite32() expands to the same thing as writel(). But I list it here anyhow for those who don’t take my word for it:
    minimodule.o:     file format elf32-littlearm

    Disassembly of section .text:

    00000000 <try_writel>:
       0:  e92d4038   push    {r3, r4, r5, lr}
       4:  f57ff04e   dsb     st
       8:  e59f2118   ldr     r2, [pc, #280]  ; 128 <try_writel+0x128>
       c:  e1a04002   mov     r4, r2
      10:  e5923018   ldr     r3, [r2, #24]
      14:  e3530000   cmp     r3, #0
      18:  0a000000   beq     20 <try_writel+0x20>
      1c:  e12fff33   blx     r3
      20:  e59f3104   ldr     r3, [pc, #260]  ; 12c <try_writel+0x12c>
      24:  e59f1104   ldr     r1, [pc, #260]  ; 130 <try_writel+0x130>
      28:  e5831678   str     r1, [r3, #1656] ; 0x678
      2c:  f57ff04e   dsb     st
      30:  e5942018   ldr     r2, [r4, #24]
      34:  e1a05001   mov     r5, r1
      38:  e1a04003   mov     r4, r3
      3c:  e3520000   cmp     r2, #0
      40:  0a000000   beq     48 <try_writel+0x48>
      44:  e12fff32   blx     r2
      48:  e5845678   str     r5, [r4, #1656] ; 0x678
      4c:  f57ff04e   dsb     st
      50:  e59f20d0   ldr     r2, [pc, #208]  ; 128 <try_writel+0x128>
      54:  e1a04002   mov     r4, r2
      58:  e5923018   ldr     r3, [r2, #24]
      5c:  e3530000   cmp     r3, #0
      60:  0a000000   beq     68 <try_writel+0x68>
      64:  e12fff33   blx     r3
      68:  e59f30bc   ldr     r3, [pc, #188]  ; 12c <try_writel+0x12c>
      6c:  e59f20c0   ldr     r2, [pc, #192]  ; 134 <try_writel+0x134>
      70:  e5832678   str     r2, [r3, #1656] ; 0x678
      74:  f57ff04e   dsb     st
      78:  e5942018   ldr     r2, [r4, #24]
      7c:  e1a04003   mov     r4, r3
      80:  e3520000   cmp     r2, #0
      84:  0a000000   beq     8c <try_writel+0x8c>
      88:  e12fff32   blx     r2
      8c:  e59f30a4   ldr     r3, [pc, #164]  ; 138 <try_writel+0x138>
      90:  e5843678   str     r3, [r4, #1656] ; 0x678
      94:  f57ff04e   dsb     st
      98:  e59f2088   ldr     r2, [pc, #136]  ; 128 <try_writel+0x128>
      9c:  e1a04002   mov     r4, r2
      a0:  e5923018   ldr     r3, [r2, #24]
      a4:  e3530000   cmp     r3, #0
      a8:  0a000000   beq     b0 <try_writel+0xb0>
      ac:  e12fff33   blx     r3
      b0:  f57ff04e   dsb     st
      b4:  e5943018   ldr     r3, [r4, #24]
      b8:  e3530000   cmp     r3, #0
      bc:  0a000000   beq     c4 <try_writel+0xc4>
      c0:  e12fff33   blx     r3
      c4:  e59f3060   ldr     r3, [pc, #96]   ; 12c <try_writel+0x12c>
      c8:  e59f206c   ldr     r2, [pc, #108]  ; 13c <try_writel+0x13c>
      cc:  e5832678   str     r2, [r3, #1656] ; 0x678
      d0:  f57ff04f   dsb     sy
      d4:  f57ff04e   dsb     st
      d8:  e59f1048   ldr     r1, [pc, #72]   ; 128 <try_writel+0x128>
      dc:  e1a04003   mov     r4, r3
      e0:  e1a05001   mov     r5, r1
      e4:  e5912018   ldr     r2, [r1, #24]
      e8:  e3520000   cmp     r2, #0
      ec:  0a000000   beq     f4 <try_writel+0xf4>
      f0:  e12fff32   blx     r2
      f4:  e59f3044   ldr     r3, [pc, #68]   ; 140 <try_writel+0x140>
      f8:  e5843678   str     r3, [r4, #1656] ; 0x678
      fc:  f57ff05a   dmb     ishst
     100:  f57ff04e   dsb     st
     104:  e5953018   ldr     r3, [r5, #24]
     108:  e3530000   cmp     r3, #0
     10c:  0a000000   beq     114 <try_writel+0x114>
     110:  e12fff33   blx     r3
     114:  e59f3010   ldr     r3, [pc, #16]   ; 12c <try_writel+0x12c>
     118:  e59f2024   ldr     r2, [pc, #36]   ; 144 <try_writel+0x144>
     11c:  e5832678   str     r2, [r3, #1656] ; 0x678
     120:  f57ff05b   dmb     ish
     124:  e8bd8038   pop     {r3, r4, r5, pc}
     128:  00000000   .word   0x00000000
     12c:  12345000   .word   0x12345000
     130:  abcd0001   .word   0xabcd0001
     134:  abcd0002   .word   0xabcd0002
     138:  abcd0003   .word   0xabcd0003
     13c:  abcd0004   .word   0xabcd0004
     140:  abcd0005   .word   0xabcd0005
     144:  abcd0006   .word   0xabcd0006