iowrite32(), writel() and memory barriers taken apart

This post was written by eli on August 14, 2014
Posted Under: ARM,Linux kernel

Introduction

Needing to remove superfluous memory barriers from a Linux kernel device driver, I wondered what they actually do. The issue is discussed in painful detail in Documentation/memory-barriers.txt, but somehow it’s quite difficult to figure out whether they’re really needed, and where. Most drivers rely on subsequent iowrite32()’s (or writel()’s) arriving at the hardware in the same order they appear in the code, and this is backed up by the following clause in memory-barriers.txt:

Inside of the Linux kernel, I/O should be done through the appropriate accessor routines – such as inb() or writel() – which know how to make such accesses appropriately sequential. Whilst this, for the most part, renders the explicit use of memory barriers unnecessary, there are a couple of situations where they might be needed:

  1. On some systems, I/O stores are not strongly ordered across all CPUs, and so for _all_ general drivers locks should be used and mmiowb() must be issued prior to unlocking the critical section.
  2. If the accessor functions are used to refer to an I/O memory window with relaxed memory access properties, then _mandatory_ memory barriers are required to enforce ordering.

See Documentation/DocBook/deviceiobook.tmpl for more information.

So what they’re saying is that a memory barrier should be used before releasing a lock (spinlock? mutex? both? The examples show only a spinlock) and when prefetching is allowed by hardware.

Nice. Are they doing anything?

April 2020 update: I’ve written a new post on a similar topic. Also, on top of memory-barriers.txt mentioned above, there are some excellent explanations in the kernel tree’s tools/memory-model/Documentation/explanation.txt and tools/memory-model/Documentation/recipes.txt. These are relatively new (from v4.17, beginning of 2018).

May 2021 update: I’ve also written the parallel post for Windows device driver coding, which occasionally brings up Linux.

The practical take

Since I care most about x86 and ARM, I decided to figure out what the memory barriers actually do. The driver’s code should be formally correct, but in the end, if I remove a memory barrier and then test the driver — have I really made a difference? Have I really tested anything?

Ah, and in case you wonder why I didn’t check ioread32() and readl(): I don’t use them in my driver. Odd as it may sound.

The kernel sources in this post are ~3.12, but how often does anyone dare to touch those basic functions?

Spoiler

For the lazy ones, here are my conclusions:

  • On x86 platforms, iowrite32() and writel() are translated to just a “mov” into memory.
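  • On ARM, the same functions translate into a full write synchronization barrier (stop execution until all previous writes are done), and then an “str” into memory. This holds when CONFIG_ARM_DMA_MEM_BUFFERABLE is set, as in my build; without it, the barrier part is empty.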
  • On x86, the following functions translate into nothing: mmiowb(), smp_wmb() and smp_rmb(). wmb() and rmb() translate into “sfence” and “lfence” respectively.
  • On ARM, mmiowb() translates into nothing. The other barriers translate into sensible opcodes.

Trying memory barriers with iowrite32()

I wrote the following kernel module as minimodule.c. Obviously, it won’t do anything good except for being disassembled after compilation.

#include <linux/module.h>
#include <linux/slab.h>
#include <linux/io.h>

void try_iowrite32(void) {
  void __iomem *p = (void *) 0x12345678;

  iowrite32(0xabcd0001, p);
  iowrite32(0xabcd0001, p);
  iowrite32(0xabcd0002, p);
  mmiowb();
  iowrite32(0xabcd0003, p);
  wmb();
  iowrite32(0xabcd0004, p);
  rmb();
  iowrite32(0xabcd0005, p);
  smp_wmb();
  iowrite32(0xabcd0006, p);
  smp_rmb();
}

EXPORT_SYMBOL(try_iowrite32);

The idea: First repeat exactly the same write to see how that’s handled, and then add barriers to see what they turn into.

The related sources for iowrite32() on x86

I have to admit that I was surprised to find out that iowrite32() is a function in itself, as is shown later in the disassembly. My best understanding was that it’s just an alias for writel(), by virtue of a define statement. But since CONFIG_GENERIC_IOMAP is defined on my kernel, the define in include/asm-generic/io.h isn’t used, and there’s only a prototype for it in include/asm-generic/iomap.h. The function itself is defined in lib/iomap.c as follows:

void iowrite32(u32 val, void __iomem *addr)
{
	IO_COND(addr, outl(val,port), writel(val, addr));
}

where IO_COND is previously defined in the same file as follows (the comment is in the sources):

/*
 * Ugly macros are a way of life.
 */
#define IO_COND(addr, is_pio, is_mmio) do {			\
	unsigned long port = (unsigned long __force)addr;	\
	if (port >= PIO_RESERVED) {				\
		is_mmio;					\
	} else if (port > PIO_OFFSET) {				\
		port &= PIO_MASK;				\
		is_pio;						\
	} else							\
		bad_io_access(port, #is_pio );			\
} while (0)

So there we have it. iowrite32() isn’t just an alias for writel(), but it checks the address and interprets it as port I/O if that makes sense.
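To see the three branches of IO_COND in isolation, here is a small user-space sketch that only classifies the address token and never touches hardware. The helper classify_io_token() and the enum are made up for illustration. The three constants are the values I believe lib/iomap.c uses; they do agree with the 0x3ffff, 0x10000 and 0xffff immediates in the disassembly that follows, but treat them as assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values of the lib/iomap.c constants (see lead-in). */
#define PIO_OFFSET   0x10000UL
#define PIO_MASK     0x0ffffUL
#define PIO_RESERVED 0x40000UL

enum io_kind { IO_MMIO, IO_PIO, IO_BAD };

/*
 * Hypothetical helper: follows the branch order of IO_COND.
 * For IO_PIO, *port receives the port number that outl() would get.
 */
static enum io_kind classify_io_token(unsigned long token, unsigned long *port)
{
	assert(port != NULL);

	if (token >= PIO_RESERVED)
		return IO_MMIO;            /* the writel() branch */
	if (token > PIO_OFFSET) {
		*port = token & PIO_MASK;  /* the outl() branch */
		return IO_PIO;
	}
	return IO_BAD;                     /* the bad_io_access() branch */
}
```

With these values, the test address 0x12345678 used in the module lands in the MMIO branch, a token like 0x10080 would be treated as port 0x80, and anything at or below 0x10000 is rejected.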

To be sure, iowrite32() was disassembled as follows from the kernel’s object code (32-bit version):

0020f79f <iowrite32>:
  20f79f:       81 fa ff ff 03 00       cmp    $0x3ffff,%edx
  20f7a5:       89 d1                   mov    %edx,%ecx
  20f7a7:       76 03                   jbe    20f7ac <iowrite32+0xd>
  20f7a9:       89 02                   mov    %eax,(%edx)
  20f7ab:       c3                      ret
  20f7ac:       81 fa 00 00 01 00       cmp    $0x10000,%edx
  20f7b2:       76 08                   jbe    20f7bc <iowrite32+0x1d>
  20f7b4:       81 e2 ff ff 00 00       and    $0xffff,%edx
  20f7ba:       ef                      out    %eax,(%dx)
  20f7bb:       c3                      ret
  20f7bc:       ba f2 56 03 00          mov    $0x356f2,%edx
  20f7c1:       89 c8                   mov    %ecx,%eax
  20f7c3:       e9 41 fe ff ff          jmp    20f609 <bad_io_access>

Results on x86_64

Compiled on Intel x86/64 bit:

$ objdump -d minimodule.ko

minimodule.ko:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <try_iowrite32>:
   0:	55                   	push   %rbp
   1:	48 89 e5             	mov    %rsp,%rbp
   4:	e8 00 00 00 00       	callq  9 <try_iowrite32+0x9>
   9:	be 78 56 34 12       	mov    $0x12345678,%esi
   e:	bf 01 00 cd ab       	mov    $0xabcd0001,%edi
  13:	e8 00 00 00 00       	callq  18 <try_iowrite32+0x18>
  18:	be 78 56 34 12       	mov    $0x12345678,%esi
  1d:	bf 01 00 cd ab       	mov    $0xabcd0001,%edi
  22:	e8 00 00 00 00       	callq  27 <try_iowrite32+0x27>
  27:	be 78 56 34 12       	mov    $0x12345678,%esi
  2c:	bf 02 00 cd ab       	mov    $0xabcd0002,%edi
  31:	e8 00 00 00 00       	callq  36 <try_iowrite32+0x36>
  36:	be 78 56 34 12       	mov    $0x12345678,%esi
  3b:	bf 03 00 cd ab       	mov    $0xabcd0003,%edi
  40:	e8 00 00 00 00       	callq  45 <try_iowrite32+0x45>
  45:	0f ae f8             	sfence
  48:	be 78 56 34 12       	mov    $0x12345678,%esi
  4d:	bf 04 00 cd ab       	mov    $0xabcd0004,%edi
  52:	e8 00 00 00 00       	callq  57 <try_iowrite32+0x57>
  57:	0f ae e8             	lfence
  5a:	be 78 56 34 12       	mov    $0x12345678,%esi
  5f:	bf 05 00 cd ab       	mov    $0xabcd0005,%edi
  64:	e8 00 00 00 00       	callq  69 <try_iowrite32+0x69>
  69:	be 78 56 34 12       	mov    $0x12345678,%esi
  6e:	bf 06 00 cd ab       	mov    $0xabcd0006,%edi
  73:	e8 00 00 00 00       	callq  78 <try_iowrite32+0x78>
  78:	c9                   	leaveq
  79:	c3                   	retq
	...

Those “callq” statements are modified upon linking. To resolve what these are calling, run

$ readelf -r minimodule.ko

Relocation section '.rela.text' at offset 0xa9b0 contains 8 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000005  002300000002 R_X86_64_PC32     0000000000000000 mcount - 4
000000000014  002000000002 R_X86_64_PC32     0000000000000000 iowrite32 - 4
000000000023  002000000002 R_X86_64_PC32     0000000000000000 iowrite32 - 4
000000000032  002000000002 R_X86_64_PC32     0000000000000000 iowrite32 - 4
000000000041  002000000002 R_X86_64_PC32     0000000000000000 iowrite32 - 4
000000000053  002000000002 R_X86_64_PC32     0000000000000000 iowrite32 - 4
000000000065  002000000002 R_X86_64_PC32     0000000000000000 iowrite32 - 4
000000000074  002000000002 R_X86_64_PC32     0000000000000000 iowrite32 - 4

(the output continues with relocation information for debug variables).

It’s quite easy to work this out: The “Offset” column tells us the offset in the object code. For example, a callq statement begins at 0x13, but the address to call starts at 0x14. The second entry in the relocation section points at offset 0x14, and says that the target is iowrite32().
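The offset arithmetic can be spelled out in a few lines of user-space C. This is only a sketch of the bookkeeping, not something the kernel or the linker exposes: the function names are made up, and the symbol address 0x2000 in the example below is invented. The “- 4” addend is there because an R_X86_64_PC32 relocation is computed relative to the address of the 4-byte field itself, while the CPU counts the displacement from the end of that field.

```c
#include <stdint.h>

/*
 * Address of the rel32 field of an x86 "call rel32" instruction:
 * it follows the single opcode byte (0xe8).
 */
static uint64_t call_reloc_offset(uint64_t call_addr)
{
	return call_addr + 1;
}

/*
 * Value stored in the field for R_X86_64_PC32: S + A - P, where S is
 * the symbol address, A the addend and P the address of the field.
 */
static int32_t pc32_value(uint64_t sym, int64_t addend, uint64_t field)
{
	int64_t v = (int64_t)sym + addend - (int64_t)field;

	return (int32_t)v;
}

/* Where the CPU ends up: end of the 4-byte field plus the displacement. */
static uint64_t call_target(uint64_t field, int32_t rel32)
{
	return field + 4 + (uint64_t)(int64_t)rel32;
}
```

For the callq at 0x13, call_reloc_offset(0x13) gives 0x14, which is the offset of the second relocation entry. If iowrite32() happened to live at 0x2000, pc32_value(0x2000, -4, 0x14) would put 0x1fe8 in the field, and call_target(0x14, 0x1fe8) lands back on 0x2000.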

So from this output we learn that all callq’s are to iowrite32(), except the first one, which goes to mcount() (which is intended for kernel call tracing).

Now to conclusions: There are no memory barriers in the code, except those generated by wmb() and rmb(), which added sfence and lfence respectively. sfence is defined as

Performs a serializing operation on all store instructions that were issued prior the SFENCE instruction. This serializing operation guarantees that every store instruction that precedes in program order the SFENCE instruction is globally visible before any store instruction that follows the SFENCE instruction is globally visible. The SFENCE instruction is ordered with respect store instructions, other SFENCE instructions, any MFENCE instructions, and any serializing instructions (such as the CPUID instruction). It is not ordered with respect to load instructions or the LFENCE instruction.

and lfence as

Performs a serializing operation on all load-from-memory instructions that were issued prior the LFENCE instruction. This serializing operation guarantees that every load instruction that precedes in program order the LFENCE instruction is globally visible before any load instruction that follows the LFENCE instruction is globally visible. The LFENCE instruction is ordered with respect to load instructions, other LFENCE instructions, any MFENCE instructions, and any serializing instructions (such as the CPUID instruction). It is not ordered with respect to store instructions or the SFENCE instruction.

One can feel the Intel-headache just reading this.

Results on x86 (32 bit)

Compiling this against a 32-bit kernel, with a slightly different configuration:

$ objdump -d minimodule.ko

minimodule.ko:     file format elf32-i386

Disassembly of section .text:

00000000 <try_iowrite32>:
   0:	ba 78 56 34 12       	mov    $0x12345678,%edx
   5:	b8 01 00 cd ab       	mov    $0xabcd0001,%eax
   a:	e8 fc ff ff ff       	call   b <try_iowrite32+0xb>
   f:	ba 78 56 34 12       	mov    $0x12345678,%edx
  14:	b8 01 00 cd ab       	mov    $0xabcd0001,%eax
  19:	e8 fc ff ff ff       	call   1a <try_iowrite32+0x1a>
  1e:	ba 78 56 34 12       	mov    $0x12345678,%edx
  23:	b8 02 00 cd ab       	mov    $0xabcd0002,%eax
  28:	e8 fc ff ff ff       	call   29 <try_iowrite32+0x29>
  2d:	ba 78 56 34 12       	mov    $0x12345678,%edx
  32:	b8 03 00 cd ab       	mov    $0xabcd0003,%eax
  37:	e8 fc ff ff ff       	call   38 <try_iowrite32+0x38>
  3c:	f0 83 04 24 00       	lock addl $0x0,(%esp)
  41:	ba 78 56 34 12       	mov    $0x12345678,%edx
  46:	b8 04 00 cd ab       	mov    $0xabcd0004,%eax
  4b:	e8 fc ff ff ff       	call   4c <try_iowrite32+0x4c>
  50:	f0 83 04 24 00       	lock addl $0x0,(%esp)
  55:	ba 78 56 34 12       	mov    $0x12345678,%edx
  5a:	b8 05 00 cd ab       	mov    $0xabcd0005,%eax
  5f:	e8 fc ff ff ff       	call   60 <try_iowrite32+0x60>
  64:	ba 78 56 34 12       	mov    $0x12345678,%edx
  69:	b8 06 00 cd ab       	mov    $0xabcd0006,%eax
  6e:	e8 fc ff ff ff       	call   6f <try_iowrite32+0x6f>
  73:	f0 83 04 24 00       	lock addl $0x0,(%esp)
  78:	c3                   	ret
  79:	00 00                	add    %al,(%eax)
	...

Disassembly of section .altinstr_replacement:

00000000 <.altinstr_replacement>:
   0:	0f ae f8             	sfence
   3:	0f ae e8             	lfence
   6:	0f ae e8             	lfence

$ readelf -r minimodule.ko

Relocation section '.rel.text' at offset 0xc3e0 contains 7 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
0000000b  00002402 R_386_PC32        00000000   iowrite32
0000001a  00002402 R_386_PC32        00000000   iowrite32
00000029  00002402 R_386_PC32        00000000   iowrite32
00000038  00002402 R_386_PC32        00000000   iowrite32
0000004c  00002402 R_386_PC32        00000000   iowrite32
00000060  00002402 R_386_PC32        00000000   iowrite32
0000006f  00002402 R_386_PC32        00000000   iowrite32

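So it’s in essence the same, only the mcount() call in the beginning was skipped. Two details differ, though. First, wmb() and rmb() show up as “lock addl $0x0,(%esp)” in the code itself. That’s the fallback for old processors: the sfence and lfence opcodes listed in the .altinstr_replacement section are patched in over it by the kernel’s alternatives mechanism when the code is loaded, if the CPU supports them. Second, there’s a third “lock addl” after the last write, so smp_rmb() is not a no-op in this build. This is probably the “slightly different configuration” at work: as far as I can tell, CONFIG_X86_PPRO_FENCE turns smp_rmb() into rmb() in kernels of this age.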

The related sources for iowrite32() on ARM

These are the key excerpts from arch/arm/include/asm/io.h:

static inline void __raw_writel(u32 val, volatile void __iomem *addr)
{
	asm volatile("str %1, %0"
		     : "+Qo" (*(volatile u32 __force *)addr)
		     : "r" (val));
}
...
#define writel_relaxed(v,c)	__raw_writel((__force u32) cpu_to_le32(v),c)
...
#define writel(v,c)		({ __iowmb(); writel_relaxed(v,c); })
...
#define iowrite32(v,p)	({ __iowmb(); __raw_writel((__force __u32)cpu_to_le32(v), p); })

As for __iowmb(), it goes

/* IO barriers */
#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE
#include <asm/barrier.h>
#define __iormb()		rmb()
#define __iowmb()		wmb()
#else
#define __iormb()		do { } while (0)
#define __iowmb()		do { } while (0)
#endif

so it’s down to the configuration if __iowmb() does something. And to get the full picture, these are snips from arch/arm/include/asm/barrier.h:

#if __LINUX_ARM_ARCH__ >= 7
#define isb(option) __asm__ __volatile__ ("isb " #option : : : "memory")
#define dsb(option) __asm__ __volatile__ ("dsb " #option : : : "memory")
#define dmb(option) __asm__ __volatile__ ("dmb " #option : : : "memory")
...
#ifdef CONFIG_ARCH_HAS_BARRIERS
#include <mach/barriers.h>
#elif defined(CONFIG_ARM_DMA_MEM_BUFFERABLE) || defined(CONFIG_SMP)
#define mb()		do { dsb(); outer_sync(); } while (0)
#define rmb()		dsb()
#define wmb()		do { dsb(st); outer_sync(); } while (0)
#else
#define mb()		barrier()
#define rmb()		barrier()
#define wmb()		barrier()
#endif
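
To follow how these macros stack up, here is a user-space model that records the name of each “instruction” in a string instead of executing it. It is a sketch only: every model_ name is made up, MODEL_DMA_MEM_BUFFERABLE stands in for CONFIG_ARM_DMA_MEM_BUFFERABLE (assumed to be set), and outer_sync() is reduced to a single trace entry.

```c
#include <string.h>

/* Trace of the modelled operations, separated by ';'. */
static char model_trace[128];

static void model_emit(const char *op)
{
	/* Append only if the entry and its separator still fit. */
	if (strlen(model_trace) + strlen(op) + 2 <= sizeof(model_trace)) {
		strcat(model_trace, op);
		strcat(model_trace, ";");
	}
}

#define MODEL_DMA_MEM_BUFFERABLE 1	/* assumed configuration */

/* Same shape as wmb() in barrier.h: dsb(st), then outer_sync(). */
#define model_wmb() \
	do { model_emit("dsb st"); model_emit("outer_sync"); } while (0)

#ifdef MODEL_DMA_MEM_BUFFERABLE
#define model_iowmb()	model_wmb()
#else
#define model_iowmb()	do { } while (0)
#endif

/* Same shape as writel(): the I/O write barrier, then the raw store. */
#define model_writel() \
	do { model_iowmb(); model_emit("str"); } while (0)

/* Returns the trace produced by a single modelled writel(). */
static const char *writel_model_trace(void)
{
	model_trace[0] = '\0';
	model_writel();
	return model_trace;
}
```

With the configuration macro defined, writel_model_trace() returns "dsb st;outer_sync;str;". Removing the define leaves only "str;", which is the case where __iowmb() expands to nothing.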

Results on ARM

This is what the same module compiled for ARM Cortex A9, Little Endian gives (I’ve added extra newlines in the middle for clarity):

minimodule.o:     file format elf32-littlearm

Disassembly of section .text:

00000000 <try_iowrite32>:
   0:	e92d4038 	push	{r3, r4, r5, lr}

   4:	f57ff04e 	dsb	st
   8:	e59f2118 	ldr	r2, [pc, #280]	; 128 <try_iowrite32+0x128>
   c:	e1a04002 	mov	r4, r2
  10:	e5923018 	ldr	r3, [r2, #24]
  14:	e3530000 	cmp	r3, #0
  18:	0a000000 	beq	20 <try_iowrite32+0x20>
  1c:	e12fff33 	blx	r3
  20:	e59f3104 	ldr	r3, [pc, #260]	; 12c <try_iowrite32+0x12c>
  24:	e59f1104 	ldr	r1, [pc, #260]	; 130 <try_iowrite32+0x130>
  28:	e5831678 	str	r1, [r3, #1656]	; 0x678

  2c:	f57ff04e 	dsb	st
  30:	e5942018 	ldr	r2, [r4, #24]
  34:	e1a05001 	mov	r5, r1
  38:	e1a04003 	mov	r4, r3
  3c:	e3520000 	cmp	r2, #0
  40:	0a000000 	beq	48 <try_iowrite32+0x48>
  44:	e12fff32 	blx	r2
  48:	e5845678 	str	r5, [r4, #1656]	; 0x678

  4c:	f57ff04e 	dsb	st
  50:	e59f20d0 	ldr	r2, [pc, #208]	; 128 <try_iowrite32+0x128>
  54:	e1a04002 	mov	r4, r2
  58:	e5923018 	ldr	r3, [r2, #24]
  5c:	e3530000 	cmp	r3, #0
  60:	0a000000 	beq	68 <try_iowrite32+0x68>
  64:	e12fff33 	blx	r3
  68:	e59f30bc 	ldr	r3, [pc, #188]	; 12c <try_iowrite32+0x12c>
  6c:	e59f20c0 	ldr	r2, [pc, #192]	; 134 <try_iowrite32+0x134>
  70:	e5832678 	str	r2, [r3, #1656]	; 0x678

  74:	f57ff04e 	dsb	st
  78:	e5942018 	ldr	r2, [r4, #24]
  7c:	e1a04003 	mov	r4, r3
  80:	e3520000 	cmp	r2, #0
  84:	0a000000 	beq	8c <try_iowrite32+0x8c>
  88:	e12fff32 	blx	r2
  8c:	e59f30a4 	ldr	r3, [pc, #164]	; 138 <try_iowrite32+0x138>
  90:	e5843678 	str	r3, [r4, #1656]	; 0x678

  94:	f57ff04e 	dsb	st
  98:	e59f2088 	ldr	r2, [pc, #136]	; 128 <try_iowrite32+0x128>
  9c:	e1a04002 	mov	r4, r2
  a0:	e5923018 	ldr	r3, [r2, #24]
  a4:	e3530000 	cmp	r3, #0
  a8:	0a000000 	beq	b0 <try_iowrite32+0xb0>
  ac:	e12fff33 	blx	r3

  b0:	f57ff04e 	dsb	st
  b4:	e5943018 	ldr	r3, [r4, #24]
  b8:	e3530000 	cmp	r3, #0
  bc:	0a000000 	beq	c4 <try_iowrite32+0xc4>
  c0:	e12fff33 	blx	r3
  c4:	e59f3060 	ldr	r3, [pc, #96]	; 12c <try_iowrite32+0x12c>
  c8:	e59f206c 	ldr	r2, [pc, #108]	; 13c <try_iowrite32+0x13c>
  cc:	e5832678 	str	r2, [r3, #1656]	; 0x678
  d0:	f57ff04f 	dsb	sy

  d4:	f57ff04e 	dsb	st
  d8:	e59f1048 	ldr	r1, [pc, #72]	; 128 <try_iowrite32+0x128>
  dc:	e1a04003 	mov	r4, r3
  e0:	e1a05001 	mov	r5, r1
  e4:	e5912018 	ldr	r2, [r1, #24]
  e8:	e3520000 	cmp	r2, #0
  ec:	0a000000 	beq	f4 <try_iowrite32+0xf4>
  f0:	e12fff32 	blx	r2
  f4:	e59f3044 	ldr	r3, [pc, #68]	; 140 <try_iowrite32+0x140>
  f8:	e5843678 	str	r3, [r4, #1656]	; 0x678
  fc:	f57ff05a 	dmb	ishst

 100:	f57ff04e 	dsb	st
 104:	e5953018 	ldr	r3, [r5, #24]
 108:	e3530000 	cmp	r3, #0
 10c:	0a000000 	beq	114 <try_iowrite32+0x114>
 110:	e12fff33 	blx	r3
 114:	e59f3010 	ldr	r3, [pc, #16]	; 12c <try_iowrite32+0x12c>
 118:	e59f2024 	ldr	r2, [pc, #36]	; 144 <try_iowrite32+0x144>
 11c:	e5832678 	str	r2, [r3, #1656]	; 0x678
 120:	f57ff05b 	dmb	ish
 124:	e8bd8038 	pop	{r3, r4, r5, pc}
 128:	00000000 	.word	0x00000000
 12c:	12345000 	.word	0x12345000
 130:	abcd0001 	.word	0xabcd0001
 134:	abcd0002 	.word	0xabcd0002
 138:	abcd0003 	.word	0xabcd0003
 13c:	abcd0004 	.word	0xabcd0004
 140:	abcd0005 	.word	0xabcd0005
 144:	abcd0006 	.word	0xabcd0006

This was a lot of code (somehow that’s what you get with ARM). There are no calls to iowrite32(), so this is done inline for ARM (consistent with the sources).

This requires some translation from ARM opcodes to human language (taken from this page):

  • DSB SY — Data Synchronization Barrier: No instruction in program order after this instruction executes until all explicit memory accesses before this instruction complete, as well as all cache, branch predictor and TLB maintenance operations before this instruction complete.
  • DSB ST — Like DSB SY, but waits only for data writes to complete.
  • DMB ISHST — Data Memory Barrier, operation that waits only for stores to complete, and only to the inner shareable domain (whatever that “inner shareable domain” is).
  • DMB ISH — Data Memory Barrier, operation that waits only to the inner shareable domain.
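
The four barrier words in the listing differ only in their lowest byte, so they are easy to pick out mechanically. Below is a minimal user-space decoder for exactly these four encodings. The function names are made up for this example, and the table covers only the words that appear in this post, not the full set of barrier options.

```c
#include <stdint.h>
#include <string.h>

/*
 * Decode an ARM (A32) barrier word as it appears in the listing.
 * Returns a mnemonic string, or NULL if the word is not one of the
 * four barriers covered here.
 */
static const char *decode_arm_barrier(uint32_t insn)
{
	/* All four words share the upper 24 bits 0xf57ff0. */
	if ((insn & 0xffffff00u) != 0xf57ff000u)
		return NULL;

	/* Bits 7:4 select dsb (4) or dmb (5), bits 3:0 the option. */
	switch (insn & 0xffu) {
	case 0x4f: return "dsb sy";
	case 0x4e: return "dsb st";
	case 0x5b: return "dmb ish";
	case 0x5a: return "dmb ishst";
	default:   return NULL;
	}
}

/* Convenience check for scanning a listing for one specific barrier. */
static int is_barrier(uint32_t insn, const char *text)
{
	const char *decoded = decode_arm_barrier(insn);

	return decoded != NULL && strcmp(decoded, text) == 0;
}
```

For example, decode_arm_barrier(0xf57ff04e) returns "dsb st", while an ordinary store such as 0xe5831678 yields NULL.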

Now let’s decipher the assembly code, which is quite tangled. Luckily, it’s easy to spot the seven write operations as the seven “str” commands in the assembly code. It’s also easy to see that each iowrite32() starts with a “dsb st”, which forces waiting until previous writes have completed. So each iowrite32() spans from a “dsb st” to an “str”. The ldr / cmp / beq / blx sequence in between is most likely the inlined outer_sync() part of wmb(): a call through a function pointer, made only if the pointer isn’t NULL. This matches the definition of iowrite32() as __iowmb() and then __raw_writel(…).

The memory barriers are quite clear too:

  • wmb() becomes “dsb st”, the full synchronization barrier for writes (which is also issued automatically before each iowrite32).
  • rmb() becomes “dsb sy”, the full synchronization barrier for reads and writes
  • smp_wmb() becomes “dmb ishst”, the “inner shareable domain” memory barrier for writes
  • smp_rmb() becomes “dmb ish”, the “inner shareable domain” memory barrier for reads and writes

Now with writel()

So I thought it would be nice to repeat all this with writel(). Spoiler: Nothing thrilling happens here.

Module code (includes omitted):

void try_writel(void) {
  void __iomem *p = (void *) 0x12345678;

  writel(0xabcd0001, p);
  writel(0xabcd0001, p);
  writel(0xabcd0002, p);
  mmiowb();
  writel(0xabcd0003, p);
  wmb();
  writel(0xabcd0004, p);
  rmb();
  writel(0xabcd0005, p);
  smp_wmb();
  writel(0xabcd0006, p);
  smp_rmb();
}

EXPORT_SYMBOL(try_writel);

Assembly on 64-bit Intel:

minimodule.ko:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <try_writel>:
   0:	55                   	push   %rbp
   1:	48 89 e5             	mov    %rsp,%rbp
   4:	e8 00 00 00 00       	callq  9 <try_writel+0x9>
   9:	b8 01 00 cd ab       	mov    $0xabcd0001,%eax
   e:	89 04 25 78 56 34 12 	mov    %eax,0x12345678
  15:	89 04 25 78 56 34 12 	mov    %eax,0x12345678
  1c:	b8 02 00 cd ab       	mov    $0xabcd0002,%eax
  21:	89 04 25 78 56 34 12 	mov    %eax,0x12345678
  28:	b8 03 00 cd ab       	mov    $0xabcd0003,%eax
  2d:	89 04 25 78 56 34 12 	mov    %eax,0x12345678
  34:	0f ae f8             	sfence
  37:	b8 04 00 cd ab       	mov    $0xabcd0004,%eax
  3c:	89 04 25 78 56 34 12 	mov    %eax,0x12345678
  43:	0f ae e8             	lfence
  46:	b8 05 00 cd ab       	mov    $0xabcd0005,%eax
  4b:	89 04 25 78 56 34 12 	mov    %eax,0x12345678
  52:	b8 06 00 cd ab       	mov    $0xabcd0006,%eax
  57:	89 04 25 78 56 34 12 	mov    %eax,0x12345678
  5e:	c9                   	leaveq
  5f:	c3                   	retq

OK, so writel() just translated into a couple of inline “mov” opcodes. There’s even an optimization between the first and second mov, so %eax isn’t set twice. Hi-tech, I’m telling you.
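
The reason both identical writes survive, while the reload of %eax is dropped, is that the store goes through a volatile access: the compiler must perform each one, but it’s free to reuse a register that already holds the value. Here is a user-space approximation in GCC/Clang syntax. It is not the kernel’s definition: writel_model(), fake_reg and write_sequence() are made up, and ordinary memory plays the role of the device register.

```c
#include <stdint.h>

/*
 * Stand-in for writel() on x86: a volatile 32-bit store, followed by
 * a compiler-only barrier that emits no opcode.
 */
static inline void writel_model(uint32_t val, volatile uint32_t *addr)
{
	*addr = val;                              /* one store per call */
	__asm__ __volatile__("" : : : "memory");  /* compiler barrier only */
}

/* Ordinary memory standing in for a device register. */
static uint32_t fake_reg;

/* Repeats the first three writes of the module and returns the result. */
static uint32_t write_sequence(void)
{
	writel_model(0xabcd0001u, &fake_reg);
	writel_model(0xabcd0001u, &fake_reg);  /* kept, not merged */
	writel_model(0xabcd0002u, &fake_reg);
	return fake_reg;
}
```

Compiling this with optimization and disassembling it should show three stores to fake_reg, much like the listing above, and no fence opcodes.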

And on 32-bit Intel:

minimodule.ko:     file format elf32-i386

Disassembly of section .text:

00000000 <try_writel>:
   0:	b8 01 00 cd ab       	mov    $0xabcd0001,%eax
   5:	a3 78 56 34 12       	mov    %eax,0x12345678
   a:	a3 78 56 34 12       	mov    %eax,0x12345678
   f:	b0 02                	mov    $0x2,%al
  11:	a3 78 56 34 12       	mov    %eax,0x12345678
  16:	b0 03                	mov    $0x3,%al
  18:	a3 78 56 34 12       	mov    %eax,0x12345678
  1d:	f0 83 04 24 00       	lock addl $0x0,(%esp)
  22:	b0 04                	mov    $0x4,%al
  24:	a3 78 56 34 12       	mov    %eax,0x12345678
  29:	f0 83 04 24 00       	lock addl $0x0,(%esp)
  2e:	b0 05                	mov    $0x5,%al
  30:	a3 78 56 34 12       	mov    %eax,0x12345678
  35:	b0 06                	mov    $0x6,%al
  37:	a3 78 56 34 12       	mov    %eax,0x12345678
  3c:	f0 83 04 24 00       	lock addl $0x0,(%esp)
  41:	c3                   	ret
	...

Disassembly of section .altinstr_replacement:

00000000 <.altinstr_replacement>:
   0:	0f ae f8             	sfence
   3:	0f ae e8             	lfence
   6:	0f ae e8             	lfence

And for ARM, it’s exactly the same code (to the byte), since iowrite32() expands to the same thing as writel(). But I listed it here anyhow for those who don’t take my word for it:

minimodule.o:     file format elf32-littlearm

Disassembly of section .text:

00000000 <try_writel>:
   0:	e92d4038 	push	{r3, r4, r5, lr}
   4:	f57ff04e 	dsb	st
   8:	e59f2118 	ldr	r2, [pc, #280]	; 128 <try_writel+0x128>
   c:	e1a04002 	mov	r4, r2
  10:	e5923018 	ldr	r3, [r2, #24]
  14:	e3530000 	cmp	r3, #0
  18:	0a000000 	beq	20 <try_writel+0x20>
  1c:	e12fff33 	blx	r3
  20:	e59f3104 	ldr	r3, [pc, #260]	; 12c <try_writel+0x12c>
  24:	e59f1104 	ldr	r1, [pc, #260]	; 130 <try_writel+0x130>
  28:	e5831678 	str	r1, [r3, #1656]	; 0x678
  2c:	f57ff04e 	dsb	st
  30:	e5942018 	ldr	r2, [r4, #24]
  34:	e1a05001 	mov	r5, r1
  38:	e1a04003 	mov	r4, r3
  3c:	e3520000 	cmp	r2, #0
  40:	0a000000 	beq	48 <try_writel+0x48>
  44:	e12fff32 	blx	r2
  48:	e5845678 	str	r5, [r4, #1656]	; 0x678
  4c:	f57ff04e 	dsb	st
  50:	e59f20d0 	ldr	r2, [pc, #208]	; 128 <try_writel+0x128>
  54:	e1a04002 	mov	r4, r2
  58:	e5923018 	ldr	r3, [r2, #24]
  5c:	e3530000 	cmp	r3, #0
  60:	0a000000 	beq	68 <try_writel+0x68>
  64:	e12fff33 	blx	r3
  68:	e59f30bc 	ldr	r3, [pc, #188]	; 12c <try_writel+0x12c>
  6c:	e59f20c0 	ldr	r2, [pc, #192]	; 134 <try_writel+0x134>
  70:	e5832678 	str	r2, [r3, #1656]	; 0x678
  74:	f57ff04e 	dsb	st
  78:	e5942018 	ldr	r2, [r4, #24]
  7c:	e1a04003 	mov	r4, r3
  80:	e3520000 	cmp	r2, #0
  84:	0a000000 	beq	8c <try_writel+0x8c>
  88:	e12fff32 	blx	r2
  8c:	e59f30a4 	ldr	r3, [pc, #164]	; 138 <try_writel+0x138>
  90:	e5843678 	str	r3, [r4, #1656]	; 0x678
  94:	f57ff04e 	dsb	st
  98:	e59f2088 	ldr	r2, [pc, #136]	; 128 <try_writel+0x128>
  9c:	e1a04002 	mov	r4, r2
  a0:	e5923018 	ldr	r3, [r2, #24]
  a4:	e3530000 	cmp	r3, #0
  a8:	0a000000 	beq	b0 <try_writel+0xb0>
  ac:	e12fff33 	blx	r3
  b0:	f57ff04e 	dsb	st
  b4:	e5943018 	ldr	r3, [r4, #24]
  b8:	e3530000 	cmp	r3, #0
  bc:	0a000000 	beq	c4 <try_writel+0xc4>
  c0:	e12fff33 	blx	r3
  c4:	e59f3060 	ldr	r3, [pc, #96]	; 12c <try_writel+0x12c>
  c8:	e59f206c 	ldr	r2, [pc, #108]	; 13c <try_writel+0x13c>
  cc:	e5832678 	str	r2, [r3, #1656]	; 0x678
  d0:	f57ff04f 	dsb	sy
  d4:	f57ff04e 	dsb	st
  d8:	e59f1048 	ldr	r1, [pc, #72]	; 128 <try_writel+0x128>
  dc:	e1a04003 	mov	r4, r3
  e0:	e1a05001 	mov	r5, r1
  e4:	e5912018 	ldr	r2, [r1, #24]
  e8:	e3520000 	cmp	r2, #0
  ec:	0a000000 	beq	f4 <try_writel+0xf4>
  f0:	e12fff32 	blx	r2
  f4:	e59f3044 	ldr	r3, [pc, #68]	; 140 <try_writel+0x140>
  f8:	e5843678 	str	r3, [r4, #1656]	; 0x678
  fc:	f57ff05a 	dmb	ishst
 100:	f57ff04e 	dsb	st
 104:	e5953018 	ldr	r3, [r5, #24]
 108:	e3530000 	cmp	r3, #0
 10c:	0a000000 	beq	114 <try_writel+0x114>
 110:	e12fff33 	blx	r3
 114:	e59f3010 	ldr	r3, [pc, #16]	; 12c <try_writel+0x12c>
 118:	e59f2024 	ldr	r2, [pc, #36]	; 144 <try_writel+0x144>
 11c:	e5832678 	str	r2, [r3, #1656]	; 0x678
 120:	f57ff05b 	dmb	ish
 124:	e8bd8038 	pop	{r3, r4, r5, pc}
 128:	00000000 	.word	0x00000000
 12c:	12345000 	.word	0x12345000
 130:	abcd0001 	.word	0xabcd0001
 134:	abcd0002 	.word	0xabcd0002
 138:	abcd0003 	.word	0xabcd0003
 13c:	abcd0004 	.word	0xabcd0004
 140:	abcd0005 	.word	0xabcd0005
 144:	abcd0006 	.word	0xabcd0006
