Fixing the mouse sensitivity on Gnome 2

This relates to my Fedora 12 machine with a Logitech M705 mouse. The pointer generally felt bad, I would say.

This is actually written on this post already, with some more details on this one, but I prefer having my own routine and final values written down.

So first get a list of input devices:

$ xinput list
⎡ Virtual core pointer                        id=2    [master pointer  (3)]
⎜   ↳ Virtual core XTEST pointer                  id=4    [slave  pointer  (2)]
⎜   ↳ Microsoft Microsoft 5-Button Mouse with IntelliEye(TM)    id=6    [slave  pointer  (2)]
⎜   ↳ HID 04f3:0103                               id=7    [slave  pointer  (2)]
⎜   ↳ Logitech USB Receiver                       id=9    [slave  pointer  (2)]
⎣ Virtual core keyboard                       id=3    [master keyboard (2)]
 ↳ Virtual core XTEST keyboard                 id=5    [slave  keyboard (3)]
 ↳ Power Button                                id=12    [slave  keyboard (3)]
 ↳ Power Button                                id=13    [slave  keyboard (3)]
 ↳ USB  AUDIO                                  id=14    [slave  keyboard (3)]
 ↳ HID 04f3:0103                               id=8    [slave  keyboard (3)]
 ↳ Logitech USB Receiver                       id=10    [slave  keyboard (3)]

Then get the properties of the USB mouse. Since the string “Logitech USB Receiver” refers to a keyboard input as well as a mouse input, it has to be disambiguated with a pointer: prefix to the identifier. Or just use the ID (not safe in a script, though):

So

$ xinput list-props 9

and

$ xinput list-props pointer:"Logitech USB Receiver"

give the same result, given the list of input devices above.

The output:

$ xinput list-props pointer:"Logitech USB Receiver"
Device 'Logitech USB Receiver':
 Device Enabled (131):    1
 Device Accel Profile (264):    0
 Device Accel Constant Deceleration (265):    1.000000
 Device Accel Adaptive Deceleration (267):    1.000000
 Device Accel Velocity Scaling (268):    10.000000
 Evdev Reopen Attempts (269):    10
 Evdev Axis Inversion (270):    0, 0
 Evdev Axes Swap (272):    0
 Axis Labels (273):    "Rel X" (139), "Rel Y" (140)
 Button Labels (274):    "Button Left" (132), "Button Middle" (133), "Button Right" (134), "Button Wheel Up" (135), "Button Wheel Down" (136), "Button Horiz Wheel Left" (137), "Button Horiz Wheel Right" (138), "Button Side" (283), "Button Extra" (284), "Button Forward" (1205), "Button Back" (1206), "Button Task" (1207), "Button Unknown" (249), "Button Unknown" (249), "Button Unknown" (249), "Button Unknown" (249), "Button Unknown" (249), "Button Unknown" (249), "Button Unknown" (249), "Button Unknown" (249), "Button Unknown" (249), "Button Unknown" (249), "Button Unknown" (249), "Button Unknown" (249)
 Evdev Middle Button Emulation (275):    2
 Evdev Middle Button Timeout (276):    50
 Evdev Wheel Emulation (277):    0
 Evdev Wheel Emulation Axes (278):    0, 0, 4, 5
 Evdev Wheel Emulation Inertia (279):    10
 Evdev Wheel Emulation Timeout (280):    200
 Evdev Wheel Emulation Button (281):    4
 Evdev Drag Lock Buttons (282):    0

It turns out that the required change on my machine was

$ xinput set-prop pointer:"Logitech USB Receiver" "Device Accel Adaptive Deceleration" 3

This is not what I expected to do: it slows down the pointer’s movement when the mouse moves slowly. Surprisingly enough, this makes the pointing more intuitive, because hitting that exact spot requires more physical motion, and the mouse doesn’t get stuck on millimeters.

As the said post mentions, these settings won’t survive a session restart. But that’s a rare event on my computer. Anyhow, the method suggested for making them persistent is to prepare a small script doing the required setup, and add it as a startup application with

$ gnome-session-properties &

Or, maybe the correct way is to add/edit ~/.xinitrc or ~/.xprofile? Will figure that out when I logout next time (happens once in a few months…).
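A minimal sketch of such a startup script, using the device name and the value from this post. The file name and the function name set_mouse_decel are made up for illustration, and whether the session is ready for xinput at the time the script runs is something I'm merely assuming:

```shell
#!/bin/sh
# Hypothetical startup script (e.g. ~/bin/fix-mouse.sh), to be added
# with gnome-session-properties. Device name and value are from this post.
DEVICE='pointer:Logitech USB Receiver'
PROP='Device Accel Adaptive Deceleration'
VALUE=3

set_mouse_decel() {
    # Bail out if the device isn't there (receiver unplugged, or no
    # X display available)
    if ! xinput list-props "$DEVICE" >/dev/null 2>&1; then
        echo "$DEVICE not found, nothing changed" >&2
        return 1
    fi
    xinput set-prop "$DEVICE" "$PROP" "$VALUE"
}

# A failure is not fatal for the session, so ignore it
set_mouse_decel || true
```

Using the pointer: prefix rather than the numeric ID is what makes this usable in a script.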

 

Vivado 2014.1 eliminating instantiations of IP (black boxes)

I discovered this problem in a project that instantiated a 512-bit wide FIFO many (>16) times in different modules. For some unknown reason (it’s called a bug, I suppose) Vivado treated the instantiations as if they weren’t there, and optimized all surrounding logic as if the black box’s output ports were all zero. For a lower number of instantiations, Vivado handled them as expected.

I should point out that Vivado did issue synthesis warnings related to the instantiations as if they were OK (e.g. mismatches between port widths and the wires applied), and yet, there was no trace of these instantiations in the post-synthesis netlist view.

In Vivado, a core from the IP Catalog is treated as a black box module: The IP is typically first compiled into a DCP, and then a black-box module (an empty Verilog module, for example) is used to represent it during the synthesis stage. The DCP is then fused into the project during the implementation (like ngdbuild in ISE).

One clue that this happens takes the form of a critical warning from the implementation stage saying something like

CRITICAL WARNING: [Designutils 20-1280] Could not find module 'fifo_wide'. The XDC file /path/to/fifo_wide/fifo_wide/fifo_wide.xdc will not be read for any cell of this module.

Another way to tell that this has taken place is to look in the synthesis’ runme.log file (as in vivado-project/project.runs/synth_1/runme.log). The black boxes are listed in the “Report BlackBoxes” section, and each of their instantiations in “Report Cell Usage”. So if the instantiated module doesn’t appear at all in the former, or not enough times in the latter, this is a clear indication that something went wrong.
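A rough first check on the log, as a minimal sketch: it merely counts the lines that mention the module's name anywhere in the file, without telling the report sections apart (I'm not relying on their exact layout). The function name count_module_lines is made up for this example:

```shell
# Print how many lines of a synthesis log mention a module name.
# Usage: count_module_lines LOGFILE MODULE
count_module_lines() {
    # -F: plain string match, -c: print the count of matching lines
    grep -cF -- "$2" "$1"
}
```

For example, count_module_lines project.runs/synth_1/runme.log fifo_wide. A count of zero, or a suspiciously low one, is a reason to read the two report sections by eye.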

Workaround

After trying out a lot of things, the workaround was to define two identical IP cores, fifo_wide_rd and fifo_wide_wr. The root of the problem seems to have been that the same FIFO was used in two different modules (one that writes from a DDR memory, and one that reads). Due to the different usage contexts and the huge amount of logic involved, it seems like the tools messed up trying to optimize things.

So using one core for the write module and one for the read module got the tools back on track. There is of course no sensible reason to use different cores in different modules, except for a bug in Vivado.

I should mention that another FIFO is instantiated 20 times in the design, also from two different modules, and nothing bad happened there. However its width is only 32 bits.

Failed attempt

This solves the problem on synthesis, but not all the way through. I left it here just for reference.

The simple solution is to tell Vivado not to attempt optimizing anything related to this module. For example, if the instance name is fifo_wide_inst, the following line in any of the XDC constraints files will do:

set_property DONT_TOUCH true [get_cells -hier -filter {name=~*/fifo_wide_inst}]

This should be completely harmless, as there’s nothing to optimize anyhow — the logic is already optimized inside the DCP. It may be a good idea to do this to all instantiations, just to be sure.

What actually happened with this constraint is that many groups of twenty BUF elements (not IBUF or something like that. Just BUF), named for example ‘bbstub_dout[194]__xx’ (xx going from 1 to 20) were created in the netlist. All had non-connected inputs and the outputs of all twenty buffers connected to the same net. So obviously, nothing good came out of this. The fifo_wide_inst block was non-existent in the netlist, even though twenty instances of it appeared in the synthesis’ runme.log file.

So there were twenty groups of bbstubs for each of the 512 wires of the FIFO, and this applied for each of the twenty modules on which one of these FIFOs was instantiated. No wonder the implementation took a lot of time.

Vivado 2014.1 / Linux Ubuntu 14.04 license activation notes

Introduction

After installing Vivado 2014.1 on my laptop running Ubuntu 14.04 (64 bits), I went for license activation. All I wanted was a plain node-locked license. Not a server, and not a floating one. Baseline.

Xilinx abandoned the good old certificate licensing in favor of activation licensing. That is causing some headaches lately…

Going through the process, I had several problems. The most commonly reported is that when one enters the web page on which the license should be generated (see image below), the activation region is greyed out. A “helpful” message next to the greyed area gives suggestions on why this area is disabled: Either a license has already been submitted based upon that certain request ID, or the page was entered directly, and not through Vivado Licensing Manager.

But there’s another important possible reason: The request may be invalid, in particular because the computer’s identification is lacking.

If this is the case, there is no special error message for this. Just that “important information” note. See “What the problem was” below.

[Image: Xilinx’ licensing page, activation area greyed out]

As a side note: Ubuntu 14.04 is not in the list of supported OS’s, but that’s what I happen to have. Besides, the problem wasn’t with the OS, it turned out.

The activation process (in brief)

It seems like the whole idea about this activation process is that the licensing file that is returned from Xilinx won’t be usable more than once. So instead of making the licensing file valid for a computer ID, it’s made valid with respect to a request ID. Hence the licensing tools on the user’s computer first need to prepare themselves for receiving a licensing file by creating some random data and calling it a request ID. That data is conveyed to the licensing web server (Xilinx’ server, that is) along with information about the machine.

The licensing server creates a licensing file, which corresponds to the request ID, enabling the licensed features the user requested on the site. The user feeds this licensing file into the licensing tools (locally on his or her computer), which match the request ID in their own records with the one in the licensing file. If there is a match, they make a note to themselves that the relevant features are activated. Also, they delete the information about that request ID from their records.

The database containing the requests and what features are enabled is kept in the “trusted area”. Which is a fine name for some obscured database.

In practice, the process goes as follows: When clicking “Connect Now”, Xilinx licensing client on your computer collects identifying information about your computer, and creates some mumbo-jumbo hex data hashes to represent that information + creates a request ID. It then stores this information in the computer’s own “trusted area” (which must be generated manually prior to this on a Linux machine) so it remembers this request when its response returns.

It then opens a web browser (looks like it just tries running Google Chrome first and then Firefox) with a URL that contains those mumbo-jumbo hex hashes. That eventually leads to that famous licensing page. The user checks some licensing features, a licensing file is generated (and those features are counted as “taken” on the site).

The thing is, that in order to create an activation license, the web server needs those mumbo-jumbo hashes in the URL, so it knows which request ID it works against. Also, if a request ID has already been used to create a license, it can’t be reused, because the licensing tools at the user’s side may have deleted the information about that request ID after accepting the previous licensing file.

What the problem was

The reason turned out to be that my laptop lacks a wired Ethernet NIC, but has only a wireless LAN interface. The FLEXnet license manager obviously didn’t consider wlan0 to be an eligible candidate for supplying the identifying MAC number (even though it’s an Ethernet card for all purposes), so the request that was generated for the computer was rejected.

This can be seen in the XML file that is generated by the command-line tools (see below) in the absence of any identifying method:

<UniqueMachineNumbers>
<UniqueMachineNumber><Type>1</Type><Value></Value></UniqueMachineNumber>
<UniqueMachineNumber><Type>2</Type><Value></Value></UniqueMachineNumber>
<UniqueMachineNumber><Type>4</Type><Value></Value></UniqueMachineNumber>
</UniqueMachineNumbers>

Compare this with after adding a (fake) NIC, as shown below:

<UniqueMachineNumbers>
<UniqueMachineNumber><Type>1</Type><Value></Value></UniqueMachineNumber>
<UniqueMachineNumber><Type>2</Type><Value>51692BAD76FCCBBFAA0D635F0CA3674E0F7FADBC</Value></UniqueMachineNumber>
<UniqueMachineNumber><Type>4</Type><Value></Value></UniqueMachineNumber>
</UniqueMachineNumbers>

But these XML files aren’t really used. What counts is the URL that is used to enter Xilinx site.

Without any identifying means, it looks like this (the important part is the empty umn1, umn2 and umn4 parameters):

<META HTTP-EQUIV="Refresh" CONTENT="0; URL=http://license.xilinx.com/getLicense?group=esd_oms&os=lin64&version=2014&licensetype=4&ea=&ds=&di=&hn=&umn1=&umn2=&umn4=&req_hash=297B4710327A0F933FF3382961787271D94FE8CD&uuid=961710372713387A02297B4F3F78F93D924FE8CD&isserver=0&sqn=1&trustedid=1&machine_id=E83C0C895A751459C7449FF5ABFC083849233D7A&revision=DefaultOne&revisiontype=SRV&status=OK&isvirtual=0">

And a proper URL like this:

<META HTTP-EQUIV="Refresh" CONTENT="0; URL=http://license.xilinx.com/getLicense?group=esd_oms&os=lin64&version=2014&licensetype=4&ea=&ds=&di=&hn=&umn1=&umn2=51692BAD76FCCBBFAA0D635F0CA3674E0F7FADBC&umn4=&req_hash=8BD92BFBA481BFD3CA64EF6DB30133A24CA961D5&uuid=8BDCA113ABFD64EF6D392BFBA483A24CB30961D5&isserver=0&sqn=1&trustedid=1&machine_id=483F4E15B8491F0482A56C0E253B8F9D78DCD114&revision=DefaultOne&revisiontype=SRV&status=OK&isvirtual=0">

So quite evidently, the UniqueMachineNumber elements in the XML file appear as the umn1, umn2 and umn4 CGI variables in the URL. They’re all empty strings in the URL that caused the greyed out activation region.
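To check a request before going to the site, something like the following could be used. It's a minimal sketch, assuming that the URL appears in the request's HTML file the same way as in the excerpts above. The function names are made up for this example:

```shell
# List the umn* parameters found in a request HTML file, one per line
list_umn() {
    grep -o 'umn[0-9]*=[^&"]*' "$1"
}

# Succeed only if at least one of them has a non-empty value
has_machine_id() {
    list_umn "$1" | grep -q '=.'
}
```

For example, has_machine_id newrequest.html || echo "No machine ID in request" tells if there's any point in opening the licensing page with that request.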

So fake a NIC

Since the laptop really doesn’t have a wired Ethernet card, let’s fake one and assign it a MAC address:

# /sbin/ip tuntap add dev eth0 mode tap
# /sbin/ifconfig eth0 up
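# /sbin/ip link set dev eth0 address 02:22:33:44:55:66

(pick any random MAC address, of course, but keep the first byte even: an odd first byte means a multicast address, which the kernel refuses to assign)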

The quick and dirty way to get this running on every bootup was to add it to /etc/rc.local on my machine. The more graceful way would be to create an upstart script executing on network activation. But I’ve had enough already…

By the way, I picked eth1 on my own computer, because eth0 is used by my Ethernet-over-USB device. Works the same.
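For picking the MAC address, here's a minimal sketch that generates a random one. The leading 02 byte marks it as locally administered and unicast, so it can't collide with a vendor-assigned address. The function name random_mac is made up for this example:

```shell
# Print a random MAC address of the form 02:xx:xx:xx:xx:xx
random_mac() {
    # Five random bytes as hex, e.g. " a1 b2 c3 d4 e5", turned into
    # a colon-separated string
    tail=$(od -An -N5 -tx1 /dev/urandom | tr -d '\n' |
        sed 's/^ *//; s/ *$//; s/  */:/g')
    echo "02:$tail"
}
```

Note that the address should be generated once and then written into the boot script as a constant: a different MAC address on each boot would presumably make the machine look like a different computer to the license manager.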

If “Connect Now” does nothing

Even though Vivado started off apparently OK, Vivado License Manager refused to open a browser window for obtaining a license on Ubuntu 14.04: I clicked “Connect Now”, but nothing happened. Installing some extra packages fixed it. It’s not clear if all of them are necessary:

# apt-get install libgnomevfs2-0 libgnome2-0
# apt-get install lib32z1 lib32ncurses5 lib32bz2-1.0

As usual, strace was used to find out that this was the problem.

Dec 2018 update: Running Vivado 2015.2 on a Mint 19 machine, I got a new error message:

/usr/lib/firefox/firefox: /path/to/Vivado/2015.2/lib/lnx64.o/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by /usr/lib/firefox/firefox)
/usr/lib/firefox/firefox: /path/to/Vivado/2015.2/lib/lnx64.o/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by /usr/lib/firefox/firefox)

Apparently, opening Firefox from within Vivado caused it to use Vivado’s C++ runtime library, which was too old for it. Simple fix:

$ cd /path/to/Vivado/2015.2/lib/lnx64.o/
$ mv libstdc++.so.6 old-libstdc++.so.6
$ ln -s /usr/lib/x86_64-linux-gnu/libstdc++.so.6

Installing FLEX license manager

This is partly documented in Xilinx’ installation guide. It has to be done once before attempting to acquire a license.

First and foremost, clean up any previous installation, in case you’ve been struggling with this for a while already. The license manager keeps its files in the following places. Just delete them (or move them to another directory) to get a fresh start:

  • /tmp/FLEXnet (empty files with UUID-like file names)
  • /usr/local/share/macrovision
  • /usr/local/share/FNP
  • /usr/local/share/applications/.com.flexnetlicensing
  • ~/.Xilinx/*.lic (in particular ~/.Xilinx/trial.lic). Not sure if this is related.

Having this done, become root (or use sudo) and run install_fnp.sh. This is what it looked like when I did this based on what was installed along with Vivado 2014.1:

# software/xilinx/Vivado/2014.1/ids_lite/ISE/bin/lin64/install_fnp.sh ./software/xilinx/Vivado/2014.1/bin/unwrapped/lnx64.o/FNPLicensingService
Installing anchor service from ./software/xilinx/Vivado/2014.1/bin/unwrapped/lnx64.o/FNPLicensingService to /usr/local/share/FNP/service64/11.11.0

Checking system for trusted storage area...
Configuring for Linux, Trusted Storage path /usr/local/share/macrovision/storage...
Creating /usr/local/share/macrovision/storage...
Setting permissions on /usr/local/share/macrovision/storage...
Permissions set...

Checking system for Replicated Anchor area...
Configuring Replicated Anchor area...
Creating Replicated Anchor area...
Setting permissions on Replicated Anchor area...
Replicated Anchor area permissions set...
Configuring Temporary area...
Temporary area already exists...
Setting permissions on Temporary area...
Temporary area permissions set...
Configuration completed successfully.

Working with latest library tool chain

As the latest version of Vivado was 2014.4 at the time, I downloaded Vivado 2014.4’s license manager tools. The rationale was that maybe the interaction with the site had changed. With hindsight, it would probably be OK to use 2014.1’s licensing tools, but this is how I eventually got it working.

I extracted the zipfile into ~/software/xilinx/licensing_tools/linux_flexlm_v11.11.0_201410/.

Then I went to the lnx64.o directory, ran install_fnp.sh again as root and attempted to verify that there are no pending requests:

$ ./xlicclientmgr -l
ERROR: flxActCommonInit result 2   .
Exit(2) FLEXnet initialisation error.

The reason for this error was not finding the libXLicClientMgrFNP.so library, which is in the same directory (strace saved the day again).

The quick and dirty solution is to add the current directory to the library search path (this works only if it’s done in the directory the library is in):

$ export LD_LIBRARY_PATH=$(pwd)
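A somewhat less fragile variant, as a minimal sketch: a small wrapper that sets the library path for the tool only, based upon the directory it resides in, so it works from any directory. The function name run_xlic is made up for this example, and the directory should be given as an absolute path:

```shell
# Run xlicclientmgr from any directory.
# Usage: run_xlic TOOLDIR [xlicclientmgr arguments...]
run_xlic() {
    tooldir=$1
    shift
    # Put the tool's own directory first in the library search path,
    # keeping the existing path (if any) after it. The assignment
    # applies to this one command only.
    LD_LIBRARY_PATH="$tooldir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" \
        "$tooldir/xlicclientmgr" "$@"
}
```

For example, run_xlic ~/software/xilinx/licensing_tools/linux_flexlm_v11.11.0_201410/lnx64.o -l, given where I extracted the zipfile.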

And then prepare a request:

$ ./xlicclientmgr -cr ~/Desktop/newrequest.xml
Request written to /home/eli/Desktop/newrequest.xml
Request (html) written to /home/eli/Desktop/newrequest.html

The tools indeed remember that they have a pending request:

$ ./xlicclientmgr -l
 SeqNo Status     Date       Time  Reference
     1 Pending    2015-01-19 18:45 ""

Listed 1 of 1 composite requests.

Then double-clicked newrequest.html to get a license file.

With the XML file that was emailed back:

$ ./xlicclientmgr -p ~/Desktop/Xilinx_License.xml
Response processed successfully. Actions were:
    Create         fulfillment "215469875"

    FLEXnet response dictionary:
                   COMMENT=This activation license file is generated on Tue Jan 20 16:26:40 UTC 2015
$ ./xlicclientmgr -l

No stored composite requests.

(but there was one listed before using the response).

MuseScore notes on Fedora Core 12

Random notes playing with MuseScore 0.9.6 (pun not intended):

Installation (after grabbing the RPM file from the web):

# yum install --nogpgcheck MuseScore-0.9.6-1.fc12.x86_64.rpm

Which installs the package without checking its signature. Somewhat dangerous, as the RPM could in theory contain malicious code (as if a signature helps in this case).

The command line for kicking it off is

$ mscore &

Crashes

MuseScore may enter an infinite memory hogging loop, ending up with a system global OOM and a lot of disk activity. To keep this mishap’s impact small, allow it no more than 2 GB of virtual memory, for example. It will never need nearly as much as that, and once it gets into this “all memory is mine” loop, it gets a kick in the bottom, and that’s it. So before calling mscore, go

$ ulimit -v 2048000

and possibly check it with

$ ulimit -a

Note that this limits any program running from the same shell.
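To limit just one program, the ulimit call can be made inside a subshell. A minimal sketch, where the function name run_limited is made up for this example:

```shell
# Run a command with a virtual memory limit (in KB) that applies to
# that command only. The limit is set inside a subshell, so the
# calling shell's own limit remains as it was.
run_limited() {
    limit_kb=$1
    shift
    ( ulimit -v "$limit_kb" && exec "$@" )
}
```

With this, MuseScore is started with run_limited 2048000 mscore & and other programs launched from the same shell aren't affected.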

Editing notes (to self, that is)

  • Selection: In no-note-writing mode, press Shift and mark an area. It’s also possible to mark a note, and shift-click the last note to select (including from the beginning to end).
  • Beaming: That’s the name of connecting eighth and sixteenth notes with those lines. Look for “beam properties” in the palette to get separate notes, as commonly written in notes for singing.
  • In the Display menu, select Play, Mixer and Synthesizer panels, to control sound and played tempo. Note that the mixer panel remains in place when closing and opening files, but it becomes dysfunctional at best after that. Just reopen the panel after reloading or such.

Hearing something

To get some audio playing, given errors like this on startup

Alsa_driver: the interface doesn't support mmap-based access.
init ALSA audio driver failed
init ALSA driver failed
init audio driver failed
sequencer init failed

go to Edit > Preferences, pick I/O, choose ALSA Audio only, and set the Device from “default” to “hw:0”.

But ehm, there’s a problem: Musescore requests exclusive access to the sound device, so if anything else happens to be producing sound when Musescore starts, it will fail to initialize its sound interface (and therefore not play anything during that session). And if it manages to grab the soundcard, all other programs attempting to play sound get stuck. This is true even when using a TCP socket to connect to the PulseAudio server.

Portaudio doesn’t make things better. To begin with, it’s a bit confusing, as the API and device entries are empty. But just select it and click “OK” and these become populated after a restart of the program. Not graceful, but it works. Anyhow, picking the ALSA API and the hw:0,0 device (which is my sound card) gets the same result as with ALSA directly, minus I can’t control the volume with the Pulseaudio controls. But the card is still grabbed exclusively, messing up other programs.

Portaudio with OSS didn’t work either, despite running mscore with padsp. No devices appeared in the list.

Loading the OSS compatible driver (modprobe snd-pcm-oss) created a /dev/dsp file indeed, but again, the sound card was exclusively taken.

My ugly solution was to find a couple of USB speakers and plug them in. And use hw:2,0 as the ALSA target in Musescore.

The elegant solution would be to create a bogus hardware card in Pulseaudio, that routes all sound to hw:0,0. I’m sure it’s possible. I’m also sure that I’ve wasted enough time on this nonsense.

Reprogramming a Series-7 MMCM for fractional division ratios

Introduction

Xilinx’ Series-7 FPGAs (Virtex-7, Kintex-7, Artix-7 and Zynq-7000) offer a rather flexible frequency synthesizer (MMCME2), allowing steps of 0.125 for setting the VCO’s multiplier and one of its dividers. The MMCM can be reprogrammed through its DRP interface, so it can be used as a source of a variable clock frequency.

These are a few notes taken while implementing a reprogrammable frequency source using the MMCME2_ADV.

Resources

The main resource on this matter is Xilinx’ application note 888 (XAPP888) as well as the reference design that can be downloaded from Xilinx’ web site.

As of XAPP888 v1.3 (October 2014), there are a few typos:

  • Table 6: PHASE_MUX_F_CLKFB is on bits [13:11] and not [15:13]
  • Table 6: FRAC_WF_F_CLKFB is on bit 10 and not 12.
  • Table 7: FRAC_EN is related to CLKFBOUT, and not CLKOUT0

The reference design is written in synthesizable Verilog, but the parts that calculate the assignments to the DRP registers are written as Verilog functions, so they can’t be used as is for an arbitrary frequency clock generator. To make things even trickier, the coding style employed in this reference looks like a quiz in deciphering obscured code (or just a Verilog parody).

As a result, it’s somewhat difficult to obtain functional logic (or possibly a computer program) for setting the registers correctly for any allowed combination of parameters. The notes below may assist in getting things straight.

A sample set of DRP registers

For reference, an MMCM was implemented on a Kintex-7 device, after which the entire DRP space was read.

The instantiation of this MMCM was

MMCME2_ADV
 #(.BANDWIDTH          ("OPTIMIZED"),
 .CLKOUT4_CASCADE      ("FALSE"),
 .COMPENSATION         ("ZHOLD"),
 .STARTUP_WAIT         ("FALSE"),
 .DIVCLK_DIVIDE        (1),
 .CLKFBOUT_MULT_F      (5.125),
 .CLKFBOUT_PHASE       (0.000),
 .CLKFBOUT_USE_FINE_PS ("FALSE"),
 .CLKOUT0_DIVIDE_F     (40.250),
 .CLKOUT0_PHASE        (0.000),
 .CLKOUT0_DUTY_CYCLE   (0.500),
 .CLKOUT0_USE_FINE_PS  ("FALSE"),
 .CLKIN1_PERIOD        (5.0),
 .REF_JITTER1          (0.010)

The register map:

 00:  a600 0082 0003 0000 0127 9814 0041 0c40
 08:  14d3 2c00 0041 0040 0041 0040 0041 0040
 10:  0041 0040 0041 2440 1081 1880 1041 1041
 18:  03e8 3801 bbe9 0000 0000 0210 0000 01e9
 20:  0000 0000 0000 0000 0000 0000 0000 0000
 28:  9900 0000 0000 0000 0000 0000 0000 0000
 30:  0000 0000 0000 0000 0000 0000 0000 0000
 38:  0000 0000 0000 0000 0000 0000 0000 0000
 40:  0000 0000 8080 0000 0000 0800 0001 0000
 48:  0000 7800 01e9 0000 0000 0000 9108 1900

(Note to self: Use “predesign” git bundle, checkout e.g. ’138358c’, run build TCL script on Vivado 2014.1 and then on PC compile and run ./dump_drp_regs)

Fractional divider register settings

Two dividers in each MMCME2 allow a fractional division ratio: The feedback divider (CLKFBOUT_MULT_F, effectively the clock multiplier) and the output divider for one of the clocks (CLKOUT0_DIVIDE_F).

The reference design assigns correct values to the relevant registers, but is exceptionally difficult to decipher.

The algorithm for calculating the register’s value is the same for CLKFBOUT_MULT_F and CLKOUT0_DIVIDE_F. The values obtained for all registers, except high_time and low_time, depend only on (8x mod 16), where x is either CLKFBOUT_MULT_F or CLKOUT0_DIVIDE_F, given as the actual division ratio.

The values of the registers as set by Vivado are given for the division ratio going from 4.000 to 5.875, in steps of 0.125. (high_time and low_time shown below may appear not to agree with this, but these are the actual numbers).

ratio  frac_en  high_time  low_time  edge  frac  phase_mux_f  frac_wf_r  frac_wf_f
4.000  0        2          2         0     0     0            0          0
4.125  1        1          1         0     1     0            1          0
4.250  1        1          1         0     2     1            1          1
4.375  1        1          1         0     3     1            1          1
4.500  1        1          1         0     4     2            1          1
4.625  1        1          1         0     5     2            1          1
4.750  1        1          1         0     6     3            1          1
4.875  1        1          1         0     7     3            1          1
5.000  0        2          3         1     0     0            0          0
5.125  1        2          1         1     1     4            0          1
5.250  1        2          2         1     2     5            0          0
5.375  1        2          2         1     3     5            0          0
5.500  1        2          2         1     4     6            0          0
5.625  1        2          2         1     5     6            0          0
5.750  1        2          2         1     6     7            0          0
5.875  1        2          2         1     7     7            0          0
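The lookup by (8x mod 16) can be sketched as follows. The rows are copied from the table above, leaving out high_time and low_time, which don't depend on (8x mod 16) alone. The function name frac_lookup is made up for this example, and using it for ratios outside the 4.000 to 5.875 range relies on the claim above that nothing but (8x mod 16) matters:

```shell
# Print "frac_en edge frac phase_mux_f frac_wf_r frac_wf_f" for a
# division ratio given in steps of 0.125 (e.g. 5.125)
frac_lookup() {
    # The ratio in units of 1/8, rounded to the nearest integer
    eighths=$(awk -v x="$1" 'BEGIN { printf "%d", x * 8 + 0.5 }')
    idx=$((eighths % 16))
    # One row per value of (8x mod 16), starting from zero
    sed -n "$((idx + 1))p" <<EOF
0 0 0 0 0 0
1 0 1 0 1 0
1 0 2 1 1 1
1 0 3 1 1 1
1 0 4 2 1 1
1 0 5 2 1 1
1 0 6 3 1 1
1 0 7 3 1 1
0 1 0 0 0 0
1 1 1 4 0 1
1 1 2 5 0 0
1 1 3 5 0 0
1 1 4 6 0 0
1 1 5 6 0 0
1 1 6 7 0 0
1 1 7 7 0 0
EOF
}
```

For example, frac_lookup 5.125 prints "1 1 1 4 0 1", and so does any other ratio for which 8x mod 16 equals 9.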

Loop filter and lock parameters

Depending on the feedback divider’s integer value (MULT_F in the table below), several registers, which are related to the lock detection and the loop filter, are set with values taken from a lookup table in the reference design. Comparing the values assigned by Vivado 2014.1 (again, by reading back the DRP registers) with those in the reference design for a selected set of MULT_Fs reveals a match as far as the lock detection registers are concerned. However the registers related to the digital loop filter were set to completely different values by Vivado. As there is no documentation available on these registers, it’s not clear what impact this difference has, if at all.

The following table shows the values assigned by Vivado 2014.1 for a set of MULT_F’s. The rightmost columns show the loop filter bits, in the same order that they appear in the reference design (MSB to LSB, left to right). All other columns are given in plain decimal notation.

MULT_F  LockRefDly  LockFBDly  LockCnt  LockSatHigh  UnlockCnt  Loop filter bits (MSB to LSB)
4       11          11         1000     1001         1          0 1 1 1 0 1 1 1 0 0
8       22          22         1000     1001         1          1 1 1 1 0 0 1 1 0 0
12      31          31         825      1001         1          1 1 0 1 0 0 0 1 0 0
16      31          31         625      1001         1          1 1 1 1 1 0 0 1 0 0
20      31          31         500      1001         1          1 1 0 0 0 0 0 1 0 0
24      31          31         400      1001         1          0 1 0 1 1 1 0 0 0 0
28      31          31         350      1001         1          0 0 1 1 0 1 0 0 0 0
32      31          31         300      1001         1          0 0 1 1 0 1 0 0 0 0

Sporadic tests with setting these registers as if the MULT_F was completely different (e.g. as if MULT_F=64 for a much lower actual setting) reveal that nothing apparent happens: no loss of lock, and no apparent difference in jitter performance (not measured though). Also, the VCO of the tested FPGA (with speed grade 2) remained firmly locked at frequencies going as low as 20 MHz (using non-fractional ratios) and as high as 3000 MHz (even though the datasheet ensures 600-1440 MHz only). This was run for several minutes on each frequency with a junction temperature of 56°C.

All in all, there’s an uncertainty regarding the loop filter parameters, but there’s reason to hope that this has no practical significance.

Linux/Gnome: When selection of text goes inactive as soon as the mouse button is released

… go through any gitk window on the desktop, and click on it, to release it from some unexpected GUI state.

Just wanted that written down for the next time I try to select a segment in XEmacs or the gnome-terminal window, and the selection goes away as I release the mouse button.

Debian package notes (when apt and Automatic Updater in Ubuntu aren’t good enough)

Just a few jots on handling packages in Ubuntu. This post is a true mess.

Pinning

The bottom line seems to be not to use the Software Updater, but instead go

# apt-get upgrade

As for how to prevent certain packages from being updated, see this Pinning Howto page and the Apt Preferences page, which cover the internals as well.

There’s also the man page:

$ man apt_preferences

Repositories

The repositories known by apt-get are listed in /etc/apt/sources.list.d/ and /etc/apt/sources.list. For example, adding a repository:

# add-apt-repository ppa:vovoid/vsxu-release

Removing a repository, e.g.

# add-apt-repository --remove ppa:vovoid/vsxu-release

and always do

# apt-get update

after changing the repository set. Now, you might get something like

E: The repository 'http://ppa.launchpad.net/...' is not signed.
N: Updating from such a repository can't be done securely, and is therefore disabled by default

If you want to insist on using such repository (at your own risk), go

# apt update --allow-insecure-repositories

and may the force be with you.

Among others, adding the repository above creates /etc/apt/sources.list.d/vovoid-vsxu-release-trusty.list saying

deb http://ppa.launchpad.net/vovoid/vsxu-release/ubuntu trusty main
# deb-src http://ppa.launchpad.net/vovoid/vsxu-release/ubuntu trusty main

“trusty” refers to Ubuntu 14.04, of course.

Look at this page for more info.

Checking what apt-get would install

# apt-get -s upgrade | less
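The simulation's output marks each package it would install or upgrade with a line starting with “Inst”. To get just the package names, a minimal sketch (the function name is made up for this example):

```shell
# Read the output of "apt-get -s upgrade" on stdin, and print the
# names of the packages that would be installed or upgraded
list_simulated_installs() {
    awk '$1 == "Inst" { print $2 }'
}
```

This goes like apt-get -s upgrade | list_simulated_installs, which is handy for checking if a certain package is about to be touched.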

The packages related to the Linux kernel: linux-generic linux-headers-generic linux-image-generic

It’s worth looking at this page regarding what “kept back” means for packages (the bottom line is that these packages won’t be installed).

Being open to suggestions

Kodi, for example, has a lot of “side packages” that are good to install along. This is how to tell apt-get to grab them as well:

# apt-get install --install-suggests kodi

Pinning with dpkg

This doesn’t work with apt-get nor Automatic Updater, following this and this web pages:

List all packages

$ dpkg -l

Wildcards can be used to find specific packages. For example, those related to the current kernel:

$ dpkg -l "*$(uname -r)*"
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                                          Version                     Architecture                Description
+++-=============================================-===========================-===========================-===============================================================================================
ii  linux-headers-3.13.0-35-generic               3.13.0-35.62                amd64                       Linux kernel headers for version 3.13.0 on 64 bit x86 SMP
ii  linux-image-3.13.0-35-generic                 3.13.0-35.62                amd64                       Linux kernel image for version 3.13.0 on 64 bit x86 SMP
ii  linux-image-extra-3.13.0-35-generic           3.13.0-35.62                amd64                       Linux kernel extra modules for version 3.13.0 on 64 bit x86 SMP

Or, to get just the package names:

$ dpkg -l | awk '{ print $2; }' | grep "$(uname -r)"

Pinning a package

Aug 2019 update: Maybe with apt-mark? Haven’t tried that yet.

In order to prevent a certain package from being updated, use the “hold” setting for the package. For example, holding the kernel-related packages automatically (all three packages) as root:

# dpkg -l | awk '{ print $2; }' | grep "$(uname -r)" | while read i ; do echo $i hold ; done | dpkg --set-selections

After this, the listing of these packages is:

$ dpkg -l "*$(uname -r)*"
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                                          Version                     Architecture                Description
+++-=============================================-===========================-===========================-===============================================================================================
hi  linux-headers-3.13.0-35-generic               3.13.0-35.62                amd64                       Linux kernel headers for version 3.13.0 on 64 bit x86 SMP
hi  linux-image-3.13.0-35-generic                 3.13.0-35.62                amd64                       Linux kernel image for version 3.13.0 on 64 bit x86 SMP
hi  linux-image-extra-3.13.0-35-generic           3.13.0-35.62                amd64                       Linux kernel extra modules for version 3.13.0 on 64 bit x86 SMP

Indeed, the “h” denotes that the packages are held. To revert this, use “install” instead of “hold” in the input to dpkg --set-selections above.
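Since holding and releasing differ only in the word that follows the package name, the while loop can be wrapped in a small helper. This is just a sketch: make_selections is a hypothetical name, and the function only produces the text that dpkg --set-selections expects, without touching the system.

```shell
# Turn a list of package names (one per line, on stdin) into input for
# "dpkg --set-selections". The argument is the desired state, "hold" or
# "install". make_selections is a made-up helper name.
make_selections() {
  state=$1
  while read -r pkg ; do
    if [ -n "$pkg" ] ; then
      printf '%s %s\n' "$pkg" "$state"
    fi
  done
}

# Dry run on two of the package names listed above
printf '%s\n' linux-headers-3.13.0-35-generic linux-image-3.13.0-35-generic |
  make_selections hold
```

To release the packages for real, the output would be piped to dpkg as root, i.e. something like dpkg -l | awk '{ print $2; }' | grep "$(uname -r)" | make_selections install | dpkg --set-selections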

Which package provides file X?

Following this page, install apt-file (simply with apt-get install apt-file), go “apt-file update” once and then go something like (not necessarily as root):

$ apt-file find libgnome-2.so

Note that the pattern can be a substring (as in the example above).

What files does package X generate?

$ dpkg -L libpulse-dev

Installing a deb file locally

# dpkg -i thepackage.deb

If there are failed dependencies, fix them with apt-get subsequently:

# apt-get -f install

and if it says that it wants to remove the package you tried to install, go

# apt-get install -f --fix-missing

That will probably not help directly, but odds are apt-get will at least explain why it wants to kick out the package.

To make apt ignore a failed post-installation script, consider this post.

Extracting the files from a repository

This can be used for running more than one version of Google Chrome on a computer. See this post for a few more words on this.

Extract the .deb package:

$ ar x google-chrome-stable_current_amd64.deb

Note that the files go into the current directory (yuck).

Extract the package’s files:

$ mkdir files
$ cd files
$ tar -xJf ../data.tar.xz

Extract the installation scripts:

$ cd ..
$ mkdir scripts
$ cd scripts/
$ tar -xJf ../control.tar.xz

A word on repositories

Say that we have a line like this in /etc/apt/sources.list:

deb http://archive.ubuntu.com/ubuntu xenial main universe updates restricted security backports

It tells apt-get update to go to http://archive.ubuntu.com/ubuntu/dists/ and look into xenial/main for the “main” part, xenial/universe for the “universe” part but e.g. xenial-updates/ for the “updates”. This site helps with a better understanding of how a sources.list file is set up.

If we look at e.g. ubuntu/dists/xenial/main/, there’s a binary-amd64/ subdirectory for the amd64 platforms (64-bit Intel/AMD). That’s where the Packages.gz and Packages.xz files are found. These list the packages available in the repositories, but even more important: where to find them.

For example, the entry for the “adduser” package looks like this:

Package: adduser
Priority: required
Section: admin
Installed-Size: 648
Maintainer: Ubuntu Core Developers <ubuntu-devel-discuss@lists.ubuntu.com>
Original-Maintainer: Debian Adduser Developers <adduser-devel@lists.alioth.debian.org>
Architecture: all
Version: 3.113+nmu3ubuntu4
Replaces: manpages-it (<< 0.3.4-2), manpages-pl (<= 20051117-1)
Depends: perl-base (>= 5.6.0), passwd (>= 1:4.1.5.1-1.1ubuntu6), debconf | debconf-2.0
Suggests: liblocale-gettext-perl, perl-modules, ecryptfs-utils (>= 67-1)
Filename: pool/main/a/adduser/adduser_3.113+nmu3ubuntu4_all.deb
Size: 161698
MD5sum: 36f79d952ced9bde3359b63cf9cf44fb
SHA1: 6a5b8f58e33d5c9a25f79c6da80a64bf104e6268
SHA256: ca6c86cb229082cc22874ed320eac8d128cc91f086fe5687946e7d05758516a3
Description: add and remove users and groups
Multi-Arch: foreign
Homepage: http://alioth.debian.org/projects/adduser/
Description-md5: 7965b5cd83972a254552a570bcd32c93
Bugs: https://bugs.launchpad.net/ubuntu/+filebug
Origin: Ubuntu
Supported: 5y
Task: minimal

As is evident, this entry contains dependency information, but most important: It points at where the package can be downloaded from: pool/main/a/adduser/adduser_3.113+nmu3ubuntu4_all.deb in this case, which actually points to http://archive.ubuntu.com/ubuntu/pool/main/a/adduser/adduser_3.113+nmu3ubuntu4_all.deb.
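The way the download URL is put together can be sketched as follows: the repository’s root plus the Filename field. The function name deb_urls is invented for this example, and the entry is cut down to a few of the fields shown above.

```shell
# Print the download URL for each entry of a Packages file read from stdin.
# $1 is the repository's root, as written in sources.list.
# deb_urls is a made-up helper name.
deb_urls() {
  root=${1%/}    # drop a trailing slash, if there is one
  awk -v root="$root" '$1 == "Filename:" { print root "/" $2 }'
}

# A shortened version of the adduser entry
entry='Package: adduser
Version: 3.113+nmu3ubuntu4
Filename: pool/main/a/adduser/adduser_3.113+nmu3ubuntu4_all.deb
Size: 161698'

printf '%s\n' "$entry" | deb_urls http://archive.ubuntu.com/ubuntu
# prints http://archive.ubuntu.com/ubuntu/pool/main/a/adduser/adduser_3.113+nmu3ubuntu4_all.deb
```

For a whole repository index, something like xz -dc Packages.xz | deb_urls http://archive.ubuntu.com/ubuntu lists the URLs of all packages in it.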

Note that the URL’s base is the repository’s root, and not necessarily the root of the domain. The Packages file contains the SHA1 sum of the .deb file, and its own SHA1 sum is in turn listed in http://archive.ubuntu.com/ubuntu/dists/xenial/InRelease, which also contains a PGP signature.
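The last link of this chain, checking a downloaded .deb against its entry, might look like the sketch below. It uses the SHA256 field rather than SHA1, relies on the sha256sum utility being installed, and verify_deb is a made-up name. The demonstration uses a dummy file instead of a real package.

```shell
# Check a file against the SHA256 field of a Packages entry.
# $1 is the downloaded file, the entry is read from stdin.
# Returns 0 on a match, non-zero otherwise. verify_deb is a made-up name.
verify_deb() {
  expected=$(awk '$1 == "SHA256:" { print $2; exit }')
  actual=$(sha256sum "$1" | awk '{ print $1 }')
  [ -n "$expected" ] && [ "$expected" = "$actual" ]
}

# Demonstration with a dummy file standing in for the .deb
tmp=$(mktemp)
printf 'not really a package\n' > "$tmp"
good="SHA256: $(sha256sum "$tmp" | awk '{ print $1 }')"
printf '%s\n' "$good" | verify_deb "$tmp" && echo "checksum OK"
printf 'SHA256: 0000\n' | verify_deb "$tmp" || echo "checksum mismatch"
rm -f "$tmp"
```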

The various “Contents” files (e.g. dists/xenial/Contents-amd64.gz) seem to contain a list of files, and the packages they belong to. Probably for use of apt-file.

Vivado: Random notes about the XDC constraints file

These are a few jots about constraining in Vivado. With no particular common topic, and in no particular order. Note that I have another post on similar topics.

Setting the default IOSTANDARD for all ports

In a design where all ports have the same IOSTANDARD, it’s daunting to set them for all. So if there’s just one exception, one can go

set_property IOSTANDARD LVCMOS33 [get_ports -filter { LOC =~ IOB_* } ]
set_property IOSTANDARD LVDS_25 [get_ports clk_100_p]

It’s important to do this after placement constraints of the ports, because the LOC property is set only in conjunction with setting package_pin. Filtering based upon LOC is required to avoid inclusion of MGT ports in the get_ports command, which would in turn fail the set_property command altogether (yielding not only a critical warning, but also leaving all of the IOSTANDARDs unset).

Tell the truth about your heat sink

Vivado makes an estimation of the junction temperature, based upon its power estimation. That’s the figure that you want to keep below 85°C (if you’re using a commercial temperature version of the FPGA).

With all my reservations on the accuracy of the power estimation, and hence the temperature it calculates based upon it, it makes sense to tell Vivado about the chosen cooling solution. Otherwise, it assumes a heatsink with a healthy fan above it. So if you like to live on the edge, like me, and work without a noisy fan, these two lines in the XDC file tell Vivado to adjust the effective Junction-to-Air thermal resistance (Θ_JA).

set_operating_conditions -airflow 0
set_operating_conditions -heatsink low

It’s also possible to set Θ_JA explicitly with -thetaja. Try “help set_operating_conditions” at the Tcl prompt for a list of options.
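For reference, the role of Θ_JA can be written down with the usual steady-state thermal model. This is only a sketch: the symbols T_A (ambient temperature) and P (dissipated power) are introduced here, and the numbers are invented for illustration, not taken from any power report.

```latex
T_J = T_A + \Theta_{JA} \cdot P

% Illustrative numbers only:
% T_A = 25\,^{\circ}\mathrm{C}, \Theta_{JA} = 5\,^{\circ}\mathrm{C/W}, P = 4\,\mathrm{W}
T_J = 25 + 5 \cdot 4 = 45\,^{\circ}\mathrm{C}
```

So anything that raises Θ_JA, such as removing the fan, raises the predicted junction temperature in proportion to the estimated power.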

Frankly speaking, the predicted junction temperature stated on the power report is probably rubbish anyhow, even if the power estimation is accurate. The reason is that there’s a low thermal resistance towards the board: If the board remains on 25°C, the junction temperature will be lower than predicted. On the other hand, if the board heats up from adjacent components, a higher temperature will be measured. In a way, the FPGA will serve as a cooling path from the board to air. With extra power flowing through this path, the temperature rises all along it.

For example, the temperature I got with the setting above on a KC705, with the fan taken off, was significantly higher (~60°C) than Vivado’s prediction (44°C) on a design that had little uncertainty (90% of the estimated power was covered by static power and GTXs with a fixed rate — there was almost no logic in the design). The junction temperature was measured through JTAG from the Hardware Manager.

So the only thing that really counts is the junction temperature as measured after 15 minutes or so of operation.

Search patterns for finding elements

Pattern-matching in Vivado is slash-sensitive. e.g.

foreach x  [get_pins "*/*/a*"] { puts $x }

prints the pins three hierarchies down that begin with “a”, but “a*” matches only pins on the top level.

The “foreach” is given here to demonstrate loops. It’s actually easier to go

join [get_pins "*/*/a*"] "\n"

To make “*” match any character “/” included, it’s possible, yet not such a good idea, to use UCF-style, e.g.

foreach x [get_pins -match_style ucf "*/fifo*"] { puts $x }

or a more relevant example

get_pins -match_style ucf */PS7_i/FCLKCLK[1]

The better way is to forget about the old UCF file format. The Vivado way to allow “*” to match any character, including a slash, is filters:

set_property LOC GTXE2_CHANNEL_X0Y8 [get_cells -hier -filter {name=~*/gt0_mygtx_i/gtxe2_i}]

Another important concept in Vivado is the “-of” flag, which allows finding all nets connected to a cell, all cells connected to a net etc.

For example,

get_nets -of [get_cells -hier -filter {name=~*/gt_top_i/phy_rdy_n_int_reg}]

Group clocks instead of a lot of false paths

Unlike ISE, Vivado assumes that all clocks are “related”. If two clocks come from sources for which the tools have no reason to assume a common origin, ISE will consider all paths between the clock domains as false paths. Vivado, on the other hand, will assume that these paths are real, and will probably end up with an extreme constraint, take ages in the attempt to meet timing, and then fail the timing, of course.

Even in reference designs, this is handled by issuing false paths between each pair of unrelated clocks (two false path statements for each pair, usually). This is messy, often with complicated expressions appearing twice. And a lot of issues every time a new clock is introduced.

The clean way is to group the clocks. Each group contains all clocks that are considered related. Paths inside a group are constrained. Paths between groups are false. Simple and intuitive.

set_clock_groups -asynchronous \
  -group [list \
     [get_clocks -include_generated_clocks -of_objects [get_pins -hier -filter {name=~*gt0_mygtx_i*gtxe2_i*TXOUTCLK}]] \
     [get_clocks -include_generated_clocks "gt0_txusrclk_i"]] \
  -group [get_clocks -include_generated_clocks "drpclk_in_i"] \
  -group [list \
     [get_clocks -include_generated_clocks "sys_clk"] \
     [get_clocks -include_generated_clocks -of_objects [get_pins -hier -filter {name=~*/pipe_clock/pipe_clock/mmcm_i/*}]]]

In the example above, three clock groups are declared.

As a group often consists of several clocks, each requiring a tangled expression to pin down, it may come in handy to define a list of clocks, with the “list” Tcl statement, as shown above.

Another thing to note is that clocks can be obtained as all clocks connected to a certain MMCM or PLL, as shown above, with -of_objects. To keep the definitions short, it’s possible to use create_generated_clock to name clocks that can be found in certain parts of the design (create_clock is applied to external pins only).

If a clock is accidentally not included in this statement, don’t worry: Vivado will assume valid paths for all clock domain crossings involving it, and it will probably take a place of honor in the timing report.

Finally, it’s often desired to tell Vivado to consider clocks that are created by an MMCM / PLL as independent. If a Clock Wizard IP was used, it boils down to something as simple as this:

set_clock_groups -asynchronous \
 -group [get_clocks -include_generated_clocks -of_objects [get_pins -hier -filter {name=~*clk_gen_ins/clk_in1}]] \
 -group [get_clocks -include_generated_clocks -of_objects [get_pins -hier -filter {name=~*clk_gen_ins/clk_out1}]]

which simply says “the input and output clocks of the clock module are independent”. This can be expanded to more outputs, of course.

Telling the tools what the BUFGCE/BUFGMUX is set to

Suppose a segment like this:

BUFGCE clkout1_buf
 (.O   (slow_clk),
 .CE  (seq_reg1[7]),
 .I   (clkout1));

To tell the tools that the timing analysis should be made with the assumption that BUFGCE is enabled,

set_case_analysis 1 [get_pins -hier -filter {name=~*/clkout1_buf/CE0}]
set_case_analysis 1 [get_pins -hier -filter {name=~*/clkout1_buf/S0}]

The truth is that it’s redundant in this case, as the tools assume that CE=1. But this is the syntax anyhow.

Constant clock? Who? Where? Why?

One of the things to verify before being happy with a design’s timing (a.k.a. “signing off timing”), according to UG906 (Design Analysis and Closure Techniques) is that there are no constant clocks nor unconstrained internal endpoints. But hey, what if there are? Like, when running “Report Timing Summary”, under “Check Timing” the number for “constant clock” is far from zero. And the timing summary says this:

2. checking constant clock
--------------------------
 There are 2574 register/latch pins with constant_clock. (MEDIUM)

3. checking pulse_width_clock
-----------------------------
 There are 0 register/latch pins which need pulse_width check

4. checking unconstrained_internal_endpoints
--------------------------------------------
 There are 0 pins that are not constrained for maximum delay.

 There are 5824 pins that are not constrained for maximum delay due to constant clock. (MEDIUM)

Ehm. So which clock caused this, and what are the endpoints involved? It’s actually simple to get that information. Just go

check_timing -verbose -file my_timing_report.txt

on the Tcl prompt, and read the file. The registers and endpoints are listed in the output file.

Floorplanning (placement constraints for logic)

The name of the game is Pblocks. Didn’t dive much into the semantics, but used the GUI’s Tools > Floorplanning menus to create a Pblock and auto-place it. Then saved the constraints, and manipulated the Tcl commands manually (i.e. the get_cells command and the choice of slices).

create_pblock pblock_registers_ins
add_cells_to_pblock [get_pblocks pblock_registers_ins] [get_cells -quiet -hierarchical -filter { NAME =~  "registers_ins/*" && PRIMITIVE_TYPE =~ FLOP_LATCH.*.* && NAME !~  "registers_ins/fifo_*" }]
resize_pblock [get_pblocks pblock_registers_ins] -add {SLICE_X0Y0:SLICE_X151Y99}

The snippet above places all flip-flops (that is, registers) of a certain module, except for those belonging to a couple of submodules (excluded by the second NAME filter) to the bottom area of a 7V330T. The constraint is non-exclusive (other logic is allowed in the region as well).

The desired slice region was found by hovering with the mouse over a zoomed in chip view of an implemented design.

The tools obeyed this constraint strictly, even with post-route optimization, so it’s important not to shoot yourself in the foot when using this for timing improvement (in my case it worked).

To see how the logic is spread out, use the “highlight leaf cells” option when right-clicking a hierarchy in the netlist view to the left of a chip view. Or even better, use Tcl commands on the console:

unhighlight_objects
highlight_objects -color red [get_cells -hierarchical -filter { NAME =~  "registers_ins/*" && PRIMITIVE_TYPE =~ FLOP_LATCH.*.* && NAME !~  "registers_ins/fifo_*" }]

The first command removes existing highlight. There’s an -rgb flag too for selecting the exact color.

There’s also show_objects -name mysearch [ get_cells ... ] which is how the GUI’s “find” operation creates those lists for inspecting elements.

System File Checker: The savior for Windows 7 and 8

When Windows 7 or Windows 8 starts to behave weirdly, this is the general-purpose command that can save your day (in the Command Prompt):

sfc /scannow

It scans all system files and fixes whatever looks bad. In my case, it started off as a “Limited” Wireless connection on a laptop (after it had been fine for a year), which turned out to be the lack of a DHCP request, and ended up with the understanding that the DHCP request service can’t be started because some “element” was missing. Now go fix that manually.

The scan took some 30 minutes, but after the reboot, all was fine again.

For more of my war stories, click here.

iowrite32(), writel() and memory barriers taken apart

Introduction

Needing to remove superfluous memory barriers from a Linux kernel device driver, I wondered what they actually do. The issue is discussed down to painful detail in Documentation/memory-barriers.txt, but somehow it’s quite difficult to figure out if they’re really needed and where. Most drivers rely on subsequent iowrite32’s (or writel’s) to arrive at the hardware in the same order they appear in the code, and this is backed up by the following clause in memory-barriers.txt:

Inside of the Linux kernel, I/O should be done through the appropriate accessor routines – such as inb() or writel() – which know how to make such accesses appropriately sequential. Whilst this, for the most part, renders the explicit use of memory barriers unnecessary, there are a couple of situations where they might be needed:

  1. On some systems, I/O stores are not strongly ordered across all CPUs, and so for _all_ general drivers locks should be used and mmiowb() must be issued prior to unlocking the critical section.
  2. If the accessor functions are used to refer to an I/O memory window with relaxed memory access properties, then _mandatory_ memory barriers are required to enforce ordering.

See Documentation/DocBook/deviceiobook.tmpl for more information.

So what they’re saying is that a memory barrier should be used before releasing a lock (spinlock? mutex? both? The examples show only a spinlock) and when prefetching is allowed by hardware.

Nice. Are they doing anything?

April 2020 update: I’ve written a new post on a similar topic. Also, on top of memory-barriers.txt mentioned above, there are some excellent explanations in the kernel tree’s tools/memory-model/Documentation/explanation.txt and tools/memory-model/Documentation/recipes.txt. These are relatively new (from v4.17, beginning of 2018).

May 2021 update: I’ve also written the parallel post for Windows device driver coding, which occasionally brings up Linux.

The practical take

Since I care most about x86 and ARM, I decided to figure out what the memory barriers actually do. The driver’s code should be formally correct, but in the end, if I remove a memory barrier and then test the driver — have I really made a difference? Have I really tested anything?

Ah, and in case you wonder why I didn’t check ioread32() and readl(): I don’t use them in my driver. Odd as it may sound.

The kernel sources in this post are ~3.12, but how often does anyone dare to touch those basic functions?

Spoiler

For the lazy ones, here are my conclusions:

  • On x86 platforms, iowrite32() and writel() are translated to just a “mov” into memory.
  • On ARM, the same functions translate into a full write synchronization barrier (stop execution until all previous writes are done), and then an “str” into memory.
  • On x86, the following functions translate into nothing: mmiowb(), smp_wmb() and smp_rmb(). wmb() and rmb() translate into “sfence” and “lfence” respectively.
  • On ARM, mmiowb() translates into nothing. The other barriers translate into sensible opcodes.

Trying memory barriers with iowrite32()

I wrote the following kernel module as minimodule.c. Obviously, it won’t do anything good except for being disassembled after compilation.

#include <linux/module.h>
#include <linux/slab.h>
#include <linux/io.h>

void try_iowrite32(void) {
  void __iomem *p = (void *) 0x12345678;

  iowrite32(0xabcd0001, p);
  iowrite32(0xabcd0001, p);
  iowrite32(0xabcd0002, p);
  mmiowb();
  iowrite32(0xabcd0003, p);
  wmb();
  iowrite32(0xabcd0004, p);
  rmb();
  iowrite32(0xabcd0005, p);
  smp_wmb();
  iowrite32(0xabcd0006, p);
  smp_rmb();
}

EXPORT_SYMBOL(try_iowrite32);

The idea: First repeat exactly the same write to see how that’s handled, and then add barriers to see what they turn into.

The related sources for iowrite32() on x86

I have to admit that I was surprised to find out that iowrite32() is a function in itself, as is shown later in the disassembly. My best understanding was that it’s just an alias for writel(), by virtue of a define statement. But since CONFIG_GENERIC_IOMAP is defined on my kernel, it’s not defined in include/asm-generic/io.h, but there’s just a header for it in include/asm-generic/iomap.h. It’s defined as a function in lib/iomap.c as follows:

void iowrite32(u32 val, void __iomem *addr)
{
	IO_COND(addr, outl(val,port), writel(val, addr));
}

where IO_COND is previously defined in the same file as follows (the comment is in the sources):

/*
 * Ugly macros are a way of life.
 */
#define IO_COND(addr, is_pio, is_mmio) do {			\
	unsigned long port = (unsigned long __force)addr;	\
	if (port >= PIO_RESERVED) {				\
		is_mmio;					\
	} else if (port > PIO_OFFSET) {				\
		port &= PIO_MASK;				\
		is_pio;						\
	} else							\
		bad_io_access(port, #is_pio );			\
} while (0)

So there we have it. iowrite32() isn’t just an alias for writel(), but it checks the address and interprets it as port I/O if that makes sense.

To be sure, iowrite32() was disassembled as follows from the kernel’s object code (32-bit version):

0020f79f <iowrite32>:
  20f79f:       81 fa ff ff 03 00       cmp    $0x3ffff,%edx
  20f7a5:       89 d1                   mov    %edx,%ecx
  20f7a7:       76 03                   jbe    20f7ac <iowrite32+0xd>
  20f7a9:       89 02                   mov    %eax,(%edx)
  20f7ab:       c3                      ret
  20f7ac:       81 fa 00 00 01 00       cmp    $0x10000,%edx
  20f7b2:       76 08                   jbe    20f7bc <iowrite32+0x1d>
  20f7b4:       81 e2 ff ff 00 00       and    $0xffff,%edx
  20f7ba:       ef                      out    %eax,(%dx)
  20f7bb:       c3                      ret
  20f7bc:       ba f2 56 03 00          mov    $0x356f2,%edx
  20f7c1:       89 c8                   mov    %ecx,%eax
  20f7c3:       e9 41 fe ff ff          jmp    20f609 <bad_io_access>
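The address checks can be replayed outside the kernel to see which branch a given address takes. This is only a sketch in shell arithmetic: the thresholds are read off the disassembly above (they apply to this 32-bit x86 build), and classify_io_addr is a made-up name.

```shell
# Mimic the address checks of IO_COND, as compiled for 32-bit x86.
# Thresholds taken from the disassembly: above 0x3ffff is memory-mapped,
# above 0x10000 is port I/O (port number = address masked with 0xffff),
# anything else ends up in bad_io_access().
# classify_io_addr is a made-up helper name.
classify_io_addr() {
  addr=$(( $1 ))
  if [ "$addr" -gt $(( 0x3ffff )) ]; then
    echo "mmio"
  elif [ "$addr" -gt $(( 0x10000 )) ]; then
    printf 'pio 0x%x\n' $(( addr & 0xffff ))
  else
    echo "bad"
  fi
}

classify_io_addr 0x12345678   # the address used in minimodule.c: mmio
classify_io_addr 0x103f8      # pio 0x3f8
classify_io_addr 0x1000       # bad
```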

Results on x86_64

Compiled on Intel x86/64 bit:

$ objdump -d minimodule.ko

minimodule.ko:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <try_iowrite32>:
   0:	55                   	push   %rbp
   1:	48 89 e5             	mov    %rsp,%rbp
   4:	e8 00 00 00 00       	callq  9 <try_iowrite32+0x9>
   9:	be 78 56 34 12       	mov    $0x12345678,%esi
   e:	bf 01 00 cd ab       	mov    $0xabcd0001,%edi
  13:	e8 00 00 00 00       	callq  18 <try_iowrite32+0x18>
  18:	be 78 56 34 12       	mov    $0x12345678,%esi
  1d:	bf 01 00 cd ab       	mov    $0xabcd0001,%edi
  22:	e8 00 00 00 00       	callq  27 <try_iowrite32+0x27>
  27:	be 78 56 34 12       	mov    $0x12345678,%esi
  2c:	bf 02 00 cd ab       	mov    $0xabcd0002,%edi
  31:	e8 00 00 00 00       	callq  36 <try_iowrite32+0x36>
  36:	be 78 56 34 12       	mov    $0x12345678,%esi
  3b:	bf 03 00 cd ab       	mov    $0xabcd0003,%edi
  40:	e8 00 00 00 00       	callq  45 <try_iowrite32+0x45>
  45:	0f ae f8             	sfence
  48:	be 78 56 34 12       	mov    $0x12345678,%esi
  4d:	bf 04 00 cd ab       	mov    $0xabcd0004,%edi
  52:	e8 00 00 00 00       	callq  57 <try_iowrite32+0x57>
  57:	0f ae e8             	lfence
  5a:	be 78 56 34 12       	mov    $0x12345678,%esi
  5f:	bf 05 00 cd ab       	mov    $0xabcd0005,%edi
  64:	e8 00 00 00 00       	callq  69 <try_iowrite32+0x69>
  69:	be 78 56 34 12       	mov    $0x12345678,%esi
  6e:	bf 06 00 cd ab       	mov    $0xabcd0006,%edi
  73:	e8 00 00 00 00       	callq  78 <try_iowrite32+0x78>
  78:	c9                   	leaveq
  79:	c3                   	retq
	...

Those “callq” statements are modified upon linking. To resolve what these are calling, go

$ readelf -r minimodule.ko

Relocation section '.rela.text' at offset 0xa9b0 contains 8 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000005  002300000002 R_X86_64_PC32     0000000000000000 mcount - 4
000000000014  002000000002 R_X86_64_PC32     0000000000000000 iowrite32 - 4
000000000023  002000000002 R_X86_64_PC32     0000000000000000 iowrite32 - 4
000000000032  002000000002 R_X86_64_PC32     0000000000000000 iowrite32 - 4
000000000041  002000000002 R_X86_64_PC32     0000000000000000 iowrite32 - 4
000000000053  002000000002 R_X86_64_PC32     0000000000000000 iowrite32 - 4
000000000065  002000000002 R_X86_64_PC32     0000000000000000 iowrite32 - 4
000000000074  002000000002 R_X86_64_PC32     0000000000000000 iowrite32 - 4

(the output continues with relocation information for debug variables).

It’s quite easy to work this out: The “Offset” column tells us the offset in the object code. For example, a callq statement begins at 0x13, but the address to call starts at 0x14. The second entry in the relocation section points at offset 0x14, and says that the target is iowrite32().

So from this output we learn that all callq’s are to iowrite32(), except the first one, which goes to mcount() (which is intended for kernel call tracing).
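Matching relocations with call instructions by eye gets tedious on longer listings, so here is a sketch that does the subtraction. It assumes that every R_X86_64_PC32 entry fed to it belongs to a 5-byte call (one opcode byte followed by the 32-bit displacement), which holds for the listing above but not in general. call_sites is a made-up name.

```shell
# Pair each relocation in the output of "readelf -r" (on stdin) with the
# call instruction it patches.
# Assumption: each R_X86_64_PC32 entry belongs to a 5-byte call, so the
# instruction starts one byte before the relocation's offset.
# call_sites is a made-up helper name.
call_sites() {
  while read -r offset info type value symbol rest ; do
    case $type in
      R_X86_64_PC32)
        printf 'call at 0x%x -> %s\n' $(( 0x$offset - 1 )) "$symbol" ;;
    esac
  done
}

# The first two entries of the listing above
sample='000000000005  002300000002 R_X86_64_PC32     0000000000000000 mcount - 4
000000000014  002000000002 R_X86_64_PC32     0000000000000000 iowrite32 - 4'

printf '%s\n' "$sample" | call_sites
# prints:
# call at 0x4 -> mcount
# call at 0x13 -> iowrite32
```

Lines that aren’t relocation entries (headers, blank lines) are skipped, so readelf -r minimodule.ko | call_sites can be used directly, as long as the assumption about the calls holds.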

Now to conclusions: There are no memory barriers in the code, except those generated by wmb() and rmb(), which added sfence and lfence respectively. sfence is defined as

Performs a serializing operation on all store instructions that were issued prior the SFENCE instruction. This serializing operation guarantees that every store instruction that precedes in program order the SFENCE instruction is globally visible before any store instruction that follows the SFENCE instruction is globally visible. The SFENCE instruction is ordered with respect store instructions, other SFENCE instructions, any MFENCE instructions, and any serializing instructions (such as the CPUID instruction). It is not ordered with respect to load instructions or the LFENCE instruction.

and lfence as

Performs a serializing operation on all load-from-memory instructions that were issued prior the LFENCE instruction. This serializing operation guarantees that every load instruction that precedes in program order the LFENCE instruction is globally visible before any load instruction that follows the LFENCE instruction is globally visible. The LFENCE instruction is ordered with respect to load instructions, other LFENCE instructions, any MFENCE instructions, and any serializing instructions (such as the CPUID instruction). It is not ordered with respect to store instructions or the SFENCE instruction.

One can feel the Intel-headache just reading this.

Results on x86 (32 bit)

Compiling this against a 32-bit kernel, with a slightly different configuration:

$ objdump -d minimodule.ko

minimodule.ko:     file format elf32-i386

Disassembly of section .text:

00000000 <try_iowrite32>:
   0:	ba 78 56 34 12       	mov    $0x12345678,%edx
   5:	b8 01 00 cd ab       	mov    $0xabcd0001,%eax
   a:	e8 fc ff ff ff       	call   b <try_iowrite32+0xb>
   f:	ba 78 56 34 12       	mov    $0x12345678,%edx
  14:	b8 01 00 cd ab       	mov    $0xabcd0001,%eax
  19:	e8 fc ff ff ff       	call   1a <try_iowrite32+0x1a>
  1e:	ba 78 56 34 12       	mov    $0x12345678,%edx
  23:	b8 02 00 cd ab       	mov    $0xabcd0002,%eax
  28:	e8 fc ff ff ff       	call   29 <try_iowrite32+0x29>
  2d:	ba 78 56 34 12       	mov    $0x12345678,%edx
  32:	b8 03 00 cd ab       	mov    $0xabcd0003,%eax
  37:	e8 fc ff ff ff       	call   38 <try_iowrite32+0x38>
  3c:	f0 83 04 24 00       	lock addl $0x0,(%esp)
  41:	ba 78 56 34 12       	mov    $0x12345678,%edx
  46:	b8 04 00 cd ab       	mov    $0xabcd0004,%eax
  4b:	e8 fc ff ff ff       	call   4c <try_iowrite32+0x4c>
  50:	f0 83 04 24 00       	lock addl $0x0,(%esp)
  55:	ba 78 56 34 12       	mov    $0x12345678,%edx
  5a:	b8 05 00 cd ab       	mov    $0xabcd0005,%eax
  5f:	e8 fc ff ff ff       	call   60 <try_iowrite32+0x60>
  64:	ba 78 56 34 12       	mov    $0x12345678,%edx
  69:	b8 06 00 cd ab       	mov    $0xabcd0006,%eax
  6e:	e8 fc ff ff ff       	call   6f <try_iowrite32+0x6f>
  73:	f0 83 04 24 00       	lock addl $0x0,(%esp)
  78:	c3                   	ret
  79:	00 00                	add    %al,(%eax)
	...

Disassembly of section .altinstr_replacement:

00000000 <.altinstr_replacement>:
   0:	0f ae f8             	sfence
   3:	0f ae e8             	lfence
   6:	0f ae e8             	lfence

$ readelf -r minimodule.ko

Relocation section '.rel.text' at offset 0xc3e0 contains 7 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
0000000b  00002402 R_386_PC32        00000000   iowrite32
0000001a  00002402 R_386_PC32        00000000   iowrite32
00000029  00002402 R_386_PC32        00000000   iowrite32
00000038  00002402 R_386_PC32        00000000   iowrite32
0000004c  00002402 R_386_PC32        00000000   iowrite32
00000060  00002402 R_386_PC32        00000000   iowrite32
0000006f  00002402 R_386_PC32        00000000   iowrite32

So it’s in essence the same, only the mcount() call in the beginning was skipped.

The related sources for iowrite32() on ARM

These are the key excerpts from arch/arm/include/asm/io.h:

static inline void __raw_writel(u32 val, volatile void __iomem *addr)
{
	asm volatile("str %1, %0"
		     : "+Qo" (*(volatile u32 __force *)addr)
		     : "r" (val));
}
...
#define writel_relaxed(v,c)	__raw_writel((__force u32) cpu_to_le32(v),c)
...
#define writel(v,c)		({ __iowmb(); writel_relaxed(v,c); })
...
#define iowrite32(v,p)	({ __iowmb(); __raw_writel((__force __u32)cpu_to_le32(v), p); })

As for __iowmb(), it goes

/* IO barriers */
#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE
#include <asm/barrier.h>
#define __iormb()		rmb()
#define __iowmb()		wmb()
#else
#define __iormb()		do { } while (0)
#define __iowmb()		do { } while (0)
#endif

so it’s down to the configuration if __iowmb() does something. And to get the full picture, these are snips from arch/arm/include/asm/barrier.h:

#if __LINUX_ARM_ARCH__ >= 7
#define isb(option) __asm__ __volatile__ ("isb " #option : : : "memory")
#define dsb(option) __asm__ __volatile__ ("dsb " #option : : : "memory")
#define dmb(option) __asm__ __volatile__ ("dmb " #option : : : "memory")
...
#ifdef CONFIG_ARCH_HAS_BARRIERS
#include <mach/barriers.h>
#elif defined(CONFIG_ARM_DMA_MEM_BUFFERABLE) || defined(CONFIG_SMP)
#define mb()		do { dsb(); outer_sync(); } while (0)
#define rmb()		dsb()
#define wmb()		do { dsb(st); outer_sync(); } while (0)
#else
#define mb()		barrier()
#define rmb()		barrier()
#define wmb()		barrier()
#endif

Results on ARM

This is what the same module compiled for ARM Cortex A9, Little Endian gives (I’ve added extra newlines in the middle for clarity):

minimodule.o:     file format elf32-littlearm

Disassembly of section .text:

00000000 <try_iowrite32>:
   0:	e92d4038 	push	{r3, r4, r5, lr}

   4:	f57ff04e 	dsb	st
   8:	e59f2118 	ldr	r2, [pc, #280]	; 128 <try_iowrite32+0x128>
   c:	e1a04002 	mov	r4, r2
  10:	e5923018 	ldr	r3, [r2, #24]
  14:	e3530000 	cmp	r3, #0
  18:	0a000000 	beq	20 <try_iowrite32+0x20>
  1c:	e12fff33 	blx	r3
  20:	e59f3104 	ldr	r3, [pc, #260]	; 12c <try_iowrite32+0x12c>
  24:	e59f1104 	ldr	r1, [pc, #260]	; 130 <try_iowrite32+0x130>
  28:	e5831678 	str	r1, [r3, #1656]	; 0x678

  2c:	f57ff04e 	dsb	st
  30:	e5942018 	ldr	r2, [r4, #24]
  34:	e1a05001 	mov	r5, r1
  38:	e1a04003 	mov	r4, r3
  3c:	e3520000 	cmp	r2, #0
  40:	0a000000 	beq	48 <try_iowrite32+0x48>
  44:	e12fff32 	blx	r2
  48:	e5845678 	str	r5, [r4, #1656]	; 0x678

  4c:	f57ff04e 	dsb	st
  50:	e59f20d0 	ldr	r2, [pc, #208]	; 128 <try_iowrite32+0x128>
  54:	e1a04002 	mov	r4, r2
  58:	e5923018 	ldr	r3, [r2, #24]
  5c:	e3530000 	cmp	r3, #0
  60:	0a000000 	beq	68 <try_iowrite32+0x68>
  64:	e12fff33 	blx	r3
  68:	e59f30bc 	ldr	r3, [pc, #188]	; 12c <try_iowrite32+0x12c>
  6c:	e59f20c0 	ldr	r2, [pc, #192]	; 134 <try_iowrite32+0x134>
  70:	e5832678 	str	r2, [r3, #1656]	; 0x678

  74:	f57ff04e 	dsb	st
  78:	e5942018 	ldr	r2, [r4, #24]
  7c:	e1a04003 	mov	r4, r3
  80:	e3520000 	cmp	r2, #0
  84:	0a000000 	beq	8c <try_iowrite32+0x8c>
  88:	e12fff32 	blx	r2
  8c:	e59f30a4 	ldr	r3, [pc, #164]	; 138 <try_iowrite32+0x138>
  90:	e5843678 	str	r3, [r4, #1656]	; 0x678

  94:	f57ff04e 	dsb	st
  98:	e59f2088 	ldr	r2, [pc, #136]	; 128 <try_iowrite32+0x128>
  9c:	e1a04002 	mov	r4, r2
  a0:	e5923018 	ldr	r3, [r2, #24]
  a4:	e3530000 	cmp	r3, #0
  a8:	0a000000 	beq	b0 <try_iowrite32+0xb0>
  ac:	e12fff33 	blx	r3

  b0:	f57ff04e 	dsb	st
  b4:	e5943018 	ldr	r3, [r4, #24]
  b8:	e3530000 	cmp	r3, #0
  bc:	0a000000 	beq	c4 <try_iowrite32+0xc4>
  c0:	e12fff33 	blx	r3
  c4:	e59f3060 	ldr	r3, [pc, #96]	; 12c <try_iowrite32+0x12c>
  c8:	e59f206c 	ldr	r2, [pc, #108]	; 13c <try_iowrite32+0x13c>
  cc:	e5832678 	str	r2, [r3, #1656]	; 0x678
  d0:	f57ff04f 	dsb	sy

  d4:	f57ff04e 	dsb	st
  d8:	e59f1048 	ldr	r1, [pc, #72]	; 128 <try_iowrite32+0x128>
  dc:	e1a04003 	mov	r4, r3
  e0:	e1a05001 	mov	r5, r1
  e4:	e5912018 	ldr	r2, [r1, #24]
  e8:	e3520000 	cmp	r2, #0
  ec:	0a000000 	beq	f4 <try_iowrite32+0xf4>
  f0:	e12fff32 	blx	r2
  f4:	e59f3044 	ldr	r3, [pc, #68]	; 140 <try_iowrite32+0x140>
  f8:	e5843678 	str	r3, [r4, #1656]	; 0x678
  fc:	f57ff05a 	dmb	ishst

 100:	f57ff04e 	dsb	st
 104:	e5953018 	ldr	r3, [r5, #24]
 108:	e3530000 	cmp	r3, #0
 10c:	0a000000 	beq	114 <try_iowrite32+0x114>
 110:	e12fff33 	blx	r3
 114:	e59f3010 	ldr	r3, [pc, #16]	; 12c <try_iowrite32+0x12c>
 118:	e59f2024 	ldr	r2, [pc, #36]	; 144 <try_iowrite32+0x144>
 11c:	e5832678 	str	r2, [r3, #1656]	; 0x678
 120:	f57ff05b 	dmb	ish
 124:	e8bd8038 	pop	{r3, r4, r5, pc}
 128:	00000000 	.word	0x00000000
 12c:	12345000 	.word	0x12345000
 130:	abcd0001 	.word	0xabcd0001
 134:	abcd0002 	.word	0xabcd0002
 138:	abcd0003 	.word	0xabcd0003
 13c:	abcd0004 	.word	0xabcd0004
 140:	abcd0005 	.word	0xabcd0005
 144:	abcd0006 	.word	0xabcd0006

This was a lot of code (somehow that’s what you get with ARM). There are no calls to iowrite32(), so this is done inline for ARM (consistent with the sources).

This requires some translation from ARM opcodes to human language (taken from this page):

  • DSB SY — Data Synchronization Barrier: No instruction in program order after this instruction executes until all explicit memory accesses before this instruction complete, as well as all cache, branch predictor and TLB maintenance operations before this instruction complete.
  • DSB ST — Like DSB SY, but waits only for data writes to complete.
  • DMB ISHST — Data Memory Barrier that waits only for stores to complete, and applies only to the inner shareable domain (whatever that “inner shareable domain” is).
  • DMB ISH — Data Memory Barrier for both loads and stores, applying only to the inner shareable domain.
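
The four barrier opcodes in the listing differ only in their last two hex digits, so they are easy to decode by hand. The following is a minimal sketch of such a decoder. It relies on the ARMv7 A32 encoding as I understand it (0xf57ff04x for DSB, 0xf57ff05x for DMB, 0xf57ff06x for ISB, with the option in the lowest four bits), which is not something taken from this post. The function names are made up.

```c
#include <stdint.h>
#include <stdio.h>

/* Name of the barrier option held in the lowest four bits. */
static const char *barrier_option(uint32_t opt)
{
	switch (opt & 0xf) {
	case 0xf: return "sy";
	case 0xe: return "st";
	case 0xb: return "ish";
	case 0xa: return "ishst";
	case 0x7: return "nsh";
	case 0x6: return "nshst";
	case 0x3: return "osh";
	case 0x2: return "oshst";
	default:  return "?";
	}
}

/*
 * Turn a 32-bit A32 instruction word into text like "dsb st".
 * Returns 0 on success, -1 if the word is not a DSB, DMB or ISB.
 */
int decode_barrier(uint32_t insn, char *buf, size_t len)
{
	const char *op;

	if ((insn & 0xffffff00) != 0xf57ff000)
		return -1;

	switch ((insn >> 4) & 0xf) {
	case 0x4: op = "dsb"; break;
	case 0x5: op = "dmb"; break;
	case 0x6: op = "isb"; break;
	default:  return -1;
	}

	snprintf(buf, len, "%s %s", op, barrier_option(insn));
	return 0;
}
```

Feeding it the words from the disassembly (f57ff04e, f57ff04f, f57ff05a and f57ff05b) should give back the same mnemonics that objdump printed, and any other instruction, like one of the “str” words, is rejected.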

Now let’s decipher the assembly code, which is quite tangled. Luckily, it’s easy to spot the seven write operations as the seven “str” commands in the assembly code. It’s also easy to see that each iowrite32() starts with a “dsb st”, which forces waiting until previous writes have completed. So each iowrite32() spans from a “dsb st” to a “str”. This matches the definition of iowrite32() as __iowmb() and then __raw_writel(…).
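
One thing that may look odd is that 0x12345678 never appears in the listing. Instead, each “str” goes to [rX, #1656], with 0x12345000 loaded from the constants at the end of the function. The sketch below shows the arithmetic, assuming that the split follows from the 12-bit immediate offset of an A32 LDR/STR instruction. The listing only shows the result, so the reasoning behind the compiler's choice is my guess. The struct and function names are made up.

```c
#include <stdint.h>

/* An A32 LDR/STR immediate offset is 12 bits wide. */
#define ARM_IMM12_MASK 0xfffu

struct arm_addr_split {
	uint32_t base;   /* loaded from the literal pool with "ldr rX, [pc, #n]" */
	uint32_t offset; /* encoded in the "str" instruction itself */
};

/* Split an absolute address into a 4 kB aligned base and a 12-bit offset. */
struct arm_addr_split split_address(uint32_t addr)
{
	struct arm_addr_split s;

	s.base = addr & ~ARM_IMM12_MASK;
	s.offset = addr & ARM_IMM12_MASK;
	return s;
}
```

For 0x12345678 this gives a base of 0x12345000 and an offset of 0x678, which is the 1656 that objdump prints in decimal.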

The memory barriers are quite clear too:

  • wmb() becomes “dsb st”, the full synchronization barrier for writes (which is also issued automatically before each iowrite32).
  • rmb() becomes “dsb sy”, the full synchronization barrier for reads and writes
  • smp_wmb() becomes “dmb ishst”, the “inner shareable domain” memory barrier for writes
  • smp_rmb() becomes “dmb ish”, the “inner shareable domain” memory barrier for reads and writes
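
The mapping above can be checked against the listing by replaying the call order of the test function (seven writes with mmiowb(), wmb(), rmb(), smp_wmb() and smp_rmb() in between) and writing down which barrier and “str” mnemonics should come out. This is a rough sketch with made-up names. It ignores the loads, compares and branches in the listing, and treats mmiowb() as emitting nothing, simply because no instruction shows up for it in the disassembly.

```c
#include <stddef.h>

enum step { IOWRITE32, MMIOWB, WMB, RMB, SMP_WMB, SMP_RMB };

/* Barrier mnemonic that each step contributes; NULL means none. */
static const char *barrier_for(enum step s)
{
	switch (s) {
	case IOWRITE32: return "dsb st";    /* __iowmb() ahead of the store */
	case WMB:       return "dsb st";
	case RMB:       return "dsb sy";
	case SMP_WMB:   return "dmb ishst";
	case SMP_RMB:   return "dmb ish";
	case MMIOWB:    return NULL;        /* nothing visible in the listing */
	}
	return NULL;
}

/*
 * Fill out[] with the expected barrier / "str" mnemonics for a sequence
 * of steps. Returns the number of entries written (at most max).
 */
size_t expected_mnemonics(const enum step *seq, size_t n,
			  const char **out, size_t max)
{
	size_t i, k = 0;

	for (i = 0; i < n; i++) {
		const char *b = barrier_for(seq[i]);

		if (b && k < max)
			out[k++] = b;
		if (seq[i] == IOWRITE32 && k < max)
			out[k++] = "str";
	}
	return k;
}

/* Call order of the test function. */
const enum step try_sequence[] = {
	IOWRITE32, IOWRITE32, IOWRITE32, MMIOWB,
	IOWRITE32, WMB,
	IOWRITE32, RMB,
	IOWRITE32, SMP_WMB,
	IOWRITE32, SMP_RMB,
};

#define TRY_SEQUENCE_LEN (sizeof(try_sequence) / sizeof(try_sequence[0]))
```

Running expected_mnemonics() on try_sequence yields 18 entries, eight of which are “dsb st”: seven from the writes plus one from the explicit wmb(). That is also the number of “dsb st” opcodes in the disassembly.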

Now with writel()

So I thought it would be nice to repeat all this with writel(). Spoiler: Nothing thrilling happens here.

Module code (includes omitted):

void try_writel(void) {
  void __iomem *p = (void *) 0x12345678;

  writel(0xabcd0001, p);
  writel(0xabcd0001, p);
  writel(0xabcd0002, p);
  mmiowb();
  writel(0xabcd0003, p);
  wmb();
  writel(0xabcd0004, p);
  rmb();
  writel(0xabcd0005, p);
  smp_wmb();
  writel(0xabcd0006, p);
  smp_rmb();
}

EXPORT_SYMBOL(try_writel);

Assembly on 64-bit Intel:

minimodule.ko:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <try_writel>:
   0:	55                   	push   %rbp
   1:	48 89 e5             	mov    %rsp,%rbp
   4:	e8 00 00 00 00       	callq  9 <try_writel+0x9>
   9:	b8 01 00 cd ab       	mov    $0xabcd0001,%eax
   e:	89 04 25 78 56 34 12 	mov    %eax,0x12345678
  15:	89 04 25 78 56 34 12 	mov    %eax,0x12345678
  1c:	b8 02 00 cd ab       	mov    $0xabcd0002,%eax
  21:	89 04 25 78 56 34 12 	mov    %eax,0x12345678
  28:	b8 03 00 cd ab       	mov    $0xabcd0003,%eax
  2d:	89 04 25 78 56 34 12 	mov    %eax,0x12345678
  34:	0f ae f8             	sfence
  37:	b8 04 00 cd ab       	mov    $0xabcd0004,%eax
  3c:	89 04 25 78 56 34 12 	mov    %eax,0x12345678
  43:	0f ae e8             	lfence
  46:	b8 05 00 cd ab       	mov    $0xabcd0005,%eax
  4b:	89 04 25 78 56 34 12 	mov    %eax,0x12345678
  52:	b8 06 00 cd ab       	mov    $0xabcd0006,%eax
  57:	89 04 25 78 56 34 12 	mov    %eax,0x12345678
  5e:	c9                   	leaveq
  5f:	c3                   	retq

OK, so writel() just translated into a couple of inline “mov” opcodes. There’s even an optimization between the first and second write, so %eax isn’t set twice. Hi-tech, I’m telling you.

And on 32-bit Intel:

minimodule.ko:     file format elf32-i386

Disassembly of section .text:

00000000 <try_writel>:
   0:	b8 01 00 cd ab       	mov    $0xabcd0001,%eax
   5:	a3 78 56 34 12       	mov    %eax,0x12345678
   a:	a3 78 56 34 12       	mov    %eax,0x12345678
   f:	b0 02                	mov    $0x2,%al
  11:	a3 78 56 34 12       	mov    %eax,0x12345678
  16:	b0 03                	mov    $0x3,%al
  18:	a3 78 56 34 12       	mov    %eax,0x12345678
  1d:	f0 83 04 24 00       	lock addl $0x0,(%esp)
  22:	b0 04                	mov    $0x4,%al
  24:	a3 78 56 34 12       	mov    %eax,0x12345678
  29:	f0 83 04 24 00       	lock addl $0x0,(%esp)
  2e:	b0 05                	mov    $0x5,%al
  30:	a3 78 56 34 12       	mov    %eax,0x12345678
  35:	b0 06                	mov    $0x6,%al
  37:	a3 78 56 34 12       	mov    %eax,0x12345678
  3c:	f0 83 04 24 00       	lock addl $0x0,(%esp)
  41:	c3                   	ret
	...

Disassembly of section .altinstr_replacement:

00000000 <.altinstr_replacement>:
   0:	0f ae f8             	sfence
   3:	0f ae e8             	lfence
   6:	0f ae e8             	lfence
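
A side note on the 32-bit listing: only the first value is loaded with a full five-byte “mov” to %eax. After that, the compiler just replaces the lowest byte with a two-byte “mov” to %al, since the upper 24 bits stay the same. A minimal sketch of what these two encodings do, with made-up function names (the opcodes are b8 for mov imm32 to %eax and b0 for mov imm8 to %al, as they appear in the listing):

```c
#include <stdint.h>

/* "b8 id": mov imm32 to %eax. The immediate is stored little-endian. */
uint32_t mov_imm32_eax(const uint8_t insn[5])
{
	return (uint32_t)insn[1] |
	       (uint32_t)insn[2] << 8 |
	       (uint32_t)insn[3] << 16 |
	       (uint32_t)insn[4] << 24;
}

/* "b0 ib": mov imm8 to %al, which replaces only the lowest byte of %eax. */
uint32_t mov_imm8_al(uint32_t eax, uint8_t imm8)
{
	return (eax & 0xffffff00u) | imm8;
}
```

So the bytes b8 01 00 cd ab put 0xabcd0001 in %eax, and a following b0 02 turns that into 0xabcd0002.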

And for ARM, it’s exactly the same code (to the byte), since iowrite32() and writel() expand to the same thing: __iowmb() followed by __raw_writel(). But I listed it here anyhow for those who don’t take my word for it:

minimodule.o:     file format elf32-littlearm

Disassembly of section .text:

00000000 <try_writel>:
   0:	e92d4038 	push	{r3, r4, r5, lr}
   4:	f57ff04e 	dsb	st
   8:	e59f2118 	ldr	r2, [pc, #280]	; 128 <try_writel+0x128>
   c:	e1a04002 	mov	r4, r2
  10:	e5923018 	ldr	r3, [r2, #24]
  14:	e3530000 	cmp	r3, #0
  18:	0a000000 	beq	20 <try_writel+0x20>
  1c:	e12fff33 	blx	r3
  20:	e59f3104 	ldr	r3, [pc, #260]	; 12c <try_writel+0x12c>
  24:	e59f1104 	ldr	r1, [pc, #260]	; 130 <try_writel+0x130>
  28:	e5831678 	str	r1, [r3, #1656]	; 0x678
  2c:	f57ff04e 	dsb	st
  30:	e5942018 	ldr	r2, [r4, #24]
  34:	e1a05001 	mov	r5, r1
  38:	e1a04003 	mov	r4, r3
  3c:	e3520000 	cmp	r2, #0
  40:	0a000000 	beq	48 <try_writel+0x48>
  44:	e12fff32 	blx	r2
  48:	e5845678 	str	r5, [r4, #1656]	; 0x678
  4c:	f57ff04e 	dsb	st
  50:	e59f20d0 	ldr	r2, [pc, #208]	; 128 <try_writel+0x128>
  54:	e1a04002 	mov	r4, r2
  58:	e5923018 	ldr	r3, [r2, #24]
  5c:	e3530000 	cmp	r3, #0
  60:	0a000000 	beq	68 <try_writel+0x68>
  64:	e12fff33 	blx	r3
  68:	e59f30bc 	ldr	r3, [pc, #188]	; 12c <try_writel+0x12c>
  6c:	e59f20c0 	ldr	r2, [pc, #192]	; 134 <try_writel+0x134>
  70:	e5832678 	str	r2, [r3, #1656]	; 0x678
  74:	f57ff04e 	dsb	st
  78:	e5942018 	ldr	r2, [r4, #24]
  7c:	e1a04003 	mov	r4, r3
  80:	e3520000 	cmp	r2, #0
  84:	0a000000 	beq	8c <try_writel+0x8c>
  88:	e12fff32 	blx	r2
  8c:	e59f30a4 	ldr	r3, [pc, #164]	; 138 <try_writel+0x138>
  90:	e5843678 	str	r3, [r4, #1656]	; 0x678
  94:	f57ff04e 	dsb	st
  98:	e59f2088 	ldr	r2, [pc, #136]	; 128 <try_writel+0x128>
  9c:	e1a04002 	mov	r4, r2
  a0:	e5923018 	ldr	r3, [r2, #24]
  a4:	e3530000 	cmp	r3, #0
  a8:	0a000000 	beq	b0 <try_writel+0xb0>
  ac:	e12fff33 	blx	r3
  b0:	f57ff04e 	dsb	st
  b4:	e5943018 	ldr	r3, [r4, #24]
  b8:	e3530000 	cmp	r3, #0
  bc:	0a000000 	beq	c4 <try_writel+0xc4>
  c0:	e12fff33 	blx	r3
  c4:	e59f3060 	ldr	r3, [pc, #96]	; 12c <try_writel+0x12c>
  c8:	e59f206c 	ldr	r2, [pc, #108]	; 13c <try_writel+0x13c>
  cc:	e5832678 	str	r2, [r3, #1656]	; 0x678
  d0:	f57ff04f 	dsb	sy
  d4:	f57ff04e 	dsb	st
  d8:	e59f1048 	ldr	r1, [pc, #72]	; 128 <try_writel+0x128>
  dc:	e1a04003 	mov	r4, r3
  e0:	e1a05001 	mov	r5, r1
  e4:	e5912018 	ldr	r2, [r1, #24]
  e8:	e3520000 	cmp	r2, #0
  ec:	0a000000 	beq	f4 <try_writel+0xf4>
  f0:	e12fff32 	blx	r2
  f4:	e59f3044 	ldr	r3, [pc, #68]	; 140 <try_writel+0x140>
  f8:	e5843678 	str	r3, [r4, #1656]	; 0x678
  fc:	f57ff05a 	dmb	ishst
 100:	f57ff04e 	dsb	st
 104:	e5953018 	ldr	r3, [r5, #24]
 108:	e3530000 	cmp	r3, #0
 10c:	0a000000 	beq	114 <try_writel+0x114>
 110:	e12fff33 	blx	r3
 114:	e59f3010 	ldr	r3, [pc, #16]	; 12c <try_writel+0x12c>
 118:	e59f2024 	ldr	r2, [pc, #36]	; 144 <try_writel+0x144>
 11c:	e5832678 	str	r2, [r3, #1656]	; 0x678
 120:	f57ff05b 	dmb	ish
 124:	e8bd8038 	pop	{r3, r4, r5, pc}
 128:	00000000 	.word	0x00000000
 12c:	12345000 	.word	0x12345000
 130:	abcd0001 	.word	0xabcd0001
 134:	abcd0002 	.word	0xabcd0002
 138:	abcd0003 	.word	0xabcd0003
 13c:	abcd0004 	.word	0xabcd0004
 140:	abcd0005 	.word	0xabcd0005
 144:	abcd0006 	.word	0xabcd0006