This relates to my Fedora 12 machine with a Logitech M705 mouse. The pointer had a generally bad feel, I would say.
This is actually covered in this post already, with some more details in this one, but I prefer having my own routine and final values written down.
So first get a list of input devices:
$ xinput list
⎡ Virtual core pointer id=2 [master pointer (3)]
⎜ ↳ Virtual core XTEST pointer id=4 [slave pointer (2)]
⎜ ↳ Microsoft Microsoft 5-Button Mouse with IntelliEye(TM) id=6 [slave pointer (2)]
⎜ ↳ HID 04f3:0103 id=7 [slave pointer (2)]
⎜ ↳ Logitech USB Receiver id=9 [slave pointer (2)]
⎣ Virtual core keyboard id=3 [master keyboard (2)]
↳ Virtual core XTEST keyboard id=5 [slave keyboard (3)]
↳ Power Button id=12 [slave keyboard (3)]
↳ Power Button id=13 [slave keyboard (3)]
↳ USB AUDIO id=14 [slave keyboard (3)]
↳ HID 04f3:0103 id=8 [slave keyboard (3)]
↳ Logitech USB Receiver id=10 [slave keyboard (3)]
Then get the properties of the USB mouse. The string “Logitech USB Receiver” refers to a keyboard input as well as a mouse input, so it has to be disambiguated with a pointer: prefix to the identifier. Alternatively, just use the ID (not safe in a script, though).
So
$ xinput list-props 9
and
$ xinput list-props pointer:"Logitech USB Receiver"
give the same result, given the list of input devices above.
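A script that needs the numeric ID can look it up on each run instead of hardcoding it. Below is a minimal Python sketch. It assumes the `xinput list` layout shown above: a tree-drawing prefix, then the device name, `id=N`, and the role in brackets. `find_device_id` is a made-up helper name, and the listing would come from running `xinput list`:

```python
import re
from typing import Optional

# One device line after the tree-drawing prefix is stripped:
# "<name> id=<n> [<master|slave> <pointer|keyboard> (<n>)]"
DEVICE_LINE = re.compile(
    r"(?P<name>.+?)\s+id=(?P<id>\d+)\s+\[(?:master|slave)\s+(?P<kind>pointer|keyboard)"
)

def find_device_id(listing: str, name: str, kind: str = "pointer") -> Optional[int]:
    """Return the xinput ID of the device called `name` with role `kind`, or None."""
    for line in listing.splitlines():
        # Drop the box-drawing characters and arrows that draw the device tree
        m = DEVICE_LINE.match(line.lstrip("⎡⎜⎣↳ \t"))
        if m and m.group("name") == name and m.group("kind") == kind:
            return int(m.group("id"))
    return None
```

With the listing above, "Logitech USB Receiver" gives 9 as a pointer and 10 as a keyboard. This is the same ambiguity that the pointer: prefix resolves.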
The output:
$ xinput list-props pointer:"Logitech USB Receiver"
Device 'Logitech USB Receiver':
Device Enabled (131): 1
Device Accel Profile (264): 0
Device Accel Constant Deceleration (265): 1.000000
Device Accel Adaptive Deceleration (267): 1.000000
Device Accel Velocity Scaling (268): 10.000000
Evdev Reopen Attempts (269): 10
Evdev Axis Inversion (270): 0, 0
Evdev Axes Swap (272): 0
Axis Labels (273): "Rel X" (139), "Rel Y" (140)
Button Labels (274): "Button Left" (132), "Button Middle" (133), "Button Right" (134), "Button Wheel Up" (135), "Button Wheel Down" (136), "Button Horiz Wheel Left" (137), "Button Horiz Wheel Right" (138), "Button Side" (283), "Button Extra" (284), "Button Forward" (1205), "Button Back" (1206), "Button Task" (1207), "Button Unknown" (249), "Button Unknown" (249), "Button Unknown" (249), "Button Unknown" (249), "Button Unknown" (249), "Button Unknown" (249), "Button Unknown" (249), "Button Unknown" (249), "Button Unknown" (249), "Button Unknown" (249), "Button Unknown" (249), "Button Unknown" (249)
Evdev Middle Button Emulation (275): 2
Evdev Middle Button Timeout (276): 50
Evdev Wheel Emulation (277): 0
Evdev Wheel Emulation Axes (278): 0, 0, 4, 5
Evdev Wheel Emulation Inertia (279): 10
Evdev Wheel Emulation Timeout (280): 200
Evdev Wheel Emulation Button (281): 4
Evdev Drag Lock Buttons (282): 0
It turns out that the required change on my machine was
$ xinput set-prop pointer:"Logitech USB Receiver" "Device Accel Adaptive Deceleration" 3
This is not what I expected to do: it slows down the pointer’s movement when the mouse moves slowly. Surprisingly enough, this makes pointing more intuitive, because hitting that exact spot requires more physical motion, and the mouse doesn’t get stuck on millimeters.
As the said post mentions, these settings won’t survive a session restart. But that’s a rare event on my computer. Anyhow, the suggested method for making them persistent is to add a small script as a startup application. To do this, prepare a small script doing the required setup, and add it as a startup application with
$ gnome-session-properties &
Or, maybe the correct way is to add/edit ~/.xinitrc or ~/.xprofile? Will figure that out when I logout next time (happens once in a few months…).
I discovered this problem in a project that instantiated a 512-bit wide FIFO many (>16) times in different modules. For some unknown reason (it’s called a bug, I suppose), Vivado treated the instantiation as if it wasn’t there, and optimized all surrounding logic as if the black box’s output ports were all zero. With fewer instantiations, Vivado handled them as expected.
I should point out that Vivado did issue synthesis warnings related to the instantiation as if it was OK (e.g. mismatches between port widths and the wires connected to them), and yet there was no trace of these instantiations in the post-synthesis netlist view.
In Vivado, cores from the IP Catalog are treated as black box modules: the IP is typically first compiled into a DCP, and then a black-box module (an empty Verilog module, for example) represents it during the synthesis stage. The DCP is then fused into the project during implementation (like ngdbuild in ISE).
One clue that this happens takes the form of a critical warning from the implementation stage saying something like
CRITICAL WARNING: [Designutils 20-1280] Could not find module 'fifo_wide'. The XDC file /path/to/fifo_wide/fifo_wide/fifo_wide.xdc will not be read for any cell of this module.
Another way to tell that this has happened is to look in synthesis’ runme.log file (as in vivado-project/project.runs/synth_1/runme.log). The black boxes are listed in the “Report BlackBoxes” section, and each of their instantiations in “Report Cell Usage”. So if the instantiated module doesn’t appear at all in the former, or not enough times in the latter, this is a clear indication that something went wrong.
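This check can be scripted. Below is a minimal Python sketch that rests on two assumptions, both worth checking against an actual log. First, both sections are Vivado’s usual ASCII tables, with `|index |name |count|` rows between `+---+` borders. Second, uniquified copies of a black box show up in the cell usage as `name__N`. `report_table` and `blackbox_usage` are hypothetical helper names:

```python
import re

# A data row of a report table, e.g. "|1     |fifo_wide     |        20|"
ROW = re.compile(r"^\|\s*\d+\s*\|\s*(\S+)\s*\|\s*(\d+)\s*\|")

def report_table(log_text: str, title: str) -> dict[str, int]:
    """Collect {name: count} from the table following the line that starts with `title`."""
    counts: dict[str, int] = {}
    in_section = seen_table = False
    for line in log_text.splitlines():
        stripped = line.strip()
        if not in_section:
            in_section = stripped.startswith(title)
            continue
        if stripped.startswith(("+", "|")):
            seen_table = True
            m = ROW.match(stripped)  # header and border lines don't match
            if m:
                counts[m.group(1)] = counts.get(m.group(1), 0) + int(m.group(2))
        elif seen_table:
            break  # the first non-table line after the table ends the section
    return counts

def blackbox_usage(log_text: str, module: str) -> tuple[int, int]:
    """Return (instances in Report BlackBoxes, matching cells in Report Cell Usage)."""
    instances = report_table(log_text, "Report BlackBoxes").get(module, 0)
    copy = re.compile(rf"{re.escape(module)}(__\d+)?")  # assumed uniquified naming
    cells = sum(n for name, n in report_table(log_text, "Report Cell Usage").items()
                if copy.fullmatch(name))
    return instances, cells
```

A first number that is short of the intended instantiation count, or a second number of zero, would be the symptom described above.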
Workaround
After trying out a lot of things, the workaround was to define two IP cores, fifo_wide_rd and fifo_wide_wr, which are identical. The root of the problem seems to have been that the same FIFO was used in two different modules (one that writes from a DDR memory, and one that reads). Due to the different usage contexts and the huge amount of logic involved, it seems like the tools messed up trying to optimize things.
So using one core for the write module and one for the read module got the tools back on track. There is of course no sensible reason to use different cores in different modules, except for a bug in Vivado.
I should mention that another FIFO is instantiated 20 times in the design, also from two different modules, and nothing bad happened there. However, its width is only 32 bits.
Failed attempt
This solves the problem in synthesis, but not all the way through. I left it here just for reference.
The simple solution is to tell Vivado not to attempt optimizing anything related to this module. For example, if the instance name is fifo_wide_inst, the following line in any of the XDC constraints files will do:
set_property DONT_TOUCH true [get_cells -hier -filter {name=~*/fifo_wide_inst}]
This should be completely harmless, as there’s nothing to optimize anyhow — the logic is already optimized inside the DCP. It may be a good idea to do this to all instantiations, just to be sure.
What actually happened with this constraint is that many groups of twenty BUF elements (not IBUF or something like that. Just BUF), named for example ‘bbstub_dout[194]__xx’ (xx going from 1 to 20) were created in the netlist. All had non-connected inputs and the outputs of all twenty buffers connected to the same net. So obviously, nothing good came out of this. The fifo_wide_inst block was non-existent in the netlist, even though twenty instances of it appeared in the synthesis’ runme.log file.
So there were twenty groups of bbstubs for each of the 512 wires of the FIFO, and this applied for each of the twenty modules on which one of these FIFOs was instantiated. No wonder the implementation took a lot of time.
Introduction
After installing Vivado 2014.1 on my laptop running Ubuntu 14.04 (64 bits), I went for license activation. All I wanted was a plain node-locked license. Not a server, and not a floating one. Baseline.
Xilinx abandoned the good old certificate licensing in favor of activation licensing. That is causing some headaches lately…
Going through the process, I had several problems. The most commonly reported one is that when one enters the web page on which the license should be generated (see image below), the activation region is greyed out. A “helpful” message next to the greyed area suggests why this area is disabled: either a license has already been submitted based upon that particular request ID, or the page was entered directly, and not through the Vivado License Manager.
But there’s another important possible reason: the request may be invalid, in particular because the computer’s identification is missing.
If this is the case, there is no special error message for this. Just that “important information” note. See “What the problem was” below.
As a side note: Ubuntu 14.04 is not in the list of supported OS’s, but that’s what I happen to have. Besides, the problem wasn’t with the OS, it turned out.
The activation process (in brief)
It seems like the whole idea of this activation process is that the licensing file returned from Xilinx can’t be used more than once. So instead of making the licensing file valid for a computer ID, it’s made valid with respect to a request ID. Hence the licensing tools on the user’s computer first need to prepare themselves for receiving a licensing file by creating some random data and calling it a request ID. That data is conveyed to the licensing web server (Xilinx’ server, that is) along with information about the machine.
The licensing server creates a licensing file, which corresponds to the request ID, enabling the licensed features the user requested on the site. The user feeds this licensing file into the licensing tools (locally on the user’s computer), which match the request ID in their own records with the one in the licensing file. If there is a match, they make a note that the relevant features are activated. They also delete the information about that request ID from their records.
The database containing the requests and what features are enabled is kept in the “trusted area”. Which is a fine name for some obscured database.
In practice, the process goes as follows: When clicking “Connect Now”, Xilinx licensing client on your computer collects identifying information about your computer, and creates some mumbo-jumbo hex data hashes to represent that information + creates a request ID. It then stores this information in the computer’s own “trusted area” (which must be generated manually prior to this on a Linux machine) so it remembers this request when its response returns.
It then opens a web browser (it looks like it just tries running Google Chrome first and then Firefox) with a URL that contains those mumbo-jumbo hex hashes. That eventually leads to that famous licensing page. The user checks some licensing features, and a licensing file is generated (and those features are counted as “taken” on the site).
The thing is, that in order to create an activation license, the web server needs those mumbo-jumbo hashes in the URL, so it knows which request ID it works against. Also, if a request ID has already been used to create a license, it can’t be reused, because the licensing tools at the user’s side may have deleted the information about that request ID after accepting the previous licensing file.
What the problem was
The reason turned out to be that my laptop lacks a wired Ethernet NIC, and has only a wireless LAN interface. The FLEXnet license manager evidently didn’t consider wlan0 an eligible candidate for supplying the identifying MAC address (even though it’s an Ethernet card for all practical purposes), so the request that was generated for the computer was rejected.
This can be seen in the XML file that is generated by the command-line tools (see below) in the absence of any identifying method:
<UniqueMachineNumbers>
<UniqueMachineNumber><Type>1</Type><Value></Value></UniqueMachineNumber>
<UniqueMachineNumber><Type>2</Type><Value></Value></UniqueMachineNumber>
<UniqueMachineNumber><Type>4</Type><Value></Value></UniqueMachineNumber>
</UniqueMachineNumbers>
Compare this with after adding a (fake) NIC, as shown below:
<UniqueMachineNumbers>
<UniqueMachineNumber><Type>1</Type><Value></Value></UniqueMachineNumber>
<UniqueMachineNumber><Type>2</Type><Value>51692BAD76FCCBBFAA0D635F0CA3674E0F7FADBC</Value></UniqueMachineNumber>
<UniqueMachineNumber><Type>4</Type><Value></Value></UniqueMachineNumber>
</UniqueMachineNumbers>
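A request file can be checked for this condition before bothering with the web site. Below is a minimal Python sketch using the standard ElementTree module. It assumes the UniqueMachineNumber elements carry no XML namespace, and it looks for them anywhere in the file, since the real request has more content around the excerpt above. The function name is hypothetical:

```python
import xml.etree.ElementTree as ET

def unique_machine_numbers(xml_text: str) -> dict[str, str]:
    """Map each UniqueMachineNumber <Type> to its <Value> ('' when empty)."""
    root = ET.fromstring(xml_text)
    return {
        umn.findtext("Type", default="").strip(): umn.findtext("Value", default="").strip()
        for umn in root.iter("UniqueMachineNumber")
    }
```

A request in which every value is empty, as in the first excerpt, is the kind that leads to the greyed-out page.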
But these XML files aren’t really used. What counts is the URL that is used to enter Xilinx site.
Without any identifying means, it looks like this (the important part being the empty umn1, umn2 and umn4 fields):
<META HTTP-EQUIV="Refresh" CONTENT="0; URL=http://license.xilinx.com/getLicense?group=esd_oms&os=lin64&version=2014&licensetype=4&ea=&ds=&di=&hn=&umn1=&umn2=&umn4=&req_hash=297B4710327A0F933FF3382961787271D94FE8CD&uuid=961710372713387A02297B4F3F78F93D924FE8CD&isserver=0&sqn=1&trustedid=1&machine_id=E83C0C895A751459C7449FF5ABFC083849233D7A&revision=DefaultOne&revisiontype=SRV&status=OK&isvirtual=0">
And a proper URL like this:
<META HTTP-EQUIV="Refresh" CONTENT="0; URL=http://license.xilinx.com/getLicense?group=esd_oms&os=lin64&version=2014&licensetype=4&ea=&ds=&di=&hn=&umn1=&umn2=51692BAD76FCCBBFAA0D635F0CA3674E0F7FADBC&umn4=&req_hash=8BD92BFBA481BFD3CA64EF6DB30133A24CA961D5&uuid=8BDCA113ABFD64EF6D392BFBA483A24CB30961D5&isserver=0&sqn=1&trustedid=1&machine_id=483F4E15B8491F0482A56C0E253B8F9D78DCD114&revision=DefaultOne&revisiontype=SRV&status=OK&isvirtual=0">
So quite evidently, the UniqueMachineNumber elements in the XML file appear as the umn1, umn2 and umn4 CGI variables in the URL. They’re all empty strings in the URL that caused the greyed-out activation region.
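The same check can be run on the generated HTML page, which is what actually counts. Below is a minimal Python sketch using only the standard library. It assumes the page contains a META refresh line like the ones above, and it undoes `&amp;` only in case the page escapes ampersands. The helper names are hypothetical:

```python
import re
from urllib.parse import parse_qs, urlparse

def license_url(page: str) -> str:
    """Extract the redirect URL from the META refresh line of the request page."""
    m = re.search(r'URL=([^"]+)"', page)
    if not m:
        raise ValueError("no META refresh URL found")
    return m.group(1).replace("&amp;", "&")

def machine_numbers(url: str) -> dict[str, str]:
    """Return the umn1, umn2 and umn4 CGI variables ('' when empty or absent)."""
    query = parse_qs(urlparse(url).query, keep_blank_values=True)
    return {key: query.get(key, [""])[0] for key in ("umn1", "umn2", "umn4")}
```

If `any(machine_numbers(license_url(page)).values())` is false, the activation region will most likely be greyed out.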
So fake a NIC
Since the laptop really doesn’t have a wired Ethernet card, let’s fake one and assign it a MAC address:
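# /sbin/ip tuntap add dev eth0 mode tap
# /sbin/ip link set dev eth0 address 02:22:33:44:55:66
# /sbin/ip link set dev eth0 up
(pick any random MAC address, of course, but keep the least significant bit of the first octet at 0: an odd first octet, as in 11:22:33:44:55:66, denotes a multicast address, which can’t be assigned to an interface)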
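Picking a valid random address can be automated. Below is a minimal Python sketch. It clears the multicast bit and sets the “locally administered” bit in the first octet, so the result can’t collide with a vendor-assigned address. The function name is hypothetical:

```python
import random
from typing import Optional

def random_local_mac(rng: Optional[random.Random] = None) -> str:
    """Return a random unicast, locally administered MAC address, e.g. for a fake tap NIC."""
    rng = rng or random.Random()
    octets = [rng.randrange(256) for _ in range(6)]
    # Bit 0 of the first octet: 1 = multicast, which can't be assigned to an interface.
    # Bit 1 of the first octet: 1 = locally administered (not a vendor-assigned address).
    octets[0] = (octets[0] & 0xFC) | 0x02
    return ":".join(f"{octet:02x}" for octet in octets)
```

The result goes into the `ip link set dev eth0 address` command. The address has to be fixed once it’s chosen, since it becomes the computer’s identification for the license.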
The quick and dirty way to get this running on every bootup was to add it to /etc/rc.local on my machine. The more graceful way would be to create an upstart script executing on network activation. But I’ve had enough already…
By the way, I picked eth1 on my own computer, because eth0 is used by my Ethernet-over-USB device. Works the same.
If “Connect Now” does nothing
Even though Vivado started off apparently OK, the Vivado License Manager refused to open a browser window for obtaining a license on Ubuntu 14.04: I clicked “Connect Now”, but nothing happened. Installing some extra packages fixed it. It’s not clear if all of them are necessary:
# apt-get install libgnomevfs2-0 libgnome2-0
# apt-get install lib32z1 lib32ncurses5 lib32bz2-1.0
As usual, strace was used to find out that this was the problem.
Dec 2018 update: Running Vivado 2015.2 on a Mint 19 machine, I got a new error message:
/usr/lib/firefox/firefox: /path/to/Vivado/2015.2/lib/lnx64.o/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by /usr/lib/firefox/firefox)
/usr/lib/firefox/firefox: /path/to/Vivado/2015.2/lib/lnx64.o/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by /usr/lib/firefox/firefox)
Apparently, opening Firefox from within Vivado caused it to use Vivado’s C++ runtime library, which was too old for it. Simple fix:
$ cd /path/to/Vivado/2015.2/lib/lnx64.o/
$ mv libstdc++.so.6 old-libstdc++.so.6
$ ln -s /usr/lib/x86_64-linux-gnu/libstdc++.so.6
Installing FLEX license manager
This is partly documented in Xilinx’ installation guide. It has to be done once before attempting to acquire a license.
First and foremost, clean up any previous installation, in case you’ve been struggling with this for a while already. The license manager keeps its files in the directories below. Just delete them (or move them to another directory) to get a fresh start:
- /tmp/FLEXnet (empty files with UUID-like file names)
- /usr/local/share/macrovision
- /usr/local/share/FNP
- /usr/local/share/applications/.com.flexnetlicensing
- ~/.Xilinx/*.lic (in particular ~/.Xilinx/trial.lic). Not sure if this is related.
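Moving the directories aside rather than deleting them keeps a way back. Below is a minimal Python sketch, meant to be run as root. `stash_flexnet_state` and the backup location are hypothetical, and the ~/.Xilinx/*.lic files are left out, since it’s unclear whether they matter:

```python
import shutil
from pathlib import Path

FLEXNET_PATHS = [
    "/tmp/FLEXnet",
    "/usr/local/share/macrovision",
    "/usr/local/share/FNP",
    "/usr/local/share/applications/.com.flexnetlicensing",
]

def stash_flexnet_state(backup_root: str, paths=FLEXNET_PATHS) -> list[Path]:
    """Move each existing path into backup_root; return the new locations."""
    backup = Path(backup_root)
    backup.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in map(Path, paths):
        if not path.exists():
            continue
        # e.g. /usr/local/share/FNP -> <backup_root>/usr__local__share__FNP
        target = backup / "__".join(path.absolute().parts[1:])
        if target.exists():
            raise FileExistsError(f"{target} already exists; pick a fresh backup_root")
        shutil.move(str(path), str(target))
        moved.append(target)
    return moved
```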
Having this done, become root (or use sudo) and run install_fnp.sh. This is what it looked like when I did this based on what was installed along with Vivado 2014.1:
# software/xilinx/Vivado/2014.1/ids_lite/ISE/bin/lin64/install_fnp.sh ./software/xilinx/Vivado/2014.1/bin/unwrapped/lnx64.o/FNPLicensingService
Installing anchor service from ./software/xilinx/Vivado/2014.1/bin/unwrapped/lnx64.o/FNPLicensingService to /usr/local/share/FNP/service64/11.11.0
Checking system for trusted storage area...
Configuring for Linux, Trusted Storage path /usr/local/share/macrovision/storage...
Creating /usr/local/share/macrovision/storage...
Setting permissions on /usr/local/share/macrovision/storage...
Permissions set...
Checking system for Replicated Anchor area...
Configuring Replicated Anchor area...
Creating Replicated Anchor area...
Setting permissions on Replicated Anchor area...
Replicated Anchor area permissions set...
Configuring Temporary area...
Temporary area already exists...
Setting permissions on Temporary area...
Temporary area permissions set...
Configuration completed successfully.
Working with latest library tool chain
As the latest version of Vivado was 2014.4 at the time, I downloaded Vivado 2014.4’s license manager tools. The rationale was that maybe the interaction with the site had changed. In hindsight, it would probably have been OK to use 2014.1’s licensing tools, but this is how I eventually got it working.
I extracted the zipfile into ~/software/xilinx/licensing_tools/linux_flexlm_v11.11.0_201410/.
Then I went to the lnx64.o directory, ran install_fnp.sh again as root and verified that there were no pending requests:
$ ./xlicclientmgr -l
ERROR: flxActCommonInit result 2 .
Exit(2) FLEXnet initialisation error.
The reason for this error was not finding the libXLicClientMgrFNP.so library, which is in the same directory (strace saved the day again).
The quick and dirty solution is to add the current directory to the library search path (this works only if it’s done in the directory the library is in):
$ export LD_LIBRARY_PATH=$(pwd)
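The same can be done without depending on the current directory. Below is a minimal Python sketch that runs xlicclientmgr with its own directory prepended to LD_LIBRARY_PATH, rather than replacing whatever was there. The helper names are illustrative and not part of Xilinx’ tools. `tool_dir` is wherever the zipfile was extracted (its lnx64.o directory):

```python
import os
import subprocess
from pathlib import Path
from typing import Mapping, Optional

def tool_env(tool_dir: str, base: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Copy of the environment with tool_dir prepended to LD_LIBRARY_PATH."""
    env = dict(os.environ if base is None else base)
    old = env.get("LD_LIBRARY_PATH")
    env["LD_LIBRARY_PATH"] = tool_dir + (os.pathsep + old if old else "")
    return env

def xlicclientmgr(tool_dir: str, *args: str) -> subprocess.CompletedProcess:
    """Run xlicclientmgr so that it finds libXLicClientMgrFNP.so regardless of the cwd."""
    tool_dir = str(Path(tool_dir).expanduser().resolve())
    return subprocess.run([os.path.join(tool_dir, "xlicclientmgr"), *args],
                          env=tool_env(tool_dir), capture_output=True, text=True)
```

For example, `xlicclientmgr("~/software/xilinx/licensing_tools/linux_flexlm_v11.11.0_201410/lnx64.o", "-l").stdout` would give the same listing as `./xlicclientmgr -l`, if the zipfile’s directory layout is as described above.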
And then prepare a request:
$ ./xlicclientmgr -cr ~/Desktop/newrequest.xml
Request written to /home/eli/Desktop/newrequest.xml
Request (html) written to /home/eli/Desktop/newrequest.html
The tools indeed remember that they have a pending request:
$ ./xlicclientmgr -l
SeqNo Status Date Time Reference
1 Pending 2015-01-19 18:45 ""
Listed 1 of 1 composite requests.
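A script that wants to verify this, before and after processing a response, can pick the pending entries out of the listing. Below is a minimal Python sketch. It assumes the column layout of the single example above (numeric SeqNo first, then Status). The function name is hypothetical:

```python
def pending_requests(listing: str) -> list[int]:
    """Sequence numbers of requests marked Pending in `xlicclientmgr -l` output."""
    pending = []
    for line in listing.splitlines():
        fields = line.split()
        # Data rows start with the numeric SeqNo, followed by the status word.
        if len(fields) >= 2 and fields[0].isdigit() and fields[1] == "Pending":
            pending.append(int(fields[0]))
    return pending
```

An empty result corresponds to the “No stored composite requests.” state.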
Then double-clicked newrequest.html to get a license file.
With the XML file that was emailed back:
$ ./xlicclientmgr -p ~/Desktop/Xilinx_License.xml
Response processed successfully. Actions were:
Create fulfillment "215469875"
FLEXnet response dictionary:
COMMENT=This activation license file is generated on Tue Jan 20 16:26:40 UTC 2015
$ ./xlicclientmgr -l
No stored composite requests.
(but there was one listed before using the response).
Random notes playing with MuseScore 0.9.6 (pun not intended):
Installation (after grabbing the RPM file from the web):
# yum install --nogpgcheck MuseScore-0.9.6-1.fc12.x86_64.rpm
This installs the package without checking its signature. Somewhat dangerous, as the RPM could in theory contain malicious code (as if a signature helps in this case).
The command line for kicking it off is
$ mscore &
Crashes
MuseScore may enter an infinite memory hogging loop, ending up with a system global OOM and a lot of disk activity. To keep this mishap’s impact small, allow it no more than 2 GB of virtual memory, for example. It will never need nearly as much as that, and once it gets into this “all memory is mine” loop, it gets a kick in the bottom, and that’s it. So before calling mscore, go
$ ulimit -v 2048000
and possibly check it with
$ ulimit -a
Note that this limits any program running from the same shell.
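To confine the cap to MuseScore alone, the limit can be set in the child process just before it starts. In the shell, running `( ulimit -v 2048000 ; mscore ) &` in a subshell has the same effect. Below is a minimal Python sketch of the same idea using the standard resource module (Unix only). `start_mscore` is a hypothetical wrapper. Like `ulimit -v` without -S or -H, it sets both the soft and the hard limit:

```python
import resource
import subprocess
from typing import Callable

def address_space_limiter(kib: int) -> Callable[[], None]:
    """Return a preexec_fn that caps the child's virtual memory like `ulimit -v kib`."""
    def apply() -> None:
        limit = kib * 1024  # ulimit -v counts KiB, setrlimit wants bytes
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        if hard != resource.RLIM_INFINITY:
            limit = min(limit, hard)  # an unprivileged process can't raise its hard limit
        resource.setrlimit(resource.RLIMIT_AS, (limit, limit))
    return apply

def start_mscore(kib: int = 2048000) -> subprocess.Popen:
    """Start MuseScore with the cap applied to it alone, not to the calling process."""
    return subprocess.Popen(["mscore"], preexec_fn=address_space_limiter(kib))
```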
Editing notes (to self, that is)
- Selection: In no-note-writing mode, press Shift and mark an area. It’s also possible to mark a note, and shift-click the last note to select (including from the beginning to end).
- Beaming: That’s the term for connecting eighth and sixteenth notes with those lines. Look for “beam properties” in the palette to get separate notes, as commonly written in sheet music for singing.
- In the Display menu, select Play, Mixer and Synthesizer panels, to control sound and played tempo. Note that the mixer panel remains in place when closing and opening files, but it becomes dysfunctional at best after that. Just reopen the panel after reloading or such.
Hearing something
To get some audio playing, given errors like this on startup
Alsa_driver: the interface doesn't support mmap-based access.
init ALSA audio driver failed
init ALSA driver failed
init audio driver failed
sequencer init failed
go to Edit > Preferences, pick I/O, choose ALSA Audio only, and set the Device from “default” to “hw:0”.
But ehm, there’s a problem: Musescore requests exclusive access to the sound device, so if anything else happens to be producing sound when Musescore starts, it will fail to initialize its sound interface (and therefore not play anything during that session). And if it manages to grab the soundcard, all other programs attempting to play sound get stuck. This is true even when using a TCP socket to connect to the PulseAudio server.
Portaudio doesn’t make things better. To begin with, it’s a bit confusing, as the API and device entries are empty. But just select it and click “OK”, and these become populated after a restart of the program. Not graceful, but it works. Anyhow, picking the ALSA API and the hw:0,0 device (which is my sound card) gets the same result as with ALSA directly, except that I can’t control the volume with the PulseAudio controls. And the card is still grabbed exclusively, messing up other programs.
Portaudio with OSS didn’t work either, despite running mscore with padsp. No devices appeared in the list.
Loading the OSS compatible driver (modprobe snd-pcm-oss) created a /dev/dsp file indeed, but again, the sound card was exclusively taken.
My ugly solution was to find a couple of USB speakers and plug them in. And use hw:2,0 as the ALSA target in Musescore.
The elegant solution would be to create a bogus hardware card in Pulseaudio, that routes all sound to hw:0,0. I’m sure it’s possible. I’m also sure that I’ve wasted enough time on this nonsense.
Introduction
Xilinx’ 7 Series FPGAs (Virtex-7, Kintex-7, Artix-7 and Zynq-7000) offer a rather flexible frequency synthesizer, the MMCM (MMCME2 primitives), which allows steps of 0.125 when setting the VCO’s multiplier and one of its output dividers. The MMCM can be reprogrammed through its DRP interface, so it can be used as a variable-frequency clock source.
These are a few notes taken while implementing a reprogrammable frequency source using the MMCME2_ADV.
Resources
The main resource on this matter is Xilinx’ application note 888 (XAPP888) as well as the reference design that can be downloaded from Xilinx’ web site.
As of XAPP888 v1.3 (October 2014), there are a few typos:
- Table 6: PHASE_MUX_F_CLKFB is on bits [13:11] and not [15:13]
- Table 6: FRAC_WF_F_CLKFB is on bit 10 and not 12.
- Table 7: FRAC_EN is related to CLKFBOUT, and not CLKOUT0
The reference design is written in synthesizable Verilog, but the parts that calculate the values assigned to the DRP registers are written as Verilog functions, so they can’t be used as is for an arbitrary-frequency clock generator. To make things even trickier, the coding style employed in this reference design looks like a quiz in deciphering obfuscated code (or just a Verilog parody).
As a result, it’s somewhat difficult to obtain functional logic (or possibly a computer program) for setting the registers correctly for any allowed combination of parameters. The notes below may assist in getting things straight.
A sample set of DRP registers
For reference, an MMCM was implemented on a Kintex-7 device, after which the entire DRP address space was read back.
The instantiation of this MMCM was (parameters only):
MMCME2_ADV
#(.BANDWIDTH ("OPTIMIZED"),
.CLKOUT4_CASCADE ("FALSE"),
.COMPENSATION ("ZHOLD"),
.STARTUP_WAIT ("FALSE"),
.DIVCLK_DIVIDE (1),
.CLKFBOUT_MULT_F (5.125),
.CLKFBOUT_PHASE (0.000),
.CLKFBOUT_USE_FINE_PS ("FALSE"),
.CLKOUT0_DIVIDE_F (40.250),
.CLKOUT0_PHASE (0.000),
.CLKOUT0_DUTY_CYCLE (0.500),
.CLKOUT0_USE_FINE_PS ("FALSE"),
.CLKIN1_PERIOD (5.0),
.REF_JITTER1 (0.010))
// (instance name and port connections omitted)
The register map:
00: a600 0082 0003 0000 0127 9814 0041 0c40
08: 14d3 2c00 0041 0040 0041 0040 0041 0040
10: 0041 0040 0041 2440 1081 1880 1041 1041
18: 03e8 3801 bbe9 0000 0000 0210 0000 01e9
20: 0000 0000 0000 0000 0000 0000 0000 0000
28: 9900 0000 0000 0000 0000 0000 0000 0000
30: 0000 0000 0000 0000 0000 0000 0000 0000
38: 0000 0000 0000 0000 0000 0000 0000 0000
40: 0000 0000 8080 0000 0000 0800 0001 0000
48: 0000 7800 01e9 0000 0000 0000 9108 1900
(Note to self: Use “predesign” git bundle, checkout e.g. ’138358c’, run build TCL script on Vivado 2014.1 and then on PC compile and run ./dump_drp_regs)
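To compare a dump like this with the fractional divider settings, the relevant fields can be decoded. Below is a minimal Python sketch. The register addresses and bit positions are assumptions taken from XAPP888 Tables 6 and 7, with the corrections listed above:

- ClkReg1 holds HIGH_TIME in [11:6] and LOW_TIME in [5:0].
- ClkReg2 holds FRAC in [14:12], FRAC_EN in [11], FRAC_WF_R in [10] and EDGE in [7].
- PHASE_MUX_F ([13:11]) and FRAC_WF_F ([10]) are held in CLKOUT5’s ClkReg2 for CLKOUT0, and in CLKOUT6’s ClkReg2 for CLKFBOUT.

The helper names are hypothetical:

```python
def parse_dump(text: str) -> dict[int, int]:
    """Turn 'AA: wwww wwww ...' lines (hex address, eight 16-bit words) into {address: word}."""
    regs: dict[int, int] = {}
    for line in text.strip().splitlines():
        addr, words = line.split(":", 1)
        for offset, word in enumerate(words.split()):
            regs[int(addr, 16) + offset] = int(word, 16)
    return regs

def field(word: int, msb: int, lsb: int) -> int:
    """Extract bits [msb:lsb] of a register word."""
    return (word >> lsb) & ((1 << (msb - lsb + 1)) - 1)

# Assumed addresses: (ClkReg1, ClkReg2, ClkReg2 holding PHASE_MUX_F and FRAC_WF_F)
ADDRESSES = {
    "CLKOUT0": (0x08, 0x09, 0x07),   # extra bits in CLKOUT5's ClkReg2
    "CLKFBOUT": (0x14, 0x15, 0x13),  # extra bits in CLKOUT6's ClkReg2
}

def decode_fractional(regs: dict[int, int], output: str) -> dict[str, int]:
    """Decode the fractional-divider fields of CLKOUT0 or CLKFBOUT."""
    reg1, reg2, extra = (regs[a] for a in ADDRESSES[output])
    return {
        "frac_en": field(reg2, 11, 11),
        "high_time": field(reg1, 11, 6),
        "low_time": field(reg1, 5, 0),
        "edge": field(reg2, 7, 7),
        "frac": field(reg2, 14, 12),
        "phase_mux_f": field(extra, 13, 11),
        "frac_wf_r": field(reg2, 10, 10),
        "frac_wf_f": field(extra, 10, 10),
    }
```

Applied to the dump above, CLKOUT0 decodes to frac=2 (0.250 = 2/8) and CLKFBOUT to frac=1 (0.125 = 1/8), which at least agrees with the 40.250 and 5.125 in the instantiation.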
Fractional divider register settings
Two dividers in each MMCM allow a fractional division ratio: the feedback divider (CLKFBOUT_MULT_F, effectively the clock multiplier) and the divider of one output clock (CLKOUT0_DIVIDE_F).
The reference design assigns correct values to the relevant registers, but its code is exceptionally difficult to decipher.
The algorithm for calculating the registers’ values is the same for CLKFBOUT_MULT_F and CLKOUT0_DIVIDE_F. The values obtained for all registers except high_time and low_time depend only on (8x mod 16), where x is either CLKFBOUT_MULT_F or CLKOUT0_DIVIDE_F, given as the actual division ratio.
The table below gives the register values set by Vivado for division ratios from 4.000 to 5.875, in steps of 0.125. (The high_time and low_time values may appear not to agree with the ratio, but these are the actual numbers.)
ratio | frac_en | high_time | low_time | edge | frac | phase_mux_f | frac_wf_r | frac_wf_f
4.000 | 0 | 2 | 2 | 0 | 0 | 0 | 0 | 0
4.125 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0
4.250 | 1 | 1 | 1 | 0 | 2 | 1 | 1 | 1
4.375 | 1 | 1 | 1 | 0 | 3 | 1 | 1 | 1
4.500 | 1 | 1 | 1 | 0 | 4 | 2 | 1 | 1
4.625 | 1 | 1 | 1 | 0 | 5 | 2 | 1 | 1
4.750 | 1 | 1 | 1 | 0 | 6 | 3 | 1 | 1
4.875 | 1 | 1 | 1 | 0 | 7 | 3 | 1 | 1
5.000 | 0 | 2 | 3 | 1 | 0 | 0 | 0 | 0
5.125 | 1 | 2 | 1 | 1 | 1 | 4 | 0 | 1
5.250 | 1 | 2 | 2 | 1 | 2 | 5 | 0 | 0
5.375 | 1 | 2 | 2 | 1 | 3 | 5 | 0 | 0
5.500 | 1 | 2 | 2 | 1 | 4 | 6 | 0 | 0
5.625 | 1 | 2 | 2 | 1 | 5 | 6 | 0 | 0
5.750 | 1 | 2 | 2 | 1 | 6 | 7 | 0 | 0
5.875 | 1 | 2 | 2 | 1 | 7 | 7 | 0 | 0
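The table can be reproduced with a handful of integer expressions on 8x. Below is a minimal Python sketch, assuming zero phase and 50% duty cycle, as in the instantiation above. The expressions were fitted to the table, so their results for integer parts other than 4 and 5 are an untested extrapolation. The names are made up for illustration:

```python
from typing import NamedTuple

class FracRegs(NamedTuple):
    frac_en: int
    high_time: int
    low_time: int
    edge: int
    frac: int
    phase_mux_f: int
    frac_wf_r: int
    frac_wf_f: int

def frac_divider_regs(ratio: float) -> FracRegs:
    """Register fields for a CLKFBOUT_MULT_F / CLKOUT0_DIVIDE_F ratio (0 phase, 50% duty)."""
    eighths = round(ratio * 8)
    if abs(eighths - ratio * 8) > 1e-9:
        raise ValueError("ratio must be a multiple of 0.125")
    integer, frac = divmod(eighths, 8)
    half, odd = divmod(integer, 2)
    if frac == 0:
        # Integer division: split the count as evenly as possible; EDGE marks odd ratios
        return FracRegs(0, half, integer - half, odd, 0, 0, 0, 0)
    odd_and_frac = 8 * odd + frac  # this is (8 * ratio) mod 16
    return FracRegs(
        frac_en=1,
        high_time=half - int(odd_and_frac <= 8),
        low_time=half - int(odd_and_frac <= 9),
        edge=odd,
        frac=frac,
        phase_mux_f=4 * odd + frac // 2,
        frac_wf_r=int(1 <= odd_and_frac <= 8),
        frac_wf_f=int(2 <= odd_and_frac <= 9),
    )
```

Only high_time and low_time depend on the integer part beyond its parity, which matches the observation that everything else is a function of (8x mod 16).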
Loop filter and lock parameters
Depending on the feedback divider’s integer value (MULT_F in the table below), several registers, which are related to the lock detection and the loop filter, are set with values taken from a lookup table in the reference design. Comparing the values assigned by Vivado 2014.1 (again, by reading back the DRP registers) with those in the reference design for a selected set of MULT_Fs reveals a match as far as the lock detection registers are concerned. However the registers related to the digital loop filter were set to completely different values by Vivado. As there is no documentation available on these registers, it’s not clear what impact this difference has, if at all.
The following table shows the values assigned by Vivado 2014.1 for a set of MULT_F’s. The rightmost column shows the loop filter bits, in the same order as they appear in the reference design (MSB to LSB, left to right). All other columns are given in plain decimal notation.
MULT_F | LockRefDly | LockFBDly | LockCnt | LockSatHigh | UnlockCnt | Loop filter bits
4 | 11 | 11 | 1000 | 1001 | 1 | 0111011100
8 | 22 | 22 | 1000 | 1001 | 1 | 1111001100
12 | 31 | 31 | 825 | 1001 | 1 | 1101000100
16 | 31 | 31 | 625 | 1001 | 1 | 1111100100
20 | 31 | 31 | 500 | 1001 | 1 | 1100000100
24 | 31 | 31 | 400 | 1001 | 1 | 0101110000
28 | 31 | 31 | 350 | 1001 | 1 | 0011010000
32 | 31 | 31 | 300 | 1001 | 1 | 0011010000
Sporadic tests, setting these registers as if MULT_F were completely different (e.g. as if MULT_F=64 for a much lower actual setting), revealed that nothing apparent happens: no loss of lock, and no apparent difference in jitter performance (not measured, though). Also, the VCO of the tested FPGA (speed grade 2) remained firmly locked at frequencies as low as 20 MHz (using non-fractional ratios) and as high as 3000 MHz (even though the datasheet guarantees 600-1440 MHz only). This was run for several minutes on each frequency with a junction temperature of 56°C.
All in all, there’s an uncertainty regarding the loop filter parameters, but there’s reason to hope that this has no practical significance.
… go to any gitk window on the desktop, and click on it, to release it from some unexpected GUI state.
Just wanted that written down for the next time I try to select a segment in XEmacs or the gnome-terminal window, and the selection goes away as I release the mouse button.
Just a few jots on handling packages in Ubuntu. This post is a true mess.
Pinning
The bottom line seems to be not to use the Software Updater, but instead go
# apt-get upgrade
For how to prevent certain packages from being updated, see this Pinning Howto page and the Apt Preferences page, which covers the internals as well.
There’s also the manpage:
$ man apt_preferences
Repositories
The repositories known by apt-get are listed in /etc/apt/sources.list.d/ and /etc/apt/sources.list. For example, adding a repository:
# add-apt-repository ppa:vovoid/vsxu-release
Removing a repository e.g.
# add-apt-repository --remove ppa:vovoid/vsxu-release
and always do
# apt-get update
after changing the repository set. Now, you might get something like
E: The repository 'http://ppa.launchpad.net/...' is not signed.
N: Updating from such a repository can't be done securely, and is therefore disabled by default
If you want to insist on using such repository (at your own risk), go
# apt update --allow-insecure-repositories
and may the force be with you.
Among others, adding the repository above creates /etc/apt/sources.list.d/vovoid-vsxu-release-trusty.list saying
deb http://ppa.launchpad.net/vovoid/vsxu-release/ubuntu trusty main
# deb-src http://ppa.launchpad.net/vovoid/vsxu-release/ubuntu trusty main
“trusty” refers to Ubuntu 14.04, of course.
Look at this page for more info.
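To see which repositories are actually active (uncommented) after adding and removing a few, the files can be scanned. Below is a minimal Python sketch. It handles only the one-line “deb URI suite components” format shown above, with an optional [options] block; deb822-style .sources files are ignored. The names are hypothetical:

```python
import re
from pathlib import Path
from typing import Iterator, NamedTuple

# "deb [options] URI suite component..." with the [options] block optional
ENTRY = re.compile(r"^(deb|deb-src)\s+(?:\[[^\]]*\]\s+)?(\S+)\s+(\S+)\s*(.*)$")

class AptSource(NamedTuple):
    kind: str                    # "deb" or "deb-src"
    uri: str
    suite: str                   # e.g. "trusty" for Ubuntu 14.04
    components: tuple[str, ...]
    origin: str                  # file the entry came from

def active_sources(apt_dir: str = "/etc/apt") -> Iterator[AptSource]:
    """Yield the uncommented entries of sources.list and sources.list.d/*.list."""
    base = Path(apt_dir)
    files = [base / "sources.list", *sorted((base / "sources.list.d").glob("*.list"))]
    for path in files:
        if not path.is_file():
            continue
        for line in path.read_text().splitlines():
            m = ENTRY.match(line.split("#", 1)[0].strip())  # drop comments
            if m:
                kind, uri, suite, rest = m.groups()
                yield AptSource(kind, uri, suite, tuple(rest.split()), str(path))
```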
Checking what apt-get would install
# apt-get -s upgrade | less
The packages related to the Linux kernel: linux-generic linux-headers-generic linux-image-generic
It’s worth looking here regarding what “kept back” means (but the bottom line is that these packages won’t be installed).
Being open to suggestions
Kodi, for example, has a lot of “side packages” that are good to install along. This is how to tell apt-get to grab them as well:
# apt-get install --install-suggests kodi
Pinning with dpkg
This doesn’t work with apt-get nor Automatic Updater, following this and this web pages:
List all packages
$ dpkg -l
Wildcards can be used to find specific packages. For example, those related to the current kernel:
$ dpkg -l "*$(uname -r)*"
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-=============================================-===========================-===========================-===============================================================================================
ii linux-headers-3.13.0-35-generic 3.13.0-35.62 amd64 Linux kernel headers for version 3.13.0 on 64 bit x86 SMP
ii linux-image-3.13.0-35-generic 3.13.0-35.62 amd64 Linux kernel image for version 3.13.0 on 64 bit x86 SMP
ii linux-image-extra-3.13.0-35-generic 3.13.0-35.62 amd64 Linux kernel extra modules for version 3.13.0 on 64 bit x86 SMP
Or, to get just the package names:
$ dpkg -l | awk '{ print $2; }' | grep "$(uname -r)"
Pinning a package
Aug 2019 update: Maybe with apt-mark? Haven’t tried that yet.
In order to prevent a certain package from being updated, use the “hold” setting for the package. For example, to hold the packages related to the running kernel (all three of them) automatically, as root:
# dpkg -l | awk '{ print $2; }' | grep "$(uname -r)" | while read i ; do echo $i hold ; done | dpkg --set-selections
After this, the listing of these packages is:
$ dpkg -l "*$(uname -r)*"
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-=============================================-===========================-===========================-===============================================================================================
hi linux-headers-3.13.0-35-generic 3.13.0-35.62 amd64 Linux kernel headers for version 3.13.0 on 64 bit x86 SMP
hi linux-image-3.13.0-35-generic 3.13.0-35.62 amd64 Linux kernel image for version 3.13.0 on 64 bit x86 SMP
hi linux-image-extra-3.13.0-35-generic 3.13.0-35.62 amd64 Linux kernel extra modules for version 3.13.0 on 64 bit x86 SMP
Indeed, the “h” indicates that the packages are held. To revert this, use “install” instead of “hold” in the input to dpkg --set-selections above.
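For reference, the hold pipeline takes the second column of dpkg -l (the package name), keeps the names that contain the running kernel’s release string, and turns each one into a “name hold” line for dpkg --set-selections. A minimal C sketch of that per-line step, assuming the dpkg -l layout shown above; hold_line() is a made-up name, and unlike grep it does a plain substring match rather than a regex match:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: given one line of "dpkg -l" output and a kernel
 * release string (what "uname -r" prints), write "<package> hold" into out,
 * mimicking awk '{ print $2; }' | grep ... | echo $i hold.
 * Returns 1 if a selection line was produced, 0 otherwise.
 */
int hold_line(const char *line, const char *release, char *out, size_t outsize)
{
    char status[64], name[256];

    assert(line != NULL && release != NULL && out != NULL);

    /* Whitespace-separated fields, like awk's $1 and $2 */
    if (sscanf(line, "%63s %255s", status, name) != 2)
        return 0;

    /* Plain substring match ('.' is not a wildcard here, unlike grep) */
    if (strstr(name, release) == NULL)
        return 0;

    snprintf(out, outsize, "%s hold", name);
    return 1;
}
```

In a real program, the release string would come from uname(2): the release field of struct utsname is what uname -r prints.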
Which package provides file X?
Following this page, install apt-file (simply with apt-get install apt-file), go “apt-file update” once and then go something like (not necessarily as root):
$ apt-file find libgnome-2.so
Note that the pattern can be a substring (as in the example above).
What files does package X generate?
$ dpkg -L libpulse-dev
Installing a deb file locally
# dpkg -i thepackage.deb
If there are failed dependencies, fix them with apt-get subsequently:
# apt-get -f install
and if it says that it wants to remove the package you tried to install, go
# apt-get install -f --fix-missing
That will probably not help directly, but odds are apt-get will at least explain why it wants to kick out the package.
To make apt ignore a failed post-installation script, consider this post.
Extracting the files from a .deb package
This can be used for running more than one version of Google Chrome on a computer. See this post for a few more words on this.
Extract the .deb package:
$ ar x google-chrome-stable_current_amd64.deb
Note that the files go into the current directory (yuck).
Extract the package’s files:
$ mkdir files
$ cd files
$ tar -xJf ../data.tar.xz
Extract the installation scripts:
$ cd ..
$ mkdir scripts
$ cd scripts/
$ tar -xJf ../control.tar.xz
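ar x works here because a .deb file is a plain ar archive: an 8-byte “!<arch>\n” signature, followed by members (debian-binary, control.tar.*, data.tar.*), each with a 60-byte text header (name at offset 0, decimal size at offset 48, a “`\n” terminator at offset 58) and data padded to an even length. A minimal C sketch that lists the members of an archive already read into memory, assuming that common ar format; ar_members() is a hypothetical name:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define AR_MAGIC   "!<arch>\n"
#define AR_HDR_LEN 60

/*
 * Walk an in-memory ar archive (e.g. a .deb file) and record the name and
 * size of each member, up to max members. Returns the number of members
 * found, or -1 if the buffer doesn't look like an ar archive.
 */
int ar_members(const unsigned char *buf, size_t len,
               char names[][17], long sizes[], int max)
{
    size_t pos = 8;
    int n = 0;

    assert(buf != NULL);
    if (len < 8 || memcmp(buf, AR_MAGIC, 8) != 0)
        return -1;

    while (pos + AR_HDR_LEN <= len && n < max) {
        const unsigned char *hdr = buf + pos;
        char field[11];
        int i;

        /* Every member header ends with "`\n" */
        if (hdr[58] != '`' || hdr[59] != '\n')
            return -1;

        /* Name: 16 bytes, space-padded; GNU ar may append a '/' */
        memcpy(names[n], hdr, 16);
        names[n][16] = '\0';
        for (i = 15; i >= 0 && (names[n][i] == ' ' || names[n][i] == '/'); i--)
            names[n][i] = '\0';

        /* Size: 10 bytes of decimal ASCII at offset 48 */
        memcpy(field, hdr + 48, 10);
        field[10] = '\0';
        sizes[n] = strtol(field, NULL, 10);
        if (sizes[n] < 0)
            return -1;

        pos += AR_HDR_LEN + (size_t)sizes[n];
        pos += pos & 1;  /* member data is padded to an even offset */
        n++;
    }
    return n;
}
```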
A word on repositories
Say that we have a line like this in /etc/apt/sources.list:
deb http://archive.ubuntu.com/ubuntu xenial main universe updates restricted security backports
It tells apt-get update to go to http://archive.ubuntu.com/ubuntu/dists/ and look into xenial/main for the “main” part, xenial/universe for the “universe” part, but e.g. xenial-updates/ for the “updates”. This site helps with a better understanding of how a sources.list file is set up.
If we look at e.g. ubuntu/dists/xenial/main/, there’s a binary-amd64/ subdirectory for the amd64 platforms (64-bit Intel/AMD). That’s where the Packages.gz and Packages.xz files are found. These list the packages available in the repositories, but even more important: where to find them.
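So the location of a binary package index follows directly from the repository root, the suite, the component and the architecture: <root>/dists/<suite>/<component>/binary-<arch>/Packages.xz. A small C sketch of that composition, assuming the layout described above (the function name is hypothetical, and apt may fetch another compressed variant, such as Packages.gz, instead):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: compose the URL of a Packages.xz index from a
 * repository root, a suite (e.g. "xenial"), a component (e.g. "main")
 * and an architecture (e.g. "amd64"). A trailing '/' on the root is
 * ignored. Returns the snprintf() result, so a value >= outsize means
 * the URL was truncated.
 */
int packages_index_url(char *out, size_t outsize, const char *root,
                       const char *suite, const char *component,
                       const char *arch)
{
    size_t rootlen;

    assert(out && root && suite && component && arch);
    rootlen = strlen(root);
    while (rootlen > 0 && root[rootlen - 1] == '/')
        rootlen--;

    return snprintf(out, outsize, "%.*s/dists/%s/%s/binary-%s/Packages.xz",
                    (int)rootlen, root, suite, component, arch);
}
```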
For example, the entry for the “adduser” package looks like this:
Package: adduser
Priority: required
Section: admin
Installed-Size: 648
Maintainer: Ubuntu Core Developers <ubuntu-devel-discuss@lists.ubuntu.com>
Original-Maintainer: Debian Adduser Developers <adduser-devel@lists.alioth.debian.org>
Architecture: all
Version: 3.113+nmu3ubuntu4
Replaces: manpages-it (<< 0.3.4-2), manpages-pl (<= 20051117-1)
Depends: perl-base (>= 5.6.0), passwd (>= 1:4.1.5.1-1.1ubuntu6), debconf | debconf-2.0
Suggests: liblocale-gettext-perl, perl-modules, ecryptfs-utils (>= 67-1)
Filename: pool/main/a/adduser/adduser_3.113+nmu3ubuntu4_all.deb
Size: 161698
MD5sum: 36f79d952ced9bde3359b63cf9cf44fb
SHA1: 6a5b8f58e33d5c9a25f79c6da80a64bf104e6268
SHA256: ca6c86cb229082cc22874ed320eac8d128cc91f086fe5687946e7d05758516a3
Description: add and remove users and groups
Multi-Arch: foreign
Homepage: http://alioth.debian.org/projects/adduser/
Description-md5: 7965b5cd83972a254552a570bcd32c93
Bugs: https://bugs.launchpad.net/ubuntu/+filebug
Origin: Ubuntu
Supported: 5y
Task: minimal
As is evident, this entry contains dependency information, but most important: It points at where the package can be downloaded from: pool/main/a/adduser/adduser_3.113+nmu3ubuntu4_all.deb in this case, which actually points to http://archive.ubuntu.com/ubuntu/pool/main/a/adduser/adduser_3.113+nmu3ubuntu4_all.deb.
Note that the URL’s base is the repository’s root, and not necessarily the root of the domain. The Packages file contains the SHA1 sum of the .deb file, and its own SHA1 sum is in turn listed in http://archive.ubuntu.com/ubuntu/dists/xenial/InRelease, which also contains a PGP signature.
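Finding a package’s download URL thus boils down to picking the Filename field out of its entry and appending it to the repository root. A minimal C sketch over a single entry held in memory, assuming one “Field: value” per line as in the adduser example (multi-line fields aren’t handled, and the function name is hypothetical):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: look up the "Filename:" field in one Packages
 * entry and write <root>/<Filename> into out.
 * Returns 1 on success, 0 if the entry has no Filename field.
 */
int package_url(const char *stanza, const char *root, char *out, size_t outsize)
{
    const char *key = "Filename: ";
    size_t keylen = strlen(key);
    const char *p = stanza;

    assert(stanza != NULL && root != NULL && out != NULL);
    while (p && *p) {
        /* Only match the key at the beginning of a line */
        if (strncmp(p, key, keylen) == 0) {
            const char *val = p + keylen;
            size_t vlen = strcspn(val, "\n");

            snprintf(out, outsize, "%s/%.*s", root, (int)vlen, val);
            return 1;
        }
        p = strchr(p, '\n');  /* advance to the next line */
        if (p)
            p++;
    }
    return 0;
}
```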
The various “Contents” files (e.g. dists/xenial/Contents-amd64.gz) seem to contain a list of files, and the packages they belong to. Probably for use by apt-file.
These are a few jots about constraining in Vivado. With no particular common topic, and in no particular order. Note that I have another post on similar topics.
Setting the default IOSTANDARD for all ports
In a design where all ports have the same IOSTANDARD, it’s daunting to set them for all. So if there’s just one exception, one can go
set_property IOSTANDARD LVCMOS33 [get_ports -filter { LOC =~ IOB_* } ]
set_property IOSTANDARD LVDS_25 [get_ports clk_100_p]
It’s important to do this after the placement constraints of the ports, because the LOC property is set only in conjunction with setting package_pin. Filtering based upon LOC is required to keep MGT ports out of the get_ports result. Otherwise the set_property command fails altogether, which not only yields a critical warning, but also leaves all IOSTANDARDs unset.
Tell the truth about your heat sink
Vivado makes an estimation of the junction temperature, based upon its power estimation. That’s the figure that you want to keep below 85°C (if you’re using a commercial temperature version of the FPGA).
With all my reservations on the accuracy of the power estimation, and hence the temperature it calculates based upon it, it makes sense to tell Vivado about the chosen cooling solution. Otherwise, it assumes a heatsink with a healthy fan above it. So if you like to live on the edge, like me, and work without a noisy fan, these two lines in the XDC file tell Vivado to adjust the effective Junction-to-Air thermal resistance (Θ_JA) accordingly.
set_operating_conditions -airflow 0
set_operating_conditions -heatsink low
It’s also possible to set Θ_JA explicitly with -thetaja. Try “help set_operating_conditions” at the Tcl prompt for a list of options.
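The relation behind these settings is the usual first-order thermal model: the junction sits above the ambient (air) temperature T_A by the dissipated power P times Θ_JA. Less airflow and a smaller heatsink mean a higher Θ_JA, and hence a higher predicted T_J for the same power estimate:

```latex
T_J = T_A + P \cdot \Theta_{JA}
```

This model treats the air as the only path for heat, which is exactly where it falls short when the board itself is part of the picture.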
Frankly speaking, the predicted junction temperature stated in the power report is probably rubbish anyhow, even if the power estimation is accurate. The reason is that there’s a low thermal resistance towards the board: If the board stays at 25°C, the junction temperature will be lower than predicted. On the other hand, if the board heats up from adjacent components, a higher temperature will be measured. In a way, the FPGA serves as a cooling path from the board to the air. With extra power flowing through this path, the temperature rises all along it.
For example, the temperature I got with the setting above on a KC705, with the fan taken off, was significantly higher (~60°C) than Vivado’s prediction (44°C) on a design that had little uncertainty (90% of the estimated power was covered by static power and GTXs with a fixed rate — there was almost no logic in the design). The junction temperature was measured through JTAG from the Hardware Manager.
So the only figure that really counts is the junction temperature measured after 15 minutes or so of operation.
Search patterns for finding elements
Pattern matching in Vivado is slash-sensitive. For example,
foreach x [get_pins "*/*/a*"] { puts $x }
prints the pins three hierarchy levels down whose names begin with “a”, but “a*” alone matches only pins on the top level.
The “foreach” is given here to demonstrate loops. It’s actually easier to go
join [get_pins "*/*/a*"] "\n"
To make “*” match any character, “/” included, it’s possible (yet not such a good idea) to use UCF-style matching, e.g.
foreach x [get_pins -match_style ucf "*/fifo*"] { puts $x }
or a more relevant example
get_pins -match_style ucf */PS7_i/FCLKCLK[1]
The better way is to forget about the old UCF file format. The Vivado way to allow “*” to match any character, including a slash, is filters:
set_property LOC GTXE2_CHANNEL_X0Y8 [get_cells -hier -filter {name=~*/gt0_mygtx_i/gtxe2_i}]
Another important concept in Vivado is the “-of” flag, which allows finding all nets connected to a cell, all cells connected to a net etc.
For example,
get_nets -of [get_cells -hier -filter {name=~*/gt_top_i/phy_rdy_n_int_reg}]
Group clocks instead of a lot of false paths
Unlike ISE, Vivado assumes that all clocks are “related”. If two clocks come from sources that the tools have no reason to consider common, ISE treats all paths between the two clock domains as false paths. Vivado, on the other hand, assumes that these paths are real, and will probably end up with an extreme constraint, take ages in the attempt to meet timing, and then fail timing, of course.
Even in reference designs, this is handled by issuing false paths between each pair of unrelated clocks (two false path statements for each pair, usually). This is messy, often with complicated expressions appearing twice. And a lot of issues every time a new clock is introduced.
The clean way is to group the clocks. Each group contains all clocks that are considered related. Paths inside a group are constrained. Paths between groups are false. Simple and intuitive.
set_clock_groups -asynchronous \
-group [list \
[get_clocks -include_generated_clocks -of_objects [get_pins -hier -filter {name=~*gt0_mygtx_i*gtxe2_i*TXOUTCLK}]] \
[get_clocks -include_generated_clocks "gt0_txusrclk_i"]] \
-group [get_clocks -include_generated_clocks "drpclk_in_i"] \
-group [list \
[get_clocks -include_generated_clocks "sys_clk"] \
[get_clocks -include_generated_clocks -of_objects [get_pins -hier -filter {name=~*/pipe_clock/pipe_clock/mmcm_i/*}]]]
In the example above, three clock groups are declared.
As a group often consists of several clocks, each requiring a tangled expression to pin down, it may come in handy to define a list of clocks with the Tcl “list” command, as shown above.
Another thing to note is that clocks can be obtained as all clocks connected to a certain MMCM or PLL, as shown above, with -of_objects. To keep the definitions short, it’s possible to use create_generated_clock to name clocks that can be found in certain parts of the design (create_clock is applied to external pins only).
If a clock is accidentally not included in this statement, don’t worry: Vivado will assume valid paths for all clock domain crossings involving it, and it will probably take a place of honor in the timing report.
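The rules described above fit in a few lines: each clock gets a group number (or none), a path is timed when both ends are in the same group or when either clock isn’t mentioned in the command, and it’s a false path otherwise. The sketch also counts the set_false_path statements that the pairwise approach needs, at two per pair of unrelated clocks. It models the semantics as described in this post, not Vivado’s implementation, and all names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

#define NO_GROUP (-1)  /* clock not mentioned in set_clock_groups */

/* A path between two clocks is timed unless the clocks belong to
   different groups of the same -asynchronous set_clock_groups command. */
bool path_is_timed(int src_group, int dst_group)
{
    if (src_group == NO_GROUP || dst_group == NO_GROUP)
        return true;  /* ungrouped clocks stay related to everything */
    return src_group == dst_group;
}

/* Number of set_false_path statements needed to make n mutually
   unrelated clocks (or groups) false, with two statements per pair. */
unsigned long pairwise_false_paths(unsigned long n)
{
    assert(n < 65536);   /* keeps n * (n - 1) far from overflow */
    return n * (n - 1);  /* n(n-1)/2 pairs, one statement per direction */
}
```

For the three groups in the example above, that’s six set_false_path statements (each repeating a clock expression) replaced by a single set_clock_groups command.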
Finally, it’s often desired to tell Vivado to consider clocks that are created by an MMCM / PLL as independent. If a Clock Wizard IP was used, it boils down to something as simple as this:
set_clock_groups -asynchronous \
-group [get_clocks -include_generated_clocks -of_objects [get_pins -hier -filter {name=~*clk_gen_ins/clk_in1}]] \
-group [get_clocks -include_generated_clocks -of_objects [get_pins -hier -filter {name=~*clk_gen_ins/clk_out1}]]
which simply says “the input and output clocks of the clock module are independent”. This can be expanded to more outputs, of course.
Telling the tools what the BUFGCE/BUFGMUX is set to
Suppose a segment like this:
BUFGCE clkout1_buf
(.O (slow_clk),
.CE (seq_reg1[7]),
.I (clkout1));
To tell the tools that the timing analysis should be made with the assumption that BUFGCE is enabled,
set_case_analysis 1 [get_pins -hier -filter {name=~*/clkout1_buf/CE0}]
set_case_analysis 1 [get_pins -hier -filter {name=~*/clkout1_buf/S0}]
The truth is that it’s redundant in this case, as the tools assume that CE=1. But this is the syntax anyhow.
Constant clock? Who? Where? Why?
One of the things to verify before being happy with a design’s timing (a.k.a. “signing off timing”), according to UG906 (Design Analysis and Closure Techniques) is that there are no constant clocks nor unconstrained internal endpoints. But hey, what if there are? Like, when running “Report Timing Summary”, under “Check Timing” the number for “constant clock” is far from zero. And the timing summary says this:
2. checking constant clock
--------------------------
There are 2574 register/latch pins with constant_clock. (MEDIUM)
3. checking pulse_width_clock
-----------------------------
There are 0 register/latch pins which need pulse_width check
4. checking unconstrained_internal_endpoints
--------------------------------------------
There are 0 pins that are not constrained for maximum delay.
There are 5824 pins that are not constrained for maximum delay due to constant clock. (MEDIUM)
Ehm. So which clock caused this, and what are the endpoints involved? It’s actually simple to get that information. Just go
check_timing -verbose -file my_timing_report.txt
on the Tcl prompt, and read the file. The registers and endpoints are listed in the output file.
Floorplanning (placement constraints for logic)
The name of the game is Pblocks. Didn’t dive much into the semantics, but used the GUI’s Tools > Floorplanning menus to create a Pblock and auto-place it. Then saved the constraints, and manipulated the Tcl commands manually (i.e. the get_cells command and the choice of slices).
create_pblock pblock_registers_ins
add_cells_to_pblock [get_pblocks pblock_registers_ins] [get_cells -quiet -hierarchical -filter { NAME =~ "registers_ins/*" && PRIMITIVE_TYPE =~ FLOP_LATCH.*.* && NAME !~ "registers_ins/fifo_*" }]
resize_pblock [get_pblocks pblock_registers_ins] -add {SLICE_X0Y0:SLICE_X151Y99}
The snippet above places all flip-flops (that is, registers) of a certain module, except for those belonging to a couple of submodules (excluded by the second NAME filter) to the bottom area of a 7V330T. The constraint is non-exclusive (other logic is allowed in the region as well).
The desired slice region was found by hovering with the mouse over a zoomed in chip view of an implemented design.
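The -add range of resize_pblock is a rectangle given by two corner sites, so SLICE_X0Y0:SLICE_X151Y99 spans columns X0 to X151 and rows Y0 to Y99. A quick C sketch of that arithmetic, assuming both corners are slice sites named SLICE_X<n>Y<m> and that every coordinate in the rectangle holds a slice (the function name is made up):

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse a range like "SLICE_X0Y0:SLICE_X151Y99" and return the number of
 * (X, Y) coordinates it covers, or -1 if the string doesn't parse.
 * Assumes every coordinate in the rectangle holds a slice site.
 */
long slice_range_count(const char *range)
{
    int x0, y0, x1, y1;

    assert(range != NULL);
    if (sscanf(range, "SLICE_X%dY%d:SLICE_X%dY%d", &x0, &y0, &x1, &y1) != 4)
        return -1;

    /* Corners may be given in either order */
    return (long)(abs(x1 - x0) + 1) * (abs(y1 - y0) + 1);
}
```

For the constraint above, that’s 152 × 100 = 15200 slice coordinates.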
The tools obeyed this constraint strictly, even with post-route optimization, so it’s important not to shoot yourself in the foot when using this for timing improvement (in my case it worked).
To see how the logic is spread out, use the “highlight leaf cells” option when right-clicking a hierarchy in the netlist view to the left of a chip view. Or even better, use Tcl commands on the console:
unhighlight_objects
highlight_objects -color red [get_cells -hierarchical -filter { NAME =~ "registers_ins/*" && PRIMITIVE_TYPE =~ FLOP_LATCH.*.* && NAME !~ "registers_ins/fifo_*" }]
The first command removes existing highlight. There’s an -rgb flag too for selecting the exact color.
There’s also show_objects -name mysearch [ get_cells ... ], which is how the GUI’s “find” operation creates those lists of elements for inspection.
When Windows 7 or Windows 8 starts to behave weirdly, this is the general-purpose command that can save your day (in the Command Prompt):
sfc /scannow
It scans all system files and fixes whatever looks bad. In my case, it started off as a “Limited” wireless connection on a laptop (after it had been fine for a year), which turned out to be the lack of a DHCP request, and ended up with the understanding that the DHCP request service can’t be started because some “element” was missing. Now go fix that manually.
The scan took some 30 minutes, but after the reboot, all was fine again.
For more of my war stories, click here.
Introduction
Needing to remove superfluous memory barriers from a Linux kernel device driver, I wondered what they actually do. The issue is discussed in painful detail in Documentation/memory-barriers.txt, but somehow it’s quite difficult to figure out if they’re really needed, and where. Most drivers rely on consecutive iowrite32()’s (or writel()’s) to arrive at the hardware in the same order they appear in the code, and this is backed up by the following clause in memory-barriers.txt:
Inside of the Linux kernel, I/O should be done through the appropriate accessor routines – such as inb() or writel() – which know how to make such accesses appropriately sequential. Whilst this, for the most part, renders the explicit use of memory barriers unnecessary, there are a couple of situations where they might be needed:
- On some systems, I/O stores are not strongly ordered across all CPUs, and so for _all_ general drivers locks should be used and mmiowb() must be issued prior to unlocking the critical section.
- If the accessor functions are used to refer to an I/O memory window with relaxed memory access properties, then _mandatory_ memory barriers are required to enforce ordering.
See Documentation/DocBook/deviceiobook.tmpl for more information.
So what they’re saying is that a memory barrier should be used before releasing a lock (spinlock? mutex? both? The examples show only a spinlock) and when prefetching is allowed by hardware.
Nice. Are they doing anything?
April 2020 update: I’ve written a new post on a similar topic. Also, on top of memory-barriers.txt mentioned above, there are some excellent explanations in the kernel tree’s tools/memory-model/Documentation/explanation.txt and tools/memory-model/Documentation/recipes.txt. They are relatively new (from v4.17, beginning of 2018).
May 2021 update: I’ve also written the parallel post for Windows device driver coding, which occasionally brings up Linux.
The practical take
Since I care most about x86 and ARM, I decided to figure out what the memory barriers actually do. The driver’s code should be formally correct, but in the end, if I remove a memory barrier and then test the driver — have I really made a difference? Have I really tested anything?
Ah, and in case you wonder why I didn’t check ioread32() and readl(): I don’t use them in my driver. Odd as it may sound.
The kernel sources in this post are ~3.12, but how often does anyone dare to touch those basic functions?
Spoiler
For the lazy ones, here are my conclusions:
- On x86 platforms, iowrite32() and writel() are translated to just a “mov” into memory.
- On ARM, the same functions translate into a full write synchronization barrier (stop execution until all previous writes are done), and then an “str” into memory.
- On x86_64, the following functions translate into nothing: mmiowb(), smp_wmb() and smp_rmb(). wmb() and rmb() translate into “sfence” and “lfence” respectively.
- On ARM, mmiowb() translates into nothing. The other barriers translate into sensible opcodes.
Trying memory barriers with iowrite32()
I wrote the following kernel module as minimodule.c. Obviously, it won’t do anything good except for being disassembled after compilation.
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/io.h>
void try_iowrite32(void) {
void __iomem *p = (void *) 0x12345678;
iowrite32(0xabcd0001, p);
iowrite32(0xabcd0001, p);
iowrite32(0xabcd0002, p);
mmiowb();
iowrite32(0xabcd0003, p);
wmb();
iowrite32(0xabcd0004, p);
rmb();
iowrite32(0xabcd0005, p);
smp_wmb();
iowrite32(0xabcd0006, p);
smp_rmb();
}
EXPORT_SYMBOL(try_iowrite32);
The idea: First repeat exactly the same write to see how that’s handled, and then add barriers to see what they turn into.
The related sources for iowrite32() on x86
I have to admit that I was surprised to find out that iowrite32() is a function in itself, as is shown later in the disassembly. My best understanding was that it’s just an alias for writel(), by virtue of a define statement. But since CONFIG_GENERIC_IOMAP is defined on my kernel, it’s not defined in include/asm-generic/io.h; there’s just a declaration for it in include/asm-generic/iomap.h. It’s defined as a function in lib/iomap.c as follows:
void iowrite32(u32 val, void __iomem *addr)
{
IO_COND(addr, outl(val,port), writel(val, addr));
}
where IO_COND is previously defined in the same file as follows (the comment is in the sources):
/*
* Ugly macros are a way of life.
*/
#define IO_COND(addr, is_pio, is_mmio) do { \
unsigned long port = (unsigned long __force)addr; \
if (port >= PIO_RESERVED) { \
is_mmio; \
} else if (port > PIO_OFFSET) { \
port &= PIO_MASK; \
is_pio; \
} else \
bad_io_access(port, #is_pio ); \
} while (0)
So there we have it. iowrite32() isn’t just an alias for writel(), but it checks the address and interprets it as port I/O if that makes sense.
To be sure, iowrite32() was disassembled as follows from the kernel’s object code (32-bit version):
0020f79f <iowrite32>:
20f79f: 81 fa ff ff 03 00 cmp $0x3ffff,%edx
20f7a5: 89 d1 mov %edx,%ecx
20f7a7: 76 03 jbe 20f7ac <iowrite32+0xd>
20f7a9: 89 02 mov %eax,(%edx)
20f7ab: c3 ret
20f7ac: 81 fa 00 00 01 00 cmp $0x10000,%edx
20f7b2: 76 08 jbe 20f7bc <iowrite32+0x1d>
20f7b4: 81 e2 ff ff 00 00 and $0xffff,%edx
20f7ba: ef out %eax,(%dx)
20f7bb: c3 ret
20f7bc: ba f2 56 03 00 mov $0x356f2,%edx
20f7c1: 89 c8 mov %ecx,%eax
20f7c3: e9 41 fe ff ff jmp 20f609 <bad_io_access>
Results on x86_64
Compiled on Intel x86/64 bit:
$ objdump -d minimodule.ko
minimodule.ko: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <try_iowrite32>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: e8 00 00 00 00 callq 9 <try_iowrite32+0x9>
9: be 78 56 34 12 mov $0x12345678,%esi
e: bf 01 00 cd ab mov $0xabcd0001,%edi
13: e8 00 00 00 00 callq 18 <try_iowrite32+0x18>
18: be 78 56 34 12 mov $0x12345678,%esi
1d: bf 01 00 cd ab mov $0xabcd0001,%edi
22: e8 00 00 00 00 callq 27 <try_iowrite32+0x27>
27: be 78 56 34 12 mov $0x12345678,%esi
2c: bf 02 00 cd ab mov $0xabcd0002,%edi
31: e8 00 00 00 00 callq 36 <try_iowrite32+0x36>
36: be 78 56 34 12 mov $0x12345678,%esi
3b: bf 03 00 cd ab mov $0xabcd0003,%edi
40: e8 00 00 00 00 callq 45 <try_iowrite32+0x45>
45: 0f ae f8 sfence
48: be 78 56 34 12 mov $0x12345678,%esi
4d: bf 04 00 cd ab mov $0xabcd0004,%edi
52: e8 00 00 00 00 callq 57 <try_iowrite32+0x57>
57: 0f ae e8 lfence
5a: be 78 56 34 12 mov $0x12345678,%esi
5f: bf 05 00 cd ab mov $0xabcd0005,%edi
64: e8 00 00 00 00 callq 69 <try_iowrite32+0x69>
69: be 78 56 34 12 mov $0x12345678,%esi
6e: bf 06 00 cd ab mov $0xabcd0006,%edi
73: e8 00 00 00 00 callq 78 <try_iowrite32+0x78>
78: c9 leaveq
79: c3 retq
...
Those “callq” statements are modified upon linking. To resolve what these are calling, go
$ readelf -r minimodule.ko
Relocation section '.rela.text' at offset 0xa9b0 contains 8 entries:
Offset Info Type Sym. Value Sym. Name + Addend
000000000005 002300000002 R_X86_64_PC32 0000000000000000 mcount - 4
000000000014 002000000002 R_X86_64_PC32 0000000000000000 iowrite32 - 4
000000000023 002000000002 R_X86_64_PC32 0000000000000000 iowrite32 - 4
000000000032 002000000002 R_X86_64_PC32 0000000000000000 iowrite32 - 4
000000000041 002000000002 R_X86_64_PC32 0000000000000000 iowrite32 - 4
000000000053 002000000002 R_X86_64_PC32 0000000000000000 iowrite32 - 4
000000000065 002000000002 R_X86_64_PC32 0000000000000000 iowrite32 - 4
000000000074 002000000002 R_X86_64_PC32 0000000000000000 iowrite32 - 4
(the output continues with relocation information for debug variables).
It’s quite easy to work this out: The “Offset” column tells us the offset in the object code. For example, a callq statement begins at 0x13, but the address to call starts at 0x14. The second entry in the relocation section points at offset 0x14, and says that the target is iowrite32().
So from this output we learn that all callq’s are to iowrite32(), except the first one, which goes to mcount() (which is intended for kernel call tracing).
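For the record, the value an R_X86_64_PC32 relocation stores is S + A - P: the symbol’s address, plus the addend (-4 here), minus the address of the 32-bit field being patched. Since the CPU adds a call’s displacement to the address of the next instruction (the field’s address plus 4), the -4 addend makes the call land exactly on the symbol. A small C sketch of that arithmetic, with made-up addresses in the test:

```c
#include <assert.h>
#include <stdint.h>

/* Value stored in the 32-bit field for R_X86_64_PC32: S + A - P */
int32_t pc32_value(uint64_t sym_addr, int64_t addend, uint64_t field_addr)
{
    int64_t v = (int64_t)sym_addr - (int64_t)field_addr + addend;

    assert(v >= INT32_MIN && v <= INT32_MAX);  /* must fit in 32 bits */
    return (int32_t)v;
}

/* Where a "callq rel32" whose rel32 field sits at field_addr jumps to:
   the address right after the 4-byte field, plus the displacement. */
uint64_t call_target(uint64_t field_addr, int32_t rel32)
{
    return field_addr + 4 + (int64_t)rel32;
}
```

With the module loaded at some base address, P for the second iowrite32() call is that base plus 0x14, as listed in the Offset column.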
Now to conclusions: There are no memory barriers in the code, except those generated by wmb() and rmb(), which added sfence and lfence respectively. sfence is defined as
Performs a serializing operation on all store instructions that were issued prior the SFENCE instruction. This serializing operation guarantees that every store instruction that precedes in program order the SFENCE instruction is globally visible before any store instruction that follows the SFENCE instruction is globally visible. The SFENCE instruction is ordered with respect store instructions, other SFENCE instructions, any MFENCE instructions, and any serializing instructions (such as the CPUID instruction). It is not ordered with respect to load instructions or the LFENCE instruction.
and lfence as
Performs a serializing operation on all load-from-memory instructions that were issued prior the LFENCE instruction. This serializing operation guarantees that every load instruction that precedes in program order the LFENCE instruction is globally visible before any load instruction that follows the LFENCE instruction is globally visible. The LFENCE instruction is ordered with respect to load instructions, other LFENCE instructions, any MFENCE instructions, and any serializing instructions (such as the CPUID instruction). It is not ordered with respect to store instructions or the SFENCE instruction.
One can feel the Intel-headache just reading this.
Results on x86 (32 bit)
Compiling this against a 32-bit kernel, with a slightly different configuration:
$ objdump -d minimodule.ko
minimodule.ko: file format elf32-i386
Disassembly of section .text:
00000000 <try_iowrite32>:
0: ba 78 56 34 12 mov $0x12345678,%edx
5: b8 01 00 cd ab mov $0xabcd0001,%eax
a: e8 fc ff ff ff call b <try_iowrite32+0xb>
f: ba 78 56 34 12 mov $0x12345678,%edx
14: b8 01 00 cd ab mov $0xabcd0001,%eax
19: e8 fc ff ff ff call 1a <try_iowrite32+0x1a>
1e: ba 78 56 34 12 mov $0x12345678,%edx
23: b8 02 00 cd ab mov $0xabcd0002,%eax
28: e8 fc ff ff ff call 29 <try_iowrite32+0x29>
2d: ba 78 56 34 12 mov $0x12345678,%edx
32: b8 03 00 cd ab mov $0xabcd0003,%eax
37: e8 fc ff ff ff call 38 <try_iowrite32+0x38>
3c: f0 83 04 24 00 lock addl $0x0,(%esp)
41: ba 78 56 34 12 mov $0x12345678,%edx
46: b8 04 00 cd ab mov $0xabcd0004,%eax
4b: e8 fc ff ff ff call 4c <try_iowrite32+0x4c>
50: f0 83 04 24 00 lock addl $0x0,(%esp)
55: ba 78 56 34 12 mov $0x12345678,%edx
5a: b8 05 00 cd ab mov $0xabcd0005,%eax
5f: e8 fc ff ff ff call 60 <try_iowrite32+0x60>
64: ba 78 56 34 12 mov $0x12345678,%edx
69: b8 06 00 cd ab mov $0xabcd0006,%eax
6e: e8 fc ff ff ff call 6f <try_iowrite32+0x6f>
73: f0 83 04 24 00 lock addl $0x0,(%esp)
78: c3 ret
79: 00 00 add %al,(%eax)
...
Disassembly of section .altinstr_replacement:
00000000 <.altinstr_replacement>:
0: 0f ae f8 sfence
3: 0f ae e8 lfence
6: 0f ae e8 lfence
$ readelf -r minimodule.ko
Relocation section '.rel.text' at offset 0xc3e0 contains 7 entries:
Offset Info Type Sym.Value Sym. Name
0000000b 00002402 R_386_PC32 00000000 iowrite32
0000001a 00002402 R_386_PC32 00000000 iowrite32
00000029 00002402 R_386_PC32 00000000 iowrite32
00000038 00002402 R_386_PC32 00000000 iowrite32
0000004c 00002402 R_386_PC32 00000000 iowrite32
00000060 00002402 R_386_PC32 00000000 iowrite32
0000006f 00002402 R_386_PC32 00000000 iowrite32
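So it’s in essence the same, except that the mcount() call at the beginning is absent, and the barriers show up as “lock addl $0x0,(%esp)”. On 32-bit x86 kernels, the barrier macros are defined as “alternatives”: the locked add works on any CPU, and the kernel replaces it with the sfence or lfence found in .altinstr_replacement if the CPU supports these instructions. Note that there are three such barriers here, the third one (at 0x73) coming from smp_rmb(), so with this configuration smp_rmb() isn’t a no-op.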
The related sources for iowrite32() on ARM
These are the key excerpts from arch/arm/include/asm/io.h:
static inline void __raw_writel(u32 val, volatile void __iomem *addr)
{
asm volatile("str %1, %0"
: "+Qo" (*(volatile u32 __force *)addr)
: "r" (val));
}
...
#define writel_relaxed(v,c) __raw_writel((__force u32) cpu_to_le32(v),c)
...
#define writel(v,c) ({ __iowmb(); writel_relaxed(v,c); })
...
#define iowrite32(v,p) ({ __iowmb(); __raw_writel((__force __u32)cpu_to_le32(v), p); })
As for __iowmb(), it goes
/* IO barriers */
#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE
#include <asm/barrier.h>
#define __iormb() rmb()
#define __iowmb() wmb()
#else
#define __iormb() do { } while (0)
#define __iowmb() do { } while (0)
#endif
so it’s down to the configuration if __iowmb() does something. And to get the full picture, these are snips from arch/arm/include/asm/barrier.h:
#if __LINUX_ARM_ARCH__ >= 7
#define isb(option) __asm__ __volatile__ ("isb " #option : : : "memory")
#define dsb(option) __asm__ __volatile__ ("dsb " #option : : : "memory")
#define dmb(option) __asm__ __volatile__ ("dmb " #option : : : "memory")
...
#ifdef CONFIG_ARCH_HAS_BARRIERS
#include <mach/barriers.h>
#elif defined(CONFIG_ARM_DMA_MEM_BUFFERABLE) || defined(CONFIG_SMP)
#define mb() do { dsb(); outer_sync(); } while (0)
#define rmb() dsb()
#define wmb() do { dsb(st); outer_sync(); } while (0)
#else
#define mb() barrier()
#define rmb() barrier()
#define wmb() barrier()
#endif
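Putting the two excerpts together, whether an iowrite32() on ARM emits a barrier before its “str” depends on three configuration options. A C sketch of that decision table, mirroring only the #if branches quoted above (dsb() being the actual instruction on ARMv7; the contents of mach/barriers.h aren’t covered). The enum and function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

enum barrier_kind {
    BARRIER_NONE,              /* do { } while (0) */
    BARRIER_COMPILER,          /* barrier(): compiler barrier only */
    BARRIER_DSB_ST_OUTER_SYNC, /* dsb(st); outer_sync(); */
    BARRIER_MACH_SPECIFIC      /* whatever mach/barriers.h defines */
};

/* wmb() as selected in arch/arm/include/asm/barrier.h */
enum barrier_kind wmb_expansion(bool dma_mem_bufferable,
                                bool arch_has_barriers, bool smp)
{
    if (arch_has_barriers)
        return BARRIER_MACH_SPECIFIC;
    if (dma_mem_bufferable || smp)
        return BARRIER_DSB_ST_OUTER_SYNC;
    return BARRIER_COMPILER;
}

/* __iowmb() as selected in arch/arm/include/asm/io.h */
enum barrier_kind iowmb_expansion(bool dma_mem_bufferable,
                                  bool arch_has_barriers, bool smp)
{
    if (!dma_mem_bufferable)
        return BARRIER_NONE;
    return wmb_expansion(dma_mem_bufferable, arch_has_barriers, smp);
}
```

The ARM disassembly that follows, with a “dsb st” ahead of every str, matches the case where CONFIG_ARM_DMA_MEM_BUFFERABLE is set and CONFIG_ARCH_HAS_BARRIERS isn’t.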
Results on ARM
This is what the same module compiled for ARM Cortex A9, Little Endian gives (I’ve added extra newlines in the middle for clarity):
minimodule.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <try_iowrite32>:
0: e92d4038 push {r3, r4, r5, lr}
4: f57ff04e dsb st
8: e59f2118 ldr r2, [pc, #280] ; 128 <try_iowrite32+0x128>
c: e1a04002 mov r4, r2
10: e5923018 ldr r3, [r2, #24]
14: e3530000 cmp r3, #0
18: 0a000000 beq 20 <try_iowrite32+0x20>
1c: e12fff33 blx r3
20: e59f3104 ldr r3, [pc, #260] ; 12c <try_iowrite32+0x12c>
24: e59f1104 ldr r1, [pc, #260] ; 130 <try_iowrite32+0x130>
28: e5831678 str r1, [r3, #1656] ; 0x678
2c: f57ff04e dsb st
30: e5942018 ldr r2, [r4, #24]
34: e1a05001 mov r5, r1
38: e1a04003 mov r4, r3
3c: e3520000 cmp r2, #0
40: 0a000000 beq 48 <try_iowrite32+0x48>
44: e12fff32 blx r2
48: e5845678 str r5, [r4, #1656] ; 0x678
4c: f57ff04e dsb st
50: e59f20d0 ldr r2, [pc, #208] ; 128 <try_iowrite32+0x128>
54: e1a04002 mov r4, r2
58: e5923018 ldr r3, [r2, #24]
5c: e3530000 cmp r3, #0
60: 0a000000 beq 68 <try_iowrite32+0x68>
64: e12fff33 blx r3
68: e59f30bc ldr r3, [pc, #188] ; 12c <try_iowrite32+0x12c>
6c: e59f20c0 ldr r2, [pc, #192] ; 134 <try_iowrite32+0x134>
70: e5832678 str r2, [r3, #1656] ; 0x678
74: f57ff04e dsb st
78: e5942018 ldr r2, [r4, #24]
7c: e1a04003 mov r4, r3
80: e3520000 cmp r2, #0
84: 0a000000 beq 8c <try_iowrite32+0x8c>
88: e12fff32 blx r2
8c: e59f30a4 ldr r3, [pc, #164] ; 138 <try_iowrite32+0x138>
90: e5843678 str r3, [r4, #1656] ; 0x678
94: f57ff04e dsb st
98: e59f2088 ldr r2, [pc, #136] ; 128 <try_iowrite32+0x128>
9c: e1a04002 mov r4, r2
a0: e5923018 ldr r3, [r2, #24]
a4: e3530000 cmp r3, #0
a8: 0a000000 beq b0 <try_iowrite32+0xb0>
ac: e12fff33 blx r3
b0: f57ff04e dsb st
b4: e5943018 ldr r3, [r4, #24]
b8: e3530000 cmp r3, #0
bc: 0a000000 beq c4 <try_iowrite32+0xc4>
c0: e12fff33 blx r3
c4: e59f3060 ldr r3, [pc, #96] ; 12c <try_iowrite32+0x12c>
c8: e59f206c ldr r2, [pc, #108] ; 13c <try_iowrite32+0x13c>
cc: e5832678 str r2, [r3, #1656] ; 0x678
d0: f57ff04f dsb sy
d4: f57ff04e dsb st
d8: e59f1048 ldr r1, [pc, #72] ; 128 <try_iowrite32+0x128>
dc: e1a04003 mov r4, r3
e0: e1a05001 mov r5, r1
e4: e5912018 ldr r2, [r1, #24]
e8: e3520000 cmp r2, #0
ec: 0a000000 beq f4 <try_iowrite32+0xf4>
f0: e12fff32 blx r2
f4: e59f3044 ldr r3, [pc, #68] ; 140 <try_iowrite32+0x140>
f8: e5843678 str r3, [r4, #1656] ; 0x678
fc: f57ff05a dmb ishst
100: f57ff04e dsb st
104: e5953018 ldr r3, [r5, #24]
108: e3530000 cmp r3, #0
10c: 0a000000 beq 114 <try_iowrite32+0x114>
110: e12fff33 blx r3
114: e59f3010 ldr r3, [pc, #16] ; 12c <try_iowrite32+0x12c>
118: e59f2024 ldr r2, [pc, #36] ; 144 <try_iowrite32+0x144>
11c: e5832678 str r2, [r3, #1656] ; 0x678
120: f57ff05b dmb ish
124: e8bd8038 pop {r3, r4, r5, pc}
128: 00000000 .word 0x00000000
12c: 12345000 .word 0x12345000
130: abcd0001 .word 0xabcd0001
134: abcd0002 .word 0xabcd0002
138: abcd0003 .word 0xabcd0003
13c: abcd0004 .word 0xabcd0004
140: abcd0005 .word 0xabcd0005
144: abcd0006 .word 0xabcd0006
This was a lot of code (somehow that’s what you get with ARM). There are no calls to iowrite32(), so this is done inline for ARM (consistent with the sources).
This requires some translation from ARM opcodes to human language (taken from this page):
- DSB SY: Data Synchronization Barrier. No instruction in program order after this instruction executes until all explicit memory accesses before it complete, as well as all cache, branch predictor and TLB maintenance operations before it.
- DSB ST: Like DSB SY, but waits only for data writes to complete.
- DMB ISHST: Data Memory Barrier that orders stores only (not loads), and only within the inner shareable domain (roughly, the CPU cores that share a coherent view of memory, which is what SMP code cares about).
- DMB ISH: Data Memory Barrier that orders both loads and stores, but again only within the inner shareable domain.
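Now let’s decipher the assembly code, which is quite tangled. Luckily, it’s easy to spot the seven write operations as the seven “str” instructions. It’s also easy to see that each iowrite32() starts with a “dsb st”, which forces waiting until previous writes have completed. So each iowrite32() spans from a “dsb st” to a “str”. This matches the definition of iowrite32() as __iowmb() followed by __raw_writel(…). The ldr/cmp/beq/blx sequence right after each “dsb st” is a call through a function pointer, made only if that pointer is non-NULL (presumably the outer cache sync hook that ARM’s write barrier invokes). As for the target address, the base 0x12345000 is loaded from the literal pool, and the #1656 (0x678) offset in each “str” completes it to 0x12345678.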
The memory barriers are quite clear too:
- wmb() becomes “dsb st”, the full synchronization barrier for writes (which is also issued automatically before each iowrite32).
- rmb() becomes “dsb sy”, the full synchronization barrier for reads and writes
- smp_wmb() becomes “dmb ishst”, the “inner shareable domain” memory barrier for writes
- smp_rmb() becomes “dmb ish”, the “inner shareable domain” memory barrier for reads and writes
- mmiowb() produces no code at all
Now with writel()
So I thought it would be nice to repeat all this with writel(). Spoiler: nothing thrilling happens here.
Module code (includes omitted):
void try_writel(void)
{
	void __iomem *p = (void __iomem *) 0x12345678;

	writel(0xabcd0001, p);
	writel(0xabcd0001, p);
	writel(0xabcd0002, p);
	mmiowb();
	writel(0xabcd0003, p);
	wmb();
	writel(0xabcd0004, p);
	rmb();
	writel(0xabcd0005, p);
	smp_wmb();
	writel(0xabcd0006, p);
	smp_rmb();
}
EXPORT_SYMBOL(try_writel);
Assembly on 64-bit Intel:
minimodule.ko: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <try_writel>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: e8 00 00 00 00 callq 9 <try_writel+0x9>
9: b8 01 00 cd ab mov $0xabcd0001,%eax
e: 89 04 25 78 56 34 12 mov %eax,0x12345678
15: 89 04 25 78 56 34 12 mov %eax,0x12345678
1c: b8 02 00 cd ab mov $0xabcd0002,%eax
21: 89 04 25 78 56 34 12 mov %eax,0x12345678
28: b8 03 00 cd ab mov $0xabcd0003,%eax
2d: 89 04 25 78 56 34 12 mov %eax,0x12345678
34: 0f ae f8 sfence
37: b8 04 00 cd ab mov $0xabcd0004,%eax
3c: 89 04 25 78 56 34 12 mov %eax,0x12345678
43: 0f ae e8 lfence
46: b8 05 00 cd ab mov $0xabcd0005,%eax
4b: 89 04 25 78 56 34 12 mov %eax,0x12345678
52: b8 06 00 cd ab mov $0xabcd0006,%eax
57: 89 04 25 78 56 34 12 mov %eax,0x12345678
5e: c9 leaveq
5f: c3 retq
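OK, so writel() simply translates into inline “mov” instructions. There’s even an optimization between the first and second write, so %eax isn’t loaded twice. Hi-tech, I’m telling you. As for the barriers, wmb() and rmb() become “sfence” and “lfence” respectively, while mmiowb(), smp_wmb() and smp_rmb() produce no instructions at all.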
And on 32-bit Intel:
minimodule.ko: file format elf32-i386
Disassembly of section .text:
00000000 <try_writel>:
0: b8 01 00 cd ab mov $0xabcd0001,%eax
5: a3 78 56 34 12 mov %eax,0x12345678
a: a3 78 56 34 12 mov %eax,0x12345678
f: b0 02 mov $0x2,%al
11: a3 78 56 34 12 mov %eax,0x12345678
16: b0 03 mov $0x3,%al
18: a3 78 56 34 12 mov %eax,0x12345678
1d: f0 83 04 24 00 lock addl $0x0,(%esp)
22: b0 04 mov $0x4,%al
24: a3 78 56 34 12 mov %eax,0x12345678
29: f0 83 04 24 00 lock addl $0x0,(%esp)
2e: b0 05 mov $0x5,%al
30: a3 78 56 34 12 mov %eax,0x12345678
35: b0 06 mov $0x6,%al
37: a3 78 56 34 12 mov %eax,0x12345678
3c: f0 83 04 24 00 lock addl $0x0,(%esp)
41: c3 ret
...
Disassembly of section .altinstr_replacement:
00000000 <.altinstr_replacement>:
0: 0f ae f8 sfence
3: 0f ae e8 lfence
6: 0f ae e8 lfence
And for ARM, it’s exactly the same code (to the byte) as for iowrite32(), since iowrite32() is an alias for writel() there. But I listed it here anyhow for those who don’t take my word for it:
minimodule.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <try_writel>:
0: e92d4038 push {r3, r4, r5, lr}
4: f57ff04e dsb st
8: e59f2118 ldr r2, [pc, #280] ; 128 <try_writel+0x128>
c: e1a04002 mov r4, r2
10: e5923018 ldr r3, [r2, #24]
14: e3530000 cmp r3, #0
18: 0a000000 beq 20 <try_writel+0x20>
1c: e12fff33 blx r3
20: e59f3104 ldr r3, [pc, #260] ; 12c <try_writel+0x12c>
24: e59f1104 ldr r1, [pc, #260] ; 130 <try_writel+0x130>
28: e5831678 str r1, [r3, #1656] ; 0x678
2c: f57ff04e dsb st
30: e5942018 ldr r2, [r4, #24]
34: e1a05001 mov r5, r1
38: e1a04003 mov r4, r3
3c: e3520000 cmp r2, #0
40: 0a000000 beq 48 <try_writel+0x48>
44: e12fff32 blx r2
48: e5845678 str r5, [r4, #1656] ; 0x678
4c: f57ff04e dsb st
50: e59f20d0 ldr r2, [pc, #208] ; 128 <try_writel+0x128>
54: e1a04002 mov r4, r2
58: e5923018 ldr r3, [r2, #24]
5c: e3530000 cmp r3, #0
60: 0a000000 beq 68 <try_writel+0x68>
64: e12fff33 blx r3
68: e59f30bc ldr r3, [pc, #188] ; 12c <try_writel+0x12c>
6c: e59f20c0 ldr r2, [pc, #192] ; 134 <try_writel+0x134>
70: e5832678 str r2, [r3, #1656] ; 0x678
74: f57ff04e dsb st
78: e5942018 ldr r2, [r4, #24]
7c: e1a04003 mov r4, r3
80: e3520000 cmp r2, #0
84: 0a000000 beq 8c <try_writel+0x8c>
88: e12fff32 blx r2
8c: e59f30a4 ldr r3, [pc, #164] ; 138 <try_writel+0x138>
90: e5843678 str r3, [r4, #1656] ; 0x678
94: f57ff04e dsb st
98: e59f2088 ldr r2, [pc, #136] ; 128 <try_writel+0x128>
9c: e1a04002 mov r4, r2
a0: e5923018 ldr r3, [r2, #24]
a4: e3530000 cmp r3, #0
a8: 0a000000 beq b0 <try_writel+0xb0>
ac: e12fff33 blx r3
b0: f57ff04e dsb st
b4: e5943018 ldr r3, [r4, #24]
b8: e3530000 cmp r3, #0
bc: 0a000000 beq c4 <try_writel+0xc4>
c0: e12fff33 blx r3
c4: e59f3060 ldr r3, [pc, #96] ; 12c <try_writel+0x12c>
c8: e59f206c ldr r2, [pc, #108] ; 13c <try_writel+0x13c>
cc: e5832678 str r2, [r3, #1656] ; 0x678
d0: f57ff04f dsb sy
d4: f57ff04e dsb st
d8: e59f1048 ldr r1, [pc, #72] ; 128 <try_writel+0x128>
dc: e1a04003 mov r4, r3
e0: e1a05001 mov r5, r1
e4: e5912018 ldr r2, [r1, #24]
e8: e3520000 cmp r2, #0
ec: 0a000000 beq f4 <try_writel+0xf4>
f0: e12fff32 blx r2
f4: e59f3044 ldr r3, [pc, #68] ; 140 <try_writel+0x140>
f8: e5843678 str r3, [r4, #1656] ; 0x678
fc: f57ff05a dmb ishst
100: f57ff04e dsb st
104: e5953018 ldr r3, [r5, #24]
108: e3530000 cmp r3, #0
10c: 0a000000 beq 114 <try_writel+0x114>
110: e12fff33 blx r3
114: e59f3010 ldr r3, [pc, #16] ; 12c <try_writel+0x12c>
118: e59f2024 ldr r2, [pc, #36] ; 144 <try_writel+0x144>
11c: e5832678 str r2, [r3, #1656] ; 0x678
120: f57ff05b dmb ish
124: e8bd8038 pop {r3, r4, r5, pc}
128: 00000000 .word 0x00000000
12c: 12345000 .word 0x12345000
130: abcd0001 .word 0xabcd0001
134: abcd0002 .word 0xabcd0002
138: abcd0003 .word 0xabcd0003
13c: abcd0004 .word 0xabcd0004
140: abcd0005 .word 0xabcd0005
144: abcd0006 .word 0xabcd0006