Linux on Microblaze HOWTO (part III)

This is part III of my HOWTO on running Linux on Microblaze.

Generating the ACE file

The ACE file is what the System ACE chip reads in order to program the FPGA. It consists of a sequence of JTAG operations for each necessary task: Configuring the FPGA itself, loading the software into memory, setting the software execution entry point, and kicking the software off. All of this is done with JTAG commands, which the System ACE generates as it scans through its ACE file.

So let’s get down to business.

Create a directory to gather the relevant files, and copy the following into it:

  • The Tcl script for generating ACE file: Found at ISE_DS/EDK/data/xmd/genace.tcl (relative to the path where Xilinx ISE is installed)
  • The bitstream (system.bit) file created by the EDK (explained in part I). Found in the ‘hw’ subdirectory in the export bundle from EDK to SDK. Or just under ‘implementation’ in the processor’s working directory. It’s the same file.
  • The kernel ELF file (simpleImage.xilinx, or the unstripped simpleImage.xilinx.unstrip) created by the kernel build system (explained in part II), found in arch/microblaze/boot/ in the kernel source tree.

Open a command shell (Project > Launch Xilinx Shell if you like), change to this directory and go:

xmd -tcl genace.tcl -hw system.bit -elf simpleImage.xilinx -ace linuxmb.ace -board sp605 -target mdm

which generates a lot of junk files (.svf most notably, which contain JTAG commands in a portable format), and eventually the linuxmb.ace is created (any file name is OK).
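Since xmd's complaints about a missing input aren't always obvious, it may be worth checking that the three files listed above are really in the working directory before launching it. This is just a minimal sketch, assuming the file names used in this example; the missing_ace_inputs helper is a name I made up for illustration.

```shell
#!/bin/sh
# Print the name of each required input file that is missing from the
# given directory (default: the current directory). Prints nothing when
# all three are present. File names follow the example in this post.
missing_ace_inputs() {
    dir=${1:-.}
    for f in genace.tcl system.bit simpleImage.xilinx; do
        [ -f "$dir/$f" ] || echo "$f"
    done
    return 0
}
```

If missing_ace_inputs prints nothing, go ahead with the xmd command above. Otherwise it lists what still needs to be copied.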

In the example above, I assumed that the target is the SP605 board. Looking at the genace.tcl script easily reveals which boards are supported. If yours isn’t, it’s not such a big deal. The only reason the board matters is that the System ACE needs to know which device in the JTAG chain to talk with, plus some programming parameters. The -board flag to this script allows setting the options in a “genace option file” (whatever that means). I would hack the script, though. It looks easier. See here for more information.

Writing to the Compact Flash

First and foremost: If you have a compact flash which boots anything to the FPGA, don’t format it unless you really have to. The System ACE chip (by Xilinx) which reads from the flash directly is a bit picky about the file system format. Preferably use the card which came with the development kit.

And this too: If you just bought a 2 GB flash or so in a general electronics store, odds are that you’ll need to format it.

I explain how to format the flash in another post of mine.

Assuming that the flash is formatted OK, copy the ACE file to the Compact Flash’s root directory. Make sure that

  • there is no other *.ace file in the root directory
  • there is no xilinx.sys in the root directory

It is perfectly OK to have unrelated directories on the flash, so if there are some files on the flash already, I’d suggest creating a directory with just any name (say, “prevroot”) and move everything in the root directory into that one. And then copy the desired ACE file (linuxmb.ace in the example above) into the root directory.
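The two rules above are easy to check with a script before pulling the card out of the reader. The following is only a sketch: check_cf_root is a hypothetical helper name, the mount point is whatever your system assigned to the card, and it relies on find's -maxdepth and -iname options (available in GNU, BSD and busybox find, but not strictly POSIX).

```shell
#!/bin/sh
# Check the root directory of a mounted Compact Flash card.
# Prints "ok" and returns 0 when exactly one .ace file is in the root
# and there is no xilinx.sys; otherwise prints the problem and returns 1.
# Names are matched case-insensitively, since FAT file names often
# show up in upper case. Subdirectories are not looked into.
check_cf_root() {
    root=$1
    ace_count=$(find "$root" -maxdepth 1 -type f -iname '*.ace' | wc -l | tr -d ' ')
    if [ "$ace_count" -ne 1 ]; then
        echo "expected exactly one .ace file, found $ace_count"
        return 1
    fi
    if [ -n "$(find "$root" -maxdepth 1 -iname 'xilinx.sys')" ]; then
        echo "xilinx.sys present in root"
        return 1
    fi
    echo "ok"
}
```

For example, check_cf_root /media/cf (the mount point is an assumption) should say “ok” after the old files have been moved into “prevroot” and linuxmb.ace has been copied in.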

That’s it. The Linux kernel should now boot, but it will complain (the kernel will panic, actually) that it doesn’t have any root filesystem. So…

Setting up the root filesystem

Once the kernel is up, it needs something to mount as a root filesystem, in which it expects to find its init executable and quite a few other files. Xilinx supplies an image of this bundle, which was downloaded along with the cross compilers (see part II), in the same directory.

You may recall that I chose to mount root over the network, using NFS. So to create a useful root directory to work with, just change directory to whatever is going to be root (in my case, the one exposed via NFS) and go

zcat /path/to/microblaze_v1.0_le/initramfs_minimal_le.cpio.gz | cpio -i -d -H newc --no-absolute-filenames

This bundle includes a practical set of executables (well, it’s actually a lot of symbolic links to busybox) including vi, watch, dd, grep, gzip, tar, rpm, nc and even httpd (a web server…!). There’s also a rootfs.cpio.gz in the kernel sources when downloaded from Xilinx’ git (linux-2.6-xlnx.git in part II) which I haven’t tried out. But it’s opened in the same way.
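After unpacking, it doesn’t hurt to verify that the directory contains something the kernel can run as init. When no init= parameter is given, the kernel tries /sbin/init, /etc/init, /bin/init and /bin/sh, in that order. The sketch below looks for the first of these in the exported directory; find_init is a hypothetical helper name, and I haven’t checked which of these paths the Xilinx bundle actually provides.

```shell
#!/bin/sh
# Print the first init candidate found under the given root directory,
# using the same order the kernel uses when no init= is given.
# Returns 1 and prints nothing if none is found.
find_init() {
    root=$1
    for candidate in /sbin/init /etc/init /bin/init /bin/sh; do
        # Accept symbolic links as well: an absolute link (for example
        # to /bin/busybox) can't be resolved from the host's point of
        # view, but is fine once the directory is mounted as root.
        if [ -x "$root$candidate" ] || [ -L "$root$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}
```

Running find_init /shared/nfsroot on the NFS server (the path used in my boot parameters) should print one of the four candidates. If it prints nothing, expect a kernel panic about not finding init.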

You may, of course, compile your own programs, which is discussed in part IV.

There’s no “shutdown” executable, though. There’s “halt” instead.

A test run

Well, plug in the Compact Flash card, turn the power on, and hope to see a green LED blinking, which turns to steady green after a few seconds. When the LED is steady, expect some output on the UART. A typical log for SP605 is given at the end of this post.

At times, the SP605 board’s green LED went on, but nothing ran until SYS_ACE_RESET was pressed (the middle button of the three close to the Compact Flash jack). Looks like a power-up issue.

Is it fast? Is it fast?

This is maybe not such a fair comparison, and still the facts speak for themselves:

On Microblaze @ 75 MHz clock (37 BogoMIPS):

# dd if=/dev/zero of=/dev/null bs=1M count=10k
10240+0 records in
10240+0 records out
10737418240 bytes (10.0GB) copied, 1058.304486 seconds, 9.7MB/s
# dd if=/dev/zero of=/dev/null bs=512 count=100k
102400+0 records in
102400+0 records out
52428800 bytes (50.0MB) copied, 9.531130 seconds, 5.2MB/s

The same thing on my own computer @ 1.2 GHz (5600 BogoMIPS):

$ dd if=/dev/zero of=/dev/null bs=1M count=10k
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 0.941238 s, 11.4 GB/s
$ dd if=/dev/zero of=/dev/null bs=512 count=100k
102400+0 records in
102400+0 records out
52428800 bytes (52 MB) copied, 0.0443318 s, 1.2 GB/s

According to the BogoMIPSes, Microblaze should have been 150 times slower, not 1000 times slower!
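The ratios can be worked out from the numbers in the two logs above. Since the same number of bytes is copied on both machines, the slowdown is simply the ratio between the elapsed times. This is a small sketch using awk for the floating point division; ratio is just a helper name I picked.

```shell
#!/bin/sh
# Print a / b, rounded to the nearest integer.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.0f\n", a / b }'
}

ratio 5600 37                 # BogoMIPS, PC vs. Microblaze: 151
ratio 1058.304486 0.941238    # elapsed time with bs=1M: 1124
ratio 9.531130 0.0443318      # elapsed time with bs=512: 215
```

So the gap is about 1100 times for the large blocks and about 215 times for the 512 byte blocks, against the 150 or so that the BogoMIPS figures suggest.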

A typical boot log

early_printk_console is enabled at 0x40600000
Ramdisk addr 0x00000003, Compiled-in FDT at 0xc03c2348
Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu
Linux version 2.6.38.6 (eli@localhost.localdomain) (gcc version 4.1.2) #19 Fri Aug 5 16:40:02 IDT 2011
setup_cpuinfo: initialising
setup_cpuinfo: Using full CPU PVR support
cache: wt_msr_noirq
setup_memory: max_mapnr: 0x8000
setup_memory: min_low_pfn: 0xc0000
setup_memory: max_low_pfn: 0xc8000
On node 0 totalpages: 32768
free_area_init_node: node 0, pgdat c04f515c, node_mem_map c05ca000
 Normal zone: 256 pages used for memmap
 Normal zone: 0 pages reserved
 Normal zone: 32512 pages, LIFO batch:7
pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768
pcpu-alloc: [0] 0
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 32512
Kernel command line: console=ttyUL0 ip=::::::dhcp rootfstype=nfs root=/dev/nfs rw nfsroot=10.11.12.13:/shared/nfsroot,tcp
PID hash table entries: 512 (order: -1, 2048 bytes)
Dentry cache hash table entries: 16384 (order: 4, 65536 bytes)
Inode-cache hash table entries: 8192 (order: 3, 32768 bytes)
allocated 655360 bytes of page_cgroup
please try 'cgroup_disable=memory' option if you don't want memory cgroups
Memory: 123204k/131072k available
SLUB: Genslabs=13, HWalign=32, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
NR_IRQS:32
xlnx,xps-intc-1.00.a #0 at 0xc8000000, num_irq=8, edge=0x60
xlnx,xps-timer-1.00.a #0 at 0xc8004000, irq=7
Heartbeat GPIO at 0xc8008000
microblaze_timer_set_mode: shutdown
microblaze_timer_set_mode: periodic
Console: colour dummy device 80x25
Calibrating delay loop... 37.17 BogoMIPS (lpj=185856)
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 512
Initializing cgroup subsys ns
ns_cgroup deprecated: consider using the 'clone_children' flag without the ns_cgroup.
Initializing cgroup subsys cpuacct
Initializing cgroup subsys memory
Initializing cgroup subsys devices
Initializing cgroup subsys freezer
Initializing cgroup subsys net_cls
devtmpfs: initialized
NET: Registered protocol family 16
PCI: Probing PCI hardware
bio: create slab <bio-0> at 0
XGpio: /axi@0/gpio@40040000: registered
XGpio: /axi@0/gpio@40020000: registered
XGpio: /axi@0/gpio@40000000: registered
vgaarb: loaded
Switching to clocksource microblaze_clocksource
microblaze_timer_set_mode: oneshot
Switched to NOHz mode on CPU #0
NET: Registered protocol family 2
IP route cache hash table entries: 1024 (order: 0, 4096 bytes)
TCP established hash table entries: 4096 (order: 3, 32768 bytes)
TCP bind hash table entries: 4096 (order: 2, 16384 bytes)
TCP: Hash tables configured (established 4096 bind 4096)
TCP reno registered
UDP hash table entries: 256 (order: 0, 4096 bytes)
UDP-Lite hash table entries: 256 (order: 0, 4096 bytes)
NET: Registered protocol family 1
RPC: Registered udp transport module.
RPC: Registered tcp transport module.
RPC: Registered tcp NFSv4.1 backchannel transport module.
PCI: CLS 0 bytes, default 32
Skipping unavailable RESET gpio -2 (reset)
GPIO pin is already allocated
audit: initializing netlink socket (disabled)
type=2000 audit(0.429:1): initialized
VFS: Disk quotas dquot_6.5.2
Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
squashfs: version 4.0 (2009/01/31) Phillip Lougher
fuse init (API version 7.16)
msgmni has been set to 240
Block layer SCSI generic (bsg) driver version 0.4 loaded (major 254)
io scheduler noop registered
io scheduler deadline registered
io scheduler cfq registered (default)
Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
40600000.serial: ttyUL0 at MMIO 0x40600000 (irq = 6) is a uartlite
console [ttyUL0] enabled
brd: module loaded
loop: module loaded
of:xsysace 41800000.sysace: Xilinx SystemACE revision 1.0.12
of:xsysace 41800000.sysace: capacity: 3980592 sectors
 xsa: xsa1
Xilinx SystemACE device driver, major=254
Generic platform RAM MTD, (c) 2004 Simtec Electronics
xilinx_spi 40a00000.spi: at 0x40A00000 mapped to 0xc8080000, irq=0
of:xilinx_emaclite 40e00000.ethernet: Device Tree Probing
Xilinx Emaclite MDIO: probed
of:xilinx_emaclite 40e00000.ethernet: MAC address is now 00:0a:35:49:b2:00
of:xilinx_emaclite 40e00000.ethernet: Xilinx EmacLite at 0x40E00000 mapped to 0xC80A0000, irq=5
device-mapper: uevent: version 1.0.3
device-mapper: ioctl: 4.19.1-ioctl (2011-01-07) initialised: dm-devel@redhat.com
nf_conntrack version 0.5.0 (1925 buckets, 7700 max)
ip_tables: (C) 2000-2006 Netfilter Core Team
TCP cubic registered
Initializing XFRM netlink socket
NET: Registered protocol family 17
Registering the dns_resolver key type
registered taskstats version 1
Sending DHCP requests .
PHY: c0020918:07 - Link is Up - 100/Full
., OK
IP-Config: Got DHCP answer from 10.11.12.13, my address is 10.11.12.155
IP-Config: Complete:
 device=eth0, addr=10.11.12.155, mask=255.255.255.0, gw=10.11.12.13,
 host=10.11.12.155, domain=, nis-domain=(none),
 bootserver=10.11.12.13, rootserver=10.11.12.13VFS: Mounted root (nfs filesystem) on device 0:13.
devtmpfs: mounted
Freeing unused kernel memory: 147k freed
Starting rcS...
++ Mounting filesystem
++ Starting telnet daemon
rcS Complete
/bin/sh: can't access tty; job control turned off
/ # NET: Registered protocol family 10
eth0: no IPv6 routers present

Linux on Microblaze HOWTO (part II)

This is part II of my HOWTO on running Linux on Microblaze.

Kernel compilation in general

Compiling a Linux kernel traditionally consists of the following steps (some of which are elaborated further below):

  • Obtaining a kernel source tree.
  • Configure the kernel, which all in all means setting up a file named “.config” in the kernel source’s root directory.
  • Compile the actual kernel, ending up with an executable image.
  • Compile the post-boot loadable kernel modules.
  • Put everything in its place, set up the bootloader
  • Pray and boot

When compiling for Microblaze, the process is somewhat different:

  • Cross compilation: The compiled binaries run on a processor different from the one doing the compilation.
  • Kernel modules are most likely not used at all. They are a bit pointless when the hardware is known in advance, and also add some complexity in setting up the entire system for boot. Besides, modprobe on a Microblaze can take forever.
  • The hardware configuration is custom made, and the kernel needs to be informed about it (through the Device Tree Structure)

Downloading kernel sources

Note that all kernels compile for all target architectures. If you download a kernel from Xilinx’ repository, it may have the parts relevant to Xilinx slightly more updated. The emphasis is on “may”.
The “vanilla” kernel (maintained by Linus Torvalds) can be downloaded from the main kernel archive or one of its mirrors. Several other flavors float around, including Xilinx’ own git

git clone git://git.xilinx.com/linux-2.6-xlnx.git

or PetaLogix’ git (after all, they do a lot of maintenance on the Xilinx devices):

git clone git://developer.petalogix.com/linux-2.6-microblaze.git

The question is always which kernel is best. The answer is that it’s a bit of a gamble. It’s usually almost exactly the same piece of software, with the git version having the latest changes. That means the latest bug fixes and new drivers, but also the latest, undocumented and undiscovered bugs. Vanilla kernels tend to be more conservative, but the only rule is that there are no rules. So in short, toss a coin and pick one.

Personally, I compiled the kernel which happened to be on my hard disk for other purposes.

Cross compilers

The good news is that there’s no need to compile the GNU tools. As a matter of fact, this part turned out to be surprisingly painless. The cross compiler and binutils binaries + initramfs images can be downloaded with

$ git clone git://git.xilinx.com/xldk/microblaze_v1.0_le.git
$ git clone git://git.xilinx.com/xldk/microblaze_v1.0.git

Choose one, depending on whether you prefer little endian or big endian for your processor. I picked little endian, but there’s one initramfs in the big endian bundle which isn’t there for the little endian set (which only has the “minimal” image).

One of the files fetched by git is microblazeel-unknown-linux-gnu.tar.gz (gzipped tarball) for the little endian version and mb_gnu_tools_bin.tar.bz (bzipped tarball) for big endian. I’ll leave the latter, because I didn’t use it.

There’s no need to install anything, and no need to be root (actually, doing this as root is pretty unwise). Just untar the tarball of your choice in any directory. Tar generates several subdirectories, but we’re after the cross compilers. Or more precisely, to make the kernel build system use them. This boils down to this:

export CROSS_COMPILE=/home/myhomedir/untarred-to/microblazeel-unknown-linux-gnu/bin/microblazeel-unknown-linux-gnu-

First of all, note the dash at the end of the statement. The whole string is a prefix for all compilation commands made by the kernel build system. It is often recommended to set the path to where the compilers are, and then set CROSS_COMPILE to a shorter prefix. I don’t see the point in polluting the overall path. The build environment has no problem with the statement above.
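A forgotten dash or a wrong path in CROSS_COMPILE only shows up as a confusing error somewhere into the build, so a quick check beforehand may save some time. The following is a sketch which assumes that CROSS_COMPILE holds a full path prefix, as in the statement above; check_cross_compile is a made-up helper name.

```shell
#!/bin/sh
# Verify a cross compiler prefix: it must end with a dash, and
# appending "gcc" to it must give an executable file.
# Takes the prefix as an argument, or falls back on $CROSS_COMPILE.
check_cross_compile() {
    prefix=${1:-${CROSS_COMPILE:-}}
    case $prefix in
        *-) ;;
        *)  echo "prefix does not end with a dash: $prefix"
            return 1 ;;
    esac
    if [ ! -x "${prefix}gcc" ]; then
        echo "no executable at ${prefix}gcc"
        return 1
    fi
    echo "ok"
}
```

With the export statement above in effect, running check_cross_compile without arguments should print “ok”.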

It has also crossed my mind to use the mb-gcc and friends, which are part of the SDK. But that may require another paid-for license, in case different people do the FPGA and software (which usually is the case).

And to wrap this up: If I ever need to build a cross compiler from scratch, I would start by looking at Buildroot (and another page about it) or following this guide (I haven’t tried either, though).

Kernel configuration

Setting this up correctly is a tedious process, and even the most seasoned kernel hackers may not get it right on the first go. If it’s your first time, prepare to spend quite a few hours on this. The less experienced you are with Linux in general, the more time you will need to spend to make an educated guess about your need for each feature offered.

You can try to use my configuration file, or at least start off with it. It was made against a 2.6.38 kernel, and booted well as shown in part III. Copy the file to .config in the kernel source’s root, and start with oldconfig.

The commands involved are basically (all “make” commands issued at the kernel source’s top directory):

  • Clean up everything, including the .config file if present. This is not necessary if you just uncompressed your kernel. It’s actually rarely necessary at all: “make ARCH=microblaze mrproper”. This will delete .config! (I know I just said it).
  • Adopt an existing .config file: “make ARCH=microblaze oldconfig”. This is useful in particular when switching to another kernel version or flavor. Only questions about new features are asked. If you downloaded my configuration file, I would suggest not to turn on options that are offered while running oldconfig, unless they are clearly Xilinx related.
  • Configure the kernel: “make ARCH=microblaze xconfig”, “make ARCH=microblaze gconfig” or “make ARCH=microblaze menuconfig” (pick one). These applications present the kernel options in a fairly user-friendly manner, and eventually save the result to .config. I recommend xconfig, because it’s graphic and has a search feature, which turns out very useful.

When targeting an embedded platform, the strategy is to enable whatever is necessary in the kernel itself, and not count on kernel modules. A second issue is to eliminate anything unnecessary from the kernel. This is not just a matter of the kernel image’s size and speed: Enabling components which have nothing to do there can cause the kernel compilation to fail, and even worse, the kernel to crash at boot. Each architecture maintains a set of #include headers, and some kernel components may assume certain things that these architecture-dependent parts haven’t caught up with. So the rule is that whatever hasn’t been tested won’t necessarily work. Enabling an esoteric keyboard driver on a Microblaze processor may very well fail the boot, simply because nobody cares.

In particular, you most likely want to follow these:

  • Under Platform Options, set CONFIG_KERNEL_BASE_ADDR to where your DDR RAM begins (0xC0000000 on my processor), the targeted FPGA family as well as the other parameters (USE_*). The USE_* parameters’ correct values can be found in the .dts file. Just copy the values of the processor elements with the same names.
  • Also set
    CONFIG_SERIAL_UARTLITE=y
    CONFIG_SERIAL_UARTLITE_CONSOLE=y
    CONFIG_SERIAL_CORE=y
    CONFIG_SERIAL_CORE_CONSOLE=y
  • Since we’re not going to use any boot loader, the kernel command line needs to be compiled into the kernel itself: Enable CMDLINE_BOOL (default bootloader kernel argument) and set it to something useful. As for the console, set it to console=ttyUL0, or nothing goes to the console after the first two lines sent from early_printk_console (CONFIG_CMDLINE_FORCE may be necessary as well. It doesn’t hurt in the absence of a boot loader anyhow).
  • Enable CONFIG_MSDOS_FS and CONFIG_VFAT_FS in kernel (not module), so that the SystemACE can be read.
  • Enable CONFIG_XILINX_SYSACE
  • Enable CONFIG_XILINX_EMACLITE and CONFIG_FB_XILINX
  • Disable the FTRACE config option (under Kernel hacking), since compilation fails with it enabled. This is simpler than applying a patch.
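Going through this list by eye in a .config file of a few thousand lines is error prone, so here is a sketch of a script that does it. The option names are the ones in the list above, with the CONFIG_ prefix added where I left it out (CMDLINE_BOOL and FTRACE); missing_config_options is a hypothetical helper name. The Platform Options aren’t checked, because their correct values depend on your processor.

```shell
#!/bin/sh
# Print each option from the list above that is not set to "y" in the
# given .config file (default: ./.config), and complain if
# CONFIG_FTRACE is enabled. Prints nothing when everything matches.
missing_config_options() {
    config=${1:-.config}
    for opt in SERIAL_UARTLITE SERIAL_UARTLITE_CONSOLE SERIAL_CORE \
               SERIAL_CORE_CONSOLE CMDLINE_BOOL MSDOS_FS VFAT_FS \
               XILINX_SYSACE XILINX_EMACLITE FB_XILINX; do
        grep -q "^CONFIG_${opt}=y\$" "$config" || echo "CONFIG_${opt}"
    done
    if grep -q '^CONFIG_FTRACE=y$' "$config"; then
        echo "CONFIG_FTRACE should be disabled"
    fi
    return 0
}
```

Note that an option set to “m” (module) is reported as missing, which is intentional given that modules aren’t used here.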

And for your own sake, make a copy of the .config file every now and then as you work on it. It’s very easy to delete it by mistake or to mess it up in general.

Setting the Linux boot parameters correctly is very important, because if they’re wrong, kernel recompilation is the only way to fix it in the absence of a boot loader. I’ve chosen to mount the root directory from the network, but note that /dev/xsa is the Compact Flash itself (with /dev/xsa1 being the first partition, for example). So it’s fairly simple to add a partition to the flash device, and put a regular root filesystem there. Maybe I’ll do that myself and update this post.

Anyhow, my choice for the Linux boot parameters was

console=ttyUL0 ip=::::::dhcp rootfstype=nfs root=/dev/nfs rw nfsroot=10.11.12.13:/shared/nfsroot,tcp

where “/shared/nfsroot” is the shared NFS directory on the server with IP 10.11.12.13. This command is suitable for getting the root from the network, which is very convenient for development. This setting requires a DHCP server on the LAN. In case you don’t want to configure a DHCP server, use the ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:::off format instead. Documentation/filesystems/nfs/nfsroot.txt in the kernel sources has more about booting from NFS. I’ve also written a post about booting a diskless PC from network, but it’s a bit of an overkill.
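For the static IP alternative, it’s easy to miscount the colons, so here is a sketch which assembles the whole command line from its parts. It merely combines the parameters shown above with the static ip= format; nfs_cmdline is a hypothetical helper name, and the addresses in the example are those appearing in my boot log.

```shell
#!/bin/sh
# Build a kernel command line for NFS root with a static IP address.
# Arguments: client IP, server IP, gateway IP, netmask, and the NFS
# directory exported by the server.
nfs_cmdline() {
    client=$1 server=$2 gateway=$3 netmask=$4 export_dir=$5
    printf 'console=ttyUL0 ip=%s:%s:%s:%s:::off rootfstype=nfs root=/dev/nfs rw nfsroot=%s:%s,tcp\n' \
        "$client" "$server" "$gateway" "$netmask" "$server" "$export_dir"
}

nfs_cmdline 10.11.12.155 10.11.12.13 10.11.12.13 255.255.255.0 /shared/nfsroot
```

The output is what goes into the kernel configuration’s command line instead of the DHCP variant.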

In case you’re interested in how the whole configuration thing comes together, let’s take CONFIG_EARLY_PRINTK for example. In arch/microblaze/kernel/Makefile, one of the lines says:

obj-$(CONFIG_EARLY_PRINTK)    += early_printk.o

On the other hand, in the config file it can say

CONFIG_EARLY_PRINTK=y

So when the Makefile is executed, the target early_printk.o is added to either obj-y, obj-m or obj- (the latter when the option isn’t set at all). obj-y is the list of objects to be inserted into the kernel, obj-m is the list of modules, and obj- is the junk list. The configuration options themselves are defined in the Kconfig files, next to the Makefiles.
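To see which list a certain option ends up in, the Makefile’s variable substitution can be imitated with a few lines of shell. This is only an illustration of the mechanism, not something the build system uses; kbuild_list is a name I made up.

```shell
#!/bin/sh
# Print the name of the list that "obj-$(CONFIG_...)" expands to for a
# given option, based on a .config file (default: ./.config):
# obj-y (built in), obj-m (module) or obj- (not set, so ignored).
kbuild_list() {
    option=$1 config=${2:-.config}
    # An option which is turned off appears as a comment ("# CONFIG_X
    # is not set"), so nothing matches and the value remains empty.
    value=$(sed -n "s/^${option}=\([ym]\)\$/\1/p" "$config")
    echo "obj-$value"
}
```

For example, kbuild_list CONFIG_EARLY_PRINTK prints “obj-y” with the config line shown above.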

A small Makefile fix

As of 2.6.38, there is a small error in arch/microblaze/boot/Makefile, which makes the build system always attempt to make a U-Boot image, which is not necessary in our case. This may result in an error message (saying “mkimage” wasn’t found), when everything is actually OK. So in the part saying

$(obj)/simpleImage.%: vmlinux FORCE
 $(call if_changed,cp,.unstrip)
 $(call if_changed,objcopy)
 $(call if_changed,uimage)
 $(call if_changed,strip)
 @echo 'Kernel: $@ is ready' ' (#'`cat .version`')'

remove or comment out the line saying “$(call if_changed,uimage)”.
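If you switch between kernel trees often, the same edit can be made with sed. A sketch, assuming the line looks exactly as in the excerpt above; disable_uimage is a hypothetical helper name. The “#” is put at the beginning of the line, so make itself treats the line as a comment.

```shell
#!/bin/sh
# Comment out the "$(call if_changed,uimage)" line in the given
# Makefile, leaving all other lines as they were. The file is
# rewritten in place through a temporary copy.
disable_uimage() {
    makefile=$1
    tmp=$(mktemp) || return 1
    sed 's/^[[:space:]]*\$(call if_changed,uimage)/#&/' "$makefile" > "$tmp" &&
        cat "$tmp" > "$makefile"
    status=$?
    rm -f "$tmp"
    return $status
}
```

Run it as disable_uimage arch/microblaze/boot/Makefile from the kernel source’s top directory. Running it twice does no harm, since the line no longer matches after the first time.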

Compiling the kernel

Before starting: You didn’t forget to set CROSS_COMPILE and copy the updated xilinx.dts file to its place… right?

I prefer cleaning up before compiling:

make ARCH=microblaze clean
rm arch/microblaze/boot/simpleImage.*

This is a good time to ask why the image file isn’t cleaned by “make clean”. To be fixed, I suppose.

And then, the compilation is just

make -j 8 ARCH=microblaze simpleImage.xilinx

Note that the “.xilinx” suffix corresponds to the xilinx.dts file in the arch/microblaze/boot/dts/ directory. If another .dts file should be made effective, change the suffix.

The “-j 8” means that 8 compilation processes run in parallel, which is suitable for a quad-core processor with hyperthreading. Skip this option or use another number, depending on your computer, your spare time and your need to see the logic of the events.

The basic UNIX rule is that everything went fine unless an error message appeared. A more explicit confirmation is that it said

OBJCOPY arch/microblaze/boot/simpleImage.xilinx

somewhere close to the end, and that the arch/microblaze/boot/simpleImage.xilinx is indeed there, and has a date stamp that makes sense.
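What “a date stamp that makes sense” means is open to interpretation. One possible check is that the image isn’t empty and is newer than the .config file it was built from. That choice of reference file is my own assumption, and image_is_fresh is a made-up helper name.

```shell
#!/bin/sh
# Succeed (return 0) if the image file exists, is not empty, and was
# modified more recently than the reference file.
image_is_fresh() {
    image=$1 reference=$2
    [ -s "$image" ] &&
        [ -n "$(find "$image" -newer "$reference" 2>/dev/null)" ]
}
```

For example: image_is_fresh arch/microblaze/boot/simpleImage.xilinx .config && echo "looks fine"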

If and when you get errors, well, there’s no simple recipe to solve that. The easiest way is to eliminate the need to compile that certain file by changing the kernel configuration, if the functionality is indeed unnecessary. Otherwise your best friends are Google and your brain, not necessarily in that order.

As for the Device Tree, it was compiled into a .dtb file (the Device Tree binary blob), which can be found in the same directory as the just generated kernel image. The Device Tree Compiler (dtc) comes with the kernel sources, and can be found in scripts/dtc.

And just to wrap this up: If you insist on seeing all the commands issued instead of the otherwise laconic output, there’s the KBUILD_VERBOSE flag. For example,

make ARCH=microblaze KBUILD_VERBOSE=1 clean

With a compiled kernel image at hand (which already has the Device Tree built-in), all that’s left is to set up the Compact Flash and boot. Go to part III of this HOWTO.

A few other make statements

For completeness:

  • Clean up any compiled binaries: Recommended after a change in .config: “make ARCH=microblaze clean”
  • Generate loadable modules: “make ARCH=microblaze modules”. Not necessary if everything needed is compiled into the kernel.
  • And then gather the modules in a neat directory (making sure you don’t have a /lib/modules directory with the same version number): “make ARCH=microblaze modules_install”. This will write to /lib/modules on the local machine, so if you happen to compile exactly the same kernel version for your own PC and the embedded target, the kernel modules the PC relies on will be overwritten.
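If you do build modules, the risk of overwriting the PC’s own modules can be avoided with the kernel build system’s INSTALL_MOD_PATH variable, which is prepended to /lib/modules as the destination. Here is a sketch of a wrapper which refuses to run without a staging directory; install_modules_to is a hypothetical name, and the MAKE variable is only there so that the command can be tried out without actually running make.

```shell
#!/bin/sh
# Run "make modules_install" with the modules going to
# <staging>/lib/modules instead of the host's own /lib/modules.
# Refuses to run if the staging directory is empty or "/".
install_modules_to() {
    staging=$1
    case $staging in
        ''|/)
            echo "refusing to install into the host's /lib/modules" >&2
            return 1 ;;
    esac
    ${MAKE:-make} ARCH=microblaze INSTALL_MOD_PATH="$staging" modules_install
}
```

With NFS root, the natural choice for the staging directory is the exported directory itself, for example install_modules_to /shared/nfsroot. To only see the command’s arguments, go MAKE=echo install_modules_to /shared/nfsroot.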

Linux on Microblaze HOWTO (part I)

This is part I of my HOWTO on running Linux on Microblaze.

Introduction

This HOWTO goes through the procedures for getting a simple Linux system running on a Xilinx Microblaze processor. The examples are given for an SP605 evaluation board, but almost everything here applies for other FPGAs and boards as well. The Xilinx software version used here is 13.2.

There are quite a few variants on how to get the bitstream and Linux kernel into their right places in the FPGA. The approach taken here is to boot up from the Compact Flash alone by writing a file to it. No bootloader is used in this howto; the SystemACE chip is responsible for loading both the FPGA bitstream and Linux kernel image, and it will do so reading one single (.ace) file. The main advantage of this approach is that there’s no need to set up a boot loader, which is yet another piece of software that can go wrong. The main disadvantage is that a bootloader allows some tweaking of the kernel configuration at boot time, which has to be done by recompiling the kernel otherwise.

The root filesystem is mounted from network (NFS) in this HOWTO.

I’m assuming the following prerequisites:

  • You have the Xilinx tools set up properly, and have managed to compile and run a simple standalone “Hello, World” application with the EDK/SDK (having loaded the code to the FPGA in any way, we’ll get to that)
  • You’ve actually seen the RS-232 console data on a terminal, and feel confident about it (otherwise you may work hard to figure out why everything is stuck, when it’s actually your terminal window’s problem).
  • You’re running on one of the evaluation boards, or know how to set up the processor to work with your own (and have that tested already)
  • Your board has a System ACE chip (recent evaluation boards usually do)
  • You have access to a computer running Linux. Compiling the kernel will require this. The Xilinx tools can be run on whatever’s your cup of tea.
  • You have the ability to read and write files to a Compact Flash. This is most easily done with a simple adapter to a PC, which should be available in camera or computer accessories shops. Chances are you have one without necessarily being aware of it.

An outline of the steps

So this is what we’ll do:

  • Set up a Microblaze processor in the Xilinx EDK so it can run Linux.
  • Generate the processor, so an FPGA bitstream is at hand.
  • Export the processor to the Xilinx SDK and compile a dummy C application, so that the necessary metadata files are generated
  • Generate a Device Tree file (.dts) based upon files created by EDK/SDK, and copy it into the Linux kernel sources, so Linux is in sync with the EDK regarding what it’s running on.
  • Configure the kernel and compile it.
  • Create a .ace file from the FPGA bitstream and kernel image just compiled.
  • Set up the Compact Flash card.
  • Boot and hope for the best

And of course, certain software tools will need to be downloaded for this. We’ll come to this.

Setting up the processor

If you’re really lazy about this, you can use the minimal processor I’ve generated for the SP605 board. Unzip, double-click system.xmp, and skip to after the bullets below. It will work on that board only, of course.

Otherwise: Start Platform Studio (EDK) and create a new platform, based upon the Wizard’s defaults.

Following a Microblaze on Linux guide, in particular the part regarding minimal hardware requirements, there is a need to make sure that the hardware has an MMU with two regions, a timer, an interrupt controller and a UART with an interrupt line. In the Platform Studio it goes like this:

Starting off with the Wizard’s defaults,

  • Double click “microblaze_0” on the Ports tab, and set the Linux with MMU preset on the Configuration wizard showing up. This will take care of most settings.
  • Still in the ports view, add an AXI Interrupt Controller (under Clock, Reset and Interrupt in the IP Catalog). Accept default settings. Make a new connection for its irq output, and connect it to the microblaze_0’s interrupt input pin.
  • Pick the RS232_Uart_1 and make a new connection for the interrupt line. Connect that signal to the interrupt controller.
  • Add an AXI Timer/Counter, and accept defaults. Make a new connection for the interrupt, and connect it to the interrupt controller.
  • Connect the interrupts of the Ethernet lite, SPI Flash, IIC SFP, IIC EEPROM, IIC_DVI, and SysACE cores to the interrupt controller as well.

Then generate bitstream, export to SDK, and run the SDK, adopting this hardware platform. The goal of this is to generate a .mss file, which will be needed later. For this to happen, make a new C project (“Hello World” will do just fine) and compile it.

There is no need to “update the bitstream” like with standalone applications: The Linux kernel can take care of itself, without having its entry address hardwired in the FPGA’s block RAM. We’ll use the system.bit, and not the download.bit (even though the latter works too).

Creating a Device Tree file

The purpose of this stage is to generate a .dts file, which is the format expected by the kernel build environment. It informs the kernel about the structure of the processor and its peripherals. The device tree structure is discussed further here.

If you chose to download and use my processor with no changes whatsoever, you can also get my DTS file. Just copy it to arch/microblaze/boot/dts/ in the to-be compiled kernel source tree.

To make your own .dts file, first create a special directory, and make it the working directory of your shell.

The device tree file is generated automatically by the libgen utility with the help of a Tcl script. As of ISE 13.2, this script needs to be downloaded separately with git:

bash> git clone git://git.xilinx.com/device-tree.git

This generates a device-tree directory. Another web page explains how to make SDK recognize the script, but I prefer command line for things like this. Another post of mine explains the device tree further.

Copy the system.xml file from the directory to which you exported to SDK (in the “hw” subdirectory), into the current one. Then copy system.mss from the project’s BSP directory. It will have a name like hello_world_bsp_0.

Edit the copy you made of system.mss, so that the BEGIN OS to END part reads

BEGIN OS
 PARAMETER OS_NAME = device-tree
 PARAMETER OS_VER = 0.00.x
 PARAMETER PROC_INSTANCE = microblaze_0
END

and not “standalone” for OS.
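The edit can also be scripted, which is handy if the BSP is regenerated every now and then. The sketch below replaces the whole BEGIN OS ... END block with the one shown above and leaves everything else untouched. It assumes that “BEGIN OS” and “END” each stand on a line of their own, which is how the files I’ve seen look; mss_for_device_tree is a made-up helper name.

```shell
#!/bin/sh
# Print the given .mss file to standard output, with its BEGIN OS
# block replaced by one that selects the device-tree generator.
# All other blocks are passed through unchanged.
mss_for_device_tree() {
    awk '
    /^[ \t]*BEGIN[ \t]+OS[ \t]*$/ {
        print "BEGIN OS"
        print " PARAMETER OS_NAME = device-tree"
        print " PARAMETER OS_VER = 0.00.x"
        print " PARAMETER PROC_INSTANCE = microblaze_0"
        print "END"
        skipping = 1
        next
    }
    # Drop the lines of the original OS block, up to and including END
    skipping {
        if ($0 ~ /^[ \t]*END[ \t]*$/) skipping = 0
        next
    }
    { print }
    ' "$1"
}
```

Since the result goes to standard output, keep the original under another name, for example: mv system.mss system.mss.orig && mss_for_device_tree system.mss.orig > system.mss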

And then run libgen as follows (make sure it’s in the PATH. The easiest way is to launch a “Xilinx shell” from the EDK’s project menu):

libgen -hw system.xml -lp device-tree -pe microblaze_0 -log libgen.log system.mss

Which generates a xilinx.dts in microblaze_0/libsrc/device-tree_v0_00_x. Copy this file to arch/microblaze/boot/dts/ in the to-be compiled kernel source tree. If you can’t find the file there, and libgen didn’t complain about some error, you may have forgotten to edit system.mss as mentioned just above.

Now let’s go on to compiling the kernel, in part II.

The Device Tree for embedded Linux and Xilinx FPGAs

Spoiler

It’s very likely that you don’t need to read this. If all you want is to get a Linux kernel to detect a Microblaze processor on a Xilinx FPGA, the relevant information is in another post of mine. This post goes into the details which are necessary to understand, if you want to write a kernel driver for a device tree mapped peripheral.

Why a device tree is necessary

The main issue with running Linux on an FPGA is that the Linux kernel needs to know what peripherals it has and where it can find them. On PC computers this problem was solved many years ago with the PCI bus: The BIOS detects the peripherals, allocates their addresses and interrupts and tells the operating system what it has and where it can be found. In the embedded world, this information was hardcoded into pieces of the kernel sources, which were written specifically for every board. With many boards out there, the kernel source grew way too fast. This far-from-optimal solution is not feasible with a soft processor, whose peripherals are configured per case. Hacking the kernel sources to match the FPGA is a recipe for bugs, crashes and being stuck with a certain kernel forever.

The elegant solution for this is the Flattened Device Tree. The idea is to create some binary data structure, which is either linked into the kernel image or given to it during boot. This binary blob contains the information about the processor itself and its peripherals, including the addresses, interrupts and several application-specific parameters. So the drivers for these peripherals are written much like PCI drivers: They declare what peripherals they support, and obtain their resources from a standard kernel API.

The code for Flattened Device Tree and Open Firmware resides in drivers/of in the kernel tree. The relevant include file is include/linux/of.h.

Generation

Note that at least for Xilinx FPGAs, there is no need to generate the device tree manually. Rather, get a copy of the device tree generator with

bash> git clone git://git.xilinx.com/device-tree.git

which basically consists of a Tcl script run by libgen and a configuration file. The device tree generator’s page explains how to make SDK recognize the script, but there’s no reason to play around with SDK for that.

Instead, go

libgen -hw /path/to/system.xml -lp /path/to/device-tree -pe microblaze_0 -log libgen.log system.mss

Which generates a system.dts in microblaze_0/libsrc/device-tree_v0_00_x

The system.mss file is generated as a byproduct when compiling just any project within SDK, and is found under the directory with the _bsp_n suffix. I still need to find out how to create the file from the command line.

It needs to be modified, so that the BEGIN OS to END part reads

BEGIN OS
 PARAMETER OS_NAME = device-tree
 PARAMETER OS_VER = 0.00.x
 PARAMETER PROC_INSTANCE = microblaze_0
END

and not “standalone” for OS.

To get the system.xml file (which was necessary to create the system.mss), go Project > Export Hardware to SDK in the EDK platform studio. Or

make -f system.make exporttosdk

from the project’s home directory.

The correct setup of the device tree entry can be found in the Documentation/devicetree/bindings directory of the kernel sources. The xilinx.txt file describes the bindings for Xilinx peripherals, and explains how information in the system.mhs file is translated into a xilinx.dts.

As part of a full kernel compilation, the .dts is compiled into a .dtb file (the Device Tree binary blob), which can be found in the same directory as the generated kernel image. The Device Tree Compiler (dtc) comes with the kernel sources, and can be found in scripts/dtc.

A sample entry

The following example is given there for a Uartlite (which we’ll follow on below):

opb_uartlite_0: serial@ec100000 {
 device_type = "serial";
 compatible = "xlnx,opb-uartlite-1.00.b";
 reg = <ec100000 10000>;
 interrupt-parent = <&opb_intc_0>;
 interrupts = <1 0>; // got this from the opb_intc parameters
 current-speed = <d#115200>;     // standard serial device prop
 clock-frequency = <d#50000000>; // standard serial device prop
 xlnx,data-bits = <8>;
 xlnx,odd-parity = <0>;
 xlnx,use-parity = <0>;
 };

It’s recommended to have a look at arch/microblaze/platform/generic/system.dts in the kernel sources for a fullblown file. Or one you’ve generated yourself, for that matter.

Declarations in a kernel module driver

Device tree mapped instances are treated by the kernel very much like PCI devices, only the source of information is the DTB (Device Tree Binary) rather than from the BIOS.

The parallel to PCI’s Vendor/Product IDs is an entry looking like this (taken from uartlite.c):

static struct of_device_id ulite_of_match[] __devinitdata = {
 { .compatible = "xlnx,opb-uartlite-1.00.b", },
 { .compatible = "xlnx,xps-uartlite-1.00.a", },
 {}
};

MODULE_DEVICE_TABLE(of, ulite_of_match);

Which is then bound to a driver with

static struct of_platform_driver ulite_of_driver = {
    .probe = ulite_of_probe,
    .remove = __devexit_p(ulite_of_remove),
    .driver = {
        .name = "uartlite",
        .owner = THIS_MODULE,
        .of_match_table = ulite_of_match,
    },
};

and then, finally, exposed to the kernel with

static inline int __init ulite_of_register(void)
{
 pr_debug("uartlite: calling of_register_platform_driver()\n");
 return of_register_platform_driver(&ulite_of_driver);
}

somewhere at the end of the driver’s code. This format is very similar to the declaration of PCI devices, so if this is unclear, I’d suggest learning how to do it the PCI way, which is far better documented.

And by the way, when the kernel is configured to support it, the device tree can be viewed in human-readable format in /proc/device-tree.

The of_device_id structure

The structure is defined in include/linux/mod_devicetable.h as

struct of_device_id
{
 char    name[32];
 char    type[32];
 char    compatible[128];
#ifdef __KERNEL__
 void    *data;
#else
 kernel_ulong_t data;
#endif
};

Surprisingly enough, the lengths of the entries are fixed and limited. The three strings, name, type and compatible, are compared as strings (with strcmp(), see of/base.c) with the device tree node’s data. Everything declared (that is, non-NULL) in the structure must equal the node’s info for a match. In other words, NULLs are wildcards.

In the declaration example above, only the “compatible” part was declared, so any device matching the string exactly triggers off a probe on the driver.
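To make the wildcard rule concrete, here is a small Python model of the matching. It is only a sketch of the logic described above, not the kernel's code: the dictionaries, the function names and the sample node are made up for illustration, and since the C structure holds fixed-size char arrays, an empty string plays the role of "NULL" here. I also assume the node's "compatible" property is a list of strings, any of which may match.

```python
def entry_matches(entry, node):
    """Return True if every non-empty field of `entry` agrees with `node`.

    `entry` models one of_device_id (keys: name, type, compatible).
    `node` models a device tree node (hypothetical representation).
    """
    if entry.get("name") and entry["name"] != node.get("name"):
        return False
    if entry.get("type") and entry["type"] != node.get("device_type"):
        return False
    if entry.get("compatible") and \
            entry["compatible"] not in node.get("compatible", []):
        return False
    return True


def match_node(table, node):
    """Return the first entry of `table` matching `node`, or None."""
    for entry in table:
        if not any(entry.get(k) for k in ("name", "type", "compatible")):
            break  # an all-empty entry ends the table, like the {} in C
        if entry_matches(entry, node):
            return entry
    return None


# The table from uartlite.c, as declared above
ulite_of_match = [
    {"compatible": "xlnx,opb-uartlite-1.00.b"},
    {"compatible": "xlnx,xps-uartlite-1.00.a"},
    {},
]

# A made-up node resembling the Uartlite sample entry
uart_node = {
    "name": "serial",
    "device_type": "serial",
    "compatible": ["xlnx,opb-uartlite-1.00.b"],
}

print(match_node(ulite_of_match, uart_node))
```

Since name and type are left empty in both table entries, only the compatible string decides whether the driver's probe is called.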

 

The Xilinx EDK “update bitstream” process: A closer look

Introduction

The Xilinx Platform Studio (EDK) has this “update bitstream” function, which I wasn’t so clear about, despite its documentation page. Its icon says “BRAM INIT” which turns out to be more accurate than expected. So what happens during this process? When is it necessary?

If you’re into running a Linux kernel, you’re most likely wasting your time reading this, because the Linux kernel is kicked off directly from the external RAM, and hence this mangling isn’t necessary. To set up a Linux bitstream, see another post of mine.

Having said that, let’s look at the problem this function solves: A Microblaze processor starts executing at address 0 unless told otherwise. Its interrupt vectors are at near-zero addresses as well. These addresses are mapped to an FPGA block RAM.

What this block RAM should contain is a jump to the application’s entry point. On an SP605 board, this is most likely the beginning of the DDR memory, 0xc0000000. So when the processor kicks off, this block RAM’s address zero should contain:

00000000 <_start>:
 0:    b000c000     imm    -16384
 4:    b8080000     brai    0

Which is Microblazish for “Jump to 0xc0000000” (note the lower 16 bits of both instructions).

When a system is booted, there are two phases: First, the FPGA is loaded with its bitstream. Then the external memory is loaded with the bulk of the execution code. Only then is the processor unleashed.

So the block memory’s correct content needs to be included in the bitstream itself. But when the processor is implemented from its logic elements, it isn’t clear what should be written there. It’s only when the software is linked, that the addresses of the different segments are known.

But software compilation and linking requires the knowledge of the processor’s memory map, which is generated while the processor is implemented. So there’s a chicken-and-egg situation here.

The egg was first

The solution is that block RAM’s content is fixed after the software is compiled and linked. The reset and interrupt vectors are included in the ELF file generated by the software linker, and are mapped to the block RAM’s addresses. The “update bitstream” process reads the ELF file, finds the relevant region, and updates the bitstream file, producing the download.bit file. That’s why choosing the ELF file is necessary for this process.

Necessity

The original problem was that the execution starts from address zero. But if the ELF file points at the real starting point, and this is properly communicated to the processor at startup, there’s no need to set up the block RAM at all. Well, assuming that the executable takes care of interrupts and exception vectors soon enough. This is the case with Linux kernel images, for example, for which there is no need to update the bitstream.

Some gory details

The “update bitstream” process launches a command like

bitinit -p xc6slx45tfgg484-3 system.mhs -pe microblaze_0 sdk/peripheral_tests_0/Debug/peripheral_tests_0.elf \
 -bt implementation/system.bit -o implementation/download.bit

which takes place in two phases. In the first phase, the system.mhs file is read and parsed, so that the memory map is known and the block RAM is identified. This program then runs something like

data2mem -bm "implementation/system_bd" -p xc6slx45tfgg484-3 -bt "implementation/system.bit" -bd "sdk/peripheral_tests_0/Debug/peripheral_tests_0.elf" tag microblaze_0 -o b implementation/download.bit

Which is the action itself. Data2mem is a utility for mangling bitstreams so that their block RAMs contain desired data. The -bm flag tells data2mem to get the block RAM map from implementation/system_bd.bmm, which can look like this:

// BMM LOC annotation file.
//
// Release 13.2 - Data2MEM O.61xd, build 2.2 May 20, 2011
// Copyright (c) 1995-2011 Xilinx, Inc.  All rights reserved.

///////////////////////////////////////////////////////////////////////////////
//
// Processor 'microblaze_0', ID 100, memory map.
//
///////////////////////////////////////////////////////////////////////////////

ADDRESS_MAP microblaze_0 MICROBLAZE-LE 100

 ///////////////////////////////////////////////////////////////////////////////
 //
 // Processor 'microblaze_0' address space 'microblaze_0_bram_block_combined' 0x00000000:0x00001FFF (8 KBytes).
 //
 ///////////////////////////////////////////////////////////////////////////////

 ADDRESS_SPACE microblaze_0_bram_block_combined RAMB16 [0x00000000:0x00001FFF]
 BUS_BLOCK
 microblaze_0_bram_block/microblaze_0_bram_block/ramb16bwer_0 [31:24] INPUT = microblaze_0_bram_block_combined_0.mem PLACED = X3Y30;
 microblaze_0_bram_block/microblaze_0_bram_block/ramb16bwer_1 [23:16] INPUT = microblaze_0_bram_block_combined_1.mem PLACED = X2Y30;
 microblaze_0_bram_block/microblaze_0_bram_block/ramb16bwer_2 [15:8] INPUT = microblaze_0_bram_block_combined_2.mem PLACED = X2Y32;
 microblaze_0_bram_block/microblaze_0_bram_block/ramb16bwer_3 [7:0] INPUT = microblaze_0_bram_block_combined_3.mem PLACED = X2Y36;
 END_BUS_BLOCK;
 END_ADDRESS_SPACE;

END_ADDRESS_MAP;

So this file defines the addresses covered as well as the physical positions of these block RAMs in the logic fabric.
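As a sanity check on the numbers in the BMM file above, the arithmetic can be sketched in a few lines of Python. The address range and bit lanes are copied from the listing. The variable and function names are mine, and the remark that a RAMB16 holds 16 Kbit of data (parity bits not counted) is my own addition, not something the file says.

```python
def address_space_bytes(low, high):
    """Number of bytes in an inclusive address range."""
    return high - low + 1


def lane_width(msb, lsb):
    """Number of data bits a block RAM contributes to each word."""
    return msb - lsb + 1


# [msb:lsb] of ramb16bwer_0 .. ramb16bwer_3, as listed in the BUS_BLOCK
lanes = [(31, 24), (23, 16), (15, 8), (7, 0)]

total_bytes = address_space_bytes(0x00000000, 0x00001FFF)
word_bits = sum(lane_width(msb, lsb) for msb, lsb in lanes)
words = total_bytes * 8 // word_bits
bits_per_ram = [words * lane_width(msb, lsb) for msb, lsb in lanes]

print(total_bytes, word_bits, words, bits_per_ram)
```

So the 8 KBytes are spread over four block RAMs, each supplying one byte lane of every 32-bit word and each filled to its 16 Kbit data capacity.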

The -bd flag points at the ELF file to get the data from, with the “tag microblaze_0” part saying that only the memories tagged microblaze_0 in the .bmm file should be handled, and the rest ignored.

 

Microblaze ELF: A small look inside

This is a small reverse-engineering of the ELF file, as generated by Xilinx’ SDK for a simple standalone application targeted for the SP605 board.

ELF headers

Looking into the ELF file, we have something like this:

> mb-objdump --headers sdk/peripheral_tests_1/Debug/peripheral_tests_1.elf

sdk/peripheral_tests_1/Debug/peripheral_tests_1.elf:     file format elf32-microblazele

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
 0 .vectors.reset 00000008  00000000  00000000  000000b4  2**2
 CONTENTS, ALLOC, LOAD, READONLY, CODE
 1 .vectors.sw_exception 00000008  00000008  00000008  000000bc  2**2
 CONTENTS, ALLOC, LOAD, READONLY, CODE
 2 .vectors.interrupt 00000008  00000010  00000010  000000c4  2**2
 CONTENTS, ALLOC, LOAD, READONLY, CODE
 3 .vectors.hw_exception 00000008  00000020  00000020  000000cc  2**2
 CONTENTS, ALLOC, LOAD, READONLY, CODE
 4 .text         0000653c  c0000000  c0000000  000000d4  2**2
 CONTENTS, ALLOC, LOAD, CODE
 5 .init         0000003c  c000653c  c000653c  00006610  2**2
 CONTENTS, ALLOC, LOAD, READONLY, CODE
 6 .fini         0000001c  c0006578  c0006578  0000664c  2**2
 CONTENTS, ALLOC, LOAD, READONLY, CODE
 7 .ctors        00000008  c0006594  c0006594  00006668  2**2
 CONTENTS, ALLOC, LOAD, DATA
 8 .dtors        00000008  c000659c  c000659c  00006670  2**2
 CONTENTS, ALLOC, LOAD, DATA
 9 .rodata       00000986  c00065a4  c00065a4  00006678  2**2
 CONTENTS, ALLOC, LOAD, READONLY, DATA
 10 .sdata2       00000006  c0006f2a  c0006f2a  00006ffe  2**0
 ALLOC
 11 .sbss2        00000000  c0006f30  c0006f30  000071d8  2**0
 CONTENTS
 12 .data         000001d0  c0006f30  c0006f30  00007000  2**2
 CONTENTS, ALLOC, LOAD, DATA
 13 .eh_frame     00000004  c0007100  c0007100  000071d0  2**2
 CONTENTS, ALLOC, LOAD, DATA
 14 .jcr          00000004  c0007104  c0007104  000071d4  2**2
 CONTENTS, ALLOC, LOAD, DATA
 15 .sdata        00000000  c0007108  c0007108  000071d8  2**0
 CONTENTS
 16 .sbss         00000000  c0007108  c0007108  000071d8  2**0
 CONTENTS
 17 .tdata        00000000  c0007108  c0007108  000071d8  2**0
 CONTENTS
 18 .tbss         00000000  c0007108  c0007108  000071d8  2**0

 19 .bss          00000d78  c0007108  c0007108  000071d8  2**2
 ALLOC
 20 .heap         00000400  c0007e80  c0007e80  000071d8  2**0
 ALLOC
 21 .stack        00000400  c0008280  c0008280  000071d8  2**0
 ALLOC
 22 .debug_line   0000779f  00000000  00000000  000071d8  2**0
 CONTENTS, READONLY, DEBUGGING
 23 .debug_info   00008b11  00000000  00000000  0000e977  2**0
 CONTENTS, READONLY, DEBUGGING
 24 .debug_abbrev 000028e7  00000000  00000000  00017488  2**0
 CONTENTS, READONLY, DEBUGGING
 25 .debug_aranges 000006c0  00000000  00000000  00019d70  2**3
 CONTENTS, READONLY, DEBUGGING
 26 .debug_macinfo 0007f541  00000000  00000000  0001a430  2**0
 CONTENTS, READONLY, DEBUGGING
 27 .debug_frame  00000f10  00000000  00000000  00099974  2**2
 CONTENTS, READONLY, DEBUGGING
 28 .debug_loc    00003f80  00000000  00000000  0009a884  2**0
 CONTENTS, READONLY, DEBUGGING
 29 .debug_pubnames 00000fbe  00000000  00000000  0009e804  2**0
 CONTENTS, READONLY, DEBUGGING
 30 .debug_str    000018d5  00000000  00000000  0009f7c2  2**0
 CONTENTS, READONLY, DEBUGGING
 31 .debug_ranges 00000078  00000000  00000000  000a1097  2**0
 CONTENTS, READONLY, DEBUGGING

Even though this is a lot of mumbo-jumbo, there are three main parts: The reset and interrupt vectors, around address zero, the main parts of the ELF (.text, .data and such) at 0xc0000000 and on, and the debug parts which have no memory allocation at all.

The reset branch to application

This is interesting to compare with the Microblaze’s memory map. It can be deduced from the .mhs file, but hey, the log file (with .log suffix) has this segment:

Address Map for Processor microblaze_0
 (0000000000-0x00001fff) microblaze_0_d_bram_ctrl    microblaze_0_dlmb
 (0000000000-0x00001fff) microblaze_0_i_bram_ctrl    microblaze_0_ilmb
 (0x40000000-0x4000ffff) Push_Buttons_4Bits    axi4lite_0
 (0x40020000-0x4002ffff) LEDs_4Bits    axi4lite_0
 (0x40040000-0x4004ffff) DIP_Switches_4Bits    axi4lite_0
 (0x40600000-0x4060ffff) RS232_Uart_1    axi4lite_0
 (0x40800000-0x4080ffff) IIC_SFP    axi4lite_0
 (0x40820000-0x4082ffff) IIC_EEPROM    axi4lite_0
 (0x40840000-0x4084ffff) IIC_DVI    axi4lite_0
 (0x40a00000-0x40a0ffff) SPI_FLASH    axi4lite_0
 (0x40e00000-0x40e0ffff) Ethernet_Lite    axi4lite_0
 (0x41800000-0x4180ffff) SysACE_CompactFlash    axi4lite_0
 (0x74800000-0x7480ffff) debug_module    axi4lite_0
 (0xc0000000-0xc7ffffff) MCB_DDR3    axi4_0

So obviously all the main ELF parts go directly to the DDR memory (that isn’t much of a surprise), and the reset/interrupt vectors go to the internal block RAM.
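The comparison between the section headers and the address map can also be done mechanically. Below is a rough Python sketch that takes a few of the sections from the objdump listing (name, VMA, size) and looks up which region of the address map each one lands in. Only two regions of the map are included, the region labels are my own shorthand, and the function name is made up.

```python
# (low, high, label) taken from the address map above; labels are shorthand
ADDRESS_MAP = [
    (0x00000000, 0x00001FFF, "block RAM"),
    (0xC0000000, 0xC7FFFFFF, "MCB_DDR3"),
]

# (name, VMA, size) for a handful of sections from the objdump output
SECTIONS = [
    (".vectors.reset", 0x00000000, 0x8),
    (".vectors.sw_exception", 0x00000008, 0x8),
    (".vectors.interrupt", 0x00000010, 0x8),
    (".vectors.hw_exception", 0x00000020, 0x8),
    (".text", 0xC0000000, 0x653C),
    (".data", 0xC0006F30, 0x1D0),
    (".stack", 0xC0008280, 0x400),
]


def region_of(vma, size):
    """Return the label of the region fully containing the section, or None."""
    last = vma + size - 1
    for low, high, label in ADDRESS_MAP:
        if low <= vma and last <= high:
            return label
    return None


placement = {name: region_of(vma, size) for name, vma, size in SECTIONS}

for name, label in placement.items():
    print("%-24s %s" % (name, label))
```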

A quick disassembly reveals the gory details:

> mb-objdump --disassemble sdk/peripheral_tests_1/Debug/peripheral_tests_1.elf
sdk/peripheral_tests_1/Debug/peripheral_tests_1.elf:     file format elf32-microblazele

Disassembly of section .vectors.reset:

00000000 <_start>:
 0:    b000c000     imm    -16384
 4:    b8080000     brai    0
Disassembly of section .vectors.sw_exception:

00000008 <_vector_sw_exception>:
 8:    b000c000     imm    -16384
 c:    b8081858     brai    6232
Disassembly of section .vectors.interrupt:

00000010 <_vector_interrupt>:
 10:    b000c000     imm    -16384
 14:    b80818a4     brai    6308
Disassembly of section .vectors.hw_exception:

00000020 <_vector_hw_exception>:
 20:    b000c000     imm    -16384
 24:    b8081870     brai    6256
Disassembly of section .text:

c0000000 <_start1>:
c0000000:    b000c000     imm    -16384
c0000004:    31a07108     addik    r13, r0, 28936
c0000008:    b000c000     imm    -16384
c000000c:    30406f30     addik    r2, r0, 28464
(... and it goes on and on ...)

So let’s look at the reset vector at address zero. The first IMM opcode loads C000 as the upper 16 bits for the instruction following, which is a branch immediate instruction. Together, they make a jump to 0xc0000000. Likewise, the software exception jumps to 0xc0001858 and so on.
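For those who prefer to see the bit fiddling spelled out, here is a minimal Python sketch that combines an imm word with the branch word following it into the absolute jump target. The function and constant names are mine. The opcode values are the top six bits of the instruction words in the disassembly above, and the "absolute" bit mask is what distinguishes brai (b808...) from the relative bri (b800...) seen in the bootloop.

```python
IMM_OPCODE = 0x2C          # top six bits of an imm instruction word
BRANCH_OPCODE = 0x2E       # top six bits of the immediate branch family
ABSOLUTE_BIT = 0x00080000  # set in brai, clear in the relative bri


def branch_target(imm_word, branch_word):
    """Combine an imm + brai pair into the 32-bit address jumped to."""
    if imm_word >> 26 != IMM_OPCODE:
        raise ValueError("first word is not an imm instruction")
    if branch_word >> 26 != BRANCH_OPCODE:
        raise ValueError("second word is not an immediate branch")
    if not branch_word & ABSOLUTE_BIT:
        raise ValueError("branch is relative, not absolute")
    upper = imm_word & 0xFFFF     # imm supplies the upper 16 bits
    lower = branch_word & 0xFFFF  # the branch supplies the lower 16 bits
    return (upper << 16) | lower


# Instruction words copied from the disassembly above
vectors = {
    "reset": (0xB000C000, 0xB8080000),
    "sw_exception": (0xB000C000, 0xB8081858),
    "interrupt": (0xB000C000, 0xB80818A4),
    "hw_exception": (0xB000C000, 0xB8081870),
}

for name, (imm_word, branch_word) in vectors.items():
    print(name, hex(branch_target(imm_word, branch_word)))
```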

Since only the block RAM part can be included in the download.bit bitfile, only these jump vectors depend on the ELF file during the “update bitstream” process. That’s why one gets away with not running this process, even when the ELF has been modified with a plain recompilation.

And now to the bootloop ELF

So what is the bootloop code doing? The headers are no more impressive than

> mb-objdump --headers bootloops/microblaze_0.elf

bootloops/microblaze_0.elf:     file format elf32-microblazele

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
 0 .boot         00000004  00000000  00000000  00000074  2**0
 CONTENTS, ALLOC, LOAD, READONLY, CODE
 1 .text         00000000  00000000  00000000  00000074  2**0
 CONTENTS, ALLOC, LOAD, READONLY, CODE
 2 .data         00000000  00000000  00000000  00000074  2**0
 CONTENTS, ALLOC, LOAD, DATA
 3 .bss          00000000  00000000  00000000  00000078  2**0
 ALLOC

Note the Size column: All entries are empty, except for the .boot section, which is four bytes small (one single instruction). That doesn’t leave room for sophisticated software, and the disassembly is indeed

> mb-objdump --disassemble bootloops/microblaze_0.elf

bootloops/microblaze_0.elf:     file format elf32-microblazele

Disassembly of section .boot:

00000000 <_boot>:
 0:    b8000000     bri    0        // 0

Which is simply an endless loop. So they called it bootloop for a reason.

 

Booting a Microblaze processor + software using Compact Flash

This is a small guide to loading a standalone application + bitstream to an FPGA using the CompactFlash card. Or put otherwise, how to make the System ACE chip happy.

For loading a Linux kernel in the same way, I suggest referring to a special post on that subject.

Formatting the flash

Rule #1: Don’t format it unless you have to. And if you have to, read the System ACE CompactFlash Solution datasheet (DS080.pdf), in particular “System ACE CF Formatting Requirements” which basically says that if you format the flash under XP, it won’t work. To summarize it shortly,

  • Make it a FAT12 or FAT16, and not a FAT32 (the usual choice)
  • More than one sector per cluster
  • Only one reserved sector (XP may very well allocate more)
  • Maximum 2GB capacity (note that when it says 2GB commercially, it’s usually slightly less, but can be more. Partitioning is recommended)

It’s recommended to rewrite the partition table, as it may arrive messy. With fdisk, this is a desired final format (give or take sizes):

Disk /dev/sdd: 2017 MB, 2017419264 bytes
64 heads, 63 sectors/track, 977 cylinders
Units = cylinders of 4032 * 512 = 2064384 bytes
Disk identifier: 0x00000000

 Device Boot      Start         End      Blocks   Id  System
/dev/sdd1               1         977     1969600+   6  FAT16

NOTE: My Flash Disk appeared as /dev/sdd, yours may appear as something else. Don’t forget to fix this when running these commands, or you may wipe your hard disk!

Note the file system ID 6 (FAT16). The card originally arrived with type 4, which is “FAT16 < 32MB”. To format the Compact Flash correctly in Linux, go (replace sdd1 with the correct device, or you’ll erase something you didn’t want to):

# mkdosfs -R 1 -F 16 /dev/sdd1

And then verify that you got one single reserved sector (it’s likely you got it wrong):

# hexdump -n 32 -C /dev/sdd1
00000000  eb 3c 90 6d 6b 64 6f 73  66 73 00 00 02 20 01 00  |.<.mkdosfs... ..|
00000010  02 00 02 00 00 f8 f5 00  3f 00 40 00 00 00 00 00  |........?.@.....|

The 16-bit word at 0x0e is the reserved sector count, as detailed in Wikipedia. If it isn’t 1, as shown above, System ACE won’t boot. Unfortunately, recent versions of mkdosfs have a new “feature” which silently rounds up the number of reserved sectors to align with clusters. So the count comes out wrong. The solution for this is to downgrade this simple utility, possibly by downloading it from here. Version 3.0.9 is too new, 2.11 is fine.
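Reading hexdumps by eye gets old quickly, so here is a small Python sketch that pulls the interesting fields out of the first 32 bytes of the partition and checks them against the two System ACE requirements that show up in the boot sector. The byte string is the hexdump above, typed in as is. The function names are mine, and the field offsets are those of the standard FAT boot sector layout.

```python
import struct


def fat_boot_params(boot_sector):
    """Extract a few little-endian fields starting at offset 0x0b."""
    (bytes_per_sector, sectors_per_cluster,
     reserved_sectors, fat_count) = struct.unpack_from("<HBHB", boot_sector, 0x0B)
    return {
        "bytes_per_sector": bytes_per_sector,
        "sectors_per_cluster": sectors_per_cluster,
        "reserved_sectors": reserved_sectors,
        "fat_count": fat_count,
    }


def system_ace_complaints(params):
    """List the formatting requirements mentioned above that are violated."""
    complaints = []
    if params["reserved_sectors"] != 1:
        complaints.append("reserved sector count is not 1")
    if params["sectors_per_cluster"] <= 1:
        complaints.append("need more than one sector per cluster")
    return complaints


# The 32 bytes shown in the hexdump above
boot = bytes.fromhex(
    "eb 3c 90 6d 6b 64 6f 73 66 73 00 00 02 20 01 00 "
    "02 00 02 00 00 f8 f5 00 3f 00 40 00 00 00 00 00"
)

params = fat_boot_params(boot)
print(params, system_ace_complaints(params))
```

To check a real card, the 32 bytes can be read from the partition device itself (as root) instead of being typed in. As always, mind which device you open.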

Minimalistic setting

If there’s no xilinx.sys file in the root directory, and there is a file with an .ace extension, System ACE will boot from that file. Make sure there’s only one file with the .ace extension in the flash’s root directory. This setting doesn’t take advantage of the possibility to configure which image to boot from at powerup, but it’s easy to start off with.

Configurable setting

We shall now look at a setting which has only one .ace image to boot from, but is easily expanded to several images, chosen by the levels of three pins of the System ACE chip at powerup.

In the root directory, there should be a xilinx.sys file, saying something like this:

# Any comment goes here
dir = trydir;
cfgaddr0 = cfg0;
cfgaddr1 = cfg0;
cfgaddr2 = cfg0;
cfgaddr3 = cfg0;
cfgaddr4 = cfg0;
cfgaddr5 = cfg0;
cfgaddr6 = cfg0;
cfgaddr7 = cfg0;

The eight different cfgaddr lines tell the (Xilinx) System ACE chip which directory to go to, depending on the state of the three CFGADDR pins of the chip. So different profiles can be chosen with DIP switches and such. In the case above, all eight configurations point at the same directory, cfg0.

The first line declares the main working directory, which is trydir.

So in the case above, the root directory must have a directory called trydir, and within that directory, there must be a directory called cfg0.

And in cfg0, there must be a single file with .ace suffix, which is the ACE file to be loaded into the FPGA. Or more precisely, the ACE file is a translation of an SVF file, which is a sequence of JTAG instructions.

In order to allow configuration at powerup, create other directories (cfg1, cfg2 etc) and assign them to the desired cfgaddrN in the xilinx.sys file.
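The lookup described above can be modeled in a few lines of Python: parse xilinx.sys, then pick the directory that matches the state of the CFGADDR pins. This is just my own illustration of the file's logic, with made-up function names. The parser only understands what the example above shows (comments starting with #, and key = value; lines), so don't take it as a reference for the file's full syntax.

```python
EXAMPLE_XILINX_SYS = """\
# Any comment goes here
dir = trydir;
cfgaddr0 = cfg0;
cfgaddr1 = cfg0;
cfgaddr2 = cfg0;
cfgaddr3 = cfg0;
cfgaddr4 = cfg0;
cfgaddr5 = cfg0;
cfgaddr6 = cfg0;
cfgaddr7 = cfg0;
"""


def parse_xilinx_sys(text):
    """Turn 'key = value;' lines into a dictionary, skipping comments."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        key, _, value = line.rstrip(";").partition("=")
        settings[key.strip()] = value.strip()
    return settings


def ace_directory(settings, cfgaddr_pins):
    """Directory holding the .ace file for a CFGADDR pin state (0 to 7)."""
    if not 0 <= cfgaddr_pins <= 7:
        raise ValueError("three pins give a value between 0 and 7")
    return settings["dir"] + "/" + settings["cfgaddr%d" % cfgaddr_pins]


settings = parse_xilinx_sys(EXAMPLE_XILINX_SYS)
print(ace_directory(settings, 5))
```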

Generating the ACE file

Everything said here is related to the software arriving with ISE 13.2. It looks like there have been some significant changes from past versions.

In the Xilinx Platform Studio (EDK), pick Hardware > Generate bitstream on the processor configured. Basically, this generates netlists, builds them, and runs map, place and route, and bitgen, which creates a file such as system.bit.

Export the hardware format to SDK (Project > Export hardware design to SDK…), and then develop with SDK based upon that hardware. The bundle includes a hardware description as an XML file as well as the bitfile.

Once the project is built, it generates an .elf file, usually in the Debug subfolder. Its name and path are easily found in the Executable tab at the bottom of the SDK. Back in the EDK, pick Project > Select ELF file… and choose the relevant executable (for implementation). Then pick Device Configuration > Update Bitstream. That creates download.bit. This step is mandatory every time the ELF is changed, even though things will most likely work even without updating download.bit every time, since the relevant parts stay the same.

Create a directory to gather the relevant files, and copy the following into it:

  • The Tcl script generating ACE file: ISE_DS/EDK/data/xmd/genace.tcl (relative to the path where Xilinx ISE is installed)
  • The bitstream (download.bit) file
  • The ELF file

Open a command shell (Project > Launch Xilinx Shell if you like), change to this directory and go:

xmd -tcl genace.tcl -hw download.bit -elf myelf.elf -ace myace.ace -board sp605 -target mdm

which generates a lot of junk files (.svf most notably, which contain JTAG commands in a portable format), and eventually the myace.ace is created (any file name is OK, of course).

In the example above, I assumed that the target is the SP605 board. Looking at the genace.tcl script reveals easily which boards are supported. If yours isn’t, it’s not such a big deal. The only reason the board matters is that the System ACE needs to know which device in the JTAG chain to talk with, plus some programming parameters. The -board flag to this script allows setting the options in a “genace option file” (whatever that means). I would hack the script, though. It looks easier. See here for more information.

A test run

At times, the SP605 board’s green LED went on, but nothing happened. Pressing SYS_ACE_RESET (the middle button out of three close to the Compact Flash jack) caused a reload, which went OK. Probably some kind of race condition during powerup.

References

The walkthrough above is based upon this somewhat outdated guide. The BIST sources (rdf0032.zip) are indeed recommended for download, because of other issues of interest:

  • The ready_for_download subdirectory, which shows another example of a Compact Flash layout
  • The bootloader/src subdirectory, which has sources for loading executables from the Flash’ filesystem in SREC format (using sysace_fopen and the like).
  • The genace_all.sh file in the ready_for_download subdirectory, showing how to create SREC files from ELFs with mb-objcopy.

 

Random Microblaze notes to self


A mix of issues not deserving a post of their own.

COM port issues (with Windows XP)

The SDK has its own terminal, which can be set to run with a serial port. It works fine.

As for Hyperterminal, by all means configure a connection with a specified Hyperterminal configuration file. Just setting the properties of the current connection holds the terminal in disconnected mode until some key is pressed on the keyboard, ignoring incoming data. This can make it look as if nothing is sent from the other end.

More important, when the card is turned on, the COM port will not appear if Hyperterminal is open. So Hyperterminal has to be closed and reopened every time the card is powered on.

The setting is basically 9600 baud, 8 bits, 1 stop bit, no parity and no flow control. Despite some official guides, it looks like it’s not necessary to go to the Device Manager, right-click CP210x USB to UART Bridge Controller and set up Properties of the interface on that level. Note that they revert to default every time the USB interfaces disappear and reappear. At least with Hyperterminal, there have been no problems with having wrong values in the Bridge’s own settings.

A simple led toggling application

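#include "xparameters.h"  /* XPAR_LEDS_4BITS_DEVICE_ID */
#include "xgpio.h"        /* XGpio driver */
#include "xil_cache.h"    /* Xil_ICacheEnable() and friends */

XGpio GpioOutput;

#define LED_CHANNEL 1

int main()
{
    volatile int Delay;
    int Status;
    int count = 0;

    Xil_ICacheEnable();
    Xil_DCacheEnable();

    print("---Entering main---\n\r");

    Status = XGpio_Initialize(&GpioOutput, XPAR_LEDS_4BITS_DEVICE_ID);
    if (Status != XST_SUCCESS)
        return XST_FAILURE;

    /* All four LED pins are outputs */
    XGpio_SetDataDirection(&GpioOutput, LED_CHANNEL, 0x0);

    while (1) {
        count++;

        /* Show the lower four bits of the counter on the LEDs */
        XGpio_DiscreteWrite(&GpioOutput, LED_CHANNEL, (count & 0xf));

        /* Crude busy-wait delay */
        for (Delay = 0; Delay < 1000000; Delay++);
    }

    // Never reached

    Xil_DCacheDisable();
    Xil_ICacheDisable();

    return 0;
}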

Making the processor with command line

Example set for the generation of a Microblaze processor:

platgen -p xc6slx45tfgg484-3 -lang verilog    -msg __xps/ise/xmsgprops.lst system.mhs

If the processor isn’t going to be at top level, go:

platgen -p xc6slx45tfgg484-3 -lang verilog   -toplevel no -ti system_i -msg __xps/ise/xmsgprops.lst system.mhs

or ngdbuild will complain about double IO buffers.

That creates an hdl directory with a toplevel system module, a system_stub.v for instantiation, and several other HDL files. Configuration files for synthesis are written into the “synthesis” directory. The actual cores are in NGC format. Almost all core HDL files are wrappers (in VHDL).

To synthesize, change directory to “synthesis”

cd synthesis

and run the main synthesis script

synthesis.cmd

That’s a quick synthesis, because it’s all wrappers. The script ends with an exit 0, possibly making the command window close in the end.

Anyhow, a system.ngc file (netlist) was just created in the implementation directory.

Implementation with:

xflow -wd implementation -p xc6slx45tfgg484-3 -implement xflow.opt system.ngc

After PAR is OK (and a Perl script verifies that). But hey, the xflow.opt is generated by EDK, so this hardly helps. But this looks like a common implementation.

Notes for using system.ngc directly

That is, creating a black box within a regular project for the processor. This can also be done by embedding the processor into an ISE project, but sometimes ISE needs to be avoided.

  • Create the netlist files manually with platgen, with the non-toplevel option mentioned above. Or alternatively, include a system.xmp in a plain ISE project, and allow the NGC files to be generated from there.
  • Copy all NGC and NCF files in the “implementation” directory (possibly excluding system.ngc) to somewhere ngdbuild looks for binaries (as specified with the -sd flag). Don’t copy NGC files from its subdirectories.
  • Copy the system.v file from the “hdl” directory. This has several black box modules for all .ngc files except for system.ngc
  • For non-Linux use, copy edkBmmFile.bmm from the main implementation directory to somewhere, and use -bm flag on ngdbuild to point at this file. This helps the data2mem block RAM initialization utility change the right places in the bitstream. This is necessary on standalone applications, for which the start address is zero. Linux systems start directly from external memory.
  • Add the -sd flag in the .xst file used for parameters by the XST synthesizer, so it points at where the Microblaze’s NGC files can be found. This makes XST read the cores at the beginning of Advanced HDL Synthesis. It’s recommended to verify that this indeed happens. This is important, because some of the cores include the I/O buffers. When the cores are read, XST prevents itself from putting its own I/O buffers where they are already instantiated by the cores. Failing to read these cores will result in Ngdbuild complaining about I/O buffers being connected in series: One generated by XST and one by the core.
  • Implementing a bitstream file directly from a system.ngc may fail if too many I/Os are connected. A large number can make sense when they go to logic, but not to actual pins. If the purpose of this bitstream generation is to export it to the SDK for the sake of setting up a BSP (or generating a Device Tree), the solution is to remove these external ports, implement, and then return these ports. This is easiest done by editing the MHS file directly. It also seems like running Project Navigator’s “Export Hardware Design To SDK without Bitstream” process, which is available for XMP sources in the design, will work without removing ports.

References

  • Main start-off source: xilinx.wikidot.com
  • Using the genace (TCL) script
  • Linux 2.6 for Microblaze main page
  • Linux on Xilinx devices (PPC) — useful, also has the command line for formatting the Compact Flash
  • A bit about setting up the device tree: In the Linux source, Documentation/devicetree/bindings/xilinx.txt
  • In the Linux source, arch/microblaze/boot/dts/system.dts — A sample DTS file (not the one to use!)

PCIe: Is your card silently struggling with TLP retransmits?

Introduction

The PCI Express standard requires an error detection and retransmit mechanism, which ensures that the TLP packets indeed arrive correctly. The need for reliable communication on a system bus is obvious, but this mechanism also sweeps problems under the carpet: If data packets arrive faulty or are lost in the lower layers, practically nobody will notice this. While error reporting mechanisms exist at the hardware level, there is no mechanism to inform the end user that something isn’t working so well.

Update, 19.10.15: The Linux kernel nowadays has a mechanism for turning AER messages into kernel messages. In fact, they can easily flood the log, as discussed in this post of mine.

Errors in the low-level packets are not only a performance issue (retransmissions are a waste of bandwidth). With properly designed hardware, there is no reason for them to appear at all, so their very existence indicates that something might be close to failing.

When developing hardware or using PCIe extension cables, this issue is even more important. A setting which hasn’t been verified extensively may appear to work, but in fact it’s just barely getting the data through.

The methodology

According to the PCIe spec, correctable (as well as uncorrectable) errors are noted in the PCI Express Capability structure by setting bits matching the type of error. Using a command-line application in Linux, we’ll detect the status of a specific device.

By checking the status register of our specific device, it’s possible to tell if it has detected (and fixed) something wrong in the TLP packets it has received. To detect corrected errors in TLPs going in the other direction, it’s necessary to locate the device’s link partner (a switch, bridge or the root complex). Even then, it will be difficult to say something definite: If the link partner reports an error, there may not be a way to tell which link (and hence device) caused it.

In this example, we’ll check a Xillybus peripheral (custom hardware), because we can control the amount of data flowing from and to it. For example, in order to send 100 MB of zeros in a loop, just go:

$ dd if=/dev/zero of=/dev/xillybus_write_32 bs=1k count=100k &
$ cat /dev/xillybus_read_32 > /dev/null

The Device Status Register

This register is part of the PCI Express Capability structure, at offset 0x0a. This register’s 4 least significant bits can supply information about the device’s health:

  • Bit 0 — Correctable Error Detected. This bit is set if e.g. a TLP packet doesn’t pass the CRC check, since such an error is corrected with a retransmit.
  • Bit 1 — Non-Fatal Error Detected. A condition which wasn’t expected, but could be recovered from. This may indicate some incompatibility between the link partners, or a physical layer error, which caused a recoverable mishap in the protocol.
  • Bit 2 — Fatal Error Detected. This means that the device should be considered unreliable. Unrecoverable packet loss is one of the reasons for setting this bit.
  • Bit 3 — Unsupported Request Detected. When the device receives a request packet which it doesn’t support, this bit goes high. It may be harmless, in particular if the hosting hardware is significantly newer than the device.

(See section 6.2 of the PCIe spec for the classification of errors)
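
To save some bit counting, the four flags listed above can be decoded with a few lines of shell. This is a minimal sketch, not anything official: The function name decode_devsta and the short flag names it prints are my own choice, and the input is assumed to be a hex word in the format that setpci prints.

```shell
# Decode bits 3-0 of a Device Status Register value.
# The argument is a hex word without prefix, e.g. 000a.
# (decode_devsta and the flag names are made up for this example)
decode_devsta() (
  val=$(( 0x$1 ))
  out=""
  [ $(( val & 0x1 )) -ne 0 ] && out="$out Correctable"
  [ $(( val & 0x2 )) -ne 0 ] && out="$out NonFatal"
  [ $(( val & 0x4 )) -ne 0 ] && out="$out Fatal"
  [ $(( val & 0x8 )) -ne 0 ] && out="$out UnsuppReq"
  # Drop the leading space; print "none" if no error bit is set
  out="${out# }"
  printf '%s\n' "${out:-none}"
)
```

For example, "decode_devsta 000a" prints "NonFatal UnsuppReq", and "decode_devsta 0000" prints "none".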

Checking status

This requires a fairly recent version of setpci (3.1.7 is enough). Earlier versions may not recognize extended capability registers by their name.

As mentioned earlier, we’ll query a Xillybus peripheral. This allows running a script loop that sends a known amount of data, and then checks if something went wrong.

To read the Device Status Register, become root and go:

# setpci -d 10ee:ebeb CAP_EXP+0xa.w
0000

Despite the command’s name, setpci, it actually reads a word (the “.w” suffix) at offset 0xa in the PCI Express Capability (CAP_EXP) structure. The device is selected by its Vendor/Device IDs, which are 0x10ee and 0xebeb respectively. This works well when there’s a single device with that pair.

Otherwise, it can be singled out by its bus position. For example, to check one of the root ports:

# lspci
(... some devices ...)
00:1b.0 Audio device: Intel Corporation Ibex Peak High Definition Audio (rev 05)
00:1c.0 PCI bridge: Intel Corporation Ibex Peak PCI Express Root Port 1 (rev 05)
00:1c.1 PCI bridge: Intel Corporation Ibex Peak PCI Express Root Port 2 (rev 05)
00:1c.3 PCI bridge: Intel Corporation Ibex Peak PCI Express Root Port 4 (rev 05)
00:1d.0 USB Controller: Intel Corporation Ibex Peak USB Universal Host Controller (rev 05)
(... more devices ...)
# setpci -s 00:1c.0 CAP_EXP+0xa.w
0010

In both cases, bits 3-0 of the returned value are zero, indicating that no errors whatsoever were detected (bit 4, which is set in the second output, is AUX Power Detected, and not an error flag). But suppose we got something like this (which is a result of playing nasty games with the PCIe connector):

# setpci -d 10ee:ebeb CAP_EXP+0xa.w
000a

Bits 1 and 3 are set here, indicating a non-fatal error has been detected as well as an unsupported request. Surprisingly enough, playing with the connector didn’t cause a correctable error.

When writing to this register, any bit which is '1' in the written word is cleared in the status register. So to clear all four error bits, write the word 0x000f:

# setpci -d 10ee:ebeb CAP_EXP+0xa.w=0x000f
# setpci -d 10ee:ebeb CAP_EXP+0xa.w
0000
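
The write-one-to-clear rule may look odd at first, so here is a small model of it in shell. It doesn't touch any hardware: It just calculates what the register is expected to contain after a write. The function name w1c_result is made up, and the model assumes that only bits 3-0 are cleared this way (the AUX power and transactions pending bits above them are read-only).

```shell
# Model of the write-1-to-clear behavior of the Device Status Register.
# Usage: w1c_result <current status, hex> <written word, hex>
# Prints the expected status after the write, as a 4-digit hex word.
# (w1c_result is a hypothetical helper, for illustration only)
w1c_result() (
  status=$(( 0x$1 ))
  written=$(( 0x$2 ))
  # Only the four error bits respond to a written '1'
  printf '%04x\n' $(( status & ~(written & 0xf) & 0xffff ))
)
```

For example, "w1c_result 000a 0002" prints 0008 (only the non-fatal error bit is cleared), and writing 0000 leaves the register as it was.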

Alternatively, the output of lspci -vv can be used to spot an AER condition quickly. For example, a bridge not being happy with some packets sent its way:

# lspci -vv

[ ... ]

00:01.0 PCI bridge: Intel Corporation Device 1901 (rev 07) (prog-if 00 [Normal decode])
[ ... ]
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 256 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr- UncorrErr+ FatalErr- UnsuppReq+ AuxPwr- TransPend-
                LnkCap: Port #2, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <256ns, L1 <8us
                        ClockPM- Surprise- LLActRep- BwNot+

[ ... ]
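
When there are many devices, scanning this output by eye gets tedious. As a rough sketch, the flags that are set (those followed by a '+') can be picked out of the DevSta line with grep. The function name devsta_set_flags is my own, and it relies on lspci printing the line in the format shown above.

```shell
# Read lspci -vv output from stdin and print the names of the DevSta
# flags that are set (marked with '+'), one name per line.
# (devsta_set_flags is a hypothetical helper name)
devsta_set_flags() {
  grep 'DevSta:' | grep -oE '[A-Za-z]+\+' | tr -d '+'
}
```

It's best used on a single device, e.g. "lspci -vv -s 00:01.0 | devsta_set_flags", because the output doesn't say which device each flag belongs to.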

Identifying what went wrong

AER-capable endpoints are very likely to have related capability registers. These can be polled, in order to figure out the nature of the errors. For example, to periodically poll and reset the Correctable Error Status Register, this little bash script can be used (note that the bus positions of the devices it polls are hardcoded in the “for” loop):

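#!/bin/bash
# Poll and clear the AER Correctable Error Status register of each device
clear

while true ; do
 # Move the cursor to the top-left corner, so the output stays in place
 echo -en '\033[H'

 # The bus positions of the devices to poll are hardcoded here
 for DEVICE in 00:1c.6 02:00.0 04:00.0 05:00.0 ; do
  echo "$DEVICE: $(setpci -s "$DEVICE" ECAP_AER+10.l)"
  # Writing '1' to a status bit clears it
  setpci -s "$DEVICE" ECAP_AER+10.l=31c1
 done

 # Wait 100 ms (usleep isn't available on many recent systems)
 sleep 0.1
done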
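
The values that this script prints are bit maps of the Correctable Error Status Register, so they need decoding. The following sketch does that, based upon the bit positions as I know them from the PCIe spec (these are the same bits that the 31c1 mask above covers). The function name decode_aer_cor and the short names it prints are made up for this example.

```shell
# Decode an AER Correctable Error Status Register value.
# The argument is a hex word without prefix, e.g. 00001040.
# (decode_aer_cor and the flag names are hypothetical)
decode_aer_cor() (
  val=$(( 0x$1 ))
  out=""
  [ $(( val & 0x0001 )) -ne 0 ] && out="$out RxErr"          # bit 0: Receiver Error
  [ $(( val & 0x0040 )) -ne 0 ] && out="$out BadTLP"         # bit 6: Bad TLP
  [ $(( val & 0x0080 )) -ne 0 ] && out="$out BadDLLP"        # bit 7: Bad DLLP
  [ $(( val & 0x0100 )) -ne 0 ] && out="$out ReplayRollover" # bit 8: REPLAY_NUM Rollover
  [ $(( val & 0x1000 )) -ne 0 ] && out="$out ReplayTimeout"  # bit 12: Replay Timer Timeout
  [ $(( val & 0x2000 )) -ne 0 ] && out="$out AdvNonFatal"    # bit 13: Advisory Non-Fatal Error
  # Drop the leading space; print "none" if no bit is set
  out="${out# }"
  printf '%s\n' "${out:-none}"
)
```

For example, "decode_aer_cor 00001040" prints "BadTLP ReplayTimeout". Bad TLPs and replay timer timeouts are the typical signs of retransmissions taking place.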

Some general notes

  • setpci writes directly to the PCIe peripheral’s configuration space. Typos can be as harmful as any other mistake made as root. Note that almost all peripherals, including disk controllers, are connected to the PCIe bus somehow.
  • The truth is that all these 0x prefixes are redundant. setpci assumes hex values anyhow.
  • When setpci answers “Capability 0010 not found”, it doesn’t necessarily mean that the PCI Express capability structure doesn’t exist on some device. It can also mean that no device was matched, or that you don’t have permissions for the relevant operation.

Embedded PC talking with an FPGA: Make it simple

Why embedded PC

Embedded PC computers are commonly used instead of simple microcontrollers when more than a basic interface with the outer world is needed, e.g.

  • Disk storage (ATA, SATA or ATAPI)
  • USB connection with disk-on-key storage or other peripherals
  • Ethernet connection (TCP/IP in particular)
  • VGA/DVI for display of GUI, possibly based upon a high-level standard widget library

With PC/104 board computers and their derivatives available in the market at modest prices, the adopted solution is often to design the custom peripherals using these interfaces. The non-trivial task is often not designing the custom logic, but rather interfacing with the PC through the dedicated pins. Writing the drivers for the PC can also turn out to be frustrating. Things don’t become easier when high data bandwidths are required, and hence DMA becomes a must.

Using standard peripherals

Dedicated signal processing or data acquisition cards are sometimes used with traditional PCI/PCIe interface when data capture is an integral part of the project. These dedicated cards are not only expensive, but their configuration and adaptation to the dedicated application can sometimes turn out to be as demanding as designing the components from scratch.

A custom, yet painless solution

An elegant shortcut is to design a simple daughterboard which is based upon a Spartan-6 FPGA with a built-in PCIe component. With an embedded computer supporting the PC/104-Express form factor, the communication with the board is immediate, and requires just 7 wires of connection. True, designing the PCIe interfaces on both sides is by no means a simple task, but Xillybus has already taken care of that. The user application talks with a FIFO on the FPGA, and through a device file on a Linux computer. All the low-level communication is transparent, leaving the application designer with an intuitive channel of data running at up to 200 MBytes/s.

This works with any processor supporting PCIe, of course, but embedded SoC processors with native PCIe support are a new market, and well, full-blown PCs are not really embedded. One way or another, there is no reason to struggle with getting data transported between a PC and a custom peripheral anymore.