Some blurbs about tweaking mplayer’s codecs

The (non-) problem

The truth is that there never was a problem. What really happened was that I got things confused between a few versions of mplayer/mencoder, and only the latest (of those I have, 1.0rc1-3.2.2) does the job. But since I wrote down some things I might want to return to some day, here’s the whole blob.

What got me on this was that when using mplayer (version 1.0, one rc or another) to play my Canon 500D’s movie clips, I get the sound OK, but an image which flashes the real thing every now and then (keyframes?) and shows some grey garbage otherwise. And tons of error messages on the console.

On some other version the image looks OK, but A/V sync is lost soon enough. And many error messages indicate that the decoder doesn’t get it right (lots of “Consumed only 136509 bytes instead of 1365120” and the like). That isn’t very promising.

It’s worth noting that mplayer/mencoder picks the native ffmpeg libavcodec by default. As ffmpeg improves, these issues get fixed.

My real goal is to convert the clip to something decent using mencoder. I don’t even think about editing a video in H.264. So all I need now is to find the right video decoder.

But I prefer to use the decoder supplied by Canon (spoiler: I never managed to). Since I own the camera, and got the software legally, why not use the codec they sent me? Only one problem…

What is the DLL of the codec used?

In order to “steal” the codec from the Canon application, I needed to know which DLL Canon uses to play its own videos. To find out, I opened Zoombrowser and ran the ListDLL command-line utility (which can be downloaded from here). The utility spits out all DLLs of all running processes, but with the “>” redirection in a command window, all that data goes to a file. Then I double-clicked a video, and ran ListDLL again, redirecting the output to another file.

The difference between the files is most probably the DLLs loaded to play a clip. This worked because I ran Zoombrowser from scratch.

With my favourite diff application, I got a long list of new DLLs. These two caught my eye:

C:\Program Files\Canon\Canon MOV Decoder\CanonH264Filter.ax
C:\Program Files\Canon\Canon MOV Decoder\CanonIPPH264DecLib.dll

Hmmm… In retrospect, I could have figured that one out without heavy tools. But at least I know which they are now.

Installing the codec

I copied both files mentioned above to /usr/local/lib/win32. Then I added the following entry to /usr/local/etc/mplayer/codecs.conf:

videocodec canonh264
  info "Canon's H.264 decoder"
  status working
  fourcc avc1,AVC1
  fourcc h264,H264
  fourcc x264,X264
  driver dshow
  dll "CanonH264Filter.ax"
  guid 0xb7215ee3, 0xaf54, 0x433f, 0x9d, 0x2f, 0x22, 0x64, 0x91, 0x69, 0x84, 0xf6
  out YUY2
  out BGR32,BGR24,BGR15

As for the output formats, I guessed them. Odds are I got it wrong. As for the GUID, I managed to find a class with a “FriendlyName” saying “Canon H.264 Decode Filter 1.3”, and it has the class ID B7215EE3-AF54-433F-9D2F-2264916984F6. So basically that’s it.

Anyhow, this didn’t work at all. When I ran mencoder with -vc canonh264, it ended with

Forced video codec: canonh264
Opening video decoder: [dshow] DirectShow video codecs
Called unk_GetVersionExW
Segmentation fault

I won’t even try to pretend that I understand what went wrong here, but GetVersionExW happens to be a function exported from Windows’ kernel32.dll, which retrieves information about the current operating system. I’m not clear on whether the function was never found, or whether the decoder wasn’t happy with the answer it got. One way or another, a segfault is a segfault. I decided this was the place to give up. I’ll use the good old ffmpeg decoder.

A remark about H.264

Canon’s choice of H.264 as the encoding format is a bit bizarre, since it’s a version of MPEG-4. And just for general knowledge: MPEG-4 is horrible. In particular, it has this annoying thing about stale regions in the frame, which look bad and basically never heal. But I suppose that MPEG-2 would have required writing to the flash too fast, or something. The result is still pretty bad.

Summary

Trying to fix video issues late at night is not necessarily the wisest thing to do.

Xilinx’ MiG memory controller’s init process reverse engineered

Introduction

I’m using Xilinx’ MiG 1.7.3 for running DDR2 memories on a Virtex-4 FPGA. It didn’t take me long to realize that the controller never finishes initialization. The problem was that I had no idea why, and as far as I know, there is no documentation to refer to when trying to understand where the controller got stuck, which is an essential stage in getting it unstuck.

Since Xilinx are wise enough to release the IP core with its source, I was able to reverse engineer the initialization process to the level necessary for my own purpose. This is a memo of the details, just in case I’ll need to do this again some time. I sure hope that won’t be necessary…

In my case, the problem seems to have been overheating of the FPGA. I’m not 100% sure about this, but with 90 degrees centigrade measured on the case, and everything starting to work OK once a decent heatsink (with fan) was put in place, it looks pretty much like good old heat.

Overview

The initialization process consists of several stages. During the entire process, the controller is governed by the init_state one-hot state machine in the ddr2_controller module. The end of the process is marked by init_done_int going high, which goes out as init_done, hence signaling completion to the IP core’s user.

The initialization consists of roughly three stages:

  • Setting up the memory device
  • Setting up the IDELAY taps so that the DQ inputs are sampled with good timing.
  • Learning the correct latency for reading data from DQs during read cycles.

Throughout the init process, active and precharge operations take place as required by the standard. These operations are not mentioned here, since they don’t add anything to understanding the principle.

Setting up the memory device

This is the normal JEDEC procedure, a predefined sequence of peculiar operations, as defined in the DDR2 standard. This includes writing to the memory’s mode registers. During this phase, the controller won’t care whether it’s talking to a real memory or not, since it never reads anything back from the memory.

Setting up the IDELAY taps

The point of this stage is to make sure that data is sampled from the DQ lines at the best possible timing. Each DQ input is calibrated separately.

This stage begins with a single write command to column zero. The write data FIFO has already had some data written to it, such that the rising edge contains all ones, and the falling edge all zeros. For example, for a memory with 16 DQ lines, the FIFO has been fed with 0xFFFF0000 twice for memories with a burst length of 4, and four times if the burst length is 8.

This can be seen in the backend_fifos module. In that module, one can see that data is written to the write data FIFO immediately after reset. Also, there is another set of words written to the FIFO, which are intended for the next stage.

All in all, this single write command drains the FIFO of the words containing all ones or all zeros, so that column zero contains this data. Next, the controller reads column zero continuously while adjusting the delay taps to achieve proper input timing for the DQs.

The logic for moving the taps is outside the ddr2_controller module; the latter merely helps by performing reads. When the tap logic finishes, it signals that it’s done by raising the signal known as phy_Dly_Slct_Done in the ddr2_controller module, which also carries many other names, such as SEL_DONE. In the tap_logic module (from which it originates) it’s called tap_sel_done.

The tap calibrator increments the tap delay until the data on that line shifts, or until 55 increments have taken place. Whichever happens first is considered to be the data edge. The tap delay is then decremented by the number of taps defined by the tby4tapvalue parameter (17 in my case).

Note that even if no edge is found at all, the tap delay calibrator will consider the calibration of that tap OK.
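
To make the above a bit more concrete, here is a minimal Verilog sketch of what the per-line tap search boils down to, as I understand it. This is not Xilinx’s code: all the names except tby4tapvalue are made up, and a real implementation would also wait for a new tap setting to settle before sampling again.

// Conceptual sketch only -- not Xilinx's tap logic. Searches for the data
// edge by incrementing the IDELAY tap, then backs off by tby4tapvalue taps.
module tap_search_sketch #(parameter TBY4TAPVALUE = 17, MAX_TAPS = 55)
  (input clk,
   input start,
   input dq_sample,      // the DQ line, as sampled through its IDELAY
   output reg dlyce,     // IDELAY tap clock enable
   output reg dlyinc,    // 1 = increment the tap, 0 = decrement
   output reg cal_done); // this line's calibration is over

   localparam IDLE = 0, SEARCH = 1, BACKOFF = 2, DONE = 3;
   reg [1:0] state = IDLE;
   reg [5:0] tap_count = 0;
   reg [5:0] dec_count = 0;
   reg       first_sample = 0;

   initial begin dlyce = 0; dlyinc = 0; cal_done = 0; end

   always @(posedge clk)
     begin
        dlyce <= 0; // default: don't move the tap

        case (state)
          IDLE:
            if (start) begin
               first_sample <= dq_sample; // value before any tap movement
               tap_count    <= 0;
               cal_done     <= 0;
               state        <= SEARCH;
            end

          SEARCH: // Increment until the sample flips (the data edge), or
                  // until MAX_TAPS increments have taken place.
            if ((dq_sample != first_sample) || (tap_count == MAX_TAPS)) begin
               dec_count <= TBY4TAPVALUE;
               state     <= BACKOFF;
            end else begin
               dlyce     <= 1;
               dlyinc    <= 1;
               tap_count <= tap_count + 1;
            end

          BACKOFF: // Back off from the edge by tby4tapvalue taps.
            if (dec_count == 0)
              state <= DONE;
            else begin
               dlyce     <= 1;
               dlyinc    <= 0;
               dec_count <= dec_count - 1;
            end

          DONE:
            cal_done <= 1;
        endcase
     end
endmodule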

Here is a short list of lines I found useful to look at with a scope (using the FPGA Editor):

  • ddr2_ctrl_tandem/data_path_00/tap_logic_00/data_tap_inc_0/calib_done_int
  • ddr2_ctrl_tandem/data_path_00/tap_logic_00/tap_sel_done
  • ddr2_ctrl_tandem/data_path_00/tap_logic_00/data_tap_inc_done
  • ddr2_ctrl_tandem/data_path_00/tap_logic_00/dlyce_dqs[0]
  • ddr2_ctrl_tandem/data_path_00/tap_logic_00/dlyinc_dqs[0]

CHAN_DONE is the most interesting signal, because it goes high briefly every time a data line has finished its tap calibration. Unfortunately, the synthesizer messes up the identification of this signal, so the only way to find it is to look for whatever causes ddr2_ctrl_tandem/data_path_00/tap_logic_00/data_tap_inc_0/chan_sel_int to change state. In my case it was

ddr2_ctrl_tandem/data_path_00/tap_logic_00/data_tap_inc_0/chan_sel_int_not0002

This signal should go high 8 times (or however many data lines per DQS you have). If it strobes fewer times and then nothing happens, you can tell which of the data lines is problematic simply by counting these strobes.

Latency for reading data

The purpose of this stage is to determine when, in terms of half clock cycles, to sample the data read from the memory. I’m not 100% clear on why this stage is necessary at all, but that won’t change the fact that it exists.

This stage starts with a single write operation again. This time the written data is slightly more sophisticated (keep in mind that it, too, was loaded into the write data FIFO immediately after wakeup from reset). The first column has the value 0xA written to it, duplicated to occupy all DQs. For example, on a memory with 16 DQs, the first column will be 0xAAAA. The second column is 0x5 duplicated, the third 0x9, and the fourth 0x6, all duplicated likewise. If the burst length is 8, this four-word sequence is repeated.

After writing this, the controller reads column zero continuously, until COMP_DONE goes high. This signal originates from the pattern_compare8 modules, which tell the controller that the correct input data alignment has been recovered. More precisely, rd_data sends the ddr2_controller a logical AND of all the pattern_compare8 modules’ comp_done signals.

These pattern_compare8 modules simply look for a 0xAA pattern followed by a 0x99 pattern in the input, during rising edges only, or a 0x55 followed by a 0x66, also on the rising edge. So they will catch the reads of the first and third columns, or the second and fourth, but either way this resolves the alignment ambiguity completely.
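
The matching rule is simple enough to summarize in a few lines. The sketch below is my own illustration of the principle, for one byte of rising-edge data; it is not the actual pattern_compare8 source, and the names are made up.

// Illustration only -- not the actual pattern_compare8 module. Looks for
// 0xAA followed by 0x99, or 0x55 followed by 0x66, on consecutive
// rising-edge bytes, and latches comp_done when either sequence shows up.
module pattern_match_sketch
  (input clk,
   input [7:0] rise_data,   // byte captured on the rising edge
   input       data_valid,  // a new rising-edge byte has arrived
   output reg  comp_done);

   reg prev_was_aa = 0;
   reg prev_was_55 = 0;

   initial comp_done = 0;

   always @(posedge clk)
     if (data_valid && !comp_done) begin
        // Remember what the previous rising-edge byte was
        prev_was_aa <= (rise_data == 8'haa);
        prev_was_55 <= (rise_data == 8'h55);

        // AA -> 99 or 55 -> 66 means the read alignment has been found
        if ((prev_was_aa && (rise_data == 8'h99)) ||
            (prev_was_55 && (rise_data == 8'h66)))
          comp_done <= 1;
     end
endmodule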

As the pattern_compare8 module tries to match the data, it increments (among others) its clk_cnt_rise register (not to be confused with the clk_count_rise wire, which contains the final result). Monitoring clk_cnt_rise[0] (with the FPGA Editor, for example) gives positive feedback that the initialization has reached this phase: it should show a nice square wave at half the DDR2 controller’s clk0 frequency, and then stop when this phase is done.

Summary

The initialization process is not the simplest in the world, and it’s likely to fail if anything is wrong with your memory, in particular if you have as little as one data line miswired. This is not really good news, but understanding the process may at least help you understand what went wrong, and hopefully fix it too.

DCM loses lock on Virtex-4: It’s all about auto calibration

The whole story began when I decided to be kind enough to tell the Xilinx tools (ISE 9.2 in my case) that the Virtex-4 I’m targeting is a grown-up. Stepping 2, to be precise. I added

CONFIG STEPPING = "2";

to the UCF file. It must have been one of those moments where I believed that the tools do what is best for me.

It wasn’t long before the mapper told me it’s rewarding me with some autocalibration logic for the DCM. Sounded pretty OK. Some logic that will get the DCM back on its feet if the clock stops and returns. Not that I have any such plans. As a matter of fact, I’ve made sure that the DCM will get a reset after any possible messing with the DCM’s clock input.

Both the mapping warning and the docs mention that it’s possible to disable the autocalibration feature in order to save some logic. They never mentioned that the logic can kill the DCM.

And then one of the DCMs started losing lock. I had changed several other things at the same time, so it wasn’t easy to track down why. But it looked so weird: The DCM’s lock flag would go high, and then go down again. The timescale was tens of milliseconds, which is way beyond the response times  for a DCM.

My first thought was that it must have something to do with the clock’s signal quality. Maybe some crosstalk. The clock was around 200 MHz. But then I decided to take a closer look at what this autocalibration was about.

That led me to Answer Record #21435, which was pretty explicit about the reset:

When the input clock returns, the user must manually assert the DCM reset for at least 200 ms to resume proper DCM functionality.

200 ms? So there it was. I did mess with the input clock, but then I sent a brief reset signal to the DCM to get it back to normal. It worked in the past. Not with the extra logic. So all I needed to do was to add

defparam thedcm.DCM_AUTOCALIBRATION = "FALSE";

(in the Verilog definition of the DCM) and the problem (which shouldn’t have occurred in the first place) was solved.

To make things slightly more annoying, I also had to upgrade the old “DCM” primitives to “DCM_BASE”, because when the “DCM” primitives are upgraded automatically to DCM_ADV’s (by XST), the DCM_AUTOCALIBRATION parameter is set to its default, which is “TRUE”. The same parameter simply doesn’t exist for the backward-compatible “DCM” primitive.
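
For reference, this is roughly what the instantiation looks like when the parameter is given directly. This is a minimal sketch: only DCM_AUTOCALIBRATION is the point here, and the signal names and the other parameter values are placeholders, not taken from my actual design.

// Minimal sketch of a DCM_BASE with autocalibration disabled. The clock
// parameters and signal names below are placeholders.
DCM_BASE #(
   .DCM_AUTOCALIBRATION ("FALSE"),  // the whole point of this post
   .CLKIN_PERIOD        (5.0),      // e.g. a 200 MHz input clock
   .CLK_FEEDBACK        ("1X")
) thedcm (
   .CLKIN  (clk_in),
   .CLKFB  (clk_0_bufg),
   .RST    (dcm_reset),
   .CLK0   (clk_0),
   .LOCKED (dcm_locked)
);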

Note to self: Remember to disable the autocalibration on all DCMs from now on.

500 days: The uptime never reached

I know it’s stupid, but there’s something cool about very long uptimes. I think it begins when the uptime reaches 100 days: You think twice before rebooting your Linux machine: Is it really necessary?

The truth is that without really paying attention to it, my Linux box approached 500 days. I noticed because I wanted to upgrade my kernel (it’s about time…). But first I wanted to know what I was breaking. 493 days, uptime told me. So I decided to wait a week. Childish, but it’s 500 days after all.

Today is the day. My computer has been up for 500 days. Here comes the celebration:

[eli@localhost eli]$ uptime
  1:51pm  up 3 days, 10:01,  4 users,  load average: 0.21, 0.05, 0.01

Oops. That’s not 500 days! And I’m sure that the computer hasn’t rebooted. Conclusion: My uptime counter has wrapped to zero.

Let’s get to the root of this:

[eli@localhost eli]$ cat /proc/uptime
295330.71 41278547.60

The left number is the uptime, in seconds.  To the right is the idle time, in seconds as well.

It just so happens that the kernel counts the uptime in jiffies, each of which is 1/100 of a second. Since my computer is a 32-bit Intel, the counter can only count up to 2^32 jiffies, which is 42949672.96 seconds, which happens to be 497.1 days (divide by 60, then by 60 again, and then by 24). After that, the counter starts over from zero. What a nasty trick…

And since we’re at it, why not check the idle time? 41278547.60 seconds is 477.76 days, which says something about how much I use my computer. In fact, it’s more like a housekeeping server (mail, routing and stuff), so no wonder it’s idle most of the time. A quick calculation shows that it was active no more than 4.5% of its 500 days of uptime. Hmmm…

Anyhow, I’m sure that whoever wrote the uptime counter had a little laugh about idiots like me who think it’s cool to see the uptime reach 500 days. Don’t count the jiffies, he must have said, make the jiffies count.

Command-line (bash/GIMP) mass conversion and processing

The purpose

I use GIMP a lot. I store the images in its native file format, XCF. Now I’m stuck with a lot of files I can’t view outside GIMP, but I don’t want to save them as anything else, because I’d lose all the layer data. Solution: batch conversion to JPEG with a simple bash script run from the command line.

The idea is to make a JPEG copy of the image as it’s seen when it’s opened with GIMP. For that reason, I’ve chosen to flatten the image by visible layers only, and to crop it to image size.

Also, I’ll show an example of how to massively fix images with a GIMP script.

2022 update: This post is really old. Nowadays ImageMagick supports XCF, so it’s possible to just go

$ convert this.xcf this.jpg

But this post can still be of use for more sophisticated tasks with GIMP.

The script

It looks like LISP and it looks like bash. In fact, they’re mixed.

#!/bin/bash
{
cat <<EOF
(define (convert-xcf-to-jpeg filename outfile)
  (let* (
	 (image (car (gimp-file-load RUN-NONINTERACTIVE filename filename)))
	 (drawable (car (gimp-image-merge-visible-layers image CLIP-TO-IMAGE)))
	 )
    (file-jpeg-save RUN-NONINTERACTIVE image drawable outfile outfile .9 0 0 0 " " 0 1 0 1)
    (gimp-image-delete image) ; ... or the memory will explode
    )
  )

(gimp-message-set-handler 1) ; Messages to standard output
EOF

for i in *.xcf; do
  echo "(gimp-message \"$i\")"
  echo "(convert-xcf-to-jpeg \"$i\" \"${i%%.xcf}.jpg\")"
done

echo "(gimp-quit 0)"
} | gimp -i -b -

To try it out, simply execute the script from a directory containing several .xcf files. Be sure not to have any .jpg files you care about in the same directory, because the outputs get the same file names, just with the .jpg extension (old files are overwritten with no warning).

You will get kind-of-warning messages while the script is iterating, indicating which file is being processed. This is normal.

The concept of this script is simple: an ad-hoc LISP script is generated inside the bash block, which is enclosed in curly brackets. First we define the function which converts one file. The bash script then generates calls to this function, by means of the (bash) for-loop. All this is then fed into GIMP through standard input (piping).

Some LISP notes

I’m not really into LISP, so I ran into some trouble. These are my notes, so I won’t have to go through it all again:

First, there’s the Script-Fu console, which was, well, sort-of helpful. The internal functions’ API can be found there as well.

As a LISP novice, I didn’t know the difference between “let” and “let*”. It turns out that let* allows the use of previous assignments in the ones that follow, so this is what you get in the Script-Fu console:

> (let ( (x 2) (y x)) y)
Error: eval: unbound variable: x 

> (let* ( (x 2) (y x)) y)
2

It’s also worth noting that the GIMP interpreter does not remember functions across different -b command-line arguments:

# First statement succeeds, second fails.
gimp -i -b '(define (myfun x y) (- x y))' -b '(myfun 2 3)'

# This works, because it's two statements in one execution (yuck!)
gimp -i -b '(define (myfun x y) (- x y)) (myfun 2 3)'

Mass processing of frames

As the title implies, I needed this to make adjustments to a video clip. It’s true that many video editors (Cinelerra included) have filters for that, but running a sequence of GIMP commands is probably more powerful than any video editor.

So here’s a little script which runs some operations on a list of frames. This was useful mainly because I wanted to run curves on a clip (Cinelerra doesn’t have that operation; I wonder which video editor does).

#!/bin/bash
{
cat <<EOF

(define (do-retouch filename outfile)
 (let* (
 (image (car (gimp-file-load RUN-NONINTERACTIVE filename filename)))
 (drawable (car (gimp-image-merge-visible-layers image CLIP-TO-IMAGE)))
 )
 (gimp-colorize drawable 10 50 0)
 (gimp-curves-spline drawable HISTOGRAM-VALUE 6 #(0 0 147 44 255 122 ) )
 (gimp-hue-saturation drawable ALL-HUES 0 0 20)
 (plug-in-gauss RUN-NONINTERACTIVE image drawable 10 10 1)
 (file-png-save2 RUN-NONINTERACTIVE image drawable outfile outfile 0 9 0 0 0 0 0 0 1 )
 (gimp-image-delete image) ; ... or the memory will explode
 )
 )

(gimp-message-set-handler 1) ; Messages to standard output
EOF

for i in frame*.png; do
 echo "(gimp-message \"$i\")"
 echo "(do-retouch \"$i\" \"fixed/x_${i%%.png}.png\")"
done

echo "(gimp-quit 0)"
} | gimp -i -b -

The Script-Fu console was helpful in finding the functions’ names, which are pretty obvious. I’ll only mention that running gimp-image-merge-visible-layers in this case is probably not necessary, but I suppose GIMP won’t waste too much time on merging a layer with itself.

As for the curves operation (gimp-curves-spline), I first tried to use the curve traces which GIMP saves as they are being used in the GUI, but that turned out to be pretty complicated. So I went for the simple approach: open the GUI, find the X-Y points on the graph, and copy them manually. The “6” says that there are 6 numbers ahead (3 X-Y pairs), and then we have X0, Y0, X1, Y1, and so on. The values go from 0 to 255. So it’s pretty trivial, and does exactly the same as the GUI.

Importing the frames to Cinelerra

Not that it’s directly relevant, but to import a bunch of frames into Cinelerra, use the mkframelist command line utility and load it like any file. For example,

$ ls *.png | mkframelist -r 30 > framelist

That’s for 30 fps. Then load “framelist” into Cinelerra. Note that the paths in the list are absolute, so you can’t just move the files around.

 

Canon EOS 500D: Using the wrong driver intentionally

Foreword

Before I say a word, my children, I have to warn you: What I’m about to teach you here is basically how to mess up your computer. It’s how to make Windows install the wrong driver for a USB device (possibly PCI devices as well). Don’t complain about a headache when you want to switch to the correct driver (or just another one): Once Windows adopts a driver for a device, it develops certain sentiments for it, and will not give it up so easily.

Also, my children, I have to confess that I still use Windows 2000 these days. It’s about time I moved on to at least XP, but I have too much installed software to carry with me (none of which will run, I fear, after an upgrade).

Having these issues out of the way, let’s get to business.

My motivation

I bought a brand new Canon EOS 500D, which came with a brand new EOS Digital Solution disk (v20.0, if we’re at it). It’s black, it’s pretty, it autoboots, but it tells me to go &@^%$ myself with my ancient operating system (Windows 2000, as mentioned). Canon’s “support” is more like a black hole, so I’m on my own. All I want is to download the pics to my computer. When I plug the camera in, I get “Found new hardware” but no driver to make it work.

I was slightly more successful with Linux (Fedora 9, on my laptop): gphoto2 managed to download the images (from the command line, which is cool) using PTP (Picture Transfer Protocol), but I want this working on my desktop.

Now here’s the problem in summary: the camera connects to the computer and says “I was made by Canon (vendor ID 04A9), I’m a 500D camera (product ID 31CF), and I support the standard interface 6.1.1, which means Still Imaging”. An XP computer would say “Aha! A camera! Let’s try PTP, even though I’ve never heard of this device!” but Windows 2000 won’t talk with strangers (if they are cameras, that is).

Drivers for this camera for Windows 2000 are nowhere to be found. I tried to find a generic PTP driver for Windows, but couldn’t find one. There’s a generic interface (WIA), but no generic driver. Then I thought: since any camera driver would talk PTP with any PTP camera, why not have just any Canon driver talk with my camera? After all, the driver just transfers files. I hope.

Update (February 5th, 2010): I got the following laconic mail from Canon’s help desk today:

“Dear Customer,

The EOS 500D camera can only be connected to personal computers with either Windows Vista, Windows XP or Mac OS X operating systems. Unfortunately Windows 2000 is not supported”

Wow! That’s a piece of valuable information after having the camera for over six months!

The black magic

It just so happens that the 400D has a PTP TWAIN driver for Windows 2000 (the installation file is k6906mux.exe). So I downloaded that one, and installed it as is. Which didn’t help much, of course. But it left the INF file in the destination directory. That allowed me some wicked manipulations.

The trick is to bind the driver’s software to the specific hardware ID. So I opened the INF file, and found the part saying:

[Models]
%DSLRPTP.DeviceDesc%=DSLRPTP.Camera, USB\VID_04A9&PID_3110

That means, “if some device says it was made by 04A9 (Canon) and that its product ID is 3110 (EOS 400D, I suppose)”, use this driver.

Hey, this is an open invitation for intervention! I simply changed it to:

[Models]
%DSLRPTP.DeviceDesc%=DSLRPTP.Camera, USB\VID_04A9&PID_31CF

(actually, I did it on a copy of the file)

And then I went for all these places saying

[DSCamera.Addreg]
HKLM,"%DS_REG%\TWAIN\EOSPTP",DeviceDesc,,"EOS Kiss_X REBEL_XTi 400D"
HKLM,"%DS_REG%\TWAIN\EOSPTP",ModelName,,"EOS Kiss_X REBEL_XTi 400D"

and changed them to something saying it’s a 500D using 400D driver. Just free text so I know what I’m doing.

By the way, you may wonder where I got the 500D’s product ID from. The answer is Linux again: there’s a utility called lsusb, which supplies all that info. You can probably get it in Windows too; I just don’t know how.

Putting it to work

At this point, I plugged in my camera and powered it on. Windows told me it had found new hardware, great, and then asked me to supply a driver (a couple of wizard windows later). It actually wants an INF file, so I gave it the one I cooked up.

Since the VID/PID in the file match those given by the camera, Windows installed the drivers and associated them with the camera from now on. Mission accomplished.

Did it work?

The truth is that the result isn’t very impressive, maybe because Canon’s own EOS utility failed to talk with the camera this way, and Picasa’s interface with the TWAIN driver is a bit clumsy. But the bottom line is that I can now download the images to my Windows 2000 computer.

On the other hand, maybe it’s this ugly with proper drivers as well. The most important thing is that it works, after all.

Verilog: Declaring each port (or argument) once

(…or why the Verilog-emacs AUTOARG is redundant)

In Verilog, I never understood why port declarations appear both in the module declaration, and then again immediately afterwards, along with the wires and registers. I mean, if the ports in the module declaration are always deducible from what follows immediately after, why does the language force me to write them twice?

The short answer is: It doesn’t.

Let’s have a look at this simple module:

module example(clk, outdata, inbit, outbit);
   parameter width = 16;

   input clk;
   input inbit;
   output outbit;
   output [(width-1):0] outdata;

   reg [(width-1):0]		outdata;

   assign 	outbit = !inbit;

   always @(posedge clk)
     outdata <= outdata + 1;

endmodule

There is nothing new here, even for the Verilog beginner: It demonstrates a simple combinatoric input-output relation. We also have an output, which happens to be a register as well (I didn’t even bother to reset it).

And as usual, every port is mentioned twice. Yuck.

Instead, we can go:

module example #(parameter width = 16)
  (
   input clk,
   input inbit,
   output outbit,
   output reg [(width-1):0] outdata
   );	       

   assign 	outbit = !inbit;

   always @(posedge clk)
     outdata <= outdata + 1;

endmodule

At this point, I’d like to point out that this is no dirty trick; this type of module declaration is explicitly defined in the Verilog 2001 standard (or by its official name, IEEE Std 1364-2001). This goes both for defining the ports and the parameters (thanks to Evgeni for pointing out the possibility to set parameters as shown above).

According to the BNF definition in Annex A (A.1.3, to be precise), a module definition must take one of the two formats shown above, but mixing them is not allowed.

So here are a few things to note when using the latter format:

  • Each port declaration ends with a comma, not a semicolon. Same goes for parameter declarations.
  • It’s not allowed to declare anything about the port again in the module’s body. Repeating the port’s name as a wire or reg is not allowed.
  • Use “output reg” (which is legal in either format)  instead of declaring the register in the module’s body (which is not allowed in this setting)
  • Syntax highlighters and indenters may not work well

The question is now: How could I not know about this?

Porting to Virtex-4: Who ate my IOB registers?

Surprise, surprise!

When porting a design from Spartan-3 to Virtex-4, I discovered that many registers, which were correctly placed in the IOB on the Spartan-3, fell out into fabric-placed flip-flops. This is very bad news, since keeping the registers in the IOB isn’t just a matter of better timing, but of repeatable timing, which is much more important to me. I don’t want the timing to change when I reimplement, or a poorly designed board with marginal signal quality could make a new FPGA version appear buggy, because something stopped working.

It turns out that the tools got lost somewhere in the transition from plain IOBs to ILOGICs and OLOGICs. In other words, the synthesizer (XST J.39, with ISE 9.2.03i) or maybe the mapper failed to take obvious hints. My hunch is that the mapper is to blame.

What part of “ILOGIC” didn’t you understand?

There’s always the aggressive solution of instantiating the IOBUF and the relevant flip-flops explicitly. In fact, it may be enough to just instantiate the IOBUF itself. The only explanation I can think of for why this would help is that the synthesizer packs the registers during synthesis, and maybe also makes some minor fixes to allow this packing. It’s ugly, but it works. Or if it doesn’t work, at least I know why: a major advantage of instantiating IDDR and ODDR (or IDDR_2CLK, if you want to feed it with two clocks) is that it forces the mapper to complain loudly when it refuses to put them in place. It can’t just hide the flip-flops in the fabric, say nothing, and hope I won’t notice.
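
Just to show what the explicit route looks like, here is a rough sketch for one bidirectional DDR pin. The signal names are made up, and the attribute values are only an example; as discussed next, the clocking and reset of the paired elements must match, or the packer will refuse.

// Rough sketch: explicit ODDR/IDDR/IOBUF instantiation for one
// bidirectional DDR pin on Virtex-4. Names are made up for illustration.
module ddr_pin_sketch
  (input  clk,
   input  dout_rise, dout_fall, // data to drive on rising/falling edge
   input  out_z,                // 1 = pad goes high-Z (IOBUF's T input)
   output din_rise, din_fall,   // data sampled on rising/falling edge
   inout  the_pad);

   wire pad_o, pad_t, pad_i;

   // Output data register pair, packed into the OLOGIC
   ODDR #(.DDR_CLK_EDGE("SAME_EDGE"), .SRTYPE("SYNC")) oddr_data
     (.Q(pad_o), .C(clk), .CE(1'b1),
      .D1(dout_rise), .D2(dout_fall), .R(1'b0), .S(1'b0));

   // Tri-state control register, also in the OLOGIC
   ODDR #(.DDR_CLK_EDGE("SAME_EDGE"), .SRTYPE("SYNC")) oddr_t
     (.Q(pad_t), .C(clk), .CE(1'b1),
      .D1(out_z), .D2(out_z), .R(1'b0), .S(1'b0));

   // Input capture register pair, in the ILOGIC
   IDDR #(.DDR_CLK_EDGE("SAME_EDGE"), .SRTYPE("SYNC")) iddr_data
     (.Q1(din_rise), .Q2(din_fall), .C(clk), .CE(1'b1),
      .D(pad_i), .R(1'b0), .S(1'b0));

   IOBUF iobuf_data
     (.I(pad_o), .T(pad_t), .O(pad_i), .IO(the_pad));

endmodule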

In theory, differences between the clocking and reset schemes of the ILOGIC and OLOGIC flip-flops should be allowed. How do I know? Because I can get it done with the FPGA Editor. In practice, the packer is terribly picky about what it’s ready to pair into an ILOGIC/OLOGIC couple. I haven’t tested all combinations, but it appears it won’t pack a DDR flip-flop next to a non-DDR one. And if there’s a difference in clocking or reset, forget about it.

Example: I had a case where I used the PRESET input for the ODDR, but no reset at all for the IDDR, and got:

ERROR:Pack:1564 – The dual data rate register controller/ddr_bus_T[10] failed to join the OLOGIC component as required.  The OLOGIC SR signal does not match the ILOGIC SR signal, or the ILOGIC SR signal is absent.

Inference (and its black magic)

Just using the “IOB of … is TRUE” synthesis pragma seems to make the synthesizer do no more than avoid eliminating equivalent registers, and duplicate registers whose output is used in several places. That’s nice, but sometimes not enough.

Let’s see a few examples of what worked and what didn’t work with the ISE foodchain, when instantiation was avoided. The target is Virtex-4. Note that no IOB pragma is used here.

First, let’s look at this:

module try(
	   input clk,
	   output reg toggle
	   );
   reg [1:0] 	  count;

   always @(posedge clk)
     begin
	count <= count + 1;

	case (count)
	  0: toggle <= 1'b0;
	  1: toggle <= 1'bz;
	  2: toggle <= 1'b1;
	  3: toggle <= 1'bz;
	endcase
     end

endmodule

This actually worked well: Both tri-state buffer and data register were placed in the OLOGIC element, resulting in optimal timing. But hey, what if we want to read from the data lines while they are tri-stated? So we go for this (note that it’s not functionally equivalent):

module try(
	   input clk,
	   inout toggle
	   );
   reg [1:0] 	  count;

   reg 		  toggle_reg;
   reg 		  z_reg;

   assign 	  toggle = z_reg ? 1'bz : toggle_reg;

   always @(posedge clk)
     begin
	count <= count + 1;
	z_reg <= count[1];
	case (count)
	  0: toggle_reg <= 1'b1;
	  1: toggle_reg <= 1'b0;
	  2: toggle_reg <= 1'b1;
	  3: toggle_reg <= 1'b1;
	endcase
     end

endmodule

This placed both registers in the OLOGIC as well (note that I didn’t bother to read from the IO, but never mind). The truth is that I was lucky here: had z_reg and toggle_reg happened to be equivalent, they would have been merged into a single register, which would have remained outside the OLOGIC element. Or let’s look at this example (note the subtle difference…):

module try(
	   input clk,
	   output toggle
	   );
   reg [1:0] 	  count;

   reg 		  toggle_reg;
   reg 		  z_reg;

   assign 	  toggle = !z_reg ? 1'bz : toggle_reg;

   always @(posedge clk)
     begin
	count <= count + 1;
	z_reg <= count[1];
	case (count)
	  0: toggle_reg <= 1'b1;
	  1: toggle_reg <= 1'b0;
	  2: toggle_reg <= 1'b1;
	  3: toggle_reg <= 1'b1;
	endcase
     end

endmodule

For those who missed the difference, here it is: the polarity of z_reg, as a data output enable, is reversed. The z_reg register implemented by the synthesizer turns out to have the wrong polarity for the tri-state buffer in the pad, so the OLOGIC is used as a piece of combinatoric logic. A NOT gate, to be precise. It would be wiser, and possible, of course, to implement both the NOT gate and the flip-flop inside the OLOGIC, but it looks like the tools don’t take it that far.

But let’s be a bit fair. These oddities could be fixed simply by adding a single synthesis hint:

// synthesis attribute IOB of z_reg is "TRUE"

The thing is that the tools managed without this hint in several cases when targeting Spartan-3 devices. What’s ugly here is not that the synthesis pragma is necessary, but that the tools suddenly behave differently.

And another small pitfall: don’t put double quotes around the register’s name (z_reg in our case). That will cause the synthesizer to silently ignore the pragma comment. It’s OK to put them around “TRUE”, but not around a name which lives in the synthesizer’s name space.
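
In other words (the first form is honored, the second is silently ignored):

// This is honored:
// synthesis attribute IOB of z_reg is "TRUE"

// This is silently ignored (note the quotes around the register's name):
// synthesis attribute IOB of "z_reg" is "TRUE"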

Bottom line

When porting to Virtex-4 (and most likely newer FPGAs) keep a very close eye on where the IOB registers are placed. Also, be aware of how picky the tools have become about the similarity between paired OLOGIC and ILOGIC.

Getting the right names in the UCF file: Using netgen

The problem: NGDBUILD tells you it can’t find a net or instance given in the UCF file. It’s likely that the synthesizer changed the names, sometimes slightly and sometimes beyond recognition. You need these names to define a timing group, for example, but how do you know them?

Normally, I would get the net and instance names from the FPGA Editor, or possibly from the Timing Analyzer. But without any successful place-and-route, how can I know what names the synthesizer gave them, if I can’t even get through NGDBUILD?

Solution: create a simulation model in Verilog (it’s also possible in VHDL, but I’ll show Verilog):

If my synthesis gave mydesign.ngc, I simply write at the command prompt (to most of you, that’s a DOS window):

netgen -ofmt verilog mydesign.ngc delme.v

And delme.v will contain the simulation model. It’s a fairly readable file, in which the design is broken down into small primitives, which also makes it pretty heavy. But the names used for nets and logic are those that go to NGDBUILD, and with some searching in the text file, one can get around.

Note that if mydesign.ncd is used rather than mydesign.ngc, you’ll get the simulation model for the post-PAR result (which can be useful too at times).

The PCF file: Xilinx timing constraints as the tools understood them

One of the problems with setting up timing constraints in the UCF file is being sure that you got the right elements in, and kept the unnecessary ones out.

Suppose I wrote something like

NET "the_clock" TNM_NET = "tnm_ctrl_clk";
TIMESPEC "TS_ctrl_clk" = PERIOD "tnm_ctrl_clk" 40 ns HIGH 50 %;

What logic element does it apply to? Did it work like I expected?

The information can be obtained by creating a timegroup report in the Timing Analyzer, but it’s actually available in a much easier way: The PCF file, which is created by the MAP tool. This file has the same syntax as the UCF file, but it reflects the constraints as understood by the tools.

You will find the as-made pin placements there (not shown here), and the timing groups as TIMEGRP statements. It goes something like:

TIMEGRP tnm_ctrl_clk = BEL "controller/bus_oe_16" BEL
        "controller/ctrl_dout_15" BEL "controller/bus_oe_15" BEL
        "controller/ctrl_dout_14" BEL "controller/bus_oe_14" BEL
        "controller/ctrl_dout_13" BEL "controller/bus_oe_13" BEL
        "controller/ctrl_dout_12" BEL "controller/bus_oe_12" BEL
        "controller/ctrl_dout_11" BEL "controller/bus_oe_11" BEL
        "controller/ctrl_dout_10" BEL "controller/bus_oe_10" BEL
        "controller/ctrl_dout_9" BEL "controller/bus_oe_9" BEL
        "controller/ctrl_dout_8" BEL "controller/bus_oe_8" BEL
        "controller/ctrl_dout_7" BEL "controller/bus_oe_7" BEL
        "controller/ctrl_dout_6" BEL "controller/bus_oe_6" BEL
        "controller/ctrl_dout_5" BEL "controller/bus_oe_5" BEL
        "controller/ctrl_dout_4" BEL "controller/bus_oe_4" BEL
        "controller/ctrl_dout_3" BEL "controller/bus_oe_3" BEL
        "controller/ctrl_dout_2" BEL "controller/bus_oe_2" BEL
        "controller/ctrl_dout_1" BEL "controller/bus_oe_1" BEL
        "controller/ctrl_dout_0";

There you have it, in plain text. The relevant constraint is just a few rows away:

TS_ctrl_clk = PERIOD TIMEGRP "tnm_ctrl_clk" 40 ns HIGH 50%;

As simple as that.