01signal: Clock domain crossing with data

This page is the last in a series of three pages about clock domains.

When one bit is not enough

Quite often, the signal that is required to go across clock domains is a data word, and not a single bit. The straightforward solution in this case is the dual-clock FIFO that is supplied by the FPGA's vendor, as already suggested. But sometimes this isn't an option. And besides, someone had to implement that FIFO in the past.

So the goal is to make a vector signal appear correctly on another clock domain. Let's start with a naïve and incorrect example of a clock domain crossing, just to explain why it isn't that easy:

reg [7:0] foo, bar, bar_metaguard;

always @(posedge clk1)
  foo <= foo + 1;

always @(posedge clk2)
  begin
    bar_metaguard <= foo; // This will fail sometimes!
    bar <= bar_metaguard;
  end

This is exactly like the simple metastability guard example on the previous page, but the registers are 8-bit vectors, and @foo is a counter instead of changing between '0' and '1'.

So why is this wrong? The problem is the differences in the routing delays among the paths from @foo to @bar_metaguard: Each of the eight bits has a different delay. When @foo changes, some of the changes of these bits may arrive with legal timing to @bar_metaguard's flip-flops, and others won't arrive in time.

So even if there is no metastability in any of the 8 flip-flops that comprise @bar_metaguard, there can be a situation where @foo changes, and only some of the bits of @bar_metaguard obtain the new value, while the others remain with the old value. So if @foo changes from 0xff to 0x00, the next value of @bar_metaguard could be anything. That is because some bits obtained @foo's value before the change, and others after this change. This incorrect value will be visible on @bar after a clock cycle of @clk2.

To solve this problem, it's first necessary to define the need: Is @bar required to contain a valid value all the time, or is it intended to occasionally pass information from one clock domain to another? I'll discuss these two options separately.

Option #1: Continuous sampling

If the destination word (@bar in the example) is required to continuously sample the word on the other clock domain (@foo), and always contain a legal and meaningful value, there's only one way to ensure this: Use the naïve method for clock domain crossing that is shown above, but make sure that on each clock cycle of @clk1, only one of @foo's bits changes (or none). In other words, use a metastability guard with the vector signal, but make sure to avoid the problem when multiple bits change on the same clock cycle.

Because only one bit can change on each clock cycle, each change is either sampled or missed by @bar_metaguard, and either way it reflects one of the values that @foo had.

So if @foo in the example above was a counter that used Gray code rather than plain binary code, it would work perfectly fine: The essence of Gray coding is that only one bit changes each time the word counts up, so @bar would be guaranteed to always carry a meaningful value.
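As a minimal sketch of this idea, the naïve example could be rewritten as follows. The names @foo_bin, @foo_gray and @next_bin are not part of the example above; they are made up for this illustration. Note that the Gray-coded word is the output of flip-flops. If the XOR logic were placed after the registers, on the path to @bar_metaguard, glitches in that logic could make more than one bit change at a time.

```verilog
module gray_counter_cdc
  (
   input            clk1,
   input            clk2,
   output reg [7:0] bar // Gray-coded copy of the counter, clk2 domain
   );

   reg [7:0] foo_bin = 0;   // Plain binary count, used only with clk1
   reg [7:0] foo_gray = 0;  // Gray-coded count. This word crosses.
   reg [7:0] bar_metaguard;

   wire [7:0] next_bin = foo_bin + 1;

   always @(posedge clk1)
     begin
       foo_bin <= next_bin;
       // Gray encoding of the value that foo_bin gets on this clock
       // edge, so only one bit of foo_gray changes on each clock cycle
       foo_gray <= next_bin ^ (next_bin >> 1);
     end

   always @(posedge clk2)
     begin
       bar_metaguard <= foo_gray; // Safe: One bit changes at a time
       bar <= bar_metaguard;
     end
endmodule
```

In this sketch, @bar carries the Gray-coded value. If the plain binary number is needed on @clk2's side, the conversion from Gray code should be done on @bar, after the clock domain crossing.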

But wait, what happens if @clk1 has a higher frequency than @clk2? That doesn't matter if it's OK that @bar skips some of @foo's values. For example, if @foo is a counter in Gray code format, some counted numbers may be skipped when looking at @bar. And yet, all values that are seen on @bar are correct in the sense that they did appear in @foo at some point in time.

The most common use of this method is inside dual-clock FIFOs, where Gray code is used to transfer the address in the FIFO's RAM across the two clock domains: The FIFO's writing side encodes the address of the last word it has written to the RAM into Gray code. The encoded word is transferred to the reading side, which is on another clock domain, by using metastability guards on the word. The same method is used in the other direction.

Because each side knows the other side's updated address (after the delay of the metastability guards), each side can also calculate how many elements there are in the FIFO, and hence produce signals like empty and full.
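The reading side's part of this calculation might look something like the sketch below. This is not taken from any specific FIFO implementation, and all names in it are hypothetical. It also assumes a slightly different convention from the description above: Each pointer counts the words that have been written or read so far (modulo 2^W), and it has one bit more than the RAM's address, so that a full FIFO can be told apart from an empty one.

```verilog
module fifo_read_status #(parameter W = 5) // RAM depth is 2^(W-1)
  (
   input              clk_rd,
   input [W-1:0]      wr_ptr_gray, // Gray code, from the write side's registers
   input [W-1:0]      rd_ptr,      // Plain binary, the read side's own pointer
   output reg [W-1:0] fill,        // Number of words waiting in the FIFO
   output reg         empty
   );

   reg [W-1:0] wr_gray_metaguard, wr_gray_sync;
   reg [W-1:0] wr_ptr_bin; // Combinatorial, despite the "reg"
   integer     i;

   // Gray to binary: Bit i is the XOR of the Gray bits i and above
   always @(*)
     for (i = 0; i < W; i = i + 1)
       wr_ptr_bin[i] = ^(wr_gray_sync >> i);

   always @(posedge clk_rd)
     begin
       // Metastability guard on the Gray-coded word
       wr_gray_metaguard <= wr_ptr_gray;
       wr_gray_sync <= wr_gray_metaguard;

       fill <= wr_ptr_bin - rd_ptr; // Modulo 2^W
       empty <= (wr_ptr_bin == rd_ptr);
     end
endmodule
```

Because the write pointer arrives with a delay, @fill can be lower than the real number of words in the FIFO, but never higher. So @empty may remain active a bit longer than necessary, which is harmless.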

Conveying events at a higher rate

Recall from the previous page that the metastability guard with a single bit has a limitation on the frequency of the clock of the source's clock domain. If each change of this bit is a way to tell the other side about an event, there is a risk that the receiving side misses events if these events occur too often.

The solution is to convey a counter that is encoded in Gray code across the clock domains. This way, the receiving side knows how many events have occurred, and hence will not lose any information. The number of bits of this counter is chosen so that even if the events occur at every clock cycle of @clk1, the receiving side is still able to deduce how many events have taken place.

So in this sense, it can be easier to work with a vector than a single bit: If a single bit changes and that fact is missed because the destination's clock is slower, the result is to miss that anything happened at all. But with a word that is correctly transferred across clock domains (e.g. with Gray code), no information is lost.

And if one bit is enough for this purpose, then the Gray code counter is just a single bit that changes its value on each event. In other words, with a single bit, the solution with the Gray code counter is exactly the same as the simple metastability guard.
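The event counter that is described in this section can be sketched as follows. The module and all of its signal names are hypothetical, and the width W is an arbitrary choice. The sketch assumes that fewer than 2^W events occur between two consecutive clock cycles of @clk2. Otherwise, the subtraction at the end gives a wrong result.

```verilog
module event_counter_cdc #(parameter W = 4)
  (
   input              clk1,
   input              event_in,   // High for one clk1 cycle per event
   input              clk2,
   output reg [W-1:0] new_events  // Events added since the previous clk2 cycle
   );

   reg [W-1:0] count_bin = 0;
   reg [W-1:0] count_gray = 0;    // The word that crosses clock domains
   reg [W-1:0] gray_metaguard = 0, gray_sync = 0;
   reg [W-1:0] seen_bin;          // Decoded gray_sync (combinatorial)
   reg [W-1:0] prev_bin = 0;
   integer     i;

   wire [W-1:0] next_bin = count_bin + 1;

   always @(posedge clk1)
     if (event_in)
       begin
         count_bin <= next_bin;
         count_gray <= next_bin ^ (next_bin >> 1); // One bit changes
       end

   // Gray to binary: Bit i is the XOR of the Gray bits i and above
   always @(*)
     for (i = 0; i < W; i = i + 1)
       seen_bin[i] = ^(gray_sync >> i);

   always @(posedge clk2)
     begin
       gray_metaguard <= count_gray;
       gray_sync <= gray_metaguard;

       prev_bin <= seen_bin;
       new_events <= seen_bin - prev_bin; // Modulo 2^W
     end
endmodule
```

With W = 1, @count_gray is a single bit that changes its value on each event, which is the single-bit case that was just mentioned.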

A petty comment on the timing of the paths

As the title of this section implies, it may be a good idea to skip to the next section.

There's an underlying assumption regarding the metastability guard for vector signals: That the difference between the delays of these paths doesn't exceed a clock cycle of the source's clock.

This assumption is almost certainly met without doing anything special about it, and still, let's consider this theoretical example: Say that @clk1 has the frequency 500 MHz, and that one of @foo's paths to @bar_metaguard has a routing delay of 1 ns. Also, let's say that another path has a routing delay of 4 ns, which is of course extremely unlikely to occur, but let's see what can happen:

One of the bits changes its value, and the change begins the journey that takes 4 ns. On the following clock cycle, 2 ns later, the other bit changes, and reaches @bar_metaguard after a time of 1 ns. But that's 1 ns earlier than the first bit's arrival. Hence @bar_metaguard can sample the entire word with a value that @foo never had.

As routing delays are typically much shorter than in this example, this is not expected to happen in reality. Nevertheless, any routing delay is theoretically possible. To eliminate this possibility altogether, a timing constraint like the following can be used (written in Vivado format):

set_max_delay -datapath_only -from [ get_pins -hier -filter {name=~*/C} ] -to [ get_pins -hier -filter {name=~*_metaguard*/D} ] 1.5

This constraint resembles the one with set_max_delay that was given on the previous page. However, note that on the previous page the metastability guard is in the "-from" part, and here it is in the "-to" part. So the constraints don't apply to the same paths. The purpose of the constraint on the previous page was to give the metastability guard some time to recover from metastability. Accordingly, that constraint applies to paths that begin and end at flip-flops with the same clock. On the other hand, the constraint above relates to the clock domain crossing itself.

This constraint is therefore written differently: As the relevant paths connect between clock domains of unrelated clocks, it's meaningless to take the skews of these clocks and the jitters of these clocks into account. This is what the -datapath_only part says: Never mind the time it takes for the clocks to reach the flip-flops. Just measure the path.

What makes this constraint confusing is that the path starts at the clock pin of the source's flip-flop, and ends at the data input pin (D) at the destination. The stopwatch hence starts when the source's flip-flop gets its clock and ends when the updated signal arrives at the destination, and this is required to satisfy its setup time. Hence, this path includes both sides' timing specifications and requirements.

By restricting all these paths to 1.5 ns, as in this timing constraint, no path can exceed this time limit, and hence the difference between path delays is limited to this number as well. So even if @clk1 has a clock period of 2 ns, it's impossible for the changes to arrive in the wrong order. Which, once again, is extremely unlikely anyhow, but this is the way to ensure that.

Note that if the path to the metastability guard is affected by a false path constraint (e.g. set_false_path or set_clock_groups), set_max_delay will probably have no effect: The false path constraint is likely to take precedence. So always check the paths in the timing report, in order to verify that the tools interpret the constraints as desired. A different page about timing discusses this.

Option #2: Occasional update of the register

The limitation that only one bit may change on each clock cycle is often too restrictive. When the data is updated occasionally, another technique can be used. For the following example, assume that @do_update is active (i.e. has value '1') only once in several clock cycles. Also, let's assume that this signal is used to indicate that the value in @foo should be updated with @new_value:

reg [7:0] foo, bar;
reg       toggle, toggle_metaguard, toggle_a, toggle_b;
reg       new_bar;

always @(posedge clk1)
  if (do_update)
    begin
      foo <= new_value;
      toggle <= !toggle;
    end

always @(posedge clk2)
  begin
    toggle_metaguard <= toggle;
    toggle_a <= toggle_metaguard;
    toggle_b <= toggle_a;

    if (toggle_a != toggle_b)
      bar <= foo; // No metastability guard, because foo is stable
    new_bar <= (toggle_a != toggle_b); // Not necessary, just side info
  end

For now, ignore @new_bar. I'll come to that later.

So this is how it works: @foo is updated only when @do_update is active. When that happens, @toggle changes to its opposite value on the same clock cycle.

On @clk2's clock domain, @toggle_metaguard obtains the value of @toggle as a metastability guard. On the next clock cycle, this value is copied into @toggle_a. On the clock cycle after that, the value in @foo is copied directly into @bar. This is because @toggle_a and @toggle_b have different values during exactly one clock cycle.

The fact that @bar and @foo are in different clock domains has no significance, because @foo has been stable for well more than enough time to meet the timing requirements.

Why am I so sure about that? This time I have a good reason, and it goes like this: The whole procedure starts when @toggle_metaguard changed value because @toggle did. Had @bar sampled @foo at the same @clk2 cycle, it would have been unsafe, but with some luck maybe it would have been OK. But then there's another clock cycle of @clk2 until @toggle_metaguard's new value gets to @toggle_a. And @bar isn't updated even then, only on the next clock cycle of @clk2.

So from the moment that @foo changes until @foo is sampled by @bar there's a period of time, which corresponds to at least two clock cycles of @clk2. Compared with any flip-flop's setup time, that's an eternity. That said, it makes sense to apply set_max_delay as shown in the previous page on @toggle_metaguard. The same can be done with the paths to @bar, even though it's very unlikely to be necessary, because of the just mentioned eternity.

The Achilles' heel of this method is that @do_update must be active rarely enough to ensure that @foo remains stable when it's sampled by @bar. A reasonable minimal time between such updates is the time that corresponds to four clock cycles of @clk2. So the calculation is how many clock cycles of @clk1 correspond to four clock cycles of @clk2, rounded up to the nearest integer. If @clk1 is four times slower than @clk2 (or slower), that's not a restriction at all. Otherwise, there must be some mechanism in the logic that ensures that @do_update doesn't get active more often than allowed.
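To illustrate the calculation with made-up numbers: If @clk1 is 250 MHz and @clk2 is 100 MHz, four clock cycles of @clk2 are 40 ns, which is 10 clock cycles of @clk1. A mechanism that enforces this gap could be as simple as a hold-off counter. The following is only a sketch: @update_req, @holdoff and MIN_GAP are hypothetical names, and the counter's width limits MIN_GAP to 256 at most.

```verilog
module update_throttle #(parameter MIN_GAP = 10) // In clk1 cycles
  (
   input  clk1,
   input  update_req, // Hypothetical: Held high until do_update is high
   output do_update   // High on at most one of MIN_GAP clock cycles
   );

   reg [7:0] holdoff = 0; // Clock cycles left until an update is allowed

   assign do_update = update_req && (holdoff == 0);

   always @(posedge clk1)
     if (do_update)
       holdoff <= MIN_GAP - 1; // Start a new hold-off period
     else if (holdoff != 0)
       holdoff <= holdoff - 1;
endmodule
```

With this logic, two consecutive clock cycles with an active @do_update are at least MIN_GAP clock cycles of @clk1 apart. The logic that generates @update_req needs to hold the request (and @new_value) until @do_update becomes active.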

The truth is that in real-life designs, when the update rate is very slow, clock domains are sometimes crossed carelessly without any protection of the sort that @toggle offers. When it's done like this, @foo is copied into @bar continuously. When @foo changes once in a long while, @bar may contain an incorrect value during one clock cycle, but who cares? More often than not, this mistake is the result of neglecting the whole issue of clock domains, because hey, it works. Until it doesn't, occasionally.

Speaking of being sloppy, note that neither @toggle nor any of its related registers is reset or assigned an initial value in the example above. This is usually fine, because odds are that the synthesizer assigns them all an initial value of 0. And even if these registers don't have the same value initially, it results in one unnecessary sampling of @foo, and no more than that. It might be a good idea to reset these registers nevertheless.

More advanced variants

So far, I've presented three simple examples:

Going across clock domains with a single bit, by virtue of a metastability guard (in the previous page).
The same with multiple bits, but with the restriction that only one bit changes each clock cycle.
Going across clock domains without a restriction on the word, however with the limitation on how often this word is allowed to change. A toggle bit was used to ensure that the word is sampled only when it's stable.

These simple examples are the basis for several other mechanisms.

First, I promised to say something about @new_bar in the example above. So it's just a register that is high during one clock cycle, when @bar has a new value. Nothing special about this, but note that @bar and @new_bar reflect @foo and @do_update in the other clock domain. So this is a way to pass commands and status messages across clock domains (have I mentioned that a FIFO should be used instead, when possible?).

Another interesting expansion of the last example is: Put a dual port RAM instead of the pair of registers, @foo and @bar. This is a method for conveying buffers of data across clock domains: Suppose that the logic in the clock domain of @clk1 writes data into the RAM, and fills one half of this RAM after some time. As the logic goes on to fill the second half of the RAM, it changes the value of @toggle. This register is copied to the clock domain of @clk2, exactly as shown above. But instead of updating @bar as in the example, the logic consumes the data in the first half of the RAM.

This is how this simple register can synchronize a double-buffer mechanism, where one side writes data to the RAM and the other side reads from this RAM. In fact, the role of @toggle isn't just to change value, but it also informs the other side which half of the RAM is currently being written to.
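A rough sketch of such a double-buffer mechanism is shown below. The module, its ports and the parameter AW are hypothetical, and the sketch contains no flow control: It's assumed that the reading side finishes with one half of the RAM before the writing side gets back to it.

```verilog
module double_buffer #(parameter AW = 4) // The RAM has 2^AW words
  (
   input            clk1,
   input            wr_en,
   input [7:0]      wr_data,
   input            clk2,
   input [AW-2:0]   rd_offset,  // Address inside the half that is read
   output reg [7:0] rd_data,
   output reg       half_ready, // High for one clk2 cycle: New data
   output reg       rd_half     // The half that may be read from
   );

   reg [7:0]    ram [0:(1<<AW)-1];
   reg [AW-1:0] wr_addr = 0;
   reg          toggle = 0;     // The half that is being written to
   reg          toggle_metaguard = 0, toggle_a = 0, toggle_b = 0;

   wire [AW-1:0] next_wr_addr = wr_addr + 1;

   initial rd_half = 0;

   always @(posedge clk1)
     if (wr_en)
       begin
         ram[wr_addr] <= wr_data;
         wr_addr <= next_wr_addr;
         // Changes along with the last write to each half
         toggle <= next_wr_addr[AW-1];
       end

   always @(posedge clk2)
     begin
       toggle_metaguard <= toggle;
       toggle_a <= toggle_metaguard;
       toggle_b <= toggle_a;

       half_ready <= (toggle_a != toggle_b);
       if (toggle_a != toggle_b)
         rd_half <= toggle_b; // The half that has just been filled

       // No metastability guard: This half isn't written to now
       rd_data <= ram[{rd_half, rd_offset}];
     end
endmodule
```

The only signal that goes through a metastability guard is @toggle. The data itself is read from the RAM directly, relying on the time gap between the write operations and the read operations on each half.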

And yet, it's best to use a FIFO when possible. Even though this double-buffer mechanism might sound tempting, it should be used only when there is no better alternative. For example, when the data from the RAM is read in a different order than the data is written.

Summary

In the end, it boils down to this: In the transition between clock domains of unrelated clocks, there's always resynchronization logic involved. The data word that passes this resynchronization is limited, so only one bit can change value on each clock cycle of the source's clock (@clk1 in the examples). Otherwise, illegal data may arrive at the destination.

In some applications, this is good enough, but when this limitation is too restrictive, the data can instead move between the clock domains with a vector register or through a RAM, without any resynchronization logic used on the data itself. This works thanks to logic that maintains a minimal time gap between the write operation and the read operation of the data. This time gap ensures that the data word is stable when it's sampled at the destination. This logic is nevertheless based upon the same technique for clock domain crossing. Accordingly, this solution involves resynchronization logic that is limited to changing one bit at a time, possibly by using Gray code.

So resynchronization logic and this rule of one bit are always there when unrelated clocks are involved. It's just a matter of how they're applied.