Improving timing for FIFO by adding registersEvery now and then we find that the design's FIFO signals in the critical paths. There are several reasons why a FIFO may be involved:
So, let's stick to the real problem: We want the rd_en signal to be sampled by a flip-flop in the FIFO as soon as possible, and we want a low clock-to-output on the data lines. It's clear that the answer is adding registers, but there's some logic to add as well... The Verilog code for a FIFO wrapper is given below, and is released to the public domain. This means that I don't claim any copyrights for it. How to use it: This is a functional drop-in wrapper for a typical FIFO (such as the one generated by Xilinx' Fifo Generator). This means that if you wrap your existing FIFO with this module, your design will functionally "do the same", with possibly better static timing. But you can not expect identical simulations, because the data appears slightly later on the FIFO's output. A non-buggy and fairly sensible design should not care about such a difference. Assuming that you have a non-FWFT FIFO to begin with, with a pretty standard read interface, you should instantiate it in the module below as "orig_fifo", and set the "width" parameter to the number of data bits you have. If you need some other control signal, pass them though with care. For example, "almost empty" may ring the bell slightly too soon. Signals regarding the write path can be passed on as is, since the write path is not touched by this wrapper. The term "FWFT" is explained below, in case you're not familiar with it. So here's the module. It's explained in detail after the listing. module reg_fifo(rst, rd_clk, rd_en, dout, empty, wr_clk, wr_en, din, full, prog_full); // Eli Billauer, 12.4.06 // This module is released to the public domain; Any use is allowed. parameter width = 8; input rst; input rd_clk; input rd_en; input wr_clk; input wr_en; input [(width-1):0] din; output empty; output full; output prog_full; output [(width-1):0] dout; reg fifo_valid, middle_valid; reg [(width-1):0] dout, middle_dout; wire [(width-1):0] fifo_dout; wire fifo_empty, fifo_rd_en; wire will_update_middle, will_update_dout; // orig_fifo is just a normal (non-FWFT) synchronous or asynchronous FIFO fifo orig_fifo ( .rst(rst), .rd_clk(rd_clk), .rd_en(fifo_rd_en), .dout(fifo_dout), .empty(fifo_empty), .wr_clk(wr_clk), .wr_en(wr_en), .din(din), .full(full), .prog_full(prog_full) ); assign will_update_middle = fifo_valid && (middle_valid == will_update_dout); assign will_update_dout = (middle_valid || fifo_valid) && rd_en; assign fifo_rd_en = (!fifo_empty) && !(middle_valid && fifo_valid); assign empty = !(fifo_valid || middle_valid); always @(posedge rd_clk) if (rst) begin fifo_valid <= 0; middle_valid <= 0; dout <= 0; middle_dout <= 0; end else begin if (will_update_middle) middle_dout <= fifo_dout; if (will_update_dout) dout <= middle_valid ? middle_dout : fifo_dout; if (fifo_rd_en) fifo_valid <= 1; else if (will_update_middle || will_update_dout) fifo_valid <= 0; if (will_update_middle) middle_valid <= 1; else if (will_update_dout) middle_valid <= 0; end endmodule OK, let's see how this thing works. The first thing to realize, is that if we want that the FIFO's rd_en will be detached from the original rd_en, two registers are necessary. Consider the following drawing: ![]() The FIFO's output is marked as fifo_dout. This modules output is dout. Now suppose that we didn't have the middle register. We could activate the original FIFO's rd_en so that the first data entry appears on fifo_dout, and hold our "empty" signal low, telling the application to come and get data. But whenever the application came to get some, it would raise the rd_en. The wrapper would then sample fifo_dout into dout, and all is fine so far. But now what? Since we don't want the FIFO's fifo_rd_en to combinatorically depend on the application's rd_en, there is no possibility to update fifo_dout during the same clock cycle, so we've run out of data. A direct response to rd_en is therefore to raise the "empty" flag, raise the FIFO's fifo_rd_en, and wait a clock. Only then can we clear the "empty" flag again. This means that the wrapped FIFO works in half rate. And if we're ready for a FIFO to work in half rate, there are simpler solutions. So the solution is to have another register, middle_dout. The FIFO wrapper always tries to keep both fifo_dout and middle_dout with data. Just to get an idea of how this works, let's consider the case of a well-filled FIFO and that we're after a long period with no reads. That means that fifo_dout and middle_dout have valid data. At this point the application's rd_en is raised. dout samples middle_dout. The "empty" flag is kept low, since data can still be sampled from fifo_dout. If rd_en is kept high, data will continously be sampled from fifo_dout to dout. But all this requires the FIFO's fifo_rd_en to be correctly driven. We shall now see how it's all done. Let's get acquainted with a few internal signals in the Verilog code above. We'll start with will_update_middle and will_update_dout:
Evidently from the Verilog code, the middle_dout and dout registers (respectively) sample something when these signals are high. middle_dout samples fifo_dout (it doesn't have much choice), but dout will sample middle_dout if middle_valid is high, otherwise it goes for fifo_dout. The two *_valid flags tell us whether the respective register containts valid data.
When fifo_rd_en is high during a clock rise, fifo_valid goes high as well: If we just pulled data from the FIFO, its current output is considered valid. But if fifo_rd_en was low, and data was sampled by one of middle_dout or dout, we don't consider fifo_dout valid anymore, since its data has been consumed. middle_valid goes by the same logic: When will_update_middle is high, data is sampled into middle_dout, so middle_valid goes high as well. If not, it will go low when dout samples data from it. Now let's turn to how the FIFO's fifo_rd_en signal is generated:
We will read from the FIFO if it has data (i.e. not fifo_empty) and if any of fifo_dout and/or middle_dout is not valid. Note that fifo_rd_en driven by the state machine's urge to prepare data for reading, and not the application's reading. As a result, fifo_rd_en is derived from a three-input function, which depends on two flip-flops (fifo_valid and middle_valid) and the FIFO's own fifo_empty signal. On a Xilinx FPGA, this is done with a single LUT, which is as fast as you can get for any flow-controlled read from a FIFO. The "empty" flag is generated from a function of these two flip-flops as well, so it's very fast as well.
We are now left with the generation of the will_update_* signals:
will_update_dout essentially says (!empty && rd_en). Since we mimic a FIFO, it's pretty natural that the output is updated when it's not empty, and data is requested by rd_en. will_update_middle is trickier: It requires fifo_valid (otherwise there is nothing to sample) and then we have this (middle_valid == will_update_dout). This is actually a shortcut for saying the following:
The combinatoric functions which generate will_update_middle, will_update_dout and the next values of fifo_valid and middle_valid may seem complicated, but they eventually depend only fifo_empty, rd_en and two registers: fifo_valid and middle_valid. A fairly clever synthesizer should detect this, and create a single logic level on these important paths on a Xilinx FPGA architecture. A FWFT (First Word Fall Through) FIFO conversionIn a "classic" FIFO, a deasserted "empty" flag means that valid data will be presented on the FIFO's outputs after rd_en is sampled high on clock. A FWFT FIFO presents the data on the outputs as soon as it's available, so a deasserted "empty" flag means that the data on the outputs is valid.The meaning of rd_en could therefore be interpreted somewhat differently: For a "classic" FIFO it means "bring me data". On a FWFT it's something like "I just grabbed the data, bring me the next one if you have it". We shall now see a module, which uses a "classic" FIFO to implement a FWFT. Note that it does not improve timing: The FIFO's dout is passed through, and the FIFO's rd_en depends on the one coming from the application, so its timing is slightly worse than a the bare FIFO it uses. As a matter of fact, this wrapper merely fiddles with the rd_en and empty signals. If timing improvement is necessary, see the end of this page. module basic_fwft_fifo(rst, rd_clk, rd_en, dout, empty, wr_clk, wr_en, din, full, prog_full); // Eli Billauer, 12.4.06 // This module is released to the public domain; Any use is allowed. parameter width = 8; input rst; input rd_clk; input rd_en; input wr_clk; input wr_en; input [(width-1):0] din; output empty; output full; output prog_full; output [(width-1):0] dout; reg dout_valid; wire fifo_rd_en, fifo_empty; // orig_fifo is just a normal (non-FWFT) synchronous or asynchronous FIFO fifo orig_fifo ( .rst(rst), .rd_clk(rd_clk), .rd_en(fifo_rd_en), .dout(dout), .empty(fifo_empty), .wr_clk(wr_clk), .wr_en(wr_en), .din(din), .full(full), .prog_full(prog_full) ); assign fifo_rd_en = !fifo_empty && (!dout_valid || rd_en); assign empty = !dout_valid; always @(posedge rd_clk) if (rst) dout_valid <= 0; else begin if (fifo_rd_en) dout_valid <= 1; else if (rd_en) dout_valid <= 0; end endmodule Using a FWFT for an improved non-FWFT FIFOGiven a FWFT FIFO, it's quite easy to turn it into a non-FWFT FIFO. A FWFT FIFO can be created as shown above, or requested as such from Xilinx' tools.As before, the bonus is the extra register at the output improves the timing. Besides, Xilinx FIFO Generator has an "Use embedded registers" option, which can be enabled with no difference in the FWFT FIFO's behavior (note however that when embedded registers are used on the built-in FIFO created by Xilinx' FIFO Generator, it's not a real FWFT. When any non-built-in FIFO option is used, it's a true FWFT). Embedded registers improve timing even further. So it boils down to simply this (example given for 32 bit width): Since a deasserted "empty" means that data is present on the output port, create a register which samples this data when rd_en is asserted. That mimics a standard FIFO's behavior: module regular_fifo ( input rst, input wr_clk, input rd_clk, input [31:0] din, input wr_en, input rd_en, output reg [31:0] dout, output full, output empty, output prog_full ); wire [31:0] dout_w; always @(posedge rd_clk) if (rd_en && !empty) dout <= dout_w; fwft_fifo wrapper ( .wr_clk(wr_clk), .rd_clk(rd_clk), .rst(rst), .din(din), .wr_en(wr_en), .rd_en(rd_en), .dout(dout_w), .full(full), .prog_full(prog_full), .empty(empty) ); endmodule A FWFT with improved timingWe shall finish with a combination of the two: A FWFT module with extra registers for improved timing. It is based on the first module of this page, with some changes that makes it a FWFT FIFO. Only few changes are required: To make dout sample data not only on rd_en, but also when it's invalid. Also, the "empty" flag (which is a misnomer for a FWFT) should be active when dout is not valid, rather than depending on the validity of other registers.The module for a conversion from a non-FWFT FIFO to a FWFT FIFO, including the timing advantages, is given below. module fwft_fifo(rst, rd_clk, rd_en, dout, empty, wr_clk, wr_en, din, full, prog_full); parameter width = 8; input rst; input rd_clk; input rd_en; input wr_clk; input wr_en; input [(width-1):0] din; output empty; output full; output prog_full; output [(width-1):0] dout; reg fifo_valid, middle_valid, dout_valid; reg [(width-1):0] dout, middle_dout; wire [(width-1):0] fifo_dout; wire fifo_empty, fifo_rd_en; wire will_update_middle, will_update_dout; // orig_fifo is just a normal (non-FWFT) synchronous or asynchronous FIFO fifo orig_fifo ( .rst(rst), .rd_clk(rd_clk), .rd_en(fifo_rd_en), .dout(fifo_dout), .empty(fifo_empty), .wr_clk(wr_clk), .wr_en(wr_en), .din(din), .full(full), .prog_full(prog_full) ); assign will_update_middle = fifo_valid && (middle_valid == will_update_dout); assign will_update_dout = (middle_valid || fifo_valid) && (rd_en || !dout_valid); assign fifo_rd_en = (!fifo_empty) && !(middle_valid && dout_valid && fifo_valid); assign empty = !dout_valid; always @(posedge rd_clk) if (rst) begin fifo_valid <= 0; middle_valid <= 0; dout_valid <= 0; dout <= 0; middle_dout <= 0; end else begin if (will_update_middle) middle_dout <= fifo_dout; if (will_update_dout) dout <= middle_valid ? middle_dout : fifo_dout; if (fifo_rd_en) fifo_valid <= 1; else if (will_update_middle || will_update_dout) fifo_valid <= 0; if (will_update_middle) middle_valid <= 1; else if (will_update_dout) middle_valid <= 0; if (will_update_dout) dout_valid <= 1; else if (rd_en) dout_valid <= 0; end endmodule
Last modified on Mon Jun 8 11:59:22 2015. E-mail:
|