Xilinx’ MiG memory controller’s init process reverse engineered
Introduction
I’m using Xilinx’ MiG 1.7.3 for running DDR2 memories on a Virtex-4 FPGA. It didn’t take me long to realize that the controller never finishes initialization. The problem was that I had no idea why and, as far as I know, no documentation to refer to in my attempts to understand where the controller got stuck, which is an essential step in getting it unstuck.
Since Xilinx are wise enough to release the IP core with its source, I was able to reverse engineer the initialization process to the level necessary for my own purpose. This is a memo of the details, just in case I’ll need to do this again some time. I sure hope that won’t be necessary…
In my case, the problem seems to have been overheating of the FPGA. I’m not 100% sure about this, but with 90 degrees centigrade measured on the case, and everything starting to work OK once a decent heatsink (with fan) was put in place, it looks pretty much like good old heat.
Overview
During the entire initialization process, the controller is governed by the init_state one-hot state machine in the ddr2_controller module. The end of the process is marked by init_done_int going high. This signal goes out as init_done, which is how the IP core’s user is told that initialization has finished.
The initialization consists of roughly three stages:
- Setting up the memory device
- Setting up the IDELAY taps so that the DQ inputs are sampled with good timing.
- Learning the correct latency for reading data from DQs during read cycles.
Throughout the init process, activate and precharge operations take place as required by the standard. These operations are not mentioned here, since they don’t add anything to understanding the principle.
Setting up the memory device
This is the normal JEDEC procedure, which consists of a predefined sequence of peculiar operations, as defined in the DDR2 standard. This includes writing to the memory’s mode registers. During this phase, the controller doesn’t care whether it’s talking to a memory or not, since it never reads anything back from the memory.
Setting up the IDELAY taps
The purpose of this stage is to make sure that data is sampled from the DQ lines at the best possible timing. Each DQ input is calibrated separately.
This stage begins with a single write command to column zero. The write data FIFO has already been loaded with data such that the rising edge is all ones and the falling edge is all zeros. For example, for a memory with 16 DQ lines, the FIFO has been fed with 0xFFFF0000 twice for memories with a burst length of 4, and four times if the burst length is 8.
This can be seen in the backend_fifos module. In that module, one can see that data is written to the write data FIFO immediately after reset. Also, there is another set of words written to the FIFO, which are intended for the next stage.
All in all, this single write command drains the FIFO of the words containing all ones or all zeros, so that column zero contains this data. Next, the controller reads column zero continuously while the delay taps are adjusted to achieve proper input timing for the DQs.
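As a rough illustration of the calibration data described above, here is a minimal Python sketch that builds the FIFO words for a given DQ width and burst length. The function name is made up for this memo, and placing the rising-edge beat in the upper half of the word is my reading of the 0xFFFF0000 example, not something taken from the MiG sources.

```python
def calibration_fifo_words(dq_width, burst_length):
    """Build the write data FIFO words used for the tap calibration write.

    Assumption: each FIFO word holds two DQ-wide beats, with the rising-edge
    beat (all ones) in the upper half and the falling-edge beat (all zeros)
    in the lower half, as suggested by the 0xFFFF0000 example for 16 DQs.
    """
    if burst_length not in (4, 8):
        raise ValueError("burst length is expected to be 4 or 8")
    rising = (1 << dq_width) - 1      # all ones on the rising edge
    falling = 0                       # all zeros on the falling edge
    word = (rising << dq_width) | falling
    # One FIFO word covers two beats, so a burst needs burst_length / 2 words.
    return [word] * (burst_length // 2)


if __name__ == "__main__":
    for bl in (4, 8):
        words = calibration_fifo_words(16, bl)
        print(bl, [hex(w) for w in words])
```

For 16 DQ lines this yields two words for a burst length of 4 and four words for a burst length of 8, matching the counts given above.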
The logic for moving the taps is outside the ddr2_controller module. The latter merely helps by performing reads. When the tap logic finishes, it signals that it’s done by raising a signal which is known as phy_Dly_Slct_Done in the ddr2_controller module, and which goes by many other names, such as SEL_DONE. In the tap_logic module (where it originates) it’s called tap_sel_done.
The tap calibrator increments the tap delay until the data on that line shifts, or until 55 increments have taken place. Whichever happens first, the resulting position is considered to be the data edge. The tap delay is then decremented by the number of times defined by the tby4tapvalue parameter (17 in my case).
Note that even if no edge is found at all, the tap delay calibrator will consider the calibration of that tap OK.
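The search can be modeled in a few lines of Python. This is only a behavioral sketch of what is described above, not a translation of the tap_logic module: sample_at is a hypothetical callback standing in for "read the line with this tap setting", and clamping the result at zero is a simplification of mine.

```python
MAX_INCREMENTS = 55   # the search gives up after this many increments
TBY4TAPVALUE = 17     # back-off count; 17 in my configuration


def calibrate_tap(sample_at, max_increments=MAX_INCREMENTS, back_off=TBY4TAPVALUE):
    """Behavioral model of calibrating one data line.

    sample_at(tap) is a hypothetical callback returning the bit sampled
    when the delay is set to `tap`. Returns (final_tap, edge_found).
    """
    tap = 0
    reference = sample_at(tap)
    edge_found = False
    for _ in range(max_increments):
        tap += 1
        if sample_at(tap) != reference:
            edge_found = True      # the data shifted: this is the edge
            break
    # Edge or no edge, the current position is treated as the edge and the
    # delay is backed off. Clamping at zero is an assumption of this sketch.
    final_tap = max(tap - back_off, 0)
    return final_tap, edge_found


if __name__ == "__main__":
    # A line whose sampled value flips at tap 30.
    print(calibrate_tap(lambda tap: 1 if tap < 30 else 0))
    # A line that never changes (e.g. stuck): still reported as calibrated.
    print(calibrate_tap(lambda tap: 1))
```

The second call shows why a dead line does not stop this stage: the model returns a tap value in both cases, and only the edge_found flag (which the real logic does not act on, as far as I can tell) differs.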
Here is a short list of lines I found useful to look at with a scope (using the FPGA Editor):
- ddr2_ctrl_tandem/data_path_00/tap_logic_00/data_tap_inc_0/calib_done_int
- ddr2_ctrl_tandem/data_path_00/tap_logic_00/tap_sel_done
- ddr2_ctrl_tandem/data_path_00/tap_logic_00/data_tap_inc_done
- ddr2_ctrl_tandem/data_path_00/tap_logic_00/dlyce_dqs[0]
- ddr2_ctrl_tandem/data_path_00/tap_logic_00/dlyinc_dqs[0]
CHAN_DONE is the most interesting signal, because it goes high briefly every time a data line has finished its tap calibration. Unfortunately, the synthesizer mangles this signal’s name, so the only way to identify it is to find what causes ddr2_ctrl_tandem/data_path_00/tap_logic_00/data_tap_inc_0/chan_sel_int to change state. In my case it was
ddr2_ctrl_tandem/data_path_00/tap_logic_00/data_tap_inc_0/chan_sel_int_not0002
This signal should go high 8 times (or however many data lines you have per DQS). If it goes high fewer times than that and then nothing more happens, you can tell which of these data lines is problematic simply by counting the strobes.
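The counting rule can be written down as a tiny Python helper. The function name is hypothetical, and it assumes that the lines are calibrated one at a time in ascending order starting from line 0, which I have not verified against the sources.

```python
def first_stuck_line(strobes_seen, lines_per_group=8):
    """Return the index of the line that never finished, or None if all did.

    Assumption: lines are calibrated in ascending order starting at 0, so
    the number of strobes seen equals the index of the first stuck line.
    """
    if not 0 <= strobes_seen <= lines_per_group:
        raise ValueError("strobe count out of range")
    if strobes_seen == lines_per_group:
        return None
    return strobes_seen


if __name__ == "__main__":
    print(first_stuck_line(5))   # five strobes seen, then silence
    print(first_stuck_line(8))   # all lines finished
```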
Latency for reading data
The purpose of this stage is to tell when, in terms of half clock cycles, to sample the data read from the memory. I’m not 100% clear on why this stage is necessary at all, but that won’t change the fact that it exists.
This stage starts with a single write operation again. This time the written data is slightly more sophisticated (keep in mind that it was loaded to the write data FIFO immediately after wakeup from reset). The first column will have the data 0xA written to it, duplicated to occupy all DQs. For example, on a memory with 16 DQs, the first column will be 0xAAAA. The second column is 0x5 duplicated, the third 0x9, and the fourth 0x6, all duplicated. If the burst length is 8, this four word sequence is repeated.
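To make the pattern concrete, here is a minimal Python sketch that generates the column data for a given DQ width and burst length. The names are made up for illustration, and the sketch only reproduces the values listed above, not how the backend_fifos module packs them into FIFO words.

```python
NIBBLES = (0xA, 0x5, 0x9, 0x6)   # first to fourth column, per the text


def replicate_nibble(nibble, dq_width):
    """Duplicate a 4-bit value so that it occupies all DQ lines."""
    if dq_width % 4 != 0:
        raise ValueError("DQ width must be a multiple of 4")
    value = 0
    for _ in range(dq_width // 4):
        value = (value << 4) | nibble
    return value


def latency_pattern(dq_width, burst_length):
    """Return the per-column data for the latency training write."""
    if burst_length not in (4, 8):
        raise ValueError("burst length is expected to be 4 or 8")
    columns = [replicate_nibble(n, dq_width) for n in NIBBLES]
    # With a burst length of 8, the four-word sequence is repeated.
    return columns * (burst_length // 4)


if __name__ == "__main__":
    print([hex(c) for c in latency_pattern(16, 4)])
    print([hex(c) for c in latency_pattern(8, 8)])
```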
After writing this, the controller reads column zero continuously, until COMP_DONE goes high. This signal originates from the pattern_compare8 modules, each of which tells the controller that it has recovered the correct input data alignment. More precisely, the rd_data module sends the ddr2_controller a logical AND of all the pattern_compare8 modules’ comp_done signals.
These pattern_compare8 modules simply look for a 0xAA pattern followed by a 0x99 pattern in the input on rising edges only, or a 0x55 followed by a 0x66, again on rising edges. So they will catch the reads of either the first and third columns, or the second and fourth, but either way this resolves the alignment ambiguity completely.
As the pattern_compare8 module tries to match the data, it increments (among others) its clk_cnt_rise register (not to be confused with the clk_count_rise wire, which contains the final result). Monitoring clk_cnt_rise[0] (using FPGA Editor, for example) can give positive confirmation that the initialization is at this phase. It should show a nice square wave at half the DDR2 controller’s clk0 frequency, and then stop when this phase is done.
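The matching rule, and the AND of the per-module results, can be sketched in Python as follows. This is a software model of the behavior described here, with hypothetical names; it says nothing about how the real module counts clocks or registers its inputs.

```python
# Pairs of consecutive rising-edge bytes that count as a match.
RISING_PAIRS = ((0xAA, 0x99), (0x55, 0x66))


def find_alignment(rising_samples):
    """Return (index, pair) for the first matching pair, or None.

    rising_samples is a list of bytes captured on successive rising edges
    for one 8-bit group of DQ lines.
    """
    for i in range(len(rising_samples) - 1):
        pair = (rising_samples[i], rising_samples[i + 1])
        if pair in RISING_PAIRS:
            return i, pair
    return None


def comp_done(groups):
    """Model of the AND over all comparators: every group must have matched."""
    return all(find_alignment(samples) is not None for samples in groups)


if __name__ == "__main__":
    good = [0x00, 0x00, 0xAA, 0x99, 0xAA, 0x99]
    shifted = [0x66, 0x55, 0x66, 0x55]
    bad = [0x99, 0xAA, 0x00]       # right bytes, wrong order: no match
    print(find_alignment(good), find_alignment(shifted), find_alignment(bad))
    print(comp_done([good, shifted]), comp_done([good, bad]))
```

The last line illustrates why a single misbehaving byte group is enough to keep the overall done signal low in this model.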
Summary
The initialization process is not the simplest in the world, and it’s likely to fail if there is anything wrong with your memory, in particular if as little as one data line is miswired. This is not really good news, but understanding the process may at least help in working out what went wrong, and hopefully in fixing it too.