Xilinx Ultrascale / Ultrascale+ GTH/GTY CPLL calibration

This post was written by eli on August 23, 2020
Posted Under: FPGA,GTX,Vivado

… or why does my GTH/GTY not come out of reset? Why are those reset_rx_done / reset_tx_done never asserted after a reset_all or a reset involving the CPLLs?

What’s this CPLL calibration thing about?

It turns out that some GTH/GTY’s on Ultrascale and Ultrascale+ FPGAs have problems with getting the CPLL to work reliably. I’ll leave it to PG182 to detail which ones. So CPLL calibration is a cute name for some workaround logic, based upon the well-known principle that if something doesn’t work, turn it off, turn it on, and then check again. Repeat.

Well, not quite that simple. There’s also some playing with a bit (they call it FBOOST, jump start or not?) in the secret-sauce CPLL_CFG0 setting.

This way or another, this extra piece of logic simply checks whether the CPLL is at its correct frequency, and if so, it does nothing. If the CPLL’s frequency isn’t as expected, with a certain tolerance, it powers the CPLL off and on (with CPLLPD), resets it (CPLLRESET) and also plays with that magic FBOOST bit. And then tries again, up to 15 times.

The need for access to the GT’s DRPs is not just for that magic register’s sake, though. One can’t just measure the CPLL’s frequency directly, as it’s a few GHz. An FPGA can only work with a divided version of this clock. As there are several possibilities for routing and division of clocks inside the GT to its clock outputs, and the clock dividers depend on the application’s configuration, one such clock output must be set up so that it carries the CPLL’s output divided by a known number. TXOUTCLK was chosen for this purpose.

So much of the calibration logic does exactly that: It sets some DRP registers to set up a certain relation between the CPLL’s output and TXOUTCLK (divided by 20, in fact), it does its thing, and then returns those registers’ values to what they were before.

A word of warning

My initial take on this CPLL calibration thing was to enable it for all targets (see below for how). Can’t hurt, can it? An extra check that the CPLL is fine before kicking off. What could possibly go wrong?

I did this on Vivado 2015.2, and all was fine. And then I tried on a later Vivado version. Boom. The GTH didn’t come out of reset. More precisely, the CPLL calibration clearly failed.

I can’t say that I know exactly why, but I caught the change that makes the difference: Somewhere between 2015.2 and 2018.3, the Wizard started to set the GTH’s CPLL_INIT_CFG0 instantiation parameter to 16'b0000001010110010. Generating the IP with this parameter set to its old value, 16'b0000000000011110, made the GTH work properly again.

I compared the reset logic as well as the CPLL calibration logic, and even though I found a few changes there, they were pretty minor (and I also tried to revert some of them, but that didn’t make any difference).

So the conclusion is that the change in CPLL_INIT_CFG0 failed the CPLL calibration. Why? I have no idea. The meaning of this parameter is unknown. And the CPLL calibration just checks that the frequency is OK. So maybe it slows down the lock, so the CPLL isn’t ready when it’s checked? Possibly, but this info wouldn’t help very much anyhow.

Now, CPLL calibration is supposed to be enabled only for FPGA targets that are known to need it. The question is whether the Transceiver IP’s Wizard is clever enough to set CPLL_INIT_CFG0 to a value that won’t make the calibration fail on those. I have no idea.

By enabling CPLL calibration for a target that doesn’t need it, I selected an exotic option, but the result should have worked nevertheless. Surely it shouldn’t break from one Vivado version to another.

So the bottom line is: Don’t fiddle with this option, and if your GTH/GTY doesn’t come out of reset, consider turning CPLL calibration off, and see if that changes anything. And if so, I have no clear advice what to do. But at least the mystery will be resolved.

Note that…

  • The CPLL calibration is triggered by the GT’s reset_all assertion, as well as with reset_*_pll_and_datapath, if the CPLL is used in the relevant data path. The “reset done” signal for a data path that depends on the CPLL is asserted only if and when the CPLL calibration was successful and the CPLL is locked.
  • If cplllock_out (if exposed) is never asserted, this could indicate that the CPLL calibration failed. So it makes sense to wait indefinitely for it — better fail loudly than work with a wobbling clock.
  • Because the DRP clock is used to measure the period of time for counting the number of cycles of the divided CPLL clock, its frequency must be set accurately in the Wizard. Otherwise, the CPLL calibration will most certainly fail, even if the CPLL is perfectly fine.
  • The calibration state machine takes control of some GT ports (listed below) from when cpllreset_in is deasserted, and until the calibration state machine has finished, with success or failure.
  • While the calibration takes place, and if the calibration ends up failing, the cplllock_out signal presented to the user logic is held low. Only when the calibration is finished successfully, is the GT’s CPLLLOCK connected to the user logic (after a slight delay, and synchronized with the DRP clock).
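To make the “fail loudly” idea from the notes above a bit more concrete, here is a minimal sketch of a watchdog around cplllock_out. It’s my own illustration, not something the Wizard generates: the module name, the restart input and the TIMEOUT_CYCLES default are all hypothetical. It assumes that it’s clocked by the DRP clock (which cplllock_out is synchronized with), and that the timeout is chosen to comfortably exceed the worst-case calibration time (up to 15 attempts, each with its fixed waits).

```verilog
// Sketch only: keeps user logic in reset while the CPLL lock isn't reported,
// and raises a sticky flag if the lock is missing for suspiciously long.
module cplllock_watchdog #(
  parameter integer TIMEOUT_CYCLES = 1000000 // hypothetical; tune as needed
) (
  input  wire clk,          // the DRP (free-running) clock
  input  wire restart,      // hypothetical: high while the GT is being reset
  input  wire cplllock,     // the IP's cplllock_out
  output reg  user_reset,   // follows !cplllock, one clock cycle later
  output reg  lock_timeout  // sticky: TIMEOUT_CYCLES cycles without lock
);
  reg [31:0] count;

  initial begin
    count        = 32'd0;
    user_reset   = 1'b1;
    lock_timeout = 1'b0;
  end

  always @(posedge clk) begin
    if (restart) begin
      count        <= 32'd0;
      user_reset   <= 1'b1;
      lock_timeout <= 1'b0;
    end else begin
      user_reset <= !cplllock;
      // Count only the cycles during which there is no lock
      if (!cplllock && !lock_timeout) begin
        if (count == TIMEOUT_CYCLES - 1)
          lock_timeout <= 1'b1;
        else
          count <= count + 32'd1;
      end
    end
  end
endmodule
```

The user logic still waits for the lock indefinitely; lock_timeout is only a diagnostic to route to a LED or a status register, so a failed calibration doesn’t go unnoticed.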

Activating the CPLL calibration feature

See “A word of warning” above. You probably don’t want to activate this feature for all FPGA targets.

There are three possible choices for whether the CPLL calibration module is activated in the Wizard’s transceiver. This can’t be set from the GUI, but by editing the XCI file manually. There are two parameters in that file, PARAM_VALUE.INCLUDE_CPLL_CAL and MODELPARAM_VALUE.C_INCLUDE_CPLL_CAL, which should have the same value as follows:

  • 0 — Don’t activate.
  • 1 — Do activate.
  • 2 — Activate only for devices which the Wizard deems have a problem (default).

Changing it from the default 2 to 1 makes Vivado respond with locking the core saying it “contains stale content”. To resolve this, “upgrade” the IP, which triggers a warning that user intervention is necessary.

And indeed, three new ports are added, and this addition of ports is also reflected in the XCI file (but nothing else should change): gtwiz_gthe3_cpll_cal_txoutclk_period_in, gtwiz_gthe3_cpll_cal_cnt_tol_in and gtwiz_gthe3_cpll_cal_bufg_ce_in.

These are three input ports, so they have to be assigned values. PG182’s Table 3-1 gives the formulas for that (good luck with that) and the dissection notes below explain these formulas. But the TL;DR version is:

  • gtwiz_gthe3_cpll_cal_bufg_ce_in should be assigned with a constant 1'b1.
  • gtwiz_gthe3_cpll_cal_txoutclk_period_in should be assigned with the constant value of P_CPLL_CAL_TXOUTCLK_PERIOD, as found in the transceiver IP’s synthesis report (e.g. mytransceiver_synth_1/runme.log).
  • gtwiz_gthe3_cpll_cal_cnt_tol_in should be assigned with the constant value of P_CPLL_CAL_TXOUTCLK_PERIOD, divided by 100.
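As a rough sketch of these three assignments, the small helper module below produces the constants, to be wired to the IP’s ports with the matching names. The module and its port names are made up for this illustration, and the default of 4000 is merely a placeholder: replace it with the P_CPLL_CAL_TXOUTCLK_PERIOD value from your own synthesis report. The 18-bit width is taken from the wire declarations in the Wizard’s generated code, as shown further down.

```verilog
// Sketch only: constant drivers for the three CPLL calibration input ports.
// TXOUTCLK_PERIOD is a placeholder; take the real value from runme.log.
module cpll_cal_tieoff #(
  parameter [17:0] TXOUTCLK_PERIOD = 18'd4000
) (
  output wire [17:0] cpll_cal_txoutclk_period, // to ..._cpll_cal_txoutclk_period_in
  output wire [17:0] cpll_cal_cnt_tol,         // to ..._cpll_cal_cnt_tol_in
  output wire        cpll_cal_bufg_ce          // to ..._cpll_cal_bufg_ce_in
);
  assign cpll_cal_txoutclk_period = TXOUTCLK_PERIOD;
  assign cpll_cal_cnt_tol         = TXOUTCLK_PERIOD / 100; // 1% tolerance
  assign cpll_cal_bufg_ce         = 1'b1;
endmodule
```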

The description here relates to a single transceiver in the IP.

The meaning of gtwiz_gthe3_cpll_cal_txoutclk_period_in is as follows: Take the CPLL clock and divide it by 80. Count the number of clock cycles in a time period corresponding to 16000 DRP clock cycles. That’s the value to assign, as this is what the CPLL calibration logic expects to get.

gtwiz_gthe3_cpll_cal_cnt_tol_in is the number of counts that the result can be higher or lower than expected, and still the CPLL will be considered fine. As this is taken as the number of expected counts, divided by 100, this results in a ±1% clock frequency tolerance. Which is a good idea, given that common SSC clocking (PCIe, SATA, USB 3.0) might drag down the clock frequency by -5000 ppm, i.e. -0.5%.

The possibly tricky thing with setting these correctly is that they depend directly on the CPLL frequency. Given the data rate, there might be more than one possibility for a CPLL frequency, however it’s not expected that the Wizard will change it from run to run unless something fundamental is changed in the parameters (e.g. changing the data rate of one of the directions or both).

Besides, the CPLL frequency appears in the XCI file as MODELPARAM_VALUE.C_CPLL_VCO_FREQUENCY.

If the CPLL calibration is activated deliberately, it’s recommended to verify that it actually takes place by setting a wrong value for gtwiz_gthe3_cpll_cal_txoutclk_period_in, and checking that the calibration fails (cplllock_out remains low).

Which ports are affected?

Looking at ultragth_gtwizard_gthe3.v gives the list of ports that the CPLL calibration logic fiddles with. Within the CPLL calibration generate clause they’re assigned with certain values, and in the “else” clause, with the plain bypass:

    // Assign signals as appropriate to bypass the CPLL calibration block when it is not instantiated
    else begin : gen_no_cpll_cal
      assign txprgdivresetdone_out = txprgdivresetdone_int;
      assign cplllock_int          = cplllock_ch_int;
      assign drprdy_out            = drprdy_int;
      assign drpdo_out             = drpdo_int;
      assign cpllreset_ch_int      = cpllreset_int;
      assign cpllpd_ch_int         = cpllpd_int;
      assign txprogdivreset_ch_int = txprogdivreset_int;
      assign txoutclksel_ch_int    = txoutclksel_int;
      assign drpaddr_ch_int        = drpaddr_int;
      assign drpdi_ch_int          = drpdi_int;
      assign drpen_ch_int          = drpen_int;
      assign drpwe_ch_int          = drpwe_int;
    end

Dissection of Wizard’s output

The name of the IP was ultragth in my case. That’s the significance of this name appearing all over this part.

The impact of changing the XCI file: In the Verilog files that are produced by the Wizard, MODELPARAM_VALUE.C_INCLUDE_CPLL_CAL is used directly when instantiating the ultragth_gtwizard_top, as the C_INCLUDE_CPLL_CAL instantiation parameter.

Also, the three new input ports are passed on to ultragth_gtwizard_top.v, rather than getting all zero assignments when they’re not exposed to the user application logic.

When activating the CPLL calibration (setting INCLUDE_CPLL_CAL to 1), additional constraints are also added to the constraint file for the IP, adding a few new false paths as well as making sure that the timing calculations for TXOUTCLK are made according to the requested clock source. The latter is necessary, because the calibration logic fiddles with TXOUTCLKSEL during the calibration phase.

In ultragth_gtwizard_top.v the instantiation parameters and the three ports are just passed on to ultragth_gtwizard_gthe3.v, where the action happens.

First, the following defines are made (they like short names in Xilinx):

`define ultragth_gtwizard_gthe3_INCLUDE_CPLL_CAL__EXCLUDE 0
`define ultragth_gtwizard_gthe3_INCLUDE_CPLL_CAL__INCLUDE 1
`define ultragth_gtwizard_gthe3_INCLUDE_CPLL_CAL__DEPENDENT 2

and further down, we have this short and concise condition for enabling CPLL calibration:

    if ((C_INCLUDE_CPLL_CAL         == `ultragth_gtwizard_gthe3_INCLUDE_CPLL_CAL__INCLUDE) ||
        (((C_INCLUDE_CPLL_CAL       == `ultragth_gtwizard_gthe3_INCLUDE_CPLL_CAL__DEPENDENT) &&
         ((C_GT_REV                 == 11) ||
          (C_GT_REV                 == 12) ||
          (C_GT_REV                 == 14))) &&
         (((C_TX_ENABLE             == `ultragth_gtwizard_gthe3_TX_ENABLE__ENABLED) &&
           (C_TX_PLL_TYPE           == `ultragth_gtwizard_gthe3_TX_PLL_TYPE__CPLL)) ||
          ((C_RX_ENABLE             == `ultragth_gtwizard_gthe3_RX_ENABLE__ENABLED) &&
           (C_RX_PLL_TYPE           == `ultragth_gtwizard_gthe3_RX_PLL_TYPE__CPLL)) ||
          ((C_TXPROGDIV_FREQ_ENABLE == `ultragth_gtwizard_gthe3_TXPROGDIV_FREQ_ENABLE__ENABLED) &&
           (C_TXPROGDIV_FREQ_SOURCE == `ultragth_gtwizard_gthe3_TXPROGDIV_FREQ_SOURCE__CPLL))))) begin : gen_cpll_cal

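which simply means that the CPLL calibration module is generated if C_INCLUDE_CPLL_CAL is 1 (as I changed it to), or if it’s 2 (the default) and the conditions for enabling it automatically are met: C_GT_REV is 11, 12 or 14, and the CPLL is used by the TX path, the RX path or the TX programmable divider.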

Further down, the hint for how to assign those three new ports is given. Namely, if the CPLL calibration was added automatically due to the default assignment and specific target FPGA, the values calculated by the Wizard itself are used. If it was forced by setting INCLUDE_CPLL_CAL to 1, the three input ports are used instead:

      // The TXOUTCLK_PERIOD_IN and CNT_TOL_IN ports are normally driven by an internally-calculated value. When INCLUDE_CPLL_CAL is 1,
      // they are driven as inputs for PLL-switching and rate change special cases, and the BUFG_GT CE input is provided by the user.
      wire [(`ultragth_gtwizard_gthe3_N_CH* 18)-1:0] cpll_cal_txoutclk_period_int;
      wire [(`ultragth_gtwizard_gthe3_N_CH* 18)-1:0] cpll_cal_cnt_tol_int;
      wire [(`ultragth_gtwizard_gthe3_N_CH*  1)-1:0] cpll_cal_bufg_ce_int;
      if (C_INCLUDE_CPLL_CAL == `ultragth_gtwizard_gthe3_INCLUDE_CPLL_CAL__INCLUDE) begin : gen_txoutclk_pd_input
        assign cpll_cal_txoutclk_period_int = {`ultragth_gtwizard_gthe3_N_CH{gtwiz_gthe3_cpll_cal_txoutclk_period_in}};
        assign cpll_cal_cnt_tol_int         = {`ultragth_gtwizard_gthe3_N_CH{gtwiz_gthe3_cpll_cal_cnt_tol_in}};
        assign cpll_cal_bufg_ce_int         = {`ultragth_gtwizard_gthe3_N_CH{gtwiz_gthe3_cpll_cal_bufg_ce_in}};
      end
      else begin : gen_txoutclk_pd_internal
        assign cpll_cal_txoutclk_period_int = {`ultragth_gtwizard_gthe3_N_CH{p_cpll_cal_txoutclk_period_int}};
        assign cpll_cal_cnt_tol_int         = {`ultragth_gtwizard_gthe3_N_CH{p_cpll_cal_txoutclk_period_div100_int}};
        assign cpll_cal_bufg_ce_int         = {`ultragth_gtwizard_gthe3_N_CH{1'b1}};
      end

These `ultragth_gtwizard_gthe3_N_CH things are just duplication of the same vector, in case there are multiple channels for the same IP.

First, note that cpll_cal_bufg_ce is assigned constant 1. Not clear why this port is exposed at all.

And now to the calculated values. Given that it says

      wire [15:0] p_cpll_cal_freq_count_window_int      = P_CPLL_CAL_FREQ_COUNT_WINDOW;
      wire [17:0] p_cpll_cal_txoutclk_period_int        = P_CPLL_CAL_TXOUTCLK_PERIOD;
      wire [15:0] p_cpll_cal_wait_deassert_cpllpd_int   = P_CPLL_CAL_WAIT_DEASSERT_CPLLPD;
      wire [17:0] p_cpll_cal_txoutclk_period_div100_int = P_CPLL_CAL_TXOUTCLK_PERIOD_DIV100;

a few rows above, and

  localparam [15:0] P_CPLL_CAL_FREQ_COUNT_WINDOW      = 16'd16000;
  localparam [17:0] P_CPLL_CAL_TXOUTCLK_PERIOD        = (C_CPLL_VCO_FREQUENCY/20) * (P_CPLL_CAL_FREQ_COUNT_WINDOW/(4*C_FREERUN_FREQUENCY));
  localparam [15:0] P_CPLL_CAL_WAIT_DEASSERT_CPLLPD   = 16'd256;
  localparam [17:0] P_CPLL_CAL_TXOUTCLK_PERIOD_DIV100 = (C_CPLL_VCO_FREQUENCY/20) * (P_CPLL_CAL_FREQ_COUNT_WINDOW/(400*C_FREERUN_FREQUENCY));
  localparam [25:0] P_CDR_TIMEOUT_FREERUN_CYC         = (37000 * C_FREERUN_FREQUENCY) / C_RX_LINE_RATE;

it’s not all that difficult to do the math. And looking at Table 3-1 of PG182, the formulas match perfectly, but I didn’t feel very reassured by those.
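For what it’s worth, here is a simulation-only sketch that evaluates the same two expressions. The frequencies are my own assumptions, not values taken from the IP: I assume both are given in MHz, and I picked a 2500 MHz CPLL and a 125 MHz free-running (DRP) clock for the sake of the example. With these numbers the math works out to 125 × 32 = 4000 for the period and 40 for the tolerance.

```verilog
// Simulation-only sketch: evaluates the Wizard's formulas for one set of
// ASSUMED frequencies (both in MHz). Not part of the generated IP.
module cpll_cal_math;
  localparam real CPLL_VCO_MHZ = 2500.0; // assumed CPLL VCO frequency
  localparam real FREERUN_MHZ  = 125.0;  // assumed DRP (free-running) clock

  localparam integer FREQ_COUNT_WINDOW = 16000; // DRP clock cycles

  // Same structure as P_CPLL_CAL_TXOUTCLK_PERIOD. The real-valued result
  // is converted to an integer when assigned to the parameter.
  localparam integer TXOUTCLK_PERIOD =
    (CPLL_VCO_MHZ / 20.0) * (FREQ_COUNT_WINDOW / (4.0 * FREERUN_MHZ));

  // Same structure as P_CPLL_CAL_TXOUTCLK_PERIOD_DIV100
  localparam integer CNT_TOL =
    (CPLL_VCO_MHZ / 20.0) * (FREQ_COUNT_WINDOW / (400.0 * FREERUN_MHZ));

  initial begin
    $display("txoutclk_period = %0d, cnt_tol = %0d", TXOUTCLK_PERIOD, CNT_TOL);
    $finish;
  end
endmodule
```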

So why bother? Much easier to use the values calculated by the tools, as they appear in ultragth_synth_1/runme.log (for a 5 Gb/s rate and reference clock of 125 MHz, but YMMV as there’s more than one way to achieve a line rate):

	Parameter P_CPLL_CAL_FREQ_COUNT_WINDOW bound to: 16'b0011111010000000
	Parameter P_CPLL_CAL_TXOUTCLK_PERIOD bound to: 18'b000000111110100000
	Parameter P_CPLL_CAL_WAIT_DEASSERT_CPLLPD bound to: 16'b0000000100000000
	Parameter P_CPLL_CAL_TXOUTCLK_PERIOD_DIV100 bound to: 18'b000000000000101000
	Parameter P_CDR_TIMEOUT_FREERUN_CYC bound to: 26'b00000011100001110101001000

The bottom line is hence to set gtwiz_gthe3_cpll_cal_txoutclk_period_in to 18'b000000111110100000, and gtwiz_gthe3_cpll_cal_cnt_tol_in to 18'b000000000000101000, which are 4000 and 40 in plain decimal, respectively.

Dissection of CPLL Calibration module (specifically)

The CPLL calibrator is implemented in gtwizard_ultrascale_v1_5/hdl/verilog/gtwizard_ultrascale_v1_5_gthe3_cpll_cal.v.

Some basic reverse engineering. This may be inaccurate, as I wasn’t very careful about the gory details on this matter. Also, when I say that a register is modified below, it’s to values that are listed after the outline of the state machine (further below).

So just to get an idea:

  • TXOUTCLKSEL starts with the value 0.
  • Using the DRP ports, it fetches the existing values of the PROGCLK_SEL and PROGDIV registers, and modifies their values.
  • It changes TXOUTCLKSEL to 3'b101, i.e. TXOUTCLK is driven by TXPROGDIVCLK. This clock can be derived from more than one source, but here it’s the CPLL directly, divided by PROGDIV (judging by the value assigned to PROGCLK_SEL).
  • CPLLRESET is asserted for 32 clock cycles, and then deasserted.
The state machine now enters a loop as follows.
  • The state machine waits 16384 clock cycles. This is essentially waiting for the CPLL to lock, however the CPLL’s lock detector isn’t monitored. Rather, it waits this fixed amount of time.
  • txprogdivreset is asserted for 32 clock cycles.
  • The state machine waits for the assertion of the GT’s txprgdivresetdone (possibly indefinitely).
  • The state machine checks that the frequency counter’s output (more on this below) is in the range of TXOUTCLK_PERIOD_IN ± CNT_TOL_IN. If so, it exits this loop (think C “break” here), with the intention of declaring success. If not, and this is the 15th failed attempt, it exits the loop as well, but with the intention of declaring failure. Otherwise, it continues as follows.
  • The FBOOST DRP register is read and then modified.
  • 32 clock cycles later, CPLLRESET is asserted.
  • 32 clock cycles later, CPLLPD is asserted for a number of clock cycles (determined by the module’s WAIT_DEASSERT_CPLLPD_IN input), and then deasserted (the CPLL is powered down and up!).
  • 32 clock cycles later, CPLLRESET is deasserted.
  • The FBOOST DRP register is restored to its original value.
  • The state machine continues at the beginning of this loop.
And the final sequence, after exiting the loop:
  • PROGDIV and PROGCLK_SEL are restored to their original values.
  • CPLLRESET is asserted for 32 clock cycles, and then deasserted.
  • The state machine waits for the assertion of the GT’s cplllock, possibly indefinitely.
  • txprogdivreset is asserted for 32 clock cycles.
  • The state machine waits for the assertion of the GT’s txprgdivresetdone (possibly indefinitely).
  • The state machine finishes. At this point one of the module’s CPLL_CAL_FAIL or CPLL_CAL_DONE is asserted, depending on the reason for exiting the loop.
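The pass / fail decision inside the loop boils down to a window comparison. The following is a minimal sketch of such a check, written from the description above rather than copied from the Wizard’s module, so the module name and signal names are hypothetical and the real implementation may differ in its details (e.g. how it handles the edges of the range).

```verilog
// Sketch only: is the measured count within TXOUTCLK_PERIOD_IN +/- CNT_TOL_IN?
module cpll_cal_range_check (
  input  wire [17:0] freq_count,      // frequency counter's result (hypothetical name)
  input  wire [17:0] txoutclk_period, // expected count
  input  wire [17:0] cnt_tol,         // allowed deviation, in counts
  output wire        in_range
);
  // One extra bit, so that period + tolerance can't wrap around
  wire [18:0] upper = {1'b0, txoutclk_period} + {1'b0, cnt_tol};

  // Clamp the lower limit to zero if the tolerance exceeds the period
  wire [18:0] lower = (txoutclk_period > cnt_tol) ?
                      {1'b0, txoutclk_period - cnt_tol} : 19'd0;

  assign in_range = ({1'b0, freq_count} >= lower) &&
                    ({1'b0, freq_count} <= upper);
endmodule
```

With the 4000 / 40 example values, any count between 3960 and 4040 (inclusive) is accepted by this sketch.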

As for the values assigned when I said “modified” above, I won’t get into that in detail, but just put a related snippet of code. Note that these values are often shifted to their correct place in the DRP registers in order to fulfill their purpose:

  localparam [1:0]  MOD_PROGCLK_SEL = 2'b10;
  localparam [15:0] MOD_PROGDIV_CFG = 16'hA1A2; //divider 20
  localparam [2:0]  MOD_TXOUTCLK_SEL = 3'b101;
  localparam        MOD_FBOOST = 1'b1;

Now, a word about the frequency counter: It’s a bit complicated because of clock domain issues, but what it does is to divide the clock under test by 4, and then count how many cycles the divided clock has during a period of FREQ_COUNT_WINDOW_IN DRP clocks. Which is hardcoded as 16000 clocks.

If we trust the comment saying that PROGDIV is set to 20, it means that the frequency counter gets the CPLL clock divided by 20. It then divides this further by 4, and counts this for 16000 DRP clocks. Which is exactly the formula given in Table 3-1 of PG182.

Are we having fun?

Reader Comments

Thanks!

I always thought that the CPLL calibration block was only needed for the engineering samples, but after checking UG576 and PG182, it seems to me that the GTH in UltraScale+ has some bugs, and calibration is required.

In some cases, for example, HDMI, it might be difficult for FPGA to know the expected CPLL rate. What I can think of is to provide a register to drive the period_in port, and let software to program it based on format.

Simon

#1 
Written By Simon on August 24th, 2020 @ 07:33
