Combining PCIe and Gigabit Transceiver on Cyclone V

This post was written by eli on April 30, 2018
Posted Under: FPGA,Intel FPGA (Altera)

Overview

The goal: To use one of the PCIe transceivers for something else on a Cyclone V GT FPGA Development Kit board, while keeping the PCIe link (narrowed down from 4x to 2x). This frees one of the transceivers wired to the PCIe fingers for some other logic, with the bifurcation done by means of a PCIe extender cable (and some soldering).

In the relevant project, the PCIe block was implemented as a QSys unit, accommodating Altera’s reset and calibration IP blocks.

Use the same controller clock

The first attempt was to just create and instantiate a Cyclone V Transceiver Native PHY IP Core, and place its pins instead of one of the PCIe lanes. But adding a plain MGT transceiver to the design caused the fitter to fail with the following error messages, whether it was supposed to be placed in the same transceiver group of six or not:

Error (14566): The Fitter cannot place 2 periphery component(s) due to conflicts with existing constraints (2 HSSI PMA Aux. block(s)). Fix the errors described in the submessages, and then rerun the Fitter. The Altera Knowledge Database may also contain articles with information on how to resolve this periphery placement failure. Review the errors and then visit the Knowledge Database at https://www.altera.com/support/support-resources/knowledge-base/search.html and search for this specific error message number.
    Error (175001): The Fitter cannot place 1 HSSI PMA Aux. block, which is within Cyclone V Transceiver Native PHY native_xcvr.
        Info (14596): Information about the failing component(s):
            Info (175028): The HSSI PMA Aux. block name(s): native_xcvr:native_xcvr|altera_xcvr_native_av:native_xcvr_inst|av_xcvr_native:gen_native_inst.av_xcvr_native_insts[0].gen_bonded_group_native.av_xcvr_native_inst|av_pma:inst_av_pma|av_tx_pma:av_tx_pma|av_tx_pma_ch:tx_pma_insts[0].av_tx_pma_ch_inst|tx_pma_ch.tx_pma_buf.tx_pma_aux
        Error (178014): Partition assignments may be preventing transceiver placement - transceivers optimizations across partitions are not supported in this version of the Quartus Prime software. For more information, refer to the Release Notes.
    Error (175001): The Fitter cannot place 1 HSSI PMA Aux. block, which is within Cyclone V Transceiver Native PHY native_xcvr.
        Info (14596): Information about the failing component(s):
            Info (175028): The HSSI PMA Aux. block name(s): native_xcvr:native_xcvr|altera_xcvr_native_av:native_xcvr_inst|av_xcvr_native:gen_native_inst.av_xcvr_native_insts[0].gen_bonded_group_native.av_xcvr_native_inst|av_pma:inst_av_pma|av_rx_pma:av_rx_pma|rx_pmas[0].rx_pma.rx_pma_aux
        Error (178014): Partition assignments may be preventing transceiver placement - transceivers optimizations across partitions are not supported in this version of the Quartus Prime software. For more information, refer to the Release Notes.
Error (12289): An error occurred while applying the periphery constraints. Review the offending constraints and rerun the Fitter.
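In a log like this, the useful part is buried at the end of the long hierarchy paths. A minimal sketch of a filter that keeps only the last element of each failing block's name, assuming the Fitter messages were saved to a text file; the function name failing_blocks is made up for this illustration and is not part of any Quartus tool:

```shell
#!/bin/sh
# Hypothetical helper: read Fitter messages on stdin and print the last
# hierarchy element of each block named in an "Info (175028)" line.
failing_blocks() {
    # Keep only the lines that name a failing block, then strip everything
    # up to and including the last '|' or '.' in the hierarchy path.
    grep 'Info (175028)' | sed 's/.*[|.]//'
}
```

Fed with the messages above, this prints tx_pma_aux and rx_pma_aux: the two failing blocks are the TX and RX PMA aux blocks of the added Native PHY, and not anything belonging to the PCIe block.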

The error messages are horribly misleading, and made me try pushing the transceiver into the QSys project that defines the PCIe interface. Fortunately or unfortunately, that fixed the problem by accident (or rather, out of laziness), which left me with the false impression that it was indeed a matter of structure and partitions. See the “Leftovers” section below for what I tried, and how it eventually failed.

So the real reason it failed: There are these reconfiguration and reset generating IPs, which are driven by some clock (mgmt_clk on the reconfiguration block), which isn’t necessarily any of the transceiver’s data rate reference clocks. If the PCIe’s reconfiguration IP is driven by a clock that is different from the other transceiver’s, this fitter error occurs.

The solution was to use PCIe’s reference clock for driving the reconfiguration logic of the new transceiver. It could have been the other way around as well, I suppose. Either way, it’s just a utility clock.

Narrow down PCIe to 1x

The next obstacle was that the PCIe block and the general-purpose transceiver each relied on its own dedicated reference clock input for its PLL, and ran at different bit rates. This calls for two different CMUs (PLLs). And according to the Handbook (CV-5V3, figures 2-2 and 2-7), there’s a thing with only the CMUs of CH1 and CH4 being connected directly to these reference clocks. So without diving too deep into the clocking structure, it became apparent that the only way to solve this PLL placement problem was to narrow the PCIe interface down further, to 1x, so it didn’t occupy the CH1 transceiver. I suppose that freed its CMU, and allowed it to generate the clock for some other transceiver.

Also in CV-5V3, figure 4-5 shows that on a PCIe 2x or 4x configuration, the CMU PLL is placed in CH4, which is inactive as a transmitter. This goes along with a comment made a page earlier, saying that “The Quartus II software automatically places the CMU PLL in a channel different from that of the data channels”.

And if this doesn’t sound very worked out from my side, it’s because I abandoned this direction and bought an HSMC breakout board instead. So I lost motivation to deal with this issue. Too much hassle.

Leftovers (or: some failed attempts)

Nothing really meaningful here: The jots below were written in my attempts to figure out how and why partitions made any difference (in hindsight: they don’t).

I first tried to put the transceiver as an IP inside the Qsys that holds the PCIe IPs, which requires some acrobatics, as described in this post. That happened to work, because I used the same clock for both reconfiguration blocks (and accidentally hit gold without realizing it). Sometimes laziness is a blessing.

Encouraged by the false “I nailed it” feeling, I tried to include the Verilog and SystemVerilog sources of the QSys design, so they’ll be one chunk so to speak, but then the error came back.

One thing that I noted in the map report is that it said:

Info (16010): Generating hard_block partition "hard_block:auto_generated_inst"

So I tried this between synthesis and fitting (merging partitions should do it…?)

$ quartus_cdb theproj --merge=on

Not only did this make no difference, but the merge report that was generated showed that all HSSI-related logic (PMA / PCS elements etc.) was in the hard_block:auto_generated_inst partition, and none of it outside.

Apparently, this hard_block partition is some kind of container for the transceiver-related logic block, and nothing else. Harmless, it seems.
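For reference, this is roughly where the merge step sits in a command-line flow. It's only a sketch of the attempt described above (which didn't help): quartus_map, quartus_cdb and quartus_fit are the regular Quartus executables and theproj is the project name used above, but the DRY_RUN switch and the run and flow helpers are made up for this illustration. With DRY_RUN left at 1, the commands are only printed.

```shell
#!/bin/sh
# Sketch of the synthesis -> partition merge -> fitter sequence.
# PROJ defaults to the project name used in this post.
# DRY_RUN is a hypothetical switch of this script, not a Quartus feature:
# when it is 1, commands are printed instead of executed.
PROJ="${PROJ:-theproj}"
DRY_RUN="${DRY_RUN:-1}"

# Print or execute a single command, depending on DRY_RUN.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Analysis & Synthesis, then the explicit partition merge, then the Fitter.
# The chain stops at the first step that fails.
flow() {
    run quartus_map "$PROJ" &&
    run quartus_cdb "$PROJ" --merge=on &&
    run quartus_fit "$PROJ"
}

flow
```

Running it with DRY_RUN=0 in the project's directory executes the three steps for real, and with the extra transceiver in the design it is the last one that fails.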

Add to the list of failing attempts: Export the entire design as a post-synthesis QXP, and then run it through quartus_map with only this QXP as a source, followed by the fitter. Which failed the same way. Actually, the same message on generating the hard_block partition appeared in the map report adopting the QXP file.

Also, when going (from shell, then Tcl shell):

$ quartus_sta -s
tcl> project_open theproj
tcl> create_timing_netlist
tcl> report_partitions

The report implied that only the “Top” partition exists (on a design without the added transceiver, or there would be no timing netlist to work with).

Which left me wondering what’s the magic about putting the transceiver inside the QSys design? The answer is of course, none whatsoever. It had nothing to do with partitioning in the first place.

Reader Comments

Hey,

I’m a big fan of your blogs. I actually am trying to get PCIe x2 and Transceivers for SGMII to work on Cyclone V. It seems like you were not successful, but is it possible if you have more details, information that you could email me a bit more.

For me, I have to get this to work somehow, and would like all the help I can get.

#1 
Written By Arti on August 4th, 2018 @ 02:30

Hello,

As mentioned above, I didn’t pursue this direction, so there’s no additional info to offer.

#2 
Written By eli on August 4th, 2018 @ 08:50

Hello, I was able to get it to compile with those transceivers. You probably already know about this and it probably didn’t fix your issue, but I want to tell you the solution that worked for me.

In qsys for the TSE IP, it allows me to choose a TX PLL Clock network. The default x1 or selecting xN which allows the TX PLL to be placed inside or out of the six-pack.

For me it worked because TSE IP is what I need and it happens to have that configuration available.

#3 
Written By Arti on August 31st, 2018 @ 20:28
