ASPM makes Spartan-6’s PCIe core miss TLP packets

This post was written by eli on June 27, 2011
Posted Under: FPGA,Linux kernel,PCI express

The fatal error

Let’s break the bad news: Spartan-6’s PCIe core may drop TLP packets sporadically when ASPM (Active State Power Management) is enabled. That means that any TLP given to the core for transmission can silently disappear, as if it had never been submitted. I also suspect that the problem exists in the opposite direction.

Hardware involved: a Spartan xc6slx45t-fgg484-3-es (evaluation sample version) on an SP605 evaluation board, plugged into a Gigabyte G31M-ES2L motherboard with the Intel G33 chipset and an E5700 3.0 GHz processor.

The fairly good news is that the core’s cfg_dstatus[2] (= fatal error detected) goes high as a result of dropping TLPs. Or at least it did in my case. So it looks like monitoring this signal, and doing something loud if it goes to ‘1’, is enough to at least know whether the core is doing its job or not.

Let me spell it out: If you’re designing with Xilinx’ PCIe core, you should verify that cfg_dstatus[2] stays ‘0’, and if it goes high you should treat the PCIe endpoint as completely unreliable.
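For what it’s worth, cfg_dstatus mirrors the endpoint’s PCI Express Device Status register, so the same indication can also be polled from the host side. Here’s a minimal sketch for a Linux driver (the function name is mine, and error handling is left to the surrounding driver):

#include <linux/pci.h>
#include <linux/pci_regs.h>

/* Host-side view of cfg_dstatus[2]: returns nonzero if the endpoint
   reports a detected fatal error in its Device Status register. */
static int fatal_error_detected(struct pci_dev *pdev)
{
	int pos = pci_find_capability(pdev, PCI_CAP_ID_EXP);
	u16 devsta;

	if (!pos)
		return 0; /* Not a PCI Express device */

	pci_read_config_word(pdev, pos + PCI_EXP_DEVSTA, &devsta);
	return !!(devsta & PCI_EXP_DEVSTA_FED);
}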

How to know if ASPM is enabled

On a Linux box, become root and run lspci -vv. The output will include all devices, but the relevant part will be something like

01:00.0 Class ff00: Xilinx Corporation Generic FPGA core
 Subsystem: Xilinx Corporation Generic FPGA core
 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
 Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
 Latency: 0, Cache Line Size: 4 bytes
 Interrupt: pin ? routed to IRQ 44
 Region 0: Memory at fdaff000 (64-bit, non-prefetchable) [size=128]
 Capabilities: [40] Power Management version 3
 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold-)
 Status: D0 PME-Enable- DSel=0 DScale=0 PME-
 Capabilities: [48] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable+
 Address: 00000000fee0300c  Data: 4181
 Capabilities: [58] Express Endpoint IRQ 0
 Device: Supported: MaxPayload 512 bytes, PhantFunc 0, ExtTag-
 Device: Latency L0s unlimited, L1 unlimited
 Device: AtnBtn- AtnInd- PwrInd-
 Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported-
 Device: RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
 Device: MaxPayload 128 bytes, MaxReadReq 512 bytes
 Link: Supported Speed 2.5Gb/s, Width x1, ASPM L0s, Port 0
 Link: Latency L0s unlimited, L1 unlimited
 Link: ASPM L0s Enabled RCB 64 bytes CommClk- ExtSynch-
 Link: Speed 2.5Gb/s, Width x1

There we have it: I generated the core with an unlimited acceptable L0s latency, so the BIOS saw no latency constraint standing in the way, and ASPM ended up enabled.

What we really want is the output to end with something like:

Link: Latency L0s unlimited, L1 unlimited
 Link: ASPM Disabled RCB 64 bytes CommClk- ExtSynch-
 Link: Speed 2.5Gb/s, Width x1
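The same check can be made programmatically from a driver: the setting lives in the ASPM Control field of the endpoint’s Link Control register. Just a sketch (the function name is mine):

#include <linux/pci.h>
#include <linux/pci_regs.h>

/* Returns the ASPM Control field of the endpoint's Link Control
   register: 0 = disabled, 1 = L0s, 2 = L1, 3 = L0s and L1. */
static u16 aspm_control(struct pci_dev *pdev)
{
	int pos = pci_find_capability(pdev, PCI_CAP_ID_EXP);
	u16 lnkctl;

	if (!pos)
		return 0;

	pci_read_config_word(pdev, pos + PCI_EXP_LNKCTL, &lnkctl);
	return lnkctl & PCI_EXP_LNKCTL_ASPMC;
}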

The elegant solution

The really good news is that there is a simple solution: Disable ASPM. In other words, program the link partners so that they never enter the L0s or L1 power saving states. In a Linux kernel driver, it’s pretty simple:

#include <linux/pci-aspm.h>

/* Never enter L0s, L1 or clock power management on this link */
pci_disable_link_state(pdev, PCIE_LINK_STATE_L0S | PCIE_LINK_STATE_L1 |
                       PCIE_LINK_STATE_CLKPM);

This is something I would do without thinking twice for any device based upon Xilinx’ PCIe core. Actually, I would do this for any device for which power saving is irrelevant.
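To make it concrete, this is roughly where the call belongs in a driver’s probe function. This is just a sketch; the function name and everything around the call are made up:

#include <linux/pci.h>
#include <linux/pci-aspm.h>

static int my_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
	int rc = pci_enable_device(pdev);

	if (rc)
		return rc;

	/* Keep the link in L0 at all times: no L0s, no L1, no clock PM */
	pci_disable_link_state(pdev, PCIE_LINK_STATE_L0S | PCIE_LINK_STATE_L1 |
			       PCIE_LINK_STATE_CLKPM);

	/* ... BARs, interrupts and the rest of the setup go here ... */

	return 0;
}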

The maybe-working solution

In theory, the kernel can run under different ASPM policies, one of which is “powersave”. If it runs under “performance”, all transitions to L0s are disabled, and all should be well. In practice, it looks like the kernel community is pushing towards allowing L0s even under the performance policy.

The shaky workaround

When some software wants to allow L0s, it must check whether the switching latency from L0s to L0 (that is, from napping to awake) is one the device can tolerate. The device announces its maximal acceptable latency in the PCI Express Capability Structure. By setting the acceptable L0s latency limit to the shortest value allowed (64 ns), one can hope that the hardware will not be able to meet this requirement, and hence give up on using ASPM. This trick happened to work on my own motherboard, but another motherboard may well meet the 64 ns requirement and enable ASPM anyhow. So this isn’t really a solution.
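For the record, the latency the endpoint announces can also be read back from its Device Capabilities register (bits 8:6 hold the acceptable L0s latency, with 0 meaning 64 ns at most and 7 meaning no limit). A sketch, with a function name of my own invention:

#include <linux/pci.h>
#include <linux/pci_regs.h>

/* Returns the endpoint's advertised acceptable L0s latency code:
   0 = 64 ns at most, 1 = 128 ns, ... 6 = 4 us, 7 = no limit. */
static u32 acceptable_l0s_latency(struct pci_dev *pdev)
{
	int pos = pci_find_capability(pdev, PCI_CAP_ID_EXP);
	u32 devcap;

	if (!pos)
		return 0;

	pci_read_config_dword(pdev, pos + PCI_EXP_DEVCAP, &devcap);
	return (devcap & PCI_EXP_DEVCAP_L0S) >> 6;
}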

Anyhow, if this method succeeds, lspci -vv will show something like

Capabilities: [58] Express Endpoint IRQ 0
 Device: Supported: MaxPayload 512 bytes, PhantFunc 0, ExtTag-
 Device: Latency L0s <64ns, L1 <1us
 Device: AtnBtn- AtnInd- PwrInd-
 Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported-
 Device: RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
 Device: MaxPayload 128 bytes, MaxReadReq 512 bytes
 Link: Supported Speed 2.5Gb/s, Width x1, ASPM L0s, Port 0
 Link: Latency L0s unlimited, L1 unlimited
 Link: ASPM Disabled RCB 64 bytes CommClk- ExtSynch-
 Link: Speed 2.5Gb/s, Width x1

How I know it isn’t my own bug

The transitions from L0 to L0s and back throttle the data flow through the PCIe core, so maybe these on-and-offs exposed a bug in my own HDL code’s data flow? Why do I blame Xilinx?

The answer was found in the dbg_* debug lines supplied from within the PCIe core. These lines go high whenever something bad happens in the core’s lower layers. Running without ASPM, these lines stayed at zero. With ASPM enabled, and in conjunction with the packet drops, the following lines were asserted:

  • dbg_reg_detected_fatal: Well, I knew this already. A fatal error was detected.
  • dbg_reg_detected_correctable: A correctable error was detected. Nice, but I really don’t care.
  • dbg_rply_timeout_status: The replay timer fired off: A TLP packet was sent, but didn’t receive an acknowledgement. That indicates that things aren’t perfect, but if the packet was retransmitted, this doesn’t indicate a user-visible issue.
  • dbg_dl_protocol_status: Ayeee. This means that an out of range ACK or NAK was received. In other words, the link partners are not on the same page regarding which packets are waiting for acknowledgement.

The last bullet is our smoking gun: It indicates that the PCIe link protocol itself has been violated. There is nothing the application HDL code can do to make that happen. Together, the last two bullets point at a TLP being lost, retransmitted, and the acknowledgements getting out of sync. Not a sign saying “a packet was lost”, but as close to one as it gets, I suppose.

Update: My attention was drawn to some interesting Xilinx Answer Records in a comment below. Answer Record #33871 mentions LL_REPLAY_TIMEOUT as the parameter to fix in order to solve a fatal error condition, but says nothing about packet dropping. It looks like this issue has been fixed in the official PCIe wrapper lately. This leaves me wondering whether people didn’t notice they lost packets, or whether Xilinx decided not to admit it too loudly.

Reader Comments

Eli,

Thanks for your posting; we were experiencing problems with the S6 too, and reviewing your post helped us focus in on an issue which may or may not be relevant to yours.
The reason for posting here is that the errors we found were spookily similar to yours, in that the Core was reporting EXACTLY the same correctable and fatal errors that you reported, but the cause was different.

Our board previously operated on lots of motherboards with no issues, but on some customer systems deploying large SuperMicro Xeon motherboards, a problem appeared when larger (256-byte) payload sizes were used.

If you look through the Xilinx site, you’ll find that there is an issue with LL_REPLAY_TIMEOUT in the wrapper of their reference design, which, when changed according to AR39548, fixes the problem.

As I said, this may or may not be related to your problem, but it results in exactly the same symptoms!

Note AR39548 does not mention this but I believe the LL_ACK_TIMEOUT needs to be changed too.

Regards

John

#1 
Written By John McLean on July 6th, 2011 @ 13:00

Eli,

Sorry, quick follow-up: I did not check the Xilinx website thoroughly enough.

In AR33871 the LL_ACK_TIMEOUT is mentioned, and it is also related to the PM state, so it could be an alternative way of fixing your problem.

#2 
Written By John McLean on July 6th, 2011 @ 13:09

Thanks a lot for your comment. I’ve updated the post above.

#3 
Written By eli on July 6th, 2011 @ 16:50

This is an issue with the Virtex-5 Endpoint Block Plus for PCI Express core where turning on ASPM causes the core to cycle in and out of recovery periodically.

This is a known issue with the GTPs in the Virtex-5. The issue is thoroughly documented in UG341 in the “Known Issues” section. The two issues that may come up are titled “Receive Path L0s Exit to L0” and “Transceiver Exit from Rx.L0s”.

The only workaround is to disable ASPM.

This issue has been resolved in 6-Series and 7-Series.

#4 
Written By Gareth on February 6th, 2013 @ 19:21
