Wednesday, November 4, 2015

Pushing TLPs

Now that the physical interfaces on Titan have been shown to work, the fun part begins. I've given some thought to the firmware framework that Titan needs. In the simplest terms, I want to be able to control and test all of Titan's interfaces from a PC (via PCIe). Developing this firmware will require interfacing to Lattice's PCIe core, the DDR3 core, and a bit of glue logic here and there. I also want to be able to simulate the entire design. A complete simulation environment will allow me to "crack open the hood" and locate bugs much faster than hardware-only testing would.

Lattice's PCIe core presents TLPs (transaction layer packets) to the user side of the core. In the past I've used higher-level bridges to avoid dealing with TLPs directly; Xilinx, for example, has cores available that provide a bus interface to the user (e.g., AXI or PLB). Lattice doesn't have a higher-level core, but they do have an example of a TLP-to-Wishbone bridge in the firmware for the ECP3 and ECP5 Versa cards.
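
For reference, here's how the 3DW (three doubleword) header used by 32-bit memory requests breaks down, written as a SystemVerilog packed struct. The field layout comes straight from the PCIe base specification; the struct and field names are just my own shorthand, not anything the Lattice core actually exposes.

    // 3DW TLP header for 32-bit memory requests (per the PCIe base spec).
    // Struct/field names are my own; the core presents this as a raw data
    // stream, one or more DWs per beat, not as a struct.
    typedef struct packed {
      // DW0
      logic        r0;        // reserved
      logic [1:0]  fmt;       // 00 = 3DW no data (MRd32), 10 = 3DW w/ data (MWr32)
      logic [4:0]  tlp_type;  // 00000 = memory request
      logic        r1;        // reserved
      logic [2:0]  tc;        // traffic class
      logic [3:0]  r2;        // reserved
      logic        td;        // TLP digest (ECRC) present
      logic        ep;        // poisoned TLP
      logic [1:0]  attr;      // relaxed ordering / no-snoop hints
      logic [1:0]  r3;        // reserved
      logic [9:0]  length;    // payload length in DWs
      // DW1
      logic [15:0] req_id;    // requester ID (bus/device/function)
      logic [7:0]  tag;       // matches a completion to its request
      logic [3:0]  last_be;   // byte enables, last payload DW
      logic [3:0]  first_be;  // byte enables, first payload DW
      // DW2
      logic [29:0] addr;      // address[31:2], DW aligned
      logic [1:0]  r4;        // reserved
    } tlp_hdr_3dw_t;

Decoding fmt and tlp_type from the first DW is enough to tell memory reads, memory writes, and completions apart, which covers everything the register path needs.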

While I was tempted to try the TLP-to-Wishbone bridge, I decided that building firmware to consume TLPs directly has two advantages: it will likely be smaller in the FPGA, and it will give me a chance to understand PCIe transactions better than I have before. How can I pass up a chance to dive a little deeper into PCIe?

The implementation plan is summarized in the Figure 1 block diagram. The idea is to build two state machines to control the flow of TLP messages between the PCIe core and the registers or FIFOs. One state machine will handle the RX (receive) TLP messages while the second will handle the TX (transmit) messages. The registers will be used for simple interfaces such as the GPIO pins. The FIFOs will handle the buffering and clock-domain crossing required to interface to DDR3. A skeleton of the RX state machine is sketched after Figure 1.

Figure 1. Block Diagram of the Titan FPGA Design Connected to a PC.
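
As a first cut, the RX state machine might look something like the skeleton below. The rx_* port names are placeholders for the Lattice core's receive interface (the real port names and framing differ), but the flow is the part that matters: spot the start of a TLP, capture and decode the header, then route any payload.

    // RX FSM sketch: decode incoming TLPs and route them to registers or FIFOs.
    // The rx_* ports are placeholders, not the actual Lattice core interface.
    module tlp_rx_fsm (
      input  logic        clk, rst,
      input  logic        rx_valid,    // receive data valid
      input  logic        rx_sop,      // start of TLP
      input  logic        rx_eop,      // end of TLP
      input  logic [31:0] rx_data      // one header/payload DW per beat
    );
      typedef enum logic [1:0] {RX_IDLE, RX_HDR, RX_PAYLOAD} rx_state_t;
      rx_state_t  state;
      logic [1:0] fmt;
      logic [4:0] tlp_type;
      logic [1:0] hdr_cnt;             // header DWs captured so far

      always_ff @(posedge clk) begin
        if (rst) begin
          state <= RX_IDLE;
        end else begin
          case (state)
            RX_IDLE:
              if (rx_valid && rx_sop) begin
                fmt      <= rx_data[30:29];  // 3DW/4DW, with/without data
                tlp_type <= rx_data[28:24];  // memory, config, completion...
                hdr_cnt  <= 2'd1;
                state    <= RX_HDR;
              end
            RX_HDR:
              if (rx_valid) begin
                hdr_cnt <= hdr_cnt + 2'd1;
                if (hdr_cnt == 2'd2) begin   // third header DW: 32-bit address
                  // MWr32 (fmt=10) carries payload; MRd32 (fmt=00) ends here
                  state <= (fmt == 2'b10) ? RX_PAYLOAD : RX_IDLE;
                end
              end
            RX_PAYLOAD:
              if (rx_valid && rx_eop)        // payload routes to a register or
                state <= RX_IDLE;            // the DDR3 FIFO based on address
            default: state <= RX_IDLE;
          endcase
        end
      end
    endmodule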

To simulate the PCIe core, I'm building a testbench where a second PCIe core is instantiated as a stand-in for the PC. I got the idea to connect two PCIe cores together from the Lattice PCIe Endpoint User's Guide. Using a core to test itself would be a bad idea if I were designing the PCIe core itself; since I'm just implementing a user-side interface, I'm OK trusting that Lattice meets the PCI-SIG standards. Initially this will allow me to write some simple state machines to control the simulation-side PCIe core (Figure 2). Eventually I'll abstract the control interface to the simulation-side core to allow a higher-level interface; this will likely be implemented in SystemVerilog, but I'm also considering a Python-based interface using MyHDL. A stripped-down sketch of this testbench follows Figure 2.


Figure 2. Block Diagram of the Titan FPGA Design Connected to Another PCIe Core Acting as a BFM.
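
Stripped down, the testbench might look like this. The module and port names are placeholders rather than what Lattice's IP tools actually generate, but the wiring is the point: each core's TX serial pair feeds the other's RX pair.

    // Testbench sketch: two endpoint cores wired back-to-back, one as the
    // DUT and one as the BFM. Module/port names are placeholders.
    module tb_titan_pcie;
      logic refclk, rst_n;
      logic dut_txp, dut_txn, bfm_txp, bfm_txn;

      // Device under test: Titan's PCIe endpoint plus the TLP state machines
      titan_top dut (
        .refclk (refclk),
        .rst_n  (rst_n),
        .hdinp  (bfm_txp), .hdinn (bfm_txn),  // RX <- BFM TX
        .hdoutp (dut_txp), .hdoutn (dut_txn)  // TX -> BFM RX
      );

      // Stand-in for the PC: a second PCIe core driven by simulation FSMs
      pcie_bfm bfm (
        .refclk (refclk),
        .rst_n  (rst_n),
        .hdinp  (dut_txp), .hdinn (dut_txn),
        .hdoutp (bfm_txp), .hdoutn (bfm_txn)
      );

      // 100 MHz reference clock (assuming a 1 ns timescale)
      initial begin
        refclk = 0;
        forever #5 refclk = ~refclk;
      end

      // Release reset after things settle
      initial begin
        rst_n = 0;
        #100 rst_n = 1;
      end
    endmodule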

So, where to start? I've been reviewing documentation on the structure of TLPs. While the TLP format isn't overly complex, a complete implementation that handles every possible transaction will take a while to build. To speed things up a bit, I decided to focus my initial work on the most relevant TLPs. My plan is to install Titan in a PC running Linux and use pyhwtest to send write and read messages. Inside the FPGA, Lattice's Reveal analyzer will be instantiated so I can capture the TLPs received for various commands from Linux. See Figure 3.

I'll use the captured data to design simple synthesizable FSMs (finite state machines) to decode and act on the TLP messages captured in Reveal. Once I have that working, I'll design some simulation FSMs to generate the same messages as the Reveal captures. Together, these FSMs will form a starting point for the BFM (bus functional model) and the Titan simulation of the PCIe link. A rough sketch of one such BFM task follows Figure 3.

Figure 3. Block Diagram of the Titan FPGA Design Using Reveal to Capture TLP Messages.
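
Once the captures are in hand, the BFM side should reduce to simulation tasks that replay the same messages. Here's a rough sketch of what an MWr32 (32-bit memory write) generator task could look like; tlp_send() is a placeholder for whatever mechanism actually feeds DWs into the simulation-side core's transmit interface.

    // Inside the simulation-side BFM module. tlp_send() is a placeholder
    // for the path that pushes DWs into the core's transmit interface.
    task automatic mem_write32(input logic [31:0] addr,
                               input logic [31:0] data);
      logic [31:0] tlp [4];
      // DW0 = 0x40000001: fmt=10 (3DW w/ data), type=00000 (memory), length=1
      tlp[0] = {1'b0, 2'b10, 5'b00000, 1'b0, 3'b000, 4'b0000,
                1'b0, 1'b0, 2'b00, 2'b00, 10'd1};
      // DW1: requester ID 0, tag 0, last BE=0000 (single DW), first BE=1111
      tlp[1] = {16'h0000, 8'h00, 4'b0000, 4'b1111};
      // DW2: DW-aligned target address
      tlp[2] = {addr[31:2], 2'b00};
      // DW3: the payload itself
      tlp[3] = data;
      tlp_send(tlp);
    endtask

If the header DWs this task builds match the Reveal captures from the real PC, I'll know the BFM and the synthesizable decoder agree on the format.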