Thursday, December 3, 2015

Monitoring TLPs

So far, so good. I implemented the first part of my TLP master plan. The test I ran is rather simple: I have a version of the titan_wiggle FPGA project with Lattice's Reveal logic analyzer installed and connected to the PCIe core's receive TLP interface (Figure 1). Titan is installed in a PC running Linux and Shaun's pyhwtest utility.
Figure 1. Block Diagram of the Titan FPGA Design Using Reveal to Capture TLP messages.

pyhwtest is a great little utility. With it I can access the memory space on a PCIe card from Python. Everything from simple reads and writes to DMA transactions works, so I don't have to write a kernel driver. To use it, all that's required is to find the base address of the BAR I want to access. Listing 1 (below) shows the output of lspci: BAR0 is mapped to 0xDC000000.


 [root@localhost src]# lspci -d dead:* -v  
 04:00.0 Non-VGA unclassified device: Indigita Corporation Device beef (rev 06)  
     Flags: bus master, fast devsel, latency 0, IRQ 10  
     Memory at dc000000 (32-bit, non-prefetchable) [size=256]  
     Memory at d8000000 (32-bit, non-prefetchable) [size=64M]  
     Capabilities: [50] Power Management version 3  
     Capabilities: [70] MSI: Enable- Count=1/1 Maskable- 64bit+  
     Capabilities: [90] Express Endpoint, MSI 00  
     Capabilities: [100] Device Serial Number 00-00-00-00-00-00-00-00  
 [root@localhost src]#  
Listing 1. Linux Terminal Output Showing the Output from lspci.
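As an aside, the BAR base addresses can also be pulled from sysfs instead of parsing lspci. Here's a quick sketch; the helper is mine (not part of pyhwtest), and it assumes the card enumerates at 0000:04:00.0 as in Listing 1.

 # Read the BAR base addresses from sysfs (an alternative to parsing lspci).
 # Not part of pyhwtest; the slot address is the one from Listing 1.
 def bar_bases(slot="0000:04:00.0"):
     bases = []
     with open("/sys/bus/pci/devices/%s/resource" % slot) as f:
         for line in f:
             start, end, flags = (int(field, 16) for field in line.split())
             if start:                      # unused regions read back as zeros
                 bases.append(start)
     return bases

 print(hex(bar_bases()[0]))                 # expect 0xdc000000 for BAR0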


Listing 2 shows the commands required to write 0x12345678 to BAR0 (address 0xDC000000), and Figure 2 shows the Reveal capture that results. It's a good sign that the data written in pyhwtest shows up in the capture. I built a simple spreadsheet to decode the TLP packet (Table 1). From here I'll start thinking about how best to decode the TLP packets in the FPGA and the best way to handle data flow to and from the PC.


 [root@localhost refresh_test]#  
 [root@localhost refresh_test]# python -i titan_test.py -b 0xdc000000  
 Base address: 0xdc000000L  
 >>> hwtest.writelw(BASE, le32_to_be(0x12345678))  
 >>>  
 [root@localhost refresh_test]#  
Listing 2. Python Terminal Output Showing a Data Write to the Base Address of BAR0.


Figure 2. Reveal Capture of the Data Write TLP Message.


 H0:  R | Fmt | Type  | R | TC  | R    | TD | EP | Attr | R  | Length
      0 | 10  | 00000 | 0 | 000 | 0000 | 0  | 0  | 01   | 00 | 0000000001
 H1:  Requester ID     | Tag      | Last BE | First BE
      0000000000000000 | 00000001 | 0000    | 1111
 H2:  Address[31:2]                  | R
      110111000000000000000000000000 | 00
 D0:  Data
      00010010001101000101011001111000
Table 1. Decode of Captured RX TLP Message.
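For the curious, the same decode is easy to script. Here's a minimal Python sketch; the field layout follows the standard PCIe 3DW memory-write header, and the DWords are taken straight from Table 1.

 # Minimal scripted version of the Table 1 decode (3DW, 32-bit memory write).
 h0, h1, h2, d0 = 0x40001001, 0x0000010F, 0xDC000000, 0x12345678

 fmt      = (h0 >> 29) & 0x3     # 0b10 -> 3DW header with data
 tlp_type = (h0 >> 24) & 0x1F    # 0b00000 -> memory request (a write, given fmt)
 attr     = (h0 >> 12) & 0x3     # 0b01
 length   = h0 & 0x3FF           # 1 DW of payload
 tag      = (h1 >> 8) & 0xFF     # 0x01
 first_be = h1 & 0xF             # 0b1111 -> all four payload bytes valid
 address  = h2 & ~0x3            # 0xDC000000 -> BAR0
 print("%08X %08X" % (address, d0))   # the write issued in Listing 2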

Wednesday, November 4, 2015

Pushing TLPs

Now that the physical interfaces on Titan have been shown to work, the fun part begins. I've given some thought to the firmware framework that Titan needs. In the simplest terms, I want to be able to control and test all of Titan's interfaces from a PC (via PCIe). Developing this firmware will require interfacing to Lattice's PCIe core, the DDR3 core, and a bit of glue logic here and there. I also want to be able to simulate the entire design. A complete simulation environment will allow me to "crack open" the hood and locate bugs much faster than with hardware-only testing.

Lattice's PCIe core presents TLPs (transaction layer packets) directly at its user-side interface. In the past I've used higher-level bridges to avoid dealing with TLPs directly; Xilinx has cores available that provide a bus interface to the user (e.g., AXI or PLB). Lattice doesn't have a higher-level core, but they do have an example of a TLP-to-Wishbone bridge in the firmware for the ECP3 and ECP5 Versa cards.

While I was tempted to try the TLP-to-Wishbone bridge, I decided that building firmware to consume TLPs directly had two advantages: it will likely be smaller in the FPGA, and it will give me a chance to understand PCIe transactions better than I have before. How can I pass up a chance to dive a little deeper into PCIe?

The implementation plan is summarized in the Figure 1 block diagram. The idea is to build two state machines to control the flow of TLP messages between the PCIe core and the registers or FIFOs: one state machine will handle the RX (receive) TLP messages, while the second will handle the TX (transmit) messages. The registers will be used for simple interfaces such as the GPIO pins, and the FIFOs will handle the buffering and the clock domain crossing required to interface to DDR3. A rough behavioral sketch of the RX flow follows Figure 1.

Figure 1. Block Diagram of the Titan FPGA Design Connected to a PC.
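To pin down the intent before writing any HDL, here's a rough behavioral model of the RX path in Python. The names, the bar_hit argument, and the register/FIFO split are illustrative assumptions, not the Lattice core's actual interface.

 # Rough behavioral model of the planned RX path (illustration only, not RTL).
 registers = {}     # stand-in for the GPIO/control registers
 ddr3_fifo = []     # stand-in for the FIFO feeding the DDR3 core

 def rx_consume(tlp_dwords, bar_hit):
     """Decode one 3DW memory-write TLP and route its payload.
     bar_hit stands in for the BAR-hit indication from the PCIe core."""
     h0, h1, h2 = tlp_dwords[:3]
     length  = h0 & 0x3FF               # payload length in DWs
     address = h2 & ~0x3                # DW-aligned address from the header
     payload = tlp_dwords[3:3 + length]
     if bar_hit == 0:                   # register space (e.g. GPIO)
         registers[address & 0xFF] = payload[0]
     else:                              # buffered path toward DDR3
         ddr3_fifo.extend(payload)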

To simulate the PCIe core, I'm building a testbench where a second PCIe core is instantiated as a stand-in for the PC. I got the idea of connecting two PCIe cores together from the Lattice PCIe Endpoint User's Guide. Using a core to test itself would be a bad idea if I were designing the PCIe core itself, but since I'm just implementing a user-side interface, I'm OK trusting that Lattice meets the PCI-SIG standards. Initially this will allow me to write some simple state machines to control the simulation-side PCIe core (Figure 2). Eventually I'll abstract the control interface to the simulation-side core to allow a higher-level interface; this will likely be implemented in SystemVerilog, but I'm also considering a Python-based interface using MyHDL.


Figure 2. Block Diagram of the Titan FPGA Design Connected to Another PCIe Core Acting as a BFM

So, where to start? I've been reviewing documentation on the structure of TLPs. While the TLP format isn't overly complex, a complete implementation that handles every possible transaction will take a while to build. To speed things up a bit, I decided to focus my initial work on the most relevant TLPs. My plan is to install Titan in a PC running Linux and use pyhwtest to send write and read messages. Inside the FPGA, Lattice's Reveal analyzer will be instantiated so I can capture the TLPs received for various commands from Linux. See Figure 3.

I'll use the captured data to design simple synthesizable FSMs (finite state machines) to decode and act on the TLP messages captured in Reveal. Once I have that working, I'll design some simulation FSMs to generate the same messages that Reveal captured. Together, these FSMs will form the starting point for the BFM (bus functional model) and the Titan simulation of the PCIe link.
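To give a feel for what the simulation-side FSMs will have to produce, here's a minimal Python sketch that packs a 32-bit memory-write TLP into its three header DWords plus payload. The field layout follows the PCIe spec; the function name and the example values are mine, not the Lattice core's.

 # Minimal sketch: pack a 32-bit memory-write TLP (3DW header + payload).
 def mwr32(address, payload_dws, tag=0, first_be=0xF, last_be=0x0):
     length = len(payload_dws) & 0x3FF
     h0 = (0b10 << 29) | (0b00000 << 24) | length        # Fmt=3DW w/ data, Type=MWr
     h1 = (0x0000 << 16) | (tag << 8) | (last_be << 4) | first_be
     h2 = address & ~0x3
     return [h0, h1, h2] + list(payload_dws)

 print([hex(dw) for dw in mwr32(0x1000, [0xDEADBEEF], tag=1)])   # hypothetical write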

Figure 3. Block Diagram of the Titan FPGA Design Using Reveal to Capture TLP messages.

Thursday, October 1, 2015

DesignCon 2016

I purchased my pass for DesignCon 2016 today! Tomorrow is the last day to get the Super Early Bird Special. I'm excited about the trip, and hopefully I'll see some of you out there!

Sunday, September 20, 2015

DDR3 testing

I've been building the simulation environment for Titan lately. For each core I add to the design, the Lattice tools generate a testbench. I've been working on a unified testbench for Titan that references portions of the Lattice core testbenches. By referencing the Lattice cores' test code rather than simply copying it, I can keep my source on GitHub clean.

My initial simulation focus has been on DDR3. I chose DDR3 since it is the last hardware subsystem on Titan that has not been validated. I built a simple state machine that wrote data to two different addresses and then read it back. From this simple test it looks like DDR3 is working. Figures 1-6 (below) show the simulation output; Figures 7-9 show the output from the Reveal Logic Analyzer on Titan.
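In rough terms, the state machine does nothing more than this. The snippet is a Python stand-in for the RTL, not the actual HDL; the addresses and data patterns are the ones shown in the figures below.

 # Python stand-in for the DDR3 test state machine (not the actual RTL).
 memory = {}   # behavioral model of the DDR3 region under test

 test_vectors = {
     0x0001400: [0x1AAA2AAA3AAA4AAA, 0xE555D555C555B555],
     0x0001500: [0x0123456789ABCDEF, 0xFEDCBA9876543210],
 }

 for addr, words in test_vectors.items():   # write phase
     memory[addr] = list(words)

 for addr, words in test_vectors.items():   # read-back phase
     assert memory[addr] == words, "read-back mismatch at 0x%07X" % addr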

I still need to take some measurements to validate that the signal integrity on the PCB is good, but for now it's good to know that the last subsystem on Titan (DDR3) is functioning.

Figure 1. Write to Address 0x0001400 Marked (Data is 0x1AAA2AAA3AAA4AAA Followed by 0xE555D555C555B555).

Figure 2. Write to Address 0x0001500 Marked (Data is 0x0123456789ABCDEF Followed by 0xFEDCBA9876543210).

Figure 3. Read from 0x0001400 Marked Showing First Word (0x1AAA2AAA3AAA4AAA).

Figure 4. Read from 0x0001400 Marked Showing Second Word (0xE555D555C555B555).

Figure 5. Read from 0x0001500 Marked Showing First Word (0x0123456789ABCDEF).

Figure 6. Read from 0x0001500 Marked Showing Second Word (0xFEDCBA9876543210).

Figure 7. Overall View Showing DDR3 Test in Hardware (Reveal Analyzer).

Figure 8. Close-Up View Showing DDR3 Data Read from 0x0001400 in Hardware (Reveal Analyzer).

Figure 9. Close-Up View Showing DDR3 Data Read from 0x0001500 in Hardware (Reveal Analyzer).

Thursday, July 16, 2015

Third (and fourth) hand


Am I the only one who gets excited by stuff like this?

Figure 1. Using Hobby Creek's Third Hand to Probe Titan.

Wednesday, July 15, 2015

X-rays and phase shifts

Between taking a trip to see the WNT play in Canada and fighting a persistent bug on Titan, it's been a busy summer.

The initial build and board testing went so well that I was surprised when I started having intermittent problems with the USB JTAG circuit. After looking into USB signal quality, the FT2232H circuit on Titan, and tool issues, I decided that the principal problem was with my PC. I was using a MacBook running Windows 7 in a virtual machine, and a native Windows box seemed more stable.

Unfortunately, when I started to test the SPI flash, I encountered intermittent issues again. The errors varied from failure to identify the SPI flash to intermittent boot failures. While I was at the World Cup, Kevin graciously agreed to debug the problem. He decided to focus on the SPI booting problem exclusively, since it was limited in scope and excluded potential issues with the PC, USB, or JTAG. He confirmed that Titan does not have any power sequencing mistakes and that the part should be booting from flash.

After checking with Lattice, I decided to run another test to verify that the SPI flash was not being accessed too quickly; the SPI flash I currently have installed cannot be read for 30us after its VCC (supply voltage) rises to the minimum operating voltage. I probed the SPI flash to monitor the delay between the 3.3V rail coming up and the first SPI_CSn (chip select) access. The time before the first SPI access was 16ms, which easily satisfies the 30us requirement.

While monitoring the SPI transactions, I noticed something odd. SPI_CSn and SPI_MOSI were always present, but SPI_CLK was missing at times. This meant that the ECP5 was correctly entering SPI boot mode and that a single signal was the culprit. The only plausible explanation for SPI_CLK disappearing was a connection problem between the ECP5 and the SPI flash. Since PCBs from PCB-Pool are electrically tested, I began to suspect a solder joint issue. I tried squeezing the ECP5 against the PCB, and SPI_CLK appeared (Figures 1-3).

I took Titan to a local contract manufacturer and had them X-Ray the ECP5, replace it, and then X-Ray the newly installed part. As Figures 4-5 show, the ECP5 was twisted ever so slightly. I had a similar issue in the past, and I updated my manual reflow profile to prevent it. Apparently I was only preventing gross twisting, but this more subtle twist was a lot harder to detect without an X-Ray.

This demonstrates the problem of working on a new design while also developing a process for prototyping it. I have improvements coming that should eliminate this problem. A colleague is assembling a low volume pick and place machine that I ordered, and I have a controller on the way for my oven. I'll post more about these soon. This was a frustrating problem, but now Titan is almost validated!

Figure 1. Failed SPI Boot.

Figure 2. Failed SPI Boot Attempt with Thumb Pressure on the ECP5.

Figure 3. Successful SPI Boot with Thumb Pressure on the ECP5.

Figure 4. X-Ray Image of the ECP5 IC as Originally Installed.

Figure 5. Close-up of Figure 4 with a PCB Pad Marked in Blue and a BGA Ball in Red.

Figure 6. X-Ray Image of the Newly Installed ECP5.

Thursday, May 7, 2015

C.H.I.P.

I've been very interested in using low-cost ARM platforms for my test computers and Linux servers at home and in the lab. So far I've got two Raspberry Pi 2 boards as well as a Hummingboard-i2ex from SolidRun. The Hummingboard-i2ex is especially useful for me since it has the PCI Express interface I need in order to validate Titan (Figure 1).

Figure 1. Titan and Hummingboard Getting Friendly.

I've never supported a Kickstarter project before, but the C.H.I.P. project, a single-core ARM board for $9, won me over. I thought a Raspberry Pi was a steal at $35, but the C.H.I.P. is $26 cheaper and it includes flash (eMMC).