Thursday, December 18, 2014

Stupid mistakes

Throughout my career I've had the pleasure of working with some intelligent and talented engineers and scientists. Along the way I've learned to listen when I'm given advice. I might decide not to heed their advice, but I always listen. The two most memorable nuggets of advice that I've received are:
  1. Don't be a candy ass
  2. Don't sweat the stupid mistakes, it's the smart mistakes that should worry you
Tonight I'm trying hard to remind myself of #2. The principle is simple: we all make silly mistakes, overlook things, or just forget something. As long as we identify our mistakes quickly and move on, it shouldn't be a problem. The bad mistakes are the ones where you focus on something for a long time and come to the wrong conclusion. These smart mistakes often mean that you really don't understand the problem.

While debugging our two rev B boards, we've seen several successes. Our external expansion headers are working, the power rails all work, and the USB programming circuit works. Unfortunately, we were not able to get the PCI Express interface to operate. We found three issues that prevented PCIe from working:
  1. From testing on the Lattice ECP5 PCI Express card, I showed that my implementation would only enumerate the PCIe interface on Diamond 3.2, not 3.3. This isn't too big a deal since I can just go back to 3.2 for now.
  2. We did not include LDOs for the SerDes (PCIe) transmitter and receiver. We were able to repurpose an LDO on Titan to provide the 1.2V required by the ECP5 SERDES/PCS Usage Guide. This LDO wire-mod can be seen in the LED blinky video (below) on the left side of the ECP5. The mod worked well, and I verified that the noise level at the decoupling caps nearest to the ECP5 was similar to that measured on Lattice's ECP5 card.
  3. The third issue keeping PCIe from working on Titan can be seen in the blinky video. Watch it again at 720p or higher and see if you can find it. I recommend watching it in full screen.


Did you see it? The engineering samples that we used on the rev B boards were part number LFE5U-85F-8BG381IES. I noticed this when I was building a DDR3 test project tonight. The correct part number is LFE5UM-85F-8BG381IES. What does the M mean? SerDes. We are testing parts without the SerDes (PCIe) interface included.
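
This kind of mix-up is the sort of thing a small script could catch before a board is populated. Below is a minimal sketch in Python, assuming that the family prefix before the first hyphen is enough to tell the two variants apart (LFE5UM with SerDes, LFE5U without, as described above). The function names are hypothetical, and this is not part of Diamond or any Lattice tool.

```python
def family_of(part_number: str) -> str:
    """Return the family prefix, i.e. everything before the first hyphen."""
    return part_number.strip().upper().split("-", 1)[0]


def has_serdes(part_number: str) -> bool:
    """Assumption: a family prefix starting with 'LFE5UM' means SerDes is present."""
    return family_of(part_number).startswith("LFE5UM")


def check_part(project_part: str, installed_part: str) -> list:
    """Compare the part the project targets with the part on the board."""
    problems = []
    if family_of(project_part) != family_of(installed_part):
        problems.append(
            "family mismatch: project targets %s, board has %s"
            % (family_of(project_part), family_of(installed_part))
        )
    if has_serdes(project_part) and not has_serdes(installed_part):
        problems.append("project expects SerDes, but the installed part has none")
    return problems


if __name__ == "__main__":
    # Part numbers taken from the rev B boards described above.
    for problem in check_part("LFE5UM-85F-8BG381IES", "LFE5U-85F-8BG381IES"):
        print("WARNING:", problem)
```

Running the same comparison against the bill of materials, rather than against a finished board, would flag the wrong part before it is ever soldered down.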

3 comments:

Tim 'Mithro' Ansell said...

The great thing about stupid mistakes is that computers are really awesome at detecting them. It is why we use things like lint tools when doing "Software Engineering".

I wish PCB design had more tools along those lines too. I'd love to be able to check in my newly updated board and have my CI tell me that I did something stupid. I guess the DRC is a form of that, but we could do so much better.

It's what excites me most about KiCad. The fact it's FOSS means these tools now have a chance to flourish.

Unknown said...

Good point. In general DRC testing has gotten better, but it is only as good as the rules that you enable.

In this case I'm really surprised that the Diamond tool didn't detect that I built my project for an LFE5UM and the board had an LFE5U installed. It seems like the programmer should have flagged the discrepancy.

Unknown said...

And there I was looking for some clue in the blinking LEDS....