I designed my own HyperRAM controller for the TE0890 Spartan-7 mini FPGA module.
The code for my HyperRAM controller and test driver is on Github:
https://github.com/jorisvr/te0890-utils/tree/master/hyperram_test
The whole thing is working very nicely, except that I see short bursts of data corruption approximately once every 10 to 20 hours. I have tried everything I could think of to find the source of these issues, but I just can't figure it out.
I'm not using the BlackMesaLabs hyperram Verilog core, because it only supports dword-level access while I want byte-level write enables, and because that Verilog code only works up to 80 MHz while I want to run at 100 MHz.
So I designed my own controller in VHDL. It operates the HyperRAM at 100 MHz, which should be the maximum supported frequency of the device. My test consists of a simple but intensive march test with varying data patterns (this is sometimes called "moving inversions"). The whole thing seems to work almost flawlessly. It correctly handles repeated runs of the test pattern for many hours. But approximately once every 10 to 20 hours, the test detects a burst of between 1 and 4000 errors, then continues to run for hours again without errors.
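For readers who don't know the term, here is a rough software model of one moving-inversions pass. This is only a Python sketch of the general technique, not my actual VHDL test driver; the function name, the 16-bit word width and the list-like memory are assumptions made for illustration.

```python
def moving_inversions(mem, pattern, width=16):
    """One moving-inversions pass over a list-like memory.

    Returns a list of (address, expected, actual) tuples for every mismatch.
    The word width and the memory interface are illustrative assumptions.
    """
    mask = (1 << width) - 1
    pattern &= mask
    inverse = ~pattern & mask
    errors = []
    n = len(mem)

    # Phase 1: fill the whole memory with the pattern, ascending addresses.
    for addr in range(n):
        mem[addr] = pattern

    # Phase 2: ascending; check the pattern, then overwrite with its inverse.
    for addr in range(n):
        got = mem[addr]
        if got != pattern:
            errors.append((addr, pattern, got))
        mem[addr] = inverse

    # Phase 3: descending; check the inverse, then restore the pattern.
    for addr in reversed(range(n)):
        got = mem[addr]
        if got != inverse:
            errors.append((addr, inverse, got))
        mem[addr] = pattern

    return errors
```

Repeating this pass with different patterns gives the "varying data patterns" part. Each word is read back immediately before it is overwritten, so every location is checked in both states.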
Based on the error patterns, I suspect the corruption occurs in the write data path and perhaps sometimes in the address, but not in the read data path. However, it is really difficult to determine this with my current test method.
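One way to separate the two cases would be to re-read a failing address a few times before moving on. The Python sketch below only illustrates that idea; it is not something my test currently does, and `read_word`, the function name and the category labels are all hypothetical.

```python
def classify_mismatch(read_word, addr, expected, rereads=4):
    """Re-read a failing address to guess where the corruption happened.

    read_word is a hypothetical callable returning the word stored at addr.
    """
    values = [read_word(addr) for _ in range(rereads)]
    if all(v == expected for v in values):
        # The stored data is fine, so the original mismatch was a read glitch.
        return "transient read error"
    if all(v == values[0] for v in values):
        # Consistently wrong data: the bad value is really in the RAM,
        # which points at the write data path or at the address.
        return "stored data wrong"
    # Re-reads disagree with each other: the read path itself is unstable.
    return "unstable reads"
```

This cannot tell a write-data error from an address error by itself; that would need an extra scan for the expected value at neighbouring addresses.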
I tried shifting the clock phases used to drive data to the HyperRAM and to capture data from the HyperRAM. This confirms that I have at least 30 degrees margin in both directions before error rates increase significantly. I therefore find it unlikely that the data corruption is caused by something like setup/hold time violations on the HyperRAM interface.
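To put the 30 degrees in perspective, here is the conversion to time, assuming the phase shift is expressed relative to the interface clock period. The helper name is just for illustration.

```python
def phase_to_ns(phase_deg, freq_mhz):
    """Convert a clock phase shift in degrees to nanoseconds."""
    period_ns = 1000.0 / freq_mhz      # 100 MHz -> 10 ns period
    return phase_deg / 360.0 * period_ns

margin_ns = phase_to_ns(30, 100)       # about 0.83 ns
```

So at 100 MHz the margin is roughly 0.83 ns in each direction. Since HyperRAM transfers data on both clock edges, each data word is valid for at most half a period, which is 5 ns at this frequency.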
I tried relaxing the timing configuration of the HyperRAM. At 100 MHz it should support tACC=4 cycles, tRWR=4 cycles. I tried running at default tACC=6 cycles, fixed 2x access latency, and tRWR=6 cycles. I still get data corruption in that configuration.
I tried running at 80 MHz instead of 100 MHz, but I still see data corruption.
I'm currently testing at 50 MHz and have seen no corruption yet, but the errors are so infrequent that I will have to test for several days to be sure.
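A rough estimate of how long "several days" needs to be, assuming the error bursts arrive like a Poisson process with a fixed average interval (that is an assumption, I have not verified it). If the problem were still present at the same rate, the chance of seeing no burst in T hours would be exp(-T / interval), so:

```python
import math

def hours_needed(mean_interval_h, confidence=0.95):
    """Error-free run time needed before 'no burst seen' becomes unlikely.

    Assumes bursts follow a Poisson process with the given mean interval.
    """
    return -mean_interval_h * math.log(1.0 - confidence)

worst_case_h = hours_needed(20)   # about 60 hours for one burst per 20 hours
```

With one burst per 20 hours, about 60 error-free hours are needed before I can say with 95% confidence that the rate has actually dropped; with one burst per 10 hours, about 30 hours.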
However, I really want to run the RAM at 100 MHz, and I believe it should be possible.
At this point I don't know how to debug this any further.
It is remarkable that the errors are extremely rare, but burst-like in nature (no errors for many hours, then hundreds of errors in a fraction of a second). This suggests to me that some aspect of the system is intermittently unstable.
Could the MMCM lose phase lock? - Why would it do that?
Could the signal amplitude of the HyperRAM interface drop below the noise margin? - What could cause that?
Is this just the best that HyperRAM can do? - But then it is basically unusable without ECC.
I'm powering my TE0890 module from the USB bus of my computer. Not the most low-noise supply, but I think it should be good enough.
Question 1: Does anybody have any clue what might be going on here?
Question 2: Does anybody have experience with the TE0890 HyperRAM? I'm interested in success stories, similar problems, different problems.
Question 3: Is anybody willing to run my HyperRAM test on their own TE0890 for a few days? The HyperRAM controller and test design are in my Github, linked above. Note that the expected error rate is extremely low, so the test may need to run for many hours before any conclusions can be drawn.