Hello,
Sorry for taking so long. This took a lot of time to figure out and test. We also discovered a big difference in glmark2, the graphics benchmark, which is also a big issue for us.
Yes, I did find that the Ultra96 had released a 2020.1 BSP, so I was able to start using that as a reference. The Ultra96 is an xczu3eg-sbva484-1-e. The Trenz SOM that we have is trenz.biz:te0820_3eg_1e:part0:2.0. The Ultra96 scored higher on both the CPU and GPU benchmarks.
I took the Trenz 2018.3 reference FPGA design, updated it to 2020.1, and was able to build a 2020.1 PetaLinux image from my tree by just modifying the device tree. As it turned out, with that design the CPU benchmark was as fast as on the Ultra96, but the GPU was still much slower.
For the CPU, I initially thought the slowdown was due to the scheduler clock running at 33 MHz instead of 100 MHz, but changing that didn't make any difference. Eventually I figured out that the issue had to do with the Wi-Fi module we are using. It is a Cypress Wi-Fi module, and we found that just loading its kernel modules (without actually connecting) makes the stress-ng score drop from 195 to 130 bogo ops.
The glmark2 benchmark was scoring 33 versus 200+ on the Ultra96. Eventually, in the output clock configuration, I set all the PLL sources to match the PLL sources I saw in the Ultra96 config. Note that the actual output frequencies changed very little or not at all, so in theory this shouldn't really matter. However, the glmark2 score jumped from 33 to 218, which is about what we saw on the Ultra96. So there is something strange going on here.
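In case someone wants to make the same PLL source comparison on running boards, here is a minimal sketch. It is not what I did (I compared the clock configuration in the design); it assumes debugfs is mounted and that a copy of /sys/kernel/debug/clk/clk_summary has been saved from each board. The clock names and rates in the sample text are made up for illustration, and parse_clk_summary and parent_differences are just helper names I picked.

```python
# Sketch: find clocks whose parent (e.g. PLL source) differs between two
# saved copies of /sys/kernel/debug/clk/clk_summary.
# The sample data below is hypothetical, not taken from a real board.

def parse_clk_summary(text):
    """Return {clock_name: (parent_name_or_None, rate_hz)}.

    The hierarchy is recovered from indentation: a clock's parent is the
    nearest previous line with a smaller indent.
    """
    clocks = {}
    stack = []  # (indent, name) of the current ancestor chain
    for line in text.splitlines():
        fields = line.split()
        # Data rows look like: name enable prepare protect rate ...
        # Header and separator rows fail this numeric check and are skipped.
        if len(fields) < 5 or not all(f.isdigit() for f in fields[1:5]):
            continue
        indent = len(line) - len(line.lstrip())
        while stack and stack[-1][0] >= indent:
            stack.pop()
        parent = stack[-1][1] if stack else None
        clocks[fields[0]] = (parent, int(fields[4]))
        stack.append((indent, fields[0]))
    return clocks


def parent_differences(a, b):
    """Clocks present in both dumps whose parent differs."""
    return {
        name: (a[name], b[name])
        for name in sorted(a.keys() & b.keys())
        if a[name][0] != b[name][0]
    }


# Hypothetical dumps: same gpu_ref rate, different PLL source.
BOARD_A = """\
   clock          count count count       rate accuracy phase cycle
 pss_ref_clk          1     1     0   33333333        0     0 50000
    iopll             1     1     0 1500000000        0     0 50000
       gpu_ref        1     1     0  500000000        0     0 50000
    dpll              1     1     0 1000000000        0     0 50000
"""
BOARD_B = """\
   clock          count count count       rate accuracy phase cycle
 pss_ref_clk          1     1     0   33333333        0     0 50000
    iopll             1     1     0 1500000000        0     0 50000
    dpll              1     1     0 1000000000        0     0 50000
       gpu_ref        1     1     0  500000000        0     0 50000
"""

if __name__ == "__main__":
    diffs = parent_differences(parse_clk_summary(BOARD_A),
                               parse_clk_summary(BOARD_B))
    for name, (a, b) in diffs.items():
        print(f"{name}: {a[0]} @ {a[1]} Hz  vs  {b[0]} @ {b[1]} Hz")
```

To use it on real boards, read the two saved files and pass their contents to parse_clk_summary in place of the sample strings. A clock that shows the same rate but a different parent is the kind of difference described above.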
Agreed about the odd/even releases. But we started with an 821, which only had 2019.2, and I decided to make the jump to 2020.1 myself since it had a new kernel and lots of Vitis fixes.
jeff