TE820 significant performance difference between petalinux versions

Started by jhane, January 09, 2021, 08:46:09 PM

jhane

Hello,

  We are using a TE820 SOM with a custom carrier board.  Our original dev platform was an Ultra96 running petalinux 2018.3, and I ported the design to the Trenz SOM and petalinux 2020.1 (note that a 2020.1 version has not been released by Trenz).
  We are working on a custom display pipeline, and when we did some benchmarking with glmark2 we found a big difference between the Ultra96 and the 820.  So we tried the stress-ng benchmark, and again we see a big difference.
  We originally started with the 821 and switched to the 820 because of availability, and then discovered that the 820 had a 2018.3 release.  I created an SD card for the 820 on a 703 carrier board and ran stress-ng on this and on the 2020.1 version on our custom carrier, and we see a significant difference there also.

stress-ng --cpu-method all --metrics-brief --perf -t 5m --cpu  0

custom carrier w/2020.1
stress-ng: info:  [946] stressor       bogo ops   real time    usr time    sys time     bogo ops/s   bogo ops/s
stress-ng: info:  [946]                                     (secs)        (secs)        (secs)        (real time)     (usr+sys time)
stress-ng: info:  [946] cpu             43493       300.09       855.27       166.22       144.93          42.58

trenz-carrier w/2018.3
stress-ng: info:  [946] stressor       bogo ops   real time    usr time    sys time     bogo ops/s   bogo ops/s
stress-ng: info:  [946]                                     (secs)         (secs)        (secs)        (real time)     (usr+sys time)
stress-ng: info:  [2357] cpu               58560    300.10       1199.72     0.06           195.14        48.81
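
As a rough cross-check of the two runs above, the reported columns can be recomputed from the raw numbers. The sketch below is only an illustration: the figures are copied from the logs, while the names `runs`, `summarize`, `sys_share` and `cpus_busy` are made up for this example. It shows that the 2020.1 run spends roughly 16% of its CPU time in the kernel (sys time) and keeps only about 3.4 CPUs busy on average, while the 2018.3 run spends almost no time in the kernel and keeps close to 4 CPUs busy.

```python
# Values copied from the two stress-ng runs above.
runs = {
    "custom carrier, 2020.1": {"bogo_ops": 43493, "real": 300.09, "usr": 855.27, "sys": 166.22},
    "trenz carrier, 2018.3": {"bogo_ops": 58560, "real": 300.10, "usr": 1199.72, "sys": 0.06},
}


def summarize(run):
    """Derive throughput and kernel-time share from one stress-ng result row."""
    cpu_time = run["usr"] + run["sys"]
    return {
        "ops_per_real_s": run["bogo_ops"] / run["real"],  # bogo ops/s (real time)
        "ops_per_cpu_s": run["bogo_ops"] / cpu_time,      # bogo ops/s (usr+sys time)
        "sys_share": run["sys"] / cpu_time,               # fraction of CPU time spent in the kernel
        "cpus_busy": cpu_time / run["real"],              # average number of busy CPUs
    }


summaries = {name: summarize(run) for name, run in runs.items()}

for name, s in summaries.items():
    print(
        f"{name}: {s['ops_per_real_s']:.2f} ops/s real, "
        f"{s['ops_per_cpu_s']:.2f} ops/s cpu, "
        f"sys share {s['sys_share']:.1%}, "
        f"{s['cpus_busy']:.2f} CPUs busy"
    )
```

A high sys share by itself does not say which driver or subsystem is responsible; it only suggests that the kernel side is worth a closer look.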

We also ran glmark2-es2-drm on the Ultra96 and on our custom board.  The Ultra96 w/2018.3 could do close to 300 FPS with a frametime of ~3ms, while on the 820 we can only get 37 FPS with a frametime of ~27ms.
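
The FPS and frametime figures are two views of the same measurement (frametime is roughly 1000 / FPS in milliseconds). A minimal sketch, using the rounded numbers quoted above and a helper name (`frame_time_ms`) invented for this example:

```python
def frame_time_ms(fps: float) -> float:
    """Average time per frame in milliseconds for a given frame rate."""
    return 1000.0 / fps


ultra96_fps = 300.0  # "close to 300 FPS" on the Ultra96 with 2018.3
te820_fps = 37.0     # 37 FPS on the 820 with 2020.1

ultra96_ms = frame_time_ms(ultra96_fps)
te820_ms = frame_time_ms(te820_fps)
slowdown = ultra96_fps / te820_fps

print(f"Ultra96: {ultra96_ms:.2f} ms/frame")
print(f"820:     {te820_ms:.2f} ms/frame")
print(f"The 820 is about {slowdown:.1f}x slower in this benchmark")
```

So the gap in the graphics benchmark is roughly a factor of 8, much larger than the gap seen in the CPU benchmark.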

Just trying to get a handle on what might be going on here.  Since the CPU benchmark looks good on 2018.3 on the Trenz carrier, I don't believe it's some kind of strange HW issue.  There is a big kernel jump from 4.14 to 5.4 when going from 2018.3 to 2020.1.  Also, I did try the 2019.2 build, and the CPU benchmark looks similar to the 2018.3 one.  2019.2 is a 4.19 kernel.

Has anybody seen anything like this?  I know Trenz has not released a 2020 image yet, but I wanted to see if it is in the works.  At this point 2020.* is not looking very usable.

thanks,
jeff


JH

Hi,

Sorry, I don't have much experience with these benchmarks, and we haven't run any performance tests ourselves.
If I understand you correctly, you see one performance difference on the TE0820 between 18.3 and 20.1, and a second difference between the U96 (with 18.3) and the TE0820 (with 18.3).
Did you test the U96 with 20.1?

Which assembly version of the TE0820 did you use?  --> I need to know this to see the difference between the SoC devices on the U96 and the TE0820.
The PS should be nearly the same (provided you also bought an EG variant).

Your display pipeline is implemented in the PL, right? Did you check the timing? Maybe you have timing problems, so that your PL design slows everything down. Can you check that your PL IPs work correctly?
Did you compare the PS setup between the U96 and your TE0820 design? Did you check for differences in the petalinux configuration between these two boards?

Maybe you should also post on the Xilinx forum. The difference between 20.1 and 18.3 sounds like a more general issue, so you may get help faster there.



PS: We have started updating all our boards to 20.2, but this will take a while. In most cases the even Vivado/Petalinux versions (the .2 releases) are more stable than the odd ones, since the even versions are used more for bugfixes. So I mostly suggest using only even Vivado versions.

br
John


jhane

Hello,
  Sorry for taking so long.  This took a bunch of time to figure out and test.  We also discovered there was a big difference in the glmark2 graphics benchmark, which is also a big issue for us.

   Yes, I did find that the Ultra96 had released a 2020.1 BSP, so I was able to start using that as a reference.  The Ultra96 is a xczu3eg-sbva484-1-e.  The Trenz SOM that we have is trenz.biz:te0820_3eg_1e:part0:2.0.  The Ultra96 has both higher CPU and GPU benchmarks.

   I took the Trenz reference 2018.3 FPGA design and updated it to 2020.1, and was able to build a 2020.1 petalinux image from my tree by just modifying the device-tree.  As it turns out, the CPU benchmark was as fast as on the Ultra96, but the GPU was still much slower.

  For the CPU, I initially thought this was due to the scheduler clock running at 33MHz instead of 100, but changing that didn't make any difference.  Eventually I was able to figure out that the issue had to do with the WIFI module we are using.  We are using a Cypress WIFI module, and found that just loading the kernel modules (not actually connecting) makes the stress-ng score drop from 195 to 130 bogo ops/s.

  The glmark2 benchmark was getting a value of 33 vs 200+ on the Ultra96.  Eventually, in the output clocks, I configured all the PLL sources to match the PLL sources I saw in the Ultra96 config.  Note that the actual output frequencies did not change, or changed only minimally, so in theory this shouldn't really matter.  However, the glmark2 result jumped from 33 to 218, which is about what we saw on the Ultra96.  So there is something strange going on here.
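
  One way to spot this kind of mismatch is to dump the clock-source settings of both designs into key/value form and list only the entries that differ. The sketch below is an assumption-heavy illustration: the helper `diff_settings`, the key names and the PLL names are all hypothetical placeholders and are not taken from either board's real configuration.

```python
def diff_settings(reference: dict, candidate: dict) -> dict:
    """Return {key: (reference_value, candidate_value)} for every key
    whose value differs or that is missing on one side."""
    keys = sorted(set(reference) | set(candidate))
    return {
        key: (reference.get(key), candidate.get(key))
        for key in keys
        if reference.get(key) != candidate.get(key)
    }


# Hypothetical placeholder settings, NOT real values from the Ultra96 or the TE0820.
ultra96_cfg = {"gpu_ref_src": "PLL_A", "ddr_ref_src": "PLL_B", "cpu_ref_src": "PLL_C"}
te820_cfg = {"gpu_ref_src": "PLL_B", "ddr_ref_src": "PLL_B", "cpu_ref_src": "PLL_C"}

mismatches = diff_settings(ultra96_cfg, te820_cfg)
for key, (ref_value, cand_value) in mismatches.items():
    print(f"{key}: reference={ref_value} candidate={cand_value}")
```

  Comparing the sources rather than only the resulting frequencies matters here, because two designs can report nearly identical output frequencies while deriving them from different PLLs.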

Agreed about the odd/even releases.  But we started with an 821, which only had 2019.2, and I decided to make the jump to 2020.1 myself since it had a new kernel and lots of Vitis fixes.

jeff


JH

Hi,
thanks for sharing these results.

Quote: The glmark2 benchmark was getting a value of 33 vs 200+ on the Ultra96.  Eventually, in the output clocks, I configured all the PLL sources to match the PLL sources I saw in the Ultra96 config.  Note that the actual output frequencies did not change, or changed only minimally, so in theory this shouldn't really matter.  However, the glmark2 result jumped from 33 to 218, which is about what we saw on the Ultra96.  So there is something strange going on here.

This sounds strange; maybe there is a driver problem in combination with the PS setup.
Maybe you should also share your experience on the Xilinx forum, since this sounds like a general ZynqMP problem with the drivers and benchmarks.

br
John