Update laysim-GR740 v0.11 & Real-Time Performance & Cycle Accuracy (II)
laysim-GR740 has been updated to v0.11 to improve real-time performance in the SMP environment, where the workload drives the GR740 to almost 100% CPU load.
On average, laysim-gr740-dbt v0.11 is 23% faster than v0.10; for example, its real-time performance on SMPLOCK01 improves by 35.85%.
Despite this improvement, laysim-gr740-dbt still cannot match the real-time performance of the GR740 in the SMP environment.
The KARI/FSW team is developing a new processor emulator based on a multi-threaded framework with DBT to meet the real-time performance of the multicore GR740. It will map each core of the GR740 processor to its own x64 core and aim for maximum performance with minimal synchronization between cores. (However, it will require an x64 host with at least 6 cores.)
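The planned architecture (one host thread per emulated core, synchronizing only occasionally) can be illustrated with quantum-based lock-step synchronization. This is a minimal sketch under assumptions, not the KARI/FSW design: the names `EmulatedCore`, `run_quantum`, and `QUANTUM_CYCLES` are hypothetical, instruction execution is reduced to a cycle counter, and Python threads only show the structure (CPython's GIL prevents a real parallel speedup, so an actual emulator would use native threads).

```python
import threading

# Hypothetical parameters, not taken from the KARI/FSW design.
NUM_CORES = 4          # the GR740 has four LEON4 cores
QUANTUM_CYCLES = 1000  # simulated cycles each core runs between sync points


class EmulatedCore:
    """Stand-in for one emulated LEON4 core; DBT execution is elided."""

    def __init__(self, core_id):
        self.core_id = core_id
        self.cycles = 0

    def run_quantum(self, n_cycles):
        # A real emulator would execute translated host code here.
        self.cycles += n_cycles


def core_thread(core, barrier, total_cycles):
    # Each emulated core runs on its own host thread and meets the others
    # only at quantum boundaries, which keeps synchronization sparse.
    while core.cycles < total_cycles:
        core.run_quantum(min(QUANTUM_CYCLES, total_cycles - core.cycles))
        barrier.wait()  # every core reaches the same simulated time before continuing


def run(total_cycles):
    cores = [EmulatedCore(i) for i in range(NUM_CORES)]
    barrier = threading.Barrier(NUM_CORES)
    threads = [
        threading.Thread(target=core_thread, args=(c, barrier, total_cycles))
        for c in cores
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [c.cycles for c in cores]


if __name__ == "__main__":
    print(run(10_500))
```

Each core advances at most `QUANTUM_CYCLES` simulated cycles and then waits at the barrier, so no core drifts more than one quantum ahead of the others. A larger quantum means fewer barrier waits but more timing skew between emulated cores, which is the kind of trade-off that makes synchronization without performance loss hard.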
1. Test Environment
The performance and cycle accuracy of laysim-GR740 were measured on an Intel i7-6700K @ 4.00 GHz running a CentOS 7 64-bit VM (4 cores and 8 GB RAM) under VMware Workstation 15 Pro, and were compared with the GR740 development board and the TSIM3-LEON4 3.0.0 evaluation version. Since the TSIM3-LEON4 3.0.0 evaluation version from Gaisler only runs with a fast UART, the test code was modified to suppress UART output so that the cycle cost of the UART model is excluded.
2. Dhrystone Performance & Cycle Accuracy
The Dhrystone benchmark (1,000,000 runs) was compiled with the -O3 option using BCC-2.1.0, with UART output disabled.
1) Single Dhrystone
laysim-gr740-dbt v0.11 achieves 469 raw MIPS with 100.23% cycle accuracy.
Table 1. Single Dhrystone Performance & Cycle Accuracy
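The post reports three metrics without defining them. The sketch below assumes common definitions, which are not confirmed by the source: raw MIPS as millions of emulated instructions per host wall-clock second, cycle accuracy as the emulated cycle count relative to the cycle count measured on the GR740 board, and real-time performance as board execution time relative to emulator execution time. The function names are hypothetical.

```python
def raw_mips(instructions, host_seconds):
    """Millions of emulated instructions executed per host wall-clock second."""
    return instructions / host_seconds / 1e6


def cycle_accuracy(emulated_cycles, board_cycles):
    """Emulated cycle count as a percentage of the cycles measured on the board."""
    return 100.0 * emulated_cycles / board_cycles


def real_time_performance(board_seconds, host_seconds):
    """Board execution time relative to emulator execution time, in percent.

    Above 100% the emulator finishes the workload faster than the board does.
    """
    return 100.0 * board_seconds / host_seconds


if __name__ == "__main__":
    # Hypothetical measurements, chosen only to illustrate the formulas.
    print(raw_mips(instructions=938_000_000, host_seconds=2.0))
    print(cycle_accuracy(emulated_cycles=1_002_300, board_cycles=1_000_000))
    print(real_time_performance(board_seconds=3.0, host_seconds=2.0))
```

Under these assumed definitions, a cycle accuracy slightly above 100% means the emulator counted marginally more cycles than the board did for the same program.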
2) Two Instances of Dhrystone
When two instances of Dhrystone were executed as an AMP application on laysim-gr740-dbt v0.11, it achieved 488.35 raw MIPS and 126.93% real-time performance, which meets the real-time performance of the GR740.
Table 2. Two Instances of Dhrystone Performance & Cycle Accuracy
3) Three Instances of Dhrystone
When three instances of Dhrystone were executed as an AMP application on laysim-gr740-dbt v0.11, it achieved 491.02 raw MIPS but only 86.73% real-time performance, which falls short of the real-time performance of the GR740.
Table 3. Three Instances of Dhrystone Performance & Cycle Accuracy
4) Four Instances of Dhrystone
When four instances of Dhrystone were executed as an AMP application on laysim-gr740-dbt v0.11, it achieved 471.93 raw MIPS but only 63.73% real-time performance, which falls short of the real-time performance of the GR740.
Table 4. Four Instances of Dhrystone Performance & Cycle Accuracy
5) Overall Raw MIPS of Dhrystone
Figure 1 shows the overall raw MIPS of Dhrystone on GR740, laysim-gr740, laysim-gr740-dbt v0.10/v0.11, and TSIM3-LEON4 evaluation. laysim-gr740-dbt peaks at 491.02 raw MIPS (three instances) and still reaches 471.93 raw MIPS with four cores running.
6) Overall Real-Time Performance & Cycle Accuracy of Dhrystone
Figure 2 shows the overall real-time performance and cycle accuracy of Dhrystone on GR740, laysim-gr740, laysim-gr740-dbt v0.10/v0.11, and TSIM3-LEON4 evaluation. When only a single GR740 core is running, laysim-gr740-dbt exceeds the real-time performance of the GR740. With two cores running, laysim-gr740-dbt reaches 126.93% of the GR740's real-time performance. However, with three or four cores running, it reaches only about 86.73% and 63.73%, respectively.
Figure 2. Overall Real-Time Performance & Cycle Accuracy of Dhrystone
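The per-instance results suggest why real-time performance falls as instances are added: the aggregate raw MIPS of laysim-gr740-dbt stays roughly constant (about 470 to 490), while each additional busy GR740 core adds its own throughput to the target. Assuming real-time performance is roughly 100 × aggregate MIPS / (N × per-core GR740 MIPS), with cycle accuracy near 100%, the reported figures imply a per-core GR740 Dhrystone rate. The model and the helper names are assumptions for illustration; the inputs are the numbers reported in subsections 2) to 4).

```python
# Dhrystone results for laysim-gr740-dbt v0.11 as reported in the text:
# instances -> (aggregate raw MIPS, real-time performance in %)
DHRYSTONE = {
    2: (488.35, 126.93),
    3: (491.02, 86.73),
    4: (471.93, 63.73),
}


def implied_board_mips_per_core(agg_mips, rt_percent, n_instances):
    """Per-core GR740 MIPS implied by the assumed model
    rt_percent = 100 * agg_mips / (n_instances * board_mips_per_core)."""
    return agg_mips / (rt_percent / 100.0) / n_instances


def implied_per_core(results):
    """Apply the assumed model to every instance count in `results`."""
    return {
        n: implied_board_mips_per_core(mips, rt, n)
        for n, (mips, rt) in results.items()
    }


if __name__ == "__main__":
    for n, mips in sorted(implied_per_core(DHRYSTONE).items()):
        print(f"{n} instances: {mips:.1f} MIPS per GR740 core")
```

With these inputs the implied rate comes out at roughly 185 to 192 MIPS per GR740 core for all three instance counts, so a single throughput-sharing model is consistent with the drop from 126.93% to 63.73%.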
3. Stanford Performance & Cycle Accuracy
The Stanford benchmark was compiled with the -O3 option using BCC-2.1.0, with UART output disabled.
1) Single Stanford
laysim-gr740-dbt v0.11 achieves 366.21 raw MIPS with 99.55% cycle accuracy.
Table 5. Single Stanford Performance & Cycle Accuracy
2) Two Instances of Stanford
When two instances of Stanford were executed as an AMP application on laysim-gr740-dbt v0.11, it achieved 389.36 raw MIPS and 107.83% real-time performance, which meets the real-time performance of the GR740.
Table 6. Two Instances of Stanford Performance & Cycle Accuracy
3) Three Instances of Stanford
When three instances of Stanford were executed as an AMP application on laysim-gr740-dbt v0.11, it achieved 395.28 raw MIPS but only 74.66% real-time performance, which falls short of the real-time performance of the GR740.
Table 7. Three Instances of Stanford Performance & Cycle Accuracy
4) Four Instances of Stanford
When four instances of Stanford were executed as an AMP application on laysim-gr740-dbt v0.11, it achieved 388.13 raw MIPS but only 56.06% real-time performance, which falls short of the real-time performance of the GR740.
Table 8. Four Instances of Stanford Performance & Cycle Accuracy
5) Overall Raw MIPS of Stanford
Figure 3 shows the overall raw MIPS of Stanford on GR740, laysim-gr740, laysim-gr740-dbt v0.10/v0.11, and TSIM3-LEON4 evaluation. laysim-gr740-dbt peaks at 395.28 raw MIPS (three instances) and still reaches 388.13 raw MIPS with four cores running.
6) Overall Real-Time Performance & Cycle Accuracy of Stanford
Figure 4 shows the overall real-time performance and cycle accuracy of Stanford on GR740, laysim-gr740, laysim-gr740-dbt v0.10/v0.11, and TSIM3-LEON4 evaluation. When only a single GR740 core is running, laysim-gr740-dbt exceeds the real-time performance of the GR740. With two cores running, laysim-gr740-dbt reaches 107.83% of the GR740's real-time performance. However, with three or four cores running, it reaches only about 74.66% and 56.06%, respectively.
Figure 4. Overall Real-Time Performance & Cycle Accuracy of Stanford
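Assuming real-time performance is inversely proportional to the number of busy cores (the aggregate raw MIPS stays near 390 in every Stanford run), the product N × RT / 100 estimates how many fully loaded GR740 cores laysim-gr740-dbt could keep up with in real time. This model and the function name `break_even_cores` are illustrative assumptions; the inputs are the figures reported in subsections 2) to 4).

```python
# Stanford real-time performance (%) of laysim-gr740-dbt v0.11 as reported
# in the text, keyed by the number of busy GR740 cores (AMP instances).
STANFORD_RT = {2: 107.83, 3: 74.66, 4: 56.06}


def break_even_cores(n_instances, rt_percent):
    """Estimate the core count at which real-time performance is exactly 100%.

    Assumes RT(N) * N is constant, i.e. the emulator's aggregate throughput
    is shared evenly by all busy cores, so RT(N*) = 100% at N* = N * RT / 100.
    """
    return n_instances * rt_percent / 100.0


estimates = {n: break_even_cores(n, rt) for n, rt in STANFORD_RT.items()}

if __name__ == "__main__":
    for n, est in sorted(estimates.items()):
        print(f"from {n} instances: break-even at about {est:.2f} cores")
```

All three measurements give an estimate between about 2.16 and 2.24 cores, consistent with the emulator keeping up with two busy GR740 cores but not three.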
4. RTEMS 5.1.0-RC1 SMP Testsuite Performance & Cycle Accuracy
Four SMP tests from the RTEMS 5.1.0-RC1 testsuite (SMPATOMIC01, SMPSCHEDAFFINITY05, SMPSCHEDEDF03, and SMPLOCK01), modified to suppress UART output so that the cycle cost of the UART model is excluded, were executed on the GR740, laysim-gr740, laysim-gr740-dbt, and the TSIM3-LEON4 evaluation version. All four tests use every core of the GR740 and keep its CPU load at almost 100%.
Figure 5. Real-Time Performance & Cycle Accuracy of RTEMS 5.1.0-RC1 SMP testsuites
1) SMPATOMIC01
When SMPATOMIC01 was executed on laysim-GR740, the GR740 board, and the TSIM3-LEON4 evaluation version, the performance and cycle accuracy were as shown in Table 9 below.
laysim-gr740-dbt v0.11 achieves 75.04% real-time performance with 100.77% cycle accuracy.
Table 9. Performance & Cycle Accuracy of SMPATOMIC01
2) SMPSCHEDAFFINITY05
When SMPSCHEDAFFINITY05 was executed on laysim-GR740, the GR740 board, and the TSIM3-LEON4 evaluation version, the performance and cycle accuracy were as shown in Table 10 below.
laysim-gr740-dbt v0.11 achieves 39.11% real-time performance with 97.74% cycle accuracy.
Table 10. Performance & Cycle Accuracy of SMPSCHEDAFFINITY05
3) SMPSCHEDEDF03
When SMPSCHEDEDF03 was executed on laysim-GR740, the GR740 board, and the TSIM3-LEON4 evaluation version, the performance and cycle accuracy were as shown in Table 11 below.
laysim-gr740-dbt v0.11 achieves 29.75% real-time performance with 100.52% cycle accuracy.
Table 11. Performance & Cycle Accuracy of SMPSCHEDEDF03
4) SMPLOCK01
When SMPLOCK01 was executed on laysim-GR740, the GR740 board, and the TSIM3-LEON4 evaluation version, the performance and cycle accuracy were as shown in Table 12 below.
laysim-gr740-dbt v0.11 achieves 72.75% real-time performance with 100.2% cycle accuracy.
Table 12. Performance & Cycle Accuracy of SMPLOCK01
5. Conclusion and Future Work Again!
Although laysim-gr740-dbt, which uses dynamic binary translation (JIT), shows high performance and correct cycle accuracy for the GR740 processor, it cannot meet the real-time performance of the GR740 in the SMP environment. In the SMP tests, which drive the GR740 to almost 100% CPU load, its peak real-time performance is about 75.04% (SMPATOMIC01) and its average is 54.16%.
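The peak and average SMP figures can be recomputed from the four per-test real-time results reported in subsections 1) to 4) of section 4. The sketch assumes a plain arithmetic mean over the four tests; the variable names are illustrative.

```python
from statistics import mean

# Real-time performance (%) of laysim-gr740-dbt v0.11 on the RTEMS SMP tests,
# as reported in section 4.
SMP_RT = {
    "SMPATOMIC01": 75.04,
    "SMPSCHEDAFFINITY05": 39.11,
    "SMPSCHEDEDF03": 29.75,
    "SMPLOCK01": 72.75,
}

peak_test = max(SMP_RT, key=SMP_RT.get)  # test with the highest real-time %
peak = SMP_RT[peak_test]
average = mean(SMP_RT.values())          # unweighted arithmetic mean

if __name__ == "__main__":
    print(f"peak: {peak:.2f}% ({peak_test}), average: {average:.2f}%")
```

This gives a peak of 75.04% (SMPATOMIC01) and an arithmetic mean of about 54.16% over the four tests.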
The KARI/FSW team is developing a new GR740 processor emulator that adopts a multi-threaded framework with DBT. However, development has shown that it is not easy to keep the emulated LEON4 cores, each running on its own x64 core, synchronized without losing performance. But development will continue!