Real-Time Performance & Cycle Accuracy of laysim-GR740
In general, a processor emulator based on the interpretation method cannot satisfy the real-time performance of the guest CPU when the guest CPU operates at 70MHz or higher. A processor emulator adopting the dynamic binary translation (DBT, also called JIT) method can reach up to 500 MIPS (for laysim-LEON3-PRO), so it can meet the real-time performance of a single core of the GR740 processor, which runs at 250MHz. However, a DBT-based processor emulator cannot satisfy the real-time performance of the GR740 in an SMP multicore environment.
When the performance of laysim-GR740 was analyzed with various multi-core benchmark programs such as NPB, SPEC 2012 OMP and RTEMS5-SMP, laysim-gr740-dbt achieved at most 46.64% and on average 30.87% of the real-time performance of the GR740. The KARI/FSW team is therefore developing a new processor emulator based on a multi-threaded framework with DBT to satisfy the real-time performance of the GR740's multiple cores.
1. Test Environment
The performance and cycle accuracy of laysim-GR740 were measured on an Intel i7-6700K @ 4.00GHz running 64-bit CentOS 7 (4 cores and 8GB RAM) under VMware Workstation 15 Pro. The results were compared with a GR740 development board and the TSIM3-LEON4 3.0.0 evaluation version. Since the TSIM3-LEON4 3.0.0 evaluation version from Gaisler supports only the FAST UART mode, some code was modified to block UART output in order to exclude the cycle effect of the UART model.
2. Dhrystone Performance & Cycle Accuracy
The Dhrystone benchmark with 1000000 runs was compiled with the -O3 option using BCC-2.1.0, without UART output.
1) Single Dhrystone
Table 1 below shows the performance and cycle accuracy when Dhrystone was executed on laysim-GR740, the GR740 board and the TSIM3-LEON4 evaluation version.
a) The raw MIPS of the GR740 board is 193.73 MIPS.
b) The raw MIPS of laysim-gr740-dbt is 376.46 MIPS, which is 1.94 times the real-time performance of the GR740.
c) However, laysim-gr740 and the TSIM3-LEON4 evaluation version, which are based on the interpretation method, do not satisfy the real-time performance of the GR740.
d) laysim-GR740 and the TSIM3-LEON4 evaluation version show a cycle accuracy close to 100%.
Table 1. Single Dhrystone Performance & Cycle Accuracy
2) Two Instances of Dhrystone
Dhrystone was compiled in the AMP configuration using BCC-2.1.0. Table 2 below shows the performance and cycle accuracy after two instances of Dhrystone were executed on laysim-GR740, the GR740 board and the TSIM3-LEON4 evaluation version.
a) The raw MIPS of the GR740 board is 384.75 MIPS including CORE#0/#1.
b) The raw MIPS of laysim-gr740-dbt is 357.55 MIPS including CORE#0/#1. This is 92.93% of real time, so it does not meet the real-time performance of the GR740.
c) The real-time performance of laysim-gr740 and TSIM3-LEON4, which are based on the interpretation method, is about 6.7%.
d) laysim-GR740 and the TSIM3-LEON4 evaluation version show a cycle accuracy close to 100%.
Table 2. Two Instances of Dhrystone Performance & Cycle Accuracy
3) Three Instances of Dhrystone
Dhrystone was compiled in the AMP configuration using BCC-2.1.0. Table 3 below shows the performance and cycle accuracy after three instances of Dhrystone were executed on laysim-GR740, the GR740 board and the TSIM3-LEON4 evaluation version.
a) The raw MIPS of the GR740 board is 566.16 MIPS including CORE#0/#1/#2.
b) The raw MIPS of laysim-gr740-dbt is 345.37 MIPS including CORE#0/#1/#2. This is 61% of real time, so it does not meet the real-time performance of the GR740.
c) The real-time performance of laysim-gr740 and TSIM3-LEON4, which are based on the interpretation method, is about 6.7%.
d) The cycle accuracy of laysim-GR740 decreases to 98.09% and 97.64%.
e) The cycle count of TSIM3-LEON4 increases, and its cycle accuracy degrades to 124.10%.
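The text does not spell out how cycle accuracy is computed. One reading consistent with item e), where a higher emulated cycle count pushes the figure above 100%, is the ratio of emulated cycles to board cycles. A minimal Python sketch under that assumption follows; the function name and the cycle counts are hypothetical and are not measurements from this test.

```python
def cycle_accuracy_percent(emulated_cycles, board_cycles):
    """Assumed definition: emulated cycle count relative to the real board, in percent."""
    if board_cycles <= 0:
        raise ValueError("board_cycles must be positive")
    return 100.0 * emulated_cycles / board_cycles


# Hypothetical cycle counts, chosen only to illustrate the two directions of error.
board = 1_000_000
print(round(cycle_accuracy_percent(980_900, board), 2))    # emulator counts too few cycles
print(round(cycle_accuracy_percent(1_241_000, board), 2))  # emulator counts too many cycles
```

Under this reading, 100% is the ideal value, and the error is the distance from 100% in either direction.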
Table 3. Three Instances of Dhrystone Performance & Cycle Accuracy
4) Four Instances of Dhrystone
Dhrystone was compiled in the AMP configuration using BCC-2.1.0. Table 4 below shows the performance and cycle accuracy after four instances of Dhrystone were executed on laysim-GR740, the GR740 board and the TSIM3-LEON4 evaluation version.
a) The raw MIPS of the GR740 board is 745.25 MIPS including CORE#0/#1/#2/#3.
b) The raw MIPS of laysim-gr740-dbt is 329.25 MIPS including CORE#0/#1/#2/#3. This is 44.18% of real time, so it does not meet the real-time performance of the GR740.
c) The real-time performance of laysim-gr740 and TSIM3-LEON4, which are based on the interpretation method, is about 4.10% and 2.59%, respectively.
d) The cycle accuracy of laysim-GR740 decreases to 96%.
e) The cycle count of TSIM3-LEON4 increases, and its cycle accuracy degrades to 157.10%.
Table 4. Four Instances of Dhrystone Performance & Cycle Accuracy
5) Overall RAW MIPS of Dhrystone
Figure 1 shows the overall raw MIPS of Dhrystone on the GR740, laysim-gr740, laysim-gr740-dbt and the TSIM3-LEON4 evaluation version. The raw MIPS of laysim-gr740-dbt peaks at 376.46 MIPS on a single core and reaches 329.25 MIPS on four cores.
Figure 1. Overall RAW MIPS of Dhrystone
6) Overall Real-Time Performance & Cycle Accuracy of Dhrystone
Figure 2 shows the overall real-time performance and cycle accuracy of Dhrystone on the GR740, laysim-gr740, laysim-gr740-dbt and the TSIM3-LEON4 evaluation version. When only a single core of the GR740 is running, laysim-gr740-dbt exceeds the real-time performance of the GR740. When two cores are running, laysim-gr740-dbt reaches 92.93% of the real-time performance of the GR740. However, with three cores it drops to 61%, and with four cores to 44.18%.
Figure 2. Overall Real-Time Performance & Cycle Accuracy of Dhrystone
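The real-time percentages quoted for laysim-gr740-dbt in this section match the ratio of its aggregate raw MIPS to that of the GR740 board. A minimal Python sketch of that arithmetic, using only the MIPS figures listed above (the names `DHRYSTONE_MIPS` and `real_time_percent` are made up for illustration):

```python
# Aggregate raw MIPS from this section: instances -> (GR740 board, laysim-gr740-dbt)
DHRYSTONE_MIPS = {
    1: (193.73, 376.46),
    2: (384.75, 357.55),
    3: (566.16, 345.37),
    4: (745.25, 329.25),
}


def real_time_percent(board_mips, emulator_mips):
    """Emulator throughput as a percentage of the real board's throughput."""
    return 100.0 * emulator_mips / board_mips


for instances, (board, emulator) in DHRYSTONE_MIPS.items():
    pct = real_time_percent(board, emulator)
    verdict = "meets real time" if pct >= 100.0 else "below real time"
    print(f"{instances} instance(s): {pct:.2f}% ({verdict})")
```

Only the single-instance case exceeds 100%. The ratio drops with each added instance because the board's aggregate MIPS grows while the emulator's does not.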
3. Stanford Performance & Cycle Accuracy
The Stanford benchmark was compiled with the -O3 option using BCC-2.1.0, without UART output.
1) Single Stanford
Table 5 below shows the performance and cycle accuracy when Stanford was executed on laysim-GR740, the GR740 board and the TSIM3-LEON4 evaluation version.
a) The raw MIPS of the GR740 board is 185.18 MIPS.
b) The raw MIPS of laysim-gr740-dbt is 280.34 MIPS, which is 1.51 times the real-time performance of the GR740.
c) However, laysim-gr740 and the TSIM3-LEON4 evaluation version, which are based on the interpretation method, do not satisfy the real-time performance of the GR740.
d) laysim-GR740 and the TSIM3-LEON4 evaluation version show cycle accuracies of about 109% and 99.5%.
Table 5. Single Stanford Performance & Cycle Accuracy
2) Two Instances of Stanford
Stanford was compiled in the AMP configuration using BCC-2.1.0. Table 6 below shows the performance and cycle accuracy after two instances of Stanford were executed on laysim-GR740, the GR740 board and the TSIM3-LEON4 evaluation version.
a) The raw MIPS of the GR740 board is 362.20 MIPS including CORE#0/#1.
b) The raw MIPS of laysim-gr740-dbt is 254.72 MIPS including CORE#0/#1. Its real-time performance is 70.54%, so it does not meet the real-time performance of the GR740.
c) The real-time performance of laysim-gr740 and TSIM3-LEON4, which are based on the interpretation method, is about 7%.
d) laysim-GR740 and the TSIM3-LEON4 evaluation version show cycle accuracies of 107.19%, 97.72% and 102.73%.
Table 6. Two Instances of Stanford Performance & Cycle Accuracy
3) Three Instances of Stanford
Stanford was compiled in the AMP configuration using BCC-2.1.0. Table 7 below shows the performance and cycle accuracy after three instances of Stanford were executed on laysim-GR740, the GR740 board and the TSIM3-LEON4 evaluation version.
a) The raw MIPS of the GR740 board is 531.75 MIPS including CORE#0/#1/#2.
b) The raw MIPS of laysim-gr740-dbt is 264.66 MIPS including CORE#0/#1/#2. Its real-time performance is 49.99%, so it does not meet the real-time performance of the GR740.
c) The real-time performance of laysim-gr740 and TSIM3-LEON4, which are based on the interpretation method, is about 5.42% and 4.70%, respectively.
d) The cycle accuracy of laysim-GR740 degrades to 105.0% and 95.7%.
e) The cycle count of TSIM3-LEON4 increases, and its cycle accuracy degrades to 112%.
Table 7. Three Instances of Stanford Performance & Cycle Accuracy
4) Four Instances of Stanford
Stanford was compiled in the AMP configuration using BCC-2.1.0. Table 8 below shows the performance and cycle accuracy after four instances of Stanford were executed on laysim-GR740, the GR740 board and the TSIM3-LEON4 evaluation version.
a) The raw MIPS of the GR740 board is 696.59 MIPS including CORE#0/#1/#2/#3.
b) The raw MIPS of laysim-gr740-dbt is 262.42 MIPS including CORE#0/#1/#2/#3. Its real-time performance is 37.9%, so it does not meet the real-time performance of the GR740.
c) The real-time performance of laysim-gr740 and TSIM3-LEON4, which are based on the interpretation method, is about 4.51% and 3.38%, respectively.
d) The cycle accuracy of laysim-GR740 degrades to 103.12% and 94.01%.
e) The cycle count of TSIM3-LEON4 increases, and its cycle accuracy degrades to 123.32%.
Table 8. Four Instances of Stanford Performance & Cycle Accuracy
5) Overall RAW MIPS of Stanford
Figure 3 shows the overall raw MIPS of Stanford on the GR740, laysim-gr740, laysim-gr740-dbt and the TSIM3-LEON4 evaluation version. The raw MIPS of laysim-gr740-dbt peaks at 280.34 MIPS on a single core and reaches 262.42 MIPS on four cores.
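One way to read these numbers is to normalize each platform's aggregate MIPS to its own single-instance result. The sketch below does that in Python with the Stanford figures from this section; the variable and function names are illustrative only.

```python
# Aggregate Stanford raw MIPS for 1 to 4 instances, taken from this section.
BOARD_MIPS = [185.18, 362.20, 531.75, 696.59]  # GR740 board
DBT_MIPS = [280.34, 254.72, 264.66, 262.42]    # laysim-gr740-dbt


def scaling(mips_by_instances):
    """Throughput of each configuration relative to the single-instance run."""
    base = mips_by_instances[0]
    return [m / base for m in mips_by_instances]


for n, (b, e) in enumerate(zip(scaling(BOARD_MIPS), scaling(DBT_MIPS)), start=1):
    print(f"{n} instance(s): board x{b:.2f}, laysim-gr740-dbt x{e:.2f}")
```

The board scales almost linearly with the number of active cores, while the emulator's aggregate throughput stays roughly flat, so the gap widens with each added core.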
6) Overall Real-Time Performance & Cycle Accuracy of Stanford
Figure 4 shows the overall real-time performance and cycle accuracy of Stanford on the GR740, laysim-gr740, laysim-gr740-dbt and the TSIM3-LEON4 evaluation version. When only a single core of the GR740 is running, laysim-gr740-dbt exceeds the real-time performance of the GR740. When two cores are running, laysim-gr740-dbt reaches 70.54% of the real-time performance of the GR740. However, with three cores it drops to 49.99%, and with four cores to 37.9%.
Figure 4. Overall Real-Time Performance & Cycle Accuracy of Stanford
4. RTEMS 5.1.0-RC1 SMP testsuites Performance & Cycle Accuracy
Four examples from the SMP testsuites of RTEMS 5.1.0-RC1 (SMPATOMIC01, SMPSCHEDAFFINITY05, SMPSCHEDEDF03, SMPLOCK01), modified to block UART output in order to exclude the cycle effect of the UART model, were executed on the GR740, laysim-gr740, laysim-gr740-dbt and the TSIM3-LEON4 evaluation version. The four examples use all cores of the GR740, and their CPU load is almost 100%.
Figure 5. Real-Time Performance & Cycle Accuracy of RTEMS 5.1.0-RC1 SMP testsuites
1) SMPATOMIC01
Table 9 below shows the performance and cycle accuracy when SMPATOMIC01 was executed on laysim-GR740, the GR740 board and the TSIM3-LEON4 evaluation version.
a) The real-time performance of laysim-gr740-dbt is only 46.64%, so it does not satisfy the real-time performance of the GR740.
b) The real-time performance of laysim-gr740 and TSIM3-LEON4 is only 6.85% and 16.10%, respectively.
c) laysim-GR740 and the TSIM3-LEON4 evaluation version show a cycle accuracy close to 100%.
Table 9. Performance & Cycle Accuracy of SMPATOMIC01
2) SMPSCHEDAFFINITY05
Table 10 below shows the performance and cycle accuracy when SMPSCHEDAFFINITY05 was executed on laysim-GR740, the GR740 board and the TSIM3-LEON4 evaluation version.
a) The real-time performance of laysim-gr740-dbt is only 22.14%, so it does not satisfy the real-time performance of the GR740.
b) The real-time performance of laysim-gr740 and TSIM3-LEON4 is only 3.2%.
c) laysim-GR740 and the TSIM3-LEON4 evaluation version show a cycle accuracy close to 98%.
Table 10. Performance & Cycle Accuracy of SMPSCHEDAFFINITY05
3) SMPSCHEDEDF03
Table 11 below shows the performance and cycle accuracy when SMPSCHEDEDF03 was executed on laysim-GR740, the GR740 board and the TSIM3-LEON4 evaluation version.
a) The real-time performance of laysim-gr740-dbt is only 17.80%, so it does not satisfy the real-time performance of the GR740.
b) The real-time performance of laysim-gr740 and TSIM3-LEON4 is only 3.5%.
c) laysim-GR740 and the TSIM3-LEON4 evaluation version show a cycle accuracy close to 100%.
Table 11. Performance & Cycle Accuracy of SMPSCHEDEDF03
4) SMPLOCK01
Table 12 below shows the performance and cycle accuracy when SMPLOCK01 was executed on laysim-GR740, the GR740 board and the TSIM3-LEON4 evaluation version.
a) The real-time performance of laysim-gr740-dbt is only 36.90%, so it does not satisfy the real-time performance of the GR740.
b) The real-time performance of laysim-gr740 is 5.53%.
c) TSIM3-LEON4 produces inaccurate results; its cycle accuracy is only 58.41%.
Table 12. Performance & Cycle Accuracy of SMPLOCK01
5. Conclusion and Future Work
Although laysim-gr740-dbt, which uses dynamic binary translation (JIT), shows high performance and good cycle accuracy for the GR740 processor, it cannot satisfy the real-time performance of the GR740 in the SMP environment. With the GR740 at almost 100% CPU load in the SMP environment, its peak performance is about 46.64% and its average is 30.87%.
laysim-gr740 and TSIM3-LEON4, which adopt the interpretation method, cannot be expected to run in real time at all: they average 4.8% of the real-time performance of the GR740 in the SMP environment.
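The peak and average quoted above are consistent with the four RTEMS SMP results in section 4. A short Python check, assuming the averages are plain arithmetic means over the four tests (the dictionary names are illustrative, and the interpreter figures are the ones listed for laysim-gr740):

```python
from statistics import mean

# Real-time performance (%) from section 4.
DBT = {
    "SMPATOMIC01": 46.64,
    "SMPSCHEDAFFINITY05": 22.14,
    "SMPSCHEDEDF03": 17.80,
    "SMPLOCK01": 36.90,
}
INTERPRETER = {  # laysim-gr740
    "SMPATOMIC01": 6.85,
    "SMPSCHEDAFFINITY05": 3.2,
    "SMPSCHEDEDF03": 3.5,
    "SMPLOCK01": 5.53,
}

print(f"laysim-gr740-dbt: peak {max(DBT.values()):.2f}%, mean {mean(DBT.values()):.2f}%")
print(f"laysim-gr740:     mean {mean(INTERPRETER.values()):.2f}%")
```

Under that assumption, the DBT mean comes out at 30.87% and the interpreter mean at about 4.8%.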
Since the processor emulator falls far short of the GR740's performance in the SMP environment, the KARI/FSW team is developing a new processor emulator based on a multi-threaded framework with DBT to satisfy the real-time performance of the GR740's multiple cores. It will allocate each core of the GR740 processor to its own x64 host core and achieve maximum performance with minimal synchronization among the cores. (It will, however, require an x64 host with at least 6 cores.)
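The design of the new emulator is not described beyond one host core per guest core and minimal synchronization. As a rough illustration only, the Python sketch below runs each emulated core in its own thread and lets the cores meet at a barrier once per time quantum. The quantum-and-barrier scheme, the class `EmulatedCore` and all constants are assumptions made for this sketch, not the KARI/FSW design.

```python
import threading

NUM_CORES = 4            # the GR740 cores CORE#0 to CORE#3
QUANTUM_CYCLES = 10_000  # hypothetical synchronization interval
NUM_QUANTA = 5           # kept small so the sketch finishes quickly


class EmulatedCore:
    """Stand-in for one guest core; a real core would execute translated code."""

    def __init__(self, core_id):
        self.core_id = core_id
        self.cycles = 0

    def run_quantum(self, budget):
        # Placeholder for running translated blocks until the cycle budget is spent.
        self.cycles += budget


def core_thread(core, barrier):
    for _ in range(NUM_QUANTA):
        core.run_quantum(QUANTUM_CYCLES)
        # Cores rendezvous only at quantum boundaries, so their simulated
        # clocks never drift apart by more than one quantum.
        barrier.wait()


def run_all_cores():
    cores = [EmulatedCore(i) for i in range(NUM_CORES)]
    barrier = threading.Barrier(NUM_CORES)
    threads = [threading.Thread(target=core_thread, args=(c, barrier)) for c in cores]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [c.cycles for c in cores]


print(run_all_cores())
```

A larger quantum means fewer barrier waits and higher throughput, at the cost of coarser timing between cores. CPython threads do not execute bytecode in parallel, so this sketch shows only the structure; a real emulator would use native threads pinned to host cores.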