Real-Time Performance & Cycle Accuracy of laysim-GR740

In general, a processor emulator based on the interpretation method cannot satisfy the real-time performance of the guest CPU when the guest CPU operates at 70MHz or higher. A processor emulator adopting the dynamic binary translation (DBT, also called JIT) method can reach up to 500 MIPS (for laysim-LEON3-PRO), so it can meet the real-time performance of a single core of the GR740 processor, which runs at 250MHz. However, a processor emulator based on DBT cannot satisfy the real-time performance of the GR740 in an SMP multicore environment.

When the performance of laysim-GR740 is analyzed with multi-core benchmarking programs such as NPB, SPEC OMP 2012 and RTEMS5-SMP, laysim-gr740-dbt achieves at most 46.64% and on average 30.87% of the real-time performance of the GR740. The KARI/FSW team is therefore developing a new processor emulator based on a multi-threaded framework with DBT to satisfy the real-time performance of the GR740's multiple cores.
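The difference between the two techniques contrasted above can be illustrated with a toy example. This is only a sketch under loud assumptions: the two-instruction guest ISA (`addi`, `bnez`), the function names, and the use of Python's `exec` as a stand-in for host code generation are all invented for illustration and say nothing about how laysim or TSIM3 are implemented. The interpreter decodes every instruction each time it runs, while the translator converts a block once and reuses it from a cache.

```python
# Toy guest ISA (hypothetical): ("addi", reg, imm) adds imm to a register,
# ("bnez", reg, target_pc) branches if the register is non-zero and ends a block.
PROGRAM = [
    ("addi", 1, 2),    # r1 += 2
    ("addi", 0, -1),   # r0 -= 1 (loop counter)
    ("bnez", 0, 0),    # loop back to pc 0 while r0 != 0
]

def interpret(program, regs):
    """Interpretation: decode and dispatch every instruction, every time."""
    pc, decoded = 0, 0
    while pc < len(program):
        op, a, b = program[pc]
        decoded += 1
        if op == "addi":
            regs[a] += b
            pc += 1
        else:  # bnez
            pc = b if regs[a] != 0 else pc + 1
    return decoded

def translate_block(program, start):
    """Translation: turn one basic block into host code (here, Python source)."""
    lines = ["def block(regs):"]
    pc = start
    while pc < len(program):
        op, a, b = program[pc]
        pc += 1
        if op == "addi":
            lines.append(f"    regs[{a}] += {b}")
        else:  # bnez terminates the block
            lines.append(f"    if regs[{a}] != 0: return {b}")
            break
    lines.append(f"    return {pc}")  # fall-through pc
    namespace = {}
    exec("\n".join(lines), namespace)
    return namespace["block"]

def run_dbt(program, regs):
    """Run via a translation cache; returns how many blocks were translated."""
    cache, pc, translations = {}, 0, 0
    while pc < len(program):
        if pc not in cache:
            cache[pc] = translate_block(program, pc)
            translations += 1
        pc = cache[pc](regs)  # execute translated block, get next pc
    return translations

regs_a, regs_b = [1000, 0], [1000, 0]
print("instructions decoded by interpreter:", interpret(PROGRAM, regs_a))
print("blocks translated by DBT:", run_dbt(PROGRAM, regs_b))
print("same final state:", regs_a == regs_b)
```

For a loop of 1000 iterations the interpreter decodes 3000 instructions, whereas the translating version decodes the loop body once and then only re-executes the cached block.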


1. Test Environment

The performance and cycle accuracy of laysim-GR740 were measured on an Intel i7-6700K @4.00GHz running CentOS 7 64bit (4 cores and 8GB RAM) under VMware Workstation 15 Pro. The results were compared with the GR740 development board and the TSIM3-LEON4 3.0.0 evaluation version. Since the TSIM3-LEON4 3.0.0 evaluation version from Gaisler only operates in FAST UART mode, the test code was modified to suppress UART output in order to exclude the cycle effect of the UART model.

2. Dhrystone Performance & Cycle Accuracy

The Dhrystone benchmark with 1,000,000 runs was compiled with the -O3 option using BCC-2.1.0, without UART output.

1) Single Dhrystone

When the Dhrystone was executed on laysim-GR740, GR740 board and TSIM3-LEON4 evaluation, the performance and cycle accuracy are shown in Table 1 below. 

a) The raw MIPS of the GR740 board is 193.73 MIPS.
b) The raw MIPS of laysim-gr740-dbt is 376.46 MIPS, which is 1.94 times the real-time performance of the GR740.
c) However, laysim-gr740 and TSIM3-LEON4 evaluation, which are based on the interpretation method, do not satisfy the real-time performance of the GR740.
d) laysim-GR740 and TSIM3-LEON4 evaluation show a cycle accuracy close to 100%.

Table 1. Single Dhrystone Performance & Cycle Accuracy

Dhrystone (No UART) | # of INST | Wall Clock (ms) | RAW MIPS | Real-Time Performance (%) | Cycles | Cycle Accuracy (%)
GR740 (250MHz) | 334010144 | 1724.134784 | 193.73 | 100.00 | 431033696 | 100.00
laysim-gr740 | 334010146 | 13262.1 | 25.19 | 13.00 | 434026883 | 100.69
laysim-gr740-dbt | 334010146 | 887.238 | 376.46 | 194.33 | 432016291 | 100.23
TSIM3-LEON4 Eval. | 334010146 | 10520 | 31.75 | 16.39 | 436030672 | 101.16
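
The derived columns of Table 1 appear to follow from the three measured quantities (instruction count, wall-clock time, cycle count). The following is a minimal sketch, assuming RAW MIPS = instructions / wall-clock time, real-time performance = GR740 wall-clock time / emulator wall-clock time, and cycle accuracy = emulator cycles / GR740 cycles. These formulas are inferred from the table values, and the function names are illustrative, not part of laysim.

```python
# Recompute the derived columns of Table 1 from the measured ones.
# Formulas are inferred from the published numbers (an assumption).

def raw_mips(instructions: int, wall_ms: float) -> float:
    """Millions of guest instructions executed per second of wall-clock time."""
    return instructions / (wall_ms * 1e-3) / 1e6

def realtime_pct(ref_wall_ms: float, emu_wall_ms: float) -> float:
    """100% means the emulator runs exactly as fast as the real board."""
    return 100.0 * ref_wall_ms / emu_wall_ms

def cycle_accuracy_pct(emu_cycles: int, ref_cycles: int) -> float:
    """100% means the emulator counts exactly as many cycles as the board."""
    return 100.0 * emu_cycles / ref_cycles

# Measured values copied from Table 1.
GR740 = {"inst": 334010144, "wall_ms": 1724.134784, "cycles": 431033696}
DBT = {"inst": 334010146, "wall_ms": 887.238, "cycles": 432016291}

print(round(raw_mips(GR740["inst"], GR740["wall_ms"]), 2))           # 193.73
print(round(raw_mips(DBT["inst"], DBT["wall_ms"]), 2))               # 376.46
print(round(realtime_pct(GR740["wall_ms"], DBT["wall_ms"]), 2))      # 194.33
print(round(cycle_accuracy_pct(DBT["cycles"], GR740["cycles"]), 2))  # 100.23
```

A real-time performance above 100% means the emulator finishes the workload faster than the board, so it could be throttled to run in real time.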

2) Two Instances of Dhrystone

Dhrystone was compiled as the AMP configuration using BCC-2.1.0. After two instances of Dhrystone were executed on laysim-GR740, GR740 board and TSIM3-LEON4 evaluation, the performance and cycle accuracy are shown in Table 2 below. 

a) The raw MIPS of the GR740 board is 384.75 MIPS across CORE#0/#1.
b) The raw MIPS of laysim-gr740-dbt is 357.55 MIPS across CORE#0/#1. Its real-time performance is 92.93%, so it falls just short of the real-time performance of the GR740.
c) The real-time performance of laysim-gr740 and TSIM3-LEON4, which are based on the interpretation method, is about 6.7%.
d) laysim-GR740 shows a cycle accuracy close to 100%, while the cycle accuracy of TSIM3-LEON4 evaluation is 108.12%.

Table 2. Two Instances of Dhrystone Performance & Cycle Accuracy

Dhrystone 2 (No UART) | # of INST | Wall Clock (ms) | RAW MIPS | Real-Time Performance (%) | Cycles | Cycle Accuracy (%)
GR740 (250MHz) | 668011065 | 1736.242072 | 384.75 | 100.00 | 431033696 | 100.00
laysim-gr740 | 668001729 | 25918.9 | 25.77 | 6.70 | 434026940 | 100.69
laysim-gr740-dbt | 668001729 | 1868.28 | 357.55 | 92.93 | 432016303 | 100.23
TSIM3-LEON4 Eval. | 668009384 | 25590 | 26.10 | 6.78 | 466049427 | 108.12

3) Three Instances of Dhrystone

Dhrystone was compiled as the AMP configuration using BCC-2.1.0. After three instances of Dhrystone were executed on laysim-GR740, GR740 board and TSIM3-LEON4 evaluation, the performance and cycle accuracy are shown in Table 3 below. 

a) The raw MIPS of the GR740 board is 566.16 MIPS across CORE#0/#1/#2.
b) The raw MIPS of laysim-gr740-dbt is 345.37 MIPS across CORE#0/#1/#2. Its real-time performance is 61.00%, so it cannot meet the real-time performance of the GR740.
c) The real-time performance of laysim-gr740 and TSIM3-LEON4, which are based on the interpretation method, is 5.05% and 3.69%, respectively.
d) The cycle accuracy of laysim-gr740 and laysim-gr740-dbt decreases to 98.09% and 97.64%, respectively.
e) The cycle count of TSIM3-LEON4 increases, and its cycle accuracy degrades to 124.10%.

Table 3. Three Instances of Dhrystone Performance & Cycle Accuracy

Dhrystone 3 (No UART) | # of INST | Wall Clock (ms) | RAW MIPS | Real-Time Performance (%) | Cycles | Cycle Accuracy (%)
GR740 (250MHz) | 1002012942 | 1769.840384 | 566.16 | 100.00 | 442460096 | 100.00
laysim-gr740 | 1001993306 | 35060.1 | 28.58 | 5.05 | 434026939 | 98.09
laysim-gr740-dbt | 1001993306 | 2901.21 | 345.37 | 61.00 | 432016303 | 97.64
TSIM3-LEON4 Eval. | 1002007211 | 47930 | 20.91 | 3.69 | 549074177 | 124.10

4) Four Instances of Dhrystone

Dhrystone was compiled as the AMP configuration using BCC-2.1.0. After four instances of Dhrystone were executed on laysim-GR740, GR740 board and TSIM3-LEON4 evaluation, the performance and cycle accuracy are shown in Table 4 below. 

a) The raw MIPS of the GR740 board is 745.25 MIPS across CORE#0/#1/#2/#3.
b) The raw MIPS of laysim-gr740-dbt is 329.25 MIPS across CORE#0/#1/#2/#3. Its real-time performance is 44.18%, so it cannot meet the real-time performance of the GR740.
c) The real-time performance of laysim-gr740 and TSIM3-LEON4, which are based on the interpretation method, is 4.10% and 2.59%, respectively.
d) The cycle accuracy of laysim-gr740 and laysim-gr740-dbt decreases to 96.84% and 96.39%, respectively.
e) The cycle count of TSIM3-LEON4 increases, and its cycle accuracy degrades to 157.10%.

Table 4. Four Instances of Dhrystone Performance & Cycle Accuracy

Dhrystone 4 (No UART) | # of INST | Wall Clock (ms) | RAW MIPS | Real-Time Performance (%) | Cycles | Cycle Accuracy (%)
GR740 (250MHz) | 1336013374 | 1792.693464 | 745.25 | 100.00 | 448173366 | 100.00
laysim-gr740 | 1335984884 | 43686.3 | 30.58 | 4.10 | 434026939 | 96.84
laysim-gr740-dbt | 1335984884 | 4057.63 | 329.25 | 44.18 | 432016303 | 96.39
TSIM3-LEON4 Eval. | 1336006537 | 69320 | 19.27 | 2.59 | 704097695 | 157.10

5) Overall RAW MIPS of Dhrystone

Figure 1 shows the overall RAW MIPS of Dhrystone on GR740, laysim-gr740, laysim-gr740-dbt and TSIM3-LEON4 evaluation. The peak raw MIPS of laysim-gr740-dbt is 376.46 MIPS on single core, and it can reach 329.25 MIPS on four cores.
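
The reason the real-time ratio falls as cores are added can be read directly from the RAW MIPS columns of Tables 1 to 4: the aggregate throughput of the GR740 board grows almost linearly with the number of active cores, while that of laysim-gr740-dbt stays roughly flat. The sketch below only rearranges the published numbers. Because the instruction counts are nearly identical between board and emulator, the MIPS ratio is used here as an approximation of the wall-clock-based real-time performance; the function name is illustrative.

```python
# Aggregate RAW MIPS from Tables 1-4, keyed by the number of active cores.
GR740_MIPS = {1: 193.73, 2: 384.75, 3: 566.16, 4: 745.25}
DBT_MIPS = {1: 376.46, 2: 357.55, 3: 345.37, 4: 329.25}

def scaling_report():
    """Return (cores, board speed-up vs. one core, DBT/board ratio in %)."""
    rows = []
    for cores in sorted(GR740_MIPS):
        speedup = GR740_MIPS[cores] / GR740_MIPS[1]
        ratio_pct = 100.0 * DBT_MIPS[cores] / GR740_MIPS[cores]
        rows.append((cores, speedup, ratio_pct))
    return rows

for cores, speedup, ratio_pct in scaling_report():
    print(f"{cores} core(s): board speed-up x{speedup:.2f}, "
          f"laysim-gr740-dbt at {ratio_pct:.2f}% of board throughput")
```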

Figure 1. Overall RAW MIPS of Dhrystone



6) Overall Real-Time Performance & Cycle Accuracy of Dhrystone

Figure 2 shows the overall real-time performance and cycle accuracy of Dhrystone on GR740, laysim-gr740, laysim-gr740-dbt and TSIM3-LEON4 evaluation. When only a single core of the GR740 is running, laysim-gr740-dbt exceeds the real-time performance of the GR740. When two cores are running, laysim-gr740-dbt reaches 92.93% of the real-time performance of the GR740. However, with three and four cores running, it drops to 61.00% and 44.18%, respectively.

Figure 2. Overall Real-Time Performance & Cycle Accuracy of Dhrystone


3. Stanford Performance & Cycle Accuracy

The Stanford benchmark was compiled with the -O3 option using BCC-2.1.0, without UART output.

1) Single Stanford

When the Stanford was executed on laysim-GR740, GR740 board and TSIM3-LEON4 evaluation, the performance and cycle accuracy are shown in Table 5 below. 

a) The raw MIPS of the GR740 board is 185.18 MIPS.
b) The raw MIPS of laysim-gr740-dbt is 280.34 MIPS, which is 1.51 times the real-time performance of the GR740.
c) However, laysim-gr740 and TSIM3-LEON4 evaluation, which are based on the interpretation method, do not satisfy the real-time performance of the GR740.
d) The cycle accuracy is 109.52% for laysim-gr740 and about 99.5% for laysim-gr740-dbt and TSIM3-LEON4 evaluation.

Table 5. Single Stanford Performance & Cycle Accuracy

Stanford (No UART) | # of INST | Wall Clock (ms) | RAW MIPS | Real-Time Performance (%) | Cycles | Cycle Accuracy (%)
GR740 (250MHz) | 13058448 | 70.518936 | 185.18 | 100.00 | 17629734 | 100.00
laysim-gr740 | 13059162 | 529.573 | 24.66 | 13.32 | 19307982 | 109.52
laysim-gr740-dbt | 13059162 | 46.584 | 280.34 | 151.38 | 17550748 | 99.55
TSIM3-LEON4 Eval. | 13059162 | 410 | 31.85 | 17.20 | 17546760 | 99.53

2) Two Instances of Stanford

Stanford was compiled as the AMP configuration using BCC-2.1.0. After two instances of Stanford were executed on laysim-GR740, GR740 board and TSIM3-LEON4 evaluation, the performance and cycle accuracy are shown in Table 6 below. 

a) The raw MIPS of the GR740 board is 362.20 MIPS across CORE#0/#1.
b) The raw MIPS of laysim-gr740-dbt is 254.72 MIPS across CORE#0/#1. Its real-time performance is 70.54%, so it cannot meet the real-time performance of the GR740.
c) The real-time performance of laysim-gr740 and TSIM3-LEON4, which are based on the interpretation method, is 7.01% and 7.89%, respectively.
d) The cycle accuracy is 107.19% for laysim-gr740, 97.72% for laysim-gr740-dbt and 102.73% for TSIM3-LEON4 evaluation.

Table 6. Two Instances of Stanford Performance & Cycle Accuracy

Stanford 2 (No UART) | # of INST | Wall Clock (ms) | RAW MIPS | Real-Time Performance (%) | Cycles | Cycle Accuracy (%)
GR740 (250MHz) | 26021084 | 71.84244 | 362.20 | 100.00 | 17960610 | 100.00
laysim-gr740 | 25940333 | 1024.33 | 25.32 | 7.01 | 19251110 | 107.19
laysim-gr740-dbt | 25940333 | 101.84 | 254.72 | 70.54 | 17550760 | 97.72
TSIM3-LEON4 Eval. | 26003474 | 910 | 28.58 | 7.89 | 18451460 | 102.73

3) Three Instances of Stanford

Stanford was compiled as the AMP configuration using BCC-2.1.0. After three instances of Stanford were executed on laysim-GR740, GR740 board and TSIM3-LEON4 evaluation, the performance and cycle accuracy are shown in Table 7 below. 

a) The raw MIPS of the GR740 board is 531.75 MIPS across CORE#0/#1/#2.
b) The raw MIPS of laysim-gr740-dbt is 264.66 MIPS across CORE#0/#1/#2. Its real-time performance is 49.99%, so it cannot meet the real-time performance of the GR740.
c) The real-time performance of laysim-gr740 and TSIM3-LEON4, which are based on the interpretation method, is 5.42% and 4.70%, respectively.
d) The cycle accuracy of laysim-gr740 and laysim-gr740-dbt is 105.02% and 95.75%, respectively.
e) The cycle count of TSIM3-LEON4 increases, and its cycle accuracy degrades to 112.00%.

Table 7. Three Instances of Stanford Performance & Cycle Accuracy

Stanford 3 (No UART) | # of INST | Wall Clock (ms) | RAW MIPS | Real-Time Performance (%) | Cycles | Cycle Accuracy (%)
GR740 (250MHz) | 38989561 | 73.32266 | 531.75 | 100.00 | 18330665 | 100.00
laysim-gr740 | 38821500 | 1353.49 | 28.68 | 5.42 | 19251110 | 105.02
laysim-gr740-dbt | 38821500 | 146.684 | 264.66 | 49.99 | 17550760 | 95.75
TSIM3-LEON4 Eval. | 38967675 | 1560 | 24.98 | 4.70 | 20529851 | 112.00

4) Four Instances of Stanford

Stanford was compiled as the AMP configuration using BCC-2.1.0. After four instances of Stanford were executed on laysim-GR740, GR740 board and TSIM3-LEON4 evaluation, the performance and cycle accuracy are shown in Table 8 below. 

a) The raw MIPS of the GR740 board is 696.59 MIPS across CORE#0/#1/#2/#3.
b) The raw MIPS of laysim-gr740-dbt is 262.42 MIPS across CORE#0/#1/#2/#3. Its real-time performance is 37.90%, so it cannot meet the real-time performance of the GR740.
c) The real-time performance of laysim-gr740 and TSIM3-LEON4, which are based on the interpretation method, is 4.51% and 3.38%, respectively.
d) The cycle accuracy of laysim-gr740 and laysim-gr740-dbt is 103.12% and 94.01%, respectively.
e) The cycle count of TSIM3-LEON4 increases, and its cycle accuracy degrades to 123.32%.

Table 8. Four Instances of Stanford Performance & Cycle Accuracy

Stanford 4 (No UART) | # of INST | Wall Clock (ms) | RAW MIPS | Real-Time Performance (%) | Cycles | Cycle Accuracy (%)
GR740 (250MHz) | 52019296 | 74.676784 | 696.59 | 100.00 | 18669196 | 100.00
laysim-gr740 | 51702667 | 1654.37 | 31.25 | 4.51 | 19251110 | 103.12
laysim-gr740-dbt | 51702667 | 197.023 | 262.42 | 37.90 | 17550760 | 94.01
TSIM3-LEON4 Eval. | 51802687 | 2210 | 23.44 | 3.38 | 23022597 | 123.32

5) Overall RAW MIPS of Stanford

Figure 3 shows the overall RAW MIPS of Stanford on GR740, laysim-gr740, laysim-gr740-dbt and TSIM3-LEON4 evaluation. The peak raw MIPS of laysim-gr740-dbt is 280.34 MIPS on a single core, and it is 262.42 MIPS on four cores.

Figure 3. Overall RAW MIPS of Stanford


6) Overall Real-Time Performance & Cycle Accuracy of Stanford

Figure 4 shows the overall real-time performance and cycle accuracy of Stanford on GR740, laysim-gr740, laysim-gr740-dbt and TSIM3-LEON4 evaluation. When only a single core of the GR740 is running, laysim-gr740-dbt exceeds the real-time performance of the GR740. When two cores are running, laysim-gr740-dbt reaches 70.54% of the real-time performance of the GR740. However, with three and four cores running, it drops to 49.99% and 37.90%, respectively.

Figure 4. Overall Real-Time Performance & Cycle Accuracy of Stanford


4. RTEMS 5.1.0-RC1 SMP testsuites Performance & Cycle Accuracy

Four tests from the SMP testsuites of RTEMS 5.1.0-RC1 (SMPATOMIC01, SMPSCHEDAFFINITY05, SMPSCHEDEDF03, SMPLOCK01) were modified to suppress UART output, in order to exclude the cycle effect of the UART model, and were executed on GR740, laysim-gr740, laysim-gr740-dbt and TSIM3-LEON4 evaluation. All four tests use all cores of the GR740, and their CPU load is almost 100%.

Figure 5. Real-Time Performance & Cycle Accuracy of RTEMS 5.1.0-RC1 SMP testsuites


1) SMPATOMIC01

When the SMPATOMIC01 was executed on laysim-GR740, GR740 board and TSIM3-LEON4 evaluation, the performance and cycle accuracy are shown in Table 9 below. 

a) The real-time performance of laysim-gr740-dbt is only 46.64%, so it does not satisfy the real-time performance of the GR740.
b) The real-time performance of laysim-gr740 and TSIM3-LEON4 is only 6.85% and 16.10%, respectively.
c) laysim-GR740 and TSIM3-LEON4 evaluation show a cycle accuracy close to 100%.

Table 9. Performance & Cycle Accuracy of SMPATOMIC01

SMPATOMIC01 (No UART) | Cycles | Wall Clock (ms) | Real-Time Performance (%) | Cycle Accuracy (%)
GR740 (250MHz) | 2006127689 | 8024.510756 | 100.00 | 100.00
laysim-gr740 | 2024286135 | 117206 | 6.85 | 100.91
laysim-gr740-dbt | 2021498061 | 17204.6 | 46.64 | 100.77
TSIM3-LEON4 Eval. | 2005935360 | 49830 | 16.10 | 99.99
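
The RTEMS tables have no instruction-count column, so real-time performance rests on wall-clock time alone. In Table 9 the GR740 wall-clock value matches its cycle count divided by the 250MHz clock, which suggests (as an assumption, not a statement from the measurement setup) that the board time is derived from the cycle counter. A minimal sketch with the Table 9 values; the names are illustrative:

```python
GR740_CLOCK_HZ = 250_000_000  # GR740 core clock stated in this post

def board_wall_ms(cycles: int) -> float:
    """Board execution time in ms, assuming time = cycles / clock."""
    return cycles / GR740_CLOCK_HZ * 1e3

def realtime_pct(board_ms: float, emu_ms: float) -> float:
    """Share of real-time speed reached by the emulator, in percent."""
    return 100.0 * board_ms / emu_ms

# Cycle count of the board and emulator wall-clock times from Table 9.
BOARD_CYCLES = 2006127689
EMULATOR_WALL_MS = {
    "laysim-gr740": 117206.0,
    "laysim-gr740-dbt": 17204.6,
    "TSIM3-LEON4 Eval.": 49830.0,
}

board_ms = board_wall_ms(BOARD_CYCLES)
print(f"GR740 board: {board_ms:.6f} ms")
for name, emu_ms in EMULATOR_WALL_MS.items():
    print(f"{name}: {realtime_pct(board_ms, emu_ms):.2f}% of real time")
```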

2) SMPSCHEDAFFINITY05

When the SMPSCHEDAFFINITY05 was executed on laysim-GR740, GR740 board and TSIM3-LEON4 evaluation, the performance and cycle accuracy are shown in Table 10 below. 

a) The real-time performance of laysim-gr740-dbt is only 22.14%, so it does not satisfy the real-time performance of the GR740.
b) The real-time performance of laysim-gr740 and TSIM3-LEON4 is only 3.25% and 3.38%, respectively.
c) laysim-GR740 and TSIM3-LEON4 evaluation show a cycle accuracy between 98.36% and 98.99%.

Table 10. Performance & Cycle Accuracy of SMPSCHEDAFFINITY05 

SMPSCHEDAFFINITY05 (No UART) | Cycles | Wall Clock (ms) | Real-Time Performance (%) | Cycle Accuracy (%)
GR740 (250MHz) | 400436415 | 1601.74566 | 100.00 | 100.00
laysim-gr740 | 394066491 | 49295.8 | 3.25 | 98.41
laysim-gr740-dbt | 393881315 | 7233.16 | 22.14 | 98.36
TSIM3-LEON4 Eval. | 396375290 | 47340 | 3.38 | 98.99

3) SMPSCHEDEDF03

When the SMPSCHEDEDF03 was executed on laysim-GR740, GR740 board and TSIM3-LEON4 evaluation, the performance and cycle accuracy are shown in Table 11 below. 

a) The real-time performance of laysim-gr740-dbt is only 17.80%, so it does not satisfy the real-time performance of the GR740.
b) The real-time performance of laysim-gr740 and TSIM3-LEON4 is only 3.57% and 3.56%, respectively.
c) laysim-GR740 and TSIM3-LEON4 evaluation show a cycle accuracy close to 100%.

Table 11. Performance & Cycle Accuracy of SMPSCHEDEDF03 

SMPSCHEDEDF03 (No UART) | Cycles | Wall Clock (ms) | Real-Time Performance (%) | Cycle Accuracy (%)
GR740 (250MHz) | 2505455209 | 10021.82084 | 100.00 | 100.00
laysim-gr740 | 2522118284 | 280431 | 3.57 | 100.67
laysim-gr740-dbt | 2518399718 | 56309.3 | 17.80 | 100.52
TSIM3-LEON4 Eval. | 2505025973 | 281330 | 3.56 | 99.98

4) SMPLOCK01

When the SMPLOCK01 was executed on laysim-GR740, GR740 board and TSIM3-LEON4 evaluation, the performance and cycle accuracy are shown in Table 12 below. 

a) The real-time performance of laysim-gr740-dbt is only 36.90%, so it does not satisfy the real-time performance of the GR740.
b) The real-time performance of laysim-gr740 is 5.53%.
c) TSIM3-LEON4 produces inaccurate results: its cycle accuracy is only 55.41%.

Table 12. Performance & Cycle Accuracy of SMPLOCK01

SMPLOCK01 (No UART) | Cycles | Wall Clock (ms) | Real-Time Performance (%) | Cycle Accuracy (%)
GR740 (250MHz) | 7751233127 | 31004.93251 | 100.00 | 100.00
laysim-gr740 | 7769405801 | 561058 | 5.53 | 100.23
laysim-gr740-dbt | 7766562010 | 84027.1 | 36.90 | 100.20
TSIM3-LEON4 Eval. | 4294967040 | 313090 | 9.90 | 55.41

5. Conclusion and Future Work

Although laysim-gr740-dbt, which uses dynamic binary translation (JIT), shows high performance and accurate cycle counts for the GR740 processor, it cannot satisfy the real-time performance of the GR740 in the SMP environment. Its peak is 46.64% and its average is 30.87% of real-time performance in the SMP environment, where the GR740 runs at almost 100% CPU load.

Real-time performance cannot be expected at all from laysim-gr740 and TSIM3-LEON4, which adopt the interpretation method: laysim-gr740 averages 4.8% of the real-time performance of the GR740 in the SMP environment, and TSIM3-LEON4 ranges from 3.38% to 16.10% across the four RTEMS tests.
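
The peak and average figures quoted above can be reproduced from the real-time performance columns of Tables 9 to 12. A short sketch, assuming the average is a plain arithmetic mean over the four RTEMS tests (the variable names are illustrative):

```python
from statistics import mean

# Real-time performance (%) per RTEMS SMP test, copied from Tables 9-12.
DBT_RT = {
    "SMPATOMIC01": 46.64,
    "SMPSCHEDAFFINITY05": 22.14,
    "SMPSCHEDEDF03": 17.80,
    "SMPLOCK01": 36.90,
}
INTERP_RT = {
    "SMPATOMIC01": 6.85,
    "SMPSCHEDAFFINITY05": 3.25,
    "SMPSCHEDEDF03": 3.57,
    "SMPLOCK01": 5.53,
}

dbt_peak = max(DBT_RT.values())
dbt_avg = mean(DBT_RT.values())
interp_avg = mean(INTERP_RT.values())

print(f"laysim-gr740-dbt: peak {dbt_peak:.2f}%, average {dbt_avg:.2f}%")
print(f"laysim-gr740:     average {interp_avg:.2f}%")
```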

Since the performance of the processor emulator falls far short of the GR740 in the SMP environment, the KARI/FSW team is developing a new processor emulator based on a multi-threaded framework with DBT to satisfy the real-time performance of the GR740's multiple cores. It will assign each core of the GR740 processor to its own x64 core and achieve maximum performance with minimal synchronization among the cores. (It will, however, require an x64 host with at least 6 cores.)
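
One common way to run one guest core per host thread while keeping synchronization rare is quantum-based synchronization: each thread simulates a fixed slice of guest cycles on its own and only meets the others at a barrier between slices. The sketch below illustrates that general idea only. It is not the KARI/FSW design, which this post does not describe in detail, and every name and constant in it (`QUANTUM_CYCLES`, `TOTAL_CYCLES`, `run_cores`) is hypothetical. Python threads are used purely to show the structure; a real emulator would be written in native code so that the threads run in parallel.

```python
import threading

NUM_GUEST_CORES = 4        # the GR740 results above use CORE#0 to CORE#3
QUANTUM_CYCLES = 10_000    # hypothetical number of guest cycles between syncs
TOTAL_CYCLES = 100_000     # hypothetical length of the simulated run

def run_cores():
    """Simulate all guest cores, one host thread each; return final cycle counts."""
    barrier = threading.Barrier(NUM_GUEST_CORES)
    cycles = [0] * NUM_GUEST_CORES  # each thread writes only its own slot

    def core_thread(core_id: int) -> None:
        while cycles[core_id] < TOTAL_CYCLES:
            # Stand-in for executing translated guest code for one quantum.
            cycles[core_id] += QUANTUM_CYCLES
            # The only cross-core synchronization: wait at the quantum boundary
            # so that no core runs more than one quantum ahead of the others.
            barrier.wait()

    threads = [
        threading.Thread(target=core_thread, args=(core_id,))
        for core_id in range(NUM_GUEST_CORES)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return cycles

print(run_cores())
```

The quantum length is the main trade-off in such a scheme: a longer quantum means fewer barrier waits and higher throughput, while a shorter one keeps the cores' notion of time closer together, which matters for cycle accuracy of inter-core interactions.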
