Update laysim-GR740 v0.11 & Real-Time Performance & Cycle Accuracy (II)

 



laysim-GR740 has been updated to v0.11 to improve real-time performance in SMP environments that drive the GR740 to almost 100% CPU load.

Compared to v0.10, laysim-gr740-dbt v0.11 improves performance by 23% on average. For example, the real-time performance of SMPLOCK01 rises by 35.85 percentage points, from 36.9% to 72.75%.

Despite this improvement, laysim-gr740-dbt still cannot match the real-time performance of the GR740 in an SMP environment.

The KARI/FSW team is developing a new processor emulator, based on a multi-threaded framework with DBT, to match the real-time performance of the GR740's multiple cores. It will map each core of the GR740 processor onto its own x64 host core and aim for maximum performance with minimal synchronization among the cores. (This will require an x64 host with at least 6 cores.)
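As a rough illustration of the one-thread-per-emulated-core idea, here is a minimal sketch. It is not the KARI/FSW design: the `EmulatedCore` class, the quantum size, and the barrier-based lock-step scheme are all assumptions chosen for brevity, and Python threads only show the structure, since they do not run in parallel the way native host threads would.

```python
import threading

NUM_CORES = 4          # the GR740 has four LEON4 cores
QUANTUM_CYCLES = 1000  # hypothetical synchronization interval, in emulated cycles
NUM_QUANTA = 5         # hypothetical run length for this demo


class EmulatedCore:
    """Stand-in for one emulated LEON4 core; real DBT execution is omitted."""

    def __init__(self, core_id):
        self.core_id = core_id
        self.cycles = 0

    def run_quantum(self, budget):
        # A real emulator would execute translated code blocks here.
        self.cycles += budget


def core_thread(core, barrier):
    for _ in range(NUM_QUANTA):
        core.run_quantum(QUANTUM_CYCLES)
        barrier.wait()  # no core gets more than one quantum ahead of the others


def run():
    cores = [EmulatedCore(i) for i in range(NUM_CORES)]
    barrier = threading.Barrier(NUM_CORES)
    threads = [threading.Thread(target=core_thread, args=(c, barrier)) for c in cores]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [c.cycles for c in cores]


print(run())  # every core ends at NUM_QUANTA * QUANTUM_CYCLES cycles
```

A larger quantum means fewer barrier waits and less synchronization overhead, but lets the emulated cores drift further apart in time between synchronization points. That is the trade-off any scheme of this kind has to balance.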


1. Test Environment

The performance and cycle accuracy of laysim-GR740 were measured on an Intel i7-6700K @ 4.00GHz running 64-bit CentOS 7 (4 cores and 8GB RAM) under VMware Workstation 15 Pro. The results were compared with a GR740 development board and the TSIM3-LEON4 3.0.0 evaluation version. Because the TSIM3-LEON4 3.0.0 evaluation version from Gaisler only operates with FAST UART, the test code was modified to block UART output so that the UART model does not affect the cycle counts.

2. Dhrystone Performance & Cycle Accuracy

The Dhrystone benchmark (1000000 runs) was compiled with the -O3 option using BCC-2.1.0, without UART output.

1) Single Dhrystone

laysim-gr740-dbt v0.11 reaches 469 raw MIPS with 100.23% cycle accuracy.

Table 1. Single Dhrystone Performance & Cycle Accuracy

| Dhrystone (No UART) | # of INST | Wall Clock (ms) | RAW MIPS | Real-Time Performance (%) | Cycles | Cycle Accuracy (%) |
|---|---|---|---|---|---|---|
| GR740 (250MHz) | 334010144 | 1724.134784 | 193.73 | 100 | 431033696 | 100 |
| laysim-gr740 | 334010146 | 13262.1 | 25.19 | 13 | 434026883 | 100.69 |
| laysim-gr740-dbt v0.10 | 334010146 | 887.238 | 376.46 | 194.33 | 432016291 | 100.23 |
| laysim-gr740-dbt v0.11 | 334010146 | 712.172 | 469 | 242.1 | 432016291 | 100.23 |
| TSIM3-LEON4 Eval. | 334010146 | 10520 | 31.75 | 16.39 | 436030672 | 101.16 |
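
The derived columns of Table 1 can be reproduced from the raw measurements. The sketch below assumes the definitions that the table values imply (they are not spelled out in the post): raw MIPS is instructions divided by wall-clock time, real-time performance is the GR740 wall clock divided by the emulator wall clock, and cycle accuracy is emulator cycles divided by GR740 cycles. The function names are illustrative only.

```python
# Values copied from Table 1 (single Dhrystone, no UART).
GR740_CLOCK_HZ = 250_000_000


def raw_mips(instructions, wall_ms):
    """Millions of executed instructions per second of wall-clock time."""
    return instructions / (wall_ms * 1_000.0)


def real_time_pct(board_wall_ms, emu_wall_ms):
    """100 or more means the emulator finishes no later than the board."""
    return 100.0 * board_wall_ms / emu_wall_ms


def cycle_accuracy_pct(emu_cycles, board_cycles):
    """Emulated cycle count relative to the board's cycle count."""
    return 100.0 * emu_cycles / board_cycles


board_cycles = 431_033_696
# The board's wall clock follows from its cycle count at 250 MHz (about 1724.13 ms).
board_wall_ms = board_cycles / GR740_CLOCK_HZ * 1_000.0

# laysim-gr740-dbt v0.11 row
emu_inst, emu_wall_ms, emu_cycles = 334_010_146, 712.172, 432_016_291

print(f"raw MIPS:       {raw_mips(emu_inst, emu_wall_ms):.2f}")
print(f"real-time (%):  {real_time_pct(board_wall_ms, emu_wall_ms):.2f}")
print(f"cycle acc. (%): {cycle_accuracy_pct(emu_cycles, board_cycles):.2f}")
```

The same three functions apply to every row of the Dhrystone and Stanford tables; only the reference (GR740) row changes from table to table.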

2) Two Instances of Dhrystone

When two instances of Dhrystone were executed as an AMP application on laysim-gr740-dbt v0.11, it reached 488.35 raw MIPS and a real-time performance of 126.93%. It can meet the real-time performance of the GR740.

Table 2. Two Instances of Dhrystone Performance & Cycle Accuracy

| Dhrystone 2 (No UART) | # of INST | Wall Clock (ms) | RAW MIPS | Real-Time Performance (%) | Cycles | Cycle Accuracy (%) |
|---|---|---|---|---|---|---|
| GR740 (250MHz) | 668011065 | 1736.242072 | 384.75 | 100 | 431033696 | 100 |
| laysim-gr740 | 668001729 | 25918.9 | 25.77 | 6.7 | 434026940 | 100.69 |
| laysim-gr740-dbt v0.10 | 668001729 | 1868.28 | 357.55 | 92.93 | 432016303 | 100.23 |
| laysim-gr740-dbt v0.11 | 668001729 | 1367.88 | 488.35 | 126.93 | 432016303 | 100.23 |
| TSIM3-LEON4 Eval. | 668009384 | 25590 | 26.1 | 6.78 | 466049427 | 108.12 |

3) Three Instances of Dhrystone

When three instances of Dhrystone were executed as an AMP application on laysim-gr740-dbt v0.11, it reached 491.02 raw MIPS and a real-time performance of 86.73%. It cannot meet the real-time performance of the GR740.

Table 3. Three Instances of Dhrystone Performance & Cycle Accuracy

| Dhrystone 3 (No UART) | # of INST | Wall Clock (ms) | RAW MIPS | Real-Time Performance (%) | Cycles | Cycle Accuracy (%) |
|---|---|---|---|---|---|---|
| GR740 (250MHz) | 1002012942 | 1769.840384 | 566.16 | 100 | 442460096 | 100 |
| laysim-gr740 | 1001993306 | 35060.1 | 28.58 | 5.05 | 434026939 | 98.09 |
| laysim-gr740-dbt v0.10 | 1001993306 | 2901.21 | 345.37 | 61 | 432016303 | 97.64 |
| laysim-gr740-dbt v0.11 | 1001993306 | 2040.63 | 491.02 | 86.73 | 432016303 | 97.64 |
| TSIM3-LEON4 Eval. | 1002007211 | 47930 | 20.91 | 3.69 | 549074177 | 124.1 |

4) Four Instances of Dhrystone

When four instances of Dhrystone were executed as an AMP application on laysim-gr740-dbt v0.11, it reached 471.93 raw MIPS and a real-time performance of 63.33%. It cannot meet the real-time performance of the GR740.

Table 4. Four Instances of Dhrystone Performance & Cycle Accuracy

| Dhrystone 4 (No UART) | # of INST | Wall Clock (ms) | RAW MIPS | Real-Time Performance (%) | Cycles | Cycle Accuracy (%) |
|---|---|---|---|---|---|---|
| GR740 (250MHz) | 1336013374 | 1792.693464 | 745.25 | 100 | 448173366 | 100 |
| laysim-gr740 | 1335984884 | 43686.3 | 30.58 | 4.1 | 434026939 | 96.84 |
| laysim-gr740-dbt v0.10 | 1335984884 | 4057.63 | 329.25 | 44.18 | 432016303 | 96.39 |
| laysim-gr740-dbt v0.11 | 1335984884 | 2830.92 | 471.93 | 63.33 | 432016303 | 96.39 |
| TSIM3-LEON4 Eval. | 1336006537 | 69320 | 19.27 | 2.59 | 704097695 | 157.1 |

5) Overall RAW MIPS of Dhrystone

Figure 1 shows the overall raw MIPS of Dhrystone on the GR740, laysim-gr740, laysim-gr740-dbt v0.10/v0.11, and the TSIM3-LEON4 evaluation version. laysim-gr740-dbt peaks at 491.02 raw MIPS and still reaches 471.93 raw MIPS with four cores.

Figure 1. Overall RAW MIPS of Dhrystone



6) Overall Real-Time Performance & Cycle Accuracy of Dhrystone

Figure 2 shows the overall real-time performance and cycle accuracy of Dhrystone on the GR740, laysim-gr740, laysim-gr740-dbt v0.10/v0.11, and the TSIM3-LEON4 evaluation version. When only a single core of the GR740 is active, laysim-gr740-dbt exceeds the real-time performance of the GR740. With two cores running, laysim-gr740-dbt reaches 126.93% of the real-time performance of the GR740. However, with three and four cores running, it reaches only 86.73% and 63.33% of the real-time performance of the GR740, respectively.

Figure 2. Overall Real-Time Performance & Cycle Accuracy of Dhrystone
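
The trend in Tables 1 to 4 can be made explicit with a short calculation. The sketch below uses only wall-clock values copied from those tables (GR740 board and laysim-gr740-dbt v0.11). The `scaling_report` helper is a hypothetical name, and real-time performance is assumed to be the board's wall clock divided by the emulator's.

```python
# (instances, GR740 wall clock ms, laysim-gr740-dbt v0.11 wall clock ms),
# copied from Tables 1 to 4.
ROWS = [
    (1, 1724.134784, 712.172),
    (2, 1736.242072, 1367.88),
    (3, 1769.840384, 2040.63),
    (4, 1792.693464, 2830.92),
]


def scaling_report(rows):
    """Return (instances, emulator time relative to 1 instance, real-time %) tuples."""
    base_emu_ms = rows[0][2]
    report = []
    for instances, board_ms, emu_ms in rows:
        slowdown = emu_ms / base_emu_ms        # grows roughly with the instance count
        real_time = 100.0 * board_ms / emu_ms  # 100 or more keeps up with the board
        report.append((instances, slowdown, real_time))
    return report


for instances, slowdown, real_time in scaling_report(ROWS):
    verdict = "meets" if real_time >= 100.0 else "misses"
    print(f"{instances} instance(s): {slowdown:.2f}x emulator time, "
          f"{real_time:.2f}% real-time ({verdict})")
```

The board's wall clock barely changes from one to four instances, while the emulator's wall clock grows almost in proportion to the instance count. That is why the real-time margin of a single instance is used up once a third core becomes active.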


3. Stanford Performance & Cycle Accuracy

The Stanford benchmark was compiled with the -O3 option using BCC-2.1.0, without UART output.

1) Single Stanford

laysim-gr740-dbt v0.11 reaches 366.21 raw MIPS with 99.55% cycle accuracy.

Table 5. Single Stanford Performance & Cycle Accuracy

| Stanford (No UART) | # of INST | Wall Clock (ms) | RAW MIPS | Real-Time Performance (%) | Cycles | Cycle Accuracy (%) |
|---|---|---|---|---|---|---|
| GR740 (250MHz) | 13058448 | 70.518936 | 185.18 | 100 | 17629734 | 100 |
| laysim-gr740 | 13059162 | 529.573 | 24.66 | 13.32 | 19307982 | 109.52 |
| laysim-gr740-dbt v0.10 | 13059162 | 46.584 | 280.34 | 151.38 | 17550748 | 99.55 |
| laysim-gr740-dbt v0.11 | 13059162 | 35.66 | 366.21 | 197.75 | 17551004 | 99.55 |
| TSIM3-LEON4 Eval. | 13059162 | 410 | 31.85 | 17.2 | 17546760 | 99.53 |

2) Two Instances of Stanford

When two instances of Stanford were executed as an AMP application on laysim-gr740-dbt v0.11, it reached 389.36 raw MIPS and a real-time performance of 107.83%. It can meet the real-time performance of the GR740.

Table 6. Two Instances of Stanford Performance & Cycle Accuracy

| Stanford 2 (No UART) | # of INST | Wall Clock (ms) | RAW MIPS | Real-Time Performance (%) | Cycles | Cycle Accuracy (%) |
|---|---|---|---|---|---|---|
| GR740 (250MHz) | 26021084 | 71.84244 | 362.2 | 100 | 17960610 | 100 |
| laysim-gr740 | 25940333 | 1024.33 | 25.32 | 7.01 | 19251110 | 107.19 |
| laysim-gr740-dbt v0.10 | 25940333 | 101.84 | 254.72 | 70.54 | 17550760 | 97.72 |
| laysim-gr740-dbt v0.11 | 25940333 | 66.623 | 389.36 | 107.83 | 17551016 | 97.72 |
| TSIM3-LEON4 Eval. | 26003474 | 910 | 28.58 | 7.89 | 18451460 | 102.73 |

3) Three Instances of Stanford

When three instances of Stanford were executed as an AMP application on laysim-gr740-dbt v0.11, it reached 395.28 raw MIPS and a real-time performance of 74.66%. It cannot meet the real-time performance of the GR740.

Table 7. Three Instances of Stanford Performance & Cycle Accuracy

| Stanford 3 (No UART) | # of INST | Wall Clock (ms) | RAW MIPS | Real-Time Performance (%) | Cycles | Cycle Accuracy (%) |
|---|---|---|---|---|---|---|
| GR740 (250MHz) | 38989561 | 73.32266 | 531.75 | 100 | 18330665 | 100 |
| laysim-gr740 | 38821500 | 1353.49 | 28.68 | 5.42 | 19251110 | 105.02 |
| laysim-gr740-dbt v0.10 | 38821500 | 146.684 | 264.66 | 49.99 | 17550760 | 95.75 |
| laysim-gr740-dbt v0.11 | 38821500 | 98.213 | 395.28 | 74.66 | 17551016 | 95.75 |
| TSIM3-LEON4 Eval. | 38967675 | 1560 | 24.98 | 4.7 | 20529851 | 112 |

4) Four Instances of Stanford

When four instances of Stanford were executed as an AMP application on laysim-gr740-dbt v0.11, it reached 388.13 raw MIPS and a real-time performance of 56.06%. It cannot meet the real-time performance of the GR740.

Table 8. Four Instances of Stanford Performance & Cycle Accuracy

| Stanford 4 (No UART) | # of INST | Wall Clock (ms) | RAW MIPS | Real-Time Performance (%) | Cycles | Cycle Accuracy (%) |
|---|---|---|---|---|---|---|
| GR740 (250MHz) | 52019296 | 74.676784 | 696.59 | 100 | 18669196 | 100 |
| laysim-gr740 | 51702667 | 1654.37 | 31.25 | 4.51 | 19251110 | 103.12 |
| laysim-gr740-dbt v0.10 | 51702667 | 197.023 | 262.42 | 37.9 | 17550760 | 94.01 |
| laysim-gr740-dbt v0.11 | 51702667 | 133.208 | 388.13 | 56.06 | 17551016 | 94.01 |
| TSIM3-LEON4 Eval. | 51802687 | 2210 | 23.44 | 3.38 | 23022597 | 123.32 |

5) Overall RAW MIPS of Stanford

Figure 3 shows the overall raw MIPS of Stanford on the GR740, laysim-gr740, laysim-gr740-dbt v0.10/v0.11, and the TSIM3-LEON4 evaluation version. laysim-gr740-dbt peaks at 395.28 raw MIPS and still reaches 388.13 raw MIPS with four cores.

Figure 3. Overall RAW MIPS of Stanford


6) Overall Real-Time Performance & Cycle Accuracy of Stanford

Figure 4 shows the overall real-time performance and cycle accuracy of Stanford on the GR740, laysim-gr740, laysim-gr740-dbt v0.10/v0.11, and the TSIM3-LEON4 evaluation version. When only a single core of the GR740 is active, laysim-gr740-dbt exceeds the real-time performance of the GR740. With two cores running, laysim-gr740-dbt reaches 107.83% of the real-time performance of the GR740. However, with three and four cores running, it reaches only 74.66% and 56.06% of the real-time performance of the GR740, respectively.

Figure 4. Overall Real-Time Performance & Cycle Accuracy of Stanford


4. RTEMS 5.1.0-RC1 SMP testsuites Performance & Cycle Accuracy

Four tests from the RTEMS 5.1.0-RC1 SMP testsuites (SMPATOMIC01, SMPSCHEDAFFINITY05, SMPSCHEDEDF03, SMPLOCK01), modified to block UART output so that the UART model does not affect the cycle counts, were executed on the GR740, laysim-gr740, laysim-gr740-dbt, and the TSIM3-LEON4 evaluation version. All four tests use every core of the GR740, at almost 100% CPU load.

Figure 5. Real-Time Performance & Cycle Accuracy of RTEMS 5.1.0-RC1 SMP testsuites



1) SMPATOMIC01

Table 9 shows the performance and cycle accuracy of SMPATOMIC01 on laysim-GR740, the GR740 board, and the TSIM3-LEON4 evaluation version.

laysim-gr740-dbt v0.11 reaches 75.04% real-time performance with 100.77% cycle accuracy.

Table 9. Performance & Cycle Accuracy of SMPATOMIC01

| SMPATOMIC01 (No UART) | Cycles | Wall Clock (ms) | Real-Time Performance (%) | Cycle Accuracy (%) |
|---|---|---|---|---|
| GR740 (250MHz) | 2006127689 | 8024.510756 | 100 | 100 |
| laysim-gr740 | 2024286135 | 117206 | 6.85 | 100.91 |
| laysim-gr740-dbt v0.10 | 2021498061 | 17204.6 | 46.64 | 100.77 |
| laysim-gr740-dbt v0.11 | 2021502063 | 10693.5 | 75.04 | 100.77 |
| TSIM3-LEON4 Eval. | 2005935360 | 49830 | 16.1 | 99.99 |

2) SMPSCHEDAFFINITY05

Table 10 shows the performance and cycle accuracy of SMPSCHEDAFFINITY05 on laysim-GR740, the GR740 board, and the TSIM3-LEON4 evaluation version.

laysim-gr740-dbt v0.11 reaches 39.11% real-time performance with 97.74% cycle accuracy.

Table 10. Performance & Cycle Accuracy of SMPSCHEDAFFINITY05 

| SMPSCHEDAFFINITY05 (No UART) | Cycles | Wall Clock (ms) | Real-Time Performance (%) | Cycle Accuracy (%) |
|---|---|---|---|---|
| GR740 (250MHz) | 400436415 | 1601.74566 | 100 | 100 |
| laysim-gr740 | 394066491 | 49295.8 | 3.25 | 98.41 |
| laysim-gr740-dbt v0.10 | 393881315 | 7233.16 | 22.14 | 98.36 |
| laysim-gr740-dbt v0.11 | 391381962 | 4095.75 | 39.11 | 97.74 |
| TSIM3-LEON4 Eval. | 396375290 | 47340 | 3.38 | 98.99 |

3) SMPSCHEDEDF03

Table 11 shows the performance and cycle accuracy of SMPSCHEDEDF03 on laysim-GR740, the GR740 board, and the TSIM3-LEON4 evaluation version.

laysim-gr740-dbt v0.11 reaches 29.75% real-time performance with 100.52% cycle accuracy.

Table 11. Performance & Cycle Accuracy of SMPSCHEDEDF03 

| SMPSCHEDEDF03 (No UART) | Cycles | Wall Clock (ms) | Real-Time Performance (%) | Cycle Accuracy (%) |
|---|---|---|---|---|
| GR740 (250MHz) | 2505455209 | 10021.82084 | 100 | 100 |
| laysim-gr740 | 2522118284 | 280431 | 3.57 | 100.67 |
| laysim-gr740-dbt v0.10 | 2518399718 | 56309.3 | 17.8 | 100.52 |
| laysim-gr740-dbt v0.11 | 2518376018 | 33690.3 | 29.75 | 100.52 |
| TSIM3-LEON4 Eval. | 2505025973 | 281330 | 3.56 | 99.98 |

4) SMPLOCK01

Table 12 shows the performance and cycle accuracy of SMPLOCK01 on laysim-GR740, the GR740 board, and the TSIM3-LEON4 evaluation version.

laysim-gr740-dbt v0.11 reaches 72.75% real-time performance with 100.2% cycle accuracy.

Table 12. Performance & Cycle Accuracy of SMPLOCK01

| SMPLOCK01 (No UART) | Cycles | Wall Clock (ms) | Real-Time Performance (%) | Cycle Accuracy (%) |
|---|---|---|---|---|
| GR740 (250MHz) | 7751233127 | 31004.93251 | 100 | 100 |
| laysim-gr740 | 7769405801 | 561058 | 5.53 | 100.23 |
| laysim-gr740-dbt v0.10 | 7766562010 | 84027.1 | 36.9 | 100.2 |
| laysim-gr740-dbt v0.11 | 7766564818 | 42615.9 | 72.75 | 100.2 |
| TSIM3-LEON4 Eval. | 4294967040 | 313090 | 9.9 | 55.41 |
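
To see how much v0.11 gains over v0.10 on each SMP test, the wall-clock times of the two laysim-gr740-dbt rows in Tables 9 to 12 can be compared directly. This is a small sketch using values copied from those tables; the dictionary layout and the `speedup` name are illustrative.

```python
# laysim-gr740-dbt wall-clock times in ms, copied from Tables 9 to 12:
# test name -> (v0.10, v0.11)
WALL_MS = {
    "SMPATOMIC01": (17204.6, 10693.5),
    "SMPSCHEDAFFINITY05": (7233.16, 4095.75),
    "SMPSCHEDEDF03": (56309.3, 33690.3),
    "SMPLOCK01": (84027.1, 42615.9),
}

# Speedup is the ratio of old to new wall-clock time for the same workload.
speedup = {name: old / new for name, (old, new) in WALL_MS.items()}

for name, factor in speedup.items():
    print(f"{name}: v0.11 is {factor:.2f}x as fast as v0.10")
```

SMPLOCK01 benefits the most, with v0.11 finishing in roughly half the time v0.10 needs.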

5. Conclusion and Future Work Again!

Although laysim-gr740-dbt, which uses dynamic binary translation (JIT), delivers high performance and good cycle accuracy for the GR740 processor, it cannot match the real-time performance of the GR740 in an SMP environment. Across the four SMP tests, which drive the GR740 to almost 100% CPU load, its real-time performance peaks at about 75.04% and averages 54.16%.

The KARI/FSW team is developing a new GR740 processor emulator that adopts a multi-threaded framework with DBT. During development, however, it has not been easy to coordinate the x64 cores so that the emulated LEON4 cores stay synchronized without losing performance. But development will still continue!
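
The summary figures for the SMP tests can be recomputed from the laysim-gr740-dbt v0.11 rows of Tables 9 to 12. The sketch assumes the average is a plain arithmetic mean over the four tests; the variable names are illustrative.

```python
from statistics import mean

# Real-time performance (%) of laysim-gr740-dbt v0.11, copied from Tables 9 to 12.
REAL_TIME_PCT = {
    "SMPATOMIC01": 75.04,
    "SMPSCHEDAFFINITY05": 39.11,
    "SMPSCHEDEDF03": 29.75,
    "SMPLOCK01": 72.75,
}

peak_test = max(REAL_TIME_PCT, key=REAL_TIME_PCT.get)
average = mean(REAL_TIME_PCT.values())

print(f"peak: {REAL_TIME_PCT[peak_test]:.2f}% ({peak_test})")
print(f"mean: {average:.2f}%")
```

With these table values the peak is 75.04% (SMPATOMIC01) and the arithmetic mean comes to 54.16%.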
