LoongArch: Relax memory ordering for atomic operations

This patch relaxes the implementation of the atomic operations while
still satisfying their memory ordering requirements, which helps improve
performance on LA664+.
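
As a rough illustration of the idea only, the sketch below uses portable
C11 atomics rather than the kernel's atomic API or LoongArch assembly. The
names events, count_event() and take_ticket() are hypothetical and do not
come from this patch. It shows the general principle: an operation whose
result nobody relies on for ordering can request a weaker memory order
than one that is used for synchronisation.

```c
#include <stdatomic.h>

/* Hypothetical shared counter; not taken from the kernel sources. */
static atomic_int events;

/*
 * The result is discarded and nothing depends on this update being
 * ordered against other memory accesses, so relaxed ordering is enough.
 */
static inline void count_event(void)
{
	atomic_fetch_add_explicit(&events, 1, memory_order_relaxed);
}

/*
 * The caller acts on the returned value, so this sketch keeps the
 * strongest (sequentially consistent) ordering here.
 */
static inline int take_ticket(void)
{
	return atomic_fetch_add_explicit(&events, 1, memory_order_seq_cst) + 1;
}
```

Both functions remain atomic with respect to the counter value; only the
ordering guarantees relative to surrounding loads and stores differ.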

Unixbench with full threads (8)

                                             before        after  change
Dhrystone 2 using register variables    203910714.2  203909539.8   0.00%
Double-Precision Whetstone                  37930.9        37931   0.00%
Execl Throughput                            29431.5      29545.8   0.39%
File Copy 1024 bufsize 2000 maxblocks     6645759.5      6676320   0.46%
File Copy 256 bufsize 500 maxblocks       2138772.4    2144182.4   0.25%
File Copy 4096 bufsize 8000 maxblocks    11640698.4     11602703  -0.33%
Pipe Throughput                           8849077.7    8917009.4   0.77%
Pipe-based Context Switching              1255108.5    1287277.3   2.56%
Process Creation                            50825.9      50442.1  -0.76%
Shell Scripts (1 concurrent)                25795.8      25942.3   0.57%
Shell Scripts (8 concurrent)                 3812.6       3835.2   0.59%
System Call Overhead                      9248212.6    9353348.6   1.14%
                                                                 =======
System Benchmarks Index Score                8076.6       8114.4   0.47%

Signed-off-by: WANG Rui <wangrui@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>