Overclockaholics Forums

Sanmayce 03-20-2012 06:40 AM

'Monstrous Jesters' benchmark package
 
For a long time, looking at published tests, I couldn't find answers to some very basic (but important) aspects of CPU/RAM performance.
I am talking about sorting, decompressing, and searching performed by console tools (written in C).
For example, I have no opportunity to run my tests on a real powerhouse. This limits my quest to write the fastest memmem function (in C), because i5/i7 behave very differently from Core 2 when it comes to 1/2/4-byte fetching. In other words, functions already tuned for one CPU/RAM system are no longer superior on a newer system, so intensive testing is needed to retune them.
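To make the "1/2/4-byte fetching" point concrete, here is a minimal sketch of a memmem-style search that compares the first four pattern bytes with a single 4-byte fetch before falling back to memcmp. It is not one of the Railgun functions from the package; the names memmem_sketch and load32 are made up for illustration, and which fetch width wins on a given CPU is exactly the kind of thing that has to be measured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fetch 4 bytes without assuming alignment.
   Compilers typically turn this memcpy into one 32-bit load. */
static uint32_t load32(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Returns a pointer to the first occurrence of pat in hay, or NULL.
   Illustrative only: a plain scan with a 4-byte prefilter, no skip table. */
const unsigned char *memmem_sketch(const unsigned char *hay, size_t hay_len,
                                   const unsigned char *pat, size_t pat_len)
{
    assert(hay != NULL && pat != NULL);
    if (pat_len == 0) return hay;
    if (pat_len > hay_len) return NULL;

    size_t last = hay_len - pat_len;   /* last valid start position */

    if (pat_len < 4) {
        /* Short patterns: 1-byte fetch, then confirm with memcmp. */
        for (size_t i = 0; i <= last; i++)
            if (hay[i] == pat[0] && memcmp(hay + i, pat, pat_len) == 0)
                return hay + i;
        return NULL;
    }

    /* Longer patterns: one 4-byte fetch per position as a cheap filter. */
    uint32_t head = load32(pat);
    for (size_t i = 0; i <= last; i++) {
        if (load32(hay + i) == head &&
            memcmp(hay + i + 4, pat + 4, pat_len - 4) == 0)
            return hay + i;
    }
    return NULL;
}
```

Swapping the 4-byte filter for a 1-byte or 2-byte one changes only the inner comparison, which makes it easy to time the variants against each other on different machines.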

You are all welcome to use my latest benchmark (an NSIS installation) at:
http://www.sanmayce.com/Downloads/index.html#Jesters

'Monstrous Jesters' benchmark package short overview:

This is my latest 32-bit/64-bit CPU/RAM benchmark package (strstr-showdown included), an NSIS installation.

File: Monstrous_Jesters.exe
Size: 153 MB (161,009,933 bytes)
Size unpacked: 500 MB
Size needed: 1200 MB

After installation 5 shortcuts (tests) are placed on Desktop/Programs.

http://www.sanmayce.com/Downloads/Monstrous_Jesters.png

All tests are written in C (sources included) and compiled with the latest Intel 12.1 and Microsoft 16 optimizing compilers.

The MEMMEM (strstr-showdown) test takes some 21 minutes to complete on a Core 2 Duo E7500 at 2.93GHz.
Of course, to obtain decent results, stop all concurrent processes before running the test.
Also make sure 100% of the computing power is enabled.

There are also some additional tests (Intel 12.1 and Microsoft 16 executables included):
- lzpre, an LZ77 32-bit/64-bit [de]compressor written by Matt Mahoney;
- Yappy, an LZ 32-bit/64-bit [de]compressor written by IronPeter;
- Knight tour benchmark, which finds the first 9,000,000 tours (at a rate of some 1 billion jumps per minute); in fact it tests/stresses only the CPU clock;
- Quicksort 32-bit/64-bit, used to sort 200,000,000+ pointers (pointing to 7-byte chunks).
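For anyone wondering what "sorting pointers to 7-byte chunks" looks like, here is a minimal sketch using the standard qsort. It is not the Simplicius source; the function name sort_chunk_pointers is hypothetical, and taking one pointer per overlapping 7-byte window of the text is my assumption for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_LEN 7   /* from the post: each pointer refers to a 7-byte chunk */

/* qsort callback: the array elements are pointers, so dereference once
   and compare the 7 bytes they point to. */
static int compare_chunks(const void *a, const void *b)
{
    const unsigned char *pa = *(const unsigned char *const *)a;
    const unsigned char *pb = *(const unsigned char *const *)b;
    return memcmp(pa, pb, CHUNK_LEN);
}

/* Builds one pointer per 7-byte window of text (an assumption, see above)
   and sorts the pointers by the bytes they refer to.
   Returns a malloc'ed array the caller must free, or NULL on failure. */
const unsigned char **sort_chunk_pointers(const unsigned char *text,
                                          size_t text_len, size_t *count_out)
{
    assert(text != NULL && count_out != NULL);
    *count_out = 0;
    if (text_len < CHUNK_LEN) return NULL;

    size_t count = text_len - CHUNK_LEN + 1;
    const unsigned char **ptrs = malloc(count * sizeof *ptrs);
    if (ptrs == NULL) return NULL;

    for (size_t i = 0; i < count; i++)
        ptrs[i] = text + i;

    qsort(ptrs, count, sizeof *ptrs, compare_chunks);
    *count_out = count;
    return ptrs;
}
```

Only the pointers move during the sort; the text itself stays in place, so every comparison is a jump to a more or less random address. That is why a test like this stresses RAM latency as much as the CPU.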

I would also be glad to get some feedback and results from your machines.

Enjoy!

Splave 03-20-2012 07:03 AM

Cool I'll give it a shot on my x79

Sanmayce 03-20-2012 07:06 AM

I rely on you, Splave. Take your time; I have been waiting for years, so I am not in a hurry.
Feel free to ask whatever interests you.

Neuromancer 03-20-2012 07:45 AM

I will take a look at it this week, if I like it I will toss it into my next review :)

Sanmayce 03-20-2012 07:50 AM

Thank you Neuromancer.

MaadDaawg 03-20-2012 12:01 PM

Are you looking for SB and SB-E testing only, or would a 980x system be helpful as well?

rickss69 03-20-2012 07:03 PM

5 Attachment(s)
No clue what all this means...or if I even did it correctly.

Sanmayce 03-22-2012 07:36 AM

Quote:

Originally Posted by MaadDaawg (Post 90539)
Are you looking for SB and SB-E testing only, or would a 980x system be helpful as well?

Wow, all three i7 systems will do perfectly; I am not picky as long as an i7 is involved. Nevertheless, the latest Sandy Bridge-E is going to satisfy the greed in me nicely.
I am very interested in how the super-low memory latencies of SB are going to affect my MEMMEM functions (which stress memory bandwidth along with physical RAM IOPS, i.e. they are latency bound).

A week ago I saw a 5GHz SB with 22GB/s memory read bandwidth. My miserable old laptop gives 5GB/s, whereas my MEMMEM functions work at 3-4GB/s; do the math to see how close they are to the limit. Therefore, what would make my eyes happy is a machine with a high-performance CPU-RAM bus; maybe triple channel is the answer. (The above-mentioned 22GB/s was achieved with an i7 2700K @ 4.9GHz (1.420V), 24/7, max 69C; 4 x 4GB Samsung Extreme Low Voltage 1866MHz @ 8-9-9-24-1T at 1.5V.)
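Doing that math explicitly, as a small sketch (the function name percent_of_bandwidth is made up for illustration): 3-4GB/s out of 5GB/s means the search functions already reach 60-80% of the laptop's measured read bandwidth. The same 3-4GB/s would be only about 14-18% of 22GB/s, which is the headroom a faster machine could expose, assuming the functions scale with the memory system at all.

```c
#include <assert.h>

/* Share (in percent) of the measured memory read bandwidth
   that a search function reaches. Both inputs are in GB/s. */
double percent_of_bandwidth(double search_gb_per_s, double read_gb_per_s)
{
    assert(read_gb_per_s > 0.0);
    return 100.0 * search_gb_per_s / read_gb_per_s;
}
```

For example, percent_of_bandwidth(3.0, 5.0) gives 60 and percent_of_bandwidth(4.0, 5.0) gives 80, using the figures from the post.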

Just uploaded revision B of 'Monstrous Jesters': a new multi-threaded test (up to 48 threads stressing RAM/cores) was added.

Thank you, MaadDaawg, for your readiness to help me.

Sanmayce 03-22-2012 07:52 AM

Quote:

Originally Posted by rickss69 (Post 90544)
No clue what all this means...or if I even did it correctly.

Thanks a lot, rickss69. I will explain, but please give me some specs of your machine.

Last night I ran the new revision B on my T7500 2200MHz with dual-channel DDR2 667MHz:

http://www.sanmayce.com/Downloads/Mo...rB_2_T7500.png

Looking at the Knight Tours test, your/my results are 90s/218s. Let me guess: your CPU runs at 218/90*2200MHz = 5328MHz, or am I wrong?
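The guess above is a plain linear scaling of run time to clock speed. As a sketch (estimate_clock_mhz is a hypothetical name), it assumes both CPUs do the same amount of work per clock tick, which need not hold between different architectures, so the result is at best an upper-bound style estimate.

```c
#include <assert.h>

/* Estimates the clock of another machine from run times of the same test.
   Assumption: identical work per clock on both CPUs (same IPC).
   ref_seconds / ref_mhz describe the reference machine,
   other_seconds is the run time on the machine being estimated. */
double estimate_clock_mhz(double ref_seconds, double ref_mhz,
                          double other_seconds)
{
    assert(other_seconds > 0.0);
    return ref_seconds / other_seconds * ref_mhz;
}
```

With the numbers from the two runs, estimate_clock_mhz(218.0, 2200.0, 90.0) lands just under 5329, matching the figure quoted above. If the newer CPU does more per clock than the T7500, its real frequency will be lower than this estimate.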

Results for 'Monstrous Jesters' revision B on my laptop: T7500 2200MHz (4MB L2 cache), 4GB dual-channel DDR2 667MHz, Windows 7 64-bit:

Test #1: MEMMEM

OSHO.TXT:

SHORT-SHOWDOWN_Intel_O3_64bit.exe:
[
Railgun_Quadruplet_7Tridentx64 49 i.e. average performance: 2725KB/clock
Railgun_Quadruplet_7Tridentx64 49 total Skip-Performance/Iterations: 2708288/6416464496

BNDM_64 49 i.e. average performance: 2524KB/clock
BNDM_64 49 total Skip-Performance/Iterations: 2779920/6213485968

Railgun_Quadruplet_7Elsiane 49 i.e. average performance: 2122KB/clock
Railgun_Quadruplet_7Elsiane 49 total Skip-Performance/Iterations: 1880784/8251788448

Railgun_Quadruplet_7Hasherezade 49 i.e. average performance: 2352KB/clock
Railgun_Quadruplet_7Hasherezade 49 total Skip-Performance/Iterations: 2701232/6466619104
]

strstr_SHORT-SHOWDOWN_Microsoft_v16_Ox_64bit.exe:
[
Railgun_Quadruplet_7Tridentx64 49 i.e. average performance: 2689KB/clock
Railgun_Quadruplet_7Tridentx64 49 total Skip-Performance/Iterations: 2708288/6416464496

BNDM_64 49 i.e. average performance: 2414KB/clock
BNDM_64 49 total Skip-Performance/Iterations: 2779920/6213485968

Railgun_Quadruplet_7Elsiane 49 i.e. average performance: 1737KB/clock
Railgun_Quadruplet_7Elsiane 49 total Skip-Performance/Iterations: 1880784/8251788448

Railgun_Quadruplet_7Hasherezade 49 i.e. average performance: 2565KB/clock
Railgun_Quadruplet_7Hasherezade 49 total Skip-Performance/Iterations: 2701232/6466619104
]

strstr_SHORT-SHOWDOWN_Microsoft_v16_Ox_32bit.exe:
[
Railgun_Quadruplet_7Tridentx64 49 i.e. average performance: 2947KB/clock
Railgun_Quadruplet_7Tridentx64 49 total Skip-Performance/Iterations: 2708288/6416464496

BNDM_64 49 i.e. average performance: 2201KB/clock
BNDM_64 49 total Skip-Performance/Iterations: 2779920/6213485968

Railgun_Quadruplet_7Elsiane 49 i.e. average performance: 1593KB/clock
Railgun_Quadruplet_7Elsiane 49 total Skip-Performance/Iterations: 1880784/8251788448

Railgun_Quadruplet_7Hasherezade 49 i.e. average performance: 2958KB/clock
Railgun_Quadruplet_7Hasherezade 49 total Skip-Performance/Iterations: 2701232/6466619104
]

hs_alt_HuRef_chr1.fa:

SHORT-SHOWDOWN_Intel_O3_64bit.exe:
[
Railgun_Quadruplet_7Tridentx64 49 i.e. average performance: 2711KB/clock
Railgun_Quadruplet_7Tridentx64 49 total Skip-Performance/Iterations: 2634368/7091550000

BNDM_64 49 i.e. average performance: 3535KB/clock
BNDM_64 49 total Skip-Performance/Iterations: 2806144/6595760528

Railgun_Quadruplet_7Elsiane 49 i.e. average performance: 2636KB/clock
Railgun_Quadruplet_7Elsiane 49 total Skip-Performance/Iterations: 2540592/9256480624

Railgun_Quadruplet_7Hasherezade 49 i.e. average performance: 2397KB/clock
Railgun_Quadruplet_7Hasherezade 49 total Skip-Performance/Iterations: 2691888/7089590528
]

strstr_SHORT-SHOWDOWN_Microsoft_v16_Ox_64bit.exe:
[
Railgun_Quadruplet_7Tridentx64 49 i.e. average performance: 2868KB/clock
Railgun_Quadruplet_7Tridentx64 49 total Skip-Performance/Iterations: 2634368/7091550000

BNDM_64 49 i.e. average performance: 3397KB/clock
BNDM_64 49 total Skip-Performance/Iterations: 2806144/6595760528

Railgun_Quadruplet_7Elsiane 49 i.e. average performance: 2266KB/clock
Railgun_Quadruplet_7Elsiane 49 total Skip-Performance/Iterations: 2540592/9256480624

Railgun_Quadruplet_7Hasherezade 49 i.e. average performance: 2592KB/clock
Railgun_Quadruplet_7Hasherezade 49 total Skip-Performance/Iterations: 2691888/7089590528
]

strstr_SHORT-SHOWDOWN_Microsoft_v16_Ox_32bit.exe:
[
Railgun_Quadruplet_7Tridentx64 49 i.e. average performance: 2977KB/clock
Railgun_Quadruplet_7Tridentx64 49 total Skip-Performance/Iterations: 2634368/7091550000

BNDM_64 49 i.e. average performance: 3131KB/clock
BNDM_64 49 total Skip-Performance/Iterations: 2806144/6595760528

Railgun_Quadruplet_7Elsiane 49 i.e. average performance: 2052KB/clock
Railgun_Quadruplet_7Elsiane 49 total Skip-Performance/Iterations: 2540592/9256480624

Railgun_Quadruplet_7Hasherezade 49 i.e. average performance: 3035KB/clock
Railgun_Quadruplet_7Hasherezade 49 total Skip-Performance/Iterations: 2691888/7089590528
]

Test #2: LZ Yappy

Yappy_Intel_32bit_O3.exe: comp 29.9 MB/s uncomp 512.5 MB/s
Yappy_Intel_32bit_Ox.exe: comp 33.1 MB/s uncomp 513.0 MB/s
Yappy_Microsoft_32bit_Ox.exe: comp 32.3 MB/s uncomp 527.1 MB/s

Test #3: qpress

Kazuya_PTHREADed: DEFAULT_THREAD_COUNT: 2
Kazuya_PTHREADed: Decompression RAM-to-RAM performance: 505MB/s

Kazuya_PTHREADed: DEFAULT_THREAD_COUNT: 4
Kazuya_PTHREADed: Decompression RAM-to-RAM performance: 505MB/s

Kazuya_PTHREADed: DEFAULT_THREAD_COUNT: 6
Kazuya_PTHREADed: Decompression RAM-to-RAM performance: 505MB/s

Kazuya_PTHREADed: DEFAULT_THREAD_COUNT: 8
Kazuya_PTHREADed: Decompression RAM-to-RAM performance: 486MB/s

Kazuya_PTHREADed: DEFAULT_THREAD_COUNT: 12
Kazuya_PTHREADed: Decompression RAM-to-RAM performance: 467MB/s

Kazuya_PTHREADed: DEFAULT_THREAD_COUNT: 24
Kazuya_PTHREADed: Decompression RAM-to-RAM performance: 450MB/s

Kazuya_PTHREADed: DEFAULT_THREAD_COUNT: 32
Kazuya_PTHREADed: Decompression RAM-to-RAM performance: 467MB/s

Kazuya_PTHREADed: DEFAULT_THREAD_COUNT: 48
Kazuya_PTHREADed: Decompression RAM-to-RAM performance: 332MB/s

Test #4: LZMM

lzpre2_32bit_Microsoft_Ox.exe: 29.25 sec
lzpre2_x64_Intel_O3.exe: 26.74 sec
lzpre2_x64_Microsoft_Ox.exe: 27.10 sec

Test #5: Quicksort

Simplicius_Simplicissimus_Septupleton_Intel_32bit_v12_Ox.exe:
Sort took: 196062 clocks
Decompression to RAM without Dumping to DRIVE performance: 174943 KB/s or 170 MB/s
Benchmarking 'memcpy' by copying 197MB (OSHO.TXT size) ten times ...
Simplicius says for 'memcpy' performance: 1802 MB/s

Simplicius_Simplicissimus_Septupleton_Microsoft_32bit_v16_Ox.exe:
Sort took: 220819 clocks
Decompression to RAM without Dumping to DRIVE performance: 212247 KB/s or 207 MB/s
Benchmarking 'memcpy' by copying 197MB (OSHO.TXT size) ten times ...
Simplicius says for 'memcpy' performance: 1418 MB/s
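
The 'memcpy' figure above comes from copying a buffer several times and dividing the bytes moved by the elapsed time. A minimal sketch of that kind of measurement follows; it is not the Simplicius code, the name memcpy_mb_per_s is hypothetical, and using the standard clock() as the timer is my assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Copies a buffer of `bytes` bytes `repeats` times and returns MB/s.
   Returns -1.0 on allocation failure or a bad copy, and 0.0 if the run
   was too short for clock() to measure. */
double memcpy_mb_per_s(size_t bytes, int repeats)
{
    assert(bytes > 0 && repeats > 0);

    unsigned char *src = malloc(bytes);
    unsigned char *dst = malloc(bytes);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }

    /* Touch both buffers first so page mapping is not part of the timing. */
    memset(src, 0xA5, bytes);
    memset(dst, 0, bytes);

    clock_t start = clock();
    for (int i = 0; i < repeats; i++)
        memcpy(dst, src, bytes);
    clock_t ticks = clock() - start;

    /* Verify the copy; this also keeps the result observable. */
    int ok = memcmp(dst, src, bytes) == 0;
    free(src);
    free(dst);

    if (!ok) return -1.0;
    if (ticks <= 0) return 0.0;

    double seconds = (double)ticks / CLOCKS_PER_SEC;
    return (double)bytes * repeats / (1024.0 * 1024.0) / seconds;
}
```

To mirror the run above, one would call it with roughly 197MB and 10 repeats. An optimizing compiler may merge identical back-to-back copies, so treat the number as a rough indication and check the generated code before trusting it.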

Test #6: Knight Tours

Knight-tour_Microsoft_V16_32bit_Ox.exe: 218.13 seconds
Knight-tour_Intel_V12_32bit_Ox.exe: 227.73 seconds

I hope the above results are a good (though at the same time poor) starting point for getting a feel for how far Core 2 lags behind the newer architectures.

rickss69 03-22-2012 08:05 AM

My runs were with the gamer which has no overclock atm (2600K).


All times are GMT -10.