'Monstrous Jesters' benchmark package
For a long time, looking at published tests, I couldn't find answers to some very basic (but important) aspects of CPU/RAM performance.
I am talking about sorting/decompressing/searching performed by console tools written in C. I have no opportunity to run my tests on a real powerhouse, and this limits my quest to write the fastest memmem function in C, because i5/i7 behave very differently from Core 2 when it comes to fetching 1/2/4 bytes. Functions already tuned for one CPU/RAM system are no longer superior on a newer system, so retuning them demands intensive testing.
You are all welcome to use my latest benchmark (an NSIS installation) at:
http://www.sanmayce.com/Downloads/index.html#Jesters

'Monstrous Jesters' benchmark package, short overview:
This is my latest 32bit/64bit (strstr-showdown included) CPU/RAM benchmark package (an NSIS installation).
File: Monstrous_Jesters.exe
Size: 153 MB (161,009,933 bytes)
Size unpacked: 500 MB
Size needed: 1200 MB
After installation 5 shortcuts (tests) are placed on Desktop/Programs.
http://www.sanmayce.com/Downloads/Monstrous_Jesters.png
All tests are written in C (sources included) and compiled with the latest Intel 12.1 and Microsoft 16 optimizers.
The MEMMEM test (strstr-showdown) takes some 21 minutes to complete on Core2Duo_E7500_2.93Ghz.
Of course, in order to obtain decent results, stop all concurrent processes before running the test. Also enable 100% computing power.
There are some additional tests as well (Intel 12.1 and Microsoft 16 executables included):
- lzpre, an LZ77 32bit/64bit [de]compressor, written by Matt Mahoney;
- Yappy, an LZ 32bit/64bit [de]compressor, written by IronPeter;
- Knight tour benchmark, finds the first 9,000,000 tours (at a rate of some 1 billion jumps per minute), in fact it tests/stresses only the CPU clock;
- Quicksort 32bit/64bit, used to sort 200,000,000+ pointers (pointing to 7-byte chunks).
I would also be glad to get some feedback and results from your machines.
Enjoy! |
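For readers who have not met this kind of function: a memmem-style search that skips ahead using a per-byte shift table can be sketched as follows. This is a plain Boyer-Moore-Horspool search, not the Railgun or BNDM code shipped in the package; the name `horspool_memmem` is made up for this illustration.

```c
#include <stddef.h>
#include <string.h>
#include <limits.h>

/* Boyer-Moore-Horspool search over raw bytes (illustrative sketch only,
   not the package's Railgun/BNDM functions).
   Returns a pointer to the first match, or NULL if there is none. */
const void *horspool_memmem(const void *haystack, size_t hlen,
                            const void *needle, size_t nlen)
{
    const unsigned char *h = haystack;
    const unsigned char *n = needle;
    size_t skip[UCHAR_MAX + 1];
    size_t i, pos;

    if (nlen == 0) return haystack;   /* empty needle matches at the start */
    if (nlen > hlen) return NULL;

    /* Default shift: the whole needle length. */
    for (i = 0; i <= UCHAR_MAX; i++) skip[i] = nlen;
    /* Bytes occurring in the needle (except its last byte) allow a shorter shift. */
    for (i = 0; i + 1 < nlen; i++) skip[n[i]] = nlen - 1 - i;

    pos = 0;
    while (pos <= hlen - nlen) {
        /* Check the last byte first, then the rest of the window. */
        if (h[pos + nlen - 1] == n[nlen - 1] &&
            memcmp(h + pos, n, nlen - 1) == 0)
            return h + pos;
        /* Shift by the table entry of the window's last byte. */
        pos += skip[h[pos + nlen - 1]];
    }
    return NULL;
}
```

Calling `horspool_memmem(text, text_len, "quick", 5)` returns the address of the first occurrence. How fast the shift table gets consulted versus how fast the bytes arrive from RAM is the kind of balance that changes from one CPU generation to the next.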
Cool I'll give it a shot on my x79
|
I rely on you Splave, take your time I have been waiting years so I am not in a hurry.
Feel free to ask whatever interests you. |
I will take a look at it this week, if I like it I will toss it into my next review :)
|
Thank you Neuromancer.
|
Are you looking for SB and SB-E testing only, or would a 980x system be helpful as well?
|
5 Attachment(s)
No clue what all this means...or if I even did it correctly.
|
Quote:
Wow, the three i7 systems will do perfectly. I am not picky as long as an i7 is involved; nevertheless the latest Sandy Bridge-E is going to quench the greediness in me nicely. I am very interested in how these super low memory latencies in SB are going to affect my MEMMEM functions (they stress memory bandwidth along with physical RAM IOPS, i.e. they are latency bound). A week ago I saw a 5GHz SB with 22GB/s memory read bandwidth; my miserable old laptop gives 5GB/s, whereas my MEMMEM functions work at 3-4GB/s, so do the math on how close they are to the limit. Therefore the thing that would make my eyes happy is a machine with a high performance CPU-RAM bus, maybe triple channel is the answer (the above mentioned 22GB/s were achieved with an i7 2700K @ 4.9GHz (1.420V) 24/7, max 69C; 4 x 4GB Samsung Extreme Low Voltage 1866MHz @ 8-9-9-24-1T at 1.5V).
Just uploaded revision B of 'Monstrous Jesters' - a new multi-threaded test (up to 48 threads stressing RAM/cores) was added.
Thank you MaadDaawg for your readiness to help me. |
Quote:
Last night I ran the new Revision B on my T7500 2200MHz dual channel DDR2 667MHz:
http://www.sanmayce.com/Downloads/Mo...rB_2_T7500.png
Looking at the Knight Tours test, your/my results are 90s/218s. Let me guess: your CPU runs at 218/90*2200MHz = 5328MHz, or am I wrong?

Results for 'Monstrous Jesters' revision B on my laptop T7500 2200MHz (4MB L2 cache) 4GB dual channel DDR2 667MHz using Windows 7 64bit:

Test #1: MEMMEM

OSHO.TXT:

SHORT-SHOWDOWN_Intel_O3_64bit.exe:
[
Railgun_Quadruplet_7Tridentx64 49 i.e. average performance: 2725KB/clock
Railgun_Quadruplet_7Tridentx64 49 total Skip-Performance/Iterations: 2708288/6416464496
BNDM_64 49 i.e. average performance: 2524KB/clock
BNDM_64 49 total Skip-Performance/Iterations: 2779920/6213485968
Railgun_Quadruplet_7Elsiane 49 i.e. average performance: 2122KB/clock
Railgun_Quadruplet_7Elsiane 49 total Skip-Performance/Iterations: 1880784/8251788448
Railgun_Quadruplet_7Hasherezade 49 i.e. average performance: 2352KB/clock
Railgun_Quadruplet_7Hasherezade 49 total Skip-Performance/Iterations: 2701232/6466619104
]
strstr_SHORT-SHOWDOWN_Microsoft_v16_Ox_64bit.exe:
[
Railgun_Quadruplet_7Tridentx64 49 i.e. average performance: 2689KB/clock
Railgun_Quadruplet_7Tridentx64 49 total Skip-Performance/Iterations: 2708288/6416464496
BNDM_64 49 i.e. average performance: 2414KB/clock
BNDM_64 49 total Skip-Performance/Iterations: 2779920/6213485968
Railgun_Quadruplet_7Elsiane 49 i.e. average performance: 1737KB/clock
Railgun_Quadruplet_7Elsiane 49 total Skip-Performance/Iterations: 1880784/8251788448
Railgun_Quadruplet_7Hasherezade 49 i.e. average performance: 2565KB/clock
Railgun_Quadruplet_7Hasherezade 49 total Skip-Performance/Iterations: 2701232/6466619104
]
strstr_SHORT-SHOWDOWN_Microsoft_v16_Ox_32bit.exe:
[
Railgun_Quadruplet_7Tridentx64 49 i.e. average performance: 2947KB/clock
Railgun_Quadruplet_7Tridentx64 49 total Skip-Performance/Iterations: 2708288/6416464496
BNDM_64 49 i.e. average performance: 2201KB/clock
BNDM_64 49 total Skip-Performance/Iterations: 2779920/6213485968
Railgun_Quadruplet_7Elsiane 49 i.e. average performance: 1593KB/clock
Railgun_Quadruplet_7Elsiane 49 total Skip-Performance/Iterations: 1880784/8251788448
Railgun_Quadruplet_7Hasherezade 49 i.e. average performance: 2958KB/clock
Railgun_Quadruplet_7Hasherezade 49 total Skip-Performance/Iterations: 2701232/6466619104
]

hs_alt_HuRef_chr1.fa:

SHORT-SHOWDOWN_Intel_O3_64bit.exe:
[
Railgun_Quadruplet_7Tridentx64 49 i.e. average performance: 2711KB/clock
Railgun_Quadruplet_7Tridentx64 49 total Skip-Performance/Iterations: 2634368/7091550000
BNDM_64 49 i.e. average performance: 3535KB/clock
BNDM_64 49 total Skip-Performance/Iterations: 2806144/6595760528
Railgun_Quadruplet_7Elsiane 49 i.e. average performance: 2636KB/clock
Railgun_Quadruplet_7Elsiane 49 total Skip-Performance/Iterations: 2540592/9256480624
Railgun_Quadruplet_7Hasherezade 49 i.e. average performance: 2397KB/clock
Railgun_Quadruplet_7Hasherezade 49 total Skip-Performance/Iterations: 2691888/7089590528
]
strstr_SHORT-SHOWDOWN_Microsoft_v16_Ox_64bit.exe:
[
Railgun_Quadruplet_7Tridentx64 49 i.e. average performance: 2868KB/clock
Railgun_Quadruplet_7Tridentx64 49 total Skip-Performance/Iterations: 2634368/7091550000
BNDM_64 49 i.e. average performance: 3397KB/clock
BNDM_64 49 total Skip-Performance/Iterations: 2806144/6595760528
Railgun_Quadruplet_7Elsiane 49 i.e. average performance: 2266KB/clock
Railgun_Quadruplet_7Elsiane 49 total Skip-Performance/Iterations: 2540592/9256480624
Railgun_Quadruplet_7Hasherezade 49 i.e. average performance: 2592KB/clock
Railgun_Quadruplet_7Hasherezade 49 total Skip-Performance/Iterations: 2691888/7089590528
]
strstr_SHORT-SHOWDOWN_Microsoft_v16_Ox_32bit.exe:
[
Railgun_Quadruplet_7Tridentx64 49 i.e. average performance: 2977KB/clock
Railgun_Quadruplet_7Tridentx64 49 total Skip-Performance/Iterations: 2634368/7091550000
BNDM_64 49 i.e. average performance: 3131KB/clock
BNDM_64 49 total Skip-Performance/Iterations: 2806144/6595760528
Railgun_Quadruplet_7Elsiane 49 i.e. average performance: 2052KB/clock
Railgun_Quadruplet_7Elsiane 49 total Skip-Performance/Iterations: 2540592/9256480624
Railgun_Quadruplet_7Hasherezade 49 i.e. average performance: 3035KB/clock
Railgun_Quadruplet_7Hasherezade 49 total Skip-Performance/Iterations: 2691888/7089590528
]

Test #2: LZ Yappy
Yappy_Intel_32bit_O3.exe: comp 29.9 MB/s uncomp 512.5 MB/s
Yappy_Intel_32bit_Ox.exe: comp 33.1 MB/s uncomp 513.0 MB/s
Yappy_Microsoft_32bit_Ox.exe: comp 32.3 MB/s uncomp 527.1 MB/s

Test #3: qpress
Kazuya_PTHREADed: DEFAULT_THREAD_COUNT: 2
Kazuya_PTHREADed: Decompression RAM-to-RAM performance: 505MB/s
Kazuya_PTHREADed: DEFAULT_THREAD_COUNT: 4
Kazuya_PTHREADed: Decompression RAM-to-RAM performance: 505MB/s
Kazuya_PTHREADed: DEFAULT_THREAD_COUNT: 6
Kazuya_PTHREADed: Decompression RAM-to-RAM performance: 505MB/s
Kazuya_PTHREADed: DEFAULT_THREAD_COUNT: 8
Kazuya_PTHREADed: Decompression RAM-to-RAM performance: 486MB/s
Kazuya_PTHREADed: DEFAULT_THREAD_COUNT: 12
Kazuya_PTHREADed: Decompression RAM-to-RAM performance: 467MB/s
Kazuya_PTHREADed: DEFAULT_THREAD_COUNT: 24
Kazuya_PTHREADed: Decompression RAM-to-RAM performance: 450MB/s
Kazuya_PTHREADed: DEFAULT_THREAD_COUNT: 32
Kazuya_PTHREADed: Decompression RAM-to-RAM performance: 467MB/s
Kazuya_PTHREADed: DEFAULT_THREAD_COUNT: 48
Kazuya_PTHREADed: Decompression RAM-to-RAM performance: 332MB/s

Test #4: LZMM
lzpre2_32bit_Microsoft_Ox.exe: 29.25 sec
lzpre2_x64_Intel_O3.exe: 26.74 sec
lzpre2_x64_Microsoft_Ox.exe: 27.10 sec

Test #5: Quicksort
Simplicius_Simplicissimus_Septupleton_Intel_32bit_v12_Ox.exe:
Sort took: 196062 clocks
Decompression to RAM without Dumping to DRIVE performance: 174943 KB/s or 170 MB/s
Benchmarking 'memcpy' by copying 197MB (OSHO.TXT size) ten times ...
Simplicius says for 'memcpy' performance: 1802 MB/s
Simplicius_Simplicissimus_Septupleton_Microsoft_32bit_v16_Ox.exe:
Sort took: 220819 clocks
Decompression to RAM without Dumping to DRIVE performance: 212247 KB/s or 207 MB/s
Benchmarking 'memcpy' by copying 197MB (OSHO.TXT size) ten times ...
Simplicius says for 'memcpy' performance: 1418 MB/s

Test #6: Knight Tours
Knight-tour_Microsoft_V16_32bit_Ox.exe: 218.13 seconds
Knight-tour_Intel_V12_32bit_Ox.exe: 227.73 seconds

I hope the above results are a good (though at the same time poor) starting point to get a feel for how far Core 2 lags behind the new architectures. |
My runs were with the gamer which has no overclock atm (2600K).
|
Quote:
|
Just looked at:
http://ark.intel.com/

Sandy Bridge-E:
Processor Number: i7-3930K
# of Cores: 6
# of Threads: 12
Clock Speed: 3.2 GHz
Max Turbo Frequency: 3.8 GHz
Intel Smart Cache: 12 MB
Lithography: 32 nm
# of Memory Channels: 4
Max Memory Bandwidth: 51.2 GB/s

Sandy Bridge-E:
Processor Number: i7-3820
# of Cores: 4
# of Threads: 8
Clock Speed: 3.6 GHz
Max Turbo Frequency: 3.8 GHz
Intel Smart Cache: 10 MB
Lithography: 32 nm
# of Memory Channels: 4
Max Memory Bandwidth: 51.2 GB/s

Gulftown:
Processor Number: i7-980X
# of Cores: 6
# of Threads: 12
Clock Speed: 3.33 GHz
Max Turbo Frequency: 3.6 GHz
Intel Smart Cache: 12 MB
Lithography: 32 nm
# of Memory Channels: 3
Max Memory Bandwidth: 25.6 GB/s

Sandy Bridge:
Processor Number: i7-2700K
# of Cores: 4
# of Threads: 8
Clock Speed: 3.5 GHz
Max Turbo Frequency: 3.9 GHz
Intel Smart Cache: 8 MB
Lithography: 32 nm
# of Memory Channels: 2
Max Memory Bandwidth: 21 GB/s

Looking at the Max Memory Bandwidths (51.2 GB/s vs 25.6 GB/s), one cannot help asking how Intel doubled the performance by going from 3 channels to 4; if dummy math is done it should have taken 6 channels. |
Bandwidth doubled over X58 because of the limitations of the socket 1366 IMC (integrated memory controller).
Notice that Sandy Bridge almost equals X58 bandwidth despite having only dual channel memory. |
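The ARK bandwidth figures quoted above can be reproduced with the usual peak formula: channels x 8 bytes per 64-bit transfer x transfers per second. A minimal sketch follows; the memory speeds fed into it (DDR3-1600 for Sandy Bridge-E, DDR3-1066 for Gulftown, DDR3-1333 for Sandy Bridge) are assumptions about the officially supported speed of each platform and are not stated in the thread, and `peak_bandwidth_gbs` is a name made up for this illustration.

```c
/* Theoretical peak memory bandwidth in GB/s (1 GB = 10^9 bytes):
   channels x 8 bytes per 64-bit transfer x mega-transfers per second.
   The DDR3 speed grade passed in is the caller's assumption. */
double peak_bandwidth_gbs(int channels, double mega_transfers_per_sec)
{
    return channels * 8.0 * mega_transfers_per_sec / 1000.0;
}
```

Under those assumptions, `peak_bandwidth_gbs(4, 1600.0)` gives 51.2, `peak_bandwidth_gbs(3, 1066.67)` gives about 25.6 and `peak_bandwidth_gbs(2, 1333.33)` gives about 21.3. So the doubling would come from one extra channel plus a faster supported memory speed, not from channels alone.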
Thanks, I read from time-to-time articles about whole platforms but I must admit I have no experience except my old AMD Barton (the fastest 32bit CPU ever made I believe) and my nowadays Core 2 laptop, I have so much to learn: it is shocking to see how i7 boosts even the clean code (no RAM loads) loops as in Knight Tours benchmark.
|
Bartons were awesome.
Never had one, I ran T-Breds.. then moved on to A64, then back to P3, then back to A64.. then actually ran the Core2 arch for a little bit (hated it), back to AM2+, then AM3 and Intel X58 setups (skipped P55). Intel had a LONG period of time when they sucked but still rocked the benchmarks. The Core2 arch was terrible compared to AMD, but superpied better so everyone drooled over it. X58 was GREAT. And IMHO probably better than Sandy Bridge except in power consumption. X58 was snappy. Sandy Bridge not so much. (Yes, it benches better.) Going to fire up the x79 tomorrow... so we will see... In car analogies: the AMD is the ricer, quicker off the line, but it ain't a drag car.... The Intel is the top speed car (like the Bugatti Veyron needing a 13 mile track with a 5 mile arrow-straight line to hit top speed). Then again, if what I see is true, x79 should rock my world. Sub 40ns mem latency might be the key. |
>Then again if what I see is true x79 should rock my world. sub 40ns mem latency might be the key.
Double yes. In my limited view the roadmap for both AMD and Intel (aside from making a fat CPU/GPU mix sharing one common memory!!!) is to continue this trend of drastically lowering latencies - call me delusional but I think/dream of 10ns latency for main RAM, whereas L1/L2/L3 are going to be something like 1ns/2ns/3ns - bold, huh. That is why I directed my efforts towards fine tuning functions that fetch small unaligned chunks in burst (i.e. sequential) mode - this being the real BOOST of i5/i7 over all the old architectures. For that reason I included a heavy Quicksort test sorting 7-byte chunks, to show how much better the i7 behaves compared to its inferiors, he-he.
And just a note about the 'qpress' benchmark: when the resultant text file is loaded into Notepad the text is not formatted, because the lines end with LF (the *nix format) and not with CRLF as Windows users expect. To obtain a Windows-like text file just load the file into Wordpad and save - that will do the conversion. |
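To make "sorting pointers to 7-byte chunks" concrete, here is a minimal sketch using the standard `qsort`. It is not the package's own Quicksort; the helper name `sort_chunk_pointers` is hypothetical, and taking one pointer per byte offset (overlapping windows) is an assumption, suggested by the sorted count 206,908,943 being exactly the OSHO.TXT size 206,908,949 minus 7 plus 1.

```c
#include <stdlib.h>
#include <string.h>

#define CHUNK_LEN 7   /* 7-byte chunks, as in the Quicksort test */

/* qsort comparator: each array element is a pointer to a 7-byte chunk. */
static int compare_chunks(const void *a, const void *b)
{
    const unsigned char *const *pa = a;
    const unsigned char *const *pb = b;
    return memcmp(*pa, *pb, CHUNK_LEN);
}

/* Hypothetical helper: builds one pointer per byte offset (overlapping,
   mostly unaligned 7-byte windows) and sorts the pointers by chunk content.
   Returns NULL if the text is shorter than 7 bytes or allocation fails;
   the caller frees the result. */
const unsigned char **sort_chunk_pointers(const unsigned char *text,
                                          size_t len, size_t *count)
{
    const unsigned char **ptrs;
    size_t i, n;

    *count = 0;
    if (len < CHUNK_LEN) return NULL;
    n = len - CHUNK_LEN + 1;
    ptrs = malloc(n * sizeof *ptrs);
    if (ptrs == NULL) return NULL;
    for (i = 0; i < n; i++) ptrs[i] = text + i;   /* one pointer per offset */
    qsort(ptrs, n, sizeof *ptrs, compare_chunks);
    *count = n;
    return ptrs;
}
```

Every comparison dereferences two pointers that land at arbitrary, mostly unaligned addresses, which is why this workload leans on memory latency and unaligned fetch speed more than on raw clock.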
Quote:
LF=Line feed and CRLF= carriage return line feed. If so, seems odd to me that windows would need to add LF at all after CR.... |
Yes it is odd and retarded, NOTEPAD is to be blamed, not to be able to load properly text files from the LF world (*nix) in my opinion is on purpose - to show that DOS/Windows CRLF endings are to stay, kind of stupid pride.
In fact, qpress uses the *nix format, so a CR should be prefixed to each LF so that the 'proud-in-its-stupidity' NOTEPAD can catch up with the 21st century. Anyway, in the next revision C of MJ I plan to convert qpress.txt with a tiny tool written in C before loading it into NOTEPAD.
I also plan to add a 7th test: ZPAQ - one of the most powerful compressors on the INTERNET; on top of that it is free, open source, and not encumbered by patents. Its author Dr. Matt Mahoney is a renowned expert in the compression craft. ZPAQ is multi-threaded and stresses both CPU and RAM well, being highly cache sensitive/dependent. All in all it shows the integer (i.e. non floating point) computational power of modern systems.
If anyone has the time and will to send me ZIPed resultant text files from the six tests along with CPU/RAM info I will be thankful. My desire is to make a comparative study (a table or something similar) and to place it here as well.
The analysis is based on result ratios across different systems, for example for one of the fastest single-threaded Lempel-Ziv [de]compressors (here dealing with a 197MB English text file):

T7500:
Yappy_Intel_32bit_O3.exe: comp 29.9 MB/s uncomp 512.5 MB/s
Yappy_Intel_32bit_Ox.exe: comp 33.1 MB/s uncomp 513.0 MB/s
Yappy_Microsoft_32bit_Ox.exe: comp 32.3 MB/s uncomp 527.1 MB/s

i7 2600K:
Yappy_Intel_32bit_O3.exe: comp 52.9 MB/s uncomp 1362.2 MB/s
Yappy_Intel_32bit_Ox.exe: comp 57.5 MB/s uncomp 1362.2 MB/s
Yappy_Microsoft_32bit_Ox.exe: comp 54.8 MB/s uncomp 1385.9 MB/s

The way the ratios change is very interesting (it tells something important, worth knowing):
54.8:32.3 = 1.7 is highly different from 1385.9:527.1 = 2.6
or if you prefer
527.1:32.3 = 16.3 and 1385.9:54.8 = 25.3
In my view dummy math screams well here. |
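The ratio comparison on the Yappy figures can be written down as a few lines of C. This is only a sketch of the arithmetic; the struct and function names are made up for this illustration.

```c
/* One Yappy result line: compression and decompression speed in MB/s. */
struct lz_result {
    double comp_mbs;
    double uncomp_mbs;
};

/* How many times faster the new system compresses than the old one. */
double comp_speedup(struct lz_result old_sys, struct lz_result new_sys)
{
    return new_sys.comp_mbs / old_sys.comp_mbs;
}

/* How many times faster the new system decompresses than the old one. */
double uncomp_speedup(struct lz_result old_sys, struct lz_result new_sys)
{
    return new_sys.uncomp_mbs / old_sys.uncomp_mbs;
}

/* Decompression-to-compression speed ratio on a single system. */
double asymmetry(struct lz_result r)
{
    return r.uncomp_mbs / r.comp_mbs;
}
```

Feeding in the Microsoft 32bit lines, {32.3, 527.1} for the T7500 and {54.8, 1385.9} for the i7 2600K, gives a compression speed-up of about 1.70 against a decompression speed-up of about 2.63, i.e. the newer system gains much more on the decompression side.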
I am heading out right now... when i get home I will run it on my stock thuban, tonight hopefully I will be hooking up an x79 system, although I have to finish up a dual channel ram kit before I move to quad channel to start the x79 review.
I know, im slow... EDIT: setting up download now since its going to take 6 minutes lol BTW, might want to clean up your site a bit.. dont know if you have a page limit or something on your host but I had to do a word search for Monstrous_Jesters.exe to find the download link. http://www.sanmayce.com/Downloads/Mo...revision_B.zip for anyone else looking for it. |
Just got a copy here and I'll do some runs with my 960T and Win 7 to see how it does. :cool3:
|
AMD 1090T, single threaded test so cores were hitting 3.6GHz, 1600 mem at 9-9-9 with 2400 CPUNB.
Took longer to clean up the TXT file than it did to run the test. It also seems weird that the more times it found a phrase, the worse the performance was: it went from 2500/s for 6 hits up to 6000/s for 0 hits... but here you go Quote:
TEst2 YZ YAppy Quote:
Quote:
test4 lzmm Quote:
test5 quicksort Quote:
Quote:
|
Thanks Neuromancer.
>... might want to clean up your site a bit ...
Yeah, you are right, I piled up all kinds of stuff in a mumbo-jumbo manner, but I provided quick links/tags to ease the pain, such as the following, which is the home-page/tag of the 'Monstrous Jesters' package:
http://www.sanmayce.com/Downloads/index.html#Jesters
Last night I replaced rev. B with rev. C (adding ZPAQ as a 7th test, and converting qpress.txt to CRLF).
If you are interested, here is the converter:

// LF2CRLF.C written by Kaze
#include <stdio.h>
#include <stdlib.h>   // exit()

#define LF 10
#define CR 13

int main(int argc, char **argv)
{
    FILE *in;
    FILE *out;
    char buffer[1];
    char PrevChar[1];

    if (argc != 3) {
        printf("Usage: LF2CRLF infile outfile\n");
        exit(13);
    }
    if ((in = fopen(argv[1], "rb")) == NULL) {
        printf("Can't open %s\n", argv[1]);
        exit(1);
    }
    if ((out = fopen(argv[2], "wb")) == NULL) {
        printf("Can't open %s\n", argv[2]);
        exit(2);
    }

    PrevChar[0] = 0;
    while (fread(buffer, sizeof(char), 1, in) == 1) {
        // Add a CR before the LF only if the previous char was not CR
        if (buffer[0] == LF && PrevChar[0] != CR)
            fputc(CR, out);
        fputc(buffer[0], out);
        PrevChar[0] = buffer[0];
    }

    // Close both files so the output is flushed to disk
    fclose(out);
    fclose(in);
    return 0;
}

Thanks Bones. I am very glad for your readiness to help me. |
Thanks a lot Neuromancer, I regret that I didn't say exactly how to gather results; you did a lot of editing but there is no need for any, sorry for misleading you.
Something is wrong with the qpress test: Process Time = 0.483 = 339%, which suggests 4 threads?! Is this an AMD with 6 cores or 4 cores? AMD says that the 1090T has 6 cores.
http://shop.amd.com/us/All/ModelsPer...henomiix6black
You gave me some valuable information about the AMD Phenom II X6 Black (45nm, 6 cores, 512KB L2, 6144KB L3); it was a missing and needed test. I am still an AMD fan despite their recent decline.

Some quick notes:

1] Roughly speaking, I had some illusions that Railgun_Quadruplet_7Hasherezade (using a hashed approach) would shine; again the wonderful BNDM_64 eclipses the rest. I need the full dump in order to examine the exact behavior of all 4 functions across different patterns, though.
>... also seems weird that the more times it found a phrase the worse performance was ...
What matters is not the number of hits but the length (and mainly the TYPE) of the phrase. This is the cause of my affection for fine MEMMEM tuning - it needs careful analysis that takes different string ranges/lengths into account.

2] Sadly, for some reason (I am puzzled here) the Yappy test shows bad news?!
YAPPY: [b 256K] bytes 206908949 -> 95947973 46.4% comp 48.3 MB/s uncomp 1038.5 MB/s
1038.5 MB/s vs 1385.9 MB/s (on the i7 2600K tested by rickss69), nah.

3]
Kazuya_PTHREADed: DEFAULT_THREAD_COUNT: 6
Kazuya_PTHREADed: DEFAULT_COMPRESSION_LEVEL: 3
Kazuya_PTHREADed: DEFAULT_COMPRESS_CHUNK_SIZE: 524288
Kazuya_PTHREADed: Decompression RAM-to-RAM performance: 2525MB/s
A sight for sore eyes, very pleasing indeed, but I am awfully greedy: I need 4400MB/s. Why? Here is why:
One of the nifty benefits of Lasse's lightning-fast Lempel-Ziv library is that it boosts sequential reads from external storage (HDDs, SSDs). For example, if you have 520MB/s burst read (SATA III SSD), then you need a decompression rate of x MB/s in order to double the burst load/read into physical/main RAM.
The calculation is simple: assume we have those 520MB/s; then traversing OSHO.TXT (197MB) takes 197/520 = 0.378s. Going through qpress instead, reading OSHO.TXT.qp (75MB) and decompressing it takes 75/520 + 197/2525 = 0.222s, i.e. a boost of ((0.378-0.222)/0.222)*100% = 70.3%.
Now I want 2x520MB/s: this requires 0.378s/2 = 0.189s, so 75/520 + 197/x = 0.189, which gives x = 197/(0.189-(75/520)) = 4400MB/s, a dream soon to come true.
And all this is achieved using qpress (PTHREADed QuickLZ) in the dummy synchronous mode, which is slower than the asynchronous one.

4]
Intel's memcpy(): Simplicius says for 'memcpy' performance: 2676 MB/s
Microsoft's memcpy(): Simplicius says for 'memcpy' performance: 2782 MB/s
The tables are turned: on Intel CPUs the first result (Intel compiler used) is better than the second (Microsoft compiler used), whereas here the second one wins.

I don't know whether the forum allows it, but the easiest way is to attach a ZIP file of all the resultant text files which are in your NOTEPAD (it is less than 64KB), or to email this ZIP file to me at sanmayce@sanmayce.com. In future revisions (I want to gather results on some really overclocked monsters) my plan is to create a single HTML file (similar to EVEREST's report) out of all the resultant text files (7 so far) with a simple tool written in C; this way I will eliminate the torture you went through. |
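The read-then-decompress arithmetic in note 3] can be sketched in C as follows. It uses the same synchronous model as the post (read the packed file, then decompress it); both function names are made up for this illustration.

```c
/* Seconds to get a file into RAM by reading its packed form from the drive
   and then decompressing it (synchronous: the two phases do not overlap). */
double packed_load_seconds(double packed_mb, double unpacked_mb,
                           double read_mbs, double decomp_mbs)
{
    return packed_mb / read_mbs + unpacked_mb / decomp_mbs;
}

/* Decompression speed (MB/s) needed so that loading the packed file is
   `speedup` times faster than reading the unpacked file directly.
   Returns -1.0 when reading the packed file alone already takes too long. */
double required_decomp_mbs(double packed_mb, double unpacked_mb,
                           double read_mbs, double speedup)
{
    double target_s = unpacked_mb / read_mbs / speedup; /* time budget        */
    double left_s = target_s - packed_mb / read_mbs;    /* left for unpacking */
    if (left_s <= 0.0) return -1.0;
    return unpacked_mb / left_s;
}
```

With the post's numbers, `packed_load_seconds(75, 197, 520, 2525)` comes to about 0.222 s. `required_decomp_mbs(75, 197, 520, 2)` comes to roughly 4360 MB/s when no intermediate value is rounded, slightly below the 4400 MB/s obtained with three-digit intermediates.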
1 Attachment(s)
it is a 6 core cpu.
I will rerun all tests and save the unedited txt files and ul them Reran tests and uploaded |
Thanks a lot.
I am very glad that an AMD is the first CPU to be added side by side with my T7500. However, I am disappointed by the far from uncompromising performance shown by the AMD Phenom II X6 'Thuban' 1090T 6-core Black Edition. As I saw at:
http://www.futurelooks.com/the-amd-p...cessor-review/
"In a nutshell, it allows the CPU to dynamically overclock up to three of its own cores to provide extra performance. In the case of the 1090T pictured in the screenshot, we see that a couple of the cores have hit 3.6GHz, one is at 3.2GHz, which is the stock CPU speed, and the rest of them are clocked way down."
As the task manager shows, the first two cores work at 3584MHz, the third at ????MHz, the fourth at 3255MHz and the remaining two under 2000MHz!!! This is not a desktop CPU at all, grrr.
All in all I hate AMD's Turbo CORE, it is like throwing dice - not utilizing the full power due to temperature limits. As for INTEL's Turbo Boost, I don't like it either - not knowing what is going on due to dynamic resets is like selling you a car and saying "you don't need these high RPMs or torque because you cannot change gears as we do", not for me. I prefer Turbo Boost disabled during the tests.
I was under the impression that BE (Black Edition) AMD CPUs were counterparts of the X (Extreme) Intel CPUs. A variant running all its cores at full speed would be more useful - for extreme tests it is mandatory. |
I got x79 up and running, putting in the quad channel memory tonight so will run your bench again on that.
As for the turbo core, it is exactly the same way it runs on Intel. Most of the benches were bouncing around at 3.6 on my AMD stuff. It will hit 3.8 on the intel setup for single core, so far multithread testing = 3.5 across all cores. I normally disable Speed step on Intel since it gave a laggy feel in general usage. But having trouble disabling it without disabling turbo on this Gigabutt board. |
Thanks,
as for Turbo CORE/BOOST, AFAIK it is a complex internal tweak that increases not only the CPU frequency but also RAM timings and who-knows-what-else; my point is that I want to see how the CPU-RAM system responds to a particular test/program, i.e. to gather stable results.
To get a feel for how fundamental sorting is (I still don't get why major benchmarks lack it), here comes my newest phrase-checking package 'Dumbino'. Made last night, it is the first (free and open-source) English phrase-checker:
http://www.sanmayce.com/Downloads/index.html#Dumbino
In a few words: the MJ Quicksort test helps one understand how different CPU-RAM systems would behave under a really heavy load. By heavy I mean my current corpus of four-word phrases (879,557,846); the MJ Quicksort test sorts 206,908,943, and in the 'Dumbino' package I gave 140,222,335 phrases (after ripping the Google-books US n-gram corpus, 400GB in size). Now, in order to phrase-check (spell-checking uses 1-grams) an entire ebook consisting of 42,208 4-gram phrases, Dumbino mixes them with those 140,222,335 and re-sorts them, thus all familiar and unfamiliar phrases pop up in SUB-LINEAR time!
@Neuromancer: When you have time (this year) I would like to hear your opinion on this subject (monstrous phrase-checking), which has been, is and will be in my sights for a long time.
I want to salute you all with one of my favorite video-songs ever: P!nk - Funhouse, the pianist is so joyful and charming. |
Just wanted to throw a look at x79 and it is amazing it blows houses away:
Gigabyte X79 UD3, i7-3960X 4590MHz, Quad Channel at 1020MHz at 9-11-10-28 clocks: Sandra says for memory bandwidth 49GB/s, i.e. 4x12 (with limit 4x12.8), it simply silenced me. The info was taken from: http://www.ninjalane.com/reviews/mot...d3/page10.aspx |
To see how Gigabyte 990FXA UD5, AMD 1090T is positioned against the Gigabyte X79 UD3, i7-3960X from the above post:
Stock Phenom II X6 1090T 'Thuban': Core speed 3214MHz, Bus speed 200MHz, dual channel at 803MHz at 9-9-9-28 clocks
Sandra says for Integer Memory Bandwidth: 12.54GB/s

Overclocked Phenom II X6 1090T 'Thuban': Core speed 4125MHz, Bus speed 250MHz, dual channel at 1000MHz at 9-9-9-24 clocks
Sandra says for Integer Memory Bandwidth: 19.28GB/s

Stock i7 3960X 'Sandy Bridge-E': Core speed 3600MHz, Bus speed 100MHz, quad channel at 800MHz at 11-11-13-28 clocks
Sandra says for Integer Memory Bandwidth: 39GB/s

Overclocked i7 3960X 'Sandy Bridge-E': Core speed 4590MHz, Bus speed 127MHz, quad channel at 1020MHz at 9-11-10-28 clocks
Sandra says for Integer Memory Bandwidth: 49GB/s
:clapping: |
Copyright ©2009 Overclockaholics.com