It was recently revealed by ZD Net that Apple’s new Mac OS X release, dubbed Snow Leopard, would default to a 32-bit kernel despite being largely portrayed by Apple as the final step in the Mac’s journey to being a fully 64-bit OS. The reactions, as with anything Apple-related, were sheer polemic. Just check out the comments on the ZD Net article, and you’ll see what I mean. The Apple apologists played it off as if 64-bit code is pointless in the kernel, despite being indispensable in applications. The Microsoft partisans acted as if Apple had just halved the speed of the entire OS.
So, what’s the truth? I ran a few quick benchmarks to find out. To isolate the effects of the kernel from the benchmark software itself, I used a 32-bit benchmark program, XBench, so that the only thing changing between the two runs was the kernel. (My understanding of Mac internals is not great, so I hope I wasn’t making a poor assumption here.) The results were interesting. As one might expect, neither side is entirely right or wrong.
The biggest difference was in memory allocation, where the 64-bit kernel scored about 1.7 times higher (618.01 versus 369.54). The next biggest difference was in the thread benchmarks, where the 64-bit kernel scored roughly 27% higher (331.22 versus 260.69). Finally, the 64-bit kernel scored a bit over 10% higher on the sequential disk test (101.56 versus 90.51), driven mainly by large-block writes. These results seem plausible, as all involve tasks where the kernel plays a relatively large role. The rest of the benchmarks, mainly graphics and computation, showed little or no improvement, as one would also expect.
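For anyone who wants to check the arithmetic, here is a minimal Python sketch that turns the XBench scores listed at the end of this post into ratios and percentage gains. It assumes the first results block is the 64-bit kernel run and the second is the 32-bit run (an inference from which block scores higher on the thread test), and the names `SCORES`, `percent_gain`, and `summarize` are made up for this illustration. These are single runs, so treat small differences as noise.

```python
# XBench scores copied from the two results blocks at the end of this post.
# Assumption: the first block is the 64-bit kernel run, the second the 32-bit run.
SCORES = {
    # test name: (64-bit kernel score, 32-bit kernel score)
    "Memory allocate": (618.01, 369.54),
    "Thread Test": (331.22, 260.69),
    "Disk sequential": (101.56, 90.51),
    "CPU Test": (180.05, 179.50),
    "Quartz Graphics Test": (190.82, 187.39),
}


def percent_gain(k64: float, k32: float) -> float:
    """Return how much higher the 64-bit score is, as a percentage of the 32-bit score."""
    return (k64 - k32) / k32 * 100.0


def summarize(scores):
    """Return (name, ratio, percent gain) rows, largest gain first."""
    rows = [
        (name, k64 / k32, percent_gain(k64, k32))
        for name, (k64, k32) in scores.items()
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    for name, ratio, gain in summarize(SCORES):
        print(f"{name:<22} {ratio:5.2f}x  {gain:+6.1f}%")
```

Higher XBench scores are better, so a positive gain means the 64-bit kernel came out ahead on that test.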
So, while Apple isn’t doing a terrible thing by defaulting to the 32-bit kernel, you’re certainly leaving some speed on the table. This matters most for the disk transfer benchmarks, which can have a real effect on the perceived responsiveness of the computer.
The complete results for my tests are below. The test computer was a 2.53 GHz Mid-2009 MacBook Pro.

64-bit kernel:

Results                            127.31
  CPU Test                         180.05
    GCD Loop                       285.50   15.05 Mops/sec
    Floating Point Basic           145.63   3.46 Gflop/sec
    vecLib FFT                     120.72   3.98 Gflop/sec
    Floating Point Library         280.68   48.88 Mops/sec
  Thread Test                      331.22
    Computation                    500.00   10.13 Mops/sec, 4 threads
    Lock Contention                247.63   10.65 Mlocks/sec, 4 threads
  Memory Test                      200.62
    System                         255.76
      Allocate                     618.01   2.27 Malloc/sec
      Fill                         185.89   9038.61 MB/sec
      Copy                         211.31   4364.56 MB/sec
    Stream                         165.04
      Copy                         157.53   3253.68 MB/sec
      Scale                        155.20   3206.42 MB/sec
      Add                          175.03   3728.48 MB/sec
      Triad                        174.48   3732.49 MB/sec
  Quartz Graphics Test             190.82
  OpenGL Graphics Test             86.25
  User Interface Test              245.26
  Disk Test                        48.75
    Sequential                     101.56
      Uncached Write               120.97   74.27 MB/sec [4K blocks]
      Uncached Write               119.43   67.57 MB/sec [256K blocks]
      Uncached Read                64.63    18.91 MB/sec [4K blocks]
      Uncached Read                137.50   69.11 MB/sec [256K blocks]
    Random                         32.07
      Uncached Write               11.66    1.23 MB/sec [4K blocks]
      Uncached Write               77.18    24.71 MB/sec [256K blocks]
      Uncached Read                59.85    0.42 MB/sec [4K blocks]
      Uncached Read                107.70   19.98 MB/sec [256K blocks]

32-bit kernel:

Results                            122.67
  CPU Test                         179.50
    GCD Loop                       295.89   15.60 Mops/sec
    Floating Point Basic           141.66   3.37 Gflop/sec
    vecLib FFT                     120.19   3.97 Gflop/sec
    Floating Point Library         283.69   49.40 Mops/sec
  Thread Test                      260.69
    Computation                    396.28   8.03 Mops/sec, 4 threads
    Lock Contention                194.23   8.36 Mlocks/sec, 4 threads
  Memory Test                      190.01
    System                         234.38
      Allocate                     369.54   1.36 Malloc/sec
      Fill                         186.29   9057.73 MB/sec
      Copy                         211.60   4370.52 MB/sec
    Stream                         159.77
      Copy                         153.34   3167.08 MB/sec
      Scale                        150.01   3099.22 MB/sec
      Add                          169.51   3610.92 MB/sec
      Triad                        168.11   3596.34 MB/sec
  Quartz Graphics Test             187.39
  OpenGL Graphics Test             87.04
  User Interface Test              237.42
  Disk Test                        46.82
    Sequential                     90.51
      Uncached Write               118.31   72.64 MB/sec [4K blocks]
      Uncached Write               79.22    44.82 MB/sec [256K blocks]
      Uncached Read                60.05    17.57 MB/sec [4K blocks]
      Uncached Read                154.69   77.75 MB/sec [256K blocks]
    Random                         31.58
      Uncached Write               11.29    1.20 MB/sec [4K blocks]
      Uncached Write               76.69    24.55 MB/sec [256K blocks]
      Uncached Read                60.83    0.43 MB/sec [4K blocks]
      Uncached Read                116.08   21.54 MB/sec [256K blocks]