Did you know that the first 3 GHz Pentium CPU was introduced back in 2002? When you look for a state-of-the-art processor today, you’ll find the Intel Xeon “Nehalem” CPU, which runs at only 2.66 GHz. From the introduction of the Pentium processor (66 MHz, 1993), clock speeds doubled almost every two years. Roughly ten years later we hit the 3 GHz barrier, and clock speeds have hardly increased since. So what has happened since 2002? Well… Modern CPUs can do more per cycle than the Pentium 4 could, but that is not the major speed improvement. The big difference is the number of cores per CPU. The current Nehalem processor is a quad-core processor, while the Pentium 4 had just a single core.
The hardware industry chose multi-core as the way to increase performance. There are only two ways to benefit from multiple cores:
- You run multiple CPU-intensive programs that can run in parallel.
- The application itself takes advantage of multiple cores.
Server applications typically handle many requests at the same time, so they are a great fit for multi-core computers. Normal end-users often have only one CPU-intensive application running. If that application isn’t written for a multi-core CPU, it runs as fast as a single core allows but uses only 50% of the CPU power on a dual-core machine, and only 25% on a quad-core. The rest is wasted.
[h2]Why do programmers ignore multi-core?[/h2]
Why don’t these programs use the additional cores? Well… I think most programmers are not used to thinking about multi-core programming, because multi-core machines used to be common only for server-based applications. Another issue is that multi-threading is difficult. Typical problems of multi-threaded applications are:
- Race conditions and deadlocks, because multiple threads access the same resources and are not synchronized correctly. These bugs are often very hard to reproduce and to track down.
- Threading limitations in libraries. Some libraries are not thread-safe or haven’t been tested well on multi-core systems.
- Single-threaded GUI frameworks (Windows GUI applications can only access the UI on the main GUI thread), so you need to synchronize GUI actions. Note that this restriction also applies to Apple OS X applications.
- Additional coding effort to create and manage threads.
- Multi-threading has overhead too: creating threads and switching between them costs time and memory.
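To make the first item in the list above a bit more concrete, here is a minimal sketch in C with plain POSIX threads (not GCD or .NET). Four threads increment one shared counter; the mutex is what keeps the increments from interleaving. The names `shared_counter`, `increment_worker` and `run_synchronized_counter` are made up for this illustration, and the thread and iteration counts are arbitrary assumptions.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 4
#define INCREMENTS_PER_THREAD 100000

/* Shared state: a counter plus the mutex that guards it. */
struct shared_counter {
    long value;
    pthread_mutex_t lock;
};

/* Each worker increments the shared counter many times. Without the
   lock/unlock pair, the read-modify-write behind value++ could interleave
   between threads and some updates would be lost (a race condition). */
static void *increment_worker(void *arg) {
    struct shared_counter *counter = arg;
    for (int i = 0; i < INCREMENTS_PER_THREAD; i++) {
        pthread_mutex_lock(&counter->lock);
        counter->value++;
        pthread_mutex_unlock(&counter->lock);
    }
    return NULL;
}

/* Starts NUM_THREADS workers, waits for all of them and returns the total. */
long run_synchronized_counter(void) {
    struct shared_counter counter;
    pthread_t threads[NUM_THREADS];

    counter.value = 0;
    pthread_mutex_init(&counter.lock, NULL);

    for (int t = 0; t < NUM_THREADS; t++) {
        int rc = pthread_create(&threads[t], NULL, increment_worker, &counter);
        assert(rc == 0); /* sketch only: real code should handle the error */
        (void)rc;
    }
    for (int t = 0; t < NUM_THREADS; t++) {
        pthread_join(threads[t], NULL);
    }

    pthread_mutex_destroy(&counter.lock);
    return counter.value;
}
```

With the mutex in place, `run_synchronized_counter()` returns exactly `NUM_THREADS * INCREMENTS_PER_THREAD`. If you remove the lock and unlock calls, the result will usually come out lower on a multi-core machine and will differ from run to run, which is exactly why these bugs are so hard to reproduce. The sketch also shows the overhead issue: every single increment now pays for taking and releasing a lock.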
Programmers need to know about these issues, and because of constant time pressure it is easier to ignore multi-threading and perform all work on the main thread. Microsoft and Apple understand that to get more power out of their systems, they have to address these issues. The fundamental problems, such as deadlocks and race conditions, cannot be solved by a framework. It is possible, however, to make multi-threading more programmer-friendly and to significantly reduce the overhead.
Apple introduced Grand Central Dispatch (GCD) in Snow Leopard (OS X v10.6), and Microsoft will introduce the Parallel Extensions for .NET as an integrated part of .NET 4 (which will be released at the end of the year). Both technologies are similar in concept.
[h2]Grand Central Dispatch[/h2]
Grand Central Dispatch was introduced with Snow Leopard (OS X v10.6) in August 2009. Apple claims that only 15 CPU instructions are required to queue a task in GCD. It is also pretty easy to use. I will provide an example of using GCD (taken from Wikipedia). This is the single-threaded example:
- (IBAction)analyzeDocument:(NSButton *)sender {
    NSDictionary *stats = [myDoc analyze];
    [myModel setDict:stats];
    [myStatsView setNeedsDisplay:YES];
    [stats release];
}
Using GCD it will look like this:
- (IBAction)analyzeDocument:(NSButton *)sender
{
    // Run the expensive analysis on a global (background) queue.
    dispatch_async(dispatch_get_global_queue(0, 0), ^{
        NSDictionary *stats = [myDoc analyze];
        // Hop back to the main queue for everything that touches the UI.
        dispatch_async(dispatch_get_main_queue(), ^{
            [myModel setDict:stats];
            [myStatsView setNeedsDisplay:YES];
            [stats release];
        });
    });
}
The document will be analyzed on a background thread that GCD picks, but the GUI will be updated on the main thread (OS X GUI applications need to use the main thread for GUI interaction). As you can see, it’s pretty easy to dispatch blocks of code to another thread. Multi-threading is even more useful when you can use it in loops, like this:
for (i = 0; i < count; i++) {
    results[i] = do_work(data, i);
}
total = summarize(results, count);
This can be rewritten to use GCD like this:
dispatch_apply(count, dispatch_get_global_queue(0, 0), ^(size_t i) {
    results[i] = do_work(data, i);
});
total = summarize(results, count);
Of course you must be sure that all calls to do_work are completely independent of each other, but this code will probably execute up to 4 times faster on a quad-core system than the version without GCD.
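To give an idea of what a parallel loop has to do behind the scenes, here is a rough sketch of the same loop written with plain POSIX threads in C. This is not how GCD is implemented; it is just a hand-rolled equivalent under some assumptions: the bodies of `do_work` and `summarize` are hypothetical stand-ins (squaring and summing), the worker count is fixed at four, and `struct slice`, `process_slice` and `parallel_sum_of_work` are names invented for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_WORKERS 4

/* Hypothetical stand-in for do_work(): squares data[i]. It only reads
   shared input, so calls for different indexes are independent. */
static long do_work(const long *data, size_t i) {
    return data[i] * data[i];
}

/* Hypothetical stand-in for summarize(): adds up all results. */
static long summarize(const long *results, size_t count) {
    long total = 0;
    for (size_t i = 0; i < count; i++) {
        total += results[i];
    }
    return total;
}

/* Each worker gets one contiguous slice [begin, end) of the index range. */
struct slice {
    const long *data;
    long *results;
    size_t begin;
    size_t end;
};

static void *process_slice(void *arg) {
    struct slice *s = arg;
    for (size_t i = s->begin; i < s->end; i++) {
        /* Every index is written by exactly one thread, so no lock is needed. */
        s->results[i] = do_work(s->data, i);
    }
    return NULL;
}

/* Fills results[0..count) in parallel, then summarizes on the calling thread. */
long parallel_sum_of_work(const long *data, long *results, size_t count) {
    pthread_t threads[NUM_WORKERS];
    struct slice slices[NUM_WORKERS];
    size_t chunk = (count + NUM_WORKERS - 1) / NUM_WORKERS; /* round up */

    for (int w = 0; w < NUM_WORKERS; w++) {
        size_t begin = (size_t)w * chunk;
        size_t end = begin + chunk;
        if (begin > count) begin = count;
        if (end > count) end = count;
        slices[w] = (struct slice){ data, results, begin, end };
        int rc = pthread_create(&threads[w], NULL, process_slice, &slices[w]);
        assert(rc == 0); /* sketch only: real code should handle the error */
        (void)rc;
    }
    /* Wait for every slice before reading results, like dispatch_apply does. */
    for (int w = 0; w < NUM_WORKERS; w++) {
        pthread_join(threads[w], NULL);
    }
    return summarize(results, count);
}
```

Calling `parallel_sum_of_work(data, results, 10)` with `data` holding 1 to 10 fills `results` with the squares and returns 385. Compare the amount of bookkeeping here (slicing the range, creating and joining threads) with the three lines of the GCD version. GCD also decides how many threads to use based on the machine and its current load, which this fixed four-worker sketch does not.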
[h2].NET 4[/h2]
Microsoft decided to introduce their technology in .NET 4 instead of in the operating system itself. The drawback is that it is only available to .NET-based applications, but I expect these to make up the majority of future applications. The advantage is that Windows XP and Windows Vista also benefit from this extension, which makes it available to a broader audience.
I think the Parallel Extensions are better thought out than GCD, because they are more elegant to use. I think this applies to the entire .NET Framework compared to Apple’s alternative, Objective-C with Cocoa. The Parallel Extensions can be used like this:
Parallel.For(0, count, delegate(int i) { results[i] = do_work(data, i); });
total = summarize(results, count);
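As you can see, it looks very similar to the GCD approach. The major advantage of .NET 4 is that the Parallel Extensions are also integrated into LINQ, where they are called PLINQ. So the code listed above can also be rewritten as follows (assuming a do_work variant that takes the element itself rather than an index):
var results = data.AsParallel().AsOrdered().Select(t => do_work(t)).ToArray();
total = summarize(results, results.Length);
Adding AsParallel() to a query turns it into a parallel query, which makes it very easy to use. AsOrdered() keeps the results in the same order as the input; you can leave it out when the order doesn’t matter.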
[h2]Conclusion[/h2]
Using multiple threads is much more convenient and more lightweight in Snow Leopard and .NET 4. CPU-intensive operations will benefit from these technologies, so make sure you are prepared.