Intro to GCD, Part II: Multi-Core Performance

Published: September 4, 2009

Author: TommyWu

Translation · Original: Friday Q&A 2009-09-04: Intro to Grand Central Dispatch, Part II: Multi-Core Performance · by Mike Ash

Original: https://www.mikeash.com/pyblog/friday-qa-2009-09-04-intro-to-grand-central-dispatch-part-ii-multi-core-performance.html Published: 2009-09-04 Author: Mike Ash Translator: MiMo (mimo-v2.5-pro); code blocks are kept in the original English

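Welcome back to Friday Q&A. Last week I covered the basics of Grand Central Dispatch, an exciting new technology in Snow Leopard. This week I'll dig deeper into GCD and look at how it can use multi-core processors to speed up computation. This post assumes you have read last week's edition, so do that first if you haven't already.

Concepts

To take advantage of multiple CPU cores within a single process, you have to use multiple threads. (I'm ignoring multi-process concurrency, because it is unrelated to GCD.) This is just as true in the GCD world as in the purely threaded world. At the low level, GCD global dispatch queues are abstractions around a pool of worker threads. Blocks on those queues are dispatched onto worker threads as they become available. Blocks submitted to custom queues end up going through the global queues and into that same pool of worker threads. (Unless your custom queue targets the main thread, but you would never do that for speed!)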

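There are essentially two ways to get multi-core performance out of GCD: parallelize a single task, or a group of related tasks, onto one of the global queues; or parallelize multiple unrelated or loosely related tasks onto multiple custom queues.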

Global Queues

Imagine the following loop:

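    for(id obj in array)
        [self doSomethingIntensiveWith:obj];

If doSomethingIntensiveWith: is thread safe and can run on several objects at once, the loop can be parallelized by submitting each iteration to a global queue:

    dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    for(id obj in array)
        dispatch_async(queue, ^{
            [self doSomethingIntensiveWith:obj];
        });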

Of course, code isn't always this tidy. Sometimes a loop processes an array like this, and then some further work has to be done with the result:

    for(id obj in array)
        [self doSomethingIntensiveWith:obj];
    [self doSomethingWith:array];

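One way to solve this problem is with dispatch groups. A dispatch group collects multiple blocks so that you can either wait for them all to complete or be notified once they have. Groups are created with dispatch_group_create, and the dispatch_group_async function submits a block to a dispatch queue while also adding it to the group. With those, the code above can be rewritten to use GCD: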

    dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    dispatch_group_t group = dispatch_group_create();
    for(id obj in array)
        dispatch_group_async(group, queue, ^{
            [self doSomethingIntensiveWith:obj];
        });
    // Block until every block in the group has finished.
    dispatch_group_wait(group, DISPATCH_TIME_FOREVER);
    // Omit under ARC on current SDKs, where dispatch objects are managed automatically.
    dispatch_release(group);

    [self doSomethingWith:array];

This version blocks the caller until the whole group has finished. If the follow-up work can run asynchronously instead, dispatch_group_notify schedules it to run once the group completes:

    dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    dispatch_group_t group = dispatch_group_create();
    for(id obj in array)
        dispatch_group_async(group, queue, ^{
            [self doSomethingIntensiveWith:obj];
        });
    // Runs on the queue after every block in the group has finished.
    dispatch_group_notify(group, queue, ^{
        [self doSomethingWith:array];
    });
    dispatch_release(group);

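For the synchronous case, GCD offers a convenient shortcut: the dispatch_apply function. It invokes a single block multiple times in parallel and waits for all of the invocations to complete, which is exactly what we want: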

    dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    // Calls the block once per index, in parallel, and returns when all calls are done.
    dispatch_apply([array count], queue, ^(size_t index){
        [self doSomethingIntensiveWith:[array objectAtIndex:index]];
    });
    [self doSomethingWith:array];

dispatch_apply itself is synchronous. To get the same result without blocking the caller, wrap the whole thing in a dispatch_async:

    dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    dispatch_async(queue, ^{
        dispatch_apply([array count], queue, ^(size_t index){
            [self doSomethingIntensiveWith:[array objectAtIndex:index]];
        });
        [self doSomethingWith:array];
    });

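The key to this approach is identifying code that performs identical work on many different pieces of data. As long as that work is done in a thread-safe manner (a topic beyond the scope of this post), you can replace your loops with GCD calls to achieve parallelism.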

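To see a performance gain, you need to be doing a fairly substantial amount of work. GCD is lightweight and low-overhead compared to threads, but submitting a block to a queue still has a cost: the block must be copied and enqueued, and the appropriate worker thread has to be notified. Submitting a block for every pixel in an image is probably not a win. Submitting a block for each image when converting a collection of images probably is. The point where GCD stops paying off falls somewhere in between. When in doubt, experiment. Parallelizing an application is an optimization, so always measure before and after to confirm that your changes actually helped. (And to confirm that you are changing the right place!)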
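
One way to move between those two extremes is to batch the work, so that each submitted block handles a run of elements rather than a single one. The sketch below is not from the original article: ApplyInBatches and its batchSize parameter are hypothetical names chosen for illustration, and the right batch size is something to measure, not assume.

```objc
#import <Foundation/Foundation.h>
#import <dispatch/dispatch.h>

// Hypothetical helper: calls `work` on every element of `array`, but submits
// one block per batch of `batchSize` elements instead of one block per element.
static void ApplyInBatches(NSArray *array, NSUInteger batchSize, void (^work)(id obj))
{
    NSUInteger count = [array count];
    if (count == 0 || batchSize == 0)
        return;

    // Round up so that a final, partial batch is still processed.
    size_t batchCount = (count + batchSize - 1) / batchSize;
    dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);

    // dispatch_apply returns only after every batch has finished.
    dispatch_apply(batchCount, queue, ^(size_t batchIndex) {
        NSUInteger start = batchIndex * batchSize;
        NSUInteger end = MIN(start + batchSize, count);
        // Elements within a batch run serially; separate batches may run in parallel.
        for (NSUInteger i = start; i < end; i++)
            work([array objectAtIndex:i]);
    });
}
```

A call such as ApplyInBatches(array, 64, ^(id obj){ [self doSomethingIntensiveWith:obj]; }) submits one block per 64 elements. As with the per-element version, the work block must be thread safe, because different batches can run at the same time.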

Subsystem Parallelism

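The previous section covered taking advantage of multiple cores within a single subsystem of your application. It can also be useful to do this across multiple subsystems.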

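For example, imagine an application that opens a document containing metadata. The document data must be parsed and converted into model objects for display, and so must the metadata. However, the document data and the metadata don't interact. You could create a dispatch queue for each, then run both in parallel. The parsing code for each piece of data is entirely serial within itself, and thread safety is not a concern (as long as the two share no data), yet the two still run in parallel.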
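
A minimal sketch of that idea, assuming ARC and two parsing blocks supplied by the caller. The function name and queue labels are made up for illustration and are not from the original article; the dispatch calls are the same ones used earlier, applied to two serial custom queues instead of a global queue.

```objc
#import <Foundation/Foundation.h>
#import <dispatch/dispatch.h>

// Hypothetical helper: runs the two parsing blocks on separate serial queues
// and returns once both have finished.
static void ParseDocumentAndMetadata(dispatch_block_t parseDocument,
                                     dispatch_block_t parseMetadata)
{
    // Labels are illustrative. Passing NULL attributes creates a serial queue.
    dispatch_queue_t documentQueue = dispatch_queue_create("com.example.document", NULL);
    dispatch_queue_t metadataQueue = dispatch_queue_create("com.example.metadata", NULL);
    dispatch_group_t group = dispatch_group_create();

    // Each block is serial on its own queue; the two queues run in parallel.
    dispatch_group_async(group, documentQueue, parseDocument);
    dispatch_group_async(group, metadataQueue, parseMetadata);

    // Wait for both parsers before handing the results to the caller.
    dispatch_group_wait(group, DISPATCH_TIME_FOREVER);
    // Assumes ARC. Without ARC, dispatch_release() both queues and the group here.
}
```

Waiting with dispatch_group_wait keeps the sketch short. An application opening a document from the main thread would more likely use dispatch_group_notify so the interface stays responsive.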

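Once the document is open, the program needs to perform tasks in response to user actions. For example, it may need to do spell checking, syntax highlighting, word counting, autosave, and other such things. If each of these tasks is implemented with its own dispatch queue, they all run in parallel with respect to one another, without many of the difficulties of multithreaded programming.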
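
As a rough illustration, one of those tasks might be wrapped as shown here. WordCounter, its method, and the queue label are hypothetical and not from the original article, and the code assumes ARC. The point is only that the subsystem owns a private serial queue, so its own work never overlaps with itself while still running alongside the rest of the application.

```objc
#import <Foundation/Foundation.h>
#import <dispatch/dispatch.h>

// Hypothetical subsystem: a word counter that owns a private serial queue.
@interface WordCounter : NSObject
- (void)countWordsInText:(NSString *)text
              completion:(void (^)(NSUInteger count))completion;
@end

@implementation WordCounter {
    dispatch_queue_t _queue;
}

- (instancetype)init
{
    if ((self = [super init])) {
        // The label is illustrative. NULL attributes make the queue serial.
        _queue = dispatch_queue_create("com.example.wordcounter", NULL);
    }
    return self;
}

- (void)countWordsInText:(NSString *)text
              completion:(void (^)(NSUInteger count))completion
{
    // Copy up front so later mutation by the caller cannot race with the count.
    NSString *snapshot = [text copy];
    dispatch_async(_queue, ^{
        NSCharacterSet *separators = [NSCharacterSet whitespaceAndNewlineCharacterSet];
        NSUInteger count = 0;
        for (NSString *piece in [snapshot componentsSeparatedByCharactersInSet:separators])
            if ([piece length] > 0)
                count++;
        // The completion block runs on the counter's queue, not the main thread.
        completion(count);
    });
}
@end
```

A spell checker or autosave component could follow the same shape with its own queue. Anything that touches the user interface from the completion block would need to hop back to the main queue first.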

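With dispatch sources, which I'll cover next week, you can have GCD deliver events directly to a custom dispatch queue. The part of your program that monitors a network socket, for example, could be given its own dispatch queue, letting it run in parallel with the rest of the application. And again, because it uses a custom queue, the module runs serially with respect to itself, which simplifies programming.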

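Conclusion

This week we saw how to use GCD to improve application performance and take advantage of modern multi-core systems. Care is still required when writing parallel applications, but GCD makes it easier than ever to use all the available computing power.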

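That wraps up this week's Friday Q&A. Come back next week for the next part of the GCD series, when I'll talk about dispatch sources, GCD's mechanism for monitoring internal and external events. As always, if you have a suggestion for a topic to cover, post it in the comments or e-mail it to me directly.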

#Original (English)

Source: https://www.mikeash.com/pyblog/friday-qa-2009-09-04-intro-to-grand-central-dispatch-part-ii-multi-core-performance.html

Welcome back to Friday Q&A. Last week I discussed the basics of Grand Central Dispatch, an exciting new technology in Snow Leopard. This week I’m going to dive deeper into GCD and look at how you can use GCD to take advantage of multi-core processors to speed up computation. This post assumes that you’ve read last week’s edition, so be sure to do that if you haven’t already.

Concepts

In order to take advantage of multiple CPU cores within a single process, it’s necessary to use multiple threads. (I’m ignoring multi-process concurrency, because it’s unrelated to GCD.) This is just as true in the GCD world as it is in the purely threaded world. At the low level, GCD global dispatch queues are just abstractions around a pool of worker threads. Blocks on those queues get dispatched onto the worker threads as they become available. Blocks submitted to custom queues end up going through global queues and into that same pool of worker threads. (Unless your custom queue is targeted at the main thread, but you would never do that for speed purposes!)

There are essentially two ways to extract multi-core performance out of GCD: by parallelizing a single task or a group of related tasks onto one of the global queues, and by parallelizing multiple unrelated or loosely related tasks onto multiple custom queues.

Global Queues

Imagine the following loop:

    for(id obj in array)
        [self doSomethingIntensiveWith:obj];

    dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    for(id obj in array)
        dispatch_async(queue, ^{
            [self doSomethingIntensiveWith:obj];
        });

Of course code isn’t always this nice. Sometimes you have code which manipulates an array like this, but then has to perform some work with the result:

    for(id obj in array)
        [self doSomethingIntensiveWith:obj];
    [self doSomethingWith:array];

One way to solve this problem is by using dispatch groups. A dispatch group is a way to group together multiple blocks, and either wait for them to complete or be notified once they complete. They are created using dispatch_group_create, and the dispatch_group_async function allows submitting a block to a dispatch queue and also adding it to the group. We could then rewrite this code to use GCD like so:

    dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    dispatch_group_t group = dispatch_group_create();
    for(id obj in array)
        dispatch_group_async(group, queue, ^{
            [self doSomethingIntensiveWith:obj];
        });
    dispatch_group_wait(group, DISPATCH_TIME_FOREVER);
    dispatch_release(group);

    [self doSomethingWith:array];

    dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    dispatch_group_t group = dispatch_group_create();
    for(id obj in array)
        dispatch_group_async(group, queue, ^{
            [self doSomethingIntensiveWith:obj];
        });
    dispatch_group_notify(group, queue, ^{
        [self doSomethingWith:array];
    });
    dispatch_release(group);

For the synchronous case, GCD provides a nice shortcut with the dispatch_apply function. This function calls a single block multiple times in parallel and waits for it to complete, just like what we wanted:

    dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    dispatch_apply([array count], queue, ^(size_t index){
        [self doSomethingIntensiveWith:[array objectAtIndex:index]];
    });
    [self doSomethingWith:array];

    dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    dispatch_async(queue, ^{
        dispatch_apply([array count], queue, ^(size_t index){
            [self doSomethingIntensiveWith:[array objectAtIndex:index]];
        });
        [self doSomethingWith:array];
    });

The key to this approach is identifying code which is performing identical work on many different pieces of data at once. If you ensure that the work performed is done in a thread safe manner (beyond the scope of this post) then you can replace your loops with calls to GCD in order to achieve parallelism.

In order to see a performance gain, you need to be performing a fairly substantial amount of work. GCD is lightweight and low-overhead compared to threads, but it’s still somewhat costly to submit a block to a queue. The block has to be copied and enqueued, and the appropriate worker thread somehow notified. Submitting a block for every pixel in an image is probably not going to be a win. On the other hand, submitting a block for each image when converting a collection of images is probably going to be a win. The point where GCD ceases to be profitable falls somewhere in the middle. When in doubt, experiment. Parallelizing applications is an optimization, and as such you should always measure before and after to make sure that your changes helped. (And to make sure that you’re making the changes in the right place!)

Subsystem Parallelism

The previous section talked about taking advantage of multiple cores in a single subsystem of your application. It can also be useful to do this across multiple subsystems.

For example, imagine an application which opens a document containing metadata. The document data itself must be parsed and converted into model objects for display, as must the metadata. However, the document data and the metadata don’t interact. You could create a dispatch queue for each one, then run both in parallel. The code for each piece of data parsing would be entirely serial within itself, and thread safety is not a concern (as long as you don’t have shared data between them), but they will still run in parallel.

Once the document is open, the program needs to perform tasks in response to user actions. For example, it may need to perform spell checking, syntax highlighting, word counting, autosave, and other such things. If each one of these tasks is implemented using a separate dispatch queue, they will all run in parallel with respect to each other without many of the difficulties of multithreaded programming.

By using dispatch sources, something I’ll cover next week, you can have GCD deliver events directly to a custom dispatch queue. A part of your program that monitors a network socket, for example, could be given its own dispatch queue which will then allow it to run in parallel with respect to the rest of the application. And again, by using a custom queue, this module will run serially with respect to itself, simplifying programming.

Conclusion

This week we saw how to use GCD to increase the performance of your applications and take advantage of modern multi-core systems. Although care must still be taken when writing parallel applications, GCD makes it easier than ever to take advantage of all available computing power.

That wraps up this week’s Friday Q&A. Come back next week for the next part in the continuing series on GCD, when I will talk about dispatch sources, GCD’s mechanism for monitoring internal and external events. As always, if you have a suggestion for a topic to cover, please post it in the comments or e-mail it directly to me.