Let's Build @synchronized

Published: February 20, 2015

Author: TommyWu

Translation · Original: Friday Q&A 2015-02-20: Let's Build @synchronized · Author: Mike Ash

Original: https://www.mikeash.com/pyblog/friday-qa-2015-02-20-lets-build-synchronized.html Published: 2015-02-20 Author: Mike Ash Translator: MiMo (mimo-v2.5-pro); code blocks are kept in the original English

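Recap

@synchronized is a control construct in Objective-C. It takes an object pointer as a parameter and is followed by a block of code. The object pointer acts as the lock: at any given time, only one thread is allowed inside a @synchronized block that uses that object pointer.

It is a simpler way to use locks in multithreaded programming. For example, you might use an NSLock to protect access to an NSMutableArray:

    NSMutableArray *array;
    NSLock *arrayLock;

    [arrayLock lock];
    [array addObject: obj];
    [arrayLock unlock];

Or you can use @synchronized with the array itself as the lock:

    @synchronized(array) {
        [array addObject: obj];
    }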
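For readers following along in Swift, the explicit-lock variant can be sketched roughly as below. This is only an illustration in current Swift syntax: `LockedArray` and its members are hypothetical names that do not appear in the article, and the concurrent loop exists purely to exercise the lock.

```swift
import Foundation

// Hypothetical wrapper (not from the article): an array guarded by an explicit NSLock.
final class LockedArray<Element> {
    private var storage: [Element] = []
    private let lock = NSLock()

    func append(_ element: Element) {
        lock.lock()
        defer { lock.unlock() }   // released on every exit path
        storage.append(element)
    }

    var count: Int {
        lock.lock()
        defer { lock.unlock() }
        return storage.count
    }
}

let numbers = LockedArray<Int>()
// Append from many threads at once; the lock serializes the mutations.
DispatchQueue.concurrentPerform(iterations: 1000) { i in
    numbers.append(i)
}
print(numbers.count)   // 1000
```

Without the lock, the concurrent appends would race on the underlying array storage.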

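Personally, I prefer an explicit lock, both because it makes the code's logic clearer and because @synchronized does not perform quite as well, for reasons we'll look at below. Still, it is convenient to use, and exploring how it is built is interesting either way.

Implementation Theory

The Swift version of @synchronized is a function that takes an object and a closure, and calls the closure while holding the lock:

    func synchronized(obj: AnyObject, f: Void -> Void) {
        ...
    }

The question is, how do you turn an arbitrary object into a lock?

In an ideal world (from the point of view of implementing this function), every object would reserve a little extra space for a lock, and synchronized could call the appropriate lock and unlock functions on that space. No such extra space exists, however, which is probably fortunate: it would bloat the memory size of every object in the system for the sake of a feature most objects will never encounter.

The alternative is a table that maps objects to locks. synchronized can look up the lock in that table and lock and unlock it there. The trouble with this approach is that the table itself needs to be thread safe, which requires either its own lock or some kind of fancy lockless data structure. Giving the table a separate lock is by far the easier option.

To keep locks from accumulating forever, the table needs to track lock usage and destroy or reuse locks when they are no longer needed.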

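Implementation

NSMapTable is a great fit for the table that maps objects to locks. It can be configured to use raw object addresses as keys, and it can hold weak references to both keys and values, which lets the system automatically reclaim locks that are no longer in use. This sets it up appropriately:

    let locksTable = NSMapTable.weakToWeakObjectsMapTable()

The objects stored in the table will be instances of NSRecursiveLock. Because it is a class, it works well with NSMapTable, unlike a type such as pthread_mutex_t. @synchronized provides recursive semantics, and this implementation does the same.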
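To make "recursive semantics" concrete, here is a small sketch in current Swift syntax. The helper `lockedSum` is hypothetical and exists only to re-acquire the same lock at every level of recursion on one thread.

```swift
import Foundation

let lock = NSRecursiveLock()

// Hypothetical helper: sums 1...n while re-acquiring the same lock at each level.
func lockedSum(_ n: Int) -> Int {
    lock.lock()
    defer { lock.unlock() }   // every lock() is balanced by one unlock()
    return n == 0 ? 0 : n + lockedSum(n - 1)
}

let total = lockedSum(10)
print(total)   // 55
```

With a plain NSLock in place of NSRecursiveLock, the second `lock()` call on the same thread would never return, because a non-recursive lock cannot be acquired again by the thread that already holds it.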

The table itself also needs a lock. A spinlock works well here, because accesses to the table will be brief:

    var locksTableLock = OS_SPINLOCK_INIT

With the table in place, we can implement the function:

    func synchronized(obj: AnyObject, f: Void -> Void) {

The first thing it does is look up the lock for obj in locksTable. This must be done while holding locksTableLock:

        OSSpinLockLock(&locksTableLock)
        var lock = locksTable.objectForKey(obj) as! NSRecursiveLock?

If the table has no entry, create a new lock and store it:

        if lock == nil {
            lock = NSRecursiveLock()
            locksTable.setObject(lock!, forKey: obj)
        }

With the lock in hand, the master table lock can be released. This must happen before calling f, to avoid a potential deadlock:

        OSSpinLockUnlock(&locksTableLock)

Now we can call f, locking and unlocking lock around the call:

        lock!.lock()
        f()
        lock!.unlock()
    }
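Put together, the steps above might look like the following in current Swift. This is a sketch under stated assumptions rather than the article's code: it swaps the weak-to-weak NSMapTable for a plain dictionary with an explicit use count (so an entry is destroyed once nobody is using it), and it swaps the spinlock for an NSLock. The names `LockEntry`, `users` and `body` are hypothetical.

```swift
import Foundation

// One table entry: the per-object lock plus the number of callers currently using it.
final class LockEntry {
    let lock = NSRecursiveLock()
    var users = 0
}

var locksTable: [ObjectIdentifier: LockEntry] = [:]
let locksTableLock = NSLock()   // assumption: NSLock stands in for the spinlock

@discardableResult
func synchronized<T>(_ obj: AnyObject, _ body: () throws -> T) rethrows -> T {
    let key = ObjectIdentifier(obj)

    // Look up (or create) the entry while holding the table lock.
    locksTableLock.lock()
    let entry: LockEntry
    if let existing = locksTable[key] {
        entry = existing
    } else {
        entry = LockEntry()
        locksTable[key] = entry
    }
    entry.users += 1
    locksTableLock.unlock()   // released before body runs

    entry.lock.lock()
    defer {
        entry.lock.unlock()
        // Destroy the entry once no caller is using it any more.
        locksTableLock.lock()
        entry.users -= 1
        if entry.users == 0 { locksTable[key] = nil }
        locksTableLock.unlock()
    }
    return try body()
}

// Usage: many threads incrementing one counter.
final class Counter { var value = 0 }
let counter = Counter()
DispatchQueue.concurrentPerform(iterations: 1000) { _ in
    synchronized(counter) { counter.value += 1 }
}
print(counter.value)   // 1000
```

Because `obj` stays alive for the duration of the call, and an entry exists only while some call is in progress, keying the dictionary on `ObjectIdentifier` cannot confuse a dead object with a new one at the same address.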

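Comparison With Apple's Implementation

Apple's implementation of @synchronized is available as part of the Objective-C runtime source. The specific code is here: http://www.opensource.apple.com/source/objc4/objc4-646/runtime/objc-sync.mm

Unlike the toy implementation above, which was designed for simplicity, Apple's version is built for speed. It is interesting to compare what the two do the same and what they do differently.

The core concept is the same. Both have a global table that maps object pointers to locks, and both lock and unlock around the @synchronized block.

For the underlying lock object, Apple's version uses a pthread_mutex_t configured as a recursive lock. Since NSRecursiveLock is most likely implemented with pthread_mutex_t anyway, this cuts out the middleman and avoids making the runtime depend on Foundation.

The table itself is implemented as a linked list rather than a hash table. Since only a few locks usually exist at any given time, this still performs well, and may even outperform a hash table, whose performance advantage shows up mainly with larger data sets. Performance is improved further by a per-thread cache that stores locks recently looked up on the current thread.

Instead of a single global table, Apple keeps 16 tables in an array. Objects are mapped to different tables based on their address. This reduces unnecessary contention between @synchronized blocks operating on different objects, since they will likely use different global tables.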
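The address-to-table mapping can be illustrated with a short Swift sketch. It is not taken from Apple's source: the helper `tableIndex(for:)` is hypothetical, and the shift amount is an assumption chosen to discard low address bits that tend to be identical across heap objects because of allocation alignment.

```swift
import Foundation

let tableCount = 16   // the number of tables mentioned above

// Hypothetical helper: picks one of the tables from the object's address.
// The shift by 5 is an illustrative assumption, not a documented constant.
func tableIndex(for obj: AnyObject) -> Int {
    let address = UInt(bitPattern: ObjectIdentifier(obj))
    return Int((address >> 5) & UInt(tableCount - 1))
}

final class Thing {}
let things = (0..<8).map { _ in Thing() }
let indices = things.map { tableIndex(for: $0) }
print(indices)   // values in 0..<16; the exact values depend on where objects are allocated
```

Each table would then carry its own lock, so two threads contend only when their objects happen to land in the same table.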

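Instead of using weak pointers, which are expensive, Apple's implementation keeps an internal reference count alongside each lock. When the reference count reaches zero, the lock can be reused for a new object. Unused locks are not destroyed, but reuse ensures that the total number of locks is bounded by the maximum number of locks active at any one time, rather than growing without limit as new objects are used.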
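The reuse-by-count idea can be sketched in Swift as follows. This is a simplified illustration, not Apple's code: `PooledLock` and `LockPool` are hypothetical names, a plain array scanned linearly stands in for the linked list, and there is no per-thread cache.

```swift
import Foundation

// One reusable slot: a lock, the object it currently serves, and a use count.
final class PooledLock {
    let lock = NSRecursiveLock()
    var owner: ObjectIdentifier?
    var users = 0
}

// Hypothetical pool of locks that are recycled rather than destroyed.
final class LockPool {
    private var slots: [PooledLock] = []
    private let poolLock = NSLock()

    // Number of locks this pool has ever created.
    var slotCount: Int {
        poolLock.lock()
        defer { poolLock.unlock() }
        return slots.count
    }

    private func acquire(for obj: AnyObject) -> PooledLock {
        let key = ObjectIdentifier(obj)
        poolLock.lock()
        defer { poolLock.unlock() }
        // Prefer a slot that is already serving this object.
        if let slot = slots.first(where: { $0.users > 0 && $0.owner == key }) {
            slot.users += 1
            return slot
        }
        // Otherwise reuse an idle slot, creating one only if none is idle.
        let slot: PooledLock
        if let idle = slots.first(where: { $0.users == 0 }) {
            slot = idle
        } else {
            slot = PooledLock()
            slots.append(slot)
        }
        slot.owner = key
        slot.users = 1
        return slot
    }

    private func release(_ slot: PooledLock) {
        poolLock.lock()
        slot.users -= 1   // at zero the slot becomes reusable; it is not destroyed
        poolLock.unlock()
    }

    func synchronized<T>(_ obj: AnyObject, _ body: () throws -> T) rethrows -> T {
        let slot = acquire(for: obj)
        slot.lock.lock()
        defer {
            slot.lock.unlock()
            release(slot)
        }
        return try body()
    }
}

// Usage: 100 short-lived objects, locked one after another, share a single slot.
let pool = LockPool()
for _ in 0..<100 {
    let obj = NSObject()
    pool.synchronized(obj) { }
}
print(pool.slotCount)   // 1
```

The pool only grows when more locks are needed at the same moment than it already holds, which is the bound described above.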

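Apple's implementation is intelligent and fast for what it does, but it still incurs some unavoidable extra overhead compared with a separate, explicit lock. In particular:

- Unrelated objects can still contend with each other if they happen to be assigned to the same global table.
- In the common case where the lock is not in the per-thread cache, looking it up requires acquiring and releasing a spinlock.
- Additional work is needed to look up the lock for the object in the global table.
- Every lock/unlock cycle pays for recursive semantics, even when recursion is not actually needed.

However, these problems are more or less inherent to what @synchronized does, and the implementation certainly can't be faulted for them. It is an excellent piece of code that is well worth reading through.

Conclusion

@synchronized is an interesting language construct, but implementing it poses some challenges. Fundamentally, it provides thread safety, yet its own implementation needs synchronization in order to be safe. Using a global lock hidden behind the scenes to protect access to the lock table resolves this contradiction. Clever tricks in Apple's implementation make it reasonably fast.

That's all for today. Come back next time for more fun. Friday Q&A is driven by reader suggestions, so if you have an idea you would like to see covered, please send it in!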

#Original (English)

Source: https://www.mikeash.com/pyblog/friday-qa-2015-02-20-lets-build-synchronized.html

Continuing the theme of thread safety from the previous article, today I’m going to explore an implementation of Objective-C’s @synchronized facility in the latest edition of Let’s Build. I’m going to build it in Swift, although an equivalent Objective-C version would be much the same.

Recap

@synchronized is a control construct in Objective-C. It takes an object pointer as a parameter and is followed by a block of code. The object pointer acts as a lock, and only one thread is permitted within a @synchronized block with that object pointer at any given time.

It’s a simpler way of using locks for multithreaded programming. For example, you might use an NSLock to protect access to an NSMutableArray:

    NSMutableArray *array;
    NSLock *arrayLock;

    [arrayLock lock];
    [array addObject: obj];
    [arrayLock unlock];

Or you can use @synchronized to use the array itself as the lock:

    @synchronized(array) {
        [array addObject: obj];
    }

I personally prefer an explicit lock, both to make it clearer what’s going on, and because @synchronized doesn’t perform quite as well for reasons we’ll see below. However, it can be convenient, and it’s interesting to build regardless.

Implementation Theory

The Swift version of @synchronized is a function that takes an object and a closure, and invokes the closure with the lock held:

    func synchronized(obj: AnyObject, f: Void -> Void) {
        ...
    }

The question is, how do you turn an arbitrary object into a lock?

In an ideal world (from the perspective of implementing this function), every object would have a little extra space set aside for a lock. synchronized could then use the appropriate lock and unlock functions on that little extra space. However, no such extra space exists, which is probably fortunate because it would bloat the memory size of every object on the system for a feature that most of them will never encounter.

The alternative is to use a table that maps an object to a lock. synchronized can then look up the lock in the table, and lock and unlock it there. The trouble with this approach is that the table itself needs to be thread safe, which either requires its own lock or some sort of fancy lockless data structure. A separate lock for the table is by far easier.

To prevent locks from building up forever, the table needs to track lock usage and destroy or reuse locks when they’re no longer needed.

Implementation

For the table that maps objects to locks, NSMapTable fits the bill perfectly. It can be configured to use raw object addresses as its keys, and it can hold weak references to both keys and values which allows the system to automatically reclaim unused locks. This sets it up appropriately:

    let locksTable = NSMapTable.weakToWeakObjectsMapTable()

The objects will be instances of NSRecursiveLock. Because it’s a class, it works well with NSMapTable, as opposed to something like pthread_mutex_t. @synchronized provides recursive semantics and this does the same.

The table itself also needs a lock. A spinlock works well here, as accesses to the table will be brief:

    var locksTableLock = OS_SPINLOCK_INIT

With the table in place, we can implement the function:

    func synchronized(obj: AnyObject, f: Void -> Void) {

The first thing it does is look up the lock corresponding to obj in locksTable. This must be done with locksTableLock held:

        OSSpinLockLock(&locksTableLock)
        var lock = locksTable.objectForKey(obj) as! NSRecursiveLock?

If there’s no entry in the table, create a new lock and set it:

        if lock == nil {
            lock = NSRecursiveLock()
            locksTable.setObject(lock!, forKey: obj)
        }

With the lock in hand, the master table lock can be released. This must be done before invoking f in order to avoid a potential deadlock:

        OSSpinLockUnlock(&locksTableLock)

Now we can invoke f, locking and unlocking lock around the invocation:

        lock!.lock()
        f()
        lock!.unlock()
    }

Comparison With Apple’s Implementation

Apple’s implementation of @synchronized is available as part of the Objective-C runtime source distribution. This specific bit is available here:

http://www.opensource.apple.com/source/objc4/objc4-646/runtime/objc-sync.mm

It’s built for speed rather than simplicity as the above toy implementation is. It’s interesting to see what it does the same and what it does differently.

The basic concept is the same. There’s a global table that maps object pointers to locks, and the lock is then locked and unlocked around the @synchronized block.

For the underlying lock object, Apple’s version uses pthread_mutex_t configured as a recursive lock. Since NSRecursiveLock is likely implemented using pthread_mutex_t anyway, this cuts out the middleman, and avoids a dependency on Foundation in the runtime.

The table itself is implemented as a linked list rather than a hash table. Since the common case is that only a few locks exist at any given time, this will still perform well, and probably performs better than a hash table, since the performance advantage of hash tables comes with larger data sets. Performance is further improved with a per-thread cache that saves locks that were recently looked up on the current thread.

Instead of a single global table, there are 16 tables kept in an array. Objects are mapped to different tables depending on their address. This reduces unnecessary contention between @synchronized blocks operating on different objects, since they will likely use different global tables.

Instead of using weak pointers, which incur substantial additional overhead, Apple’s implementation instead keeps an internal reference count alongside each lock. When the reference count reaches zero, the lock is available for reuse with a new object. Unused locks are not destroyed, but reuse means that the total number of locks is limited to the maximum number of active locks at any given time, rather than growing without bound as new objects are used.

Apple’s implementation is intelligent and fast for what it does, but it still incurs some unavoidable extra overhead compared to using a separate, explicit lock. In particular:

- Unrelated objects can still be subject to contention if they happen to be assigned to the same global table.
- A spinlock must be acquired and released when looking up the lock in the common case where it doesn’t exist in the per-thread cache.
- Additional work must be done to look up the appropriate lock for the object in the global table.
- Each lock/unlock cycle incurs overhead for recursive semantics even when it’s not required.

However, these problems are more or less inherent to what @synchronized does, and the implementation certainly can’t be faulted for it. It’s a great piece of code that’s well worth reading through.

Conclusion

@synchronized is an interesting language construct with some implementation challenges. Fundamentally, it provides thread safety, but the implementation itself requires synchronization to be safe. Using a global lock behind the scenes to protect access to the lock table solves this dilemma. Clever tricks in Apple’s implementation allow it to be reasonably fast.

That’s it for today. Come back next time for more amusing whatnot. Friday Q&A is driven by reader suggestions, so if you have an idea you’d like to see covered, please send it in!