Translation · Original: Friday Q&A 2014-05-09: When an Autorelease Isn't · by Mike Ash
Original: https://www.mikeash.com/pyblog/friday-qa-2014-05-09-when-an-autorelease-isnt.html · Published: 2014-05-09 · Author: Mike Ash · Translator: MiMo (mimo-v2.5-pro); code blocks kept verbatim in English
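Welcome back to another Friday Q&A. I apologize for the unannounced hiatus in posts. It's not due to anything interesting, just a shortage of time. Friday Q&A will continue, and I will keep aiming for my regular biweekly schedule. For today's article, I have a little story about an autorelease call that didn't do what it was supposed to do.
The Setup
ARC is a lovely technology, but it doesn't cover everything. Sometimes you need to use CoreFoundation objects, and you're back in the world of manual memory management. Normally that's no problem. I did manual memory management for many years, and while I enjoy not doing it under ARC, I still remember how. However, ARC makes some things harder than they used to be. In particular, sometimes you want to autorelease a CoreFoundation object. Without ARC, you might write something like this: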
    CFDictionaryRef MakeDictionary(void) {
        CFMutableDictionaryRef dict = CFDictionaryCreateMutable(NULL, 0, NULL, NULL);
        // Put some stuff in the dictionary here perhaps

        [(id)dict autorelease];
        return dict;
    }
This gives you nice memory management semantics: the caller is not responsible for releasing the return value, just as with most Cocoa methods. It takes advantage of the fact that all CoreFoundation objects are also Objective-C objects, and autorelease is one way to balance a CoreFoundation Create call.
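This code no longer works under ARC, because calling autorelease is not permitted. To solve this, Apple provided a CFAutorelease function that does the same thing and can be used under ARC. Unfortunately, it is only available on iOS 7 and Mac OS X 10.9 and later. Those of us who need to support older OS releases have to improvise.
My solution was to get the selector for autorelease with the sel_getUid runtime call, which sneaks past ARC's rules. I would then send that selector to the CoreFoundation object, accomplishing the same thing as [(id)dict autorelease]. My code looked like this: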
    CFDictionaryRef MakeDictionary(void) {
        CFMutableDictionaryRef dict = CFDictionaryCreateMutable(NULL, 0, NULL, NULL);
        // Put some stuff in the dictionary here perhaps

        SEL autorelease = sel_getUid("autorelease");
        IMP imp = class_getMethodImplementation(object_getClass((__bridge id)dict), autorelease);
        ((id (*)(CFTypeRef, SEL))imp)(dict, autorelease);

        return dict;
    }
Note: if you ended up here because you need code that does this, do not use this code. As we will see shortly, it is broken. If you want working code, check the end of the article.
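I tested this code and everything worked fine. A little later, another programmer on the project reported that it was consistently crashing for him. Fortunately, I was able to reproduce his crash without much difficulty. Figuring out what was actually going on took a while, though.
The Crash
This code does not crash by itself. It causes a crash in subsequent code. For example: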
    CFDictionaryRef dict = MakeDictionary();
    NSLog(@"Testing.");
    NSLog(@"%@", dict);
This crashes on the second NSLog line. The stack trace looks like a typical memory management crash:
    frame #0: 0x00007fff917980a3 libobjc.A.dylib`objc_msgSend + 35
    frame #1: 0x00007fff97175184 Foundation`_NSDescriptionWithLocaleFunc + 41
    frame #2: 0x00007fff9077bd94 CoreFoundation`__CFStringAppendFormatCore + 7332
    frame #3: 0x00007fff907aa313 CoreFoundation`_CFStringCreateWithFormatAndArgumentsAux + 115
    frame #4: 0x00007fff907e1b9b CoreFoundation`_CFLogvEx + 123
    frame #5: 0x00007fff9719ed0c Foundation`NSLogv + 79
    frame #6: 0x00007fff9719ec98 Foundation`NSLog + 148
It seems that the dictionary is being destroyed before the NSLog call. But how can that be? We called autorelease in the function, and the autorelease pool has not yet been drained. The release that balances the CoreFoundation Create call hasn't happened yet, so the object should still exist.
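After poking at the code in various ways, I decided to read the assembly generated by the compiler. There wasn't much to the code I wrote, so whatever the problem was had to be hiding somewhere deeper.
Here is the x86-64 assembly output for the broken MakeDictionary function: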
    _MakeDictionary: ## @MakeDictionary
        .cfi_startproc
    Lfunc_begin0:
        .loc 1 11 0 ## test.m:11:0
    ## BB#0:
        pushq %rbp
    Ltmp2:
        .cfi_def_cfa_offset 16
    Ltmp3:
        .cfi_offset %rbp, -16
        movq %rsp, %rbp
    Ltmp4:
        .cfi_def_cfa_register %rbp
        subq $32, %rsp
        movabsq $0, %rax
        .loc 1 12 0 prologue_end ## test.m:12:0
    Ltmp5:
        movq %rax, %rdi
        movq %rax, %rsi
        movq %rax, %rdx
        movq %rax, %rcx
        callq _CFDictionaryCreateMutable
        leaq L_.str(%rip), %rdi
        movq %rax, -8(%rbp)
        .loc 1 15 0 ## test.m:15:0
        callq _sel_getUid
        movq %rax, -16(%rbp)
        .loc 1 16 0 ## test.m:16:0
        movq -8(%rbp), %rax
        movq %rax, %rdi
        callq _object_getClass
        movq -16(%rbp), %rsi
        movq %rax, %rdi
        callq _class_getMethodImplementation
        movq %rax, -24(%rbp)
        .loc 1 17 0 ## test.m:17:0
        movq -24(%rbp), %rax
        movq -8(%rbp), %rcx
        movq -16(%rbp), %rsi
        movq %rcx, %rdi
        callq *%rax
        movq %rax, %rdi
        callq _objc_retainAutoreleasedReturnValue
        movq %rax, %rdi
        callq _objc_release
        .loc 1 19 0 ## test.m:19:0
        movq -8(%rbp), %rax
        addq $32, %rsp
        popq %rbp
        ret
This is pretty straightforward. Since no real calculations are done, we can just look at the sequence of callq instructions to see which functions are called. It calls CFDictionaryCreateMutable, sel_getUid, object_getClass, and class_getMethodImplementation, and then makes an indirect call through the function pointer, which is where the autorelease call actually happens. ARC then hops in and does some pointless but harmless work on the return value of that call by retaining it and immediately releasing it. Finally, the function returns the dictionary to the caller.
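Mostly Harmless
It took me a little while to realize what was going on, but then it was obvious. I said that the calls ARC inserted were "pointless but harmless." In fact, they are anything but!
One of the interesting features that came with ARC is fast handling of autoreleased return values. This sort of pattern is extremely common under ARC: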
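    // callee
    obj = [[SomeClass alloc] init];
    [obj setup];
    return [obj autorelease];

    // caller
    obj = [[self method] retain];
    [obj doStuff];
    [obj release];
A human programmer would typically omit the retain and release calls in the caller, but ARC is more conservative. That would make things a bit slower under ARC, which is where the fast autorelease handling comes in.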
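There is some extremely fancy and mind-bending code in the Objective-C runtime's implementation of autorelease. Before actually sending an autorelease message, it first inspects the caller's code. If it sees that the caller is going to immediately call objc_retainAutoreleasedReturnValue, it skips the message send completely. It doesn't actually do an autorelease at all. Instead, it just stashes the object in a known location, which signals that autorelease was never sent.
objc_retainAutoreleasedReturnValue cooperates in this scheme. Before calling retain, it first checks that known location. If it contains the right object, it skips the retain. The net result is that the code above is effectively transformed into this: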
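    // callee
    obj = [[SomeClass alloc] init];
    [obj setup];
    return obj;

    // caller
    obj = [self method];
    [obj doStuff];
    [obj release];
This is faster because it skips the autorelease pool entirely, saving three message sends and the work that goes with them: autorelease, the caller's retain, and the eventual release sent by the autorelease pool. It also allows the object to be destroyed earlier, reducing memory and cache pressure.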
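The beautiful thing about this technique is that, because the runtime checks the caller's code before making the optimization, it is perfectly compatible with code that doesn't participate in the scheme. If the caller does something else with the return value, the runtime simply calls autorelease and everything works normally.
I said that this code is not pointless. What, then, is the point of the retain immediately followed by a release in the assembly? It lets the caller participate in this scheme even though it doesn't use the return value. It would be correct to simply omit the two calls, but then the benefit of the fast autorelease path would be lost. In the common case, making these two extra calls ends up being faster.
I also said that this code is not harmless. The harm is exactly that fast autorelease path. To ARC, an autorelease in a function or method followed by a retain in the caller is just a way to pass ownership around. That is not what is going on in this code. This code is trying to put the object into the autorelease pool no matter what. ARC's clever optimization ends up bypassing that attempt, and as a result the dictionary is destroyed immediately instead of being placed in the autorelease pool for later destruction. The root cause is the function pointer cast used to make the autorelease call: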
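    ((id (*)(CFTypeRef, SEL))imp)(dict, autorelease);
I wrote it this way because that is what the type is. The autorelease method returns id and takes two (normally implicit) parameters: self and the selector being sent. I changed the self parameter from id to CFTypeRef for convenience, but left the return type as id, since that is what it really is in the underlying autorelease method. It shouldn't matter, since the return value is ignored anyway.
That return type is this code's downfall. I was careful to avoid ARC's meddling for the most part, but that id makes ARC step in and start inserting calls, and that causes the dictionary to be destroyed immediately.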
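The Fix
Once all of this is known, the fix is easy. Get ARC out of the picture by having the call return CFTypeRef instead of id. Here is the complete function with the fix applied: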
    CFDictionaryRef MakeDictionary(void) {
        CFMutableDictionaryRef dict = CFDictionaryCreateMutable(NULL, 0, NULL, NULL);
        // Put some stuff in the dictionary here perhaps

        SEL autorelease = sel_getUid("autorelease");
        IMP imp = class_getMethodImplementation(object_getClass((__bridge id)dict), autorelease);
        ((CFTypeRef (*)(CFTypeRef, SEL))imp)(dict, autorelease);

        return dict;
    }
Dumping the assembly shows that ARC is now out of the picture:
    _MakeDictionary: ## @MakeDictionary
        .cfi_startproc
    Lfunc_begin0:
        .loc 1 11 0 ## test.m:11:0
    ## BB#0:
        pushq %rbp
    Ltmp2:
        .cfi_def_cfa_offset 16
    Ltmp3:
        .cfi_offset %rbp, -16
        movq %rsp, %rbp
    Ltmp4:
        .cfi_def_cfa_register %rbp
        subq $32, %rsp
        movabsq $0, %rax
        .loc 1 12 0 prologue_end ## test.m:12:0
    Ltmp5:
        movq %rax, %rdi
        movq %rax, %rsi
        movq %rax, %rdx
        movq %rax, %rcx
        callq _CFDictionaryCreateMutable
        leaq L_.str(%rip), %rdi
        movq %rax, -8(%rbp)
        .loc 1 15 0 ## test.m:15:0
        callq _sel_getUid
        movq %rax, -16(%rbp)
        .loc 1 16 0 ## test.m:16:0
        movq -8(%rbp), %rax
        movq %rax, %rdi
        callq _object_getClass
        movq -16(%rbp), %rsi
        movq %rax, %rdi
        callq _class_getMethodImplementation
        movq %rax, -24(%rbp)
        .loc 1 17 0 ## test.m:17:0
        movq -24(%rbp), %rax
        movq -8(%rbp), %rcx
        movq -16(%rbp), %rsi
        movq %rcx, %rdi
        callq *%rax
        .loc 1 19 0 ## test.m:19:0
        movq -8(%rbp), %rcx
        movq %rax, -32(%rbp) ## 8-byte Spill
        movq %rcx, %rax
        addq $32, %rsp
        popq %rbp
        ret
Architectures
One question remains: why did this code work for me at first, with my colleague only uncovering the crash later?
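The answer is actually pretty simple once everything else is known. This is an iOS project. I tested the code in the simulator, while he tested it on a real iPhone. The runtime function that performs the fast autorelease check is called callerAcceptsFastAutorelease. Because it inspects machine code, it is architecture-specific. If you look at the version used in the 32-bit iOS simulator, the problem becomes apparent: (Translator's note: the 32-bit architecture and iOS simulator implementation details mentioned here may change with later system versions.)
    # elif __i386__ && TARGET_IPHONE_SIMULATOR

    static bool callerAcceptsFastAutorelease(const void *ra)
    {
        return false;
    }
In short, the fast autorelease handling is not implemented for the 32-bit iOS simulator. That makes sense: implementing and debugging it would take a non-trivial amount of effort. Meanwhile, ARC is not supported on i386 for Mac programs, so the only way to hit this path on i386 is to run in the simulator. There is no real point in putting effort into extreme optimizations that would only apply to simulator apps.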
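Aside
Before writing this article, I first wrote a small test case so I could easily experiment and examine the problem in isolation. However, there was a big problem: the test case didn't work! Or rather, it worked just fine and refused to crash. The code was really simple, roughly:
    int main(int argc, char **argv) {
        @autoreleasepool {
            CFDictionaryRef dict = MakeDictionary();
            NSLog(@"Testing.");
            NSLog(@"%@", dict);
        }
        return 0;
    }
There isn't much room for error there, so it was baffling that it wouldn't crash.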
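After many single-steps through the assembly in the debugger, I realized that it had to do with dyld lazy binding. References to external functions aren't fully bound when a program is initially loaded. Instead, a stub is generated that holds enough information to complete the binding the first time the call is made. On the first call to an external function, the function's address is looked up, the stub is rewritten to point to it, and then the call is made. Subsequent calls go directly to the function. Binding lazily improves program startup time and avoids wasting time looking up functions that are never called.
That means that on the very first run of this code, the call to objc_retainAutoreleasedReturnValue isn't fully bound. Because it isn't fully bound, callerAcceptsFastAutorelease doesn't recognize that the call goes to objc_retainAutoreleasedReturnValue. Because it doesn't see the call to objc_retainAutoreleasedReturnValue, the fast autorelease path isn't used. The dictionary goes into the autorelease pool as originally intended, and the code works... but only once.
Once I figured that out, it was trivial to force the crash by inserting a loop: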
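    int main(int argc, char **argv) {
        while(1) @autoreleasepool {
            CFDictionaryRef dict = MakeDictionary();
            NSLog(@"Testing.");
            NSLog(@"%@", dict);
        }
        return 0;
    }
The loop reliably crashes on the second iteration. The first pass triggers lazy binding of objc_retainAutoreleasedReturnValue, which lets the next call take the fast autorelease path and trigger the bug.
This has little consequence for normal programs, which perform the lazy binding for functions like these early on. It ended up being a severe complicating factor for a small test program, though.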
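Conclusion
ARC is great technology, but sometimes it is necessary to work around it. When you do, you have to be sure you really work around it and don't give it any opportunity to jump in. If you do, it might decide to eliminate what looks like a useless autorelease call, causing your objects to be destroyed instantly instead of being peacefully returned to the caller.
People sometimes ask me whether I actually use the crazy and esoteric stuff I discuss on this blog. This is a good example: tracking down this bug took basic assembly reading, knowledge of Objective-C runtime internals, and an understanding of specific ARC calls. Building the example crasher further required understanding how dyld binds external function references at runtime. This is all great stuff to know, and even if you never use it, it's just plain fun.
That's it for today. I hope to be back on track, so check back soon for the next article. In the meantime, as always, Friday Q&A is driven by reader suggestions, so if you have a topic you'd like to see covered, please send it in!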
Original (English)
Source: https://www.mikeash.com/pyblog/friday-qa-2014-05-09-when-an-autorelease-isnt.html
Welcome back to another Friday Q&A. I apologize for the unannounced hiatus in posts. It’s not due to anything interesting, just a shortage of time. Friday Q&A will continue, and I will continue to aim for my regular biweekly postings. For today’s article, I have a little story about an autorelease call that didn’t do what it was supposed to do.
The Setup
ARC is a lovely technology but it doesn’t cover everything. Sometimes you need to use CoreFoundation objects and you’re back in the world of manual memory management.
Normally, that’s no problem. I did manual memory management for many years, and while I enjoy not doing it with ARC, I still remember how. However, ARC makes some things a bit more difficult than they used to be. In particular, sometimes you want to autorelease a CoreFoundation object. Without ARC, you might write something like this:
    CFDictionaryRef MakeDictionary(void) {
        CFMutableDictionaryRef dict = CFDictionaryCreateMutable(NULL, 0, NULL, NULL);
        // Put some stuff in the dictionary here perhaps

        [(id)dict autorelease];
        return dict;
    }
This gives you nice memory management semantics, where the caller is not responsible for releasing the return value, just like we’re used to with most Cocoa methods. It takes advantage of the fact that all CoreFoundation objects are also Objective-C objects, and an autorelease is a way to balance a CoreFoundation Create call.
This code no longer works with ARC, because the call to autorelease is not permitted. To solve this, Apple helpfully provided us with a CFAutorelease function which does the same thing and can be used with ARC. Unfortunately, it’s only available as of iOS 7 and Mac OS X 10.9. For those of us who need to support older OS releases, we have to improvise.
My solution was to get the selector for autorelease using the sel_getUid runtime call, which sneaks past ARC’s rules. Then I’d send that selector to the CoreFoundation object, thus accomplishing the same thing as [(id)dict autorelease]. My code looked like this:
    CFDictionaryRef MakeDictionary(void) {
        CFMutableDictionaryRef dict = CFDictionaryCreateMutable(NULL, 0, NULL, NULL);
        // Put some stuff in the dictionary here perhaps

        SEL autorelease = sel_getUid("autorelease");
        IMP imp = class_getMethodImplementation(object_getClass((__bridge id)dict), autorelease);
        ((id (*)(CFTypeRef, SEL))imp)(dict, autorelease);

        return dict;
    }
Note: if you ended up here because you need code to accomplish this, do not use this code. As we will see shortly, it’s broken. If you want working code for this, check the end of the article.
I tested this code and everything worked fine. A little later, another programmer on the project reported that it was consistently crashing for him. Fortunately, I was able to replicate his crash without too much difficulty. However, it took a while to figure out just what was going on.
The Crash
This code does not crash itself. However, it can cause a crash in subsequent code. For example:
    CFDictionaryRef dict = MakeDictionary();
    NSLog(@"Testing.");
    NSLog(@"%@", dict);
This crashes on the second NSLog line. The stack trace looks like a typical memory management crash:
    frame #0: 0x00007fff917980a3 libobjc.A.dylib`objc_msgSend + 35
    frame #1: 0x00007fff97175184 Foundation`_NSDescriptionWithLocaleFunc + 41
    frame #2: 0x00007fff9077bd94 CoreFoundation`__CFStringAppendFormatCore + 7332
    frame #3: 0x00007fff907aa313 CoreFoundation`_CFStringCreateWithFormatAndArgumentsAux + 115
    frame #4: 0x00007fff907e1b9b CoreFoundation`_CFLogvEx + 123
    frame #5: 0x00007fff9719ed0c Foundation`NSLogv + 79
    frame #6: 0x00007fff9719ec98 Foundation`NSLog + 148
It seems that the dictionary is being destroyed before the NSLog call. But how can that be? We called autorelease in the function, and the autorelease pool has not yet been drained. The release that will balance the CoreFoundation Create call hasn’t happened yet, so the object should still exist.
The Assembly
After poking at the code in various ways, I decided to read the assembly code generated by the compiler. There wasn’t much to the code I wrote, so whatever problem there was must have been deeper.
Here’s the x86-64 assembly output for the broken MakeDictionary function:
    _MakeDictionary: ## @MakeDictionary
        .cfi_startproc
    Lfunc_begin0:
        .loc 1 11 0 ## test.m:11:0
    ## BB#0:
        pushq %rbp
    Ltmp2:
        .cfi_def_cfa_offset 16
    Ltmp3:
        .cfi_offset %rbp, -16
        movq %rsp, %rbp
    Ltmp4:
        .cfi_def_cfa_register %rbp
        subq $32, %rsp
        movabsq $0, %rax
        .loc 1 12 0 prologue_end ## test.m:12:0
    Ltmp5:
        movq %rax, %rdi
        movq %rax, %rsi
        movq %rax, %rdx
        movq %rax, %rcx
        callq _CFDictionaryCreateMutable
        leaq L_.str(%rip), %rdi
        movq %rax, -8(%rbp)
        .loc 1 15 0 ## test.m:15:0
        callq _sel_getUid
        movq %rax, -16(%rbp)
        .loc 1 16 0 ## test.m:16:0
        movq -8(%rbp), %rax
        movq %rax, %rdi
        callq _object_getClass
        movq -16(%rbp), %rsi
        movq %rax, %rdi
        callq _class_getMethodImplementation
        movq %rax, -24(%rbp)
        .loc 1 17 0 ## test.m:17:0
        movq -24(%rbp), %rax
        movq -8(%rbp), %rcx
        movq -16(%rbp), %rsi
        movq %rcx, %rdi
        callq *%rax
        movq %rax, %rdi
        callq _objc_retainAutoreleasedReturnValue
        movq %rax, %rdi
        callq _objc_release
        .loc 1 19 0 ## test.m:19:0
        movq -8(%rbp), %rax
        addq $32, %rsp
        popq %rbp
        ret
Pretty straightforward here. Since no real calculations are done, we can just look at the sequence of callq instructions to see what functions are called. It calls CFDictionaryCreateMutable, sel_getUid, object_getClass, class_getMethodImplementation, and then there’s an indirect call through the function pointer which is where it actually makes the autorelease call. ARC then hops in and does some pointless but harmless work on the return value from the call by retaining it and then immediately releasing it. The function then returns the dictionary to the caller.
Mostly Harmless
It took me a little while to realize what was going on, but then it was obvious. I said that the ARC calls inserted are “pointless but harmless.” In fact, they are anything but!
One of the interesting features that came with ARC is fast handling of autoreleased return values. This sort of pattern is extremely common with ARC:
    // callee
    obj = [[SomeClass alloc] init];
    [obj setup];
    return [obj autorelease];

    // caller
    obj = [[self method] retain];
    [obj doStuff];
    [obj release];
A human programmer would typically omit the retain and release calls in the caller, but ARC is more paranoid. This would make things a bit slower when using ARC, which is where the fast autorelease handling comes in.
There is some extremely fancy and mind-bending code in the Objective-C runtime’s implementation of autorelease. Before actually sending an autorelease message, it first inspects the caller’s code. If it sees that the caller is going to immediately call objc_retainAutoreleasedReturnValue, it completely skips the message send. It doesn’t actually do an autorelease at all. Instead, it just stashes the object in a known location, which signals that it hasn’t sent autorelease at all.
objc_retainAutoreleasedReturnValue cooperates in this scheme. Before calling retain, it first checks that known location. If it contains the right object, it skips the retain. The net result is that the above code is effectively transformed into this:
    // callee
    obj = [[SomeClass alloc] init];
    [obj setup];
    return obj;

    // caller
    obj = [self method];
    [obj doStuff];
    [obj release];
This is faster because it skips the autorelease pool entirely, saving three message sends and the accompanying work: autorelease, the caller’s retain, and the eventual release sent by the autorelease pool. It also allows the object to be destroyed earlier, reducing memory and cache pressure.
The beautiful thing about this technique is that because the runtime checks the caller’s code before making this optimization, everything is perfectly compatible with code that doesn’t participate in the scheme. If the caller does something else with the return value, then the runtime simply calls autorelease and everything works normally.
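The handshake described above can be modeled in a few lines of plain C. This is only a toy sketch, not the runtime's implementation: every name in it (`ToyObject`, `handoff_slot`, `toy_autorelease_return`, `toy_retain_autoreleased_return`, `toy_pool_drain`) is hypothetical, and where the real runtime inspects the caller's machine code, the sketch simply takes a flag saying whether the caller will retain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in the Objective-C runtime. */
typedef struct { int retain_count; } ToyObject;

static ToyObject *handoff_slot = NULL;  /* the "known location" */
static ToyObject *pool[8];              /* a tiny autorelease pool */
static size_t pool_len = 0;

/* Callee side. The flag stands in for inspecting the caller's code. */
ToyObject *toy_autorelease_return(ToyObject *obj, bool caller_will_retain) {
    if (caller_will_retain) {
        handoff_slot = obj;             /* fast path: no pool entry */
    } else {
        assert(pool_len < 8);
        pool[pool_len++] = obj;         /* slow path: a genuine autorelease */
    }
    return obj;
}

/* Caller side: claim the +1 from the slot, or fall back to a real retain. */
ToyObject *toy_retain_autoreleased_return(ToyObject *obj) {
    if (handoff_slot == obj) {
        handoff_slot = NULL;            /* ownership handed over, retain skipped */
    } else {
        obj->retain_count++;
    }
    return obj;
}

/* Draining the pool sends the deferred releases. */
void toy_pool_drain(void) {
    while (pool_len > 0) {
        pool[--pool_len]->retain_count--;
    }
}

/* A cooperating caller: returns how many pool entries were used. */
size_t run_cooperating_caller(ToyObject *obj) {
    toy_retain_autoreleased_return(toy_autorelease_return(obj, true));
    return pool_len;
}

/* A caller outside the scheme: the object lands in the pool as usual. */
size_t run_plain_caller(ToyObject *obj) {
    toy_autorelease_return(obj, false);
    return pool_len;
}
```

With an object that starts at a retain count of 1, `run_cooperating_caller` leaves the count at 1 and the pool empty, so the caller now owns that single reference. `run_plain_caller` leaves one pool entry whose release arrives only when `toy_pool_drain` runs.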
I said that this code is not pointless. What, then, is the point of the retain immediately followed by release in the assembly above? It allows the caller to participate in this scheme even though it’s not using the return value. It would be correct to simply omit them, but in that case, the fast autorelease path is lost. It ends up being faster to make these two extra calls, at least in the common case.
I also said that this code is not harmless. The harm here is exactly that fast autorelease path. To ARC, an autorelease in a function or method followed by a retain in the caller is just a way to pass ownership around. However, that’s not what’s going on in this code. This code is attempting to actually put the object into the autorelease pool no matter what. ARC’s clever optimization ends up bypassing that attempt and as a result, the dictionary is immediately destroyed instead of being placed in the autorelease pool for later destruction.
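The retain-count bookkeeping behind that outcome can be written out as a toy ledger in plain C. This is a sketch under stated assumptions: the names (`ToyDict`, `toy_make_dictionary`, `toy_release`, `toy_drain_pool`) are hypothetical, the `arc_wraps_call` flag stands in for whether ARC emitted the retain/release pair after the autorelease call, and the fast path is assumed to be taken whenever it did.

```c
#include <stdbool.h>

/* Toy ledger only: hypothetical names, not CoreFoundation or runtime API. */
typedef struct {
    int  retain_count;
    int  pending_pool_releases;   /* releases the autorelease pool still owes */
    bool destroyed;
} ToyDict;

void toy_release(ToyDict *d) {
    if (--d->retain_count == 0) {
        d->destroyed = true;
    }
}

/* Models MakeDictionary. arc_wraps_call says whether ARC emitted
   retainAutoreleasedReturnValue + release after the autorelease call. */
void toy_make_dictionary(ToyDict *d, bool arc_wraps_call) {
    d->retain_count = 1;              /* the Create call returns +1 */
    d->pending_pool_releases = 0;
    d->destroyed = false;

    if (arc_wraps_call) {
        /* Assumed fast path: the autorelease is skipped (no pool entry) and
           the matching retain is skipped too, so the count is still 1 here. */
        toy_release(d);               /* ARC's release drops the only reference */
    } else {
        d->pending_pool_releases = 1; /* a genuine autorelease, released later */
    }
}

/* Draining the pool delivers the deferred release, if there is one. */
void toy_drain_pool(ToyDict *d) {
    while (d->pending_pool_releases > 0) {
        d->pending_pool_releases--;
        toy_release(d);
    }
}
```

With `arc_wraps_call` set, the dictionary is already destroyed when `toy_make_dictionary` returns, which matches the crash on the second NSLog. Without it, the dictionary survives until `toy_drain_pool` runs.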
Root Cause
It all comes down to the function pointer cast used when making the autorelease call:
    ((id (*)(CFTypeRef, SEL))imp)(dict, autorelease);
I wrote it like this because that’s what the type is. The autorelease method returns id and takes two (normally implicit) parameters: self and the selector being sent. I changed the self parameter to CFTypeRef instead of id for convenience, but left the return type as id since that’s what it really is in the underlying autorelease method. It shouldn’t matter, since the return value is ignored anyway.
That return type is this code’s downfall. I was careful to avoid ARC’s meddling for the most part, but that id makes ARC come in and start inserting calls, and that causes the dictionary to be immediately destroyed.
The Fix
Once all of this is known, the fix is easy. Get ARC out of the picture by having the call return CFTypeRef instead of id. Here’s the complete function with the fix:
    CFDictionaryRef MakeDictionary(void) {
        CFMutableDictionaryRef dict = CFDictionaryCreateMutable(NULL, 0, NULL, NULL);
        // Put some stuff in the dictionary here perhaps

        SEL autorelease = sel_getUid("autorelease");
        IMP imp = class_getMethodImplementation(object_getClass((__bridge id)dict), autorelease);
        ((CFTypeRef (*)(CFTypeRef, SEL))imp)(dict, autorelease);

        return dict;
    }
Dumping the assembly shows that ARC is now out of the picture:
    _MakeDictionary: ## @MakeDictionary
        .cfi_startproc
    Lfunc_begin0:
        .loc 1 11 0 ## test.m:11:0
    ## BB#0:
        pushq %rbp
    Ltmp2:
        .cfi_def_cfa_offset 16
    Ltmp3:
        .cfi_offset %rbp, -16
        movq %rsp, %rbp
    Ltmp4:
        .cfi_def_cfa_register %rbp
        subq $32, %rsp
        movabsq $0, %rax
        .loc 1 12 0 prologue_end ## test.m:12:0
    Ltmp5:
        movq %rax, %rdi
        movq %rax, %rsi
        movq %rax, %rdx
        movq %rax, %rcx
        callq _CFDictionaryCreateMutable
        leaq L_.str(%rip), %rdi
        movq %rax, -8(%rbp)
        .loc 1 15 0 ## test.m:15:0
        callq _sel_getUid
        movq %rax, -16(%rbp)
        .loc 1 16 0 ## test.m:16:0
        movq -8(%rbp), %rax
        movq %rax, %rdi
        callq _object_getClass
        movq -16(%rbp), %rsi
        movq %rax, %rdi
        callq _class_getMethodImplementation
        movq %rax, -24(%rbp)
        .loc 1 17 0 ## test.m:17:0
        movq -24(%rbp), %rax
        movq -8(%rbp), %rcx
        movq -16(%rbp), %rsi
        movq %rcx, %rdi
        callq *%rax
        .loc 1 19 0 ## test.m:19:0
        movq -8(%rbp), %rcx
        movq %rax, -32(%rbp) ## 8-byte Spill
        movq %rcx, %rax
        addq $32, %rsp
        popq %rbp
        ret
Architectures
One question remains: why did this code work for me initially, and my colleague only uncovered the crash later?
The answer is actually pretty simple, once everything else is known. This is an iOS project. I tested the code in the simulator, while he tried it on a real iPhone. The runtime function that performs the fast autorelease check is called callerAcceptsFastAutorelease. It’s architecture-specific since it’s inspecting machine code. If you look at the version used in the 32-bit iOS simulator, the problem becomes apparent:
    # elif __i386__ && TARGET_IPHONE_SIMULATOR

    static bool callerAcceptsFastAutorelease(const void *ra)
    {
        return false;
    }
In short, the fast autorelease handling is not implemented for the 32-bit iOS simulator. It makes sense that it wouldn’t be. It’s going to be some non-trivial amount of effort to implement and fix. Meanwhile, ARC is not supported on i386 for Mac programs, so the only way to hit this path on i386 is to run in the simulator. There’s no real point in putting effort into extreme optimizations that will only apply to simulator apps.
Aside
Before writing this article, I first wrote a small test case so I could easily experiment and examine the problem in isolation. However, there was a big problem: the test case didn’t work! Or rather, it did work just fine, and refused to crash. The code was really simple, roughly:
    int main(int argc, char **argv) {
        @autoreleasepool {
            CFDictionaryRef dict = MakeDictionary();
            NSLog(@"Testing.");
            NSLog(@"%@", dict);
        }
        return 0;
    }
There isn’t much room for error there, so it was baffling why it wouldn’t crash.
After many single-steps through assembly in the debugger, I realized that it had to do with dyld lazy binding. References to external functions aren’t fully bound when a program is initially loaded. Instead, a stub is generated which has enough information to complete the binding the first time the call is made. On the first call to an external function, the address for that function is looked up, the stub is rewritten to point to it, and then the function call is made. Subsequent calls go directly to the function. By binding lazily, program startup time is improved and time isn’t wasted looking up functions that are never called.
That means that on the very first run of this code, the call to objc_retainAutoreleasedReturnValue isn’t fully bound. Because it’s not fully bound, callerAcceptsFastAutorelease doesn’t realize that the call is to objc_retainAutoreleasedReturnValue. Because it doesn’t see the call to objc_retainAutoreleasedReturnValue, the fast autorelease path isn’t used. The dictionary goes into the autorelease pool as was originally intended, and the code works… once.
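A rough C analogy shows why the first call behaves differently from the rest. The names here (`retain_stub`, `binder`, `real_retain`, `toy_caller_accepts_fast_path`) are hypothetical. Real dyld stubs are machine-code trampolines rather than a C function pointer, and the real check examines the code at the return address rather than comparing pointers, so treat this as a sketch of the idea only.

```c
#include <stdbool.h>

typedef int (*RetainFn)(int);

/* The real target that the stub should eventually point to. */
int real_retain(int count) { return count + 1; }

int binder(int count);          /* forward declaration for the initializer */

/* The "stub": starts out pointing at the binder, not at the real function. */
RetainFn retain_stub = binder;
int lookups = 0;                /* how many times the symbol was resolved */

/* The first call lands here: resolve the target, rewrite the stub, forward. */
int binder(int count) {
    lookups++;
    retain_stub = real_retain;
    return retain_stub(count);
}

/* Toy check: the call is recognised only once the stub is bound. */
bool toy_caller_accepts_fast_path(void) {
    return retain_stub == real_retain;
}
```

Before the first call through `retain_stub`, `toy_caller_accepts_fast_path()` returns false, so a model runtime would fall back to a genuine autorelease. After that one call the stub is bound, the check succeeds, and every later call is eligible for the fast path.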
Once I figured that out, it was trivial to force the crash by inserting a loop:
    int main(int argc, char **argv) {
        while(1) @autoreleasepool {
            CFDictionaryRef dict = MakeDictionary();
            NSLog(@"Testing.");
            NSLog(@"%@", dict);
        }
        return 0;
    }
The loop reliably crashes on the second iteration. The first time through triggers lazy binding of objc_retainAutoreleasedReturnValue, which then allows the next call to take the fast autorelease path and trigger the bug.
This has little consequence for normal programs, which will perform the lazy binding for functions like these early on. It ended up being a severe complicating factor for a small test program, though.
Conclusion
ARC is great technology, but sometimes it’s necessary to work around it. When working around it, you have to be sure you really work around it, and not give it any opportunity to jump in. If you do, it might decide to eliminate what looks like a useless autorelease call, causing your objects to be instantaneously destroyed instead of being peacefully returned to the caller.
People sometimes ask me if I actually use the crazy and esoteric stuff I discuss on this blog. This is a good example: it took basic assembly language reading, Objective-C runtime internals, and understanding of specific ARC calls to track down this bug. Building the example crasher further required understanding how dyld binds external function references at runtime. This is all great stuff to know, and even if you never use it, it’s just plain fun.
That’s it for today. I hope to be back on track, so check back soon for another article. In the meantime, as always, Friday Q&A is driven by reader suggestions, so if you have a topic that you’d like to see covered, please send it in!