Objective-C Messaging

Mike Ash Friday Q&A, Chinese translation: Objective-C Messaging

By TommyWu

Translation · Original: Friday Q&A 2009-03-20: Objective-C Messaging · Author: Mike Ash

Original: https://www.mikeash.com/pyblog/friday-qa-2009-03-20-objective-c-messaging.html · Published: 2009-03-20 · Author: Mike Ash · Translator: MiMo (mimo-v2.5-pro); code blocks are kept in the original English


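Welcome back to another Friday Q&A. This week I'd like to take Joshua Pennington's suggestion and dig into one particular facet of last week's topic, the Objective-C runtime: messaging. How does messaging work, and what exactly does it do? Read on!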

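Definitions

Before we get into the mechanisms, we need to define our terms. Many people are unclear on exactly what a "method" is versus a "message", for example, but the distinction is critical for understanding how the messaging system works at the low level.

  • Method: an actual piece of code associated with a class and given a particular name. Example: - (int)meaning { return 42; }

  • Message: a name and a set of parameters sent to an object. Example: sending "meaning" with no parameters to object 0x12345678.

  • Selector: a particular way of representing the name of a message or method, represented as the type SEL. Selectors are essentially opaque strings that are managed so that simple pointer equality can be used to compare them, which makes comparison fast. (The implementation may differ, but that is essentially how they look from the outside.) Example: @selector(meaning)

  • Message send: the process of taking a message, then finding and executing the appropriate method.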
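The claim that selectors can be compared by pointer equality can be illustrated with a small string-interning table. This is only a sketch in plain C, not the runtime's implementation: the names toy_sel, toy_sel_register, and toy_sel_equal are hypothetical, and the fixed-size linear table is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for SEL: an interned, read-only C string. */
typedef const char *toy_sel;

#define TOY_MAX_SELECTORS 64
static toy_sel toy_sel_table[TOY_MAX_SELECTORS];
static size_t toy_sel_count = 0;

/* Return the unique pointer for `name`, registering it on first use. */
toy_sel toy_sel_register(const char *name) {
    for (size_t i = 0; i < toy_sel_count; i++) {
        if (strcmp(toy_sel_table[i], name) == 0) {
            return toy_sel_table[i];            /* already interned */
        }
    }
    assert(toy_sel_count < TOY_MAX_SELECTORS);  /* toy table never grows */
    char *copy = malloc(strlen(name) + 1);
    assert(copy != NULL);
    strcpy(copy, name);
    toy_sel_table[toy_sel_count++] = copy;
    return copy;
}

/* Once interned, comparing two selectors is a single pointer comparison. */
int toy_sel_equal(toy_sel a, toy_sel b) {
    return a == b;
}
```

The string comparison happens once, at registration time. After that, two selectors with the same name share one pointer, so every later comparison is a single pointer test.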

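Methods

The next thing to discuss is what exactly a method is at the machine level. By definition it is a piece of code given a name and associated with a particular class, but what does it actually end up creating in your application binary?

Methods end up being generated as straight C functions with a couple of extra parameters. You probably know that self is passed as an implicit parameter; in the generated function it becomes an explicit one. The lesser-known _cmd, which holds the selector of the message being sent, is the second such implicit parameter. Writing a method like this:

- (int)foo:(NSString *)str { ...

generates a function much like this:

int SomeClass_method_foo_(SomeClass *self, SEL _cmd, NSString *str) { ...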

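What, then, happens when we write code like this?

int result = [obj foo:@"hello"];

The compiler translates it into something equivalent to this:

int result = ((int (*)(id, SEL, NSString *))objc_msgSend)(obj, @selector(foo:), @"hello");

The unwieldy expression after the equals sign takes the objc_msgSend function, defined as part of the Objective-C runtime, and casts it to a different type. Specifically, it casts it from a function that returns id and takes id, SEL, and variable arguments after that, to a function type that matches the prototype of the method being invoked.

To put it another way, the compiler generates code that calls objc_msgSend, but with parameter and return-value conventions matched to the method in question.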
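The idea of calling through a type-erased function pointer that has been cast to the method's real prototype can be sketched in plain C. Everything prefixed toy_ is hypothetical, and the sketch separates lookup from the call for readability. It is a simplification, not how objc_msgSend is structured.

```c
#include <assert.h>
#include <string.h>

typedef const char *toy_sel;     /* hypothetical stand-in for SEL */
typedef void (*toy_imp)(void);   /* generic, type-erased function pointer */

/* Roughly what "- (int)foo:(NSString *)str" becomes:
   self and _cmd are ordinary explicit parameters. */
int SomeClass_method_foo_(void *self, toy_sel _cmd, const char *str) {
    (void)self;
    (void)_cmd;
    return (int)strlen(str);     /* placeholder body for illustration */
}

/* Stand-in for method lookup: always returns the one method, type-erased. */
toy_imp toy_lookup(void *obj, toy_sel sel) {
    (void)obj;
    (void)sel;
    return (toy_imp)SomeClass_method_foo_;
}

/* The call site casts back to the method's real prototype before calling,
   so arguments and return value follow that prototype's conventions. */
int toy_send_foo(void *obj, const char *str) {
    toy_sel sel = "foo:";
    toy_imp imp = toy_lookup(obj, sel);
    assert(imp != NULL);
    return ((int (*)(void *, toy_sel, const char *))imp)(obj, sel, str);
}
```

The cast changes only how the compiler sets up the call. The function that runs is still the one the pointer refers to, which is why the cast type has to match the method's actual prototype.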

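Messaging

A message send in code turns into a call to objc_msgSend, so what does that function do? The high-level answer should be fairly apparent. Since it is the only function call present, it must look up the appropriate method implementation and then call it. Calling is easy: it just needs to jump to the appropriate address. But how does it look the method up?

The Objective-C header runtime.h includes the following as part of the (now opaque, legacy) objc_class structure members. (Translator's note: the objc_class structure may have changed substantially on modern systems, and its members may no longer be directly exposed.)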

struct objc_method_list **methodLists OBJC2_UNAVAILABLE;

struct objc_method_list {
    struct objc_method_list *obsolete OBJC2_UNAVAILABLE;
    int method_count OBJC2_UNAVAILABLE;
#ifdef __LP64__
    int space OBJC2_UNAVAILABLE;
#endif
    /* variable length structure */
    struct objc_method method_list[1] OBJC2_UNAVAILABLE;
} OBJC2_UNAVAILABLE;

struct objc_method {
    SEL method_name OBJC2_UNAVAILABLE;
    char *method_types OBJC2_UNAVAILABLE;
    IMP method_imp OBJC2_UNAVAILABLE;
} OBJC2_UNAVAILABLE;

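So even though we are not supposed to touch these structs directly (don't worry, all the functionality for manipulating them is provided through functions elsewhere in the header), we can still see what the runtime considers a method to be. It is a name (in the form of a selector), a string encoding the argument and return types (look up the @encode directive for more on this), and an IMP, which is just a function pointer: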

typedef id (*IMP)(id, SEL, ...);
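Given these structures, one plausible way to find an implementation is to scan a class's methods for a matching selector and fall back to the superclass on a miss. The sketch below models that in plain C under stated assumptions: the toy_ types are simplified stand-ins rather than the runtime's own, each class holds a single flat method array, and the "i@:" type string is only an illustrative encoding.

```c
#include <assert.h>
#include <stddef.h>

typedef const char *toy_sel;   /* interned name, compared by pointer */
typedef int (*toy_imp)(void *self, toy_sel _cmd);

struct toy_method {
    toy_sel method_name;
    const char *method_types;  /* type encoding string (illustrative) */
    toy_imp method_imp;
};

struct toy_class {
    const struct toy_class *super_class;
    const struct toy_method *methods;
    int method_count;
};

/* Scan this class, then each superclass in turn, for a method named `sel`. */
toy_imp toy_lookup_imp(const struct toy_class *cls, toy_sel sel) {
    assert(sel != NULL);
    for (; cls != NULL; cls = cls->super_class) {
        for (int i = 0; i < cls->method_count; i++) {
            if (cls->methods[i].method_name == sel) {
                return cls->methods[i].method_imp;
            }
        }
    }
    return NULL;   /* no method found; handling that case is out of scope */
}

/* Sample data: a base class with one method and a subclass with none. */
static const char toy_sel_meaning[] = "meaning";
static const char toy_sel_other[] = "other";

int toy_base_meaning(void *self, toy_sel _cmd) {
    (void)self;
    (void)_cmd;
    return 42;
}

static const struct toy_method toy_base_methods[] = {
    { toy_sel_meaning, "i@:", toy_base_meaning },
};
static const struct toy_class toy_base = { NULL, toy_base_methods, 1 };
static const struct toy_class toy_derived = { &toy_base, NULL, 0 };
```

Looking up toy_sel_meaning on toy_derived misses in the subclass and is satisfied by toy_base, which is how an inherited method would be found in this model. The cost grows with the number of methods and the depth of the hierarchy.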

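One more detail needs to be considered here. Scanning the method lists on every message send would work, but it would be extremely slow. objc_msgSend takes only about a dozen CPU cycles to execute on the x86 architecture, which makes it clear that it does not go through such a lengthy procedure on every call. The clue is another objc_class member: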

struct objc_cache *cache OBJC2_UNAVAILABLE;

struct objc_cache {
    unsigned int mask /* total = mask + 1 */ OBJC2_UNAVAILABLE;
    unsigned int occupied OBJC2_UNAVAILABLE;
    Method buckets[1] OBJC2_UNAVAILABLE;
};
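The mask and occupied fields suggest a hash table keyed by selector, with a power-of-two bucket count so that an index can be computed with a bitwise AND. The following is a minimal sketch under that assumption. The toy_ names, the shift-based hash, the linear probing, and the fixed size are all choices made for illustration, not details taken from the runtime.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef const char *toy_sel;   /* interned name, compared by pointer */
typedef int (*toy_imp)(void *self, toy_sel _cmd);

struct toy_bucket {
    toy_sel name;              /* NULL marks an empty bucket */
    toy_imp imp;
};

#define TOY_CACHE_SIZE 8       /* must be a power of two */

struct toy_cache {
    unsigned int mask;         /* total = mask + 1 */
    unsigned int occupied;
    struct toy_bucket buckets[TOY_CACHE_SIZE];
};

/* Assumed hash: drop low pointer bits, then mask down to a bucket index. */
unsigned int toy_cache_index(const struct toy_cache *c, toy_sel sel) {
    return (unsigned int)(((uintptr_t)sel >> 2) & c->mask);
}

void toy_cache_init(struct toy_cache *c) {
    c->mask = TOY_CACHE_SIZE - 1;
    c->occupied = 0;
    for (unsigned int i = 0; i <= c->mask; i++) {
        c->buckets[i].name = NULL;
        c->buckets[i].imp = NULL;
    }
}

/* Record sel -> imp, probing linearly past occupied buckets. */
void toy_cache_fill(struct toy_cache *c, toy_sel sel, toy_imp imp) {
    assert(c->occupied < c->mask);   /* toy cache never grows; keep one bucket empty */
    unsigned int i = toy_cache_index(c, sel);
    while (c->buckets[i].name != NULL && c->buckets[i].name != sel) {
        i = (i + 1) & c->mask;
    }
    if (c->buckets[i].name == NULL) {
        c->occupied++;
    }
    c->buckets[i].name = sel;
    c->buckets[i].imp = imp;
}

/* Fast path: a few pointer comparisons instead of a full method-list scan. */
toy_imp toy_cache_lookup(const struct toy_cache *c, toy_sel sel) {
    unsigned int i = toy_cache_index(c, sel);
    while (c->buckets[i].name != NULL) {
        if (c->buckets[i].name == sel) {
            return c->buckets[i].imp;
        }
        i = (i + 1) & c->mask;
    }
    return NULL;   /* miss: the caller would fall back to the slow lookup */
}

/* Sample method used to exercise the cache. */
int toy_meaning_imp(void *self, toy_sel _cmd) {
    (void)self;
    (void)_cmd;
    return 42;
}
```

In this model a send would consult the cache first and only fall back to scanning method lists on a miss, filling the cache with the result so that later sends of the same selector take the fast path.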

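(There is actually one more detail beyond this that turns out to be extremely important: what happens when no method can be found for a given selector. But that one is important enough to deserve its own post, so look for it next week.)

Conclusion

That wraps up this week's edition. Come back next week for more. Have a question? Think Objective-C's messaging system should be done differently? Post below.

Remember, Friday Q&A is driven by your ideas. If you have an idea for a topic, tell me! Post it in the comments, or e-mail it directly to me (I will use your name unless you ask me not to).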


#Original (English)

Source: https://www.mikeash.com/pyblog/friday-qa-2009-03-20-objective-c-messaging.html

Welcome back to another Friday Q&A. This week I’d like to take Joshua Pennington’s idea and elaborate on a particular facet of last week’s topic of the Objective-C runtime, namely messaging. How does messaging work, and what exactly does it do? Read on!

Definitions

Before we get started on the mechanisms, we need to define our terms. A lot of people are kind of unclear on exactly what a “method” is versus a “message”, for example, but this is critically important for understanding how the messaging system works at the low level.

  • Method: an actual piece of code associated with a class, and which is given a particular name. Example: - (int)meaning { return 42; }

  • Message: a name and a set of parameters sent to an object. Example: sending “meaning” and no parameters to object 0x12345678.

  • Selector: a particular way of representing the name of a message or method, represented as the type SEL. Selectors are essentially just opaque strings that are managed so that simple pointer equality can be used to compare them, to allow for extra speed. (The implementation may be different, but that’s essentially how they look on the outside.) Example: @selector(meaning).

  • Message send: the process of taking a message and finding and executing the appropriate method.

Methods

The next thing that we need to discuss is what exactly a method is at the machine level. From the definition, it’s a piece of code given a name and associated with a particular class, but what does it actually end up creating in your application binary?

Methods end up being generated as straight C functions, with a couple of extra parameters. You probably know that self is passed as an implicit parameter, which ends up being an explicit parameter. The lesser-known implicit parameter _cmd (which holds the selector of the message being sent) is a second such implicit parameter. Writing a method like this:

- (int)foo:(NSString *)str { ...

generates a function much like this:

int SomeClass_method_foo_(SomeClass *self, SEL _cmd, NSString *str) { ...

What, then, happens when we write some code like this?

int result = [obj foo:@"hello"];

The compiler translates it into something equivalent to this:

int result = ((int (*)(id, SEL, NSString *))objc_msgSend)(obj, @selector(foo:), @"hello");

What that ridiculous piece of code after the equals sign does is take the objc_msgSend function, defined as part of the Objective-C runtime, and cast it to a different type. Specifically, it casts it from a function that returns id and takes id, SEL, and variable arguments after that to a function that matches the prototype of the method being invoked.

To put it another way, the compiler generates code that calls objc_msgSend but with parameter and return value conventions matched to the method in question.

Messaging

A message send in code turns into a call to objc_msgSend, so what does that do? The high-level answer should be fairly apparent. Since that’s the only function call present, it must look up the appropriate method implementation and then call it. Calling is easy: it just needs to jump to the appropriate address. But how does it look it up?

The Objective-C header runtime.h includes this as part of the (now opaque, legacy) objc_class structure members:

struct objc_method_list **methodLists OBJC2_UNAVAILABLE;

struct objc_method_list {
    struct objc_method_list *obsolete OBJC2_UNAVAILABLE;
    int method_count OBJC2_UNAVAILABLE;
#ifdef __LP64__
    int space OBJC2_UNAVAILABLE;
#endif
    /* variable length structure */
    struct objc_method method_list[1] OBJC2_UNAVAILABLE;
} OBJC2_UNAVAILABLE;

struct objc_method {
    SEL method_name OBJC2_UNAVAILABLE;
    char *method_types OBJC2_UNAVAILABLE;
    IMP method_imp OBJC2_UNAVAILABLE;
} OBJC2_UNAVAILABLE;

So even though we’re not supposed to touch these structs (don’t worry, all the functionality for manipulating them is provided through functions elsewhere in the header), we can still see what the runtime considers a method to be. It’s a name (in the form of a selector), a string containing argument/return types (look up the @encode directive for more information about this one), and an IMP, which is just a function pointer:

typedef id (*IMP)(id, SEL, ...);

One more detail needs to be considered here. Searching the method lists on every message send would work, but it would be extremely slow. objc_msgSend only takes about a dozen CPU cycles to execute on the x86 architecture, which makes it clear that it’s not going through such a lengthy procedure every single time you call it. The clue to this is another objc_class member:

struct objc_cache *cache OBJC2_UNAVAILABLE;

struct objc_cache {
    unsigned int mask /* total = mask + 1 */ OBJC2_UNAVAILABLE;
    unsigned int occupied OBJC2_UNAVAILABLE;
    Method buckets[1] OBJC2_UNAVAILABLE;
};

(There is actually one more detail beyond this which ends up being extremely important: what happens when no method can be found for a given selector. But that one is so important that it deserves its own post, so look for it next week.)

Conclusion

That wraps up this week’s edition. Come back next week for more. Have a question? Think Objective-C’s messaging system should be done differently? Post below.

Remember, Friday Q&A is powered by your ideas. If you have an idea for a topic, tell me! Post your idea in the comments, or e-mail them directly to me (I’ll use your name unless you ask me not to).