动手实现 Tagged Pointer

文章發布時間 2012年7月27日

作者 TommyWu

標籤

译文 · 原文： Friday Q&A 2012-07-27: Let's Build Tagged Pointers · 作者 Mike Ash

原文：https://www.mikeash.com/pyblog/friday-qa-2012-07-27-lets-build-tagged-pointers.html 发布：2012-07-27　作者：Mike Ash 译者：MiMo（mimo-v2.5-pro）；代码块保留英文原样

上次周五问答中，我探讨了 NSNumber 类的假设性实现。从 Mac OS X 10.7 和 iOS 5 开始，NSNumber 采用了一种名为 tagged pointers（标签指针）的新运行时机制来提升速度并减少内存占用，今天我想深入探究其内部工作原理。

指针对齐理论

众所周知，指针本质上只是被用作内存地址的整数。在底层，持有对象指针的变量实际上只是存储着类似0x7f84a41000c0这样的整数值。这个值的指针特性完全取决于程序的使用方式。在 C 语言中，我们可以通过简单的类型转换提取指针的整数值：

1
    void *somePointer = ...;
2
    uintptr_t pointerIntegerValue = (uintptr_t)somePointer;

（uintptr_t 类型是标准 C 中定义的一种整数类型，其大小足以容纳一个指针。这很实用，因为指针大小会因平台而异。）

多数计算机架构都存在指针对齐（pointer alignment）的概念。这意味着指向特定数据类型的指针应当是某个 2 的幂次方的整数倍。例如，在大多数架构中，指向 4 字节整数的指针地址应是 4 的倍数。违反对齐限制可能导致性能损耗（在某些平台上，这种尝试会引发硬件异常，并由操作系统模拟处理），而在某些对性能敏感的架构上，甚至可能直接崩溃。诸如内存原子读写（atomic reads and writes）等操作也要求正确的对齐。简言之，指针对齐非常重要，应当予以遵守。

若声明一个局部变量，编译器可确保其对齐正确：

1
    void f(void)
2
    {
3
        int x;
4
        // x is aligned properly, because
5
        // the compiler has full control
6
        // over its location
7
    }

然而，对于动态分配的内存而言，情况就不那么简单了：

1
    int *ptr = malloc(sizeof(*ptr));
2
    // is ptr aligned properly?

malloc 并不知道程序将要存储什么类型的数据。它收到一个请求分配 4 字节内存（假设 int 为 4 字节），但无法判断这 4 字节将用于存储一个 int、一个指针（在 32 位平台上）、两个 short、四个 char，还是其他类型。

为了确保正确对齐，malloc 采取了一种保守的方式，返回的指针会对齐到足够大的边界，从而保证无论存储何种数据类型都能满足正确的对齐要求。在 Mac OS X 上，malloc 总是返回对齐到 16 字节边界的指针。

由于对齐要求，任何对齐后的指针都会存在未使用的位（译注：指针低位可用于存储标志位或其他元数据）。一个 16 字节对齐指针的十六进制表示形如：

1
    0x-------0

最后一个十六进制数字总是 0。虽然存在不遵循此规律的有效指针（例如 char * 就没有对齐限制），但对象指针末尾总会包含一些零位。

Tagged Pointer 理论
有些对象包含的数据量极小。NSNumber 就是典型的例子：根据存储的数字类型不同，其实际数据可能仅占一、二、四或八字节。若为这些数据分配独立的对象，涉及内存分配与释放，且每次访问都需通过指针间接寻址，这种做法相当浪费。

如果我们利用指针末尾那些未使用的低位来标识” 这不是真正的指针” 呢？如果我们进而将对象数据直接内联存储在这个非指针结构中，而非分配在单独的内存区域呢？如此实现的结果便是 Tagged Pointer（标签指针）。

使用标记指针（tagged pointers）的系统需要进行额外检查。每当使用对象指针时，系统会检查其低位（low bit）。若该位为零，则表示这是真实的对象指针，按正常方式处理。若低位为一，则它不可能是真实指针，因为未正确对齐。此时，该指针将受到特殊处理。通常，对象类型编码在指针的低位中，对象数据则存放在高位（high bits）。一个正常指针的形态如下：

1
    ....0000
2
        ^ all zeroes at the end

而 tagged pointer（标记指针）看起来是这样的：

1
    ....xxx1
2
        ^ type encoded in bottom bits

在此基础上还有变体。例如，标记指针（tagged pointer）可能设置最低的四位中的任意一位。通过利用额外的架构特性，你还可以拥有更多标记位（tag bits）。例如，x86_64 实际上只使用指针的 48 位，这样在对齐带来的 4 位基础上，顶部还剩下 16 位可用。利用这点，你可以做些疯狂的事情，比如将指针塞进浮点数 NaN 值（Not a Number）里。

无论出于何种原因，Objective-C 对标记指针的实现是：在标记指针中，最低位始终设置为 1，其他三位则用于指示该标记指针的类别（class）。我不确定他们为何选择这样实现，但我猜测这是速度（可能只检查最低位更容易，而这必须在 objc_msgSend 的快速路径中完成）和编程简便性之间某种结合的结果。

Tagged Pointer 的用途（译注：原文标题为 “Tagged Pointer Uses”）标记指针常出现在 “万物皆对象” 的语言中。当数字 3 是一个对象，而像 3 + 4 这样的表达式涉及两个对象并创建第三个对象时，对象的创建和值的提取对性能就变得至关重要了。

我们通常将对象视为一个独立的内存分配，通过指针进行引用。在 tagged pointers（标签指针）出现之前，Objective-C 中一直如此，许多其他语言也是如此。然而，内存分配和管理会产生显著开销。当计算 3 + 4 时，为结果 7 分配内存以及在无人使用时最终处置它的开销，远超过实际执行加法运算的成本。

仅通过指针引用一切也会因访问数据本身而产生开销。为了加载对象 3 的值，CPU 需要先加载指向该对象的指针值，然后加载存储在对象自身内的原始值 3。由于内存访问速度较慢，这种额外的访问可能代价高昂。

通过使用 tagged pointers，对于能装入标签指针内的数据，这些开销可以被消除。小整数是极佳的候选对象，因为它们容易装入且往往被频繁使用。整数 3 在正常情况下会这样表示：

1
    0000 0000 0000 0000 0000 0000 0000 0011
2
                                         ^ binary 3

使用标签指针（tagged pointer）后，它看起来会像这样：

1
    0000 0000 0000 0000 0000 0000 0011 1011
2
                                    ^  ^  ^ tag bit
3
                                    |  |
4
                                    |  tagged pointer class (5)
5
                                    |
6
                                    binary 3

实际值被向左平移若干位，并正确设置 tag 位。在此例中，我假设整数类的 tag 位标识值为 5，但这纯属虚构，实际含义由系统自行解析 tag 位中的每个值。

细心的读者会注意到，tagged pointer 中预留的存储空间小于普通整数。使用 4 个 bit 存储 tag 后，32 位系统剩余 28 位用于存储数值，64 位系统剩余 60 位。超过 28 位或 60 位的整数无法存入 tagged pointer，必须作为常规对象存储。

当所有数据都能容纳在 tagged pointer 中时，就无需单独分配内存。这意味着没有内存分配或销毁的开销。由于所有数据都直接存在指针中，也无需额外的内存加载操作。虽然提取数值” 3” 需要一些位移和掩码操作，但这点额外开销微不足道 ——CPU 非常擅长这类操作，其速度远快于访问独立内存块。

除了速度优势外，这还减少了内存使用，因为不需要分配单独的内存块来保存这些整数。对于像 3 + 4 这样的单一表达式来说，这微不足道，但在使用大量整数的代码中，这可以节省可观的内存。

凭借利用标记位（tag bits）区分多种带标记的指针（tagged pointer）类别能力，存储范围远不止整数。浮点数也可以存储在带标记的指针中。甚至像短字符串这样的非常规数据也能存储。（一个 64 位指针除了标记位外，还可以容纳八个 ASCII 字符。）一个仅包含单个（非带标记的）对象指针的单元素数组，也可以用一个带标记的指针来保存。任何体积小且使用频繁的类，都是采用带标记指针的理想候选者。

虽然 Objective-C 拥有原始整数类型，不会为每个数学表达式都使用对象，但数值对象在许多 Objective-C 代码中依然常见。每当我们将数字放入数组或字典时，都必须使用NSNumber将它们包装成对象；每当我们要编码或解码属性列表或 JSON 时，也必须让所有数字经过NSNumber处理。由于NSNumber使用频繁，并且主要用于存储符合标记指针（tagged pointer）要求的数值，因此它是应用标记指针技术的理想候选者，而这也正是 Cocoa 的做法。标记指针也被用于合适的NSDate对象。该技术仍有充足的空间可扩展至其他用途。

标记指针实践 理论知识足够了，现在来编写一些代码。

你应该还记得我上次编写的NSNumber仿制品MANumber。现在我将为MANumber添加标记指针支持。

请注意，tagged pointers（标签指针）是非常、非常、非常私有的，在任何情况下都不能用于任何真实的代码。标签指针类的数量极其有限。由于只有三个位可用于指示类别，因此最多只能使用八个标签类。如果你的标签类与框架中的某个类冲突，游戏就结束了。此外，由于标签指针实现的细节没有通过任何公共 API 暴露，它们可能会在没有任何通知的情况下发生变化。标签位的含义或数量可能会改变，或者如果认为不值得麻烦，整个功能都可能被取消。

然而，通过使用适当的私有 API，即使我们永远无法安全地使用它们，也可以对标签指针进行实验。

特别是，私有函数 _objc_insert_tagged_isa 允许将一个类与特定的标签关联起来。这是它的原型：

1
    void _objc_insert_tagged_isa(unsigned char slotNumber, Class isa);

你需要提供一个插槽编号（tag）和一个类，运行时（runtime）便会设置对应的表项，使两者在运行时相互关联。

几乎任何标签指针类都需要一个非标签指针类与之配合。在这个例子中，我们需要常规的 MANumber 来存储无法压缩进标签指针的整数，以及存储双精度浮点数值 —— 这些值很难挤进标签指针里，因此我选择暂不处理。根据存储的值，我们或创建一个标签指针，或分配一个常规实例。

这里有两种方法。一是创建两个完全不同的类，并通过某种共同的超类来共享代码。二是对两者使用同一个类，只是在代码需要查找内部数据（如实例变量）时，为标签指针和非标签指针设置两条不同的代码路径。我在此选择了后者，因为对于这个案例来说它似乎更简单。

大部分情况下，MANumber 保持原样。我将部分直接访问实例变量的操作改为通过访问器（accessor）调用，以便将条件编译代码集中管理，但这个类的主体结构没有变化。MANumber 使用联合体（union）来存储实际数值，我发现将其单独提取出来很有用 —— 这样它既能作为数值传递的载体，又不仅仅局限于充当实例变量：

1
    union Value
2
    {
3
        long long i;
4
        unsigned long long u;
5
        double d;
6
    };

接下来是一系列用于操作标签指针（tagged pointer）的常量。首先是 MANumber 所使用的标签指针槽位，我任意选取了 1：

1
    const int kSlot = 1;

我还将 tag 位数量提取为一个常量，因为它被用于 tagged pointers（标记指针）的编码和解码过程：

1
    const int kTagBits = 4;

我决定在 tagged pointer 中保留与 MANumber 相同的基本结构。它既存储一个值，也存储一个表示如何解释该值的类型。类型可以是整数、无符号整数或双精度浮点数。由于所有内容都需要尽可能紧凑地打包，我希望用最少的位数来表示类型。因为有三种类型，我分配了两个位来存放类型标识：

1
    const int kTypeBits = 2;

需要注意的是，虽然我在标签指针（tagged pointer）中不支持 double 类型，但我仍然为其预留了足够的空间来表示其类型。这样做纯粹是为了保持一致性，并方便将来在某个时间点增加对 double 的支持。

最后，由于我们存储的整数类型是 long long，明确了解它具体有多少位会很有用：

1
    const int kLongLongBits = sizeof(long long) * CHAR_BIT;

在这段代码中，我假设 long long 类型的大小等同于指针的大小。我没有尝试为标签指针代码编写一个 32 位兼容的版本。

为了更好地操作 tagged pointer（标签指针），我编写了一些辅助函数。第一个是一个函数，它根据一个值和一个类型来构造一个标签化的 MANumber 指针：

1
    static id TaggedPointer(unsigned long long value, unsigned type)
2
    {

回忆一下标记指针（tagged pointer）的结构。最低有效位总是 1。接下来的三位是标签（tag），或称为槽（slot），用于指明对象所属的类。之后是类自身的数据。在这个例子里，接下来的两位是类型，之后的所有位都是值。下面这行代码通过位移和按位或操作将所有这些部分紧凑地组合在一起：

1
        id ptr = (__bridge id)(void *)((value << (kTagBits + kTypeBits)) | (type << kTagBits) | (kSlot << 1) | 1);

注意这里奇怪的双重类型转换。我在使用 ARC 构建，它对类型转换很严格。在对象指针和非对象指针之间进行转换时需要使用 __bridge，而且它完全不允许在对象指针和整数之间进行转换。为了在 tagged pointer（标签指针）的整数表示形式和对象表示形式之间转换，我首先必须将整数转换为 void *，然后才能将结果转换为对象。

这就是所有需要做的了，所以我返回新构造的指针：

1
        return ptr;
2
    }

我还编写了一个检查指针是否被标记（tagged）的函数。这本质上只需要检查最低有效位（least-significant bit），但由于需要满足 ARC 的要求而进行复杂的类型转换，所以最好将其封装成一个单独的函数。

1
    static BOOL IsTaggedPointer(id pointer)
2
    {
3
        uintptr_t value = (uintptr_t)(__bridge void *)pointer;
4
        return value & 1;
5
    }

最后，有一个函数可以将 tagged pointer（标记指针）分解为其组成部分。由于 C 语言不支持多返回值，我创建了一个结构体来保存要返回的两个值：值和类型。

1
    struct TaggedPointerComponents
2
    {
3
        unsigned long long value;
4
        unsigned type;
5
    };

该函数本身首先将指针转换为整数，使用的是上述奇特双关转换的逆向操作：

1
    static struct TaggedPointerComponents ReadTaggedPointer(id pointer)
2
    {
3
        uintptr_t value = (uintptr_t)(__bridge void *)pointer;

接下来提取相关字段。标签位（tag bits）本身可以忽略，因为它们不影响 MANumber 的值或类型。MANumber 的值是通过将指针向下移动适当的位数来提取的：

1
        struct TaggedPointerComponents components = {
2
            value >> (kTagBits + kTypeBits),

类型必须同时进行掩码处理和位移。掩码（mask）的生成是通过将 1 左移标签位（tag bits）的数量，生成二进制值 100，然后从该值减去 1，得到二进制值 11。通过与该值进行逻辑与（logical and）操作，我们去掉了指针（pointer）中除表示类型的所有位：

1
            (value >> kTagBits) & ((1ULL << kTypeBits) - 1)
2
        };

既然组件已被提取，该函数只需将它们返回。

1
        return components;
2
    }

在某些时候，我们实际上必须通过调用 _objc_insert_tagged_isa 来告诉运行时，我们希望作为 tagged pointer 类（标记指针类）。（译注：_objc_insert_tagged_isa 是 Runtime 内部函数，现代系统可能已变化。）显而易见的位置是在 +initialize 中。实际上，我调用了两次：一次是为了清除任何先前的条目，再次插入新的条目。作为一种保护措施，如果你尝试直接覆盖 tagged pointer table（标记指针表）中的现有类，Objective-C runtime（Objective-C 运行时）会变得非常敏感。

1
    + (void)initialize
2
    {
3
        if(self == [MANumber class])
4
        {
5
            _objc_insert_tagged_isa(kSlot, nil);
6
            _objc_insert_tagged_isa(kSlot, self);
7
        }
8
    }

现在我们可以开始实际构建标签指针（tagged pointer）了。我编写了两个新方法：+numberWithLongLong: 和 +numberWithUnsignedLongLong:。它们会尝试根据参数构造一个标签指针，如果无法做到，则退回到分配一个类的实例。

这些方法只有在值满足特定限制时才能创建标签指针。也就是说，该值必须能够容纳在 kLongLongBits - kTagBits - kTypeBits 位内，在 64 位系统上这是 58 位。能够容纳在这个范围内的最大 long long 值是 2^57 - 1。能容纳的最小 long long 值是 -2^57。由于 long long 是有符号类型，需要为符号位保留一位。基于此，我们计算带标签的 long long 值的最小值和最大值：

1
    + (id)numberWithLongLong: (long long)value {
2
        long long taggedMax = (1ULL << (kLongLongBits - kTagBits - kTypeBits - 1)) - 1;
3
        long long taggedMin = -taggedMax - 1;

其余部分很简单。如果值超出边界，我们只需执行常规的 alloc / init 流程。如果它在边界内，我们使用值和 INT 调用 TaggedPointer 来构造标签指针（tagged pointer）：

1
        if(value > taggedMax || value < taggedMin)
2
            return [[self alloc] initWithLongLong: value];
3
        else
4
            return TaggedPointer(value, INT);
5
    }

无符号长长整型（unsigned long long）的代码是类似的。此类型仅存在一个限制，即 2^{64}-1。

1
    + (id)numberWithUnsignedLongLong:(unsigned long long)value {
2
        unsigned long long taggedMax = (1ULL << (kLongLongBits - kTagBits - kTypeBits)) - 1;
3

4
        if(value > taggedMax)
5
            return [[self alloc] initWithUnsignedLongLong: value];
6
        else
7
            return (id)TaggedPointer(value, UINT);
8
    }

接下来，我们需要一个能处理标签指针（tagged pointer）的类型访问器，这样其余代码只需简单地调用 [self type]，而无需操心位操作和掩码处理。这相当简单。它只需调用 IsTaggedPointer，若指针已标记则调用 ReadTaggedPointer，否则直接返回 _type 实例变量：

1
    - (int)type
2
    {
3
        if(IsTaggedPointer(self))
4
            return ReadTaggedPointer(self).type;
5
        else
6
            return _type;
7
    }

此外还有一个值访问器（value accessor），但由于需要对结果值进行符号扩展处理，其实现要稍微复杂一些。它首先检查指针是否为标记指针（tagged pointer），如果不是，则直接返回 _value 实例变量：

1
    - (union Value)value
2
    {
3
        if(!IsTaggedPointer(self))
4
        {
5
            return _value;
6
        }

对于 tagged pointers（标签指针），它首先从 ReadTaggedPointer 读取值。这个值以无符号长长整型（unsigned long long）的形式弹出，因此，如果实际类型是有符号的，则需要一些处理。

1
        else
2
        {
3
            unsigned long long value = ReadTaggedPointer(self).value;

同时创建一个类型为 union Value 的局部变量，用于存储返回值：

1
            union Value v;

如果值是无符号的，那么处理起来就很简单：直接将 value 填充到 v 中。

1
            int type = [self type];
2
            if(type == UINT)
3
            {
4
                v.u = value;
5
            }

不过，有符号整数（signed integers）的情况稍微复杂一些。它首先会检查符号位（sign bit），在当前情况下这个位位于第 57 位：

1
            else if(type == INT)
2
            {
3
                unsigned long long signBit = (1ULL << (kLongLongBits - kTagBits - kTypeBits - 1));

如果符号位（sign bit）被置位，那么该数值中第 57 位之后的所有其他位也必须全部置为 1，这样生成的 long long 值才能被视为符合规范的 64 位负数。这个过程被称为符号扩展（sign extension），它之所以如此运作，源于现代计算机实现有符号整数（signed integers）的方式。简而言之，一个负数的高位全部由 1 组成，而其首个 0 位才是该数值的最高有效数字（即第一个 1 位）。要扩展正数的位宽，只需在左侧补 0；要扩展负数的位宽，则在左侧补 1：

1
                if(value & signBit)
2
                {
3
                    unsigned long long mask = (((1ULL << kTagBits + kTypeBits) - 1) << (kLongLongBits - kTagBits - kTypeBits));
4
                    value |= mask;
5
                }

对于正数无需额外处理，因为其左侧已填充了多余的 0。只需将值赋给 v 中的对应字段即可：

1
                v.i = value;
2
            }

如果类型是其他值，则说明出现了问题，因此需要检查这种情况并中止操作：

1
            else
2
                abort();

最后，返回 v：

1
            return v;
2
        }
3
    }

有了这些代码后，其余所有 MANumber 代码无需任何修改，只需改为调用这些访问器（accessor）而非直接访问实例变量。你可以毫无问题地混合使用 tagged 和 untagged 的 MANumber 实例，甚至可以在混用对象上使用 compare: 和 isEqual:。

结论 Tagged Pointer 是 Cocoa 和 Objective-C 运行时（runtime）的一项重要改进，它提升了 NSNumber 对象的速度并减少了内存占用。通过将对象数据直接塞入指针本身，Tagged Pointer 消除了额外的内存分配和获取值时额外的间接寻址（indirection）。

我们可以利用私有运行时调用和一些位操作（bit twiddling）来实现自己的 tagged pointer 类，这让我们得以窥见 NSNumber 背后的运作机制。然而，由于 tagged pointer 类的数量有限，无法安全地在我们自己的代码中使用 tagged pointers。它们纯粹是 Cocoa 用于提升性能的机制。在这方面它们表现得非常出色，使得 NSNumber 的使用体验不那么痛苦。

今天就到这里！下次回来时，还有更多疯狂的把戏等着你。这个「周五问答」专栏的内容是由读者建议驱动的，所以如果你有任何想在这里看到的主题想法，请发来告诉我们！

#Original (English)

Source: https://www.mikeash.com/pyblog/friday-qa-2012-07-27-lets-build-tagged-pointers.html

Last time on Friday Q&A, I discussed a hypothetical implementation of the NSNumber class. Starting on Mac OS X 10.7 and iOS 5, NSNumber uses a new runtime facility called tagged pointers to increase speed and reduce memory usage, the inner workings of which I want to examine today.

Pointer Alignment TheoryAs we all know, a pointer is just an integer that’s treated as an address in memory. Under the covers, a variable holding a pointer to an object is really just an integer holding a value like 0x7f84a41000c0. The pointer nature of this value is entirely in how the program uses in. In C, we can extract the integer value of a pointer with a simple cast:

1
    void *somePointer = ...;
2
    uintptr_t pointerIntegerValue = (uintptr_t)somePointer;

(The uintptr_t type is a standard C typedef for an integer type large enough to hold a pointer. This is useful, as pointer sizes will vary across platforms.)

Most computer architectures have a concept of pointer alignment. This means that a pointer to a particular data type should be a multiple of some power of two. For example, a pointer to a 4-byte integer should be a multiple of 4 on most architectures. Violating alignment restrictions can cause performance impacts (on some platforms, the attempt throws a hardware exception that has to be emulated by the OS) and, on certain performance-sensitive architectures, can just crash outright. Correct alignment is also required for things like atomic reads and writes to memory. In short, pointer alignment is important and something you want to respect.

If you make a local variable, the compiler can ensure that the alignment is correct:

1
    void f(void)
2
    {
3
        int x;
4
        // x is aligned properly, because
5
        // the compiler has full control
6
        // over its location
7
    }

However, things are not so simple for dynamically allocated memory:

1
    int *ptr = malloc(sizeof(*ptr));
2
    // is ptr aligned properly?

malloc has no idea what kind of data the program is going to be storing. It gets a request for 4 bytes (assuming that int is 4 bytes), but it has no idea if this is an int, a pointer (on 32-bit platforms), two shorts, four chars, or something else.

In order to ensure proper alignment, malloc takes a paranoid approach and returns pointers that are aligned to a boundary large enough to guarantee correct alignment no matter what data types are stored. On Mac OS X, malloc always returns pointers aligned to 16-byte boundaries.

Because of alignment, there are unused bits in any aligned pointer. The hex representation of a 16-byte aligned pointer looks like:

1
    0x-------0

That last hex digit is always 0. It’s possible to have a valid pointer that doesn’t respect this (for example, a char * has no alignment restrictions), but object pointers will always have some zero bits at the end.

Tagged Pointer TheorySome objects contain very little data. NSNumber is an excellent example of this: depending on exactly what kind of number the object is storing, the data it stores is only one, two, four, or eight bytes. Having an entire object dedicated to these, with memory allocation and deallocation, and indirecting through pointers to that object for every access, is wasteful.

What if we took advantage of those unused low bits in a pointer to indicate that this is not a real pointer? What if we then stored the object data inline in the rest of this non-pointer instead of off in a separate memory allocation? The result of doing so is a tagged pointer.

Systems which use tagged pointers require an extra check. Any time an object pointer is used, the system checks the low bit. If it’s zero, it’s a real object pointer and treated normally. If the low bit is one, then it can’t be a real pointer, since it’s not correctly aligned. In that case, the pointer is handled specially. Typically, the object type is encoded in the low bits of the pointer, with the object data in the high bits. A normal pointer will look like this:

1
    ....0000
2
        ^ all zeroes at the end

While a tagged pointer looks like this:

1
    ....xxx1
2
        ^ type encoded in bottom bits

There are variations on this. For example, a tagged pointer could have any of the bottom four bits set. You can have even more tag bits by taking advantage of additional architecture peculiarities. For example, x86_64 actually only uses 48 bits of a pointer, leaving you with 16 bits free on top of the 4 you get due to alignment. With this, you can do crazy things like stuff pointers into floating-point not a number values.

For whatever reason, the Objective-C implementation of tagged pointers has the bottom bit always set to 1 in a tagged pointer, with the other three bits indicating the class of the tagged pointer. I’m not sure why they chose to implement it in this way, but my guess is some combination between speed (it may be easier to check only the bottom bit, and this has to be done in the fast path of objc_msgSend) and simple convenience in programming.

Tagged Pointer UsesTagged pointers often show up in languages where everything is an object. When the number 3 is an object, and an expression like 3 + 4 involves two objects and creates a third, object creation and value extraction becomes important to performance.

We generally think of an object as a separate memory allocation, referenced with a pointer. This was always the case in Objective-C before tagged pointers came along, and is always the case in a lot of other languages as well. However, memory allocation and management incurs substantial overhead. When evaluating 3 + 4, the overhead of allocating memory for the result of 7 and then eventually disposing of it when nobody is using it far outweighs the cost of actually performing the addition.

Referring to everything with a pointer also incurs overhead simply for accessing the data. To load the value of the 3 object, the CPU has to load the value of the pointer to that object, then load the primitive 3 stored within the object itself. Since memory is slow, this extra access can be costly.

By using tagged pointers, these costs can be eliminated for data which fits within the tagged pointer. Small integers are an excellent candidate for this, since they can easily fit, and tend to be used a lot. The integer 3 would normally be represented like:

1
    0000 0000 0000 0000 0000 0000 0000 0011
2
                                         ^ binary 3

With tagged pointers, it would look something like this:

1
    0000 0000 0000 0000 0000 0000 0011 1011
2
                                    ^  ^  ^ tag bit
3
                                    |  |
4
                                    |  tagged pointer class (5)
5
                                    |
6
                                    binary 3

The actual value is shifted over a bit, and the tag bits are set appropriately. For this example, I’ve assumed that the integer class is identified by the value 5 in the tag bits, but that’s entirely arbitrary, and it’s up to the system to figure out the meaning of each value in the tag bits.

The observant reader will notice that the space left for storing the value is smaller than that of a plain integer. After using 4 bits for the tag, there are 28 bits left for a value on 32-bit systems, and 60 bits left for a value on 64-bit systems. Integers which require more than 28 or 60 bits to store can’t be placed in a tagged pointer, and will have to be stored in a regular object.

When everything fits within the tagged pointer, there’s no need for a separate memory allocation. This means no overhead to allocate or destroy the memory. No additional memory loads are needed, since all of the data is right there. There’s a bit of extra overhead because some shifting and masking is needed to extract the 3, but CPUs are really fast at those operations and they’re much faster than accessing a separate chunk of memory.

In addition to the speed benefits, this also reduces memory usage, since no separate chunk of memory has to be allocated to hold these integers. It’s inconsequential for a single expression like 3 + 4, but it can make for a substantial savings in code that uses a lot of integers.

With the ability to use the tag bits distinguish between several classes of tagged pointers, it’s possible to store more than just integers. Floating point numbers could be stored in tagged pointers as well. Even off-the-wall items like small strings could be stored. (A 64-bit pointer can hold eight ASCII characters in addition to the tag bits.) A one-element array holding a single (non-tagged) pointer to an object could be held in a tagged pointer. Any class which is both small and frequently used is a good candidate for a tagged pointer.

Although Objective-C has primitive integers and doesn’t use objects for every single mathematical expression, numeric objects are still common in a lot of Objective-C code. Any time we put numbers into an array or dictionary, we have to use NSNumber to box them up into an object. Any time we encode or decode a property list or JSON we have to route all numbers through NSNumber. Since NSNumber is used a lot, and is mostly used to store numbers that fit within a tagged pointer, NSNumber is an excellent candidate for tagged pointers, and that’s what Cocoa does. Tagged pointers are also used for suitable NSDate objects. Plenty of room is available for expansion to other uses as well.

Tagged Pointer PracticeEnough theory, let’s write some code.

You’ll recall my NSNumber workalike from last time, MANumber. I’m now going to add tagged pointer support to MANumber.

Note that tagged pointers are very, very, very private and cannot under any circumstances be used in any real code. There are an extremely limited number of tagged pointer classes. Since there are only three bits of tag available to indicate the class, only eight tagged classes can be used. If your tagged class conflicts with one in the frameworks, it’s game over. Additionally, since details of the tagged pointer implementation aren’t exposed through any public API, they’re subject to change without notice. The meaning or number of the tag bits could change, or the whole thing could be thrown out if it’s decided that it’s not worth the trouble.

However, by using the appropriate private API, it is possible to experiment with tagged pointers, even if we can never use them safely.

In particular, the private _objc_insert_tagged_isa function allows associating a class with a particular tag. This is its prototype:

1
    void _objc_insert_tagged_isa(unsigned char slotNumber, Class isa);

You give it a slot number (tag) and a class, and it sets the appropriate table entry so that the two are associated in the runtime.

Nearly any tagged pointer class also needs a non-tagged class to go along with it. For this case, we need the regular MANumber to store integers that don’t fit into a tagged pointer, and also to store double values, which are really hard to squeeze into a tagged pointer, and thus are something I skipped over. Depending on the value being stored, we either create a tagged pointer or allocate a normal instance.

There are two approaches here. One is to create two completely different classes, with some kind of common superclass so that they can share common code. The other is to use the same class for both, and have two code paths for tagged and non-tagged pointers whenever the code looks up internal data (like instance variables). I chose to go with the latter approach here, as it seemed simpler for this case.

For the most part, MANumber can stay the same. I routed some direct instance variable accesses through accessors in order to group the conditionalized code all in one place, but most of the class stayed the same. MANumber uses a union to store the actual value, which I found useful to break out so that it could be used to pass values around, and not just as an instance variable:

1
    union Value
2
    {
3
        long long i;
4
        unsigned long long u;
5
        double d;
6
    };

Next up are a bunch of constants used to manipulate the tagged pointer. First is the tagged pointer slot to use for MANumber, which I arbitrarily chose as 1:

1
    const int kSlot = 1;

I also put the number of tag bits into a constant, since it’s used for encoding and decoding the tagged pointers:

1
    const int kTagBits = 4;

I decided to keep the same basic structure of MANumber in the tagged pointer. It stores a value as well as a type which says how to interpret that value. The type can be either an integer, unsigned integer, or double. Since everything needs to be packed as tighly as possible, I want to use the minimum number of bits to represent the type. Since there are three types, I set aside two bits to hold the type:

1
    const int kTypeBits = 2;

Note that although I’m not supporting double in the tagged pointer, I still reserve enough space to represent its type. This is done purely for consistency, and to make it easier to add double support at some point.

Finally, since the integer type we’re storing is a long long, it’s useful to know exactly how many bits that is:

1
    const int kLongLongBits = sizeof(long long) * CHAR_BIT;

I’m assuming in this code that long long is the size of a pointer. I did not attempt to create a 32-bit compatible version of the tagged pointer code.

To better manipulate tagged pointers, I wrote some helper functions. The first is a function that constructs a tagged MANumber pointer out of a value and type:

1
    static id TaggedPointer(unsigned long long value, unsigned type)
2
    {

Recall the structure of a tagged pointer. The least significant bit is always 1. The next three bits are the tag, or slot, which indicates the class of the object. After that is the class’s own data. In this case, the next two bits are the type, and everything after that is the value. Here’s a line which squishes all of these components together using bitwise shifting and oring:

1
        id ptr = (__bridge id)(void *)((value << (kTagBits + kTypeBits)) | (type << kTagBits) | (kSlot << 1) | 1);

Note the odd double cast here. I’m building with ARC, which is picky about casting. You need __bridge when casting between object pointers and non-object pointers, and furthermore it won’t allow casting between object pointers and integers at all. To convert between the integer version of the tagged pointer and the object version, I first have to cast the integer to a void *, and only then can I cast the result to an object.

That’s all that needs to be done, so I return the newly constructed pointer:

1
        return ptr;
2
    }

I also have a function that checks whether a pointer is tagged or not. All this involves is checking the least-significant bit, but ugly casting is needed to work around ARC’s demands, so it was good to wrap it up in a single function:

1
    static BOOL IsTaggedPointer(id pointer)
2
    {
3
        uintptr_t value = (uintptr_t)(__bridge void *)pointer;
4
        return value & 1;
5
    }

Finally, there’s a function to break a tagged pointer out into its components. Since C doesn’t support multiple return values, I created a struct that holds the two values to return: the value and the type.

1
    struct TaggedPointerComponents
2
    {
3
        unsigned long long value;
4
        unsigned type;
5
    };

The function itself first converts the pointer into an integer, using the reverse of the weird double cast shown above:

1
    static struct TaggedPointerComponents ReadTaggedPointer(id pointer)
2
    {
3
        uintptr_t value = (uintptr_t)(__bridge void *)pointer;

Next, it extracts the relevant fields. The tag bits themselves can be ignored, since they don’t influence the MANumber’s value or type. The MANumber value is extracted by just shifting the pointer down by the appropriate amount:

1
        struct TaggedPointerComponents components = {
2
            value >> (kTagBits + kTypeBits),

The type must be masked as well as shifted. The mask is generated by shifting 1 to the left by the number of tag bits, generating the binary value 100, then subtracting 1 from that value, resulting in the binary value 11. By doing a logical and with that value, we get rid of all bits in the pointer aside from the bits that represent the type:

1
            (value >> kTagBits) & ((1ULL << kTypeBits) - 1)
2
        };

Now that the components are extracted, the function just returns them.

1
        return components;
2
    }

At some point, we have to actually tell the runtime that we want to act as a tagged pointer class, by calling _objc_insert_tagged_isa. The obvious place to do this is in +initialize. I actually call this twice, once to clear out any previous entry, and again to insert the new entry. As a protective measure, the Objective-C runtime gets very upset if you try to directly overwrite an existing class in the tagged pointer table:

1
    + (void)initialize
2
    {
3
        if(self == [MANumber class])
4
        {
5
            _objc_insert_tagged_isa(kSlot, nil);
6
            _objc_insert_tagged_isa(kSlot, self);
7
        }
8
    }

Now we can get to actually constructing the tagged pointers. I wrote two new methods: +numberWithLongLong: and +numberWithUnsignedLongLong:. These will try to construct a tagged pointer from their argument if possible, and otherwise fall back to allocating an instance of the class.

These methods can only create a tagged pointer if the value is within certain limits. Namely, the value has to fit within kLongLongBits - kTagBits - kTypeBits bits, which is 58 bits on a 64-bit system. The largest long long that fits within that space is 257-1. The smallest long long that fits is -257. One bit is needed for the sign bit, since long long is signed. Based on this, we compute the min and max for a tagged long long:

1
    + (id)numberWithLongLong: (long long)value {
2
        long long taggedMax = (1ULL << (kLongLongBits - kTagBits - kTypeBits - 1)) - 1;
3
        long long taggedMin = -taggedMax - 1;

The rest is simple. If the value lies beyond the bounds, we simply do the regular alloc/init dance. If it’s within bounds, we call TaggedPointer with the value and INT to construct the tagged pointer:

1
        if(value > taggedMax || value < taggedMin)
2
            return [[self alloc] initWithLongLong: value];
3
        else
4
            return TaggedPointer(value, INT);
5
    }

The code for unsigned long long is similar. There’s only one limit in this case, and it’s 258-1:

1
    + (id)numberWithUnsignedLongLong:(unsigned long long)value {
2
        unsigned long long taggedMax = (1ULL << (kLongLongBits - kTagBits - kTypeBits)) - 1;
3

4
        if(value > taggedMax)
5
            return [[self alloc] initWithUnsignedLongLong: value];
6
        else
7
            return (id)TaggedPointer(value, UINT);
8
    }

Next, we need a type accessor that handles tagged pointers, so that the rest of the code can simply call [self type] without worrying about bits and masking and such. This is pretty simple. All it has to do is call IsTaggedPointer, call ReadTaggedPointer if the pointer is tagged, and otherwise simply return the _type instance variable:

1
    - (int)type
2
    {
3
        if(IsTaggedPointer(self))
4
            return ReadTaggedPointer(self).type;
5
        else
6
            return _type;
7
    }

There’s also a value accessor, but it’s a bit more complicated due to the need to handle sign extension in the resulting value. The first thing it does is check to see whether the pointer is tagged and, if it’s not, return the _value instance variable:

1
    - (union Value)value
2
    {
3
        if(!IsTaggedPointer(self))
4
        {
5
            return _value;
6
        }

For tagged pointers, it first reads the value from ReadTaggedPointer. This pops out as an unsigned long long, and so needs some work in the event that the actual type is signed:

1
        else
2
        {
3
            unsigned long long value = ReadTaggedPointer(self).value;

It also creates a local variable of type union Value to hold the return value:

1
            union Value v;

If the value is unsigned, then life is simple: simply stuff value into v.

1
            int type = [self type];
2
            if(type == UINT)
3
            {
4
                v.u = value;
5
            }

However, signed integers get a bit more complex. The first thing it does is check the sign bit, which in this case lives in bit 57:

1
            else if(type == INT)
2
            {
3
                unsigned long long signBit = (1ULL << (kLongLongBits - kTagBits - kTypeBits - 1));

If the sign bit is set, then all of the other bits in the number beyond 57 need to be set to 1 as well, in order for the resulting long long to be seen as a proper 64-bit negative number. This procedure is called sign extension, and it works this way due to the way that modern computers implement signed integers. The short version is that a negative number consists of all 1s at the beginning, and the first 0 is really the first significant digit of the number. To make a positive number wider, you simply add 0s to the left side. To make a negative number wider, you add 1s to the left side:

1
                if(value & signBit)
2
                {
3
                    unsigned long long mask = (((1ULL << kTagBits + kTypeBits) - 1) << (kLongLongBits - kTagBits - kTypeBits));
4
                    value |= mask;
5
                }

Nothing needs to be done for positive numbers, as they’re already filled with extra 0s on the left side. All that needs to be done is to assign to the right field in v:

1
                v.i = value;
2
            }

If the type is something else, then something has gone wrong, so check for that and abort:

1
            else
2
                abort();

Finally, return v:

1
            return v;
2
        }
3
    }

With this code in place, all of the other MANumber code works without any changes other than modifying it to call these accessors rather than directly accessing instance variables. You can mix and match tagged and untagged MANumber instances with impunity, and even use compare: and isEqual: on mixed objects.

ConclusionTagged pointers are a great addition to Cocoa and the Objective-C runtime which improve speed and reduce memory usage for NSNumber objects. By stuffing object data directly into the pointer, tagged pointers eliminate the need for a separate memory allocation and an extra level of indirection when retrieving values.

We can implement our own tagged pointer classes by using private runtime calls and some bit twiddling, which gives us some insight into just what NSNumber is doing behind the scenes. However, due to the restricted number of tagged pointer classes, it’s not possible to safely use tagged pointers in our own code. They are purely a mechanism for Cocoa to perform better. They do this job wonderfully, and make NSNumber much less painful.

That’s it for today! Come back next time for more crazy shenanigans. Friday Q&A is driven by reader suggestions, so if you have an idea for a topic that you’d like to see covered here, please send it in!