Let's Build a Mach-O Executable

Published November 30, 2012

Author: TommyWu


Translation · Original: Friday Q&A 2012-11-30: Let's Build A Mach-O Executable · by Mike Ash

Original: https://www.mikeash.com/pyblog/friday-qa-2012-11-30-lets-build-a-mach-o-executable.html · Published: 2012-11-30 · Author: Mike Ash · Translator: MiMo (mimo-v2.5-pro); code blocks are kept in the original English

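This is a follow-up to my previous article, "dyld: Dynamic Linking On OS X", in which I explored how the dynamic linker dyld works. This week I'm going to redo the work of the compiler and the static linker myself, building a Mach-O binary completely from scratch with nothing but the help of an assembler.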

The Right Tool for the Job
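On OS X, the obvious tool for producing a binary from assembly language input is, of course, the assembler, as. However, if you try to use it to build a raw binary, you'll find that as takes over part of the static linker's job itself. That's not what we want here.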

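A more flexible tool in this respect is nasm, the Netwide Assembler. nasm is installed along with the Xcode command-line tools, but unfortunately Apple ships an extremely old version, 0.98.40, whose bug fixes date from 2007 and whose features date from 1999. The latest version at the time of writing is 2.10.05, which you can install with port install nasm, brew install nasm, or whatever other package manager you prefer. If you don't use a package manager, you can download the source and build it yourself.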

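nasm 2.x has a number of useful features, such as 64-bit support and Mach-O output. We won't use nasm's Mach-O support, since the whole point of this exercise is to do that part by hand, but being able to write 64-bit instructions directly when building a 64-bit binary, instead of breaking everything up into 32-bit words, is a big help!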

Reintroducing the Sample Program
Here is the C source code for the program we'll turn into a Mach-O binary. To keep the final binary relatively simple, I wrote it with the bare minimum of information:

    #define NULL ((void *)0L)
    extern int printf(const char * restrict format, ...);
    typedef long time_t;
    extern time_t time(time_t *sloc);

    int main(void)
    {
        printf("Hello, world #%ld!\n", time(NULL));
        return 0;
    }

A few things to note:

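Instead of #include <stdio.h> and #include <time.h>, I declared printf() and time() by hand, defined the time_t type myself, and defined NULL as a macro. This avoids generating extra debug information for everything the standard headers define.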
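I defined main() to take no arguments. int main(void) is one of the two forms of main() the C standard explicitly permits, and since the program never looks at argc or argv, there's no reason to declare them.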
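I used a format string with a substitution in it so that the compiler I used to generate the reference binary wouldn't "helpfully" replace the call with puts().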

This produces the following assembly (built with Clang 3.3svn at -Os):

            .section        __TEXT,__text,regular,pure_instructions
            .globl  _main
    _main:                                  ## @main
            .cfi_startproc
    ## BB#0:                                ## %entry
            pushq   %rbp
    Ltmp2:
            .cfi_def_cfa_offset 16
    Ltmp3:
            .cfi_offset %rbp, -16
            movq    %rsp, %rbp
    Ltmp4:
            .cfi_def_cfa_register %rbp
            xorl    %edi, %edi
            callq   _time
            leaq    L_.str(%rip), %rdi
            movq    %rax, %rsi
            xorb    %al, %al
            callq   _printf
            xorl    %eax, %eax
            popq    %rbp
            ret
            .cfi_endproc

            .section        __TEXT,__cstring,cstring_literals
    L_.str:                                 ## @.str
            .asciz   "Hello, world #%ld!\n"

    .subsections_via_symbols

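The code itself is very straightforward: in the __TEXT,__text section it sets up a stack frame, calls time(), loads the address of the L_.str string into rdi and moves time()'s result into rsi, sets al to zero (for a variadic function such as printf, al holds the number of vector registers used to pass arguments, none in this case), calls printf, zeroes eax for the return value, tears down the stack frame, and returns. Next, in the __TEXT,__cstring section, it defines the label L_.str pointing at a zero-terminated ASCII string. Finally, .subsections_via_symbols declares that the file's sections can safely be split apart at every symbol, because no code or data falls through from one symbol into the next; the linker relies on this when dead-stripping.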

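The remaining directives deal with Call Frame Information, which is used to generate unwind data (__unwind_info and __eh_frame, which support exception handling) and debug information (.debug_frame). We'll build the first two by hand.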

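To keep things clear, I'll leave out the full DWARF debug information. Even for this very simple program, it would add a considerable amount of material to an article that's already long.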

The Beginning of a Mach-O Executable

Our nasm input file is going to produce a Mach-O file, so it has to begin with a Mach-O header. We'll use the 64-bit little-endian flavor of Mach-O, whose header looks like this:

    struct mach_header_64 {
        uint32_t    magic;      /* mach magic number identifier */
        cpu_type_t  cputype;    /* cpu specifier */
        cpu_subtype_t   cpusubtype; /* machine specifier */
        uint32_t    filetype;   /* type of file */
        uint32_t    ncmds;      /* number of load commands */
        uint32_t    sizeofcmds; /* the size of all the load commands */
        uint32_t    flags;      /* flags */
        uint32_t    reserved;   /* reserved */
    };

    /* Constant for the magic field of the mach_header_64 (64-bit architectures) */
    #define MH_MAGIC_64 0xfeedfacf /* the 64-bit mach magic number */
    #define MH_CIGAM_64 0xcffaedfe /* NXSwapInt(MH_MAGIC_64) */

Here is the nasm input for our Mach-O header:

    bits 64
    cpu x64

    __mh_execute_header:
        dd 0xfeedfacf   ; MH_MAGIC_64
        dd 16777223     ; CPU_TYPE_X86 | CPU_ARCH_ABI64
        dd 0x80000003   ; CPU_SUBTYPE_I386_ALL | CPU_SUBTYPE_LIB64
        dd 2            ; MH_EXECUTE
        dd 16           ; number of load commands
        dd ___loadcmdsend - ___loadcmdsstart    ; size of load commands
        dd 0x00200085   ; MH_NOUNDEFS | MH_DYLDLINK | MH_TWOLEVEL | MH_PIE
        dd 0            ; reserved
    ___loadcmdsstart:

The bits and cpu directives simply tell nasm that we're working in 64-bit mode.
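To double-check the header bytes, here is a minimal C sketch that decodes the first 32 bytes of a file as a little-endian mach_header_64. The constants mirror <mach-o/loader.h> and <mach/machine.h> but are redefined locally, an assumption made so the sketch builds without the macOS SDK; struct header_info and parse_header are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Values from <mach-o/loader.h> and <mach/machine.h>, redefined locally so
   this sketch compiles without the macOS SDK. */
#define MH_MAGIC_64     0xfeedfacfu
#define CPU_ARCH_ABI64  0x01000000u
#define CPU_TYPE_X86    7u
#define MH_EXECUTE      2u
#define MH_NOUNDEFS     0x1u
#define MH_DYLDLINK     0x4u
#define MH_TWOLEVEL     0x80u
#define MH_PIE          0x200000u

/* Hypothetical holder for the decoded header fields. */
struct header_info {
    uint32_t magic, cputype, cpusubtype, filetype, ncmds, sizeofcmds, flags;
};

/* Read a little-endian 32-bit value independent of host byte order. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: fills *out from the mach_header_64 at buf. Returns 1
   for a 64-bit x86 executable, 0 otherwise (including a too-short buffer). */
int parse_header(const uint8_t *buf, size_t len, struct header_info *out)
{
    if (len < 32)                      /* sizeof(struct mach_header_64) */
        return 0;
    out->magic      = read_le32(buf + 0);
    out->cputype    = read_le32(buf + 4);
    out->cpusubtype = read_le32(buf + 8);
    out->filetype   = read_le32(buf + 12);
    out->ncmds      = read_le32(buf + 16);
    out->sizeofcmds = read_le32(buf + 20);
    out->flags      = read_le32(buf + 24);
    /* bytes 28..31 are the reserved field */
    return out->magic == MH_MAGIC_64 &&
           out->cputype == (CPU_TYPE_X86 | CPU_ARCH_ABI64) &&
           out->filetype == MH_EXECUTE;
}
```

Fed the 32 bytes that the nasm header above assembles to, it should report 16 load commands, a cputype of 16777223 (0x01000007), and flags equal to MH_NOUNDEFS | MH_DYLDLINK | MH_TWOLEVEL | MH_PIE, which is 0x00200085.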

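Right after the Mach-O header come the load commands. An executable has to contain a set of required commands, and it may contain many others besides. The Clang-built version of this executable has 16 load commands. A load command looks like this: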

    struct load_command {
        uint32_t cmd;       /* type of load command */
        uint32_t cmdsize;   /* total size of command in bytes */
    };

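Each load command is actually larger than this: the cmd field tells the loader how to interpret the data that follows, and cmdsize gives the total size of the command, including these two fields. In a 64-bit Mach-O file, every load command must be padded to an 8-byte boundary.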
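The cmd/cmdsize pair is all a reader needs to step from one load command to the next. Here is a sketch of that walk, assuming the commands are already in memory as little-endian bytes; walk_load_commands is a hypothetical helper, and rejecting sizes that aren't multiples of 8 follows the alignment rule above.

```c
#include <stddef.h>
#include <stdint.h>

/* Little-endian 32-bit read, independent of host byte order. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: walks `ncmds` load commands that start at `cmds`
   (the bytes right after the header) and span `sizeofcmds` bytes. Stores up
   to `max_types` cmd values in cmd_types. Returns the number of commands
   visited, or -1 if a cmdsize is not a multiple of 8 or runs past the end. */
int walk_load_commands(const uint8_t *cmds, uint32_t sizeofcmds, uint32_t ncmds,
                       uint32_t *cmd_types, size_t max_types)
{
    uint32_t off = 0;
    for (uint32_t i = 0; i < ncmds; i++) {
        if (sizeofcmds - off < 8)          /* room for cmd + cmdsize? */
            return -1;
        uint32_t cmd = le32(cmds + off);
        uint32_t cmdsize = le32(cmds + off + 4);
        if (cmdsize < 8 || cmdsize % 8 != 0 || cmdsize > sizeofcmds - off)
            return -1;
        if (i < max_types)
            cmd_types[i] = cmd;
        off += cmdsize;                    /* next command follows directly */
    }
    return (int)ncmds;
}
```

Because each command carries its own size, a reader can skip commands it doesn't understand, which is exactly why commands that must not be skipped get the LC_REQ_DYLD bit.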

Segments and Sections

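Segments are the chunks of data and code that dyld actually maps into memory at runtime. Sections are subdivisions of segments. Both segments and sections have names, and a good many of those names are standard and predefined.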

Here's our first segment command:

    ___pagezerostart:
        dd 0x19         ; LC_SEGMENT_64
        dd ___pagezeroend - ___pagezerostart    ; command size
        db '__PAGEZERO',0,0,0,0,0,0 ; segment name (pad to 16 bytes)
        dq 0            ; VM address
        dq 0x100000000  ; VM size
        dq 0            ; file offset
        dq 0            ; file size
        dd 0x0          ; VM_PROT_NONE (maximum protection)
        dd 0x0          ; VM_PROT_NONE (initial protection)
        dd 0            ; number of sections
        dd 0x0          ; flags
        align 8, db 0   ; pad with zero to 8-byte boundary
    ___pagezeroend:

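This is the __PAGEZERO segment, which marks the bottom 4GB of the 64-bit virtual address space as off-limits. Because the segment is neither readable, writable, nor executable, dereferencing a NULL pointer causes an immediate segmentation fault.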

The next segment command is considerably more complex:

    ___TEXTstart:
        dd 0x19         ; LC_SEGMENT_64
        dd ___TEXTend - ___TEXTstart    ; command size
        db '__TEXT',0,0,0,0,0,0,0,0,0,0 ; segment name (pad to 16 bytes)
        dq 0x100000000  ; VM address
        dq 0x1000       ; VM size
        dq 0            ; file offset
        dq 0x1000       ; file size
        dd 0x7          ; VM_PROT_READ | VM_PROT_WRITE | VM_PROT_EXECUTE (maximum protection)
        dd 0x5          ; VM_PROT_READ | VM_PROT_EXECUTE (initial protection)
        dd 6            ; number of sections
        dd 0x0          ; flags
    ___TEXTtextstart:
        db '__text',0,0,0,0,0,0,0,0,0,0 ; section name (pad to 16 bytes)
        db '__TEXT',0,0,0,0,0,0,0,0,0,0 ; segment name (pad to 16 bytes)
        dq 0x100000000 + ___codestart   ; address (__TEXT maps file offset 0 to 0x100000000)
        dq ___codeend - ___codestart    ; size
        dd ___codestart ; offset
        dd 0            ; alignment as power of 2 (1)
        dd 0            ; relocations data offset
        dd 0            ; number of relocations
        dd 0x80000400   ; S_REGULAR | S_ATTR_PURE_INSTRUCTIONS | S_ATTR_SOME_INSTRUCTIONS
        dd 0            ; reserved1
        dd 0            ; reserved2
        dd 0            ; reserved3
    ___TEXTstubsstart:
        db '__stubs',0,0,0,0,0,0,0,0,0  ; section name (pad to 16 bytes)
        db '__TEXT',0,0,0,0,0,0,0,0,0,0 ; segment name (pad to 16 bytes)
        dq 0x100000000 + ___stubstart   ; address
        dq ___stubend - ___stubstart    ; size
        dd ___stubstart ; offset
        dd 1            ; alignment as power of 2 (2)
        dd 0            ; relocations data offset
        dd 0            ; number of relocations
        dd 0x80000408   ; S_SYMBOL_STUBS | S_ATTR_PURE_INSTRUCTIONS | S_ATTR_SOME_INSTRUCTIONS
        dd 0            ; reserved1 (index into indirect symbol table)
        dd 6            ; reserved2 (size per stub)
        dd 0            ; reserved3
    ___TEXTstubhelperstart:
        db '__stub_helper',0,0,0    ; section name (pad to 16 bytes)
        db '__TEXT',0,0,0,0,0,0,0,0,0,0 ; segment name (pad to 16 bytes)
        dq 0x100000000 + ___stubhelpstart   ; address
        dq ___stubhelpend - ___stubhelpstart    ; size
        dd ___stubhelpstart ; offset
        dd 2            ; alignment as power of 2 (4)
        dd 0            ; relocations data offset
        dd 0            ; number of relocations
        dd 0x80000400   ; S_REGULAR | S_ATTR_PURE_INSTRUCTIONS | S_ATTR_SOME_INSTRUCTIONS
        dd 0            ; reserved1
        dd 0            ; reserved2
        dd 0            ; reserved3
    ___TEXTcstringstart:
        db '__cstring',0,0,0,0,0,0,0    ; section name (pad to 16 bytes)
        db '__TEXT',0,0,0,0,0,0,0,0,0,0 ; segment name (pad to 16 bytes)
        dq 0x100000000 + ___strsstart   ; address
        dq ___strsend - ___strsstart    ; size
        dd ___strsstart ; offset
        dd 0            ; alignment as power of 2 (1)
        dd 0            ; relocations data offset
        dd 0            ; number of relocations
        dd 0x00000002   ; S_CSTRING_LITERALS
        dd 0            ; reserved1
        dd 6            ; reserved2
        dd 0            ; reserved3
    ___TEXTunwindinfostart:
        db '__unwind_info',0,0,0    ; section name (pad to 16 bytes)
        db '__TEXT',0,0,0,0,0,0,0,0,0,0 ; segment name (pad to 16 bytes)
        dq 0x100000000 + ___uwstart ; address
        dq ___uwend - ___uwstart    ; size
        dd ___uwstart   ; offset
        dd 0            ; alignment as power of 2 (1)
        dd 0            ; relocations data offset
        dd 0            ; number of relocations
        dd 0x00000000   ; no flags
        dd 0            ; reserved1
        dd 0            ; reserved2
        dd 0            ; reserved3
    ___TEXTehframestart:
        db '__eh_frame',0,0,0,0,0,0 ; section name (pad to 16 bytes)
        db '__TEXT',0,0,0,0,0,0,0,0,0,0 ; segment name (pad to 16 bytes)
        dq 0x100000000 + ___ehstart ; address
        dq ___ehend - ___ehstart    ; size
        dd ___ehstart   ; offset
        dd 3            ; alignment as power of 2 (8)
        dd 0            ; relocations data offset
        dd 0            ; number of relocations
        dd 0x00000000   ; no flags
        dd 0            ; reserved1
        dd 0            ; reserved2
        dd 0            ; reserved3
        align 8, db 0   ; pad with zero to 8-byte boundary
    ___TEXTend:

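So this is the __TEXT segment, which covers all of the executable code along with a good deal of other data. It contains six sections. Each section is aligned as its header specifies, and the sections are packed tightly at the end of the segment, so there is a long run of zero bytes between the end of the load commands and the first section. Because of the way the linker maps segments, __TEXT starts at file offset 0 and therefore also contains all of the Mach-O header information. As we'll see later, the symbol table even has its own entry for __mh_execute_header. Here are the sections: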

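__text - the executable's actual code, where all of the functions live. In this case there's only one function, main(). It's marked S_REGULAR, meaning "this is a plain old section", and flagged as containing "some instructions" (at least some executable code) and "pure instructions" (nothing but executable code).
__stubs - a jump table that redirects through the lazy and non-lazy symbol pointer sections. See my previous article for an explanation of what goes in here. It's marked S_SYMBOL_STUBS, which is fairly self-explanatory.
__stub_helper - helper code for binding lazily bound symbols.
__cstring - the section holding the read-only C string literals used by the code.
__unwind_info - compact stack unwinding information for the executable's code, used for exception handling on OS X.
__eh_frame - DWARF2 stack unwinding information for the executable's code, used for exception handling and debugging.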
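Every LC_SEGMENT_64 is a fixed-size segment_command_64 followed by one section_64 per section, which is where the command sizes computed from labels such as ___TEXTend - ___TEXTstart come from. A sketch with both structs transcribed from <mach-o/loader.h> (vm_prot_t written as int32_t, its underlying int type) and a hypothetical segment_cmdsize helper:

```c
#include <stdint.h>

/* Layout transcribed from <mach-o/loader.h>; vm_prot_t is an int. */
struct segment_command_64 {
    uint32_t cmd, cmdsize;
    char     segname[16];
    uint64_t vmaddr, vmsize, fileoff, filesize;
    int32_t  maxprot, initprot;
    uint32_t nsects, flags;
};

/* One of these follows the segment command for each section. */
struct section_64 {
    char     sectname[16], segname[16];
    uint64_t addr, size;
    uint32_t offset, align, reloff, nreloc, flags;
    uint32_t reserved1, reserved2, reserved3;
};

/* Hypothetical helper: cmdsize of a segment command with `nsects` sections. */
uint32_t segment_cmdsize(uint32_t nsects)
{
    return (uint32_t)(sizeof(struct segment_command_64) +
                      nsects * sizeof(struct section_64));
}
```

That gives 72 bytes for __PAGEZERO and __LINKEDIT, 72 + 6 * 80 = 552 bytes for __TEXT, and 72 + 2 * 80 = 232 bytes for __DATA. All are already multiples of 8, so the trailing align 8 adds nothing.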

Next comes the __DATA segment:

    ___DATAstart:
        dd 0x19         ; LC_SEGMENT_64
        dd ___DATAend - ___DATAstart    ; command size
        db '__DATA',0,0,0,0,0,0,0,0,0,0 ; segment name (pad to 16 bytes)
        dq 0x100001000  ; VM address
        dq 0x1000       ; VM size
        dq 0x1000       ; file offset
        dq 0x1000       ; file size
        dd 0x7          ; VM_PROT_READ | VM_PROT_WRITE | VM_PROT_EXECUTE
        dd 0x3          ; VM_PROT_READ | VM_PROT_WRITE
        dd 2            ; number of sections
        dd 0x0          ; flags
    ___DATAnlsymptrstart:
        db '__nl_symbol_ptr',0  ; section name (pad to 16 bytes)
        db '__DATA',0,0,0,0,0,0,0,0,0,0 ; segment name (pad to 16 bytes)
        dq 0x100001000 + ___nlsymptrstart - ___DATAload ; address
        dq ___nlsymptrend - ___nlsymptrstart    ; size
        dd ___nlsymptrstart ; offset
        dd 3            ; alignment as power of 2 (8)
        dd 0            ; relocations data offset
        dd 0            ; number of relocations
        dd 0x00000006   ; S_NON_LAZY_SYMBOL_POINTERS
        dd 2            ; reserved1 (index into indirect symbol table)
        dd 0            ; reserved2
        dd 0            ; reserved3
    ___DATAlasymptrstart:
        db '__la_symbol_ptr',0  ; section name (pad to 16 bytes)
        db '__DATA',0,0,0,0,0,0,0,0,0,0 ; segment name (pad to 16 bytes)
        dq 0x100001000 + ___lasymptrstart - ___DATAload ; address
        dq ___lasymptrend - ___lasymptrstart    ; size
        dd ___lasymptrstart ; offset
        dd 3            ; alignment as power of 2 (8)
        dd 0            ; relocations data offset
        dd 0            ; number of relocations
        dd 0x00000007   ; S_LAZY_SYMBOL_POINTERS
        dd 4            ; reserved1 (index into indirect symbol table)
        dd 0            ; reserved2
        dd 0            ; reserved3
        align 8, db 0   ; pad with zero to 8-byte boundary
    ___DATAend:

There are only two sections here, because the program has no global or static data: the non-lazy symbol pointers and the lazy symbol pointers.

Then comes the last segment, __LINKEDIT:

    ___LINKEDITstart:
        dd 0x19         ; LC_SEGMENT_64
        dd ___LINKEDITend - ___LINKEDITstart    ; command size
        db '__LINKEDIT',0,0,0,0,0,0 ; segment name (pad to 16 bytes)
        dq 0x100002000  ; VM address
        dq 0x1000       ; VM size
        dq 0x2000       ; file offset
        dq ___LINKEDITdataend - ___LINKEDITdatastart    ; file size
        dd 0x7          ; VM_PROT_READ | VM_PROT_WRITE | VM_PROT_EXECUTE
        dd 0x1          ; VM_PROT_READ
        dd 0            ; number of sections
        dd 0x0          ; flags
        align 8, db 0   ; pad with zero to 8-byte boundary
    ___LINKEDITend:

The __LINKEDIT segment holds a variety of data used by dyld, the dynamic linker: the symbol table, the indirect symbol table, rebase opcodes, binding opcodes, the exports table, function starts information, the data-in-code table, and some code signing data.

The next several load commands deal with static and dynamic linking information:

    ___dyldinfostart:
        dd 0x80000022   ; LC_DYLD_INFO | LC_REQ_DYLD (i.e. LC_DYLD_INFO_ONLY)
        dd ___dyldinfoend - ___dyldinfostart    ; command size
        dd ___rebasestart   ; rebase info offset
        dd ___rebaseend - ___rebasestart    ; rebase info size
        dd ___bindstart ; binding info offset
        dd ___bindend - ___bindstart    ; binding info size
        dd 0            ; weak binding info offset
        dd 0            ; weak binding info size
        dd ___lazystart ; lazy binding info offset
        dd ___lazyend - ___lazystart    ; lazy binding info size
        dd ___exportstart   ; export info offset
        dd ___exportend - ___exportstart    ; export info size
        align 8, db 0   ; pad with zero to 8-byte boundary
    ___dyldinfoend:
    ___symtabinfostart:
        dd 0x2          ; LC_SYMTAB
        dd ___symtabinfoend - ___symtabinfostart    ; command size
        dd ___symtabstart   ; symbol table offset
        dd (___symtabend - ___symtabstart) >> 4 ; number of symbols (16 bytes per nlist_64)
        dd ___strtabstart   ; string table offset
        dd ___strtabend - ___strtabstart    ; string table size
        align 8, db 0   ; pad with zero to 8-byte boundary
    ___symtabinfoend:
    ___dysymtabinfostart:
        dd 0xb          ; LC_DYSYMTAB
        dd ___dysymtabinfoend - ___dysymtabinfostart    ; command size
        dd 0            ; local symbols index
        dd 8            ; number of local symbols
        dd 8            ; external symbols index
        dd 2            ; number of external symbols
        dd 10           ; undefined symbols index
        dd 3            ; number of undefined symbols
        dd 0            ; table of contents offset
        dd 0            ; table of contents entries
        dd 0            ; module table offset
        dd 0            ; module table entries
        dd 0            ; external references table offset
        dd 0            ; external references table entries
        dd ___indirsymstart ; indirect symbol table offset
        dd (___indirsymend - ___indirsymstart) >> 2 ; indirect symbol table entries
        dd 0            ; external relocation table offset
        dd 0            ; external relocation table entries
        dd 0            ; local relocation table offset
        dd 0            ; local relocation table entries
        align 8, db 0   ; pad with zero to 8-byte boundary
    ___dysymtabinfoend:
    ___loaddylinkerstart:
        dd 0xe          ; LC_LOAD_DYLINKER
        dd ___loaddylinkerend - ___loaddylinkerstart    ; command size
        dd ___loaddylinkername - ___loaddylinkerstart   ; offset to name
    ___loaddylinkername:
        db '/usr/lib/dyld',0    ; name
        align 8, db 0   ; pad with zero to 8-byte boundary
    ___loaddylinkerend:
    ___maincmdstart:
        dd 0x80000028   ; LC_MAIN | LC_REQ_DYLD
        dd ___maincmdend - ___maincmdstart  ; command size
        dq _main        ; offset of main from start of __TEXT
        dq 0            ; stack size
        align 8, db 0   ; pad with zero to 8-byte boundary
    ___maincmdend:
    ___loadlibsystemstart:
        dd 0xc          ; LC_LOAD_DYLIB
        dd ___loadlibsystemend - ___loadlibsystemstart  ; command size
        dd ___loadlibsystemname - ___loadlibsystemstart ; offset to path
        dd 2            ; UNIX time stamp: 2 seconds after the epoch (1970-01-01 00:00:02 UTC)
        dd 0x00a90300   ; current version (169.3.0)
        dd 0x00010000   ; compatibility version (1.0.0)
    ___loadlibsystemname:
        db '/usr/lib/libSystem.B.dylib',0   ; path
        align 8, db 0   ; pad with zero to 8-byte boundary
    ___loadlibsystemend:
    ___fstartscmdstart:
        dd 0x26         ; LC_FUNCTION_STARTS
        dd ___fstartscmdend - ___fstartscmdstart    ; command size
        dd ___functionstartsstart   ; offset to function starts data (fun label name, isn't it?)
        dd ___functionstartsend - ___functionstartsstart    ; size of function starts data (even more fun name!)
        align 8, db 0   ; pad with zero to 8-byte boundary
    ___fstartscmdend:
    ___datacodecmdstart:
        dd 0x29         ; LC_DATA_IN_CODE
        dd ___datacodecmdend - ___datacodecmdstart  ; command size
        dd ___datacodestart ; offset to data-in-code information
        dd ___datacodeend - ___datacodestart ; size of data-in-code information
        align 8, db 0   ; pad with zero to 8-byte boundary
    ___datacodecmdend:
    ___dycodesigncmdstart:
        dd 0x2b         ; LC_DYLIB_CODE_SIGN_DRS
        dd ___dycodesigncmdend - ___dycodesigncmdstart  ; command size
        dd ___dylibcodesignaturesstart  ; offset to code signatures from dylibs
        dd ___dylibcodesignaturesend - ___dylibcodesignaturesstart  ; you get the idea, right?
        align 8, db 0   ; pad with zero to 8-byte boundary
    ___dycodesigncmdend:

To sum up, this long run of data contains:

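The binary's dynamic linking information (LC_DYLD_INFO_ONLY: rebase, binding, lazy binding, and export data). This command, along with a few others, is marked LC_REQ_DYLD, which means that if the version of dyld loading the binary doesn't understand the command, it must give up immediately rather than carry on without that information.
The locations of the symbol table and the string table. They're given as offsets from the start of the file, but the data is understood to live inside the __LINKEDIT segment. At runtime, dyld computes symtable_base_address = linkedit_base_address + (symtab_offset - linkedit_offset) to find where the symbol table actually sits in memory. The same calculation is done for the string table and for the offsets given in the LC_DYLD_INFO and LC_DYSYMTAB commands.
The binary's dynamic symbol table information (LC_DYSYMTAB), giving the starting index and count of each kind of symbol in the symbol table, plus the location of the indirect symbol table.
LC_LOAD_DYLINKER gives the hardcoded path of the dynamic linker used to load the executable. This command is read by the kernel, not by the dynamic linker itself: the kernel runs the named program when it creates the process. Don't get any ideas about using it to subvert the loading process, though; the kernel won't let you pick an arbitrary dynamic linker.
LC_MAIN replaces the older LC_UNIXTHREAD command. Executables used to be started from a thread state specified inside the binary, but someone eventually realized that this was a waste of time and space, since dyld gets involved early anyway and nearly every executable starts out in exactly the same state. So LC_MAIN simply gives the offset of the entry point, main(), and dyld jumps straight to it. This also does away with the crt1.o object file, which used to contain the glue code that set up the call to main().
LC_LOAD_DYLIB is the "I link against this dylib for some of my undefined symbols" command. This binary links only against libSystem.B.dylib, OS X's system library and its equivalent of libc.
LC_FUNCTION_STARTS points to a table in the __LINKEDIT segment giving the address of every function entry point in the executable. Among other things, this lets tools find functions even when they have no entry in the symbol table.
LC_DATA_IN_CODE likewise points to a table, this one giving the locations of data bytes embedded inside executable code. This is useful for a number of things, not least accurate disassembly.
Finally, LC_DYLIB_CODE_SIGN_DRS lists the designated requirements for each dylib the executable links against. This lets the code signing machinery judge the executable's suitability without having to load every dylib it links.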
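The address calculation from the symbol table item above, written out as a sketch. linkedit_address is a hypothetical helper, and its slide parameter is an addition for illustration: for a position-independent executable loaded under ASLR, dyld also has to add the slide between the link-time address and the actual load address.

```c
#include <stdint.h>

/* Hypothetical helper: where a piece of __LINKEDIT data (given as a file
   offset in LC_SYMTAB, LC_DYSYMTAB or LC_DYLD_INFO) lives in memory.
   `slide` is the ASLR offset chosen by dyld, or 0 if the image wasn't moved. */
uint64_t linkedit_address(uint64_t linkedit_vmaddr, uint64_t linkedit_fileoff,
                          uint32_t file_offset, uint64_t slide)
{
    /* base address of __LINKEDIT + distance of the data into the segment */
    return linkedit_vmaddr + slide + ((uint64_t)file_offset - linkedit_fileoff);
}
```

With this executable's __LINKEDIT (VM address 0x100002000, file offset 0x2000), data at file offset 0x2088 would sit at 0x100002088 when unslid.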

But wait, there's more! Just when you thought we were done, there are three load commands we haven't covered yet:

    ___uuidstart:
        dd 0x1b         ; LC_UUID
        dd ___uuidend - ___uuidstart    ; command size
        db 0xd3,0xec,0x58,0x28,0x02,0x26,0x36,0x29,0xab,0xc3,0x7d,0x6d,0xc9,0xf9,0x2d,0xda  ; D3EC5828-0226-3629-ABC3-7D6DC9F92DDA
        align 8, db 0   ; pad with zero to 8-byte boundary
    ___uuidend:
    ___osverstart:
        dd 0x24         ; LC_VERSION_MIN_MACOSX
        dd ___osverend - ___osverstart  ; command size
        dd 0x000a0800   ; OS min version: 10.8
        dd 0x000a0800   ; Build SDK version: 10.8
        align 8, db 0   ; pad with zero to 8-byte boundary
    ___osverend:
    ___sourceverstart:
        dd 0x2a         ; LC_SOURCE_VERSION
        dd ___sourceverend - ___sourceverstart  ; command size
        dq 0            ; Source version: 0.0.0.0.0
        align 8, db 0   ; pad with zero to 8-byte boundary
    ___sourceverend:
    ___loadcmdsend:

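These are the binary's UUID, the OS X version it requires and the SDK version it was linked against, and the "source version". I couldn't find any clue as to what "source version" actually means, and it was all zeros in every binary I looked at, so your guess is as good as mine.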
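The version fields are packed integers. Per the comments in <mach-o/loader.h>, LC_VERSION_MIN_MACOSX packs X.Y.Z as 16/8/8 bits (the dylib current and compatibility versions in LC_LOAD_DYLIB use the same layout), and LC_SOURCE_VERSION packs A.B.C.D.E as 24/10/10/10/10 bits in a 64-bit value. A decoding sketch; decode_version and decode_source_version are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical helper: X.Y.Z packed as xxxx.yy.zz (16/8/8 bits). */
void decode_version(uint32_t v, unsigned *x, unsigned *y, unsigned *z)
{
    *x = (unsigned)(v >> 16);
    *y = (unsigned)((v >> 8) & 0xff);
    *z = (unsigned)(v & 0xff);
}

/* Hypothetical helper: A.B.C.D.E packed as a24.b10.c10.d10.e10. */
void decode_source_version(uint64_t v, unsigned parts[5])
{
    parts[0] = (unsigned)(v >> 40);
    parts[1] = (unsigned)((v >> 30) & 0x3ff);
    parts[2] = (unsigned)((v >> 20) & 0x3ff);
    parts[3] = (unsigned)((v >> 10) & 0x3ff);
    parts[4] = (unsigned)(v & 0x3ff);
}
```

So 0x000a0800 decodes to 10.8.0, libSystem's current version 0x00a90300 to 169.3.0, its compatibility version 0x00010000 to 1.0.0, and the all-zero source version to 0.0.0.0.0.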

And finally, one more thing! With the load commands done, the first thing to do is pad the file out to where main() begins:

    ___TEXTload:
        times (0xf14-($-$$)) db 0   ; pad the __TEXT segment

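You might ask why I don't write _main-($-$$) instead of hardcoding the starting address. That looks fragile, and it is. The problem is that nasm has no easy way to align data to the "end" of a segment, especially since we aren't using its built-in section support. Until the padding has been added, it has no idea where _main is! So I just hardcoded the offset where main() starts (exactly the file offset of the __TEXT,__text section) as a hack, rather than trying to come up with an elegant but complicated solution.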

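Now we lay out the data in order. We don't strictly have to follow any particular order, since the labels used in the load commands pick up wherever each piece lands in the file, but there's no reason not to. First up is the __TEXT,__text section, the executable code. Note that the original assembly has to be rewritten in nasm syntax, since nasm uses Intel syntax rather than the AT&T syntax of the GNU tools. The main differences are that operands come in the opposite order and register names have no % prefix. All of the various directives are stripped out too, since we're doing their work by hand.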

    ___codestart:
    _main:
        push    rbp
        mov     rbp, rsp
        xor     edi, edi
        call    _time
        lea     rdi, [rel L_str]
        mov     rsi, rax
        xor     al, al
        call    _printf
        xor     eax, eax
        pop     rbp
        ret
    ___codeend:

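We also don't use size suffixes on any instructions, since nasm can infer them from the operands. The rel qualifier on the string load tells nasm to generate a rip-relative access instead of an absolute address, which is necessary because we've marked the executable as position-independent. Next up are the symbol stubs for time() and printf(), along with the stub helper: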

    ___stubstart:
    _printf:
        jmp     [rel _lazy_printf]
    _time:
        jmp     [rel _lazy_time]
    ___stubend:

    ___stubhelpstart:
    _stub_helper:
        lea     r11, [rel _nonlazy_table_start] ; address of dyld's private cache slot
        push    r11
        jmp     [rel _nonlazy_dyld_stub_binder]
        nop
    _stub_helper_printf:
        push    strict qword 0x00   ; offset of _printf's record in the lazy binding info
        jmp     _stub_helper
    _stub_helper_time:
        push    strict qword 0x0e   ; offset of _time's record in the lazy binding info
        jmp     _stub_helper
    ___stubhelpend:

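The stubs themselves jump through the lazy symbol pointers in the __DATA segment. Those pointers initially lead straight back into the bottom part of _stub_helper, which pushes the offset of the symbol's record in the lazy binding info and jumps to the top of the helper. The top pushes the address of a private slot dyld uses as a cache and then jumps into dyld itself through a non-lazy symbol pointer, which dyld binds when the executable is loaded. dyld binds the symbol and rewrites its lazy symbol pointer so that future calls through the stub go directly to the target function. Note that these are all plain, unconditional jumps, not subroutine calls. Also note the strict qword on the pushes: it forces nasm to use the full four-byte immediate encoding rather than the one-byte short form. The CPU pushes a 64-bit value either way, but this keeps every stub-helper entry the same size, as in the linker's own output.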
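The lazy binding round trip can be mimicked in plain C, with a function pointer standing in for a __la_symbol_ptr slot. This is only an analogy under stated assumptions: lazy_time_ptr, stub_helper_time, fake_resolve and real_time are hypothetical names, and the "binder" simply returns a known function instead of searching libSystem.

```c
typedef long (*time_fn)(long *);

static int resolve_count = 0;       /* how many times the "binder" has run */

/* Stand-in for the real time() that dyld would find in libSystem. */
static long real_time(long *out)
{
    if (out)
        *out = 42;
    return 42;
}

static long stub_helper_time(long *out);

/* The lazy pointer slot: it starts out aimed at the stub helper, just as
   _lazy_time initially holds the address of _stub_helper_time. */
static time_fn lazy_time_ptr = stub_helper_time;

/* Plays the role of dyld_stub_binder: find the target, patch the slot. */
static time_fn fake_resolve(void)
{
    resolve_count++;
    lazy_time_ptr = real_time;
    return real_time;
}

/* First call lands here: bind, then forward the call (a tail jump in the
   real stub helper). */
static long stub_helper_time(long *out)
{
    return fake_resolve()(out);
}

/* The __stubs entry: an indirect jump through the lazy pointer. */
long stub_time(long *out)
{
    return lazy_time_ptr(out);
}

int resolves_so_far(void)
{
    return resolve_count;
}
```

Calling stub_time twice runs the resolver only once; the second call goes straight through the patched pointer, which is the whole point of binding lazily.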

Next comes the C strings section, which is very short since we have only one string:

    ___strsstart:
    L_str:
        db      "Hello, world #%ld!\n",0
    ___strsend:

Now for the unwinding table. As far as I can tell, it's encoded using Apple's "compact unwind encoding":

    ___uwstart:
        dd 1            ; unwind info version
        dd _commonEncodings - ___uwstart    ; common encodings array offset
        dd 0            ; count of common encodings
        dd _personalities - ___uwstart  ; personality array offset
        dd 0            ; count of personalities
        dd _index - ___uwstart  ; first-level index offset
        dd 2            ; count of entries in first-level index
    _commonEncodings:
    _personalities:
    _index:
    __entry1_0:
        dd _main        ; function offset
        dd __entry2_0 - ___uwstart  ; offset to second-level entry
        dd _lsda - ___uwstart   ; offset to language-specific data array entry
    __entry1_1:
        dd ___codeend+1 ; function offset (end of table)
        dd 0            ; offset to second-level entry - zero means end of table
        dd _lsda - ___uwstart   ; offset to LSDA
    _lsda:
    _pages:
    __entry2_0:
        dd 3            ; UNWIND_SECOND_LEVEL_COMPRESSED
        dw ___entrypage0 - __entry2_0   ; offset to entry page
        dw 1            ; number of entries in entry page
        dw ___enc0 - __entry2_0 ; offset to encoding page
        dw 1            ; number of entries in encoding page
    ___entrypage0:
    ____entrypage0_0:
        dd (0 << 24) | (0)  ; encoding index and function offset relative to first-level index offset
    ___enc0:
    ____enc0_0:
        dd 0x01000000   ; UNWIND_X86_64_MODE_RBP_FRAME | UNWIND_X86_64_REG_NONE
    ___uwend:
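The second-level page above uses the compressed format, in which each 32-bit entry packs an index into the encoding array (high 8 bits) and a function offset relative to the first-level entry's function offset (low 24 bits). A sketch of that packing, plus a check for the rbp-frame mode; the two mask values are copied from <mach-o/compact_unwind_encoding.h>, and the helper names are hypothetical.

```c
#include <stdint.h>

/* Values from <mach-o/compact_unwind_encoding.h>, redefined locally. */
#define UNWIND_X86_64_MODE_MASK       0x0F000000u
#define UNWIND_X86_64_MODE_RBP_FRAME  0x01000000u

/* Hypothetical helper: build a compressed second-level entry. */
uint32_t pack_compressed_entry(uint8_t encoding_index, uint32_t func_offset)
{
    return ((uint32_t)encoding_index << 24) | (func_offset & 0x00FFFFFFu);
}

uint8_t entry_encoding_index(uint32_t entry) { return (uint8_t)(entry >> 24); }

uint32_t entry_func_offset(uint32_t entry) { return entry & 0x00FFFFFFu; }

/* Start of the function an entry describes: the first-level index entry's
   function offset plus the entry's 24-bit offset. */
uint32_t entry_function_start(uint32_t first_level_offset, uint32_t entry)
{
    return first_level_offset + entry_func_offset(entry);
}

/* Nonzero if an encoding describes a standard push rbp / mov rbp, rsp frame. */
int is_rbp_frame(uint32_t encoding)
{
    return (encoding & UNWIND_X86_64_MODE_MASK) == UNWIND_X86_64_MODE_RBP_FRAME;
}
```

For this executable the single entry is 0 (encoding index 0, offset 0), so it describes the function at first-level offset 0xf14, and encoding 0 is 0x01000000, an rbp frame with no other saved registers.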

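Next is a DWARF encoding of the same information. To save everyone some time, I won't write it out in full with complete annotations, since it's complicated and just repeats the unwind information above in a more verbose form: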

    ___ehstart:
        db 0x14,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x01,0x7a,0x52,0x00,0x01,0x78,0x10,0x01
        db 0x10,0x0c,0x07,0x08,0x90,0x01,0x00,0x00,0x24,0x00,0x00,0x00,0x1c,0x00,0x00,0x00
        db 0x34,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0x20,0x00,0x00,0x00,0x00,0x00,0x00,0x00
        db 0x00,0x41,0x0e,0x10,0x86,0x02,0x43,0x0d,0x06,0x00,0x00,0x00,0x00,0x00,0x00,0x00
    ___ehend:

Data, data, data... well, that's about it. That finishes off the __TEXT segment. Now we have the __DATA segment, which holds the lazy and non-lazy symbol pointers:

        times (0x1000-($-$$)) db 0  ; pad out to __DATA's file offset (0x1000)
    ___DATAload:

    ___nlsymptrstart:
    _nonlazy_dyld_stub_binder:
        dq 0x0000000000000000   ; bound to dyld_stub_binder at load time
    _nonlazy_table_start:
        dq 0x0000000000000000   ; private slot reserved for dyld
    ___nlsymptrend:

    ___lasymptrstart:
    _lazy_printf:
        dq 0x100000000 + _stub_helper_printf
    _lazy_time:
        dq 0x100000000 + _stub_helper_time
    ___lasymptrend:

In a real executable, the __DATA segment would usually also hold static data, storage for global variables, and various other things.

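The link-edit segment, __LINKEDIT, is tricky, because its structure is fairly arbitrary and the data in it isn't always thoroughly documented. I've done my best to present its contents in an understandable way, but I can't promise I've entirely succeeded.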

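We start with the rebase opcodes, which dyld uses when applying ASLR (address space layout randomization). They list the pointers in the image that hold absolute addresses and therefore have to be adjusted when the image is loaded somewhere other than its link-time address: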

        times (0x2000-($-$$)) db 0  ; pad out to __LINKEDIT's file offset (0x2000)
    ___rebasestart:
        db 0x10 | 0x01  ; REBASE_OPCODE_SET_TYPE_IMM | REBASE_TYPE_POINTER
        db 0x20 | 0x02  ; REBASE_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB | indexOfSegment(__DATA) (2)
        db 0x10         ; uleb128_encode(_lazy_printf - ___DATAload)
        db 0x50 | 0x02  ; REBASE_OPCODE_DO_REBASE_IMM_TIMES | 2
        align 8, db 0   ; pad with 0 to 8-byte boundary (0x00 is also REBASE_OPCODE_DONE)
    ___rebaseend:

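This says: "using pointer-type rebasing, starting at offset 0x10 in the __DATA segment, rebase 2 consecutive pointers relative to the segment's load address". Those are the two lazy symbol pointers, which hold absolute addresses inside the stub helper.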
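A small interpreter for just the rebase opcodes used here makes the meaning of those four bytes concrete. It is a sketch, not dyld's implementation: run_rebase and struct rebase_site are hypothetical, only five opcodes are handled, and instead of adding the slide it simply records each (segment, offset) pair that dyld would adjust.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one ULEB128 value and advance *p past it. */
static uint64_t read_uleb128(const uint8_t **p, const uint8_t *end)
{
    uint64_t result = 0;
    unsigned shift = 0;
    while (*p < end) {
        uint8_t byte = *(*p)++;
        if (shift < 64)
            result |= (uint64_t)(byte & 0x7f) << shift;
        shift += 7;
        if ((byte & 0x80) == 0)
            break;
    }
    return result;
}

struct rebase_site { unsigned segment; uint64_t offset; };

/* Hypothetical helper: records up to `max` rebase locations in `out` and
   returns how many there are, or -1 on an opcode this sketch doesn't handle. */
int run_rebase(const uint8_t *ops, size_t len, struct rebase_site *out, int max)
{
    const uint8_t *p = ops, *end = ops + len;
    unsigned segment = 0;
    uint64_t offset = 0;
    int n = 0;
    while (p < end) {
        uint8_t op = *p & 0xf0, imm = *p & 0x0f;
        p++;
        switch (op) {
        case 0x00:                     /* REBASE_OPCODE_DONE */
            return n;
        case 0x10:                     /* SET_TYPE_IMM: only pointers here */
            if (imm != 1)
                return -1;
            break;
        case 0x20:                     /* SET_SEGMENT_AND_OFFSET_ULEB */
            segment = imm;
            offset = read_uleb128(&p, end);
            break;
        case 0x30:                     /* ADD_ADDR_ULEB */
            offset += read_uleb128(&p, end);
            break;
        case 0x50:                     /* DO_REBASE_IMM_TIMES */
            for (unsigned i = 0; i < imm; i++) {
                if (n < max) {
                    out[n].segment = segment;
                    out[n].offset = offset;
                }
                n++;
                offset += 8;           /* each rebase advances one pointer */
            }
            break;
        default:
            return -1;
        }
    }
    return n;
}
```

Run over the bytes 0x11, 0x22, 0x10, 0x52, 0x00, it reports two sites in segment 2 (__DATA), at offsets 0x10 and 0x18: exactly _lazy_printf and _lazy_time.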

Next come the binding opcodes and the lazy binding opcodes:

    ___bindstart:
        db 0x11         ; BIND_OPCODE_SET_DYLIB_ORDINAL_IMM | 1
        db 0x40         ; BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM | 0
        db 'dyld_stub_binder',0 ; immediate operand
        db 0x51         ; BIND_OPCODE_SET_TYPE_IMM | BIND_TYPE_POINTER
        db 0x72         ; BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB | indexOfSegment(__DATA) (2)
        db 0x00         ; uleb128_encode(0)
        db 0x90         ; BIND_OPCODE_DO_BIND
        db 0x00         ; BIND_OPCODE_DONE
        align 8, db 0   ; pad with 0 to 8-byte boundary
    ___bindend:
    ___lazystart:
        db 0x72,0x10    ; BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB | indexOfSegment(__DATA) (2), uleb128_encode(0x10)
        db 0x11         ; BIND_OPCODE_SET_DYLIB_ORDINAL_IMM | 1
        db 0x40,'_printf',0 ; BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM | 0, '_printf'
        db 0x90,0x00    ; BIND_OPCODE_DO_BIND, BIND_OPCODE_DONE
        db 0x72,0x18    ; BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB | indexOfSegment(__DATA) (2), uleb128_encode(0x18)
        db 0x11         ; BIND_OPCODE_SET_DYLIB_ORDINAL_IMM | 1
        db 0x40,'_time',0   ; BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM | 0, '_time'
        db 0x90,0x00    ; BIND_OPCODE_DO_BIND, BIND_OPCODE_DONE
        align 8, db 0   ; pad with 0 to 8-byte boundary
    ___lazyend:

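These opcodes bind the non-lazy symbol named dyld_stub_binder as a pointer at offset 0 of the __DATA segment. For the lazy symbols, they bind the symbol named _printf at offset 0x10 of __DATA and _time at offset 0x18. Each lazy record ends with its own BIND_OPCODE_DONE, because dyld runs only one record at a time: the one whose offset the stub helper hands to dyld_stub_binder.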
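The number each stub-helper entry pushes is the offset of that symbol's record in the lazy binding stream, and dyld runs the opcodes from that point up to BIND_OPCODE_DO_BIND. Here is a sketch of that step, handling only the opcodes used above; lazy_record_at and struct bind_record are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode one ULEB128 value and advance *p past it. */
static uint64_t read_uleb128(const uint8_t **p, const uint8_t *end)
{
    uint64_t result = 0;
    unsigned shift = 0;
    while (*p < end) {
        uint8_t byte = *(*p)++;
        if (shift < 64)
            result |= (uint64_t)(byte & 0x7f) << shift;
        shift += 7;
        if ((byte & 0x80) == 0)
            break;
    }
    return result;
}

struct bind_record {
    unsigned    segment;        /* segment index (2 = __DATA here) */
    uint64_t    offset;         /* offset of the pointer inside that segment */
    unsigned    dylib_ordinal;  /* 1 = first LC_LOAD_DYLIB (libSystem) */
    const char *symbol;         /* points into the opcode stream */
};

/* Hypothetical helper: decodes the single lazy binding record that starts at
   `start`. Returns 0 on success, -1 if the bytes there aren't a record. */
int lazy_record_at(const uint8_t *ops, size_t len, size_t start,
                   struct bind_record *r)
{
    *r = (struct bind_record){0};
    if (start >= len)
        return -1;
    const uint8_t *p = ops + start, *end = ops + len;
    while (p < end) {
        uint8_t op = *p & 0xf0, imm = *p & 0x0f;
        p++;
        switch (op) {
        case 0x10:              /* BIND_OPCODE_SET_DYLIB_ORDINAL_IMM */
            r->dylib_ordinal = imm;
            break;
        case 0x40: {            /* BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM */
            const uint8_t *nul = memchr(p, 0, (size_t)(end - p));
            if (nul == NULL)
                return -1;
            r->symbol = (const char *)p;
            p = nul + 1;
            break;
        }
        case 0x70:              /* BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB */
            r->segment = imm;
            r->offset = read_uleb128(&p, end);
            break;
        case 0x90:              /* BIND_OPCODE_DO_BIND: record complete */
            return r->symbol != NULL ? 0 : -1;
        default:                /* DONE before DO_BIND, or unhandled opcode */
            return -1;
        }
    }
    return -1;
}
```

With the stream above, _printf's record starts at offset 0 and _time's at offset 14 (0x0e); starting anywhere else lands in the middle of a record.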

Here's the export trie:

    ___exportstart:
    _exnode0:
        db 0x00         ; terminal size
        db 0x01         ; child count
        db '_',0        ; name
        db _exnode1 - ___exportstart    ; child node offset
    _exnode1:
        db 0x00         ; terminal size
        db 0x02         ; child count
        db '_mh_execute_header',0   ; name
        db _exnode3 - ___exportstart    ; child node offset
    _exnode2:
        db 'main',0     ; name
        db _exnode4 - ___exportstart    ; child node offset
    _exnode3:
        db 0x02         ; terminal size
        db 0x00         ; flags
        db 0x00         ; address - uleb128_encode(0)
        db 0x00         ; child count
    _exnode4:
        db 0x03         ; terminal size
        db 0x00         ; flags
        db 0x94,0x1e    ; address - uleb128_encode(0xf14)
        db 0x00         ; child count
        align 8, db 0   ; pad with 0 to 8-byte boundary
    ___exportend:

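This forms a trie for the two symbols the executable exports, __mh_execute_header and _main. Each terminal node stores the symbol's flags and its address as ULEB128 values, with the address measured from the start of __TEXT.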
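A lookup walks the trie one edge at a time, consuming the matched prefix of the name. Here is a sketch assuming a well-formed trie; trie_lookup is a hypothetical helper, and it reads only the plain flags-plus-address form of export info, ignoring the extra fields that re-exports and stub-and-resolver entries carry.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode one ULEB128 value and advance *p past it. */
static uint64_t read_uleb128(const uint8_t **p, const uint8_t *end)
{
    uint64_t result = 0;
    unsigned shift = 0;
    while (*p < end) {
        uint8_t byte = *(*p)++;
        if (shift < 64)
            result |= (uint64_t)(byte & 0x7f) << shift;
        shift += 7;
        if ((byte & 0x80) == 0)
            break;
    }
    return result;
}

/* Hypothetical helper: on success stores the symbol's offset from the start
   of __TEXT in *addr and returns 1; returns 0 if the name isn't exported. */
int trie_lookup(const uint8_t *trie, size_t size, const char *name,
                uint64_t *addr)
{
    const uint8_t *end = trie + size;
    const uint8_t *node = trie;
    const char *rest = name;                /* part of the name still unmatched */

    while (node < end) {
        const uint8_t *p = node;
        uint64_t terminal_size = read_uleb128(&p, end);
        if (*rest == '\0') {
            if (terminal_size == 0)
                return 0;                   /* a shared prefix, not a symbol */
            (void)read_uleb128(&p, end);    /* flags (0 = regular symbol) */
            *addr = read_uleb128(&p, end);  /* offset from start of __TEXT */
            return 1;
        }
        if (terminal_size >= (uint64_t)(end - p))
            return 0;
        p += terminal_size;                 /* skip this node's export info */
        uint8_t children = *p++;
        const uint8_t *next = NULL;
        for (uint8_t i = 0; i < children && p < end; i++) {
            const uint8_t *nul = memchr(p, 0, (size_t)(end - p));
            if (nul == NULL || nul == p)
                return 0;                   /* unterminated or empty edge */
            size_t edge_len = (size_t)(nul - p);
            int match = strncmp(rest, (const char *)p, edge_len) == 0;
            p = nul + 1;
            uint64_t child = read_uleb128(&p, end);
            if (match) {
                if (child >= size)
                    return 0;
                rest += edge_len;
                next = trie + child;
                break;
            }
        }
        if (next == NULL)
            return 0;
        node = next;
    }
    return 0;
}
```

Fed the 42 bytes the nasm trie above assembles to (child node offsets 5, 33 and 37), it should return 0 for __mh_execute_header and 0xf14 for _main, and fail for a bare prefix such as _ or _ma.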

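There's a compressed table of function starts, stored as a series of ULEB128 deltas: the first is measured from the start of __TEXT, each later one from the previous function start, and a zero delta ends the list: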

    ___functionstartsstart:
        db 0x94,0x1e    ; uleb128_encode(0xf14): main() starts 0xf14 bytes into __TEXT
        db 0x00         ; a zero delta terminates the list
        align 8, db 0   ; pad with 0 to 8-byte boundary
    ___functionstartsend:
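Decoding the function starts table is just a running sum. Here is a sketch (decode_function_starts is a hypothetical helper) showing that the bytes 0x94 0x1e form a single ULEB128 value, 0xf14, rather than two separate deltas:

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one ULEB128 value and advance *p past it. */
static uint64_t read_uleb128(const uint8_t **p, const uint8_t *end)
{
    uint64_t result = 0;
    unsigned shift = 0;
    while (*p < end) {
        uint8_t byte = *(*p)++;
        if (shift < 64)
            result |= (uint64_t)(byte & 0x7f) << shift;
        shift += 7;
        if ((byte & 0x80) == 0)
            break;
    }
    return result;
}

/* Hypothetical helper: turns LC_FUNCTION_STARTS data into absolute addresses.
   Writes up to `max` starts to `out` and returns the total number found. */
size_t decode_function_starts(const uint8_t *data, size_t len,
                              uint64_t text_vmaddr, uint64_t *out, size_t max)
{
    const uint8_t *p = data, *end = data + len;
    uint64_t addr = text_vmaddr;           /* first delta is from __TEXT */
    size_t n = 0;
    while (p < end) {
        uint64_t delta = read_uleb128(&p, end);
        if (delta == 0)                    /* zero delta ends the table */
            break;
        addr += delta;
        if (n < max)
            out[n] = addr;
        n++;
    }
    return n;
}
```

For this executable the table yields exactly one start, 0x100000f14, the address of main().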

Here's the data-in-code table. Oops, this executable doesn't actually have any data in its code; the load command is just added anyway:

    ___datacodestart:
        align 8, db 0   ; pad with 0 to 8-byte boundary
    ___datacodeend:

How about some designated requirements for the dylibs? I'm honestly not sure exactly how this format works, so I've annotated it as best I understand it:

    ___dylibcodesignaturesstart:
        dd 1            ; count of code signatures (maybe?)
        dd 0            ; unknown
        dd 0x14         ; unknown
        db 0xfa,0xde,0x0c,0x00,0x00,0x00,0x00,0x28
        db 0x00,0x00,0x00,0x01,0x00,0x00,0x00,0x06
        db 0x00,0x00,0x00,0x02,0x00,0x00,0x00,0x0b
        db 0x6c,0x69,0x62,0x53,0x79,0x73,0x74,0x65
        db 0x6d,0x2e,0x42,0x00,0x00,0x00,0x00,0x03  ; code signature for libSystem.B.dylib
        dd 0            ; unknown
        align 8, db 0   ; pad with 0 to 8-byte boundary
    ___dylibcodesignaturesend:

The symbol table is where most of the remaining interesting stuff happens:

    ___symtabstart:
        dd L_srcdir - ___strtabstart    ; string table offset
        db 0x64         ; N_SO
        db 0x00         ; section 0
        dw 0x00         ; no desc
        dq 0            ; address 0
        dd L_srcfile - ___strtabstart   ; string table offset
        db 0x64         ; N_SO
        db 0x00         ; section 0
        dw 0x00         ; no desc
        dq 0            ; address 0
        dd L_objfile - ___strtabstart   ; string table offset
        db 0x66         ; N_OSO
        db 0x03         ; section 3
        dw 0x01         ; desc(?)
        dq 0x50b8c91f   ; st_mtime
        dd L_empty - ___strtabstart ; no string
        db 0x2e         ; N_BNSYM
        db 0x01         ; section 1
        dw 0x00         ; desc
        dq 0x100000000 + _main      ; start address
        dd L_main1 - ___strtabstart ; string table offset
        db 0x24         ; N_FUN
        db 0x01         ; section 1
        dw 0x00         ; desc
        dq 0x100000f14  ; start address
        dd L_empty - ___strtabstart ; no string
        db 0x24         ; N_FUN
        db 0x00         ; section 0
        dw 0x00         ; desc
        dq 0x20         ; address
        dd L_empty - ___strtabstart ; no string
        db 0x4e         ; N_ENSYM
        db 0x01         ; section 1
        dw 0x00         ; desc
        dq 0x20         ; address (n_value is 64 bits in every nlist_64 entry)
    _sym_mh_execute_header:
        dd L_mhexechead - ___strtabstart    ; string table offset
        db 0x0f         ; N_SECT | N_EXT
        db 0x01         ; section 1
        dw 0x0010       ; REFERENCED_DYNAMICALLY
        dq 0x100000000 + __mh_execute_header    ; start address
    _sym_main:
        dd L_main2 - ___strtabstart ; string table offset
        db 0x0f         ; N_SECT | N_EXT
        db 0x01         ; section 1
        dw 0x0000       ; no extra flags
        dq 0x100000000 + _main  ; start address
    _sym_printf:
        dd L_printf - ___strtabstart    ; string table offset
        db 0x01         ; N_UNDF | N_EXT
        db 0x00         ; NO_SECT
        dw 0x0100       ; dynamic library 1
        dq 0            ; address
    _sym_time:
        dd L_time - ___strtabstart  ; string table offset
        db 0x01         ; N_UNDF | N_EXT
        db 0x00         ; NO_SECT
        dw 0x0100       ; dynamic library 1
        dq 0            ; address
    _sym_dyld_stub_binder:
        dd L_binder - ___strtabstart    ; string table offset
        db 0x01         ; N_UNDF | N_EXT
        db 0x00         ; NO_SECT
        dw 0x0100       ; dynamic library 1
        dq 0            ; address
        align 8, db 0   ; pad with 0 to 8-byte boundary
    ___symtabend:

    ___indirsymstart:
        dd (_sym_printf - ___symtabstart) >> 4  ; index into symbol table
        dd (_sym_time - ___symtabstart) >> 4    ; index into symbol table
        dd (_sym_dyld_stub_binder - ___symtabstart) >> 4    ; index into symbol table
        dd 0x40000000   ; INDIRECT_SYMBOL_ABS
        dd (_sym_printf - ___symtabstart) >> 4  ; index into symbol table
        dd (_sym_time - ___symtabstart) >> 4    ; index into symbol table
        align 8, db 0   ; pad with 0 to 8-byte boundary
    ___indirsymend:

    ___strtabstart:
    L_spc:
        db ' '
    L_empty:
        db 0
    L_srcdir:
        db '/Users/gwynne/',0
    L_srcfile:
        db 'test.c',0
    L_objfile:
        db '/var/folders/b8/qgjb841d71d55cf8jh1myb540000gn/T/test-KyuIba.o',0
    L_main1:
        db '_main',0
    L_mhexechead:
        db '__mh_execute_header',0
    L_main2:
        db '_main',0
    L_printf:
        db '_printf',0
    L_time:
        db '_time',0
    L_binder:
        db 'dyld_stub_binder',0
        align 8, db 0   ; pad with 0 to 8-byte boundary
    ___strtabend:

    ___LINKEDITdataend:

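Here we have the symbol table (including the STABS entries), the indirect symbol table (really just a list of indices into the symbol table that tells dyld how to use the symbol stubs when the binding opcodes aren’t enough; it’s essentially legacy data), and the string table, which holds all the human-readable strings for the symbol table.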
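Every symbol table entry above is a 16-byte nlist_64, which is why both the indirect symbol table and LC_SYMTAB turn byte offsets into indices with `>> 4`. The sketch below re-declares that layout locally (mirroring struct nlist_64 from <mach-o/nlist.h>, so it also builds on non-Apple systems) and shows the two bits of arithmetic the listing relies on: the symbol index, and the library ordinal stored in the high byte of n_desc, which is why 0x0100 on the undefined symbols means “dynamic library 1”, the first LC_LOAD_DYLIB. The helper names are my own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of struct nlist_64 from <mach-o/nlist.h>. */
struct nlist64_entry {
    uint32_t n_strx;   /* offset into the string table */
    uint8_t  n_type;   /* e.g. 0x0f = N_SECT | N_EXT, 0x01 = N_UNDF | N_EXT */
    uint8_t  n_sect;   /* 1-based section number, 0 = NO_SECT */
    uint16_t n_desc;   /* flags; high byte holds the library ordinal */
    uint64_t n_value;  /* address, or another value for STABS entries */
};

/* Every hand-written entry in the listing has to match this layout. */
static_assert(sizeof(struct nlist64_entry) == 16, "nlist_64 entries are 16 bytes");
static_assert(offsetof(struct nlist64_entry, n_value) == 8, "n_value starts at byte 8");

/* Same as (_sym_x - ___symtabstart) >> 4 in the nasm source. */
static uint32_t symbol_index(uint32_t byte_offset) {
    return (uint32_t)(byte_offset / sizeof(struct nlist64_entry));
}

/* Equivalent of GET_LIBRARY_ORDINAL() from <mach-o/nlist.h>. */
static uint8_t library_ordinal(uint16_t n_desc) {
    return (uint8_t)((n_desc >> 8) & 0xff);
}
```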

Conclusion
That was a big mess of mostly raw hex bytes. And here’s the kicker: even generated exactly as written here, it still doesn’t produce a working Mach-O binary!

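Why not? Because I didn’t handle the alignment requirements correctly, and I didn’t have time to fix that before this article went up. All the tables and structures here are correct, though, so I hope it’s still educational: building even the simplest binary takes this many steps, so you should be very grateful for how much work ld and dyld do for you!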

As always, thanks for reading. I hope you enjoyed it!

#Original (English)

Source: https://www.mikeash.com/pyblog/friday-qa-2012-11-30-lets-build-a-mach-o-executable.html

This is something of a follow-up to my last article, dyld: Dynamic Linking On OS X, in which I explored how the dynamic linker dyld does its job. This week, I’m going to recreate the function of both the compiler and the static linker, building a Mach-O binary completely from scratch with only the help of the assembler.

The Right Tool For the Right Job
The best tool on OS X for producing binary files from assembly-language inputs is, of course, the assembler, as. But, if you try to build a raw binary from this, you’ll find that as also functions as a static linker in its own right. This isn’t what we’re after.

A more flexible tool, in this particular respect, is nasm, the Netwide Assembler. nasm is installed by the Xcode command-line tools, but unfortunately, Apple ships a horrifyingly outdated version, 0.98.40, which dates back to 2007 in terms of bug fixes, and to 1999 for features. The most recent version at the time of this writing is 2.10.05, which can be installed with port install nasm, brew install nasm, or whatever other package manager you choose. If you don’t use a package manager, you can download and compile the source yourself.

nasm 2.x includes a number of useful things, like 64-bit support, and Mach-O output. We won’t be using nasm’s Mach-O support, since the point of all this is to do it by hand, but it’d be kind of nice to build a 64-bit binary using 64-bit instructions instead of split 32-bit words!

Reinserting the Prime Program
Here’s the C source code for which we’ll build our Mach-O binary. To keep the resulting binary relatively simple, I’ve written it to avoid importing more than the bare minimum of information:

    #define NULL ((void *)0L)
    extern int printf(const char * restrict format, ...);
    typedef long time_t;
    extern time_t time(time_t *sloc);

    int main(void)
    {
        printf("Hello, world #%ld!\n", time(NULL));
        return 0;
    }

Some things to notice:

Rather than #include <stdio.h> and #include <time.h>, I’ve manually declared printf() and time(), defined the time_t type, and macroed NULL. This avoids emitting extra debug information for the various stuff defined in the standard headers.
I’ve defined main() as taking no parameters. The C standard explicitly permits this form, and because of C’s calling conventions it works correctly even though the caller still passes argc and argv in registers.
I’ve used a format string that actually does a format replacement so that the compiler with which I produced my test files doesn’t get all efficient and replace it with a puts() call instead.

This generates the following assembly (built with Clang 3.3svn at -Os):

            .section        __TEXT,__text,regular,pure_instructions
            .globl  _main
    _main:                                  ## @main
            .cfi_startproc
    ## BB#0:                                ## %entry
            pushq   %rbp
    Ltmp2:
            .cfi_def_cfa_offset 16
    Ltmp3:
            .cfi_offset %rbp, -16
            movq    %rsp, %rbp
    Ltmp4:
            .cfi_def_cfa_register %rbp
            xorl    %edi, %edi
            callq   _time
            leaq    L_.str(%rip), %rdi
            movq    %rax, %rsi
            xorb    %al, %al
            callq   _printf
            xorl    %eax, %eax
            popq    %rbp
            ret
            .cfi_endproc

            .section        __TEXT,__cstring,cstring_literals
    L_.str:                                 ## @.str
            .asciz   "Hello, world #%ld!\n"

    .subsections_via_symbols

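The code itself is very straightforward: inside the __TEXT,__text section, set up a stack frame, call time(), load the address of the L_.str string into rdi, move time()’s result into rsi, set al to zero (for a variadic call, al holds the number of vector registers used to pass arguments), call printf(), zero eax for the return value, tear down the stack frame, and return. Then, in the __TEXT,__cstring section, define the L_.str label to point to a zero-terminated ASCII string. Finally, .subsections_via_symbols tells the linker that the sections may be split apart at symbol boundaries (no code falls through from one symbol into the next), which it relies on during dead code stripping.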

The rest of the directives are related to Call Frame Information, which is used for unwinding data (__unwind_info and __eh_frame, for exception-handling support) and debug information (.debug_frame). We’ll be building the first two by hand.

For sanity’s sake, I’ll be omitting the full DWARF debugging information. Even for this very simple program it would represent a considerable addition to this already overlong article.

The Start of a Mach-O Executable
Our nasm input file will be used to generate a Mach-O file, so we need to start it with a Mach-O header. We’ll use the 64-bit Mach-O little-endian format, whose header looks like this:

    struct mach_header_64 {
        uint32_t    magic;      /* mach magic number identifier */
        cpu_type_t  cputype;    /* cpu specifier */
        cpu_subtype_t   cpusubtype; /* machine specifier */
        uint32_t    filetype;   /* type of file */
        uint32_t    ncmds;      /* number of load commands */
        uint32_t    sizeofcmds; /* the size of all the load commands */
        uint32_t    flags;      /* flags */
        uint32_t    reserved;   /* reserved */
    };

    /* Constant for the magic field of the mach_header_64 (64-bit architectures) */
    #define MH_MAGIC_64 0xfeedfacf /* the 64-bit mach magic number */
    #define MH_CIGAM_64 0xcffaedfe /* NXSwapInt(MH_MAGIC_64) */

Here’s the nasm input for our Mach-O header:

    bits 64
    cpu x64

    __mh_execute_header:
        dd 0xfeedfacf   ; MH_MAGIC_64
        dd 16777223     ; CPU_TYPE_X86 | CPU_ARCH_ABI64 (0x01000007, CPU_TYPE_X86_64)
        dd 0x80000003   ; CPU_SUBTYPE_X86_64_ALL | CPU_SUBTYPE_LIB64
        dd 2            ; MH_EXECUTE
        dd 16           ; number of load commands
        dd ___loadcmdsend - ___loadcmdsstart    ; size of load commands
        dd 0x00200085   ; MH_NOUNDEFS | MH_DYLDLINK | MH_TWOLEVEL | MH_PIE
        dd 0            ; reserved
    ___loadcmdsstart:

The bits 64 directive tells nasm to generate 64-bit code, and cpu x64 restricts it to instructions available on x86-64 processors.
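To double-check the magic numbers in that header, here is a small C sketch that reads the same 32 bytes back. It re-declares the constants locally instead of including <mach-o/loader.h>, so it also builds on non-Apple systems; the helper names are my own. It confirms that 16777223 is 0x01000007 (CPU_TYPE_X86 | CPU_ARCH_ABI64) and that 0x00200085 decomposes into exactly the four flags named in the comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Constants with the values given in <mach-o/loader.h> and <mach/machine.h>. */
#define MH_MAGIC_64       0xfeedfacfu
#define CPU_ARCH_ABI64    0x01000000u
#define CPU_TYPE_X86      7u
#define CPU_SUBTYPE_LIB64 0x80000000u
#define MH_EXECUTE        0x2u
#define MH_NOUNDEFS       0x1u
#define MH_DYLDLINK       0x4u
#define MH_TWOLEVEL       0x80u
#define MH_PIE            0x200000u

/* Reads a little-endian 32-bit field regardless of host byte order. */
static uint32_t le32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Returns 1 if the 32-byte header describes a 64-bit x86 MH_EXECUTE file. */
static int is_x86_64_executable(const uint8_t hdr[32]) {
    return le32(hdr) == MH_MAGIC_64 &&
           le32(hdr + 4) == (CPU_TYPE_X86 | CPU_ARCH_ABI64) &&
           le32(hdr + 12) == MH_EXECUTE;
}

/* Writes the names of known flag bits to out, separated by " | ", and
   returns any bits this sketch does not recognize. */
static uint32_t describe_flags(uint32_t flags, char *out, size_t outsz) {
    static const struct { uint32_t bit; const char *name; } known[] = {
        { MH_NOUNDEFS, "MH_NOUNDEFS" }, { MH_DYLDLINK, "MH_DYLDLINK" },
        { MH_TWOLEVEL, "MH_TWOLEVEL" }, { MH_PIE, "MH_PIE" },
    };
    assert(outsz > 0);
    out[0] = '\0';
    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++) {
        if (!(flags & known[i].bit)) continue;
        if (out[0] != '\0') strncat(out, " | ", outsz - strlen(out) - 1);
        strncat(out, known[i].name, outsz - strlen(out) - 1);
        flags &= ~known[i].bit;
    }
    return flags;
}
```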

Immediately after the Mach-O header come the load commands. There’s a whole list of commands that are required for an executable, and a huge pile more that might appear in one. Clang produces 16 load commands for this executable. A load command looks like this:

    struct load_command {
        uint32_t cmd;       /* type of load command */
        uint32_t cmdsize;   /* total size of command in bytes */
    };

Each load command is actually larger than this; the cmd field tells the loader how to interpret the following data. Load commands must be aligned to an 8-byte boundary for 64-bit Mach-O files.
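Because every command starts with cmd and cmdsize, a reader can hop from one command to the next without understanding most of them. A minimal sketch of that walk, assuming a little-endian buffer holding only the load commands (the bytes right after the 32-byte header); it rejects any cmdsize that is smaller than 8 bytes, runs past the buffer, or is not a multiple of 8. walk_load_commands is a hypothetical helper, not an Apple API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reads a little-endian 32-bit field regardless of host byte order. */
static uint32_t le32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Visits ncmds load commands in buf and stores each cmd value in cmds_out
   (up to max_out of them). Returns ncmds, or -1 if a command is truncated,
   shorter than 8 bytes, or has a cmdsize that is not a multiple of 8. */
static int walk_load_commands(const uint8_t *buf, size_t len, uint32_t ncmds,
                              uint32_t *cmds_out, size_t max_out) {
    size_t off = 0;
    assert(ncmds <= INT32_MAX);                 /* keeps the return cast safe */
    for (uint32_t i = 0; i < ncmds; i++) {
        if (len - off < 8) return -1;           /* room for cmd + cmdsize */
        uint32_t cmd = le32(buf + off);
        uint32_t size = le32(buf + off + 4);
        if (size < 8 || size % 8 != 0 || size > len - off) return -1;
        if (i < max_out) cmds_out[i] = cmd;
        off += size;                            /* next command starts here */
    }
    return (int)ncmds;
}
```

The `align 8, db 0` line at the end of every command in the listing is what keeps each cmdsize a multiple of 8 for this check.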

Segments and Sections
Segments are the blocks of data and code which dyld actually maps into memory at runtime. Sections are subdivisions of segments. Segments and sections both have names, and quite a few are standard and predefined.

Here’s our first segment command:

    ___pagezerostart:
        dd 0x19         ; LC_SEGMENT_64
        dd ___pagezeroend - ___pagezerostart    ; command size
        db '__PAGEZERO',0,0,0,0,0,0 ; segment name (pad to 16 bytes)
        dq 0            ; VM address
        dq 0x100000000  ; VM size
        dq 0            ; file offset
        dq 0            ; file size
        dd 0x0          ; VM_PROT_NONE (maximum protection)
        dd 0x0          ; VM_PROT_NONE (initial protection)
        dd 0            ; number of sections
        dd 0x0          ; flags
        align 8, db 0   ; pad with zero to 8-byte boundary
    ___pagezeroend:

This is the __PAGEZERO segment, which reserves the entire lower 4GB of the 64-bit virtual address space as inaccessible. Because this segment is marked unreadable, unwritable, and non-executable, dereferencing a NULL pointer (or any pointer that has been truncated to 32 bits) causes an immediate segmentation fault.

The next segment command is more complicated:

    ___TEXTstart:
        dd 0x19         ; LC_SEGMENT_64
        dd ___TEXTend - ___TEXTstart    ; command size
        db '__TEXT',0,0,0,0,0,0,0,0,0,0 ; segment name (pad to 16 bytes)
        dq 0x100000000  ; VM address
        dq 0x1000       ; VM size
        dq 0            ; file offset
        dq 0x1000       ; file size
        dd 0x7          ; VM_PROT_READ | VM_PROT_WRITE | VM_PROT_EXECUTE (maximum protection)
        dd 0x5          ; VM_PROT_READ | VM_PROT_EXECUTE (initial protection)
        dd 6            ; number of sections
        dd 0x0          ; flags
    ___TEXTtextstart:
        db '__text',0,0,0,0,0,0,0,0,0,0 ; section name (pad to 16 bytes)
        db '__TEXT',0,0,0,0,0,0,0,0,0,0 ; segment name (pad to 16 bytes)
        dq 0x100000000 + ___codestart   ; address (__TEXT maps file offset 0 at 0x100000000)
        dq ___codeend - ___codestart    ; size
        dd ___codestart ; offset
        dd 0            ; alignment as power of 2 (1)
        dd 0            ; relocations data offset
        dd 0            ; number of relocations
        dd 0x80000400   ; S_REGULAR | S_ATTR_PURE_INSTRUCTIONS | S_ATTR_SOME_INSTRUCTIONS
        dd 0            ; reserved1
        dd 0            ; reserved2
        dd 0            ; reserved3
    ___TEXTstubsstart:
        db '__stubs',0,0,0,0,0,0,0,0,0  ; section name (pad to 16 bytes)
        db '__TEXT',0,0,0,0,0,0,0,0,0,0 ; segment name (pad to 16 bytes)
        dq 0x100000000 + ___stubstart   ; address
        dq ___stubend - ___stubstart    ; size
        dd ___stubstart ; offset
        dd 1            ; alignment as power of 2 (2)
        dd 0            ; relocations data offset
        dd 0            ; number of relocations
        dd 0x80000408   ; S_SYMBOL_STUBS | S_ATTR_PURE_INSTRUCTIONS | S_ATTR_SOME_INSTRUCTIONS
        dd 0            ; reserved1 (index into indirect symbol table)
        dd 6            ; reserved2 (size per stub)
        dd 0            ; reserved3
    ___TEXTstubhelperstart:
        db '__stub_helper',0,0,0    ; section name (pad to 16 bytes)
        db '__TEXT',0,0,0,0,0,0,0,0,0,0 ; segment name (pad to 16 bytes)
        dq 0x100000000 + ___stubhelpstart   ; address
        dq ___stubhelpend - ___stubhelpstart    ; size
        dd ___stubhelpstart ; offset
        dd 2            ; alignment as power of 2 (4)
        dd 0            ; relocations data offset
        dd 0            ; number of relocations
        dd 0x80000400   ; S_REGULAR | S_ATTR_PURE_INSTRUCTIONS | S_ATTR_SOME_INSTRUCTIONS
        dd 0            ; reserved1
        dd 0            ; reserved2
        dd 0            ; reserved3
    ___TEXTcstringstart:
        db '__cstring',0,0,0,0,0,0,0    ; section name (pad to 16 bytes)
        db '__TEXT',0,0,0,0,0,0,0,0,0,0 ; segment name (pad to 16 bytes)
        dq 0x100000000 + ___strsstart   ; address
        dq ___strsend - ___strsstart    ; size
        dd ___strsstart ; offset
        dd 0            ; alignment as power of 2 (1)
        dd 0            ; relocations data offset
        dd 0            ; number of relocations
        dd 0x00000002   ; S_CSTRING_LITERALS
        dd 0            ; reserved1
        dd 6            ; reserved2
        dd 0            ; reserved3
    ___TEXTunwindinfostart:
        db '__unwind_info',0,0,0    ; section name (pad to 16 bytes)
        db '__TEXT',0,0,0,0,0,0,0,0,0,0 ; segment name (pad to 16 bytes)
        dq 0x100000000 + ___uwstart ; address
        dq ___uwend - ___uwstart    ; size
        dd ___uwstart   ; offset
        dd 0            ; alignment as power of 2 (1)
        dd 0            ; relocations data offset
        dd 0            ; number of relocations
        dd 0x00000000   ; no flags
        dd 0            ; reserved1
        dd 0            ; reserved2
        dd 0            ; reserved3
    ___TEXTehframestart:
        db '__eh_frame',0,0,0,0,0,0 ; section name (pad to 16 bytes)
        db '__TEXT',0,0,0,0,0,0,0,0,0,0 ; segment name (pad to 16 bytes)
        dq 0x100000000 + ___ehstart ; address
        dq ___ehend - ___ehstart    ; size
        dd ___ehstart   ; offset
        dd 3            ; alignment as power of 2 (8)
        dd 0            ; relocations data offset
        dd 0            ; number of relocations
        dd 0x00000000   ; no flags
        dd 0            ; reserved1
        dd 0            ; reserved2
        dd 0            ; reserved3
        align 8, db 0   ; pad with zero to 8-byte boundary
    ___TEXTend:

So, this is the __TEXT segment, which covers all the executable code and a good bit of other data. It contains six sections. Each section is aligned according to its section header, and all the sections are packed together at the end of the segment, so the space between the end of the load commands and the first section is zero-filled. Because __TEXT starts at file offset 0, it also includes the Mach header and all the load commands; as we’ll see later, the symbol table even has its own entry for __mh_execute_header. Here are the sections:

__text - The actual code of the executable, where all the functions are. In this case, just one function - main(). It’s marked as S_REGULAR, which means “it’s a plain old section”, and flagged as containing both “some instructions” (at least some executable code) and “pure instructions” (only executable code).
__stubs - The jump table which redirects into the lazy and non-lazy symbol sections. See my previous article for an explanation of the contents of this section. It’s marked as S_SYMBOL_STUBS, the meaning of which is fairly obvious.
__stub_helper - The helper function for lazy dynamically bound symbols.
__cstring - A section containing the read-only C string literals used within the code.
__unwind_info - The compact unwind information for the executable’s code. Generated for exception handling on OS X.
__eh_frame - The DWARF2 unwind information for the executable’s code. Generated for exception handling and debugging.
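Each of those six section headers is a fixed-size section_64 that follows the segment_command_64, so the command size nasm computes from ___TEXTend - ___TEXTstart should come out to 72 + 6 × 80 = 552 bytes. The sketch below checks those sizes with locally re-declared mirrors of the <mach-o/loader.h> structures (the _mirror names and helpers are mine). It also adds a comparison helper for the 16-byte, NUL-padded name fields that the db lines build by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirrors of segment_command_64 and section_64 from <mach-o/loader.h>. */
struct segment_command_64_mirror {
    uint32_t cmd, cmdsize;
    char     segname[16];
    uint64_t vmaddr, vmsize, fileoff, filesize;
    int32_t  maxprot, initprot;      /* vm_prot_t */
    uint32_t nsects, flags;
};
struct section_64_mirror {
    char     sectname[16], segname[16];
    uint64_t addr, size;
    uint32_t offset, align, reloff, nreloc, flags;
    uint32_t reserved1, reserved2, reserved3;
};
static_assert(sizeof(struct segment_command_64_mirror) == 72, "segment_command_64 is 72 bytes");
static_assert(sizeof(struct section_64_mirror) == 80, "section_64 is 80 bytes");

/* cmdsize of an LC_SEGMENT_64 command carrying nsects section headers. */
static uint32_t segment_cmdsize(uint32_t nsects) {
    return (uint32_t)(sizeof(struct segment_command_64_mirror) +
                      nsects * sizeof(struct section_64_mirror));
}

/* Compares a 16-byte name field with a C string. A 16-character name fills
   the field completely and has no terminating NUL, so strcmp can't be used. */
static int name16_equals(const char field[16], const char *name) {
    size_t n = strlen(name);
    if (n > 16 || memcmp(field, name, n) != 0) return 0;
    for (size_t i = n; i < 16; i++)
        if (field[i] != '\0') return 0;
    return 1;
}
```

By the same arithmetic, __PAGEZERO and __LINKEDIT (no sections) are 72 bytes and __DATA (two sections) is 232 bytes, all already multiples of 8.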

Next comes the __DATA segment:

    ___DATAstart:
        dd 0x19         ; LC_SEGMENT_64
        dd ___DATAend - ___DATAstart    ; command size
        db '__DATA',0,0,0,0,0,0,0,0,0,0 ; segment name (pad to 16 bytes)
        dq 0x100001000  ; VM address
        dq 0x1000       ; VM size
        dq 0x1000       ; file offset
        dq 0x1000       ; file size
        dd 0x7          ; VM_PROT_READ | VM_PROT_WRITE | VM_PROT_EXECUTE (maximum protection)
        dd 0x3          ; VM_PROT_READ | VM_PROT_WRITE (initial protection)
        dd 2            ; number of sections
        dd 0x0          ; flags
    ___DATAnlsymptrstart:
        db '__nl_symbol_ptr',0  ; section name (pad to 16 bytes)
        db '__DATA',0,0,0,0,0,0,0,0,0,0 ; segment name (pad to 16 bytes)
        dq 0x100001000 + ___nlsymptrstart - ___DATAload ; address
        dq ___nlsymptrend - ___nlsymptrstart    ; size
        dd ___nlsymptrstart ; offset
        dd 3            ; alignment as power of 2 (8)
        dd 0            ; relocations data offset
        dd 0            ; number of relocations
        dd 0x00000006   ; S_NON_LAZY_SYMBOL_POINTERS
        dd 2            ; reserved1 (index into indirect symbol table)
        dd 0            ; reserved2
        dd 0            ; reserved3
    ___DATAlasymptrstart:
        db '__la_symbol_ptr',0  ; section name (pad to 16 bytes)
        db '__DATA',0,0,0,0,0,0,0,0,0,0 ; segment name (pad to 16 bytes)
        dq 0x100001000 + ___lasymptrstart - ___DATAload ; address
        dq ___lasymptrend - ___lasymptrstart    ; size
        dd ___lasymptrstart ; offset
        dd 3            ; alignment as power of 2 (8)
        dd 0            ; relocations data offset
        dd 0            ; number of relocations
        dd 0x00000007   ; S_LAZY_SYMBOL_POINTERS
        dd 4            ; reserved1 (index into indirect symbol table)
        dd 0            ; reserved2
        dd 0            ; reserved3
        align 8, db 0   ; pad with zero to 8-byte boundary
    ___DATAend:

There are only two sections here, since this program doesn’t have any global or static data: the non-lazy and lazy symbol pointers.
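The reserved1 fields in these section headers, like the one in __stubs, are what tie each stub or pointer slot to a symbol: entry i of such a section uses indirect symbol table slot reserved1 + i. A sketch, with the listing’s six indirect-table entries written out as names for readability. That string array is a hypothetical representation; the real table holds symbol-table indices or INDIRECT_SYMBOL_ABS. The entry size is reserved2 (6) for __stubs and 8 bytes for the 64-bit pointer sections.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The six indirect-table entries from the listing, spelled as names. */
static const char *const indirect_table[] = {
    "_printf", "_time",             /* __stubs:         reserved1 = 0 */
    "dyld_stub_binder", "(ABS)",    /* __nl_symbol_ptr: reserved1 = 2 */
    "_printf", "_time",             /* __la_symbol_ptr: reserved1 = 4 */
};

/* Indirect-table slot used by the entry at offset_in_section. */
static uint32_t indirect_slot(uint32_t reserved1, uint32_t offset_in_section,
                              uint32_t entry_size) {
    assert(entry_size > 0 && offset_in_section % entry_size == 0);
    return reserved1 + offset_in_section / entry_size;
}

/* Name of the symbol behind the entry at `offset` in a given section. */
static const char *symbol_for(uint32_t reserved1, uint32_t offset,
                              uint32_t entry_size) {
    uint32_t slot = indirect_slot(reserved1, offset, entry_size);
    assert(slot < sizeof indirect_table / sizeof indirect_table[0]);
    return indirect_table[slot];
}
```

For example, the second 6-byte stub (offset 6 in __stubs) resolves to _time, and the first non-lazy pointer resolves to dyld_stub_binder.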

And then the last segment, __LINKEDIT:

    ___LINKEDITstart:
        dd 0x19         ; LC_SEGMENT_64
        dd ___LINKEDITend - ___LINKEDITstart    ; command size
        db '__LINKEDIT',0,0,0,0,0,0 ; segment name (pad to 16 bytes)
        dq 0x100002000  ; VM address
        dq 0x1000       ; VM size
        dq 0x2000       ; file offset
        dq ___LINKEDITdataend - ___LINKEDITdatastart    ; file size
        dd 0x7          ; VM_PROT_READ | VM_PROT_WRITE | VM_PROT_EXECUTE (maximum protection)
        dd 0x1          ; VM_PROT_READ (initial protection)
        dd 0            ; number of sections
        dd 0x0          ; flags
        align 8, db 0   ; pad with zero to 8-byte boundary
    ___LINKEDITend:

The __LINKEDIT segment contains a variety of data used by dyld, such as the symbol table, the indirect symbol table, the rebase opcodes, the binding opcodes, the exports table, the function starts information, the data-in-code table, and some codesigning data.

Lots and Lots of Linker Data
The next several load commands deal with static and dynamic linking information:

    ___dyldinfostart:
        dd 0x80000022   ; LC_DYLD_INFO_ONLY (LC_DYLD_INFO | LC_REQ_DYLD)
        dd ___dyldinfoend - ___dyldinfostart    ; command size
        dd ___rebasestart   ; rebase info offset
        dd ___rebaseend - ___rebasestart    ; rebase info size
        dd ___bindstart ; binding info offset
        dd ___bindend - ___bindstart    ; binding info size
        dd 0            ; weak binding info offset
        dd 0            ; weak binding info size
        dd ___lazystart ; lazy binding info offset
        dd ___lazyend - ___lazystart    ; lazy binding info size
        dd ___exportstart   ; export info offset
        dd ___exportend - ___exportstart    ; export info size
        align 8, db 0   ; pad with zero to 8-byte boundary
    ___dyldinfoend:
    ___symtabinfostart:
        dd 0x2          ; LC_SYMTAB
        dd ___symtabinfoend - ___symtabinfostart    ; command size
        dd ___symtabstart   ; symbol table offset
        dd (___symtabend - ___symtabstart) >> 4 ; number of symbols
        dd ___strtabstart   ; string table offset
        dd ___strtabend - ___strtabstart    ; string table size
        align 8, db 0   ; pad with zero to 8-byte boundary
    ___symtabinfoend:
    ___dysymtabinfostart:
        dd 0xb          ; LC_DYSYMTAB
        dd ___dysymtabinfoend - ___dysymtabinfostart    ; command size
        dd 0            ; local symbols index
        dd 7            ; number of local symbols (the seven STABS entries)
        dd 7            ; external symbols index
        dd 2            ; number of external symbols
        dd 9            ; undefined symbols index
        dd 3            ; number of undefined symbols
        dd 0            ; table of contents offset
        dd 0            ; table of contents entries
        dd 0            ; module table offset
        dd 0            ; module table entries
        dd 0            ; external references table offset
        dd 0            ; external references table entries
        dd ___indirsymstart ; indirect symbol table offset
        dd (___indirsymend - ___indirsymstart) >> 2 ; indirect symbol table entries
        dd 0            ; external relocation table offset
        dd 0            ; external relocation table entries
        dd 0            ; local relocation table offset
        dd 0            ; local relocation table entries
        align 8, db 0   ; pad with zero to 8-byte boundary
    ___dysymtabinfoend:
    ___loaddylinkerstart:
        dd 0xe          ; LC_LOAD_DYLINKER
        dd ___loaddylinkerend - ___loaddylinkerstart    ; command size
        dd ___loaddylinkername - ___loaddylinkerstart   ; offset to name
    ___loaddylinkername:
        db '/usr/lib/dyld',0    ; name
        align 8, db 0   ; pad with zero to 8-byte boundary
    ___loaddylinkerend:
    ___maincmdstart:
        dd 0x80000028   ; LC_MAIN | LC_REQ_DYLD
        dd ___maincmdend - ___maincmdstart  ; command size
        dq _main        ; offset of main from start of __TEXT
        dq 0            ; stack size
        align 8, db 0   ; pad with zero to 8-byte boundary
    ___maincmdend:
    ___loadlibsystemstart:
        dd 0xc          ; LC_LOAD_DYLIB
        dd ___loadlibsystemend - ___loadlibsystemstart  ; command size
        dd ___loadlibsystemname - ___loadlibsystemstart ; offset to path
        dd 2            ; timestamp: 2 seconds after the UNIX epoch (1970-01-01 00:00:02 UTC)
        dd 0x00a90300   ; current version (169.3.0)
        dd 0x00010000   ; compatibility version (1.0.0)
    ___loadlibsystemname:
        db '/usr/lib/libSystem.B.dylib',0   ; path (NUL-terminated)
        align 8, db 0   ; pad with zero to 8-byte boundary
    ___loadlibsystemend:
    ___fstartscmdstart:
        dd 0x26         ; LC_FUNCTION_STARTS
        dd ___fstartscmdend - ___fstartscmdstart    ; command size
        dd ___functionstartsstart   ; offset to function starts data (fun label name, isn't it?)
        dd ___functionstartsend - ___functionstartsstart    ; size of function starts data (even more fun name!)
        align 8, db 0   ; pad with zero to 8-byte boundary
    ___fstartscmdend:
    ___datacodecmdstart:
        dd 0x29         ; LC_DATA_IN_CODE
        dd ___datacodecmdend - ___datacodecmdstart  ; command size
        dd ___datacodestart ; offset to data-in-code information
        dd ___datacodeend - ___datacodestart ; size of data-in-code information
        align 8, db 0   ; pad with zero to 8-byte boundary
    ___datacodecmdend:
    ___dycodesigncmdstart:
        dd 0x2b         ; LC_DYLIB_CODE_SIGN_DRS
        dd ___dycodesigncmdend - ___dycodesigncmdstart  ; command size
        dd ___dylibcodesignaturesstart  ; offset to code signatures from dylibs
        dd ___dylibcodesignaturesend - ___dylibcodesignaturesstart  ; you get the idea, right?
        align 8, db 0   ; pad with zero to 8-byte boundary
    ___dycodesigncmdend:

To summarize, this long blather of data consists of:

The dynamic linking info for the binary (LC_DYLD_INFO_ONLY): where to find the rebase, binding, weak binding, lazy binding, and export data. This command, along with some others, is marked with LC_REQ_DYLD, meaning that if the version of dyld loading the binary doesn’t understand the command, it must give up right then rather than continue without the information.
The location of the symbol and strings tables. These are given as offsets from the beginning of the file, but it is understood that the data is contained within the __LINKEDIT segment. At runtime, dyld will perform the calculation symtable_base_address = linkedit_base_address + (symtab_offset - linkedit_offset) to get the actual location in memory of the symbol table. This is repeated similarly for the strings table, as well as the offsets given in the LC_DYLD_INFO and LC_DYSYMTAB commands.
A set of dynamic symbol data for the binary, giving the offsets and counts within the symbol table for various types of symbols.
The LC_LOAD_DYLINKER command, which gives the hardcoded path of the dynamic linker used to load the executable. This command is read by the kernel rather than by dyld: when the process is spawned, the kernel loads the specified dynamic linker and starts it running. Don’t get the idea that you can use this to subvert the loading process, however; the kernel won’t let you pick just any dynamic linker.
LC_MAIN, a replacement for the older LC_UNIXTHREAD command. It used to be that executables were initialized with a thread state specified within the binary itself, but recently, someone realized this was a waste of time and space with dyld running early and the state being exactly the same in practically every executable. Instead, LC_MAIN gives the address of the entry point (main()) and dyld jumps right to that instead, also replacing the old crt1.o object which contained glue code to set up main().
LC_LOAD_DYLIB is the “I link to this dynamic library for some of my undefined symbols” command. This binary only links to libSystem.B.dylib, the OS X equivalent of libc.
LC_FUNCTION_STARTS is a table of data in the __LINKEDIT segment which gives the address of every function entry point in the executable. Among other things, this allows for functions to exist that have no entries in the symbol table.
LC_DATA_IN_CODE is similarly a table giving the locations of data bytes which are embedded within executable code. This is useful for any number of purposes, not the least of which is accurate disassembly.
LC_DYLIB_CODE_SIGN_DRS, finally, gives a list of designated requirements for each dynamic library linked with the executable. This allows the code signing machinery to determine the suitability of the executable without having to load every dynamic library it links to.
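The offset-to-address conversion described in the second item is simple enough to write down. A sketch, where linkedit_addr is a hypothetical helper. The slide parameter is my addition to account for ASLR: this executable is marked MH_PIE, so dyld may load it somewhere other than its preferred address. Pass 0 for the unslid address. With the __LINKEDIT values from the segment command above (VM address 0x100002000, file offset 0x2000), data at file offset 0x2040 would be mapped at 0x100002040 plus the slide.

```c
#include <assert.h>
#include <stdint.h>

/* Maps a file offset inside __LINKEDIT (symoff, stroff, the LC_DYLD_INFO
   offsets, ...) to its runtime address:
   slide + linkedit_vmaddr + (file_offset - linkedit_fileoff). */
static uint64_t linkedit_addr(uint64_t linkedit_vmaddr, uint64_t linkedit_fileoff,
                              uint64_t linkedit_filesize, uint64_t file_offset,
                              uint64_t slide) {
    /* The data must actually lie inside the __LINKEDIT file range. */
    assert(file_offset >= linkedit_fileoff &&
           file_offset - linkedit_fileoff < linkedit_filesize);
    return slide + linkedit_vmaddr + (file_offset - linkedit_fileoff);
}
```

The file size argument is only used for the bounds check; in the listing it is whatever ___LINKEDITdataend - ___LINKEDITdatastart assembles to.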

A Few More!
Just when you thought we were done, there are three more load commands we haven’t covered yet:

    ___uuidstart:
        dd 0x1b         ; LC_UUID
        dd ___uuidend - ___uuidstart    ; command size
        db 0xd3,0xec,0x58,0x28,0x02,0x26,0x36,0x29,0xab,0xc3,0x7d,0x6d,0xc9,0xf9,0x2d,0xda  ; D3EC5828-0226-3629-ABC3-7D6DC9F92DDA
        align 8, db 0   ; pad with zero to 8-byte boundary
    ___uuidend:
    ___osverstart:
        dd 0x24         ; LC_VERSION_MIN_MACOSX
        dd ___osverend - ___osverstart  ; command size
        dd 0x000a0800   ; OS min version: 10.8
        dd 0x000a0800   ; Build SDK version: 10.8
        align 8, db 0   ; pad with zero to 8-byte boundary
    ___osverend:
    ___sourceverstart:
        dd 0x2a         ; LC_SOURCE_VERSION
        dd ___sourceverend - ___sourceverstart  ; command size
        dq 0            ; Source version: 0.0.0.0.0
        align 8, db 0   ; pad with zero to 8-byte boundary
    ___sourceverend:
    ___loadcmdsend:

These are the binary’s UUID, the version of OS X it’s meant for, the version of the SDK it was linked against, and the “source version”. I can’t find any clue what the “source version” actually is, and it’s just a bunch of zeroes in the binaries I’ve looked at, so your guess is as good as mine.
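The version fields in these commands (and in LC_LOAD_DYLIB earlier) are packed integers. As I read the comments in <mach-o/loader.h>, the 32-bit ones use 16 bits for the major version and 8 bits each for minor and patch, while LC_SOURCE_VERSION packs A.B.C.D.E into 64 bits as a24.b10.c10.d10.e10. A sketch of both encodings follows; the function names are hypothetical. By this reading, 0x000a0800 is 10.8.0 and libSystem’s current version 0x00a90300 is 169.3.0.

```c
#include <assert.h>
#include <stdint.h>

/* X.Y.Z packed as 16.8.8 bits (LC_VERSION_MIN_MACOSX, LC_LOAD_DYLIB). */
struct xyz { unsigned x, y, z; };

static struct xyz unpack_xyz(uint32_t v) {
    struct xyz r = { v >> 16, (v >> 8) & 0xffu, v & 0xffu };
    return r;
}

/* A.B.C.D.E packed as a24.b10.c10.d10.e10 (LC_SOURCE_VERSION). */
static uint64_t pack_source_version(unsigned a, unsigned b, unsigned c,
                                    unsigned d, unsigned e) {
    assert(a < (1u << 24) && b < 1024 && c < 1024 && d < 1024 && e < 1024);
    return (uint64_t)a << 40 | (uint64_t)b << 30 | (uint64_t)c << 20 |
           (uint64_t)d << 10 | e;
}

static void unpack_source_version(uint64_t v, unsigned out[5]) {
    out[0] = (unsigned)(v >> 40);
    out[1] = (unsigned)((v >> 30) & 0x3ff);
    out[2] = (unsigned)((v >> 20) & 0x3ff);
    out[3] = (unsigned)((v >> 10) & 0x3ff);
    out[4] = (unsigned)(v & 0x3ff);
}
```

An all-zero LC_SOURCE_VERSION, as in this binary, simply decodes to 0.0.0.0.0.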

Finally, Something Else!
The first thing we do now is pad out the file to the start of main():

    ___TEXTload:
        times (0xf14-($-$$)) db 0   ; pad the __TEXT segment

You might ask why I hardcoded the start address there instead of writing _main-($-$$). It certainly looks fragile. Well, it is. The problem is that nasm doesn’t provide a simple means to align data to the “end” of a segment, especially since we’re not using its built-in sectioning support. It doesn’t know where _main is until the padding has been added! In this case, I just hardcode the file offset where main() starts (the __TEXT,__text section’s offset field; its addr field is 0x100000f14) and let it stand as a hack, rather than trying to figure out an elegant-but-complicated solution.
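For the curious, the “elegant-but-complicated” alternative amounts to laying the sections out backward from the end of the page. The following is a sketch of that idea in C, not necessarily what ld64 does. It walks the sections from last to first, subtracts each size, and rounds down to the section’s alignment. The offset of the first section then plays the role of the hardcoded 0xf14. layout_backward is a hypothetical helper, and you would feed it the real section sizes and alignment exponents from the listing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Places n sections so that the last one ends at or just before end_offset.
   align_pow2[i] is the section's alignment exponent (as in the section
   header), and starts[i] receives each section's file offset. Returns the
   offset of the first section. */
static uint64_t layout_backward(uint64_t end_offset, const uint64_t *sizes,
                                const uint32_t *align_pow2, size_t n,
                                uint64_t *starts) {
    uint64_t cursor = end_offset;
    for (size_t i = n; i-- > 0;) {           /* last section first */
        uint64_t a = (uint64_t)1 << align_pow2[i];
        assert(sizes[i] <= cursor);
        cursor = (cursor - sizes[i]) & ~(a - 1);   /* round down to alignment */
        starts[i] = cursor;
    }
    return cursor;
}
```

Rounding down can leave a small gap between a section and the one after it. That gap is harmless, but it means a forward layout starting at the returned offset would not necessarily reproduce the same positions.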

Now we lay out the data in order. We don’t strictly have to use any particular order, since the labels we used in the load commands will relocate everything according to where we place it in the file, but there’s no reason not to. The first thing is __TEXT,__text, the executable code. Notice that we have to rewrite the original assembly code in nasm’s syntax: nasm uses Intel syntax rather than the GNU (AT&T) syntax. The major differences are that the operand order is reversed (destination first) and register names carry no % prefix. All the various directives are also stripped out, since we’re doing their jobs by hand.

    ___codestart:
    _main:
        push    rbp
        mov     rbp, rsp
        xor     edi, edi
        call    _time
        lea     rdi, [rel L_str]
        mov     rsi, rax
        xor     al, al
        call    _printf
        xor     eax, eax
        pop     rbp
        ret
    ___codeend:

We also don’t have any size suffixes on the instructions, since nasm can infer them from the operands. The rel qualifier for the string load just tells nasm to generate a rip-relative access instead of an absolute position, which is necessary since we marked the executable as position-independent.
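The displacement behind a rip-relative operand is measured from the address of the next instruction, not the current one. A sketch of that arithmetic follows. It also includes an encoder for the 6-byte `jmp [rel ...]` form (opcode FF 25 followed by a little-endian disp32) that the stubs below use, which matches the 6 in the __stubs reserved2 field. The helper names and the sample addresses are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* disp32 for a rip-relative operand: target minus the end of the instruction. */
static int32_t rip_disp32(uint64_t insn_addr, uint32_t insn_len, uint64_t target) {
    int64_t d = (int64_t)(target - (insn_addr + insn_len));
    assert(d >= INT32_MIN && d <= INT32_MAX);   /* must fit in 32 bits */
    return (int32_t)d;
}

/* Encodes jmp [rip+disp32] from stub_addr through the pointer at lazy_ptr_addr. */
static void encode_stub_jmp(uint64_t stub_addr, uint64_t lazy_ptr_addr,
                            uint8_t out[6]) {
    uint32_t d = (uint32_t)rip_disp32(stub_addr, 6, lazy_ptr_addr);
    out[0] = 0xff;                             /* JMP r/m64 (opcode FF /4) */
    out[1] = 0x25;                             /* ModRM 00 100 101: [rip+disp32] */
    for (int i = 0; i < 4; i++)
        out[2 + i] = (uint8_t)(d >> (8 * i));  /* little-endian disp32 */
}
```

Because the displacement is relative, the same bytes work no matter where dyld maps the image, which is exactly what MH_PIE promises.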

Next we have the symbol stubs for time() and printf(), and the stub helper:

    ___stubstart:
    _printf:
        jmp     [rel _lazy_printf]
    _time:
        jmp     [rel _lazy_time]
    ___stubend:

    ___stubhelpstart:
    _stub_helper:
        lea     r11, [rel _nonlazy_dyld_stub_binder]
        push    r11
        jmp     [rel _nonlazy_dyld_stub_binder]
        nop
        push    strict qword (_lazy_printf - ___lasymptrstart)
        jmp     _stub_helper
        push    strict qword (_lazy_time - ___lasymptrstart)
        jmp     _stub_helper
    ___stubhelpend:

The stubs themselves jump to the lazy symbol bindings in the __DATA segment. These initially jump right back into the bottom of _stub_helper, which loads the offset into the lazy symbol section of the symbol and calls into dyld itself through a nonlazy symbol (which will be bound by dyld when the executable is loaded). dyld will bind the symbol and rewrite the lazy symbol section so that future calls to that stub go directly to the function. Notice, these are all direct, non-conditional jumps, not subroutine calls. Also notice the use of the strict qword directives to force nasm to emit the full 64-bit values for the stack pushes.
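
The control flow of lazy binding is easier to follow outside assembly. This is a hedged C model, not how dyld is implemented: function pointers stand in for the lazy symbol pointers, ordinary functions stand in for the stub-helper entries, and the hypothetical bind_lazy() plays the role of dyld_stub_binder by resolving the symbol, patching the slot, and forwarding the call. Real dyld identifies the symbol from the value the helper entry pushes, not from a name string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*fn_t)(int);

/* The "library" functions the binder will eventually find. */
static int twice(int x)  { return 2 * x; }
static int square(int x) { return x * x; }

static int helper_twice(int x);
static int helper_square(int x);

/* Like the lazy symbol pointers: each slot starts out at its helper entry. */
static fn_t lazy_ptrs[2] = { helper_twice, helper_square };
static int bind_count;  /* how many times the slow path has run */

/* Stand-in for dyld_stub_binder: look the symbol up, patch the slot,
 * and return the real target so the caller can forward to it. */
static fn_t bind_lazy(size_t slot, const char *name)
{
    assert(slot < 2);
    fn_t target = strcmp(name, "twice") == 0 ? twice : square;
    lazy_ptrs[slot] = target;  /* later calls skip the helper entirely */
    bind_count++;
    return target;
}

/* Like the per-symbol stub-helper entries: identify the symbol, enter the binder. */
static int helper_twice(int x)  { return bind_lazy(0, "twice")(x); }
static int helper_square(int x) { return bind_lazy(1, "square")(x); }

/* Like the stubs: always go through the slot. */
int stub_twice(int x)  { return lazy_ptrs[0](x); }
int stub_square(int x) { return lazy_ptrs[1](x); }
```

The first stub_twice(5) takes the slow path and returns 10; a second call goes straight to twice() because its slot was overwritten, which is exactly what the rewritten lazy pointer buys the real executable.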

Next comes the C strings section, very short and simple since we only have one string:

    ___strsstart:
    L_str:
        db      "Hello, world #%ld!",10,0   ; 10 = newline: nasm only expands \n inside backquotes
    ___strsend:

And now the unwinding table. This is encoded with the “compact unwind encoding” defined by Apple (as far as I know).

    ___uwstart:
        dd 1            ; unwind info version
        dd _commonEncodings - ___uwstart    ; common encodings array offset
        dd 0            ; count of common encodings
        dd _personalities - ___uwstart  ; personality array offset
        dd 0            ; count of personalities
        dd _index - ___uwstart  ; first-level index offset
        dd 2            ; count of entries in first-level index
    _commonEncodings:
    _personalities:
    _index:
    __entry1_0:
        dd _main        ; function offset
        dd __entry2_0 - ___uwstart  ; offset to second-level entry
        dd _lsda - ___uwstart   ; offset to language-specific data array entry
    __entry1_1:
        dd ___codeend+1 ; function offset (end of table)
        dd 0            ; offset to second-level entry - zero means end of table
        dd _lsda - ___uwstart   ; offset to LSDA
    _lsda:
    _pages:
    __entry2_0:
        dd 3            ; UNWIND_SECOND_LEVEL_COMPRESSED
        dw ___entrypage0 - __entry2_0   ; offset to entry page
        dw 1            ; number of entries in entry page
        dw ___enc0 - __entry2_0 ; offset to encoding page
        dw 1            ; number of entries in encoding page
    ___entrypage0:
    ____entrypage0_0:
        dd (0 << 24) | (0)  ; encoding index and function offset relative to first-level index offset
    ___enc0:
    ____enc0_0:
        dd 0x01000000   ; UNWIND_X86_64_MODE_RBP_FRAME | UNWIND_X86_64_REG_NONE
    ___uwend:
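
The two hand-assembled words at the end of that table each pack several fields. As a hedged sketch based on the x86-64 constants in Apple's <mach-o/compact_unwind_encoding.h> (redeclared here so the code builds anywhere; check them against your SDK), this C code pulls them apart. The helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* x86-64 compact unwind constants, as in <mach-o/compact_unwind_encoding.h>. */
#define UNWIND_X86_64_MODE_MASK            0x0F000000u
#define UNWIND_X86_64_MODE_RBP_FRAME       0x01000000u
#define UNWIND_X86_64_RBP_FRAME_REGISTERS  0x00007FFFu

/* A compressed second-level entry: the high 8 bits select an encoding,
 * the low 24 bits are the function's offset from the first-level
 * entry's function offset. */
uint32_t entry_encoding_index(uint32_t entry) { return (entry >> 24) & 0xFFu; }
uint32_t entry_func_offset(uint32_t entry)    { return entry & 0x00FFFFFFu; }

/* Nonzero if the encoding describes a standard push rbp / mov rbp, rsp frame. */
int is_rbp_frame(uint32_t encoding)
{
    return (encoding & UNWIND_X86_64_MODE_MASK) == UNWIND_X86_64_MODE_RBP_FRAME;
}

/* RBP-frame encodings record up to five callee-saved registers, 3 bits
 * each; a 0 field means "no register" (UNWIND_X86_64_REG_NONE). */
int rbp_saved_reg_count(uint32_t encoding)
{
    assert(is_rbp_frame(encoding));
    uint32_t regs = encoding & UNWIND_X86_64_RBP_FRAME_REGISTERS;
    int n = 0;
    for (int i = 0; i < 5; i++)
        if ((regs >> (3 * i)) & 7u)
            n++;
    return n;
}
```

With the article's values, the entry (0 << 24) | 0 gives encoding index 0 and offset 0, so the function starts exactly at _main, and 0x01000000 is an RBP frame with no saved registers, which matches main() only pushing rbp.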

And then the DWARF-encoded version of the same information. To save everyone some time, I’m not going to write this part out with all the comments, because it’s complex and it just duplicates the unwinding info above in a much more verbose fashion.

    ___ehstart:
        db 0x14,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x01,0x7a,0x52,0x00,0x01,0x78,0x10,0x01
        db 0x10,0x0c,0x07,0x08,0x90,0x01,0x00,0x00,0x24,0x00,0x00,0x00,0x1c,0x00,0x00,0x00
        db 0x34,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0x20,0x00,0x00,0x00,0x00,0x00,0x00,0x00
        db 0x00,0x41,0x0e,0x10,0x86,0x02,0x43,0x0d,0x06,0x00,0x00,0x00,0x00,0x00,0x00,0x00
    ___ehend:
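
For anyone who wants to check those bytes, here is a hedged C sketch that decodes the fixed header of the first record, the CIE (Common Information Entry), following the .eh_frame layout: a 4-byte length, a 4-byte CIE id of 0, a version byte, a NUL-terminated augmentation string, then LEB128-encoded alignment factors and the return-address register. The struct and function names are my own, and it ignores 64-bit DWARF and everything after the return-address register.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct cie_header {
    uint32_t length;           /* bytes following the length field */
    uint8_t version;
    const char *augmentation;  /* points into the input buffer */
    uint64_t code_align;
    int64_t data_align;
    uint64_t return_reg;       /* DWARF register number of the return address */
};

static uint32_t read_u32le(const uint8_t **p)
{
    const uint8_t *b = *p;
    *p += 4;
    return (uint32_t)b[0] | (uint32_t)b[1] << 8 | (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}

static uint64_t read_uleb(const uint8_t **p)
{
    uint64_t v = 0;
    unsigned shift = 0;
    uint8_t b;
    do {
        b = *(*p)++;
        v |= (uint64_t)(b & 0x7f) << shift;
        shift += 7;
    } while ((b & 0x80) && shift < 64);
    return v;
}

static int64_t read_sleb(const uint8_t **p)
{
    uint64_t v = 0;
    unsigned shift = 0;
    uint8_t b;
    do {
        b = *(*p)++;
        v |= (uint64_t)(b & 0x7f) << shift;
        shift += 7;
    } while ((b & 0x80) && shift < 64);
    if (shift < 64 && (b & 0x40))
        v |= ~(uint64_t)0 << shift;  /* sign-extend from the last byte */
    return (int64_t)v;
}

struct cie_header parse_cie(const uint8_t *buf)
{
    struct cie_header c;
    const uint8_t *p = buf;
    c.length = read_u32le(&p);
    uint32_t id = read_u32le(&p);
    assert(c.length != 0xffffffffu && id == 0);  /* 32-bit DWARF; id 0 marks a CIE */
    c.version = *p++;
    c.augmentation = (const char *)p;
    p += strlen(c.augmentation) + 1;
    c.code_align = read_uleb(&p);
    c.data_align = read_sleb(&p);
    /* Version 1 stores the return-address register as a single byte. */
    c.return_reg = (c.version == 1) ? *p++ : read_uleb(&p);
    return c;
}
```

Run over the first 24 bytes of ___ehstart, it reports length 0x14, version 1, augmentation "zR", code alignment 1, data alignment -8, and return-address register 16, the DWARF number x86-64 uses for the return-address column.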

Data, data, data… well, sort of
That wraps up the __TEXT segment. Now we have the __DATA segment, which contains the lazy and non-lazy symbol pointers:

    ___DATAload:

    ___nlsymptrstart:
    _nonlazy_dyld_stub_binder:
        dq 0x0000000000000000
    _nonlazy_table_start:
        dq 0x0000000000000000
    ___nlsymptrend:

    ___lasymptrstart:
    _lazy_printf:
        dq 0x100000000 + _stub_helper_printf
    _lazy_time:
        dq 0x100000000 + _stub_helper_time
    ___lasymptrend:

In a real executable, __DATA would usually also contain static data, space for globals, and some other stuff.

The link editor
__LINKEDIT is a real pain, because it’s arbitrarily structured and the data within it isn’t always well documented. I’ve done my best to represent what’s in it comprehensibly, but I can’t guarantee I’ve succeeded.

We start with the rebasing opcodes, which dyld uses when applying ASLR:

    ___rebasestart:
        db 0x10 | 0x01  ; REBASE_OPCODE_SET_TYPE_IMM | REBASE_TYPE_POINTER
        db 0x20 | 0x02  ; REBASE_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB | indexOfSegment(__DATA) (2)
        db 0x10         ; uleb128_encode(_lazy_printf - ___DATAload)
        db 0x50 | 0x02  ; REBASE_OPCODE_DO_REBASE_IMM_TIMES | 2
        align 8, db 0   ; pad with 0 to 8-byte boundary
    ___rebaseend:

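This says, “treating each location as a pointer, start at offset 0x10 in the __DATA segment and rebase 2 consecutive pointers by however far the image slid from its preferred load address”. Those two pointers are the lazy symbol pointers, which hold absolute addresses of stub-helper entries and therefore have to move with the image.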
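
To make the opcode stream concrete, here is a hedged C sketch of a tiny rebase interpreter. The opcode values are the ones from <mach-o/loader.h>, redeclared so the code is self-contained; apply_rebase and the little-endian load/store helpers are hypothetical names, and only the four opcodes used above are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values from <mach-o/loader.h>. */
#define REBASE_OPCODE_MASK                         0xF0
#define REBASE_IMMEDIATE_MASK                      0x0F
#define REBASE_OPCODE_DONE                         0x00
#define REBASE_OPCODE_SET_TYPE_IMM                 0x10
#define REBASE_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB  0x20
#define REBASE_OPCODE_DO_REBASE_IMM_TIMES          0x50
#define REBASE_TYPE_POINTER                        1

uint64_t load64le(const uint8_t *b)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | b[i];
    return v;
}

void store64le(uint8_t *b, uint64_t v)
{
    for (int i = 0; i < 8; i++, v >>= 8)
        b[i] = (uint8_t)v;
}

static uint64_t uleb(const uint8_t **p)
{
    uint64_t v = 0;
    unsigned shift = 0;
    uint8_t b;
    do {
        b = *(*p)++;
        v |= (uint64_t)(b & 0x7f) << shift;
        shift += 7;
    } while ((b & 0x80) && shift < 64);
    return v;
}

/* Apply a rebase stream to `seg`, the bytes of segment number `seg_index`,
 * adding `slide` to each pointer named by the stream. Returns the number
 * of pointers rebased, or -1 for anything this sketch does not handle. */
int apply_rebase(const uint8_t *ops, size_t len, uint8_t *seg, size_t seg_len,
                 unsigned seg_index, uint64_t slide)
{
    const uint8_t *p = ops, *end = ops + len;
    unsigned cur_seg = ~0u;
    uint64_t offset = 0;
    int count = 0;
    while (p < end) {
        uint8_t op = *p & REBASE_OPCODE_MASK, imm = *p & REBASE_IMMEDIATE_MASK;
        p++;
        switch (op) {
        case REBASE_OPCODE_DONE:
            return count;
        case REBASE_OPCODE_SET_TYPE_IMM:
            if (imm != REBASE_TYPE_POINTER) return -1;
            break;
        case REBASE_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB:
            cur_seg = imm;
            offset = uleb(&p);
            break;
        case REBASE_OPCODE_DO_REBASE_IMM_TIMES:
            if (cur_seg != seg_index) return -1;
            for (unsigned i = 0; i < imm; i++, offset += 8, count++) {
                assert(offset + 8 <= seg_len);
                store64le(seg + offset, load64le(seg + offset) + slide);
            }
            break;
        default:
            return -1;
        }
    }
    return count;
}
```

Run on the bytes above (0x11, 0x22, 0x10, 0x52, then zero padding) against a 32-byte __DATA image, it rebases exactly the two lazy pointers at offsets 0x10 and 0x18 and leaves the non-lazy slots at 0 and 8 alone; those get their values from binding instead.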

Next come the binding opcodes and lazy binding opcodes:

    ___bindstart:
        db 0x11         ; BIND_OPCODE_SET_DYLIB_ORDINAL_IMM | 1
        db 0x40         ; BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM | 0
        db 'dyld_stub_binder',0 ; immediate operand
        db 0x51         ; BIND_OPCODE_SET_TYPE_IMM | BIND_TYPE_POINTER
        db 0x72         ; BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB | indexOfSegment(__DATA) (2)
        db 0x00         ; uleb128_encode(0)
        db 0x90         ; BIND_OPCODE_DO_BIND
        db 0x00         ; BIND_OPCODE_DONE
        align 8, db 0   ; pad with 0 to 8-byte boundary
    ___bindend:
    ___lazystart:
        db 0x72,0x10    ; BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB | indexOfSegment(__DATA) (2), uleb128_encode(0x10)
        db 0x11         ; BIND_OPCODE_SET_DYLIB_ORDINAL_IMM | 1
        db 0x40,'_printf',0 ; BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM | 0, '_printf'
        db 0x90,0x00    ; BIND_OPCODE_DO_BIND, BIND_OPCODE_DONE
        db 0x72,0x18    ; BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB | indexOfSegment(__DATA) (2), uleb128_encode(0x18)
        db 0x11         ; BIND_OPCODE_SET_DYLIB_ORDINAL_IMM | 1
        db 0x40,'_time',0   ; BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM | 0, '_time'
        db 0x90,0x00    ; BIND_OPCODE_DO_BIND, BIND_OPCODE_DONE
        align 8, db 0   ; pad with 0 to 8-byte boundary
    ___lazyend:

These opcodes bind a non-lazy symbol named dyld_stub_binder to offset 0 in the __DATA segment as a pointer. For lazy symbols, they bind a symbol named _printf to offset 0x10 in the __DATA segment and _time to offset 0x18.
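
The same kind of sketch works for the binding streams. This hedged C decoder (hypothetical names again; opcode values from <mach-o/loader.h>; only the opcodes used above) records one binding per BIND_OPCODE_DO_BIND. The lazy flag reflects the difference between the two streams: in the lazy one, BIND_OPCODE_DONE only ends one symbol's entry, because each entry is meant to be run on its own when that symbol is first called.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values from <mach-o/loader.h>. */
#define BIND_OPCODE_MASK                          0xF0
#define BIND_IMMEDIATE_MASK                       0x0F
#define BIND_OPCODE_DONE                          0x00
#define BIND_OPCODE_SET_DYLIB_ORDINAL_IMM         0x10
#define BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM 0x40
#define BIND_OPCODE_SET_TYPE_IMM                  0x50
#define BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB   0x70
#define BIND_OPCODE_DO_BIND                       0x90

struct binding {
    const char *symbol;  /* points into the opcode stream */
    unsigned ordinal;    /* 1 = first LC_LOAD_DYLIB */
    unsigned segment;
    uint64_t offset;     /* offset within that segment */
};

static uint64_t uleb(const uint8_t **p)
{
    uint64_t v = 0;
    unsigned shift = 0;
    uint8_t b;
    do {
        b = *(*p)++;
        v |= (uint64_t)(b & 0x7f) << shift;
        shift += 7;
    } while ((b & 0x80) && shift < 64);
    return v;
}

/* Decode a bind stream into out[0..max); returns the count, or -1. */
int decode_binds(const uint8_t *ops, size_t len, int lazy, struct binding *out, int max)
{
    const uint8_t *p = ops, *end = ops + len;
    struct binding cur = {0};
    int n = 0;
    while (p < end) {
        uint8_t op = *p & BIND_OPCODE_MASK, imm = *p & BIND_IMMEDIATE_MASK;
        p++;
        switch (op) {
        case BIND_OPCODE_DONE:
            if (!lazy) return n;
            break;  /* lazy info: only the end of one symbol's entry */
        case BIND_OPCODE_SET_DYLIB_ORDINAL_IMM:
            cur.ordinal = imm;
            break;
        case BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM:
            assert(memchr(p, 0, (size_t)(end - p)) != NULL);  /* name must be NUL-terminated */
            cur.symbol = (const char *)p;
            p += strlen(cur.symbol) + 1;
            break;
        case BIND_OPCODE_SET_TYPE_IMM:
            break;  /* every binding here is BIND_TYPE_POINTER */
        case BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB:
            cur.segment = imm;
            cur.offset = uleb(&p);
            break;
        case BIND_OPCODE_DO_BIND:
            if (n >= max) return -1;
            out[n++] = cur;
            cur.offset += 8;  /* binding advances by one pointer */
            break;
        default:
            return -1;
        }
    }
    return n;
}
```

Decoding the lazy stream gives _printf at __DATA offset 0x10 and _time at 0x18, both from library ordinal 1, and the non-lazy stream gives dyld_stub_binder at offset 0.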

And here’s the export trie:

    ___exportstart:
    _exnode0:
        db 0x00         ; terminal size
        db 0x01         ; child count
        db '_',0        ; name
        db _exnode1 - ___exportstart    ; child node offset
    _exnode1:
        db 0x00         ; terminal size
        db 0x02         ; child count
        db '_mh_execute_header',0   ; name
        db _exnode3 - ___exportstart    ; child node offset
    _exnode2:
        db 'main',0     ; name
        db _exnode4 - ___exportstart    ; child node offset
    _exnode3:
        db 0x02         ; terminal size
        db 0x00         ; flags
        db 0x00         ; address - uleb128_encode(0)
        db 0x00         ; child count
    _exnode4:
        db 0x03         ; terminal size
        db 0x00         ; flags
        db 0x94,0x1e    ; address - uleb128_encode(0xf14)
        db 0x00         ; child count
        align 8, db 0   ; pad with 0 to 8-byte boundary
    ___exportend:

This forms a trie, or prefix tree, for the two symbols exported by the executable, __mh_execute_header and _main.
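
Walking the trie takes one short recursive function. In this hedged C sketch (hypothetical names; symbol names are capped at 63 characters for simplicity), each node starts with a ULEB128 terminal size. A nonzero size means a symbol ends at this node and is followed by its flags and address. The node then ends with a child count and, for each child, an edge label plus the ULEB128 offset of the child node from the start of the trie.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_NAME 64

struct export_sym {
    char name[MAX_NAME];
    uint64_t flags;
    uint64_t address;  /* offset from the mach header */
};

static uint64_t uleb(const uint8_t **p)
{
    uint64_t v = 0;
    unsigned shift = 0;
    uint8_t b;
    do {
        b = *(*p)++;
        v |= (uint64_t)(b & 0x7f) << shift;
        shift += 7;
    } while ((b & 0x80) && shift < 64);
    return v;
}

static int walk(const uint8_t *trie, size_t node_off, char *prefix, size_t plen,
                struct export_sym *out, int n, int max)
{
    const uint8_t *p = trie + node_off;
    uint64_t terminal_size = uleb(&p);
    const uint8_t *children = p + terminal_size;
    if (terminal_size != 0) {  /* a symbol ends at this node */
        assert(n < max);
        memcpy(out[n].name, prefix, plen);
        out[n].name[plen] = '\0';
        out[n].flags = uleb(&p);
        out[n].address = uleb(&p);
        n++;
    }
    p = children;
    uint8_t child_count = *p++;
    for (uint8_t i = 0; i < child_count; i++) {
        size_t edge_len = strlen((const char *)p);
        assert(plen + edge_len < MAX_NAME);
        memcpy(prefix + plen, p, edge_len);  /* extend the name with this edge */
        p += edge_len + 1;
        size_t child_off = (size_t)uleb(&p);
        n = walk(trie, child_off, prefix, plen + edge_len, out, n, max);
    }
    return n;
}

/* Collect every exported symbol; returns how many were found. */
int list_exports(const uint8_t *trie, struct export_sym *out, int max)
{
    char prefix[MAX_NAME];
    return walk(trie, 0, prefix, 0, out, 0, max);
}
```

Assembled from the listing above (the three child offsets work out to 0x05, 0x21 and 0x25), the trie yields __mh_execute_header at 0 and _main at 0xf14.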

Here is the compressed function starts table: a list of ULEB128-encoded deltas, the first measured from the start of the __TEXT segment, ended by a zero delta:

    ___functionstartsstart:
        db 0x94,0x1e    ; uleb128_encode(0xf14): _main starts 0xf14 bytes into __TEXT
        align 8, db 0   ; pad with 0 to 8-byte boundary; the first zero ends the list
    ___functionstartsend:
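
Because 0x94 has its high bit set, the bytes 0x94,0x1e decode as one ULEB128 value, 0xf14, the same encoding used for _main's address in the export trie. This hedged C sketch shows both directions, assuming the usual LC_FUNCTION_STARTS layout (deltas, the first measured from the start of __TEXT, ended by a zero); the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode `value` as ULEB128 into out; returns the number of bytes written. */
size_t uleb128_encode(uint64_t value, uint8_t *out)
{
    size_t n = 0;
    do {
        uint8_t byte = (uint8_t)(value & 0x7f);
        value >>= 7;
        if (value != 0)
            byte |= 0x80;  /* more bytes follow */
        out[n++] = byte;
    } while (value != 0);
    return n;
}

static uint64_t uleb128_decode(const uint8_t **p, const uint8_t *end)
{
    uint64_t v = 0;
    unsigned shift = 0;
    uint8_t b;
    do {
        assert(*p < end);
        b = *(*p)++;
        v |= (uint64_t)(b & 0x7f) << shift;
        shift += 7;
    } while ((b & 0x80) && shift < 64);
    return v;
}

/* Turn function-starts data into absolute addresses; returns the count. */
int decode_function_starts(const uint8_t *data, size_t len, uint64_t text_vmaddr,
                           uint64_t *out, int max)
{
    const uint8_t *p = data, *end = data + len;
    uint64_t addr = text_vmaddr;
    int n = 0;
    while (p < end) {
        uint64_t delta = uleb128_decode(&p, end);
        if (delta == 0)
            break;  /* a zero delta terminates the list */
        addr += delta;
        assert(n < max);
        out[n++] = addr;
    }
    return n;
}
```

uleb128_encode(0xf14, buf) writes exactly 0x94, 0x1e, and decoding the padded table with a __TEXT base of 0x100000000 yields a single function start, 0x100000f14, which is main().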

Here’s the data-in-code table. Whoops, this executable has no data in code; the load command just gets added anyway:

    ___datacodestart:
        align 8, db 0   ; pad with 0 to 8-byte boundary
    ___datacodeend:

How about some designated requirements for dylibs? I have no real idea what format this is in; I just interpreted it as best I could:

    ___dylibcodesignaturesstart:
        dd 1            ; count of code signatures (maybe?)
        dd 0            ; unknown
        dd 0x14         ; unknown
        db 0xfa,0xde,0x0c,0x00,0x00,0x00,0x00,0x28
        db 0x00,0x00,0x00,0x01,0x00,0x00,0x00,0x06
        db 0x00,0x00,0x00,0x02,0x00,0x00,0x00,0x0b
        db 0x6c,0x69,0x62,0x53,0x79,0x73,0x74,0x65
        db 0x6d,0x2e,0x42,0x00,0x00,0x00,0x00,0x03  ; code signature for libSystem.B.dylib
        dd 0            ; unknown
        align 8, db 0   ; pad with 0 to 8-byte boundary
    ___dylibcodesignaturesend:
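
One possible reading of those 40 bytes, offered only as a hypothesis: they look like a big-endian code-signing requirement blob as described in Apple's open-source Security framework. The magic 0xfade0c00 would be CSMAGIC_REQUIREMENT, the length 0x28 matches the 40 bytes exactly, kind 1 is the expression form, and opcodes 6, 2 and 3 would mean and, identifier and anchor apple. This C sketch decodes only those opcodes; the function names are hypothetical, and the opcode numbering follows the Security framework's requirement.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CSMAGIC_REQUIREMENT 0xfade0c00u
/* Expression opcodes, numbered as in the Security framework's requirement.h. */
enum { opFalse, opTrue, opIdent, opAppleAnchor, opAnchorHash, opInfoKey, opAnd, opOr };

static uint32_t be32(const uint8_t *b)
{
    return (uint32_t)b[0] << 24 | (uint32_t)b[1] << 16 | (uint32_t)b[2] << 8 | b[3];
}

/* Append n bytes of s to the NUL-terminated string in out. */
static void append(char *out, size_t cap, const char *s, size_t n)
{
    size_t used = strlen(out);
    assert(used + n < cap);
    memcpy(out + used, s, n);
    out[used + n] = '\0';
}

/* Render one expression starting at p; returns the position after it, or NULL. */
static const uint8_t *render(const uint8_t *p, const uint8_t *end, char *out, size_t cap)
{
    if (end - p < 4) return NULL;
    uint32_t op = be32(p);
    p += 4;
    switch (op) {
    case opAnd:
    case opOr:
        if ((p = render(p, end, out, cap)) == NULL) return NULL;
        append(out, cap, op == opAnd ? " and " : " or ", op == opAnd ? 5 : 4);
        return render(p, end, out, cap);
    case opIdent: {
        if (end - p < 4) return NULL;
        uint32_t len = be32(p);
        p += 4;
        uint32_t padded = (len + 3) & ~3u;  /* string data is padded to 4 bytes */
        if ((size_t)(end - p) < padded) return NULL;
        append(out, cap, "identifier \"", 12);
        append(out, cap, (const char *)p, len);
        append(out, cap, "\"", 1);
        return p + padded;
    }
    case opAppleAnchor:
        append(out, cap, "anchor apple", 12);
        return p;
    default:
        return NULL;  /* not needed for this blob */
    }
}

/* Decode a requirement blob into text; returns 0 on success, -1 otherwise. */
int decode_requirement(const uint8_t *blob, size_t size, char *out, size_t cap)
{
    assert(cap > 0);
    if (size < 12 || be32(blob) != CSMAGIC_REQUIREMENT) return -1;
    uint32_t length = be32(blob + 4);
    if (length < 12 || length > size || be32(blob + 8) != 1) return -1;  /* kind 1: expression */
    out[0] = '\0';
    return render(blob + 12, blob + length, out, cap) == blob + length ? 0 : -1;
}
```

Under that reading, the bytes decode to identifier "libSystem.B" and anchor apple, which would be a plausible designated requirement for libSystem.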

A symbol table
The symbol table is where most of the interesting stuff that’s left happens:

    ___symtabstart:
        dd L_srcdir - ___strtabstart    ; string table offset
        db 0x64         ; N_SO
        db 0x00         ; section 0
        dw 0x00         ; no desc
        dq 0            ; address 0
        dd L_srcfile - ___strtabstart   ; string table offset
        db 0x64         ; N_SO
        db 0x00         ; section 0
        dw 0x00         ; no desc
        dq 0            ; address 0
        dd L_objfile - ___strtabstart   ; string table offset
        db 0x66         ; N_OSO
        db 0x03         ; section 3
        dw 0x01         ; desc(?)
        dq 0x50b8c91f   ; st_mtime
        dd L_empty - ___strtabstart ; no string
        db 0x2e         ; N_BNSYM
        db 0x01         ; section 1
        dw 0x00         ; desc
        dq 0x100000000 + _main      ; start address
        dd L_main1 - ___strtabstart ; string table offset
        db 0x24         ; N_FUN
        db 0x01         ; section 1
        dw 0x00         ; desc
        dq 0x100000f14  ; start address
        dd L_empty - ___strtabstart ; no string
        db 0x24         ; N_FUN
        db 0x00         ; section 0
        dw 0x00         ; desc
        dq 0x20         ; address
        dd L_empty - ___strtabstart ; no string
        db 0x4e         ; N_ENSYM
        db 0x01         ; section 1
        dw 0x00         ; desc
        dq 0x20         ; address
    _sym_mh_execute_header:
        dd L_mhexechead - ___strtabstart    ; string table offset
        db 0x0f         ; N_SECT | N_EXT
        db 0x01         ; section 1
        dw 0x0010       ; REFERENCED_DYNAMICALLY
        dq 0x100000000 + __mh_execute_header    ; start address
    _sym_main:
        dd L_main2 - ___strtabstart ; string table offset
        db 0x0f         ; N_SECT | N_EXT
        db 0x01         ; section 1
        dw 0x0000       ; no extra flags
        dq 0x100000000 + _main  ; start address
    _sym_printf:
        dd L_printf - ___strtabstart    ; string table offset
        db 0x01         ; N_UNDF | N_EXT
        db 0x00         ; NO_SECT
        dw 0x0100       ; dynamic library 1
        dq 0            ; address
    _sym_time:
        dd L_time - ___strtabstart  ; string table offset
        db 0x01         ; N_UNDF | N_EXT
        db 0x00         ; NO_SECT
        dw 0x0100       ; dynamic library 1
        dq 0            ; address
    _sym_dyld_stub_binder:
        dd L_binder - ___strtabstart    ; string table offset
        db 0x01         ; N_UNDF | N_EXT
        db 0x00         ; NO_SECT
        dw 0x0100       ; dynamic library 1
        dq 0            ; address
        align 8, db 0   ; pad with 0 to 8-byte boundary
    ___symtabend:

    ___indirsymstart:
        dd (_sym_printf - ___symtabstart) >> 4  ; index into symbol table
        dd (_sym_time - ___symtabstart) >> 4    ; index into symbol table
        dd (_sym_dyld_stub_binder - ___symtabstart) >> 4    ; index into symbol table
        dd 0x40000000   ; INDIRECT_SYMBOL_ABS
        dd (_sym_printf - ___symtabstart) >> 4  ; index into symbol table
        dd (_sym_time - ___symtabstart) >> 4    ; index into symbol table
        align 8, db 0   ; pad with 0 to 8-byte boundary
    ___indirsymend:

    ___strtabstart:
    L_spc:
        db ' '
    L_empty:
        db 0
    L_srcdir:
        db '/Users/gwynne/',0
    L_srcfile:
        db 'test.c',0
    L_objfile:
        db '/var/folders/b8/qgjb841d71d55cf8jh1myb540000gn/T/test-KyuIba.o',0
    L_main1:
        db '_main',0
    L_mhexechead:
        db '__mh_execute_header',0
    L_main2:
        db '_main',0
    L_printf:
        db '_printf',0
    L_time:
        db '_time',0
    L_binder:
        db 'dyld_stub_binder',0
        align 8, db 0   ; pad with 0 to 8-byte boundary
    ___strtabend:

    ___LINKEDITdataend:

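Here you have the symbol table (including STABS debugging entries); the indirect symbol table, which is nothing but a list of indexes into the symbol table, one for each stub and symbol pointer in section order (__stubs, then __nl_symbol_ptr, then __la_symbol_ptr), telling dyld how to use the symbol stubs in case the binding opcodes aren’t good enough (basically, legacy data); and the string table, which holds all the user-readable strings for the symbol table. The >> 4 in each indirect entry turns a byte offset into an entry number, since every 64-bit symbol table entry (nlist_64) is 16 bytes.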
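
When writing these entries by hand with db/dw/dd/dq, it is easy to drop a field. A quick way to double-check the layout is to mirror the record in C. This hedged sketch redeclares the layout of struct nlist_64 from <mach-o/nlist.h> under a different name so it builds anywhere, then shows the two small calculations the listing relies on: the >> 4 that turns a byte offset into a symbol index, and the library ordinal stored in the high byte of n_desc (what the GET_LIBRARY_ORDINAL macro extracts). The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout as struct nlist_64 in <mach-o/nlist.h>. */
struct nlist64_mirror {
    uint32_t n_strx;   /* offset into the string table */
    uint8_t  n_type;   /* e.g. N_SECT | N_EXT, N_UNDF | N_EXT, or a STABS type */
    uint8_t  n_sect;   /* section number, or 0 (NO_SECT) */
    uint16_t n_desc;   /* flags; for undefined symbols the high byte is the library ordinal */
    uint64_t n_value;  /* address, size, or 0 */
};
_Static_assert(sizeof(struct nlist64_mirror) == 16, "each nlist_64 entry is 16 bytes");

/* What (label - ___symtabstart) >> 4 computes. */
uint32_t symbol_index(uint32_t byte_offset)
{
    assert(byte_offset % sizeof(struct nlist64_mirror) == 0);  /* must land on an entry boundary */
    return byte_offset >> 4;
}

/* Equivalent to GET_LIBRARY_ORDINAL(n_desc). */
unsigned library_ordinal(uint16_t n_desc)
{
    return (n_desc >> 8) & 0xffu;
}
```

With the seven STABS entries in front of it, _sym_printf is entry 9 at byte offset 0x90, but only if every entry really is 16 bytes: an entry missing its n_sect byte, or written with dw where n_value needs dq, shifts everything after it and makes the >> 4 indexes wrong.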

Conclusion
That is one long mess of mostly raw hexadecimal bytes. And here’s the punch line: as written here, it still doesn’t produce a working Mach-O binary!

Why not? Because I didn’t account for alignment requirements properly, and I ran out of time to fix the problem before the article had to go up. All the tables and structures here are correct, though, so hopefully, it’s still instructional as to just how much goes into even the simplest binary, and how much work you should be very glad ld and dyld are doing for you!

Thanks for reading, as always. I hope you enjoyed it!