Translation of: Friday Q&A 2012-11-30: Let's Build A Mach-O Executable, by Mike Ash
Original: https://www.mikeash.com/pyblog/friday-qa-2012-11-30-lets-build-a-mach-o-executable.html · Published: 2012-11-30 · Author: Mike Ash · Translator: MiMo (mimo-v2.5-pro); code blocks kept in their original English form
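This is something of a follow-up to my last article, "dyld: Dynamic Linking On OS X", in which I explored how the dynamic linker dyld does its job. This week, I'm going to recreate the work of both the compiler and the static linker, building a Mach-O binary completely from scratch with only the help of an assembler.
The Right Tool for the Right Job
The best tool on OS X for producing binary files from assembly-language input is, of course, the assembler, as. But if you try to build a raw binary with it, you'll find that as also acts as a static linker in its own right. That isn't what we're after.
A more flexible tool in this respect is nasm, the Netwide Assembler. The Xcode command-line tools install nasm, but unfortunately Apple ships a badly outdated version, 0.98.40, whose bug fixes date back to 2007 and whose features date to 1999. The most recent version at the time of this writing is 2.10.05, which you can install with port install nasm, brew install nasm, or whatever other package manager you prefer. If you don't use a package manager, you can download the source and compile it yourself.
nasm 2.x includes a number of useful features, such as 64-bit support and Mach-O output. We won't use nasm's Mach-O support, since the whole point of this exercise is to do that part by hand, but it is certainly nicer to build a 64-bit binary with 64-bit instructions than to split everything into 32-bit words!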
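Reinserting the Prime Program
Here's the C source code for which we'll build our Mach-O binary. To keep the resulting binary relatively simple, I've written it to pull in no more than the bare minimum of information:

    #define NULL ((void *)0L)
    extern int printf(const char * restrict format, ...);
    typedef long time_t;
    extern time_t time(time_t *sloc);

    int main(void)
    {
        printf("Hello, world #%ld!\n", time(NULL));
        return 0;
    }

Some things to notice: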
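- Rather than #include <stdio.h> and #include <time.h>, I've manually declared printf() and time(), defined the time_t type, and defined NULL as a macro. This avoids emitting extra debug information for the various things defined in the standard headers.
- I've defined main() as taking no parameters. This is extremely poor practice in general, but because of C's calling conventions, it works correctly.
- I've used a format string that actually performs a format substitution, so that the compiler I used to produce my test files doesn't get overly "efficient" and replace the call with puts().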
This generates the following assembly (built with Clang 3.3svn at -Os):

        .section __TEXT,__text,regular,pure_instructions
        .globl _main
    _main:                          ## @main
        .cfi_startproc
    ## BB#0:                        ## %entry
        pushq %rbp
    Ltmp2:
        .cfi_def_cfa_offset 16
    Ltmp3:
        .cfi_offset %rbp, -16
        movq %rsp, %rbp
    Ltmp4:
        .cfi_def_cfa_register %rbp
        xorl %edi, %edi
        callq _time
        leaq L_.str(%rip), %rdi
        movq %rax, %rsi
        xorb %al, %al
        callq _printf
        xorl %eax, %eax
        popq %rbp
        ret
        .cfi_endproc

        .section __TEXT,__cstring,cstring_literals
    L_.str:                         ## @.str
        .asciz "Hello, world #%ld!\n"

        .subsections_via_symbols

The code itself is very straightforward: inside the __TEXT,__text section, set up a stack frame, call time(), load the L_.str string, set al to zero, call printf, zero eax, tear down the stack frame, and return. Then, in the __TEXT,__cstring section, define the L_.str label to point to a zero-terminated ASCII string. Finally, declare that no symbols in this file occur inside basic blocks, which the linker uses during dead code stripping.
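The rest of the directives relate to Call Frame Information, which is used for unwinding data (.unwind_info and .eh_frame, i.e. exception handling support) and debug information (.debug_frame). We'll be building the first two by hand.
For sanity's sake, I'll be omitting the full DWARF debugging information. Even for this very simple program, it would add a considerable amount to this already overlong article.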
The Start of a Mach-O Executable
Our nasm input file will be used to generate a Mach-O file, so it needs to start with a Mach-O header. We'll use the 64-bit little-endian Mach-O format, whose header looks like this:

    struct mach_header_64 {
        uint32_t      magic;      /* mach magic number identifier */
        cpu_type_t    cputype;    /* cpu specifier */
        cpu_subtype_t cpusubtype; /* machine specifier */
        uint32_t      filetype;   /* type of file */
        uint32_t      ncmds;      /* number of load commands */
        uint32_t      sizeofcmds; /* the size of all the load commands */
        uint32_t      flags;      /* flags */
        uint32_t      reserved;   /* reserved */
    };

    /* Constant for the magic field of the mach_header_64 (64-bit architectures) */
    #define MH_MAGIC_64 0xfeedfacf /* the 64-bit mach magic number */
    #define MH_CIGAM_64 0xcffaedfe /* NXSwapInt(MH_MAGIC_64) */

Here's the nasm input for our Mach-O header:

    bits 64
    cpu x64

    __mh_execute_header:
        dd 0xfeedfacf ; MH_MAGIC_64
        dd 16777223 ; CPU_TYPE_X86 | CPU_ARCH_ABI64
        dd 0x80000003 ; CPU_SUBTYPE_I386_ALL | CPU_SUBTYPE_LIB64
        dd 2 ; MH_EXECUTE
        dd 16 ; number of load commands
        dd ___loadcmdsend - ___loadcmdsstart ; size of load commands
        dd 0x00200085 ; MH_NOUNDEFS | MH_DYLDLINK | MH_TWOLEVEL | MH_PIE
        dd 0 ; reserved
    ___loadcmdsstart:

The bits and cpu directives just tell nasm to run in 64-bit mode.
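The magic-looking numbers in the header are just flag values ORed together. The following is a minimal C sketch of how they compose, assuming the constant values from Apple's <mach/machine.h> and <mach-o/loader.h>. The EX_ prefix, the struct name, and the helper function are my own inventions for illustration, redefined locally so that the snippet compiles on any platform.

```c
#include <stdint.h>

/* Values as found in <mach/machine.h> and <mach-o/loader.h>.
   The EX_ names are hypothetical, chosen to avoid clashing with the real headers. */
#define EX_CPU_ARCH_ABI64      0x01000000u
#define EX_CPU_TYPE_X86        7u
#define EX_CPU_SUBTYPE_LIB64   0x80000000u
#define EX_CPU_SUBTYPE_X86_ALL 3u
#define EX_MH_EXECUTE          2u
#define EX_MH_NOUNDEFS         0x1u
#define EX_MH_DYLDLINK         0x4u
#define EX_MH_TWOLEVEL         0x80u
#define EX_MH_PIE              0x200000u

/* Same layout as struct mach_header_64: eight 32-bit fields, 32 bytes total. */
struct ex_mach_header_64 {
    uint32_t magic, cputype, cpusubtype, filetype;
    uint32_t ncmds, sizeofcmds, flags, reserved;
};

/* Build the header the way the nasm listing does. sizeofcmds is a parameter
   because it depends on the load commands that follow the header. */
static struct ex_mach_header_64 ex_make_header(uint32_t ncmds, uint32_t sizeofcmds)
{
    struct ex_mach_header_64 h;
    h.magic      = 0xfeedfacfu; /* MH_MAGIC_64 */
    h.cputype    = EX_CPU_TYPE_X86 | EX_CPU_ARCH_ABI64;
    h.cpusubtype = EX_CPU_SUBTYPE_X86_ALL | EX_CPU_SUBTYPE_LIB64;
    h.filetype   = EX_MH_EXECUTE;
    h.ncmds      = ncmds;
    h.sizeofcmds = sizeofcmds;
    h.flags      = EX_MH_NOUNDEFS | EX_MH_DYLDLINK | EX_MH_TWOLEVEL | EX_MH_PIE;
    h.reserved   = 0;
    return h;
}
```

Calling ex_make_header(16, size) yields the same cputype, cpusubtype, and flags words that the listing writes as literals.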
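Immediately after the Mach-O header come the load commands. There is a whole list of commands that an executable is required to have, and a huge pile more that it might have. Clang produces 16 load commands for this executable. A load command looks like this:

    struct load_command {
        uint32_t cmd;     /* type of load command */
        uint32_t cmdsize; /* total size of command in bytes */
    };

Each load command is actually larger than this; the cmd field tells the loader how to interpret the data that follows. For 64-bit Mach-O files, load commands must be aligned to an 8-byte boundary.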
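Because every command starts with the same two fields, a loader can step over commands it does not care about by adding cmdsize to its position. Here is a rough C sketch of that walk. The struct and function names are hypothetical, the buffer is assumed to hold the commands in host byte order, and the checks are only the ones implied by the text above (a minimum size and 8-byte alignment).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Common prefix of every load command (mirrors struct load_command). */
struct ex_load_command {
    uint32_t cmd;
    uint32_t cmdsize;
};

/* Walk ncmds load commands stored back to back in buf[0..len).
   Returns how many have type `wanted`, or -1 if a command is too small,
   not a multiple of 8 bytes, or runs past the end of the buffer. */
static int ex_count_commands(const uint8_t *buf, size_t len,
                             uint32_t ncmds, uint32_t wanted)
{
    size_t off = 0;
    int found = 0;
    for (uint32_t i = 0; i < ncmds; i++) {
        struct ex_load_command lc;
        if (len - off < sizeof lc)
            return -1;
        memcpy(&lc, buf + off, sizeof lc);
        if (lc.cmdsize < sizeof lc || lc.cmdsize % 8 != 0 || lc.cmdsize > len - off)
            return -1;
        if (lc.cmd == wanted)
            found++;
        off += lc.cmdsize; /* cmdsize covers the whole command, prefix included */
    }
    return found;
}
```

For the executable in this article, counting commands of type 0x19 (LC_SEGMENT_64) would be expected to find the four segments described in the following sections.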
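Segments and Sections
Segments are the blocks of data and code that dyld actually maps into memory at runtime. Sections are subdivisions of segments. Segments and sections both have names, and quite a few of them are standard and predefined.
Here's our first segment command:

    ___pagezerostart:
        dd 0x19 ; LC_SEGMENT_64
        dd ___pagezeroend - ___pagezerostart ; command size
        db '__PAGEZERO',0,0,0,0,0,0 ; segment name (pad to 16 bytes)
        dq 0 ; VM address
        dq 0x100000000 ; VM size
        dq 0 ; file offset
        dq 0 ; file size
        dd 0x0 ; VM_PROT_NONE (maximum protection)
        dd 0x0 ; VM_PROT_NONE (initial protection)
        dd 0 ; number of sections
        dd 0x0 ; flags
        align 8, db 0 ; pad with zero to 8-byte boundary
    ___pagezeroend:

This is the __PAGEZERO segment, which predefines the entire lower 4GB of the 64-bit virtual address space as inaccessible. Because this segment is marked unreadable, unwritable, and non-executable, dereferencing a NULL pointer causes an immediate segmentation fault.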
The next segment command is more complicated:

    ___TEXTstart:
        dd 0x19 ; LC_SEGMENT_64
        dd ___TEXTend - ___TEXTstart ; command size
        db '__TEXT',0,0,0,0,0,0,0,0,0,0 ; segment name (pad to 16 bytes)
        dq 0x100000000 ; VM address
        dq 0x1000 ; VM size
        dq 0 ; file offset
        dq 0x1000 ; file size
        dd 0x7 ; VM_PROT_READ | VM_PROT_WRITE | VM_PROT_EXECUTE
        dd 0x5 ; VM_PROT_READ | VM_PROT_EXECUTE
        dd 6 ; number of sections
        dd 0x0 ; flags
    ___TEXTtextstart:
        db '__text',0,0,0,0,0,0,0,0,0,0 ; section name (pad to 16 bytes)
        db '__TEXT',0,0,0,0,0,0,0,0,0,0 ; segment name (pad to 16 bytes)
        dq 0x100000000 + ___codestart - ___TEXTload ; address
        dq ___codeend - ___codestart ; size
        dd ___codestart ; offset
        dd 0 ; alignment as power of 2 (1)
        dd 0 ; relocations data offset
        dd 0 ; number of relocations
        dd 0x80000400 ; S_REGULAR | S_ATTR_PURE_INSTRUCTIONS | S_ATTR_SOME_INSTRUCTIONS
        dd 0 ; reserved1
        dd 0 ; reserved2
        dd 0 ; reserved3
    ___TEXTstubsstart:
        db '__stubs',0,0,0,0,0,0,0,0,0 ; section name (pad to 16 bytes)
        db '__TEXT',0,0,0,0,0,0,0,0,0,0 ; segment name (pad to 16 bytes)
        dq 0x100000000 + ___stubstart - ___TEXTload ; address
        dq ___stubend - ___stubstart ; size
        dd ___stubstart ; offset
        dd 1 ; alignment as power of 2 (2)
        dd 0 ; relocations data offset
        dd 0 ; number of relocations
        dd 0x80000408 ; S_SYMBOL_STUBS | S_ATTR_PURE_INSTRUCTIONS | S_ATTR_SOME_INSTRUCTIONS
        dd 0 ; reserved1 (index into indirect symbol table)
        dd 6 ; reserved2 (size per stub)
        dd 0 ; reserved3
    ___TEXTstubhelperstart:
        db '__stub_helper',0,0,0 ; section name (pad to 16 bytes)
        db '__TEXT',0,0,0,0,0,0,0,0,0,0 ; segment name (pad to 16 bytes)
        dq 0x100000000 + ___stubhelpstart - ___TEXTload ; address
        dq ___stubhelpend - ___stubhelpstart ; size
        dd ___stubhelpstart ; offset
        dd 2 ; alignment as power of 2 (4)
        dd 0 ; relocations data offset
        dd 0 ; number of relocations
        dd 0x80000400 ; S_REGULAR | S_ATTR_PURE_INSTRUCTIONS | S_ATTR_SOME_INSTRUCTIONS
        dd 0 ; reserved1
        dd 0 ; reserved2
        dd 0 ; reserved3
    ___TEXTcstringstart:
        db '__cstring',0,0,0,0,0,0,0 ; section name (pad to 16 bytes)
        db '__TEXT',0,0,0,0,0,0,0,0,0,0 ; segment name (pad to 16 bytes)
        dq 0x100000000 + ___strsstart - ___TEXTload ; address
        dq ___strsend - ___strsstart ; size
        dd ___strsstart ; offset
        dd 0 ; alignment as power of 2 (1)
        dd 0 ; relocations data offset
        dd 0 ; number of relocations
        dd 0x00000002 ; S_CSTRING_LITERALS
        dd 0 ; reserved1
        dd 6 ; reserved2
        dd 0 ; reserved3
    ___TEXTunwindinfostart:
        db '__unwind_info',0,0,0 ; section name (pad to 16 bytes)
        db '__TEXT',0,0,0,0,0,0,0,0,0,0 ; segment name (pad to 16 bytes)
        dq 0x100000000 + ___uwstart - ___TEXTload ; address
        dq ___uwend - ___uwstart ; size
        dd ___uwstart ; offset
        dd 0 ; alignment as power of 2 (1)
        dd 0 ; relocations data offset
        dd 0 ; number of relocations
        dd 0x00000000 ; no flags
        dd 0 ; reserved1
        dd 0 ; reserved2
        dd 0 ; reserved3
    ___TEXTehframestart:
        db '__eh_frame',0,0,0,0,0,0 ; section name (pad to 16 bytes)
        db '__TEXT',0,0,0,0,0,0,0,0,0,0 ; segment name (pad to 16 bytes)
        dq 0x100000000 + ___ehstart - ___TEXTload ; address
        dq ___ehend - ___ehstart ; size
        dd ___ehstart ; offset
        dd 3 ; alignment as power of 2 (8)
        dd 0 ; relocations data offset
        dd 0 ; number of relocations
        dd 0x00000000 ; no flags
        dd 0 ; reserved1
        dd 0 ; reserved2
        dd 0 ; reserved3
        align 8, db 0 ; pad with zero to 8-byte boundary
    ___TEXTend:

So, this is the __TEXT segment, which covers all of the executable code and a good deal of other data. The segment contains six sections. Each section is aligned according to its section information, and all of the sections are packed tightly at the end of the segment, which is why a good many bytes at the start of __TEXT are zero. However, because of how the linker maps segments, __TEXT actually contains all of the Mach-O header information as well. As we'll see later, the symbol table even has its own entry for __mh_execute_header. Here are the sections:
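- __text - The actual code of the executable; all functions live here. In this example there is only one function, main(). It is marked S_REGULAR, meaning "it's a plain old section", and is flagged as containing "some instructions" (at least some executable code) and "pure instructions" (nothing but executable code).
- __stubs - The jump table that redirects to the lazy and non-lazy symbol sections. See my previous article for an explanation of what's in this section. It is marked S_SYMBOL_STUBS, whose meaning is fairly obvious.
- __stub_helper - Helper functions for lazily binding dynamic symbols.
- __cstring - The section containing the read-only C string literals used by the code.
- __unwind_info - Compact stack unwinding information for the executable's code. Used for exception handling on OS X.
- __eh_frame - DWARF2 stack unwinding information for the executable's code. Used for exception handling and debugging.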
Next is the __DATA segment:

    ___DATAstart:
        dd 0x19 ; LC_SEGMENT_64
        dd ___DATAend - ___DATAstart ; command size
        db '__DATA',0,0,0,0,0,0,0,0,0,0 ; segment name (pad to 16 bytes)
        dq 0x100001000 ; VM address
        dq 0x1000 ; VM size
        dq 0x1000 ; file offset
        dq 0x1000 ; file size
        dd 0x7 ; VM_PROT_READ | VM_PROT_WRITE | VM_PROT_EXECUTE
        dd 0x3 ; VM_PROT_READ | VM_PROT_WRITE
        dd 2 ; number of sections
        dd 0x0 ; flags
    ___DATAnlsymptrstart:
        db '__nl_symbol_ptr',0 ; section name (pad to 16 bytes)
        db '__DATA',0,0,0,0,0,0,0,0,0,0 ; segment name (pad to 16 bytes)
        dq 0x100001000 + ___nlsymptrstart - ___DATAload ; address
        dq ___nlsymptrend - ___nlsymptrstart ; size
        dd ___nlsymptrstart ; offset
        dd 3 ; alignment as power of 2 (8)
        dd 0 ; relocations data offset
        dd 0 ; number of relocations
        dd 0x00000006 ; S_NON_LAZY_SYMBOL_POINTERS
        dd 2 ; reserved1 (index into indirect symbol table)
        dd 0 ; reserved2
        dd 0 ; reserved3
    ___DATAlasymptrstart:
        db '__la_symbol_ptr',0 ; section name (pad to 16 bytes)
        db '__DATA',0,0,0,0,0,0,0,0,0,0 ; segment name (pad to 16 bytes)
        dq 0x100001000 + ___lasymptrstart - ___DATAload ; address
        dq ___lasymptrend - ___lasymptrstart ; size
        dd ___lasymptrstart ; offset
        dd 3 ; alignment as power of 2 (8)
        dd 0 ; relocations data offset
        dd 0 ; number of relocations
        dd 0x00000007 ; S_LAZY_SYMBOL_POINTERS
        dd 4 ; reserved1 (index into indirect symbol table)
        dd 0 ; reserved2
        dd 0 ; reserved3
        align 8, db 0 ; pad with zero to 8-byte boundary
    ___DATAend:

There are only two sections here, because the program has no global or static data: the non-lazy symbol pointers and the lazy symbol pointers.
Then comes the last segment, __LINKEDIT:

    ___LINKEDITstart:
        dd 0x19 ; LC_SEGMENT_64
        dd ___LINKEDITend - ___LINKEDITstart ; command size
        db '__LINKEDIT',0,0,0,0,0,0 ; segment name (pad to 16 bytes)
        dq 0x100002000 ; VM address
        dq 0x1000 ; VM size
        dq 0x2000 ; file offset
        dq ___LINKEDITdataend - ___LINKEDITdatastart ; file size
        dd 0x7 ; VM_PROT_READ | VM_PROT_WRITE | VM_PROT_EXECUTE
        dd 0x1 ; VM_PROT_READ
        dd 0 ; number of sections
        dd 0x0 ; flags
        align 8, db 0 ; pad with zero to 8-byte boundary
    ___LINKEDITend:

The __LINKEDIT segment contains several kinds of data used by the dynamic linker dyld, such as the symbol table, the indirect symbol table, rebase opcodes, binding opcodes, the exports table, function starts information, the data-in-code table, and some code signing data.
The next several load commands deal with static and dynamic linking information:

    ___dyldinfostart:
        dd 0x80000022 ; LC_DYLD_INFO | LC_REQ_DYLD
        dd ___dyldinfoend - ___dyldinfostart ; command size
        dd ___rebasestart ; rebase info offset
        dd ___rebaseend - ___rebasestart ; rebase info size
        dd ___bindstart ; binding info offset
        dd ___bindend - ___bindstart ; binding info size
        dd 0 ; weak binding info offset
        dd 0 ; weak binding info size
        dd ___lazystart ; lazy binding info offset
        dd ___lazyend - ___lazystart ; lazy binding info size
        dd ___exportstart ; export info offset
        dd ___exportend - ___exportstart ; export info size
        align 8, db 0 ; pad with zero to 8-byte boundary
    ___dyldinfoend:
    ___symtabinfostart:
        dd 0x2 ; LC_SYMTAB
        dd ___symtabinfoend - ___symtabinfostart ; command size
        dd ___symtabstart ; symbol table offset
        dd (___symtabend - ___symtabstart) >> 4 ; number of symbols
        dd ___strtabstart ; string table offset
        dd ___strtabend - ___strtabstart ; string table size
        align 8, db 0 ; pad with zero to 8-byte boundary
    ___symtabinfoend:
    ___dysymtabinfostart:
        dd 0xb ; LC_DYSYMTAB
        dd ___dysymtabinfoend - ___dysymtabinfostart ; command size
        dd 0 ; local symbols index
        dd 8 ; number of local symbols
        dd 8 ; external symbols index
        dd 2 ; number of external symbols
        dd 10 ; undefined symbols index
        dd 3 ; number of undefined symbols
        dd 0 ; table of contents offset
        dd 0 ; table of contents entries
        dd 0 ; module table offset
        dd 0 ; module table entries
        dd 0 ; external references table offset
        dd 0 ; external references table entries
        dd ___indirsymstart ; indirect symbol table offset
        dd (___indirsymend - ___indirsymstart) >> 2 ; indirect symbol table entries
        dd 0 ; local relocation table offset
        dd 0 ; local relocation table entries
        align 8, db 0 ; pad with zero to 8-byte boundary
    ___dysymtabinfoend:
    ___loaddylinkerstart:
        dd 0xe ; LC_LOAD_DYLINKER
        dd ___loaddylinkerend - ___loaddylinkerstart ; command size
        dd ___loaddylinkername - ___loaddylinkerstart ; offset to name
    ___loaddylinkername:
        db '/usr/lib/dyld',0 ; name
        align 8, db 0 ; pad with zero to 8-byte boundary
    ___loaddylinkerend:
    ___maincmdstart:
        dd 0x80000028 ; LC_MAIN | LC_REQ_DYLD
        dd ___maincmdend - ___maincmdstart ; command size
        dq _main ; offset of main from start of __TEXT
        dq 0 ; stack size
        align 8, db 0 ; pad with zero to 8-byte boundary
    ___maincmdend:
    ___loadlibsystemstart:
        dd 0xc ; LC_LOAD_DYLIB
        dd ___loadlibsystemend - ___loadlibsystemstart ; command size
        dd ___loadlibsystemname - ___loadlibsystemstart ; offset to path
        dd 2 ; UNIX time stamp Wed Dec 31 19:00:02 1969
        dd 0x00a90300 ; current version (169.3.0)
        dd 0x00010000 ; compatibility version (1.0.0)
    ___loadlibsystemname:
        db '/usr/lib/libSystem.B.dylib' ; path
        align 8, db 0 ; pad with zero to 8-byte boundary
    ___loadlibsystemend:
    ___fstartscmdstart:
        dd 0x26 ; LC_FUNCTION_STARTS
        dd ___fstartscmdend - ___fstartscmdstart ; command size
        dd ___functionstartsstart ; offset to function starts data (fun label name, isn't it?)
        dd ___functionstartsend - ___functionstartsstart ; size of function starts data (even more fun name!)
        align 8, db 0 ; pad with zero to 8-byte boundary
    ___fstartscmdend:
    ___datacodecmdstart:
        dd 0x29 ; LC_DATA_IN_CODE
        dd ___datacodecmdend - ___datacodecmdstart ; command size
        dd ___datacodestart ; offset to data-in-code information
        dd ___datacodeend - ___datacodestart ; size of data-in-code information
        align 8, db 0 ; pad with zero to 8-byte boundary
    ___datacodecmdend:
    ___dycodesigncmdstart:
        dd 0x2b ; LC_DYLIB_CODE_SIGN_DRS
        dd ___dycodesigncmdend - ___dycodesigncmdstart ; command size
        dd ___dylibcodesignaturesstart ; offset to code signatures from dylibs
        dd ___dylibcodesignaturesend - ___dylibcodesignaturesstart ; you get the idea, right?
        align 8, db 0 ; pad with zero to 8-byte boundary
    ___dycodesigncmdend:

To sum up, this long stretch of data describes:
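- The list of dynamic linking information for the binary. This command, along with a few others, is marked LC_REQ_DYLD, which means that if the version of dyld loading the binary doesn't understand the command, it must give up immediately rather than carry on without that information.
- The locations of the symbol table and the string table. These are given as offsets from the start of the file, but it is understood that the data lives inside the __LINKEDIT segment. At runtime, dyld computes symtable_base_address = linkedit_base_address + (symtab_offset - linkedit_offset) to get the actual in-memory location of the symbol table. A similar computation is done for the string table and for the offsets given in the LC_DYLD_INFO and LC_DYSYMTAB commands.
- A set of dynamic symbol data for the binary, giving the starting indexes and counts of the various kinds of symbols in the symbol table.
- LC_LOAD_DYLINKER gives the hardcoded path of the dynamic linker used to load the executable. This command is used by the kernel rather than by the dynamic linker itself; the kernel runs the named program at process creation. Don't think you can use this command to subvert the loading process, though: the kernel doesn't allow an arbitrary choice of dynamic linker.
- LC_MAIN is the replacement for the older LC_UNIXTHREAD command. Executables used to be started from a thread state specified inside the binary, but someone eventually realized that with dyld running early on, and with the initial state being identical for nearly every executable, this was a waste of time and space. LC_MAIN instead gives the location of the entry point (main()) directly, which dyld simply jumps to, and it also replaces the crt1.o object file that used to contain the glue code for setting up main().
- LC_LOAD_DYLIB is the "I link against this dylib for some of my undefined symbols" command. This binary links only against libSystem.B.dylib, the OS X equivalent of libc.
- LC_FUNCTION_STARTS is a table of data in the __LINKEDIT segment that gives the address of every function entry point in the executable. Among other things, this makes it possible for functions to exist without having entries in the symbol table.
- LC_DATA_IN_CODE is similarly a table of data, giving the locations of data bytes embedded inside executable code. This is useful for many purposes, not least of which is accurate disassembly.
- Finally, LC_DYLIB_CODE_SIGN_DRS provides a list of the designated requirements of each dylib the executable links against. This lets the code signing machinery judge the suitability of the executable without having to load every dylib it links to.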
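The address computation described in the list above for the symbol table is simple enough to write down directly. This is a sketch only: the function name and parameter names are hypothetical, and the ASLR slide is modeled as a plain value added to the segment's VM address to obtain the linkedit base.

```c
#include <stdint.h>

/* Translate a file offset that falls inside __LINKEDIT into a runtime address.
   linkedit_vmaddr  - VM address from the __LINKEDIT segment command
   slide            - ASLR slide applied to the image (0 if not slid)
   linkedit_fileoff - file offset from the __LINKEDIT segment command
   data_fileoff     - file offset of the table (e.g. the LC_SYMTAB symbol table offset) */
static uint64_t ex_linkedit_address(uint64_t linkedit_vmaddr, uint64_t slide,
                                    uint64_t linkedit_fileoff, uint64_t data_fileoff)
{
    uint64_t linkedit_base_address = linkedit_vmaddr + slide;
    return linkedit_base_address + (data_fileoff - linkedit_fileoff);
}
```

With the __LINKEDIT values used in this article (VM address 0x100002000, file offset 0x2000) and a made-up symbol table offset of 0x2040, the unslid result would be 0x100002040.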
But wait, there's more! Just when you thought we were done, there are three load commands we haven't covered yet:

    ___uuidstart:
        dd 0x1b ; LC_UUID
        dd ___uuidend - ___uuidstart ; command size
        db 0xd3,0xec,0x58,0x28,0x02,0x26,0x36,0x29,0xab,0xc3,0x7d,0x6d,0xc9,0xf9,0x2d,0xda ; D3EC5828-0226-3629-ABC3-7D6DC9F92DDA
        align 8, db 0 ; pad with zero to 8-byte boundary
    ___uuidend:
    ___osverstart:
        dd 0x24 ; LC_VERSION_MIN_MACOSX
        dd ___osverend - ___osverstart ; command size
        dd 0x000a0800 ; OS min version: 10.8
        dd 0x000a0800 ; Build SDK version: 10.8
        align 8, db 0 ; pad with zero to 8-byte boundary
    ___osverend:
    ___sourceverstart:
        dd 0x2a ; LC_SOURCE_VERSION
        dd ___sourceverend - ___sourceverstart ; command size
        dq 0 ; Source version: 0.0.0.0.0
        align 8, db 0 ; pad with zero to 8-byte boundary
    ___sourceverend:
    ___loadcmdsend:

These are the binary's UUID, the minimum OS X version it targets, the SDK version it was linked against, and a "source version". I couldn't find any clue as to what "source version" actually means, and it is all zeros in every binary I looked at, so your guess is as good as mine.
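And finally, one more thing! The first thing we do now is pad the file out to the start of the main() function:

    ___TEXTload:
        times (0xf14-($-$$)) db 0 ; pad the __TEXT segment

You might ask why I don't write _main-($-$$) instead of hardcoding the start address. It does look fragile, and it is. The problem is that nasm offers no simple way to align data to the end of a segment, especially since we aren't using its built-in segment support. Until the padding is added, it has no idea where _main is! So in this case I simply hardcoded the offset at which main() starts (the same location that the addr field of the __TEXT,__text section describes) and left it as a hack rather than trying to work out an elegant but complicated solution.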
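Now we go through the data in order. We don't actually have to follow any particular order, since the labels used in the load commands will relocate everything according to where it ends up in the file, but there's no reason not to. First comes the __TEXT,__text section, the executable code. Note that we have to rewrite the original assembly in nasm syntax: nasm uses Intel syntax rather than GNU (AT&T) syntax. The main differences are that the operand order is reversed and register names carry no % prefix. The various directives are also stripped out, since we'll be doing their work by hand.

    ___codestart:
    _main:
        push rbp
        mov rbp, rsp
        xor edi, edi
        call _time
        lea rdi, [rel L_str]
        mov rsi, rax
        xor al, al
        call _printf
        xor eax, eax
        pop rbp
        ret
    ___codeend:

We also don't use any size suffixes on the instructions, since nasm can infer them from the operands. The rel qualifier on the string load tells nasm to generate a rip-relative access rather than an absolute address, which is necessary because we've marked the executable as position-independent. Next, we have the symbol stubs for time() and printf(), and the stub helper: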
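    ___stubstart:
    _printf:
        jmp [rel _lazy_printf]
    _time:
        jmp [rel _lazy_time]
    ___stubend:

    ___stubhelpstart:
    _stub_helper:
        lea r11, [rel _nonlazy_dyld_stub_binder]
        push r11
        jmp [rel _nonlazy_dyld_stub_binder]
        nop
    _stub_helper_printf: ; initial target of _lazy_printf
        push strict qword (_lazy_printf - ___lasymptrstart)
        jmp _stub_helper
    _stub_helper_time: ; initial target of _lazy_time
        push strict qword (_lazy_time - ___lasymptrstart)
        jmp _stub_helper
    ___stubhelpend:

The stubs themselves jump through the lazy symbol pointers in the __DATA segment. Those pointers initially point straight back into the bottom of _stub_helper, which pushes the symbol's offset within the lazy symbol section and calls into dyld itself through a non-lazy symbol (one that dyld binds when the executable is loaded). dyld then binds the symbol and rewrites the lazy symbol pointer so that future calls through the stub go directly to the target function. Note that these are all direct, unconditional jumps, not subroutine calls. Also note the use of strict qword to force nasm to emit a full 64-bit value for the push.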
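The lazy-binding dance can be modeled in plain C with a function pointer that starts out pointing at a helper and is rewritten on first use. This is only an analogy under stated assumptions: every name below is hypothetical, ex_real_time stands in for the function dyld would find in libSystem, and the helper does the "binding" itself instead of calling dyld_stub_binder.

```c
/* Stand-in for the real target that dyld would locate in a dylib. */
static long ex_real_time(void) { return 1354233600L; }

static long ex_bind_time(void);

/* Plays the role of the __la_symbol_ptr slot: initially aimed at the helper. */
static long (*ex_lazy_time_ptr)(void) = ex_bind_time;

/* Counts how often the slow binding path runs. */
static int ex_bind_count = 0;

/* Plays the role of the stub helper plus dyld: resolve the symbol,
   rewrite the lazy pointer, then continue on to the real function. */
static long ex_bind_time(void)
{
    ex_bind_count++;
    ex_lazy_time_ptr = ex_real_time;
    return ex_real_time();
}

/* Plays the role of the __stubs entry: an indirect jump through the pointer. */
static long ex_time_stub(void)
{
    return ex_lazy_time_ptr();
}
```

The first call to ex_time_stub() goes through ex_bind_time(); every later call reaches ex_real_time() directly, which is the behavior the paragraph above describes for the real stubs.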
Next comes the C strings section, which is very short since we only have one string:

    ___strsstart:
    L_str:
        db "Hello, world #%ld!\n",0
    ___strsend:

Now for the unwinding table. It is encoded in the "compact unwind encoding" defined by Apple (as far as I know).
    ___uwstart:
        dd 1 ; unwind info version
        dd _commonEncodings - ___uwstart ; common encodings array offset
        dd 0 ; count of common encodings
        dd _personalities - ___uwstart ; personality array offset
        dd 0 ; count of personalities
        dd _index - ___uwstart ; first-level index offset
        dd 2 ; count of entries in first-level index
    _commonEncodings:
    _personalities:
    _index:
    __entry1_0:
        dd _main ; function offset
        dd __entry2_0 - ___uwstart ; offset to second-level entry
        dd _lsda - ___uwstart ; offset to language-specific data array entry
    __entry1_1:
        dd ___codeend+1 ; function offset (end of table)
        dd 0 ; offset to second-level entry - zero means end of table
        dd _lsda - ___uwstart ; offset to LSDA
    _lsda:
    _pages:
    __entry2_0:
        dd 3 ; UNWIND_SECOND_LEVEL_COMPRESSED
        dw ___entrypage0 - __entry2_0 ; offset to entry page
        dw 1 ; number of entries in entry page
        dw ___enc0 - __entry2_0 ; offset to encoding page
        dw 1 ; number of entries in encoding page
    ___entrypage0:
    ____entrypage0_0:
        dd (0 << 24) | (0) ; encoding index and function offset relative to first-level index offset
    ___enc0:
    ____enc0_0:
        dd 0x01000000 ; UNWIND_X86_64_MODE_RBP_FRAME | UNWIND_X86_64_REG_NONE
    ___uwend:

Next comes a DWARF-encoded version of the same information. To save everyone time, I'm not going to write this out in full with comments, since it's complicated and merely repeats the unwind information above in a more verbose form.
    ___ehstart:
        db 0x14,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x01,0x7a,0x52,0x00,0x01,0x78,0x10,0x01
        db 0x10,0x0c,0x07,0x08,0x90,0x01,0x00,0x00,0x24,0x00,0x00,0x00,0x1c,0x00,0x00,0x00
        db 0x34,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0x20,0x00,0x00,0x00,0x00,0x00,0x00,0x00
        db 0x00,0x41,0x0e,0x10,0x86,0x02,0x43,0x0d,0x06,0x00,0x00,0x00,0x00,0x00,0x00,0x00
    ___ehend:

Data, data, data... well, that's about it. That wraps up the __TEXT segment. Now we have the __DATA segment, which contains the non-lazy and lazy symbol pointers:
    ___DATAload:

    ___nlsymptrstart:
    _nonlazy_dyld_stub_binder:
        dq 0x0000000000000000
    _nonlazy_table_start:
        dq 0x0000000000000000
    ___nlsymptrend:

    ___lasymptrstart:
    _lazy_printf:
        dq 0x100000000 + _stub_helper_printf
    _lazy_time:
        dq 0x100000000 + _stub_helper_time
    ___lasymptrend:

In a real executable, the __DATA segment would usually also contain static data, storage for global variables, and a few other things.
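The __LINKEDIT segment is very tricky, because its structure is arbitrary and the data in it isn't always thoroughly documented. I've done my best to present its contents in an understandable way, but I can't guarantee that I've fully succeeded.
We start with the rebase opcodes, which dyld uses when applying ASLR (address space layout randomization).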
    ___rebasestart:
        db 0x10 | 0x01 ; REBASE_OPCODE_SET_TYPE_IMM | REBASE_TYPE_POINTER
        db 0x20 | 0x02 ; REBASE_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB | indexOfSegment(__DATA) (2)
        db 0x10 ; uleb128_encode(_lazy_printf - ___DATAload)
        db 0x50 | 0x02 ; REBASE_OPCODE_DO_REBASE_IMM_TIMES | 2
        align 8, db 0 ; pad with 0 to 8-byte boundary
    ___rebaseend:

These opcodes mean: "using the pointer type, starting at offset 0x10 in the __DATA segment, rebase 2 pointers relative to the segment's load address."
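To make the opcode stream concrete, here is a small C interpreter for just the four rebase opcodes that appear in the listing. It is a sketch under assumptions: the opcode values are the ones from <mach-o/loader.h> (redefined locally with an EX_ prefix of my own), the struct and function names are hypothetical, each rebased pointer is assumed to be 8 bytes wide, and any other opcode is simply rejected rather than handled.

```c
#include <stddef.h>
#include <stdint.h>

#define EX_REBASE_OPCODE_MASK                        0xF0
#define EX_REBASE_IMMEDIATE_MASK                     0x0F
#define EX_REBASE_OPCODE_DONE                        0x00
#define EX_REBASE_OPCODE_SET_TYPE_IMM                0x10
#define EX_REBASE_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB 0x20
#define EX_REBASE_OPCODE_DO_REBASE_IMM_TIMES         0x50

/* One location that needs the image's slide added to it. */
struct ex_rebase {
    uint8_t  segment; /* index of the segment */
    uint64_t offset;  /* offset of the pointer inside that segment */
};

/* Run the opcode stream p[0..len). Writes at most max locations to out and
   returns how many were written, or -1 on truncated input, an opcode this
   sketch does not handle, or too many results. */
static int ex_run_rebase(const uint8_t *p, size_t len, struct ex_rebase *out, int max)
{
    size_t i = 0;
    int n = 0;
    uint8_t seg = 0;
    uint64_t off = 0;
    while (i < len) {
        uint8_t op  = p[i] & EX_REBASE_OPCODE_MASK;
        uint8_t imm = p[i] & EX_REBASE_IMMEDIATE_MASK;
        i++;
        switch (op) {
        case EX_REBASE_OPCODE_DONE:
            return n;
        case EX_REBASE_OPCODE_SET_TYPE_IMM:
            break; /* the type (1 = pointer) is not needed by this sketch */
        case EX_REBASE_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB: {
            unsigned shift = 0;
            uint8_t b;
            seg = imm;
            off = 0;
            do { /* decode the ULEB128 operand that follows the opcode */
                if (i >= len || shift > 63)
                    return -1;
                b = p[i++];
                off |= (uint64_t)(b & 0x7F) << shift;
                shift += 7;
            } while (b & 0x80);
            break;
        }
        case EX_REBASE_OPCODE_DO_REBASE_IMM_TIMES:
            for (uint8_t k = 0; k < imm; k++) {
                if (n >= max)
                    return -1;
                out[n].segment = seg;
                out[n].offset = off;
                n++;
                off += 8; /* advance by one 64-bit pointer */
            }
            break;
        default:
            return -1;
        }
    }
    return n;
}
```

Fed the bytes 0x11, 0x22, 0x10, 0x52 followed by zero padding, this reports two locations in segment 2, at offsets 0x10 and 0x18, which are the two lazy symbol pointers.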
Next are the binding opcodes and the lazy binding opcodes:

    ___bindstart:
        db 0x11 ; BIND_OPCODE_SET_DYLIB_ORDINAL_IMM | 1
        db 0x40 ; BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM | 0
        db 'dyld_stub_binder',0 ; immediate operand
        db 0x51 ; BIND_OPCODE_SET_TYPE_IMM | BIND_TYPE_POINTER
        db 0x72 ; BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB | indexOfSegment(__DATA) (2)
        db 0x00 ; uleb128_encode(0)
        db 0x90 ; BIND_OPCODE_DO_BIND
        db 0x00 ; BIND_OPCODE_DONE
        align 8, db 0 ; pad with 0 to 8-byte boundary
    ___bindend:
    ___lazystart:
        db 0x72,0x10 ; BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB | indexOfSegment(__DATA) (2), uleb128_encode(0x10)
        db 0x11 ; BIND_OPCODE_SET_DYLIB_ORDINAL_IMM | 1
        db 0x40,'_printf',0 ; BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM | 0, '_printf'
        db 0x90,0x00 ; BIND_OPCODE_DO_BIND, BIND_OPCODE_DONE
        db 0x72,0x18 ; BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB | indexOfSegment(__DATA) (2), uleb128_encode(0x18)
        db 0x11 ; BIND_OPCODE_SET_DYLIB_ORDINAL_IMM | 1
        db 0x40,'_time',0 ; BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM | 0, '_time'
        db 0x90,0x00 ; BIND_OPCODE_DO_BIND, BIND_OPCODE_DONE
        align 8, db 0 ; pad with 0 to 8-byte boundary
    ___lazyend:

These opcodes bind the non-lazy symbol named dyld_stub_binder as a pointer at offset 0 in the __DATA segment. For the lazy symbols, they bind the symbol named _printf at offset 0x10 in the __DATA segment and _time at offset 0x18.
Here is the export trie:

    ___exportstart:
    _exnode0:
        db 0x00 ; terminal size
        db 0x01 ; child count
        db '_',0 ; name
        db _exnode1 - ___exportstart ; child node offset
    _exnode1:
        db 0x00 ; terminal size
        db 0x02 ; child count
        db '_mh_execute_header',0 ; name
        db _exnode3 - ___exportstart ; child node offset
    _exnode2:
        db 'main',0 ; name
        db _exnode4 - ___exportstart ; child node offset
    _exnode3:
        db 0x02 ; terminal size
        db 0x00 ; flags
        db 0x00 ; address - uleb128_encode(0)
        db 0x00 ; child count
    _exnode4:
        db 0x03 ; terminal size
        db 0x00 ; flags
        db 0x94,0x1e ; address - uleb128_encode(0xf14)
        db 0x00 ; child count
        align 8, db 0 ; pad with 0 to 8-byte boundary
    ___exportend:

This forms a trie for the two symbols exported by the executable, __mh_execute_header and _main.
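The uleb128_encode() mentioned in the comments is not a nasm function; it refers to the standard ULEB128 variable-length encoding, which stores 7 bits per byte, least significant group first, with the high bit set on every byte except the last. A minimal C sketch follows (the function name is hypothetical, and the caller is assumed to provide a buffer of at least 10 bytes, the most a 64-bit value can need):

```c
#include <stddef.h>
#include <stdint.h>

/* Encode value as ULEB128 into out and return the number of bytes written.
   out must have room for up to 10 bytes. */
static size_t ex_uleb128_encode(uint64_t value, uint8_t *out)
{
    size_t n = 0;
    do {
        uint8_t byte = value & 0x7F; /* take the low 7 bits */
        value >>= 7;
        if (value != 0)
            byte |= 0x80;            /* more bytes follow */
        out[n++] = byte;
    } while (value != 0);
    return n;
}
```

Encoding 0xf14 this way gives the two bytes 0x94, 0x1e used for the address of _main in the trie, while small values such as 0x10 and 0x18 fit in a single byte.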
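There is a compressed function starts table, represented as a series of ULEB128-encoded deltas to be added to the base code address:

    ___functionstartsstart:
        db 0x94 ; uleb128_encode(0xf14), low 7 bits (0x14) with the continuation bit set
        db 0x1e ; uleb128_encode(0xf14), remaining bits; the zero padding below ends the list
        align 8, db 0 ; pad with 0 to 8-byte boundary
    ___functionstartsend:

Here is the data-in-code table. Oops, there isn't actually any such data in this executable; the load command just gets added anyway: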
    ___datacodestart:
        align 8, db 0 ; pad with 0 to 8-byte boundary
    ___datacodeend:

Designated Requirements for Dylibs
How about some designated requirements for dylibs? I'm not really sure how this format is supposed to work; this is my best attempt at interpreting it:

    ___dylibcodesignaturesstart:
        dd 1 ; count of code signatures (maybe?)
        dd 0 ; unknown
        dd 0x14 ; unknown
        db 0xfa,0xde,0x0c,0x00,0x00,0x00,0x00,0x28
        db 0x00,0x00,0x00,0x01,0x00,0x00,0x00,0x06
        db 0x00,0x00,0x00,0x02,0x00,0x00,0x00,0x0b
        db 0x6c,0x69,0x62,0x53,0x79,0x73,0x74,0x65
        db 0x6d,0x2e,0x42,0x00,0x00,0x00,0x00,0x03 ; code signature for libSystem.B.dylib
        dd 0 ; unknown
        align 8, db 0 ; pad with 0 to 8-byte boundary
    ___dylibcodesignaturesend:

The Symbol Table
The symbol table is where most of the remaining interesting stuff happens:
    ___symtabstart:
        dd L_srcdir - ___strtabstart ; string table offset
        db 0x64 ; N_SO
        db 0x00 ; section 0
        dw 0x00 ; no desc
        dq 0 ; address 0
        dd L_srcfile - ___strtabstart ; string table offset
        db 0x64 ; N_SO
        db 0x00 ; section 0
        dw 0x00 ; no desc
        dq 0 ; address 0
        dd L_objfile - ___strtabstart ; string table offset
        db 0x66 ; N_OSO
        db 0x03 ; section 3
        dw 0x01 ; desc(?)
        dq 0x50b8c91f ; st_mtime
        dd L_empty - ___strtabstart ; no string
        db 0x2e ; N_BNSYM
        db 0x01 ; section 1
        dw 0x00 ; desc
        dq 0x100000000 + _main ; start address
        dd L_main1 - ___strtabstart ; string table offset
        db 0x24 ; N_FUN
        db 0x01 ; section 1
        dw 0x00 ; desc
        dq 0x100000f14 ; start address
        dd L_empty - ___strtabstart ; no string
        db 0x24 ; N_FUN
        db 0x00 ; section 0
        dw 0x00 ; desc
        dq 0x20 ; address
        dd L_empty - ___strtabstart ; no string
        db 0x4e ; N_ENSYM
        db 0x01 ; section 1
        dw 0x00 ; desc
        dq 0x20 ; address (n_value is 64 bits wide)
    _sym_mh_execute_header:
        dd L_mhexechead - ___strtabstart ; string table offset
        db 0x0f ; N_SECT | N_EXT
        db 0x01 ; section 1
        dw 0x0010 ; REFERENCED_DYNAMICALLY
        dq 0x100000000 + __mh_execute_header ; start address
    _sym_main:
        dd L_main2 - ___strtabstart ; string table offset
        db 0x0f ; N_SECT | N_EXT
        db 0x01 ; section 1
        dw 0x0000 ; no extra flags
        dq 0x100000000 + _main ; start address
    _sym_printf:
        dd L_printf - ___strtabstart ; string table offset
        db 0x01 ; N_UNDF | N_EXT
        db 0x00 ; no section
        dw 0x0100 ; dynamic library 1
        dq 0 ; address
    _sym_time:
        dd L_time - ___strtabstart ; string table offset
        db 0x01 ; N_UNDF | N_EXT
        db 0x00 ; no section
        dw 0x0100 ; dynamic library 1
        dq 0 ; address
    _sym_dyld_stub_binder:
        dd L_binder - ___strtabstart ; string table offset
        db 0x01 ; N_UNDF | N_EXT
        db 0x00 ; no section
        dw 0x0100 ; dynamic library 1
        dq 0 ; address
        align 8, db 0 ; pad with 0 to 8-byte boundary
    ___symtabend:

    ___indirsymstart:
        dd (_sym_printf - ___symtabstart) >> 4 ; index into symbol table
        dd (_sym_time - ___symtabstart) >> 4 ; index into symbol table
        dd (_sym_dyld_stub_binder - ___symtabstart) >> 4 ; index into symbol table
        dd 0x40000000 ; INDIRECT_SYMBOL_ABS
        dd (_sym_printf - ___symtabstart) >> 4 ; index into symbol table
        dd (_sym_time - ___symtabstart) >> 4 ; index into symbol table
        align 8, db 0 ; pad with 0 to 8-byte boundary
    ___indirsymend:

    ___strtabstart:
    L_spc:
        db ' '
    L_empty:
        db 0
    L_srcdir:
        db '/Users/gwynne/',0
    L_srcfile:
        db 'test.c',0
    L_objfile:
        db '/var/folders/b8/qgjb841d71d55cf8jh1myb540000gn/T/test-KyuIba.o',0
    L_main1:
        db '_main',0
    L_mhexechead:
        db '__mh_execute_header',0
    L_main2:
        db '_main',0
    L_printf:
        db '_printf',0
    L_time:
        db '_time',0
    L_binder:
        db 'dyld_stub_binder',0
        align 8, db 0 ; pad with 0 to 8-byte boundary
    ___strtabend:

    ___LINKEDITdataend:

This holds the symbol table (including the STABS entries), the indirect symbol table (really just a set of indexes into the symbol table that tell dyld how to use the symbol stubs when the binding opcodes aren't enough; essentially legacy data), and the string table, which holds all of the human-readable strings for the symbol table.
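The ">> 4" used in the load commands and in the indirect symbol table works because each 64-bit symbol table entry is exactly 16 bytes. The C sketch below mirrors the layout of struct nlist_64 from <mach-o/nlist.h> (with the n_un union flattened to a single n_strx field) and shows the index arithmetic; the ex_ names and helper functions are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors struct nlist_64: 4 + 1 + 1 + 2 + 8 = 16 bytes. */
struct ex_nlist_64 {
    uint32_t n_strx;  /* offset of the name in the string table */
    uint8_t  n_type;  /* type flags (N_SO, N_SECT | N_EXT, ...) */
    uint8_t  n_sect;  /* section number, or 0 for none */
    uint16_t n_desc;  /* extra description bits */
    uint64_t n_value; /* value, usually an address */
};

/* Number of entries in a symbol table of the given size in bytes. */
static uint32_t ex_symbol_count(size_t symtab_bytes)
{
    return (uint32_t)(symtab_bytes >> 4);
}

/* Index of the entry that starts at the given byte offset into the table. */
static uint32_t ex_symbol_index(size_t byte_offset)
{
    return (uint32_t)(byte_offset >> 4);
}

/* For undefined symbols in a two-level namespace image, the library
   ordinal lives in the high byte of n_desc. */
static uint8_t ex_library_ordinal(uint16_t n_desc)
{
    return (uint8_t)((n_desc >> 8) & 0xFF);
}
```

This is also why every entry in the listing needs all five fields: an entry that is even one byte short shifts everything after it and makes the shifted indexes point at the wrong symbols.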
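Conclusion
That was a big mess consisting mostly of raw hex bytes. And here's the kicker: even generated exactly as written here, it still doesn't produce a working Mach-O binary!
Why? Because I didn't handle the alignment requirements correctly, and I didn't have enough time to fix that before the article went out. All of the tables and structures here are correct, though, so hopefully it is still instructive: even the simplest binary takes this many steps to construct, and you should be very grateful for how much work ld and dyld take on for you!
As always, thanks for reading. I hope you enjoyed the article!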
Original (English)
Source: https://www.mikeash.com/pyblog/friday-qa-2012-11-30-lets-build-a-mach-o-executable.html
This is something of a followup to my last article, dyld: Dynamic Linking On OS X, in which I explored how the dynamic linker dyld does its job. This week, I’m going to recreate the function of both the compiler and the static linker, building a Mach-O binary completely from scratch with only the help of the assembler.
The Right Tool For the Right Job
The best tool on OS X for producing binary files from assembly-language inputs is, of course, the assembler, as. But, if you try to build a raw binary from this, you’ll find that as also functions as a static linker in its own right. This isn’t what we’re after.
A more flexible tool, in this particular respect, is nasm, the Netwide Assembler. nasm is installed by the Xcode commandline tools, but unfortunately, Apple ships a horrifyingly outdated version, 0.98.40, which dates back to 2007 in terms of bug fixes, and to 1999 for features. The most recent version at the time of this writing is 2.10.05, which can be installed with port install nasm, brew install nasm, or whatever other package manager of your choice. If you don’t use a package manager, you can download and compile the source yourself.
nasm 2.x includes a number of useful things, like 64-bit support, and Mach-O output. We won’t be using nasm’s Mach-O support, since the point of all this is to do it by hand, but it’d be kind of nice to build a 64-bit binary using 64-bit instructions instead of split 32-bit words!
Reinserting the Prime Program
Here’s the C source code for which we’ll build our Mach-O binary. To keep the resulting binary relatively simple, I’ve written it to avoid importing more than the bare minimum of information:

    #define NULL ((void *)0L)
    extern int printf(const char * restrict format, ...);
    typedef long time_t;
    extern time_t time(time_t *sloc);

    int main(void)
    {
        printf("Hello, world #%ld!\n", time(NULL));
        return 0;
    }

Some things to notice:
- Rather than #include <stdio.h> and #include <time.h>, I’ve manually declared printf() and time(), defined the time_t type, and macroed NULL. This avoids emitting extra debug information for the various stuff defined in the standard headers.
- I’ve defined main() as taking no parameters. This is extremely poor practice in general, but because of C’s calling conventions, it works correctly.
- I’ve used a format string that actually does a format replacement so that the compiler with which I produced my test files doesn’t get all efficient and replace it with a puts() call instead.
This generates the following assembly (built with Clang 3.3svn at -Os):
        .section __TEXT,__text,regular,pure_instructions
        .globl _main
    _main:                          ## @main
        .cfi_startproc
    ## BB#0:                        ## %entry
        pushq %rbp
    Ltmp2:
        .cfi_def_cfa_offset 16
    Ltmp3:
        .cfi_offset %rbp, -16
        movq %rsp, %rbp
    Ltmp4:
        .cfi_def_cfa_register %rbp
        xorl %edi, %edi
        callq _time
        leaq L_.str(%rip), %rdi
        movq %rax, %rsi
        xorb %al, %al
        callq _printf
        xorl %eax, %eax
        popq %rbp
        ret
        .cfi_endproc

        .section __TEXT,__cstring,cstring_literals
    L_.str:                         ## @.str
        .asciz "Hello, world #%ld!\n"

        .subsections_via_symbols

The code itself is very straightforward: Inside the __TEXT,__text section, set up a stack frame, call time(), load the L_.str string, set al to zero, call printf, zero eax, tear down the stack frame, and return. Then, in the __TEXT,__cstring section, define the L_.str label to point to a zero-terminated ASCII string. Finally, declare that no symbols in this file occur inside basic blocks, which the linker uses during dead code stripping.
The rest of the directives are related to Call Frame Information, which is used for unwinding data (‘.unwind_info’ and .eh_frame, exception handling support) and debug information (.debug_frame). We’ll be building the first two by hand.
For sanity’s sake, I’ll be omitting the full DWARF debugging information. Even for this very simple program it would represent a considerable addition to this already overlong article.
The Start of a Mach-O Executable
Our nasm input file will be used to generate a Mach-O file, so we need to start it with a Mach-O header. We’ll use the 64-bit Mach-O little-endian format, whose header looks like this:

    struct mach_header_64 {
        uint32_t      magic;      /* mach magic number identifier */
        cpu_type_t    cputype;    /* cpu specifier */
        cpu_subtype_t cpusubtype; /* machine specifier */
        uint32_t      filetype;   /* type of file */
        uint32_t      ncmds;      /* number of load commands */
        uint32_t      sizeofcmds; /* the size of all the load commands */
        uint32_t      flags;      /* flags */
        uint32_t      reserved;   /* reserved */
    };

    /* Constant for the magic field of the mach_header_64 (64-bit architectures) */
    #define MH_MAGIC_64 0xfeedfacf /* the 64-bit mach magic number */
    #define MH_CIGAM_64 0xcffaedfe /* NXSwapInt(MH_MAGIC_64) */

Here’s the nasm input for our Mach-O header:

    bits 64
    cpu x64

    __mh_execute_header:
        dd 0xfeedfacf ; MH_MAGIC_64
        dd 16777223 ; CPU_TYPE_X86 | CPU_ARCH_ABI64
        dd 0x80000003 ; CPU_SUBTYPE_I386_ALL | CPU_SUBTYPE_LIB64
        dd 2 ; MH_EXECUTE
        dd 16 ; number of load commands
        dd ___loadcmdsend - ___loadcmdsstart ; size of load commands
        dd 0x00200085 ; MH_NOUNDEFS | MH_DYLDLINK | MH_TWOLEVEL | MH_PIE
        dd 0 ; reserved
    ___loadcmdsstart:

The bits and cpu directives just tell nasm to run in 64-bit mode.
Immediately after the Mach-O header comes the load commands. There’s a whole list of commands which are required for an executable, and a huge pile more which might be in one. Clang produces 16 load commands for this executable. A load command looks like this:
    struct load_command {
        uint32_t cmd; /* type of load command */
        uint32_t cmdsize; /* total size of command in bytes */
    };
Each load command is actually larger than this; the cmd field tells the loader how to interpret the following data. Load commands must be aligned to an 8-byte boundary for 64-bit Mach-O files.
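Because every command starts with the same two fields, a reader can skip over commands it doesn't understand. The following C sketch walks the command list that way; it is only an illustration, assuming the file is already in memory and in host byte order, and `sketch_count_commands` is a name I made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value in host byte order without alignment assumptions. */
static uint32_t sketch_read_u32(const uint8_t *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical helper: count the load commands of type `wanted` in a
   64-bit Mach-O image held in buf. Returns -1 if a command is malformed
   or runs past the end of the buffer. */
int sketch_count_commands(const uint8_t *buf, size_t len, uint32_t wanted) {
    assert(buf != NULL);
    const size_t header_size = 32;               /* sizeof(struct mach_header_64) */
    if (len < header_size) return -1;
    uint32_t ncmds = sketch_read_u32(buf + 16);  /* ncmds is the fifth field */
    size_t off = header_size;
    int found = 0;
    for (uint32_t i = 0; i < ncmds; i++) {
        if (len - off < 8) return -1;            /* no room for cmd + cmdsize */
        uint32_t cmd     = sketch_read_u32(buf + off);
        uint32_t cmdsize = sketch_read_u32(buf + off + 4);
        /* cmdsize covers the whole command and must keep 8-byte alignment. */
        if (cmdsize < 8 || cmdsize % 8 != 0 || cmdsize > len - off) return -1;
        if (cmd == wanted) found++;
        off += cmdsize;                          /* jump to the next command */
    }
    return found;
}
```

For the executable built in this article, counting command type 0x19 (LC_SEGMENT_64) this way would be expected to find the four segment commands.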
Segments and Sections
Segments are the blocks of data and code which dyld actually maps into memory at runtime. Sections are subdivisions of segments. Segments and sections both have names, and quite a few are standard and predefined.
Here’s our first segment command:
    ___pagezerostart:
        dd 0x19 ; LC_SEGMENT_64
        dd ___pagezeroend - ___pagezerostart ; command size
        db '__PAGEZERO',0,0,0,0,0,0 ; segment name (pad to 16 bytes)
        dq 0 ; VM address
        dq 0x100000000 ; VM size
        dq 0 ; file offset
        dq 0 ; file size
        dd 0x0 ; VM_PROT_NONE (maximum protection)
        dd 0x0 ; VM_PROT_NONE (initial protection)
        dd 0 ; number of sections
        dd 0x0 ; flags
        align 8, db 0 ; pad with zero to 8-byte boundary
    ___pagezeroend:
This is the __PAGEZERO segment, which predefines the entire lower 4GB of the 64-bit virtual memory space as inaccessible. Because this segment is marked unreadable, unwritable, and nonexecutable, dereferencing a NULL pointer causes an immediate segmentation fault.
The next segment command is more complicated:
    ___TEXTstart:
        dd 0x19 ; LC_SEGMENT_64
        dd ___TEXTend - ___TEXTstart ; command size
        db '__TEXT',0,0,0,0,0,0,0,0,0,0 ; segment name (pad to 16 bytes)
        dq 0x100000000 ; VM address
        dq 0x1000 ; VM size
        dq 0 ; file offset
        dq 0x1000 ; file size
        dd 0x7 ; VM_PROT_READ | VM_PROT_WRITE | VM_PROT_EXECUTE
        dd 0x5 ; VM_PROT_READ | VM_PROT_EXECUTE
        dd 6 ; number of sections
        dd 0x0 ; flags
    ___TEXTtextstart:
        db '__text',0,0,0,0,0,0,0,0,0,0 ; section name (pad to 16 bytes)
        db '__TEXT',0,0,0,0,0,0,0,0,0,0 ; segment name (pad to 16 bytes)
        dq 0x100000000 + ___codestart - ___TEXTload ; address
        dq ___codeend - ___codestart ; size
        dd ___codestart ; offset
        dd 0 ; alignment as power of 2 (1)
        dd 0 ; relocations data offset
        dd 0 ; number of relocations
        dd 0x80000400 ; S_REGULAR | S_ATTR_PURE_INSTRUCTIONS | S_ATTR_SOME_INSTRUCTIONS
        dd 0 ; reserved1
        dd 0 ; reserved2
        dd 0 ; reserved3
    ___TEXTstubsstart:
        db '__stubs',0,0,0,0,0,0,0,0,0 ; section name (pad to 16 bytes)
        db '__TEXT',0,0,0,0,0,0,0,0,0,0 ; segment name (pad to 16 bytes)
        dq 0x100000000 + ___stubstart - ___TEXTload ; address
        dq ___stubend - ___stubstart ; size
        dd ___stubstart ; offset
        dd 1 ; alignment as power of 2 (2)
        dd 0 ; relocations data offset
        dd 0 ; number of relocations
        dd 0x80000408 ; S_SYMBOL_STUBS | S_ATTR_PURE_INSTRUCTIONS | S_ATTR_SOME_INSTRUCTIONS
        dd 0 ; reserved1 (index into indirect symbol table)
        dd 6 ; reserved2 (size per stub)
        dd 0 ; reserved3
    ___TEXTstubhelperstart:
        db '__stub_helper',0,0,0 ; section name (pad to 16 bytes)
        db '__TEXT',0,0,0,0,0,0,0,0,0,0 ; segment name (pad to 16 bytes)
        dq 0x100000000 + ___stubhelpstart - ___TEXTload ; address
        dq ___stubhelpend - ___stubhelpstart ; size
        dd ___stubhelpstart ; offset
        dd 2 ; alignment as power of 2 (4)
        dd 0 ; relocations data offset
        dd 0 ; number of relocations
        dd 0x80000400 ; S_REGULAR | S_ATTR_PURE_INSTRUCTIONS | S_ATTR_SOME_INSTRUCTIONS
        dd 0 ; reserved1
        dd 0 ; reserved2
        dd 0 ; reserved3
    ___TEXTcstringstart:
        db '__cstring',0,0,0,0,0,0,0 ; section name (pad to 16 bytes)
        db '__TEXT',0,0,0,0,0,0,0,0,0,0 ; segment name (pad to 16 bytes)
        dq 0x100000000 + ___strsstart - ___TEXTload ; address
        dq ___strsend - ___strsstart ; size
        dd ___strsstart ; offset
        dd 0 ; alignment as power of 2 (1)
        dd 0 ; relocations data offset
        dd 0 ; number of relocations
        dd 0x00000002 ; S_CSTRING_LITERALS
        dd 0 ; reserved1
        dd 6 ; reserved2
        dd 0 ; reserved3
    ___TEXTunwindinfostart:
        db '__unwind_info',0,0,0 ; section name (pad to 16 bytes)
        db '__TEXT',0,0,0,0,0,0,0,0,0,0 ; segment name (pad to 16 bytes)
        dq 0x100000000 + ___uwstart - ___TEXTload ; address
        dq ___uwend - ___uwstart ; size
        dd ___uwstart ; offset
        dd 0 ; alignment as power of 2 (1)
        dd 0 ; relocations data offset
        dd 0 ; number of relocations
        dd 0x00000000 ; no flags
        dd 0 ; reserved1
        dd 0 ; reserved2
        dd 0 ; reserved3
    ___TEXTehframestart:
        db '__eh_frame',0,0,0,0,0,0 ; section name (pad to 16 bytes)
        db '__TEXT',0,0,0,0,0,0,0,0,0,0 ; segment name (pad to 16 bytes)
        dq 0x100000000 + ___ehstart - ___TEXTload ; address
        dq ___ehend - ___ehstart ; size
        dd ___ehstart ; offset
        dd 3 ; alignment as power of 2 (8)
        dd 0 ; relocations data offset
        dd 0 ; number of relocations
        dd 0x00000000 ; no flags
        dd 0 ; reserved1
        dd 0 ; reserved2
        dd 0 ; reserved3
        align 8, db 0 ; pad with zero to 8-byte boundary
    ___TEXTend:
So, this is the __TEXT segment, which covers all the executable code and a good bit of other data. It contains six sections. Each section is aligned according to its section information, and all the sections are shoved together at the end of the segment, such that the first quite-a-few bytes of __TEXT are zeroed. However, because of how the linker maps segments, __TEXT actually includes all the Mach-O headers. As we’ll see later, the symbol table even has its own entry for __mh_execute_header. Here are the sections:
- __text - The actual code of the executable, where all the functions are. In this case, just one function - main(). It’s marked as S_REGULAR, which means “it’s a plain old section”, and flagged as containing both “some instructions” (at least some executable code) and “pure instructions” (only executable code).
- __stubs - The jump table which redirects into the lazy and non-lazy symbol sections. See my previous article for an explanation of the contents of this section. It’s marked as S_SYMBOL_STUBS, the meaning of which is fairly obvious.
- __stub_helper - The helper function for lazy dynamically bound symbols.
- __cstring - A section containing the read-only C string literals used within the code.
- __unwind_info - The compact unwind information for the executable’s code. Generated for exception handling on OS X.
- __eh_frame - The DWARF2 unwind information for the executable’s code. Generated for exception handling and debugging.
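The command size of a segment follows directly from how many sections it carries. Here is a C sketch of that arithmetic; the two structs copy the field layout of segment_command_64 and section_64 as I understand <mach-o/loader.h> to define them, under made-up `sketch_` names.

```c
#include <assert.h>
#include <stdint.h>

/* Field-for-field copies of segment_command_64 and section_64,
   renamed so they don't clash with the real header. */
struct sketch_segment_command_64 {
    uint32_t cmd, cmdsize;
    char     segname[16];
    uint64_t vmaddr, vmsize, fileoff, filesize;
    uint32_t maxprot, initprot, nsects, flags;
};

struct sketch_section_64 {
    char     sectname[16], segname[16];
    uint64_t addr, size;
    uint32_t offset, align, reloff, nreloc, flags;
    uint32_t reserved1, reserved2, reserved3;
};

/* Hypothetical helper: cmdsize of an LC_SEGMENT_64 with nsects sections. */
uint32_t sketch_segment_cmdsize(uint32_t nsects) {
    /* Both layouts have no padding on the usual 64-bit ABIs. */
    assert(sizeof(struct sketch_segment_command_64) == 72);
    assert(sizeof(struct sketch_section_64) == 80);
    return (uint32_t)(sizeof(struct sketch_segment_command_64)
                      + nsects * sizeof(struct sketch_section_64));
}
```

By this arithmetic a segment with no sections takes 72 bytes and the six-section __TEXT command takes 552. Both are multiples of 8, so the align 8 line at the end of each segment command emits no padding.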
Next comes the __DATA segment:
    ___DATAstart:
        dd 0x19 ; LC_SEGMENT_64
        dd ___DATAend - ___DATAstart ; command size
        db '__DATA',0,0,0,0,0,0,0,0,0,0 ; segment name (pad to 16 bytes)
        dq 0x100001000 ; VM address
        dq 0x1000 ; VM size
        dq 0x1000 ; file offset
        dq 0x1000 ; file size
        dd 0x7 ; VM_PROT_READ | VM_PROT_WRITE | VM_PROT_EXECUTE
        dd 0x3 ; VM_PROT_READ | VM_PROT_WRITE
        dd 2 ; number of sections
        dd 0x0 ; flags
    ___DATAnlsymptrstart:
        db '__nl_symbol_ptr',0 ; section name (pad to 16 bytes)
        db '__DATA',0,0,0,0,0,0,0,0,0,0 ; segment name (pad to 16 bytes)
        dq 0x100001000 + ___nlsymptrstart - ___DATAload ; address
        dq ___nlsymptrend - ___nlsymptrstart ; size
        dd ___nlsymptrstart ; offset
        dd 3 ; alignment as power of 2 (8)
        dd 0 ; relocations data offset
        dd 0 ; number of relocations
        dd 0x00000006 ; S_NON_LAZY_SYMBOL_POINTERS
        dd 2 ; reserved1 (index into indirect symbol table)
        dd 0 ; reserved2
        dd 0 ; reserved3
    ___DATAlasymptrstart:
        db '__la_symbol_ptr',0 ; section name (pad to 16 bytes)
        db '__DATA',0,0,0,0,0,0,0,0,0,0 ; segment name (pad to 16 bytes)
        dq 0x100001000 + ___lasymptrstart - ___DATAload ; address
        dq ___lasymptrend - ___lasymptrstart ; size
        dd ___lasymptrstart ; offset
        dd 3 ; alignment as power of 2 (8)
        dd 0 ; relocations data offset
        dd 0 ; number of relocations
        dd 0x00000007 ; S_LAZY_SYMBOL_POINTERS
        dd 4 ; reserved1 (index into indirect symbol table)
        dd 0 ; reserved2
        dd 0 ; reserved3
        align 8, db 0 ; pad with zero to 8-byte boundary
    ___DATAend:
There are only two sections here, since this program doesn’t have any global or static data: the non-lazy and lazy symbol pointers.
And then the last segment, __LINKEDIT:
    ___LINKEDITstart:
        dd 0x19 ; LC_SEGMENT_64
        dd ___LINKEDITend - ___LINKEDITstart ; command size
        db '__LINKEDIT',0,0,0,0,0,0 ; segment name (pad to 16 bytes)
        dq 0x100002000 ; VM address
        dq 0x1000 ; VM size
        dq 0x2000 ; file offset
        dq ___LINKEDITdataend - ___LINKEDITdatastart ; file size
        dd 0x7 ; VM_PROT_READ | VM_PROT_WRITE | VM_PROT_EXECUTE
        dd 0x1 ; VM_PROT_READ
        dd 0 ; number of sections
        dd 0x0 ; flags
        align 8, db 0 ; pad with zero to 8-byte boundary
    ___LINKEDITend:
The __LINKEDIT segment contains a variety of data used by dyld, such as the symbol table, the indirect symbol table, the rebase opcodes, the binding opcodes, the exports table, the function starts information, the data-in-code table, and some codesigning data.
Lots and Lots of Linker Data
The next several load commands deal with static and dynamic linking information:
    ___dyldinfostart:
        dd 0x80000022 ; LC_DYLD_INFO | LC_REQ_DYLD
        dd ___dyldinfoend - ___dyldinfostart ; command size
        dd ___rebasestart ; rebase info offset
        dd ___rebaseend - ___rebasestart ; rebase info size
        dd ___bindstart ; binding info offset
        dd ___bindend - ___bindstart ; binding info size
        dd 0 ; weak binding info offset
        dd 0 ; weak binding info size
        dd ___lazystart ; lazy binding info offset
        dd ___lazyend - ___lazystart ; lazy binding info size
        dd ___exportstart ; export info offset
        dd ___exportend - ___exportstart ; export info size
        align 8, db 0 ; pad with zero to 8-byte boundary
    ___dyldinfoend:
    ___symtabinfostart:
        dd 0x2 ; LC_SYMTAB
        dd ___symtabinfoend - ___symtabinfostart ; command size
        dd ___symtabstart ; symbol table offset
        dd (___symtabend - ___symtabstart) >> 4 ; number of symbols
        dd ___strtabstart ; string table offset
        dd ___strtabend - ___strtabstart ; string table size
        align 8, db 0 ; pad with zero to 8-byte boundary
    ___symtabinfoend:
    ___dysymtabinfostart:
        dd 0xb ; LC_DYSYMTAB
        dd ___dysymtabinfoend - ___dysymtabinfostart ; command size
        dd 0 ; local symbols index
        dd 8 ; number of local symbols
        dd 8 ; external symbols index
        dd 2 ; number of external symbols
        dd 10 ; undefined symbols index
        dd 3 ; number of undefined symbols
        dd 0 ; table of contents offset
        dd 0 ; table of contents entries
        dd 0 ; module table offset
        dd 0 ; module table entries
        dd 0 ; external references table offset
        dd 0 ; external references table entries
        dd ___indirsymstart ; indirect symbol table offset
        dd (___indirsymend - ___indirsymstart) >> 2 ; indirect symbol table entries
        dd 0 ; external relocation table offset
        dd 0 ; external relocation table entries
        dd 0 ; local relocation table offset
        dd 0 ; local relocation table entries
        align 8, db 0 ; pad with zero to 8-byte boundary
    ___dysymtabinfoend:
    ___loaddylinkerstart:
        dd 0xe ; LC_LOAD_DYLINKER
        dd ___loaddylinkerend - ___loaddylinkerstart ; command size
        dd ___loaddylinkername - ___loaddylinkerstart ; offset to name
    ___loaddylinkername:
        db '/usr/lib/dyld',0 ; name
        align 8, db 0 ; pad with zero to 8-byte boundary
    ___loaddylinkerend:
    ___maincmdstart:
        dd 0x80000028 ; LC_MAIN | LC_REQ_DYLD
        dd ___maincmdend - ___maincmdstart ; command size
        dq _main ; offset of main from start of __TEXT
        dq 0 ; stack size
        align 8, db 0 ; pad with zero to 8-byte boundary
    ___maincmdend:
    ___loadlibsystemstart:
        dd 0xc ; LC_LOAD_DYLIB
        dd ___loadlibsystemend - ___loadlibsystemstart ; command size
        dd ___loadlibsystemname - ___loadlibsystemstart ; offset to path
        dd 2 ; UNIX time stamp Wed Dec 31 19:00:02 1969
        dd 0x00a90300 ; current version (169.3.0)
        dd 0x00010000 ; compatibility version (1.0.0)
    ___loadlibsystemname:
        db '/usr/lib/libSystem.B.dylib' ; path
        align 8, db 0 ; pad with zero to 8-byte boundary
    ___loadlibsystemend:
    ___fstartscmdstart:
        dd 0x26 ; LC_FUNCTION_STARTS
        dd ___fstartscmdend - ___fstartscmdstart ; command size
        dd ___functionstartsstart ; offset to function starts data (fun label name, isn't it?)
        dd ___functionstartsend - ___functionstartsstart ; size of function starts data (even more fun name!)
        align 8, db 0 ; pad with zero to 8-byte boundary
    ___fstartscmdend:
    ___datacodecmdstart:
        dd 0x29 ; LC_DATA_IN_CODE
        dd ___datacodecmdend - ___datacodecmdstart ; command size
        dd ___datacodestart ; offset to data-in-code information
        dd ___datacodeend - ___datacodestart ; size of data-in-code information
        align 8, db 0 ; pad with zero to 8-byte boundary
    ___datacodecmdend:
    ___dycodesigncmdstart:
        dd 0x2b ; LC_DYLIB_CODE_SIGN_DRS
        dd ___dycodesigncmdend - ___dycodesigncmdstart ; command size
        dd ___dylibcodesignaturesstart ; offset to code signatures from dylibs
        dd ___dylibcodesignaturesend - ___dylibcodesignaturesstart ; you get the idea, right?
        align 8, db 0 ; pad with zero to 8-byte boundary
    ___dycodesigncmdend:
To summarize, this long blather of data consists of:
- A list of dynamic linking info for the binary. This command, along with some others, is marked with LC_REQ_DYLD, meaning that if the version of dyld loading the binary doesn’t understand the command, it must give up right then rather than continue without the information.
- The location of the symbol and string tables. These are given as offsets from the beginning of the file, but it is understood that the data is contained within the __LINKEDIT segment. At runtime, dyld will perform the calculation symtable_base_address = linkedit_base_address + (symtab_offset - linkedit_offset) to get the actual location in memory of the symbol table. The same calculation is repeated for the string table, as well as for the offsets given in the LC_DYLD_INFO and LC_DYSYMTAB commands.
- A set of dynamic symbol data for the binary, giving the offsets and counts within the symbol table for various types of symbols.
- The LC_LOAD_DYLINKER command, which gives the hardcoded path of the dynamic linker to load the executable with. This is used by the kernel, not by the dynamic linker itself: the kernel runs the specified program when the process is spawned. Don’t get the idea that you can use this to subvert the loading process, however; the kernel won’t let you pick just any dynamic linker.
- LC_MAIN, a replacement for the older LC_UNIXTHREAD command. It used to be that executables were initialized with a thread state specified within the binary itself, but recently, someone realized this was a waste of time and space with dyld running early and the state being exactly the same in practically every executable. Instead, LC_MAIN gives the offset of the entry point (main()) and dyld jumps right to that instead, also replacing the old crt1.o object which contained glue code to set up main().
- LC_LOAD_DYLIB is the “I link to this dynamic library for some of my undefined symbols” command. This binary only links to libSystem.B.dylib, the OS X equivalent of libc.
- LC_FUNCTION_STARTS is a table of data in the __LINKEDIT segment which gives the address of every function entry point in the executable. Among other things, this allows for functions to exist that have no entries in the symbol table.
- LC_DATA_IN_CODE is similarly a table giving the locations of data bytes which are embedded within executable code. This is useful for any number of purposes, not the least of which is accurate disassembly.
- LC_DYLIB_CODE_SIGN_DRS, finally, gives a list of designated requirements for each dynamic library linked with the executable. This allows the code signing machinery to determine the suitability of the executable without having to load every dynamic library it links to.
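The address calculation described for the symbol table in the list above can be written down as a one-line C function. This is only a sketch of the formula as stated; the function and parameter names are mine, and linkedit_base_address is assumed to already include any ASLR slide.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring
   symtable_base_address = linkedit_base_address + (symtab_offset - linkedit_offset):
   translate a file offset inside __LINKEDIT into a runtime address. */
uint64_t sketch_linkedit_address(uint64_t linkedit_base_address,
                                 uint64_t linkedit_file_offset,
                                 uint64_t data_file_offset) {
    /* The data must sit at or after the start of the segment in the file. */
    assert(data_file_offset >= linkedit_file_offset);
    return linkedit_base_address + (data_file_offset - linkedit_file_offset);
}
```

With the values used in this article (__LINKEDIT at VM address 0x100002000 and file offset 0x2000), a table at file offset 0x2040 would land at 0x100002040 when the image isn't slid.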
A Few More!
Just when you thought we were done, there are three more load commands we haven’t covered yet:
    ___uuidstart:
        dd 0x1b ; LC_UUID
        dd ___uuidend - ___uuidstart ; command size
        db 0xd3,0xec,0x58,0x28,0x02,0x26,0x36,0x29,0xab,0xc3,0x7d,0x6d,0xc9,0xf9,0x2d,0xda ; D3EC5828-0226-3629-ABC3-7D6DC9F92DDA
        align 8, db 0 ; pad with zero to 8-byte boundary
    ___uuidend:
    ___osverstart:
        dd 0x24 ; LC_VERSION_MIN_MACOSX
        dd ___osverend - ___osverstart ; command size
        dd 0x000a0800 ; OS min version: 10.8
        dd 0x000a0800 ; Build SDK version: 10.8
        align 8, db 0 ; pad with zero to 8-byte boundary
    ___osverend:
    ___sourceverstart:
        dd 0x2a ; LC_SOURCE_VERSION
        dd ___sourceverend - ___sourceverstart ; command size
        dq 0 ; Source version: 0.0.0.0.0
        align 8, db 0 ; pad with zero to 8-byte boundary
    ___sourceverend:
    ___loadcmdsend:
These are the binary’s UUID, the version of OS X it’s meant for, the version of the SDK it was linked against, and the “source version”. I can’t find any clue what the “source version” actually is, and it’s just a bunch of zeroes in the binaries I’ve looked at, so your guess is as good as mine.
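The 32-bit version numbers in LC_VERSION_MIN_MACOSX and LC_LOAD_DYLIB pack major.minor.patch into 16, 8, and 8 bits. A minimal C sketch of that packing, with a helper name of my own choosing:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack major.minor.patch into the xxxx.yy.zz layout
   (16 bits of major, 8 bits of minor, 8 bits of patch). */
uint32_t sketch_pack_version(uint32_t major, uint32_t minor, uint32_t patch) {
    assert(major <= 0xffff && minor <= 0xff && patch <= 0xff);
    return (major << 16) | (minor << 8) | patch;
}
```

So `sketch_pack_version(10, 8, 0)` gives the 0x000a0800 used for both the minimum OS and SDK versions, and `sketch_pack_version(169, 3, 0)` gives the 0x00a90300 recorded as libSystem's current version.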
Finally, Something Else!
The first thing we do now is pad out the file to the start of main():
    ___TEXTload:
        times (0xf14-($-$$)) db 0 ; pad the __TEXT segment
You might ask why I hardcoded the start address there instead of writing _main-($-$$). It certainly looks fragile. Well, it is. The problem is that nasm doesn’t provide a simple means to align data to the “end” of a segment, especially since we’re not using its built-in sectioning support. It doesn’t know where _main is until the padding has been added! In this case, I just hardcode the offset where main() starts (which is the exact value of the __TEXT,__text section’s addr field) and let it stand as a hack, rather than trying to figure out an elegant-but-complicated solution.
Now we take the data in order; we don’t even really have to do it in any particular order, since the labels we used in the load commands will relocate everything according to where we place it in the file, but there’s no reason not to. The first thing is __TEXT,__text, the executable code. Notice that we have to rewrite the original assembly code to nasm’s syntax - nasm uses the Intel syntax, rather than the GNU syntax. The major difference is that all the operands are backwards, and there’s no qualifier on the register names. All the various directives are also stripped out, since we’re doing their jobs by hand.
    ___codestart:
    _main:
        push rbp
        mov rbp, rsp
        xor edi, edi
        call _time
        lea rdi, [rel L_str]
        mov rsi, rax
        xor al, al
        call _printf
        xor eax, eax
        pop rbp
        ret
    ___codeend:
We also don’t have any size suffixes on the instructions, since nasm can infer them from the operands. The rel qualifier for the string load just tells nasm to generate a rip-relative access instead of an absolute position, which is necessary since we marked the executable as position-independent.
Next we have the symbol stubs for time() and printf(), and the stub helper:
    ___stubstart:
    _printf:
        jmp [rel _lazy_printf]
    _time:
        jmp [rel _lazy_time]
    ___stubend:
    ___stubhelpstart:
    _stub_helper:
        lea r11, [rel _nonlazy_dyld_stub_binder]
        push r11
        jmp [rel _nonlazy_dyld_stub_binder]
        nop
    _stub_helper_printf: ; initial target of _lazy_printf
        push strict qword (_lazy_printf - ___lasymptrstart)
        jmp _stub_helper
    _stub_helper_time: ; initial target of _lazy_time
        push strict qword (_lazy_time - ___lasymptrstart)
        jmp _stub_helper
    ___stubhelpend:
The stubs themselves jump through the lazy symbol pointers in the __DATA segment. These initially point right back into the bottom of the stub helper, which pushes the symbol’s offset into the lazy symbol section and jumps into dyld itself through a non-lazy pointer (which will be bound by dyld when the executable is loaded). dyld will bind the symbol and rewrite the lazy symbol pointer so that future calls to that stub go directly to the function. Notice that these are all direct, unconditional jumps, not subroutine calls. Also notice the strict qword qualifiers on the pushes: they keep nasm from shrinking the immediate to its one-byte form, so every helper entry has the same size. (x86-64 has no push with a 64-bit immediate; the CPU sign-extends a 32-bit one to 64 bits.)
Next comes the C strings section, very short and simple since we only have one string:
    ___strsstart:
    L_str:
        db `Hello, world #%ld!\n`,0 ; backquotes, so that nasm expands the \n escape
    ___strsend:
And now the unwinding table. This is encoded with the “compact unwind encoding” defined by Apple (as far as I know).
    ___uwstart:
        dd 1 ; unwind info version
        dd _commonEncodings - ___uwstart ; common encodings array offset
        dd 0 ; count of common encodings
        dd _personalities - ___uwstart ; personality array offset
        dd 0 ; count of personalities
        dd _index - ___uwstart ; first-level index offset
        dd 2 ; count of entries in first-level index
    _commonEncodings:
    _personalities:
    _index:
    __entry1_0:
        dd _main ; function offset
        dd __entry2_0 - ___uwstart ; offset to second-level entry
        dd _lsda - ___uwstart ; offset to language-specific data array entry
    __entry1_1:
        dd ___codeend+1 ; function offset (end of table)
        dd 0 ; offset to second-level entry - zero means end of table
        dd _lsda - ___uwstart ; offset to LSDA
    _lsda:
    _pages:
    __entry2_0:
        dd 3 ; UNWIND_SECOND_LEVEL_COMPRESSED
        dw ___entrypage0 - __entry2_0 ; offset to entry page
        dw 1 ; number of entries in entry page
        dw ___enc0 - __entry2_0 ; offset to encoding page
        dw 1 ; number of entries in encoding page
    ___entrypage0:
    ____entrypage0_0:
        dd (0 << 24) | (0) ; encoding index and function offset relative to first-level index offset
    ___enc0:
    ____enc0_0:
        dd 0x01000000 ; UNWIND_X86_64_MODE_RBP_FRAME | UNWIND_X86_64_REG_NONE
    ___uwend:
And then the DWARF-encoded version of the same information. To save everyone some time, I’m not going to write this part out with all the comments, because it’s complex and it just duplicates the unwinding info above in a much more verbose fashion.
    ___ehstart:
        db 0x14,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x01,0x7a,0x52,0x00,0x01,0x78,0x10,0x01
        db 0x10,0x0c,0x07,0x08,0x90,0x01,0x00,0x00,0x24,0x00,0x00,0x00,0x1c,0x00,0x00,0x00
        db 0x34,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0x20,0x00,0x00,0x00,0x00,0x00,0x00,0x00
        db 0x00,0x41,0x0e,0x10,0x86,0x02,0x43,0x0d,0x06,0x00,0x00,0x00,0x00,0x00,0x00,0x00
    ___ehend:
Data, data, data… well, sort of
That ends off the __TEXT segment. Now we have the __DATA segment, which contains the lazy and non-lazy symbol pointers:
    ___DATAload:
    ___nlsymptrstart:
    _nonlazy_dyld_stub_binder:
        dq 0x0000000000000000
    _nonlazy_table_start:
        dq 0x0000000000000000
    ___nlsymptrend:
    ___lasymptrstart:
    _lazy_printf:
        dq 0x100000000 + _stub_helper_printf
    _lazy_time:
        dq 0x100000000 + _stub_helper_time
    ___lasymptrend:
In a real executable, __DATA would usually also contain static data, space for globals, and some other stuff.
The link editor
__LINKEDIT is a real pain, because it’s arbitrarily structured and the data within it isn’t always well documented. I’ve done my best to represent what’s in it comprehensibly, but I can’t guarantee I’ve succeeded.
We start with the rebasing opcodes, which dyld uses when applying ASLR:
    ___LINKEDITdatastart: ; start of the __LINKEDIT file data, used by the segment command's file size
    ___rebasestart:
        db 0x10 | 0x01 ; REBASE_OPCODE_SET_TYPE_IMM | REBASE_TYPE_POINTER
        db 0x20 | 0x02 ; REBASE_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB | indexOfSegment(__DATA) (2)
        db 0x10 ; uleb128_encode(_lazy_printf - ___DATAload)
        db 0x50 | 0x02 ; REBASE_OPCODE_DO_REBASE_IMM_TIMES | 2
        align 8, db 0 ; pad with 0 to 8-byte boundary
    ___rebaseend:
This says, “using pointers, in the __DATA segment at offset 0x10, rebase 2 pointers based on the load address of that segment”.
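To make that reading of the rebase stream concrete, here is a C sketch of an interpreter for just the opcodes that appear in it. It is nowhere near what dyld really does: it only records what the stream asks for, the opcode values are the ones I understand <mach-o/loader.h> to define, and every `sketch_` name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {  /* opcode values assumed from <mach-o/loader.h> */
    SK_REBASE_OPCODE_MASK                        = 0xf0,
    SK_REBASE_IMMEDIATE_MASK                     = 0x0f,
    SK_REBASE_OPCODE_DONE                        = 0x00,
    SK_REBASE_OPCODE_SET_TYPE_IMM                = 0x10,
    SK_REBASE_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB = 0x20,
    SK_REBASE_OPCODE_DO_REBASE_IMM_TIMES         = 0x50
};

struct sketch_rebase_state {
    unsigned type;     /* 1 = REBASE_TYPE_POINTER */
    unsigned segment;  /* index of the segment load command */
    uint64_t offset;   /* segment offset as last set by the stream */
    unsigned count;    /* total number of pointers to rebase */
};

/* Interpret only the opcodes used in this article. Returns 0 on success,
   -1 on a truncated stream or an opcode this sketch doesn't know. */
int sketch_run_rebase(const uint8_t *p, size_t len, struct sketch_rebase_state *st) {
    assert(p != NULL && st != NULL);
    st->type = st->segment = st->count = 0;
    st->offset = 0;
    size_t i = 0;
    while (i < len) {
        uint8_t op  = p[i] & SK_REBASE_OPCODE_MASK;     /* high nibble */
        uint8_t imm = p[i] & SK_REBASE_IMMEDIATE_MASK;  /* low nibble */
        i++;
        switch (op) {
        case SK_REBASE_OPCODE_DONE:
            return 0;
        case SK_REBASE_OPCODE_SET_TYPE_IMM:
            st->type = imm;
            break;
        case SK_REBASE_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB: {
            uint64_t value = 0;
            unsigned shift = 0;
            uint8_t byte;
            st->segment = imm;
            do {                                /* ULEB128 operand follows */
                if (i >= len) return -1;
                byte = p[i++];
                value |= (uint64_t)(byte & 0x7f) << shift;
                shift += 7;
            } while ((byte & 0x80) && shift < 64);
            st->offset = value;
            break;
        }
        case SK_REBASE_OPCODE_DO_REBASE_IMM_TIMES:
            st->count += imm;
            break;
        default:
            return -1;
        }
    }
    return 0;
}
```

Fed the bytes 0x11, 0x22, 0x10, 0x52 followed by zero padding, this sketch reports pointer type 1, segment 2, offset 0x10, and a count of 2, which matches the sentence above.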
Next come the binding opcodes and lazy binding opcodes:
    ___bindstart:
        db 0x11 ; BIND_OPCODE_SET_DYLIB_ORDINAL_IMM | 1
        db 0x40 ; BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM | 0
        db 'dyld_stub_binder',0 ; immediate operand
        db 0x51 ; BIND_OPCODE_SET_TYPE_IMM | BIND_TYPE_POINTER
        db 0x72 ; BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB | indexOfSegment(__DATA) (2)
        db 0x00 ; uleb128_encode(0)
        db 0x90 ; BIND_OPCODE_DO_BIND
        db 0x00 ; BIND_OPCODE_DONE
        align 8, db 0 ; pad with 0 to 8-byte boundary
    ___bindend:
    ___lazystart:
        db 0x72,0x10 ; BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB | indexOfSegment(__DATA) (2), uleb128_encode(0x10)
        db 0x11 ; BIND_OPCODE_SET_DYLIB_ORDINAL_IMM | 1
        db 0x40,'_printf',0 ; BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM | 0, '_printf'
        db 0x90,0x00 ; BIND_OPCODE_DO_BIND, BIND_OPCODE_DONE
        db 0x72,0x18 ; BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB | indexOfSegment(__DATA) (2), uleb128_encode(0x18)
        db 0x11 ; BIND_OPCODE_SET_DYLIB_ORDINAL_IMM | 1
        db 0x40,'_time',0 ; BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM | 0, '_time'
        db 0x90,0x00 ; BIND_OPCODE_DO_BIND, BIND_OPCODE_DONE
        align 8, db 0 ; pad with 0 to 8-byte boundary
    ___lazyend:
These opcodes bind a non-lazy symbol named dyld_stub_binder to offset 0 in the __DATA segment as a pointer. For lazy symbols, they bind a symbol named _printf to offset 0x10 in the __DATA segment and _time to offset 0x18.
And here’s the export trie:
    ___exportstart:
    _exnode0:
        db 0x00 ; terminal size
        db 0x01 ; child count
        db '_',0 ; name
        db _exnode1 - ___exportstart ; child node offset
    _exnode1:
        db 0x00 ; terminal size
        db 0x02 ; child count
        db '_mh_execute_header',0 ; name
        db _exnode3 - ___exportstart ; child node offset
    _exnode2:
        db 'main',0 ; name
        db _exnode4 - ___exportstart ; child node offset
    _exnode3:
        db 0x02 ; terminal size
        db 0x00 ; flags
        db 0x00 ; address - uleb128_encode(0)
        db 0x00 ; child count
    _exnode4:
        db 0x03 ; terminal size
        db 0x00 ; flags
        db 0x94,0x1e ; address - uleb128_encode(0xf14)
        db 0x00 ; child count
        align 8, db 0 ; pad with 0 to 8-byte boundary
    ___exportend:
This forms a trie, or prefix tree, for the two symbols exported by the executable, __mh_execute_header and _main.
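The uleb128_encode() mentioned in those comments is the standard ULEB128 variable-length encoding: seven bits per byte, least significant group first, with the high bit set on every byte except the last. A C sketch of an encoder and decoder follows; the function names are my own, not anything from Apple's tools.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode value as ULEB128 into out (which must hold at least 10 bytes)
   and return the number of bytes written. */
size_t sketch_uleb128_encode(uint64_t value, uint8_t *out) {
    assert(out != NULL);
    size_t n = 0;
    do {
        uint8_t byte = value & 0x7f;    /* low seven bits */
        value >>= 7;
        if (value != 0) byte |= 0x80;   /* continuation bit: more bytes follow */
        out[n++] = byte;
    } while (value != 0);
    return n;
}

/* Decode one ULEB128 value from p, storing the byte count in *used. */
uint64_t sketch_uleb128_decode(const uint8_t *p, size_t *used) {
    assert(p != NULL && used != NULL);
    uint64_t value = 0;
    unsigned shift = 0;
    size_t n = 0;
    uint8_t byte;
    do {
        byte = p[n++];
        value |= (uint64_t)(byte & 0x7f) << shift;
        shift += 7;
    } while ((byte & 0x80) && shift < 64);
    *used = n;
    return value;
}
```

Encoding 0xf14, the file offset of _main, yields the two bytes 0x94, 0x1e seen in the trie, while small values such as 0x10 fit in a single byte.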
Here’s the compressed function starts table, represented as a set of deltas to be added to the base code address:
    ___functionstartsstart:
        db 0x94 ; low seven bits of 0xf14, with the continuation bit set
        db 0x1e ; remaining bits: together uleb128_encode(0xf14), the delta to _main
        align 8, db 0 ; pad with 0 to 8-byte boundary (the first zero also ends the list)
    ___functionstartsend:
Here’s the data-in-code table. Whoops, there isn’t any in this executable; the load command is just added anyway:
    ___datacodestart:
        align 8, db 0 ; pad with 0 to 8-byte boundary
    ___datacodeend:
How about some designated requirements for dylibs? I have no real idea what format this is in, I just interpreted it as best I could:
    ___dylibcodesignaturesstart:
        dd 1 ; count of code signatures (maybe?)
        dd 0 ; unknown
        dd 0x14 ; unknown
        db 0xfa,0xde,0x0c,0x00,0x00,0x00,0x00,0x28
        db 0x00,0x00,0x00,0x01,0x00,0x00,0x00,0x06
        db 0x00,0x00,0x00,0x02,0x00,0x00,0x00,0x0b
        db 0x6c,0x69,0x62,0x53,0x79,0x73,0x74,0x65
        db 0x6d,0x2e,0x42,0x00,0x00,0x00,0x00,0x03 ; code signature for libSystem.B.dylib
        dd 0 ; unknown
        align 8, db 0 ; pad with 0 to 8-byte boundary
    ___dylibcodesignaturesend:
A symbol table
The symbol table is where most of the interesting stuff that’s left happens:
    ___symtabstart:
        dd L_srcdir - ___strtabstart ; string table offset
        db 0x64 ; N_SO
        db 0x00 ; section 0
        dw 0x00 ; no desc
        dq 0 ; address 0
        dd L_srcfile - ___strtabstart ; string table offset
        db 0x64 ; N_SO
        db 0x00 ; section 0
        dw 0x00 ; no desc
        dq 0 ; address 0
        dd L_objfile - ___strtabstart ; string table offset
        db 0x66 ; N_OSO
        db 0x03 ; section 3
        dw 0x01 ; desc(?)
        dq 0x50b8c91f ; st_mtime
        dd L_empty - ___strtabstart ; no string
        db 0x2e ; N_BNSYM
        db 0x01 ; section 1
        dw 0x00 ; desc
        dq 0x100000000 + _main ; start address
        dd L_main1 - ___strtabstart ; string table offset
        db 0x24 ; N_FUN
        db 0x01 ; section 1
        dw 0x00 ; desc
        dq 0x100000f14 ; start address
        dd L_empty - ___strtabstart ; no string
        db 0x24 ; N_FUN
        db 0x00 ; section 0
        dw 0x00 ; desc
        dq 0x20 ; address
        dd L_empty - ___strtabstart ; no string
        db 0x4e ; N_ENSYM
        db 0x01 ; section 1
        dw 0x00 ; desc
        dq 0x20 ; address (n_value is 64 bits wide)
    _sym_mh_execute_header:
        dd L_mhexechead - ___strtabstart ; string table offset
        db 0x0f ; N_SECT | N_EXT
        db 0x01 ; section 1
        dw 0x0010 ; REFERENCED_DYNAMICALLY
        dq 0x100000000 + __mh_execute_header ; start address
    _sym_main:
        dd L_main2 - ___strtabstart ; string table offset
        db 0x0f ; N_SECT | N_EXT
        db 0x01 ; section 1 (__text)
        dw 0x0000 ; no extra flags
        dq 0x100000000 + _main ; start address
    _sym_printf:
        dd L_printf - ___strtabstart ; string table offset
        db 0x01 ; N_UNDF | N_EXT
        db 0x00 ; NO_SECT
        dw 0x0100 ; dynamic library 1
        dq 0 ; address
    _sym_time:
        dd L_time - ___strtabstart ; string table offset
        db 0x01 ; N_UNDF | N_EXT
        db 0x00 ; NO_SECT
        dw 0x0100 ; dynamic library 1
        dq 0 ; address
    _sym_dyld_stub_binder:
        dd L_binder - ___strtabstart ; string table offset
        db 0x01 ; N_UNDF | N_EXT
        db 0x00 ; NO_SECT
        dw 0x0100 ; dynamic library 1
        dq 0 ; address
        align 8, db 0 ; pad with 0 to 8-byte boundary
    ___symtabend:
    ___indirsymstart:
        dd (_sym_printf - ___symtabstart) >> 4 ; index into symbol table
        dd (_sym_time - ___symtabstart) >> 4 ; index into symbol table
        dd (_sym_dyld_stub_binder - ___symtabstart) >> 4 ; index into symbol table
        dd 0x40000000 ; INDIRECT_SYMBOL_ABS
        dd (_sym_printf - ___symtabstart) >> 4 ; index into symbol table
        dd (_sym_time - ___symtabstart) >> 4 ; index into symbol table
        align 8, db 0 ; pad with 0 to 8-byte boundary
    ___indirsymend:
    ___strtabstart:
    L_spc:
        db ' '
    L_empty:
        db 0
    L_srcdir:
        db '/Users/gwynne/',0
    L_srcfile:
        db 'test.c',0
    L_objfile:
        db '/var/folders/b8/qgjb841d71d55cf8jh1myb540000gn/T/test-KyuIba.o',0
    L_main1:
        db '_main',0
    L_mhexechead:
        db '__mh_execute_header',0
    L_main2:
        db '_main',0
    L_printf:
        db '_printf',0
    L_time:
        db '_time',0
    L_binder:
        db 'dyld_stub_binder',0
        align 8, db 0 ; pad with 0 to 8-byte boundary
    ___strtabend:
    ___LINKEDITdataend:
Here you have the symbol table (including STABS entries), the indirect symbol table (which is nothing but a set of indexes into the symbol table which tell dyld how to use the symbol stubs in the event that the binding opcodes aren’t good enough - basically, legacy data), and the string table, which holds all the user-readable strings for the symbol table.
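The ">> 4" in the indirect symbol table and in LC_SYMTAB works because each 64-bit symbol table entry is exactly 16 bytes. The C sketch below spells that out; the struct mirrors the layout of struct nlist_64 from <mach-o/nlist.h> as I understand it, and the `sketch_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout as struct nlist_64, renamed to avoid clashing with it. */
struct sketch_nlist_64 {
    uint32_t n_strx;   /* offset into the string table */
    uint8_t  n_type;   /* N_SO, N_FUN, N_SECT | N_EXT, ... */
    uint8_t  n_sect;   /* 1-based section number, 0 for none */
    uint16_t n_desc;   /* extra flags, or the library ordinal */
    uint64_t n_value;  /* address or other value */
};

/* Hypothetical helper: turn a byte offset from the start of the symbol
   table into a symbol index, as the ">> 4" expressions do. */
uint32_t sketch_symbol_index(size_t byte_offset) {
    assert(sizeof(struct sketch_nlist_64) == 16);
    assert(byte_offset % sizeof(struct sketch_nlist_64) == 0);
    return (uint32_t)(byte_offset / sizeof(struct sketch_nlist_64));
}

/* For undefined symbols in a two-level namespace image, the library
   ordinal lives in the high byte of n_desc. */
unsigned sketch_library_ordinal(uint16_t n_desc) {
    return (n_desc >> 8) & 0xff;
}
```

An n_desc of 0x0100 therefore decodes to library ordinal 1, the first (and here only) LC_LOAD_DYLIB command. The index arithmetic only holds if every entry in the nasm source really emits 16 bytes.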
Conclusion
That is one long mess of mostly raw hexadecimal bytes. And here’s the punch line: As written here, it still doesn’t produce a working Mach-O binary!
Why not? Because I didn’t account for alignment requirements properly, and I ran out of time to fix the problem before the article had to go up. All the tables and structures here are correct, though, so hopefully, it’s still instructional as to just how much goes into even the simplest binary, and how much work you should be very glad ld and dyld are doing for you!
Thanks for reading, as always. I hope you enjoyed it!