环形缓冲区与镜像内存(上)

Mike Ash Friday Q&A 中文译文:环形缓冲区与镜像内存(上)

作者 TommyWu
封面圖片: 环形缓冲区与镜像内存(上)

译文 · 原文: Friday Q&A 2012-02-03: Ring Buffers and Mirrored Memory: Part I · 作者 Mike Ash

原文:https://www.mikeash.com/pyblog/friday-qa-2012-02-03-ring-buffers-and-mirrored-memory-part-i.html 发布:2012-02-03 作者:Mike Ash 译者:MiMo(mimo-v2.5-pro);代码块保留英文原样


探索虚拟内存的应用总是很有趣,其中环形缓冲区(ring buffer)的构建就是一个很好的应用场景。环形缓冲区是一种实现数据 FIFO(先进先出)队列的方式,通过虚拟内存技巧将相同数据映射为多个副本,可以在逻辑上拼接不连续的数据,从而使实现更简洁高效。读者 Jose Vazquez 和 Dylan Lukes 建议我探讨这种构造的构建方法。今天我将讲解虚拟内存技巧的实现原理,随后在第二部分展示如何在此基础上实现环形缓冲区。

环形缓冲区环形缓冲区是一种实现快速 FIFO(先进先出)队列的方式。其工作原理是分配一块内存区域并追踪读写指针。这些指针随着数据的读写操作而移动。使其成为” 环形” 缓冲区的关键在于:当任一指针到达内存区域末端时,会自动环绕回起始位置。

为便于说明,我们假设环形缓冲区初始为空:

read
|
v
+----------+
| |
+----------+
^
|
write

当数据被写入时,数据会填满缓冲区,随后写指针向前移动:

read
|
v
+----------+
|xxxxxxx |
+----------+
^
|
write

当数据被读取时,读指针会跟随移动:

read
|
v
+----------+
| xx |
+----------+
^
|
write

当写指针(write pointer)移至末端时,它会环绕(wrap)回到开头。

read
|
v
+----------+
|xxx xxxxx|
+----------+
^
|
write

此时,环形缓冲区(ring buffer)中的数据已不再连续,因为数据延伸到末尾后会绕回到开头。环形缓冲区的代码在写入时必须将数据拆分存储,而在读取时又要将其重新拼合起来。进一步读取之后,读指针(read pointer)也会环绕一圈,此时缓冲区的状态如下所示:

read
|
v
+----------+
| xx |
+----------+
^
|
write

这种结构可以无限循环下去,每次到达末尾时就会重新绕回起点。

这种结构非常适合任何数据被持续写入后不久又被读取的场景。它相对容易实现线程安全,因此特别适用于线程间的实时数据通信。例如,环形缓冲区常用于将音频数据传递到播放线程,以及从录音线程接收数据。

镜像技术

这种技术效果良好且应用广泛。然而,回绕(wraparound)行为使情况变得稍微复杂。它增加了读写代码的复杂度,因为代码必须检查回绕情况,然后将读写操作分两次进行。更糟糕的是,回绕强制要求在将数据写入或读出环形缓冲区(ring buffer)时进行内存拷贝。由于数据不一定连续,因此无法向外部暴露指向数据的指针。例如,如果从文件读取数据到环形缓冲区,能够直接将环形缓冲区的写指针传递给读取函数会很方便,但如果空闲空间发生回绕,这就无法轻松实现。相反,你不得不通过多次读取来使情况复杂化,或者先读取到一个单独的缓冲区,然后再将内存拷贝过去。

想象一下,如果能够镜像(mirror)一块内存,使其同时出现在两个地方,那该多好。对其中一个位置的任何更改都会立即反映在另一个位置。例如,开始时是空的:

+----------+ +----------+
| | | |
+----------+ +----------+

于是当向其中一个副本写入数据时,另一个副本也能立即看到这些数据变化:

+----------+ +----------+
| xx | | xx |
+----------+ +----------+

让我们对环形缓冲区的内存进行这种操作。此外,让我们将两个镜像副本安排得直接并排排列:

read
|
v
+----------+----------+
|xxx xxxxx|xxx xxxxx|
+----------+----------+
^
|
write

来看一下!尽管数据在环形缓冲区中进行了环绕,但它们现在是连续的,因为第二个副本直接与第一个相连。即使数据发生了环绕,我们依然可以将指向它的指针传递给任何需要连续数据的地方。即使数据环绕,它仍是连续的,这归功于镜像机制。

同样的原理也适用于空闲空间,当缓冲区的数据没有发生环绕时:

read
|
v
+----------+----------+
| xx | xx |
+----------+----------+
^
|
write

再次,空闲空间是连续的,大大简化了事情。这个指针可以直接传递给读取调用或其他任何想要写入单个内存块的东西。

代码:是时候深入代码了。和往常一样,今天冒险的代码可以在 GitHub 上找到:

https://github.com/mikeash/MAMirroredQueue

使用 Mach 进行镜像

能够像这样镜像内存会非常方便。好消息!使用 Mach 虚拟内存 API(虚拟内存应用程序接口),你可以做到!

当处理虚拟内存(virtual memory)时,一切都与内存页(memory pages)相关。一个内存页是虚拟内存系统处理的一块方便大小的内存,通常是 4096 字节。内存页是虚拟内存处理的最小单位。如果你映射或取消映射内存,你只能处理整数个内存页。

虽然在任何你可能遇到的系统上,页面大小都是 4096 字节,但硬编码这个值并不是好做法。系统的页面大小可以从 vm_page_size 全局变量获取。这里有一个简单的包装器,它将全局变量隐藏在函数调用背后:

size_t get_page_size(void)
{
return vm_page_size;
}

现在,我们来讨论如何实现实际的镜像内存分配器。Mach 虚拟内存函数(Mach virtual memory functions)允许分配一块内存、将一块内存重映射到指定位置,以及释放一块内存。关键的一点是,与 mallocfree 不同,你可以只释放部分已分配的内存。换句话说,你可以分配两个页面,然后只释放其中一个,而另一个页面仍然有效。

这正是使我们能够获得一块连续镜像内存的方法。如果我们只是分配一块内存,然后尝试将其重映射到紧随其后的空间,这很可能会失败,因为那个空间很可能已经被占用。你无法直接要求系统在某个后面跟着大量空闲空间的区域分配内存。不过,可以通过间接的方法做到这一点。以下是具体操作:

  • 分配所需内存两倍大小的空间。

  • 释放刚才分配的内存的后半部分。

  • 将前半部分重映射到现在已空闲的后半部分空间。

这里存在一个竞态条件(race condition),在步骤 2 和 3 之间,另一个线程可能在新释放的后半部分内存中进行分配。不过这并非灾难。若发生这种情况,步骤 3 将失败,我们可以直接释放内存并重试整个操作。由于竞态窗口很小,该流程通常无需多次尝试即可成功。

让我们看看实现代码。我没有仅限于镜像对分配,而是将此函数设计为能够分配任意数量的连续镜像内存块。该函数接受需要分配的内存量和要创建的镜像副本数量。需要分配的内存量必须是整数个页面(page),为避免后续出现意外问题,我会检查此条件,若不符则返回错误:

void *allocate_mirrored(size_t howmuch, unsigned howmany)
{
// make sure it's positive and an exact multiple of page size
if(howmuch <= 0 || howmuch != howmuch / get_page_size() * get_page_size())
{
errno = EINVAL;
return NULL;
}

既然大小已确认无误,我们开始循环。mem 变量将用于存放分配到的指针。我们先将其初始化为 NULL,然后持续尝试直到成功。

char *mem = NULL;
while(mem == NULL)
{

我们需要对这些 Mach 调用的结果进行大量错误检查。在初始内存块分配完成后,处理错误意味着在返回错误前要释放仍然已分配的内存。在进入实际代码前,我们需要一个宏来集中处理这种错误检查。

CHECK_ERR 宏会检查一个表达式是否等于 KERN_SUCCESS 并处理清理工作。它接受第二个参数,即需要释放的字节数。如果初始分配失败,则不需要释放任何内存。后续的失败则需要根据当时实际已分配的情况,要么释放整个内存块(包括所有镜像页面),要么仅释放部分内存。如果需要,在释放内存后,宏会设置 errno(错误码)并返回 NULL:

#define CHECK_ERR(expr, todealloc) do { \
kern_return_t __check_err = (expr); \
if(__check_err != KERN_SUCCESS) \
{ \
if(todealloc > 0) \
vm_deallocate(mach_task_self(), (vm_address_t)mem, (todealloc)); \
errno = ENOMEM; \
return NULL; \
} \
} while(0)

在这些准备完成后,我们终于可以执行此函数中的第一个 Mach 系统调用,以完成初始内存分配:

CHECK_ERR(vm_allocate(mach_task_self(), (vm_address_t *)&mem, howmuch * howmany, VM_FLAGS_ANYWHERE), 0);

如果这次分配失败,无需释放任何内存,因此该宏的第二个参数设为 0。接下来,我们将释放第一个块之后所有新分配的空间。首先,通过将 howmuch 加到新分配的指针上来计算剩余部分的位置,然后使用 vm_deallocate 释放 howmany - 1 份副本:

char *target = mem + howmuch;
CHECK_ERR(vm_deallocate(mach_task_self(), (vm_address_t)target, howmuch * (howmany - 1)), howmuch * howmany);

如果此操作失败,那么我们应当释放整个已分配区域。实际上这种情况不会发生,即便真的发生,尝试释放整个区域也不太可能成功,但彻底处理总归是好的。

接下来,我们将初始块(initial chunk)重映射到剩余空间。这需要在循环中执行,每次循环处理一个额外的副本:

for(unsigned i = 1; i < howmany && mem != NULL; i++)
{

然后,将初始的内存块重新映射到当前的目标段中:

vm_prot_t curProtection, maxProtection;
kern_return_t err = vm_remap(mach_task_self(),
(vm_address_t *)&target,
howmuch,
0, // mask
0, // anywhere
mach_task_self(),
(vm_address_t)mem,
0, // copy
&curProtection,
&maxProtection,
VM_INHERIT_COPY);

这是一个相当复杂的调用,让我们看看所有参数的含义。第一个参数 mach_task_self() 告诉系统我们要将内存映射到我们自己的进程,而不是其他进程。实际上,mach 调用允许在拥有必要权限的情况下操纵其他进程的内存映射。我们在这里没有做任何那种有趣的事情,所以我们就传递 mach_task_self()

第二个参数是内存重映射的目标地址。如果你不关心重映射内存的位置,vm_remap 可以为你选择一个可用地址,并会通过这个参数以引用方式返回该地址。然而,在这个案例中,我们需要它映射到一个特定的位置,所以我们不使用那个选项。不过,我们仍然必须传递一个指针,因为这是 API 的一部分。

第三个参数是要重映射的内存量,它就是 allocate_mirrored 函数中的 howmuch 参数。

第四个参数是一个位掩码,用于指定当我们让 vm_remap 自行选择起始地址时,该地址的对齐约束。由于我们并没有这样做,这个参数无关紧要,我们直接传递 0。

第五个参数是一个标志,用于表明 vm_remap 是自行选择重映射地址,还是必须映射到指定位置。传入 0 表示必须使用提供的位置。

第六个参数是重映射操作的源任务(进程)。由于我们完全在自己的进程内操作,再次传入 mach_task_self()。第七个参数是源地址,即分配空间的起始位置,存储在 mem 中。

第八个参数是一个标志,决定内存是复制还是以可读写方式重映射。这段代码的全部目的就是实现可读写重映射,因此我们传入 0 表示不需要复制。

接下来的两个参数是输出参数,用于告知内存区域的读/写/执行保护属性。我们并不关心这些,但文档并未说明传入 NULL 是否合法,因此我们传入指向哑变量的指针。

最后一个参数说明子进程如何继承这块内存区域。VM_INHERIT_COPY 表示子进程会获得一份副本,这通常符合预期行为。子进程本身不太可能关心这块内存,因此这个参数并不那么关键。

以上就是那个复杂的重映射调用。假设操作成功,初始内存块现在已被重映射到目标区域。接下来我们可以将目标指针递增 howmuch 个字节,为需要映射下一块内存做好位置准备(在需要进行多次重映射的情况下):

target += howmuch;

既然「remap call(重映射调用)」已完成,我们需要进行错误检查。这个调用的错误检查稍微复杂一些,因为由于空间被其他东西占用而产生的错误并不是致命的。回忆一下,整个过程是在赌不会发生「race condition(竞态条件)」,即另一个线程在我们有机会重映射之前分配了镜像空间中的内存。如果发生这种情况,这不是失败,我们只需再试一次。对于这种情况,错误将是 KERN_NO_SPACE。当那个事件发生时,我们释放原始块以及到目前为止已重映射的任何部分,然后将 mem 设置为 NULL 以让循环重试:

if(err == KERN_NO_SPACE)
{
CHECK_ERR(vm_deallocate(mach_task_self(), (vm_address_t)mem, howmuch * i), 0);
mem = NULL;
}

在其他所有情况下,CHECK_ERR 宏都足够了:

else
{
CHECK_ERR(err, howmuch * i);
}

如果在到达循环末尾时,mem 被设置为某个值,那就成功了!接下来只需将新指针返回给调用者即可:

}
}
return mem;
}

当然,我们需要一个对应的函数来释放镜像内存。我们只需对指针调用 vm_deallocate,并指定正确的大小即可。但如何才能知道正确的大小呢?像 mallocfree 这类函数会存储一些元数据来描述内存区域的大小。对于这个函数,我们会简单处理,让调用者像分配函数一样,将大小和数量作为参数传入。这样一来,这就成了调用者的问题,他们可以根据自身情况用合适的方式解决:

void free_mirrored(void *ptr, size_t howmuch, unsigned howmany)
{
vm_deallocate(mach_task_self(), (vm_address_t)ptr, howmuch * howmany);
}

就是这样!我们现在可以按需分配镜像内存,并且支持任意数量的连续镜像。这里有一段简短的测试代码,用于验证其是否按预期工作:通过向镜像内存的某个区域写入数据,再从另一区域读取,确保它们的内容始终保持一致:

static void test_size(unsigned howmany, size_t howmuch)
{
char *buf = allocate_mirrored(howmuch, howmany);
unsigned short seed[3] = { 0 };
for(unsigned j = 0; j < howmany; j++)
{
for(size_t i = 0; i < howmuch; i++)
buf[i] = nrand48(seed);
if(memcmp(buf, buf + howmuch * j, howmuch) != 0)
fprintf(stderr, "FAIL: writing to first half didn't update second half with size %lu\n", (long)howmuch);
for(size_t i = 0; i < howmuch; i++)
buf[howmuch * j + i] = nrand48(seed);
if(memcmp(buf, buf + howmuch * j, howmuch) != 0)
fprintf(stderr, "FAIL: writing to second half didn't update first half with size %lu\n", (long)howmuch);
}
free_mirrored(buf, howmuch, howmany);
}

只需用不同的大小和数量调用该函数几次,并确保一切按预期工作。

void test_allocate_mirrored(void)
{
for(unsigned i = 2; i < 10; i++)
{
test_size(i, get_page_size());
test_size(i, get_page_size() * 2);
test_size(i, get_page_size() * 10);
test_size(i, get_page_size() * 100);
}
}

果然,所有结果都如预期所示。

结论

随着镜像分配器(mirrored allocator)搭建完成并开始运行,我们已经准备好在其之上构建环形缓冲区(ring buffer)了。遗憾的是,我们的时间已经用尽,所以请关注第二部分,在那里我们将详细探讨其工作原理。


#Original (English)

Source: https://www.mikeash.com/pyblog/friday-qa-2012-02-03-ring-buffers-and-mirrored-memory-part-i.html

Playing with virtual memory is always fun, and one place where it can be put to good use is in building a ring buffer. A ring buffer is a way to implement a FIFO queue of data, and using virtual memory tricks to mirror multiple copies of the same data can make the implementation simpler and better by virtually concatenating noncontiguous data. Readers Jose Vazquez and Dylan Lukes suggested that I explore the building of such a construct. Today I will talk about how to implement the virtual memory tricks, and then in part II I will show how to implement the ring buffer on top of them.

Ring BufferA ring buffer is a way to implement a fast FIFO (first in, first out) queue. It works by allocating a chunk of memory and tracking read and write pointers. Those pointers advance as data is read and written. The part that makes it a ring buffer is that, when one of those pointers goes off the end, it wraps back to the beginning.

To illustrate, the ring buffer starts out empty:

read
|
v
+----------+
| |
+----------+
^
|
write

When data is written, the data fills out the buffer and the write pointer is advanced:

read
|
v
+----------+
|xxxxxxx |
+----------+
^
|
write

The read pointer follows when data is read:

read
|
v
+----------+
| xx |
+----------+
^
|
write

When the write pointer goes off the end, it wraps back to the beginning:

read
|
v
+----------+
|xxx xxxxx|
+----------+
^
|
write

At this point, the data in the ring buffer is no longer contiguous, as it wraps off the end and back to the beginning. The ring buffer code has to split the data apart when writing and then stitch it back together when reading it back out again. After reading some more, the read pointer also wraps around and the buffer looks like this:

read
|
v
+----------+
| xx |
+----------+
^
|
write

And it can continue like this forever, wrapping around each time it hits the end.

This structure is great for anything where data is continuously written and then read shortly afterwards. It’s relatively easy to make thread safe, so it can be especially useful for communication of realtime data between threads. Ring buffers are commonly used for communicating audio into playback threads and out of recording threads, for example.

MirroringThis technique works well and is used a lot. However, the wraparound behavior makes things a bit difficult. It complicates the reading and writing code, as it has to check for wraparound and then split the read/write in two. Worse, however, is the fact that the wraparound forces a memory copy for getting data into or out of the ring buffer. Since the data isn’t necessarily contiguous, it’s not possible to expose pointers to the data to the outside world. For example, if you’re reading into the ring buffer from a file, it would be great to be able to pass the ring buffer’s write pointer directly to the read function, but this won’t easily work if the free space wraps around. Instead, you either have to complicate things with multiple reads, or read into a separate buffer and then copy the memory over.

Imagine if it were possible to mirror a chunk of memory, such that it appeared in two places at once. Any change to one place would be immediately reflected in the other place. For example, you’d start out empty:

+----------+ +----------+
| | | |
+----------+ +----------+

Then upon writing to one copy, the other copy immediately sees the data as well:

+----------+ +----------+
| xx | | xx |
+----------+ +----------+

Let’s do this to the ring buffer’s memory. Furthermore, let’s arrange it so the two mirrored copies are directly side by side:

read
|
v
+----------+----------+
|xxx xxxxx|xxx xxxxx|
+----------+----------+
^
|
write

Check it out! The data is now contiguous despite wrapping around in the ring buffer, because the second copy directly adjoins the first. Even though the data wraps around, we can still pass a pointer to it to anything that expects contiguous data. Even though the data wraps around, it is contiguous, thanks to mirroring.

The same thing applies for the empty space, for the case when the buffer’s data doesn’t wrap around:

read
|
v
+----------+----------+
| xx | xx |
+----------+----------+
^
|
write

Again, the free space is contiguous, simplifying things greatly. This pointer can be passed directly to a read call or anything else that wants to write into a single block of memory.

CodeIt’s time to dive into code. As usual, the code for today’s adventure is available on GitHub:

https://github.com/mikeash/MAMirroredQueue

Mirroring With MachBeing able to mirror memory like this would be really handy. Well, good news! With the mach virtual memory APIs, you can!

When dealing with virtual memory, everything relates to memory pages. A page is a chunk of memory of a convenient size to be handled by the virtual memory system, which is typically 4096 bytes. A page is the smallest unit that virtual memory will deal with. If you map or unmap memory, you can only deal with an integral number of pages.

Although the page size will be 4096 bytes on any system you’re likely to encounter, it’s not a good practice to hardcode this. The system’s page size can be obtained from the vm_page_size global variable. Here’s a simple wrapper that hides the global behind a function call:

size_t get_page_size(void)
{
return vm_page_size;
}

Now, let’s talk about how to implement the actual mirrored memory allocator. The mach virtual memory functions allow allocating a chunk of memory, remapping a chunk of memory to a specified location, and deallocating a chunk of memory. Crucially, unlike malloc and free, it’s possible to deallocate only a piece of an allocation. In other words, you can allocate two pages, then only deallocate one, and the other page remains valid.

This is what allows us to obtain a contiguous chunk of mirrored memory. If we just allocated a chunk of memory and then tried to remap it to the space afterwards, this is likely to fail, as that space may well be occupied. There’s no way to directly ask the system to allocate memory in an area that’s followed by a bunch of free space. However, this can be done indirectly. Here’s the approach:

  • Allocate twice as much memory as required.

  • Deallocate the second half of the memory that was just allocated.

  • Remap the first half into the now free second half.

There’s a race condition here, where another thread could end up allocating memory in the newly freed second half between steps 2 and 3. However, this isn’t a disaster. If it happens, step 3 will fail, and we can just deallocate the memory and try the whole operation again. Since the race window is small, it’s unlikely to require many tries for this procedure to succeed.

Let’s look at the code to do it. Rather than only implement mirrored pairs, I made this function able to allocate an arbitrary number of contiguous mirrored memory chunks. This function takes a quantity of memory to allocate and a number of mirrored copies to make. The quantity of memory to allocate has to be an integral number of pages, and to keep things from blowing up weirdly later on, I check for that and return an error if it doesn’t match:

void *allocate_mirrored(size_t howmuch, unsigned howmany)
{
// make sure it's positive and an exact multiple of page size
if(howmuch <= 0 || howmuch != howmuch / get_page_size() * get_page_size())
{
errno = EINVAL;
return NULL;
}

Now that the size is known to be good, we’ll start the loop. The mem variable will hold the allocated pointer. We start it out as NULL and then keep trying until things work.

char *mem = NULL;
while(mem == NULL)
{

We’re going to be doing a lot of error checking on the results of these mach calls. After the initial chunk of memory is allocated, handling an error means deallocating the memory that’s still allocated before returning the error. Before we get to the actual code, we need a macro to centralize this error handling.

The CHECK_ERR macro checks an expression against KERN_SUCCESS and handles cleanup. It takes a second parameter, which is the number of bytes to deallocate. In the event that the initial allocation fails, no memory needs to be deallocated. Later failures need to either deallocate the entire chunk of memory, including all of the mirrored pages, or just some of them, depending on exactly what is allocated at that time. After deallocating memory, if needed, the macro then sets errno and returns NULL:

#define CHECK_ERR(expr, todealloc) do { \
kern_return_t __check_err = (expr); \
if(__check_err != KERN_SUCCESS) \
{ \
if(todealloc > 0) \
vm_deallocate(mach_task_self(), (vm_address_t)mem, (todealloc)); \
errno = ENOMEM; \
return NULL; \
} \
} while(0)

With this in place we can finally make the first mach call of this function, to make the initial memory allocation:

CHECK_ERR(vm_allocate(mach_task_self(), (vm_address_t *)&mem, howmuch * howmany, VM_FLAGS_ANYWHERE), 0);

No memory needs to be freed if this allocation fails, so the second parameter to the macro is 0. Next, we’ll deallocate all of this newly allocated space beyond the first chunk. First, we calculate the location of the remainder by adding howmuch to the newly allocated pointer, then using vm_deallocate to deallocate howmany - 1 copies:

char *target = mem + howmuch;
CHECK_ERR(vm_deallocate(mach_task_self(), (vm_address_t)target, howmuch * (howmany - 1)), howmuch * howmany);

If this fails, then we should deallocate the entire allocated region. Realistically, this won’t fail, and if it does then the attempt to deallocate the entire region isn’t too likely to work either, but it’s nice to be thorough.

Next up, we remap the initial chunk into the remaining space. We need to do this in a loop, once for each additional copy:

for(unsigned i = 1; i < howmany && mem != NULL; i++)
{

Then remap the initial chunk into the current target section:

vm_prot_t curProtection, maxProtection;
kern_return_t err = vm_remap(mach_task_self(),
(vm_address_t *)&target,
howmuch,
0, // mask
0, // anywhere
mach_task_self(),
(vm_address_t)mem,
0, // copy
&curProtection,
&maxProtection,
VM_INHERIT_COPY);

This is a pretty complicated call, so let’s look at what all of the parameters mean. The first parameter, mach_task_self(), tells it that we want to map memory into our own process and not some other process. The mach calls actually allow manipulating memory mappings in other processes, if you have the necessary privileges. We’re not doing anything funny like that here, so we just pass mach_task_self().

The second parameter is the target of the memory remap. If you don’t care about the location of the remapped memory, vm_remap can choose an available address for you, and it will return that address in this parameter, thus it’s passed by reference. In this case, however, we need it to go into a particular spot, so we don’t use that option. We still have to pass a pointer, though, since that’s part of the API.

The third parameter is how much memory to remap, which is just the howmuch parameter to allocate_mirrored.

The fourth parameter is a bitmask to specify alignment restrictions for the starting address, when we have vm_remap choose it for us. Since we aren’t doing that, this parameter doesn’t matter and we just pass 0.

The fifth parameter is a flag saying whether vm_remap can choose its own address for the remap, or whether it must remap into the location provided. By passing 0, we say that it must use the location provided.

The sixth parameter is the source task (process) for the remap operation. Since we’re working entirely within our own process, we once again pass mach_task_self(). The seventh parameter is the source address, which is just the beginning of the allocated space, stored in mem.

The eight parameter is a flag determining whether the memory is to be copied or remapped read-write. The whole purpose of this code is to remap read-write, so we pass 0 to indicate that we don’t want a copy.

The next two parameters are out parameters telling us about the read/write/execute protections on the memory region. We don’t care about these, but the documentation doesn’t say that it’s legal to pass NULL, so we pass in pointers to dummy variables.

The last parameter indicates how child processes inherit this region. VM_INHERIT_COPY means that child processes get a copy, which is generally the expected behavior. It’s unlikely that a child process would ever care about this memory to begin with, so this parameter is not all that important.

That’s the monster remap call. Assuming success, the initial chunk of memory is now remapped into the target area. We can then increment the target pointer by howmuch to position it for the next chunk to be remapped, in the event that we need to do more than one:

target += howmuch;

Now that the remap call is done, we need to do error checking. The error checking for this call is a little more complex, because an error due to the space being occupied by something else is not fatal. Recall that this whole procedure is betting against a race conditiong occurring, where another thread allocates memory in the mirrored space before we get a chance to remap it. If that happens, it’s not a failure, we simply have to try again.

For that case, the error will be KERN_NO_SPACE. In that event, we deallocate the original chunk plus whatever has been remapped so far, then set mem to NULL to have the loop try again:

if(err == KERN_NO_SPACE)
{
CHECK_ERR(vm_deallocate(mach_task_self(), (vm_address_t)mem, howmuch * i), 0);
mem = NULL;
}

For all other cases, the CHECK_ERR macro suffices:

else
{
CHECK_ERR(err, howmuch * i);
}

If we get to the end of the loop with mem set to something, then success! Nothing left to do but return the new pointer to the caller:

}
}
return mem;
}

Of course, we need a corresponding function to deallocate the mirrored memory. All we need to do is call vm_deallocate on the pointer with the correct size. How can we know the correct size? With functions like malloc and free, some metadata is stashed away to describe how large the memory region is. For this function, we’ll just cheat and make the caller pass the size and count in as parameters, just like with the allocation function. This way it’s the caller’s problem, and they can solve it however is appropriate for them:

void free_mirrored(void *ptr, size_t howmuch, unsigned howmany)
{
vm_deallocate(mach_task_self(), (vm_address_t)ptr, howmuch * howmany);
}

And that’s it! We can now allocate mirrored memory as desired, with an arbitrary number of contiguous mirrorings. Here’s a little test code to make sure it works as intended, by writing to part of the mirrored memory, reading from another part, and making sure they always have the same contents:

static void test_size(unsigned howmany, size_t howmuch)
{
char *buf = allocate_mirrored(howmuch, howmany);
unsigned short seed[3] = { 0 };
for(unsigned j = 0; j < howmany; j++)
{
for(size_t i = 0; i < howmuch; i++)
buf[i] = nrand48(seed);
if(memcmp(buf, buf + howmuch * j, howmuch) != 0)
fprintf(stderr, "FAIL: writing to first half didn't update second half with size %lu\n", (long)howmuch);
for(size_t i = 0; i < howmuch; i++)
buf[howmuch * j + i] = nrand48(seed);
if(memcmp(buf, buf + howmuch * j, howmuch) != 0)
fprintf(stderr, "FAIL: writing to second half didn't update first half with size %lu\n", (long)howmuch);
}
free_mirrored(buf, howmuch, howmany);
}

Just call that a few times with different sizes and counts and make sure everything works as expected:

void test_allocate_mirrored(void)
{
for(unsigned i = 2; i < 10; i++)
{
test_size(i, get_page_size());
test_size(i, get_page_size() * 2);
test_size(i, get_page_size() * 10);
test_size(i, get_page_size() * 100);
}
}

And indeed, everything comes out as it’s supposed to.

ConclusionWith the mirrored allocator up and running, we’re ready to build the ring buffer on top of it. Unfortunately, we’re out of time, so come back for part II where we’ll examine in detail how that side of things works.