Signal Handling

Mike Ash Friday Q&A, Chinese translation: Signal Handling

By TommyWu

Translation · Original: Friday Q&A 2011-04-01: Signal Handling · by Mike Ash

Original: https://www.mikeash.com/pyblog/friday-qa-2011-04-01-signal-handling.html Published: 2011-04-01 Author: Mike Ash Translator: MiMo (mimo-v2.5-pro); code blocks are kept in the original English


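Happy April Fools' Day to all my readers, and welcome to one website that won't pester you all day with practical jokes. Instead, I bring you another edition of Friday Q&A. In this edition, I will discuss various ways of handling signals in Mac programs, a topic suggested by friend of the blog Landon Fuller.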

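Signals

Signals are one of the most primitive forms of interprocess communication imaginable. A signal is just a small integer sent to a process. You can send one with the kill command, which has a corresponding C function of the same name.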
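As a minimal sketch of the C side of this, the POSIX kill() function takes a process ID and a signal number. The helper names send_signal and can_signal below are hypothetical and chosen only for illustration; they are not from the original article.

```c
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: sends `signo` to process `pid`, much like running
 * `kill -<signo> <pid>` in a shell. Returns 0 on success, -1 on failure
 * (with errno set by kill()). */
int send_signal(pid_t pid, int signo)
{
    return kill(pid, signo);
}

/* Hypothetical helper: signal number 0 delivers nothing. kill() only
 * performs its error checks, so this reports whether `pid` exists and
 * whether we are permitted to signal it. */
int can_signal(pid_t pid)
{
    return kill(pid, 0) == 0;
}
```

Calling send_signal(getpid(), SIGUSR1) sends the signal to the current process. Unless SIGUSR1 is handled or ignored, its default action terminates the process.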

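When a signal is delivered, it can terminate the process, pause or resume the process, be ignored, or invoke some custom code. That last option is called signal handling, and it is what I want to discuss today.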

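The list of defined signals can be seen in the header sys/signal.h. Many of them serve familiar purposes. SIGINT is the signal generated when you press Control-C in the shell. SIGABRT is used to kill your program when you call abort(), and SIGSEGV is the infamous segmentation fault, which shows up when you dereference a bad pointer.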

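Signal handling is esoteric, and most programs don't need to worry about it at all. However, there are cases where it can be useful. For terminal and server programs, it's handy to catch SIGHUP, SIGINT, and other similar signals to do cleanup before exiting, as a sort of low-level version of Cocoa's applicationWillTerminate:. The SIGWINCH signal is handy for sophisticated terminal applications. SIGUSR1 and SIGUSR2 are user-defined signals which you can use for your own purposes.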

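sigaction

The lowest-level interface for signal handling is the sigaction function. It provides some sophisticated and arcane options, but the important part is that it lets you specify a function to be called when the signal in question is delivered: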

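    // Called asynchronously whenever the signal is delivered.
    static void Handler(int signal)
    {
        // signal came in!
    }

    // In setup code: install Handler as the handler for SIGUSR1.
    struct sigaction action = { 0 };
    action.sa_handler = Handler;
    sigaction(SIGUSR1, &action, NULL);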

It looks as though you could now put your code in the handler and be done. Wrong.

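Reentrancy

The problem is that signals are delivered asynchronously, and the function registered here is also invoked asynchronously. Code always has to run on a thread somewhere. Depending on how the signal is generated, the handler either runs on the thread the signal is associated with (for example, a SIGSEGV handler runs on the thread that segfaulted) or runs on an arbitrary thread in the process. It is essentially an interrupt in userland: whatever code was running when the signal came in is paused until the handler is done.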

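As anyone who was around in the classic Mac days knows, writing code that runs in an interrupt is hard. The problem is **reentrancy**. Many people confuse reentrancy with **thread safety**, but they are not the same concept, although they are somewhat similar.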

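Thread safety means that a particular piece of code can run on multiple threads at the same time safely. It is most commonly achieved with locks: a call acquires a lock, does its work, and releases the lock. A second thread that comes along in the middle blocks until the first thread is done.

Code is **reentrant** if it can safely be entered again on the same thread while an earlier call on that thread is still in progress. This is a different property, and considerably harder to achieve.

What if you take the thread safety approach of locking and apply it to reentrancy? The first call acquires the lock. While it is still active, the code is called again. The second call tries to acquire the lock, but the lock is already taken, so it blocks. However, the first call can't continue until the second call is done, and the second call can't run until the first call releases the lock. The result is a frozen program.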
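The scenario can be sketched without real signals or locks. In the simulation below, a boolean stands in for a non-recursive lock and a callback stands in for a signal handler arriving in the middle of the call. All names are hypothetical. Where a real lock would block forever, the sketch returns -1 so that it can actually finish.

```c
#include <stdbool.h>
#include <stddef.h>

bool lock_held = false;   /* stand-in for a non-recursive lock */
int nested_result = 0;    /* what the re-entrant call observed */

/* "Locks", optionally lets an interrupt run in the middle, then "unlocks".
 * Returns 0 on success, or -1 at the point where a real lock would block. */
int guarded_work(void (*interrupt)(void))
{
    if (lock_held)
        return -1;        /* a real lock would wait here and never wake up */
    lock_held = true;
    if (interrupt != NULL)
        interrupt();      /* simulates a signal handler arriving mid-call */
    lock_held = false;
    return 0;
}

/* Plays the role of a signal handler that calls the same function again
 * on the same thread while the first call still holds the lock. */
void reenter(void)
{
    nested_result = guarded_work(NULL);
}
```

Calling guarded_work(reenter) succeeds for the outer call, but the nested call finds the lock taken and records -1. With a real lock that nested call would never return, and neither would the outer one.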

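Writing reentrant code is hard, and as a result very few system functions are reentrant. Because a signal handler acts as an interrupt, it can only call reentrant code. You can't safely call even something as simple as printf, because printf could take a lock, and if there is already an active call to printf on the thread where the handler runs, you'll deadlock.

The sigaction man page gives a list of functions you are allowed to call from a signal handler. It's pretty limited.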

The complete list is: _exit(), access(), alarm(), cfgetispeed(), cfgetospeed(), cfsetispeed(), cfsetospeed(), chdir(), chmod(), chown(), close(), creat(), dup(), dup2(), execle(), execve(), fcntl(), fork(), fpathconf(), fstat(), fsync(), getegid(), geteuid(), getgid(), getgroups(), getpgrp(), getpid(), getppid(), getuid(), kill(), link(), lseek(), mkdir(), mkfifo(), open(), pathconf(), pause(), pipe(), raise(), read(), rename(), rmdir(), setgid(), setpgid(), setsid(), setuid(), sigaction(), sigaddset(), sigdelset(), sigemptyset(), sigfillset(), sigismember(), signal(), sigpending(), sigprocmask(), sigsuspend(), sleep(), stat(), sysconf(), tcdrain(), tcflow(), tcflush(), tcgetattr(), tcgetpgrp(), tcsendbreak(), tcsetattr(), tcsetpgrp(), time(), times(), umask(), uname(), unlink(), utime(), wait(), waitpid(), write(), aio_error(), sigpause(), aio_return(), aio_suspend(), sem_post(), sigset(), strcpy(), strcat(), strncpy(), strncat(), strlcpy(), strlcat().

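Finally, the list ends with this amusing note: "…and perhaps some others." "Perhaps" is not a reassuring word to run into in this sort of documentation.

You can call your own reentrant code, but you probably don't have any, because it's hard to write, it can't call any system functions other than those listed above, and you never had any reason to write it before. For the Objective-C types, note that objc_msgSend is not reentrant, so you cannot use any Objective-C from a signal handler.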

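There is very little that you can do safely. So little, in fact, that I'm not even going to discuss how to get anything done this way, because it's so impractical. Instead, I will simply tell you to avoid signal handlers unless you really know what you're doing and you enjoy pain.

Fortunately, there are better ways to do these things.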

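kqueue

One of those better ways is to use kqueue. This is a low-level operating system service which allows a program to monitor many different kinds of events, and signals are one of them. You can create a kqueue just for signal handling, or you can add a signal handling event to a kqueue you already have in your program.

Setting things up is a bit more involved, but all in all not too hard: create the kqueue, register the signal with it, tell the system to ignore the signal's default action, and then wait for events: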

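    // Create the kqueue.
    int fd = kqueue();

    // Register interest in SIGUSR1 with the kqueue.
    struct kevent change = { SIGUSR1, EVFILT_SIGNAL, EV_ADD, 0, 0 };
    kevent(fd, &change, 1, NULL, 0, NULL);

    // Ignore the signal's default action, which would otherwise terminate the process.
    struct sigaction action = { 0 };
    action.sa_handler = SIG_IGN;
    sigaction(SIGUSR1, &action, NULL);

    // Block until an event arrives, then inspect it.
    struct kevent event;
    int count = kevent(fd, NULL, 0, &event, 1, NULL);
    if(count == 1)
    {
        if(event.filter == EVFILT_SIGNAL)
            printf("got signal %d\n", (int)event.ident);
    }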

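kqueue isn't always all that convenient to use in real programs, though. There are two reasonable ways to do it. One is to have a dedicated signal handling thread which sits in a loop calling kevent repeatedly. The other is to add the kqueue file descriptor to your runloop, using something like CFFileDescriptor, to integrate it with your Cocoa runloop. Neither of these is particularly great.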
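The dedicated-thread option might look roughly like the sketch below. It assumes a system that provides kqueue (macOS or a BSD); elsewhere the function simply returns -1. The names signal_thread and wait_for_signal_on_thread are hypothetical, and for brevity the thread stops after the first signal where a real program would keep looping.

```c
#include <pthread.h>
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#if defined(__APPLE__) || defined(__FreeBSD__) || defined(__NetBSD__) || \
    defined(__OpenBSD__) || defined(__DragonFly__)
#define HAVE_KQUEUE 1
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>

/* Thread body: sits in a loop calling kevent() until a signal event
 * arrives. This is an ordinary thread, so printf is safe here. */
static void *signal_thread(void *arg)
{
    int kq = *(int *)arg;
    for (;;) {
        struct kevent received;
        int count = kevent(kq, NULL, 0, &received, 1, NULL);
        if (count < 0)
            return (void *)(intptr_t)-1;
        if (count == 1 && received.filter == EVFILT_SIGNAL) {
            printf("got signal %d\n", (int)received.ident);
            return (void *)(intptr_t)received.ident;
        }
    }
}
#endif

/* Hypothetical demo: watches `signo` on a dedicated thread, sends it to
 * this process once, and returns the signal number the thread observed.
 * Returns -1 on failure or where kqueue is unavailable. */
int wait_for_signal_on_thread(int signo)
{
#ifdef HAVE_KQUEUE
    struct kevent change;
    pthread_t thread;
    void *result = NULL;
    int kq = kqueue();
    if (kq < 0)
        return -1;

    /* Ignore the default action first so the signal cannot kill us. */
    signal(signo, SIG_IGN);
    EV_SET(&change, signo, EVFILT_SIGNAL, EV_ADD, 0, 0, NULL);
    if (kevent(kq, &change, 1, NULL, 0, NULL) < 0 ||
        pthread_create(&thread, NULL, signal_thread, &kq) != 0) {
        close(kq);
        return -1;
    }

    kill(getpid(), signo);        /* the kqueue records it even though ignored */
    pthread_join(thread, &result);
    close(kq);
    return (int)(intptr_t)result;
#else
    (void)signo;
    return -1;
#endif
}
```

The signal is registered with the kqueue before the thread starts, so nothing sent afterwards can be missed while the thread is still spinning up.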

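GCD

Finally we reach a signal handling solution which is extremely easy to use: Grand Central Dispatch. In addition to its better-known multiprocessing capabilities, GCD also includes a full suite of event monitoring abilities which match those of kqueue. (In fact, GCD implements them using kqueue internally.)

To handle a signal with GCD, we create a dispatch source to monitor the signal, and then tell the system to ignore the signal's default action: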

    // Create a dispatch source that monitors SIGUSR1 and targets a global queue.
    dispatch_source_t source = dispatch_source_create(DISPATCH_SOURCE_TYPE_SIGNAL, SIGUSR1, 0, dispatch_get_global_queue(0, 0));
    dispatch_source_set_event_handler(source, ^{
        printf("got SIGUSR1\n");
    });
    dispatch_resume(source);

    // Ignore the signal's default action so that it does not terminate the process.
    struct sigaction action = { 0 };
    action.sa_handler = SIG_IGN;
    sigaction(SIGUSR1, &action, NULL);

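That's it! Every time a SIGUSR1 comes in, the handler is called. Because the source targets a global queue, the handler automatically runs on a background thread without interfering with anything else. If you prefer, you can give GCD a custom queue, or even the main queue, to control where the handler runs. As with kqueue, because the handler runs normally on a normal thread, it's safe to do anything in it that you would do in any other piece of code. GCD makes signal handling convenient, easy, and safe.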
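As a hedged sketch of the custom-queue variant, the function below targets a serial queue and uses a registration handler to wait until the source is installed before sending a test signal. It assumes macOS with libdispatch and clang's blocks extension; on other systems it returns -1. The function name and queue label are hypothetical.

```c
#include <signal.h>
#include <stdint.h>
#include <unistd.h>

#ifdef __APPLE__
#include <dispatch/dispatch.h>
#endif

/* Hypothetical demo: returns how many deliveries of `signo` the dispatch
 * source reported after this process signals itself once, or -1 where
 * GCD is unavailable. */
long count_one_signal_with_gcd(int signo)
{
#ifdef __APPLE__
    /* Ignore the default action so the signal does not terminate us. */
    signal(signo, SIG_IGN);

    /* A custom serial queue controls where the handler runs. */
    dispatch_queue_t queue =
        dispatch_queue_create("example.signal-queue", DISPATCH_QUEUE_SERIAL);
    dispatch_source_t source = dispatch_source_create(
        DISPATCH_SOURCE_TYPE_SIGNAL, (uintptr_t)signo, 0, queue);
    dispatch_semaphore_t registered = dispatch_semaphore_create(0);
    dispatch_semaphore_t handled = dispatch_semaphore_create(0);
    __block long seen = 0;

    /* Runs once the source is fully installed and ready for events. */
    dispatch_source_set_registration_handler(source, ^{
        dispatch_semaphore_signal(registered);
    });
    /* For signal sources, the data is the number of signals delivered
     * since the handler last ran. */
    dispatch_source_set_event_handler(source, ^{
        seen += (long)dispatch_source_get_data(source);
        dispatch_semaphore_signal(handled);
    });
    dispatch_resume(source);

    dispatch_semaphore_wait(registered, DISPATCH_TIME_FOREVER);
    kill(getpid(), signo);
    dispatch_semaphore_wait(handled, DISPATCH_TIME_FOREVER);

    dispatch_source_cancel(source);
    dispatch_release(source);
    dispatch_release(queue);
    dispatch_release(registered);
    dispatch_release(handled);
    return seen;
#else
    (void)signo;
    return -1;
#endif
}
```

Swapping the serial queue for dispatch_get_main_queue() would run the handler on the main thread instead, provided the main queue is being serviced by a runloop or dispatch_main().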

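Conclusion

Signal handling is a rare requirement, but sometimes useful. Using the low-level sigaction to handle signals makes life unbelievably hard, because the way the handler is called places extreme restrictions on the code it contains. This makes it almost impossible to do anything useful in such a signal handler.

The best way to handle a signal in almost every case is to use GCD. Signal handling with GCD is easy and safe. On the rare occasions where you need to handle signals, GCD lets you do it with just a few lines of code.

If you can't or don't want to use GCD but still want to avoid sigaction, kqueue provides a good middle ground. While it's more complicated to set up and manage than the GCD approach, it still handles signals well and in a reasonable manner.

That wraps up today's April Fools' edition of Friday Q&A. Come back in two weeks for the next one. Until then, as always, keep sending me your ideas for topics. Friday Q&A is driven by reader suggestions, so if you have something you would like to see covered, send it in!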


#Original (English)

Source: https://www.mikeash.com/pyblog/friday-qa-2011-04-01-signal-handling.html

Happy April Fool’s Day to all my readers, and welcome to one web site which won’t irritate you all day with bizarre practical jokes. Instead, I bring you another edition of Friday Q&A. In this edition, I will discuss various ways of handling signals in Mac programs, a topic suggested by friend of the blog Landon Fuller.

Signals

Signals are one of the most primitive forms of interprocess communication imaginable. A signal is just a small integer sent to a process. You can send a signal using the kill command, which also has a corresponding function available from C.

When a signal is delivered, it can terminate the process, pause/resume the process, be ignored, or invoke some custom code. That last option is called signal handling, and that is what I want to discuss today.

The list of defined signals can be seen in the header sys/signal.h. Many of these are used for familiar purposes. SIGINT is the signal generated when you press control-C in the shell. SIGABRT is used to kill your program when you call abort(), and SIGSEGV is the infamous segmentation fault, which pops up when you dereference a bad pointer.

Signal handling is esoteric and most programs don’t need to worry about it at all. However, there are cases where it can be useful. For terminal and server programs, it’s handy to catch SIGHUP, SIGINT, and other similar signals to do cleanup before exiting, as a sort of low-level version of Cocoa’s applicationWillTerminate:. The SIGWINCH signal is handy for sophisticated terminal applications. SIGUSR1 and SIGUSR2 are user-defined signals which you can use for your own purposes.

sigaction

The lowest level interface for signal handling is the sigaction function. It provides some sophisticated and arcane options, but the important part is that it allows you to specify a function which is called when the signal in question is delivered:

    static void Handler(int signal)
    {
        // signal came in!
    }

    struct sigaction action = { 0 };
    action.sa_handler = Handler;
    sigaction(SIGUSR1, &action, NULL);

Wrong.

Reentrancy

The problem is that signals are delivered asynchronously, and the function registered here is also invoked asynchronously. Code always has to run on a thread somewhere. Depending on how the signal is generated, the handler is either run on the thread that the signal is associated with (for example, a SIGSEGV handler will run on the thread that segfaulted) or it will run on an arbitrary thread in the process. The problem is that it’s essentially an interrupt in userland, and whatever code was running when it came in will be paused until the handler is done.

As anyone who was around in the classic Mac days knows, writing code that runs in an interrupt is hard. The problem is reentrancy. Many people confuse reentrancy with thread safety, but they are not the same concept, although they are somewhat similar.

Thread safety means that a particular piece of code can run on multiple threads at the same time safely. Thread safety is most commonly accomplished by using locks. A call acquires a lock, does work, releases the lock. A second thread that comes along in the middle will block until the first thread is done.

If code is reentrant that means that a particular piece of code can run multiple times on the same thread safely. This is different and considerably harder.

What if you take the thread safety approach of locking and apply it to reentrancy? The first call acquires the lock. While it’s active, the code is called again. It tries to acquire the lock, but the lock is already taken, so it blocks. However, the first call can’t run until the second call is done. The second call can’t run until the first call is done. The result is a frozen program.

Writing reentrant code is hard, and as a result very few system functions are reentrant. Because a signal handler functions as an interrupt, it can only call reentrant code. You can’t call something as simple as printf safely, because printf could take a lock, and if there’s already an active call to printf on the thread where the handler runs, you’ll deadlock.

The sigaction man page gives a list of functions you are allowed to call from a signal handler. It’s pretty limited.

The complete list is: _exit(), access(), alarm(), cfgetispeed(), cfgetospeed(), cfsetispeed(), cfsetospeed(), chdir(), chmod(), chown(), close(), creat(), dup(), dup2(), execle(), execve(), fcntl(), fork(), fpathconf(), fstat(), fsync(), getegid(), geteuid(), getgid(), getgroups(), getpgrp(), getpid(), getppid(), getuid(), kill(), link(), lseek(), mkdir(), mkfifo(), open(), pathconf(), pause(), pipe(), raise(), read(), rename(), rmdir(), setgid(), setpgid(), setsid(), setuid(), sigaction(), sigaddset(), sigdelset(), sigemptyset(), sigfillset(), sigismember(), signal(), sigpending(), sigprocmask(), sigsuspend(), sleep(), stat(), sysconf(), tcdrain(), tcflow(), tcflush(), tcgetattr(), tcgetpgrp(), tcsendbreak(), tcsetattr(), tcsetpgrp(), time(), times(), umask(), uname(), unlink(), utime(), wait(), waitpid(), write(), aio_error(), sigpause(), aio_return(), aio_suspend(), sem_post(), sigset(), strcpy(), strcat(), strncpy(), strncat(), strlcpy(), strlcat().

Finally, the list ends with this amusing note: “…and perhaps some others.” “Perhaps” is not a nice word to run into in this sort of documentation.

You can call your own reentrant code, but you probably don’t have any, because it’s hard to write, it can’t call any system functions except from the above list, and you never had any reason to write it before. For the Objective-C types, note that objc_msgSend is not reentrant, so you cannot use any Objective-C from a signal handler.

There is very little that you can do safely. There is so little that I’m not even going to discuss how to get anything done, because it’s so impractical to do so, and instead will simply tell you to avoid using signal handlers unless you really know what you’re doing and you enjoy pain.

Fortunately, there are better ways to do these things.

kqueue

One of those better ways is to use kqueue. This is a low level operating system service which allows a program to monitor many different events, and one of the events it can monitor is signals. You can create a kqueue just for signal handling, or you can add a signal handling event to an existing kqueue you already have within your program.

Setting things up is a bit more involved, but all in all not too hard. The kqueue is created, the signal is registered with it, the signal's default action is ignored, and the program then waits for events:

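    // Create the kqueue.
    int fd = kqueue();

    // Register interest in SIGUSR1 with the kqueue.
    struct kevent change = { SIGUSR1, EVFILT_SIGNAL, EV_ADD, 0, 0 };
    kevent(fd, &change, 1, NULL, 0, NULL);

    // Ignore the signal's default action, which would otherwise terminate the process.
    struct sigaction action = { 0 };
    action.sa_handler = SIG_IGN;
    sigaction(SIGUSR1, &action, NULL);

    // Block until an event arrives, then inspect it.
    struct kevent event;
    int count = kevent(fd, NULL, 0, &event, 1, NULL);
    if(count == 1)
    {
        if(event.filter == EVFILT_SIGNAL)
            printf("got signal %d\n", (int)event.ident);
    }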

kqueue isn’t always all that convenient to use in real programs, though. There are two reasonable ways to do it. One way is to have a dedicated signal handling thread which sits in a loop calling kevent repeatedly. Another way is to add the kqueue file descriptor to your runloop using something like CFFileDescriptor to integrate it with your Cocoa runloop. However neither of these is particularly great.

GCD

Finally we reach a signal handling solution which is extremely easy to use: Grand Central Dispatch. In addition to the better-known multiprocessing capabilities, GCD also includes a full suite of event monitoring abilities which match those of kqueue. (And in fact, GCD implements them using kqueue internally.)

To handle a signal with GCD, we create a dispatch source to monitor the signal:

    dispatch_source_t source = dispatch_source_create(DISPATCH_SOURCE_TYPE_SIGNAL, SIGUSR1, 0, dispatch_get_global_queue(0, 0));
    dispatch_source_set_event_handler(source, ^{
        printf("got SIGUSR1\n");
    });
    dispatch_resume(source);

    struct sigaction action = { 0 };
    action.sa_handler = SIG_IGN;
    sigaction(SIGUSR1, &action, NULL);

That’s it! Every time a SIGUSR1 comes in, the handler is called. Because the source targets a global queue, the handler automatically runs in a background thread without interfering with anything else. If you prefer, you can give GCD a custom queue, or even the main queue, to control where the handler runs. Like with kqueue, because the handler runs normally on a normal thread, it’s safe to do anything in it that you would do in any other piece of code. GCD makes signal handling convenient, easy, and safe.

Conclusion

Signal handling is a rare requirement, but sometimes useful. Using the low level sigaction to handle signals makes life unbelievably hard, as the signal handler is called in such a way as to place extreme restrictions on the code it contains. This makes it almost impossible to do anything useful in such a signal handler.

The best way to handle a signal in almost every case is to use GCD. Signal handling with GCD is easy and safe. On the rare occasions where you need to handle signals, GCD lets you do it with just a few lines of code.

If you can’t or don’t want to use GCD but still want to avoid sigaction, kqueue provides a good middle ground. While it’s more complicated to set up and manage than the GCD approach, it still works well to handle signals in a reasonable manner.

That wraps up today’s April Fool’s edition of Friday Q&A. Come back in two weeks for the next one. Until then, as always, keep sending me your ideas for topics. Friday Q&A is driven by reader suggestions, so if you have something you would like to see covered, send it in!