A High-Level Explanation of mach_override and How Misusing It Can Cause a Stack Overflow
mach_override is a C library for Mac OS X that allows you to override the implementation of one C function with another. In a recent article, I claimed that calling mach_override_ptr twice using the same parameters could cause a stack overflow due to infinite recursion. In this article, I will explain how mach_override works at a high level and how it can cause that infinite recursion.
The information in this article is derived from the mach_override source code. mach_override was written by Jonathan “Wolf” Rentzsch, who introduced it in a paper submitted to the MacHack conference in 2003. Since then, the implementation of mach_override has changed significantly, in no small part due to the switch from PowerPC to Intel. So while the MacHack paper and presentation slides contain more in-depth information than this article, much of that information is outdated and must be reconciled against the source code.
How mach_override Works
mach_override exposes one public function, mach_override_ptr, which takes three arguments. The first argument is a pointer to the function to be overridden; the second is a pointer to the function that will override it; and the third is the address of a function pointer which, if mach_override_ptr succeeds, can be called to execute the original function. That function pointer is typically called from within the replacement function, so it must be declared where the replacement function can reach it and must remain valid after mach_override_ptr returns, for example as a static or global variable.
This API is superficially similar to Objective-C method swizzling, but its implementation differs significantly. Objective-C’s dynamic messaging system provides a level of indirection that allows users to reroute messages at runtime. Since function calls in C have no such level of indirection, mach_override has to take drastic measures to create its own.
In C, the addresses of functions are computed during linking and loading. By the time mach_override_ptr executes, any code that will call the function to override already has that function’s address. Searching through memory and modifying all the code that calls this function is impractical, so mach_override rewrites the beginning of the overridden function at runtime, replacing its first instruction with a jump instruction. For those not familiar with assembly programming, a jump is like a goto, except that it can transfer control to an address outside the current function.
However, the overridden function does not jump directly to the replacement function. Instead, it jumps to what mach_override calls a branch island: a small piece of code, created at runtime, that optionally executes a few instructions before jumping to another function. The island that the overridden function jumps to is called the escape island, and it jumps directly to the replacement function without doing anything else.
A second branch island, called the reentry island, is created when the third parameter is not NULL. This island executes the original first instruction from the overridden function, and then jumps to the second instruction of the original function. Thus, executing the reentry island is equivalent to executing the original function, so the address of the reentry island is assigned to the function pointer passed in as the third argument.
To create each branch island, mach_override allocates a page of virtual memory, makes it writable and executable, and then writes the branch to the beginning of the page. Similarly, to replace the first instruction of the original function, mach_override makes the page(s) containing that instruction writable. Memory that is both writable and executable makes certain classes of exploits easier to pull off, so you should use mach_override only when it’s absolutely necessary.
The Stack Overflow
Each time mach_override_ptr is called, new branch islands are created. If called twice with the same parameters, mach_override_ptr will create two identical escape islands. The second time mach_override_ptr is called, it will replace the first instruction, which is a jump to the first escape island, with a jump to the second escape island.
The two reentry islands, however, will be different. Because the first instruction of the original function is copied to the reentry island, the first reentry island will contain the original first instruction, and the second reentry island will contain the aforementioned jump to the first escape island. Since the function pointer passed in as the third parameter is overwritten each time mach_override_ptr is called, it ends up pointing to the second reentry island. When the replacement function calls this function pointer, it calls the second reentry island, which calls the first escape island, which calls the replacement function, as illustrated below.
Figure: The original function calls escape island 2, which calls the replacement function, which calls reentry island 2, which calls escape island 1, which calls the replacement function again, and so on.
This bug is only possible because mach_override_ptr rewrites part of the original function and because the same reentry function pointer is passed in both times. However, even with distinct replacement functions and reentry function pointers, calling mach_override_ptr twice on the same original function can cause unexpected behavior: in the model described above, the second reentry island would begin with the jump to the first escape island, so “calling the original” through it would actually run the first replacement function. The MACH_OVERRIDE macro in mach_override.h even attempts to prevent calling mach_override_ptr on the same function more than once. Because of the potential for bugs when it is called more than once, and because of the security implications of writable, executable memory, mach_override should be used with great care.