Python 3.12 is out! It includes new features and performance improvements – some contributed by Meta – that we believe will benefit all Python users.
We’re sharing details about these new features that we worked closely with the Python community to develop.
This week’s release of Python 3.12 marks a milestone in our efforts to make our work developing and scaling Python for Meta’s use cases more accessible to the broader Python community. Open source at Meta is an important part of how we work and share our learnings with the community.
For several years, we have been sharing our work on Python and CPython through our open source Python runtime, Cinder. We have also been working closely with the Python community to introduce new features and optimizations to improve Python’s performance and to allow third parties to experiment with Python runtime optimization more easily.
For the Python 3.12 release, we collaborated with the Python community on several categories of features:
Immortal Objects
Immortal Objects – PEP 683 makes it possible to create Python objects that don’t participate in reference counting, and will live until Python interpreter shutdown. The original motivation for this feature was to reduce memory use in the forking Instagram web-server workload by reducing copy-on-writes triggered by reference-count updates.
Immortal Objects are also an important step towards truly immutable Python objects that can be shared between Python interpreters without locking (for example, without relying on the global interpreter lock, or GIL). This can enable improved single-process parallelism in Python, whether via multiple sub-interpreters or GIL-free multi-threading.
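One observable effect of PEP 683 can be sketched in a few lines. The helper name refcount_delta is hypothetical and introduced only for illustration. It measures how much the count reported by sys.getrefcount changes when many new references to an object are created. On Python 3.12 and later, None is immortal, so its reported count should not move; on earlier versions it grows with each new reference.

```python
import sys

def refcount_delta(obj, n=1000):
    """Return how much obj's reported refcount changes after n new references."""
    before = sys.getrefcount(obj)
    refs = [obj] * n  # the list holds n additional references to obj
    after = sys.getrefcount(obj)
    del refs
    return after - before

ordinary = object()
print("ordinary object:", refcount_delta(ordinary))  # grows by n
print("None:", refcount_delta(None))  # unchanged when None is immortal
```

Because an immortal object's reference count is never written, the memory page holding it stays clean after a fork, which is the copy-on-write saving described above.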
Type system improvements
The engineering team behind Pyre, an open source Python type-checker, authored and implemented PEP 698 to add a @typing.override decorator, which helps avoid bugs when refactoring class inheritance hierarchies that use method overriding.
Python developers can apply this new decorator to a subclass method that overrides a method from a base class. As a result, static type checkers will be able to warn developers if the base class is modified such that the overridden method no longer exists. Developers can avoid accidentally turning a method override into dead code. This improves confidence in refactoring and helps keep the code more maintainable.
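A minimal sketch of the decorator in use follows. The class and method names are hypothetical, and the fallback branch is an assumption added so the snippet also runs on interpreters older than 3.12, where typing.override does not exist.

```python
try:
    from typing import override  # available in Python 3.12+
except ImportError:
    # Assumption for older interpreters: a no-op stand-in so the sketch runs.
    def override(method):
        return method

class Base:
    def greet(self) -> str:
        return "hello from Base"

class Child(Base):
    @override
    def greet(self) -> str:  # fine: Base.greet exists
        return "hello from Child"

print(Child().greet())
```

If Base.greet were later renamed or removed, a type checker that supports PEP 698 would report an error on Child.greet, because the decorated method would no longer override anything. The decorator does not perform this check at runtime.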
In previous Python versions, all comprehensions were compiled as nested functions, and every execution of a comprehension allocated and destroyed a single-use Python function object.
In Python 3.12, PEP 709 inlines all list, dict, and set comprehensions into the enclosing scope for better performance (up to two times faster in the best case).
The implementation and debugging of PEP 709 also uncovered a pre-existing bytecode compiler bug that could result in silently wrong code execution in Python 3.11, which we fixed.
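One way to see the inlining is to look for nested code objects among a function's constants. This is a rough sketch; the helper and function names are hypothetical. A comprehension compiled as a nested function leaves a code object in co_consts, while an inlined one does not. Generator expressions are not covered by PEP 709, so they still produce a nested code object.

```python
import sys
import types

def nested_code_objects(func):
    """Count code objects stored in func's constants (one per nested function)."""
    return sum(isinstance(c, types.CodeType) for c in func.__code__.co_consts)

def squares_list(n):
    return [i * i for i in range(n)]  # list comprehension

def squares_sum(n):
    return sum(i * i for i in range(n))  # generator expression, not inlined

print(sys.version_info[:2])
print("list comprehension:", nested_code_objects(squares_list))
print("generator expression:", nested_code_objects(squares_sum))
```

On Python 3.11 and earlier the first count should be 1; on 3.12 and later it should be 0.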
Eager asyncio tasks
While Python’s asynchronous programming support enables single-process concurrency, it also has noticeable runtime overhead. Every call to an async function creates an extra coroutine object, and the standard asyncio library will often bring additional overhead in the form of Task objects and event loop scheduling.
We observed that, in a fully async codebase, many async functions can in practice return a result immediately, with no need to suspend. (This may be due to memoization, for example.) In these cases, if the result of the function is immediately awaited (e.g., by await some_async_func(), the most common way to call an async function), the coroutine/Task objects and event loop scheduling can be unnecessary overhead.
Cinder eliminates this overhead via eager async execution. If an async function call is awaited immediately, it is called with a flag set that allows it to return a result directly, if possible, without creating a coroutine object. If an asyncio.gather() is immediately awaited, and all the async functions it gathers are able to return immediately, there’s no need to ever create a Task or schedule it to the event loop.
Fully eager async execution would be an invasive (and breaking) change to Python, and doesn’t work as well with the new Python 3.11+ TaskGroup API for managing concurrent tasks. So in Python 3.12 we added a simpler version of the feature: eager asyncio tasks. With eager tasks, coroutine and Task objects are still created when a result is available immediately, but we can sometimes avoid scheduling the task to the event loop and instead resolve it right away.
This is more efficient, but it is a semantic change, so this feature is opt-in via a custom task factory.
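The opt-in can be sketched as follows, assuming the asyncio.eager_task_factory added in Python 3.12. The coroutine cached_lookup and the other names are hypothetical. The sketch checks whether the task body has already run by the time create_task returns, and falls back to the default factory on older interpreters.

```python
import asyncio

async def cached_lookup(log, key):
    # Never suspends, so an eager task can finish inside create_task().
    log.append(key)
    return key.upper()

async def main(use_eager):
    if use_eager:
        loop = asyncio.get_running_loop()
        loop.set_task_factory(asyncio.eager_task_factory)  # Python 3.12+
    log = []
    task = asyncio.create_task(cached_lookup(log, "a"))
    ran_before_await = bool(log)  # True only if the task started eagerly
    result = await task
    return ran_before_await, result

def run_demo():
    eager_available = hasattr(asyncio, "eager_task_factory")
    return asyncio.run(main(eager_available)), eager_available

print(run_demo())
```

The semantic change is visible here: with the eager factory, the task body starts running before create_task returns, rather than on a later iteration of the event loop, so code that relies on the old ordering may behave differently.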
Other asyncio improvements
Faster super() calls
The new LOAD_SUPER_ATTR opcode optimizes code of the form super().attr and super().method(…). Such code previously had to allocate, and then throw away, a single-use “super” object each time it ran. Now it has little more overhead than an ordinary method call or attribute access.
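The opcode can be observed with the standard dis module. In this small sketch the class names and the helper uses_load_super_attr are hypothetical; on interpreters older than 3.12 the helper should simply report False.

```python
import dis
import sys

class Base:
    def describe(self):
        return "base"

class Child(Base):
    def describe(self):
        return "child of " + super().describe()

def uses_load_super_attr(func):
    """Report whether func's bytecode contains a LOAD_SUPER_ATTR instruction."""
    return any(ins.opname == "LOAD_SUPER_ATTR" for ins in dis.get_instructions(func))

print(sys.version_info[:2], uses_load_super_attr(Child.describe))
```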
Other performance optimizations
We also landed two hasattr optimizations and a 3.8x performance improvement to unittest.mock.Mock.
When we optimize Python for internal use at Meta, we are usually able to test and validate our optimizations directly against our real-world workloads. Optimization work on open-source Python doesn’t have such a production workload to test against and needs to be effective (and avoid regression) on a variety of different workloads.
The Python Performance Benchmark suite is the standard set of benchmarks used in open-source Python optimization work. During the 3.12 development cycle, we contributed several new benchmarks to it so that it more accurately represents workload characteristics we see at Meta. We added:
A set of async_tree benchmarks that better model an asyncio-heavy workload.
A pair of benchmarks that exercise comprehensions and super() more thoroughly, which were blind spots of the existing benchmark suite.
Some parts of Cinder (our JIT compiler and Static Python) wouldn’t make sense as part of upstream CPython (because of limited platform support, C versus C++, semantic changes, and just the size of the code), so our goal is to package these as an independent extension module, CinderX.
This requires a number of new hooks in the core runtime. We landed many of these hooks in Python 3.12:
An API to set the vectorcall entrypoint for a Python function. This gives the JIT an entry point to take over execution for a given function.
We added dictionary watchers, type watchers, function watchers, and code object watchers. All of these allow the Cinder JIT to be notified of dynamic changes that might invalidate its assumptions, so its fast path can remain as fast as possible.
We landed extensibility in the code generator for CPython’s core interpreter that will allow Static Python to easily re-generate an interpreter with added Static Python opcodes, and a C API to visit all GC-tracked objects, which will allow the Cinder JIT to discover functions that were created before it was enabled.
We also added a thread-safe API for writing to perf-map files. Perf-map files allow the Linux perf profiler to give a human-readable name to dynamically-generated sections of machine code, e.g. from a JIT compiler. This API will allow the Cinder JIT to safely write to perf map files without colliding with other JITs or with the new Python 3.12 perf trampoline feature.
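For readers unfamiliar with perf-map files, the sketch below shows the line format that Linux perf conventionally reads from /tmp/perf-<pid>.map: a start address and a size in hexadecimal, followed by a symbol name. It does not call the CPython API mentioned above. The function name and sample values are hypothetical and only illustrate the format.

```python
def format_perf_map_entry(start_addr: int, size: int, name: str) -> str:
    """Format one perf-map line: hex start address, hex size, symbol name."""
    return f"{start_addr:x} {size:x} {name}\n"

# Hypothetical entry for a JIT-compiled function.
print(format_perf_map_entry(0x7F00, 0x40, "py::example_function"), end="")
```

Because several writers may append to the same file within one process, a shared thread-safe writer avoids interleaved or corrupted lines.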
These improvements will be useful to anyone building a third party JIT compiler or runtime optimizer for CPython. There are also plans to use the watchers internally in core CPython.
Beyond Python 3.12
Python plays a significant role at Meta. It’s an important part of our infrastructure, including the Instagram server stack. And it’s the lingua franca for our AI/ML work, highlighted by our development of PyTorch, a machine learning framework for a wide range of use cases including computer vision, natural language processing, and more.
Our work with the Python community doesn’t end with the 3.12 release. We are currently discussing a new proposal, PEP 703, with the Python Steering Council to remove the GIL and allow Python to run in multiple threads in parallel. This update could greatly help anyone using Python in a multi-threaded environment.
Meta’s involvement with the Python community also goes beyond code. In 2023, we continued supporting the Developer in Residence program for Python and sponsored events like PyCon US. We also shared our learnings in talks like “Breaking Boundaries: Advancements in High-Performance AI/ML through PyTorch’s Python Compiler” and posts on the Meta Engineering blog.
We are grateful to be a part of this open source community and look forward to working together to move the Python programming language forward.
The author would like to acknowledge the following people for their work in contributing to all of these new features: Eddie Elizondo, Vladimir Matveev, Itamar Oren, Steven Troxler, Joshua Xu, Shannon Zhu, Jacob Bower, Pranav Thulasiram Bhat, Ariel Lin, Andrew Frost, and Sam Gross.