PEP 703: How removing the GIL stopped being all-or-nothing
Big decisions don't stall because they're wrong. They stall when they're all-or-nothing.
PEP 703, a Python Enhancement Proposal (the formal document type Python's maintainers use to debate language changes), made Python's Global Interpreter Lock optional. The project's five-person Steering Council accepted it in October 2023 with a written rollback clause: the right to undo the whole change if the rollout proves too disruptive. The implementation, by Meta engineer Sam Gross, replaced the atomic-everywhere approach that had stalled earlier attempts with five layered mechanisms, so the runtime pays its costs in pieces instead of all at once. Two phased releases later, in Python 3.14, the free-threaded build is officially supported and its single-thread overhead is 5–10%. Yet the default Python build still uses the GIL, and the Council still holds the right to undo the change.
A thirty-year design choice and the cost of removing it
The Global Interpreter Lock has been part of CPython since the early 1990s. Its rule is simple: only one OS thread can execute Python bytecode at a time. That rule is what lets every Python object's reference count (the integer that tracks how many references point at the object) be updated with a plain processor instruction instead of a synchronizing one. Reference counts change on almost every operation in Python: every variable assignment, every function call, every list or dict access. The lock is what keeps those changes safe.
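CPython exposes an object's reference count through sys.getrefcount(), which makes this bookkeeping observable. A minimal sketch: binding a second name to an object adds one reference, and deleting that name removes it. Exact counts differ between CPython versions (the call itself may hold a temporary reference, and newer releases skip some stack updates), so the sketch compares differences rather than absolute values.

```python
import sys


def refcount_delta_on_alias() -> tuple[int, int]:
    """Return how much an object's refcount changes when a second name is
    bound to it, and how much remains changed after that name is deleted."""
    obj = object()                    # a fresh, ordinary heap object
    before = sys.getrefcount(obj)     # absolute value is version-dependent
    alias = obj                       # plain assignment: one more reference
    with_alias = sys.getrefcount(obj)
    del alias                         # deleting the name drops that reference
    after = sys.getrefcount(obj)
    return with_alias - before, after - before


print(refcount_delta_on_alias())
```

Every one of those increments and decrements is a write to the object's header, which is why the cost of each write matters so much.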
Modern processors share memory between cores through caches. To make an update such as an increment thread-safe without a global lock, the processor needs an atomic instruction: a special operation that performs the read, the modification, and the write as one indivisible step, visible to every other core before any of them can act on stale data. On contemporary hardware, atomic instructions are roughly an order of magnitude slower than plain ones, because the core must take exclusive ownership of the cache line (invalidating other cores' copies) and typically cannot reorder instructions across the operation. Reference counting is hot, since a busy Python program can perform hundreds of millions of refcount operations per second, and making every one of them atomic is what makes removing the GIL expensive.
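The hazard is the read-modify-write. A plain increment loads the count, adds one, and stores it back, and another thread can run between those steps. The toy model below (plain Python, not CPython's C code; the function names are hypothetical) replays two modeled threads step by step in a fixed order, so the lost update is deterministic. An atomic increment fuses the three steps into one, which rules out the bad interleaving.

```python
def plain_increment_steps(shared: dict, key: str):
    """Model a non-atomic `count += 1` as three separate steps.
    Each `yield` marks a point where another thread could be scheduled."""
    value = shared[key]   # step 1: load the current count
    yield
    value += 1            # step 2: add one in a private register
    yield
    shared[key] = value   # step 3: store the result back
    yield


def run_interleaved(schedule: list) -> int:
    """Run modeled threads "A" and "B" in the given step order and
    return the final count. Both start from a refcount of 1."""
    shared = {"refcount": 1}
    threads = {
        "A": plain_increment_steps(shared, "refcount"),
        "B": plain_increment_steps(shared, "refcount"),
    }
    for name in schedule:
        next(threads[name])   # advance that thread by one step
    return shared["refcount"]


# A finishes before B starts: both increments land (1 -> 3).
serial = run_interleaved(["A", "A", "A", "B", "B", "B"])
# Both threads load 1 before either stores: one increment is lost (1 -> 2).
racy = run_interleaved(["A", "B", "A", "B", "A", "B"])
print(serial, racy)
```

A refcount that ends up too low frees an object that is still in use, and one that ends up too high leaks it. The GIL avoids both by never letting two threads interleave like this.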
The first serious attempt was Larry Hastings's Gilectomy, a fork of CPython that removed the GIL and replaced refcount updates with atomic ones. Hastings described the cost in 2016, as reported by LWN:
@larryhastings
That means using atomic increment and decrement operations, which leads to a 30% performance hit right off the bat.
Multi-thread scaling worked. Single-thread performance regressed by roughly thirty percent. By the 2018 Python Language Summit, Hastings publicly described the project as out of bullets. Within Python's core circles, removing the GIL came to be treated as impossible without sacrificing single-thread performance. That was the binding constraint: in 2007, Guido van Rossum had written that he would accept GIL-removal patches only if single-threaded performance did not decrease.
The number that changed the conversation
At the 2022 Python Language Summit, Meta engineer Sam Gross presented a different prototype. His implementation, nogil, removed the GIL but reported a single-thread overhead of five to eight percent, not thirty.
Brandt Bucher, a CPython core developer at Microsoft, named the shift in one sentence:
@brandtbucher
Thank you for doing this… now we have something we can actually consider!
The frame moved from "can this be done?" to "should this be done?" The technical bar had finally been cleared.
What Gross had done was not invent a faster atomic instruction. He layered the cost reduction across three places where most refcount operations could skip the atomic step, and two more where the underlying mechanics were redesigned to keep the new model from becoming a different kind of bottleneck.
The three skips:
Biased reference counting: every Python object records the thread that created it (its owner) and keeps two reference counts. The owner thread updates its local count with plain, non-atomic instructions; any other thread updates a separate shared count atomically. Most objects, especially short-lived ones, are only ever touched by the thread that created them, so most refcount operations stay on the fast path.
Immortal objects (formalized in PEP 683, a companion proposal): runtime singletons such as None, True, False, and small integers carry a sentinel refcount value that the runtime never modifies, so the most-touched objects in any Python program generate no refcount writes at all.
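Deferred reference counting: objects that many threads touch constantly, such as top-level functions, code objects, and modules, skip refcount updates when the interpreter pushes them onto its evaluation stack. The garbage collector accounts for those skipped references by scanning thread stacks, so these heavily shared objects avoid most of their atomic refcount traffic.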
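The first two skips can be modeled in a few lines. This is a toy sketch in plain Python, not CPython's C implementation: ToyObject, make_object, and IMMORTAL_REFCNT are hypothetical names, a threading.Lock stands in for a hardware atomic add, and details such as how the two counts are eventually merged are left out. The model counts how many updates had to take the slow path.

```python
import threading
from dataclasses import dataclass, field

# Hypothetical sentinel for this toy model; CPython defines its own immortal value.
IMMORTAL_REFCNT = 2**30


@dataclass
class ToyObject:
    """Toy model of biased reference counting plus immortal objects."""
    owner_tid: int                  # ident of the thread that created the object
    local: int = 1                  # owner-only count, updated with plain writes
    shared: int = 0                 # count updated by every other thread
    immortal: bool = False          # immortal objects are never modified
    atomic_ops: int = 0             # updates that had to take the slow path
    _shared_lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def _adjust(self, delta: int) -> None:
        if self.immortal:
            return                              # immortal: no write at all
        if threading.get_ident() == self.owner_tid:
            self.local += delta                 # owner: fast, non-atomic path
        else:
            with self._shared_lock:             # stand-in for an atomic add
                self.shared += delta
                self.atomic_ops += 1

    def incref(self) -> None:
        self._adjust(+1)

    def decref(self) -> None:
        self._adjust(-1)

    def refcount(self) -> int:
        # The true count is the sum of both fields.
        return self.local + self.shared


def make_object(immortal: bool = False) -> ToyObject:
    start = IMMORTAL_REFCNT if immortal else 1
    return ToyObject(owner_tid=threading.get_ident(), local=start, immortal=immortal)


obj = make_object()                     # owned by the main thread
for _ in range(1000):
    obj.incref()                        # owner thread: every update is plain


def touch_from_other_thread() -> None:
    for _ in range(10):
        obj.incref()                    # non-owner: every update is "atomic"


worker = threading.Thread(target=touch_from_other_thread)
worker.start()
worker.join()

singleton = make_object(immortal=True)  # stands in for None, True, small ints
singleton.incref()                      # ignored: the count never changes

print(obj.refcount(), obj.atomic_ops, singleton.refcount() == IMMORTAL_REFCNT)
```

Of the 1,010 increments on obj, only the 10 from the non-owner thread take the slow path. That asymmetry is the bet biased reference counting makes about real programs.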
The two redesigned mechanics:
mimalloc replaces CPython's pymalloc with a thread-aware allocator originally from Microsoft Research. Each thread allocates from its own heap, so ordinary allocation needs no cross-thread coordination. This isn't a per-operation skip; it's a different allocator whose normal mode is already lock-free for the common case.
PyMutex replaces the GIL's single global lock with a one-byte lock attached to each mutable container. The GIL guarded everything; a PyMutex guards only the dict, list, or set actually being modified. The lock is still taken on each mutation, but it is contended only when two threads touch the same container at the same moment. Again not a skip, since locking still happens, but at a granularity that mostly stays uncontended.
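The granularity change can be sketched at the Python level with a hypothetical LockedList wrapper that gives each container its own threading.Lock. Threads working on different containers never wait on each other, while threads sharing one container serialize on that container's lock alone. In any current CPython, list.append is already thread-safe by itself; the explicit lock only stands in for what the free-threaded runtime does internally with its per-object mutex.

```python
import threading


class LockedList:
    """Hypothetical wrapper: one lock per container instead of one global lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()   # this list's own lock
        self._items: list = []

    def append(self, item) -> None:
        with self._lock:                # taken on every mutation, but contended
            self._items.append(item)    # only if another thread holds this lock

    def __len__(self) -> int:
        with self._lock:
            return len(self._items)


def fill(target: LockedList, n: int) -> None:
    for i in range(n):
        target.append(i)


# Four threads, four separate lists: their locks never compete.
separate = [LockedList() for _ in range(4)]
# Four more threads, one shared list: they serialize on that list's lock.
shared = LockedList()

threads = [threading.Thread(target=fill, args=(lst, 10_000)) for lst in separate]
threads += [threading.Thread(target=fill, args=(shared, 10_000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print([len(lst) for lst in separate], len(shared))
```

Under a single global lock, all eight threads would queue behind the same lock even though only four of them share any data.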
Together, the three skips and the two redesigned mechanics drop the cost from thirty percent to five. No single trick gets there alone.
The deadline and the funded answer
PEP 703 was formally submitted on January 9, 2023. The Steering Council reviewed it through the spring without responding. By early June, a benchmark dispute had opened — Mark Shannon, lead of Microsoft’s Faster CPython team (a separate effort to make single-thread Python faster, on top of the GIL), posted measurements showing the overhead higher than Gross had claimed, and projected that gap to widen as the rest of the virtual machine accelerated. The technical question was not settled.
Five days later, Gross — until then a quiet voice in the thread, mostly responding to technical questions — published the message that moved the decision:
@colesbury
Without specific concerns or a clear bar for acceptance, I (and my funding organization) will have to treat the current decision-in-limbo as a “no” and will be unable to pursue the PEP further.
The funding organization was Meta. The implication was concrete: the work could not continue indefinitely without a decision. The Steering Council had to accept, reject, or set a bar. The option it had been taking, silence, was now expensive.
Exactly one month later, on July 7, Carl Meyer, engineering manager of Meta's Python Runtime team, replied with a number:
@carljm
If PEP 703 is accepted, Meta can commit to support in the form of three engineer-years (from engineers experienced working in CPython internals) between the acceptance of PEP 703 and the end of 2025, to collaborate with the core dev team on landing the PEP 703 implementation smoothly in CPython and on ongoing improvements to the compatibility and performance of nogil CPython.
Three weeks after Meyer’s post, the Steering Council issued a formal notice of intent to accept.
The yes that contains a no
The acceptance arrived on October 24, 2023, written by Thomas Wouters on behalf of the Steering Council. Most PEP acceptances are short. This one carried an unusual clause:
@Yhg1s
In short, the SC accepts PEP 703, but with clear provisio: that the rollout be gradual and break as little as possible, and that we can roll back any changes that turn out to be too disruptive — which includes potentially rolling back all of PEP 703 entirely if necessary (however unlikely or undesirable we expect that to be).
The Council had not accepted GIL removal. It had accepted a conditional version of GIL removal — one whose rollback was named explicitly in the contract.
The Council’s earlier notice of intent had already set the constraint that shaped the implementation:
@Yhg1s
We do not want another Python 3 situation, so any changes in third-party code needed to accommodate no-GIL builds should just work in with-GIL builds […] This is not Python 4.
The 2008 Python 2-to-3 transition split the ecosystem for more than a decade. The Council was naming that fear in writing and shaping the implementation around it.
The shape that shipped
Both sides — the runtime cost and the Council’s commitment — were paid in pieces instead of all at once.
For the runtime, the five mechanisms above cut the cost from thirty percent to about five. The acceptance side applied the same idea at a different level. The new build is opt-in at compile time: CPython must be configured with --disable-gil to produce it. Extension packages compiled for it carry a separate ABI tag, t (as in cp313t), so they can't be mixed with the regular build. The rollout was split across three numbered phases. And the Council's written rollback covered all of it, up to and including reverting PEP 703 entirely.
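Which build a given interpreter is can be checked from Python. This is a minimal sketch using standard-library calls available in CPython 3.13 and later. sysconfig.get_config_var("Py_GIL_DISABLED") reports whether the interpreter was compiled as the free-threaded build. sys._is_gil_enabled() (underscore-prefixed, so treat it as provisional) reports whether the GIL is actually active right now. sys.abiflags carries the "t" marker on POSIX free-threaded builds. The getattr fallbacks are an assumption added so the same script runs on older versions, where the GIL is always on; describe_build is a hypothetical helper name.

```python
import sys
import sysconfig


def describe_build() -> dict:
    """Report how this interpreter was built and whether the GIL is on."""
    # 1 on free-threaded builds (3.13+); 0 or None on regular or older builds.
    compiled_free_threaded = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() exists from 3.13; older interpreters always have the GIL.
    is_gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)
    # POSIX free-threaded builds include "t" in the ABI flags (e.g. "t" or "td").
    abiflags = getattr(sys, "abiflags", "")
    return {
        "free_threaded_build": compiled_free_threaded,
        "gil_enabled_now": bool(is_gil_enabled()),
        "abiflags": abiflags,
    }


print(describe_build())
```

The two flags can disagree: a free-threaded build can still be running with the GIL on, which is exactly the situation the next paragraph describes.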
Phase I shipped in Python 3.13 (October 2024) at roughly forty percent overhead, because the specializing adaptive interpreter (PEP 659), one of CPython's main single-thread optimizations, had to be disabled in the free-threaded build. Phase II shipped twelve months later in Python 3.14 with that optimization restored, and overhead came down to 5–10 percent. Phase III, in which the free-threaded build becomes the default, has no committed date.
None of this is free. The runtime pays in single-thread overhead. Native extensions pay in migration work: any extension that shares Python objects or internal state across threads needs new locking, and on a free-threaded build, importing an extension that hasn't declared compatibility turns the GIL back on for the whole process, with only a RuntimeWarning to say so. CPython maintainers pay in separability: every line of free-threaded code has to stay identifiable enough that the Council can still pull it out, which means the two paths can't be merged even where merging would simplify the code.
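That re-enabling can be observed from Python. The sketch below assumes CPython 3.13 or later and uses a hypothetical helper, import_and_check_gil. It compares sys._is_gil_enabled() before and after an import and collects any RuntimeWarning raised during the import. On a free-threaded build, an extension module that hasn't declared free-threading support flips the result from False to True. Setting the environment variable PYTHON_GIL=0, or running with -X gil=0, keeps the GIL off anyway, at the user's own risk. A module that is already imported is not re-executed, so the check only means something on first import.

```python
import importlib
import sys
import warnings


def import_and_check_gil(module_name: str):
    """Import a module and report whether doing so turned the GIL back on.

    Returns (module, gil_before, gil_after, runtime_warning_messages).
    """
    # Fallback for pre-3.13 interpreters, where the GIL is always enabled.
    is_gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)
    gil_before = bool(is_gil_enabled())
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")      # record every warning, even repeats
        module = importlib.import_module(module_name)
    gil_after = bool(is_gil_enabled())
    messages = [str(w.message) for w in caught
                if issubclass(w.category, RuntimeWarning)]
    return module, gil_before, gil_after, messages


module, before, after, messages = import_and_check_gil("json")
if before != after:
    print("importing json re-enabled the GIL:", messages)
else:
    print("GIL state unchanged:", after)
```

Running such a check over a project's native dependencies is one cheap way to find the extension that is quietly putting the lock back.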
A single yes-or-no would have stalled. Gilectomy stalled exactly that way; the years between 2018 and 2022 stalled in the same shape. PEP 703 shipped because no part of it had to be all-or-nothing. The runtime spread the cost. The Council spread the commitment.
The GIL is still default in Python 3.14. So is the path to take it out — and the path to put it back.
Further reading
- PEP 703 — Making the Global Interpreter Lock Optional in CPython — the proposal itself.
- Steering Council acceptance announcement — the rollback-clause text in full.
- LWN: GIL removal and the Faster CPython project — Jake Edge’s 2023 summary of the performance dispute.
- LWN: Gilectomy at the 2018 Language Summit — the historical case the new approach had to outperform.