String Interning
String interning is an optimization that ensures identical strings share the same memory location, enabling fast pointer comparison.
What is Interning?
Interned strings form an interpreter-global set with two properties:
- No two interned strings have the same content
- Two interned strings can be compared using pointer equality (is)
Interning benefits:
- Dictionary lookups (attribute access, global variables)
- String comparisons in hot paths
- Memory usage (deduplication)
Example
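The two properties can be observed from Python with sys.intern(). The sketch below builds two equal strings at runtime (the build helper is a hypothetical name, used only so the compiler cannot share the strings ahead of time) and compares identities before and after interning. Object identity of equal strings is a CPython implementation detail.

```python
import sys

def build(parts):
    """Build a string at runtime so it is a fresh object each time."""
    return "".join(parts)

# Two equal strings created independently are distinct objects.
a = build(["inter", "ning", "!"])
b = build(["inter", "ning", "!"])
equal_content = a == b        # same characters
same_object_before = a is b   # separate allocations

# After interning, equal content maps to a single shared object,
# so identity (`is`) is enough to establish equality.
a_interned = sys.intern(a)
b_interned = sys.intern(b)
same_object_after = a_interned is b_interned

print(equal_content, same_object_before, same_object_after)
# On CPython this prints: True False True
```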
Two Interning Mechanisms
CPython uses two different mechanisms:
1. Singletons
Statically allocated strings that always exist.
2. Dynamic Interning
Runtime interning of strings in an interpreter-wide dictionary.
Singletons
Latin-1 Single Characters
All 256 single-character Latin-1 strings are pre-allocated.
Identifier Strings
Common identifiers are marked in the C source using the _Py_ID and _Py_STR macros.
Singleton Collection
Singletons are collected by make regen-global-objects:
- Scan CPython source for _Py_ID and _Py_STR macros
- Generate code via Tools/build/generate_global_objects.py
- Produce declaration, initialization, and finalization code
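The scanning step can be pictured with a small regular-expression pass. This is a simplified sketch, not the logic of generate_global_objects.py: the function name collect_singletons, the one-argument macro forms it matches, and the sample C fragment are all assumptions made for illustration.

```python
import re

# Assumed one-argument usage forms: _Py_ID(name) and _Py_STR(name).
MACRO_RE = re.compile(r"\b_Py_(ID|STR)\((\w+)\)")

def collect_singletons(c_source: str) -> dict[str, set[str]]:
    """Return the names referenced through each macro in one C source text."""
    found = {"ID": set(), "STR": set()}
    for kind, name in MACRO_RE.findall(c_source):
        found[kind].add(name)   # sets drop repeated uses of the same name
    return found

# Hypothetical C fragment used only as sample input.
sample = '''
    PyObject *name = &_Py_ID(__name__);
    PyObject *doc = &_Py_ID(__doc__);
    PyObject *again = &_Py_ID(__name__);
    PyObject *empty = &_Py_STR(empty);
'''

result = collect_singletons(sample)
print(sorted(result["ID"]), sorted(result["STR"]))
# prints: ['__doc__', '__name__'] ['empty']
```

Collecting into sets mirrors the idea that each name yields exactly one singleton, however often it is used.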
Singleton Storage
Stored in a runtime-global table:
- Initialized at runtime startup
- Immutable until runtime finalization
- Shared across threads and interpreters without synchronization
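One visible consequence of the pre-allocated Latin-1 singletons is that every way of producing a one-character Latin-1 string yields the same object. The check below is a sketch that relies on CPython behaviour; identity of equal strings is not guaranteed by the language.

```python
# Build each one-character string twice at runtime and compare identities.
# Identity of equal strings is a CPython implementation detail.
shared = [chr(i) is chr(i) for i in range(256)]
all_latin1_shared = all(shared)

# Indexing into a larger string also yields the shared one-character object.
word = "".join(["ab", "c"])
index_is_singleton = word[0] is chr(ord("a"))

print(all_latin1_shared, index_is_singleton)
```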
The three singleton sets (latin-1 chars, _Py_ID, _Py_STR) are disjoint, with no overlaps.
Dynamic Interning
All other interned strings are allocated dynamically.
Storage
Stored in an interpreter-wide dictionary:
- Key and value reference the same object
- One dict entry per unique interned string
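The shape of this dictionary can be modelled in a few lines of Python. The InternTable class below is a toy model with hypothetical names, not CPython's C implementation; it only shows how a dict whose key and value are the same object acts as a canonicalizing set.

```python
class InternTable:
    """Toy model of an interned-string dictionary (not CPython's C code)."""

    def __init__(self):
        self._table = {}

    def intern(self, s: str) -> str:
        # setdefault stores s as both key and value on first sight,
        # and returns the already-stored object on later calls.
        return self._table.setdefault(s, s)

    def entries(self):
        return list(self._table.items())

table = InternTable()
first = "".join(["spam", "_eggs"])
second = "".join(["spam", "_eggs"])   # equal content, different object

canonical_a = table.intern(first)
canonical_b = table.intern(second)

print(canonical_a is canonical_b)                # True: both map to `first`
print(all(k is v for k, v in table.entries()))   # True: key and value coincide
print(len(table.entries()))                      # 1: one entry per unique string
```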
Static Allocation Flag
Dynamic strings have:
Immortality and Reference Counting
Invariant: Every immortal string is interned.
Never use _Py_SetImmortal() on a string directly! Use _PyUnicode_InternImmortal() instead, which handles interning correctly.
Mortal Interned Strings
The converse is NOT true: interned strings can be mortal. For mortal interned strings:
- The 2 references from the dict (key + value) are excluded from refcount
- unicode_dealloc() removes the string from the interned dict
- At shutdown, dict clearing adds references back before deletion
When to Immortalize
Immortalize strings that live until interpreter shutdown:
- Strings in code objects
- Strings in marshal data
- Strings in compiler-generated constants
Rationale:
- Even with hot reloading or eval(), identifier count stays low
- Immortalizing prevents repeated allocation/deallocation
Internal API
Three internal interning functions:
_PyUnicode_InternMortal
- Takes ownership of reference (steals)
- Returns new reference via pointer update
- “Reference neutral” (refcount unchanged from caller’s view)
_PyUnicode_InternImmortal
- Takes ownership of reference
- Immortalizes the result
- Returns new reference
_PyUnicode_InternStatic
- Only for _Py_STR, _Py_ID, or single-byte strings
- Not for general use
All intern functions take a pointer to PyObject* and modify it in place. This enables reference ownership transfer.
Reference Neutrality
- Steals the incoming reference (consumes it)
- Provides a new reference (creates it)
- Net effect: caller still has 1 reference
Critical: Never call intern functions with a borrowed reference! Always own the reference you pass.
Interning State
Stored in _PyUnicode_STATE(s).interned:
- SSTATE_NOT_INTERNED (0)
- SSTATE_INTERNED_MORTAL (1)
- SSTATE_INTERNED_IMMORTAL (2)
- SSTATE_INTERNED_IMMORTAL_STATIC (3)
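For reasoning about these values outside of C, they can be mirrored in a Python enum. The InternState class and the two helper predicates are illustrative names that do not exist in CPython; the predicates encode the invariant stated earlier that every immortal string is interned.

```python
from enum import IntEnum

class InternState(IntEnum):
    """Python mirror of the SSTATE_* values (illustrative only)."""
    NOT_INTERNED = 0
    INTERNED_MORTAL = 1
    INTERNED_IMMORTAL = 2
    INTERNED_IMMORTAL_STATIC = 3

def is_interned(state: InternState) -> bool:
    return state is not InternState.NOT_INTERNED

def is_immortal(state: InternState) -> bool:
    return state in (InternState.INTERNED_IMMORTAL,
                     InternState.INTERNED_IMMORTAL_STATIC)

# Every immortal state is also an interned state; the converse fails
# for INTERNED_MORTAL.
for state in InternState:
    print(state.name, int(state), is_interned(state), is_immortal(state))
```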
State Transitions
For dynamically allocated strings:
Using _PyUnicode_InternStatic on dynamically allocated strings is an error.
Performance Impact
Dictionary Lookups
Without interning:
Memory Savings
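The deduplication effect can be estimated by counting distinct objects before and after interning. The sample data below is made up, and the sketch counts objects rather than measuring bytes, so it indicates the scale of the saving rather than an exact figure.

```python
import sys

# Simulate input where the same few values arrive as separate objects,
# e.g. fields parsed from many records (sample data is made up).
values = ["red", "green", "blue"]
raw = ["".join([v, "_tag"]) for _ in range(1000) for v in values]

distinct_before = len({id(s) for s in raw})       # one object per element
interned = [sys.intern(s) for s in raw]
distinct_after = len({id(s) for s in interned})   # one object per unique value

print(distinct_before, distinct_after)
# On CPython this prints: 3000 3
```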
Explicit Interning
Python exposes interning via sys.intern():
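A short sketch of the call's behaviour follows. It shows that the return value is the canonical object (so the result, not the argument, should be kept) and that only exact str instances are accepted. The Label subclass is hypothetical and exists only to trigger the error.

```python
import sys

key = "".join(["user", "_", "id"])

# sys.intern returns the canonical object; keep the return value.
canonical = sys.intern(key)
again = sys.intern("".join(["user", "_", "id"]))
print(canonical is again)   # True

# Only exact str instances are accepted.
class Label(str):
    """Hypothetical str subclass used to show the restriction."""

try:
    sys.intern(Label("user_id"))
    rejected = False
except TypeError:
    rejected = True
print(rejected)             # True
```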
When to Use
Explicit interning is useful for:
- Many string comparisons
- Large number of duplicate strings
- Performance-critical code paths
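A common place where duplicate strings pile up is the keys of records decoded one message at a time. As one possible pattern, the sketch below interns keys through the object_pairs_hook parameter of json.loads; the intern_keys helper and the sample records are assumptions made for illustration.

```python
import json
import sys

def intern_keys(pairs):
    """object_pairs_hook that interns every key of a decoded JSON object."""
    return {sys.intern(k): v for k, v in pairs}

# Made-up records standing in for many separately decoded messages.
lines = ['{"status": "ok", "latency_ms": 12}',
         '{"status": "error", "latency_ms": 48}']

records = [json.loads(line, object_pairs_hook=intern_keys) for line in lines]

# Keys from different decode calls are now the same objects.
keys_a = list(records[0])
keys_b = list(records[1])
shared = all(a is b for a, b in zip(keys_a, keys_b))
print(shared)   # True
```

Whether this pays off depends on how many records stay alive at once; for short-lived data the extra interning call may cost more than it saves.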
Implementation Files
C Implementation
- Objects/unicodeobject.c - String object and interning logic
- Python/pylifecycle.c - Runtime initialization (intern singletons)
Singleton Generation
- Tools/build/generate_global_objects.py - Collect and generate singletons
- Include/internal/pycore_global_objects.h - Generated singleton declarations
Related Topics
- Source Code Structure - Where string implementation lives
- Garbage Collector - String memory management
- Code Objects - Interned strings in bytecode
