Performance

YAMLRocks is built for speed from the ground up: a custom Rust scanner and parser, zero-copy scalar borrowing, direct Python object construction through the CPython API, interned mapping keys, and a release profile tuned with fat LTO. The result is a library that is faster than PyYAML’s C loader on every operation and dramatically faster than the pure-Python round-trip libraries.

Run python bench/bench.py in the repository to reproduce these numbers on your own machine. The figures below come from a release build and are indicative. Your hardware, payload, and Python version will move them around.

Every figure below is how many times faster YAMLRocks is than the named library.

  • Parsing: YAMLRocks is ~5-10x faster than PyYAML’s C CSafeLoader, ~55-85x faster than pure-Python PyYAML, ~85-135x faster than ruamel.yaml, and ~22-33x faster than yamlium.
  • Serializing: YAMLRocks is ~15-19x faster than PyYAML’s C CSafeDumper, ~70-90x faster than pure-Python PyYAML, ~155-210x faster than ruamel.yaml, and ~8-14x faster than yamlium.
  • Native includes: YAMLRocks is ~18x faster than a PyYAML !include constructor for configurations split across hundreds of files, which is exactly the pattern Home Assistant hits on startup and reload.

Most environments without libyaml installed fall back to pure-Python PyYAML, which is where the largest gap shows.

How many times faster YAMLRocks is at parsing each payload:

| Payload | vs PyYAML (C) | vs PyYAML (pure) | vs ruamel | vs yamlium |
| --- | --- | --- | --- | --- |
| small (10 lines) | ~7x faster | ~56x faster | ~90x faster | ~25x faster |
| medium (k8s manifest) | ~8x faster | ~70x faster | ~108x faster | ~27x faster |
| large (500 items) | ~10x faster | ~83x faster | ~133x faster | ~32x faster |
| deep (30 levels) | ~5x faster | ~56x faster | ~85x faster | ~22x faster |

How many times faster YAMLRocks is at serializing each payload:

| Payload | vs PyYAML (C) | vs PyYAML (pure) | vs ruamel | vs yamlium |
| --- | --- | --- | --- | --- |
| small | ~17x faster | ~86x faster | ~194x faster | ~11x faster |
| medium | ~16x faster | ~84x faster | ~185x faster | ~11x faster |
| large | ~16x faster | ~82x faster | ~191x faster | ~13x faster |
| deep | ~17x faster | ~71x faster | ~156x faster | ~8x faster |

yamlium emits comparatively quickly, so its dump gap is the smallest of the four, but YAMLRocks still leads on every shape. The margin narrows as individual strings grow very long, because yamlium serializes straight from the original Python str objects while YAMLRocks copies each string once more. Even so, YAMLRocks stays ahead for an array of 500 long plain strings.

Includes (Home Assistant-style split config)

How many times faster YAMLRocks’s native include resolver is than a PyYAML !include constructor:

| Files | YAMLRocks is |
| --- | --- |
| 50 | ~18x faster |
| 200 | ~18x faster |
| 500 | ~18x faster |

The constructor approach re-enters the Python parser once per file and rebuilds the loader machinery each time. YAMLRocks resolves the whole include graph in Rust in a single pass, which is why the gap stays wide as the file count grows.

The speed is not one trick; it is a stack of decisions that each remove work from the hot path.

  • A custom Rust scanner and parser. There is no general-purpose dependency to fight. The scanner is tuned for the shapes real configs use (short keys, plain scalars, repetitive structure), so the common cases stay on the fast path.
  • Zero-copy scalar borrowing. A single-line plain scalar that needs no unescaping is read directly out of the input buffer rather than copied into a fresh allocation. The bytes you passed in are the scalar, right up until a Python str is built from them.
  • Direct Python object construction. Plain loads skips the rich round-trip AST (the one that preserves comments, styles, and spans). It resolves events into a lean Rust Value tree and builds the Python dicts and lists from it with raw CPython calls (PyList_New + PyList_SET_ITEM, PyDict_New + PyDict_SetItem), avoiding the per-element overhead of append/__setitem__.
  • Interned, cached mapping keys. Repeated keys (the norm in configuration, where every list item shares the same fields) are interned once per document and reused, so the intern-table lookup happens once per distinct key rather than per occurrence. That cuts allocations and makes later dictionary lookups faster.
  • A tuned release profile. The release build uses lto = "fat", codegen-units = 1, and opt-level = 3, letting the optimizer inline across the whole crate.
  • Pass bytes, not str. YAMLRocks is happiest with bytes. If your YAML already arrives as bytes from a socket or file, hand them straight to loads and skip a UTF-8 round trip.
  • Write dumps output directly. dumps returns bytes, so you can write it to a file or socket without an extra .encode().
import yamlrocks

source = """
name: app
port: 8080
"""
data = yamlrocks.loads(source)
payload = yamlrocks.dumps(data)  # already bytes
with open("out.yaml", "wb") as f:  # note "wb": the payload is bytes
    f.write(payload)
  • Use the fast path, not round-trip, unless you need comments. Plain loads/dumps skip building the rich YAMLRocksDocument tree. Reach for OPT_ROUND_TRIP only when you actually need to preserve comments and layout.
  • Prefer native includes. For split configurations, OPT_INCLUDES resolves the whole graph in Rust, far faster than a hand-rolled !include constructor.
  • Build a release wheel. Install from PyPI (pip install yamlrocks) or build with maturin build --release. Debug builds are many times slower; never benchmark one.
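
The interned, cached mapping keys described earlier can be illustrated in plain Python. This is only a sketch of the idea, not YAMLRocks's Rust implementation: `build_records` and `key_cache` are hypothetical names, and `sys.intern` stands in for whatever interning the library performs internally. Each distinct key is decoded and interned once, and every later occurrence reuses the same str object.

```python
import sys


def build_records(raw_items):
    """Build one dict per item, reusing a single interned str per distinct key.

    raw_items is a list of items; each item is a list of (key_bytes, value) pairs.
    """
    key_cache = {}  # hypothetical per-document cache: raw key bytes -> interned str
    records = []
    for pairs in raw_items:
        record = {}
        for raw_key, value in pairs:
            key = key_cache.get(raw_key)
            if key is None:
                # First occurrence: decode and intern once, then remember it.
                key = sys.intern(raw_key.decode("utf-8"))
                key_cache[raw_key] = key
            record[key] = value
        records.append(record)
    return records


raw = [
    [(b"name", "web"), (b"port", 8080)],
    [(b"name", "db"), (b"port", 5432)],
]
records = build_records(raw)

# Both dicts hold the very same key objects, not equal copies.
shared = all(a is b for a, b in zip(records[0], records[1]))
```

Because the keys are shared objects, a list of 500 similar items allocates each key string once instead of 500 times.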

The benchmark harness lives in the repository and compares YAMLRocks against the other libraries across the payloads in the tables above:

python bench/bench.py

It prints per-payload timings and the relative speedups. Run it on the machine and Python build you care about; numbers from someone else’s laptop are only a rough guide.
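
For a quick check on your own payload without the full harness, a "times faster" figure can be approximated with the standard library's timeit module. The sketch below is not the code in bench/bench.py; `best_time` and `speedup` are hypothetical helper names, and the repeat counts are arbitrary defaults.

```python
import timeit


def best_time(func, payload, repeat=5, number=200):
    """Return the best-of-`repeat` average seconds per call of func(payload)."""
    timings = timeit.repeat(lambda: func(payload), repeat=repeat, number=number)
    return min(timings) / number


def speedup(baseline, candidate, payload, repeat=5, number=200):
    """Return how many times faster `candidate` is than `baseline` on `payload`."""
    slow = best_time(baseline, payload, repeat=repeat, number=number)
    fast = best_time(candidate, payload, repeat=repeat, number=number)
    return slow / fast


# Example use, assuming PyYAML (built with libyaml) and yamlrocks are installed:
#
#   import yaml, yamlrocks
#   ratio = speedup(
#       lambda s: yaml.load(s, Loader=yaml.CSafeLoader),
#       yamlrocks.loads,
#       open("config.yaml", "rb").read(),
#   )
#   print(f"~{ratio:.0f}x faster")
```

Taking the minimum over several repeats reduces noise from other processes, which matters when the faster call takes only microseconds.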

YAMLRocks is free-threaded safe. On a free-threaded (nogil) CPython build, parsing and serializing run without holding the GIL, so multiple threads can load and dump YAML in parallel and actually use multiple cores. There is no special flag to set: the same loads/dumps calls scale across threads on a free-threaded interpreter.
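
A minimal sketch of what that looks like from the caller's side, using only the standard library. `load_all` and `gil_enabled` are hypothetical helpers, and the parser is passed in as an argument so the same code works with yamlrocks.loads or any other callable. `sys._is_gil_enabled()` exists on CPython 3.13 and later, so the probe falls back to assuming the GIL is on for older interpreters.

```python
import sys
from concurrent.futures import ThreadPoolExecutor


def gil_enabled():
    """Return False only on a free-threaded build running with the GIL off."""
    probe = getattr(sys, "_is_gil_enabled", None)  # present on CPython 3.13+
    return True if probe is None else probe()


def load_all(parse, documents, max_workers=4):
    """Parse every document on a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(parse, documents))


# Example use, assuming yamlrocks is installed:
#
#   import yamlrocks
#   configs = load_all(yamlrocks.loads, [b"a: 1", b"b: 2"])
#   print("parallel across cores" if not gil_enabled() else "GIL is on")
```

On a regular (GIL-enabled) build the same code still runs correctly; the threads simply cannot execute Python-level work on several cores at once.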