Annotated mode
When a tool needs to point a user at the exact spot of a problem (“the port on line 7 is out of range”), it needs to know where each value came from. Plain parsing throws that information away the moment the text becomes objects.
OPT_ANNOTATED keeps it. Instead of plain containers it returns lightweight
subclasses (YAMLRocksAnnotatedDict, YAMLRocksAnnotatedList, and YAMLRocksAnnotatedStr) that behave
exactly like dict, list, and str, but also carry the source location of the
node they represent. Your existing code keeps working unchanged; the location is
there when you reach for it.
import yamlrocks

data = yamlrocks.loads(
    b"name: app\nserver:\n  host: localhost\n  port: 8080\n",
    option=yamlrocks.OPT_ANNOTATED,
)

isinstance(data, dict)  # True (a real dict subclass)
data.__line__  # 1
data.__column__  # 1

data["server"].__line__  # 3 (the mapping body starts here)
data["server"]["host"].__line__  # 3
data["server"]["host"].__column__  # 9

Every annotated node exposes seven location attributes:
| Attribute | Meaning |
|---|---|
| __line__ | 1-based source line where the node starts |
| __column__ | 1-based source column where the node starts |
| __file__ | originating file path, or None |
| __end_line__ | 1-based line just past the node’s last character |
| __end_column__ | 1-based column just past the node’s last character |
| __offset__ | 0-based byte offset of the node’s first character |
| __end_offset__ | 0-based byte offset just past the node’s last character |
The start and end together give a full span you can underline. For a scalar the
end is just past its last character; for a mapping or sequence it reaches the end
of the block (the furthest point of any child). This mirrors the start/end marks
PyYAML exposes as node.start_mark/node.end_mark.
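As a rough illustration of what the span makes possible, the sketch below underlines a node in its source line. It does not call yamlrocks: FakeNode is a hypothetical stand-in that carries the same four span attributes, underline is an illustrative helper name, and the sketch assumes the node fits on a single line.

```python
class FakeNode(str):
    """Hypothetical stand-in for an annotated string (not the yamlrocks class)."""

    def __new__(cls, value, line, column, end_line, end_column):
        node = super().__new__(cls, value)
        node.__line__ = line
        node.__column__ = column
        node.__end_line__ = end_line
        node.__end_column__ = end_column
        return node


def underline(source: str, node) -> str:
    """Return the node's source line with carets under its span (single-line nodes only)."""
    text = source.splitlines()[node.__line__ - 1]
    width = node.__end_column__ - node.__column__
    return text + "\n" + " " * (node.__column__ - 1) + "^" * width


source = "key: value\nbroad: x\n"
# Positions copied by hand from the documented example for the 'broad' key.
broad = FakeNode("broad", line=2, column=1, end_line=2, end_column=6)
print(underline(source, broad))
```

Because the end column is "just past" the last character, the caret count is simply the difference between the two columns.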
import yamlrocks
data = yamlrocks.loads(b"key: value\nbroad: x\n", option=yamlrocks.OPT_ANNOTATED)
key = list(data)[1]  # the 'broad' key
(key.__line__, key.__column__)  # (2, 1)
(key.__end_line__, key.__end_column__)  # (2, 6) (just past 'broad')
They behave like the builtins
The whole point of annotated mode is that nothing else changes. A YAMLRocksAnnotatedDict
is a dict, a YAMLRocksAnnotatedList is a list, and a YAMLRocksAnnotatedStr is a str, so
they pass isinstance checks, support every method and operator, and serialize
the way you expect:
import yamlrocks
source = """name: app
server:
  host: localhost
"""
data = yamlrocks.loads(source, option=yamlrocks.OPT_ANNOTATED)

# Dict behavior.
list(data.keys())  # ['name', 'server']
{**data["server"]}  # {'host': 'localhost'}

# Str behavior on a scalar.
host = data["server"]["host"]
host.upper()  # 'LOCALHOST'
host == "localhost"  # True
host + ":8080"  # 'localhost:8080'
Because they are genuine subclasses, you can pass annotated values to any function
that expects a plain dict, list, or str and it will not notice the
difference.
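The same property can be seen with the standard library alone. The sketch below uses two hypothetical stand-ins, LocatedStr and LocatedDict (not the yamlrocks classes), and passes them to json.dumps and str.join, which were written for plain builtins.

```python
import json


class LocatedStr(str):
    """Hypothetical stand-in: a str subclass that also carries a line number."""

    def __new__(cls, value, line):
        node = super().__new__(cls, value)
        node.__line__ = line
        return node


class LocatedDict(dict):
    """Hypothetical stand-in: a dict subclass that can hold extra attributes."""


config = LocatedDict({LocatedStr("host", 3): LocatedStr("localhost", 3)})
config.__line__ = 3

# Code written for plain builtins accepts the subclasses without changes.
encoded = json.dumps(config)  # '{"host": "localhost"}'
joined = "/".join([config["host"], "api"])  # 'localhost/api'

# The extra attribute is still there when you reach for it.
print(encoded, joined, config["host"].__line__)
```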
Attaching attributes
Some libraries hook a type by setting a class attribute on it (for example
voluptuous looks for a __voluptuous_compile__ method to compile a value). That
is supported: the annotated classes are writable, so you can attach methods or
other class attributes to them.
import yamlrocks
data = yamlrocks.loads(b"name: app", option=yamlrocks.OPT_ANNOTATED)
# Attach a class attribute (here a method) to the annotated string type.
type(data["name"]).__shout__ = lambda self: self.upper() + "!"
data["name"].__shout__()  # 'APP!'
Which nodes are annotated
Mappings become YAMLRocksAnnotatedDict, sequences become YAMLRocksAnnotatedList, and string
scalars become YAMLRocksAnnotatedStr, including mapping keys, so you can point an
error at the exact key rather than only its value. By default the remaining
scalars (integers, floats, booleans, and None) stay as their plain Python
types. Annotated keys are still ordinary, hashable strings, so dict lookups with a
plain str work unchanged.
import yamlrocks
data = yamlrocks.loads(
    b"server:\n  host: localhost\n  port: 8080\n",
    option=yamlrocks.OPT_ANNOTATED,
)

type(data).__name__  # 'YAMLRocksAnnotatedDict'
next(iter(data)).__line__  # 1 (the `server` key's own line)
type(data["server"]["host"]).__name__  # 'YAMLRocksAnnotatedStr'
type(data["server"]["port"]).__name__  # 'int' (plain by default)
So by default a string value like host carries __line__/__column__, but an
integer like port does not. To locate a non-string scalar, read the position
from the mapping or sequence that contains it, or opt into numeric annotation
(below).
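One way to apply the fallback is to use the value's own position when it has one and otherwise the position of the key stored in the containing mapping, since keys are annotated even when their values are not. The sketch below is only an illustration under that assumption: LocatedStr is a hypothetical stand-in for an annotated string, position_of is an invented helper name, and the positions are filled in by hand.

```python
class LocatedStr(str):
    """Hypothetical stand-in for an annotated string carrying a position."""

    def __new__(cls, value, line, column):
        node = super().__new__(cls, value)
        node.__line__ = line
        node.__column__ = column
        return node


def position_of(mapping, key):
    """Best available (line, column) for mapping[key].

    Prefer the value's own position; otherwise fall back to the key object
    stored in the mapping, which carries a position even when the value is plain.
    """
    value = mapping[key]
    if hasattr(value, "__line__"):
        return value.__line__, value.__column__
    stored_key = next(k for k in mapping if k == key)
    return stored_key.__line__, stored_key.__column__


server = {
    LocatedStr("host", 2, 3): LocatedStr("localhost", 2, 9),
    LocatedStr("port", 3, 3): 8080,  # plain int: no position of its own
}
print(position_of(server, "host"))  # (2, 9)
print(position_of(server, "port"))  # (3, 3)
```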
Locating numbers: OPT_ANNOTATE_NUMBERS
Add OPT_ANNOTATE_NUMBERS to also annotate integers and floats, so an error on a
numeric value (an out-of-range port, say) can point at its own line. Integers
become YAMLRocksAnnotatedInt and floats YAMLRocksAnnotatedFloat, carrying the same
attributes as annotated strings:
import yamlrocks
data = yamlrocks.loads(
    b"port: 8080\n",
    option=yamlrocks.OPT_ANNOTATED | yamlrocks.OPT_ANNOTATE_NUMBERS,
)
data["port"]  # 8080
data["port"].__line__  # 1
data["port"] + 1  # 8081 (still an int in every way that matters)
An annotated number is an int/float subclass: isinstance(x, int),
equality, arithmetic, and hashing all behave normally, but type(x) is int is
False, and there is a small per-number boxing cost. The flag is off by default
so the common case stays plain (and fast). bool and None are never annotated,
even with the flag: Python does not allow subclassing them, which is also why
PyYAML leaves them unannotated.
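The trade-offs of an int subclass, and the bool restriction, can be reproduced in plain Python. LocatedInt below is a hypothetical stand-in, not YAMLRocksAnnotatedInt, so it only shows how any int subclass behaves.

```python
class LocatedInt(int):
    """Hypothetical stand-in for an annotated integer (not the yamlrocks class)."""

    def __new__(cls, value, line):
        node = super().__new__(cls, value)
        node.__line__ = line
        return node


port = LocatedInt(8080, line=1)

print(isinstance(port, int))  # True
print(type(port) is int)  # False
print(port == 8080, hash(port) == hash(8080))  # True True
print(type(port + 1) is int)  # True: for this stand-in, arithmetic yields a plain int

# bool cannot be subclassed, so there is no way to build an annotated bool.
try:
    class LocatedBool(bool):
        pass
except TypeError as exc:
    print("cannot subclass bool:", exc)
```

Code that branches on type(x) is int rather than isinstance(x, int) is the case to watch for when turning the flag on.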
A complex key (a sequence or mapping used as a
mapping key) is rendered the same way as on the plain path: as a tuple (a mapping
key becomes a tuple of its pairs), since a Python dict cannot key on an
unhashable annotated container. Scalar keys are still annotated; only collection
keys convert.
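The underlying constraint is plain Python: lists and dicts are unhashable, so they cannot be dict keys, while tuples can. The sketch below shows one possible conversion; freeze_key is a hypothetical helper, and its recursive handling of nested collections is an assumption rather than documented yamlrocks behavior.

```python
def freeze_key(key):
    """Turn a collection key into a hashable tuple; leave scalars untouched."""
    if isinstance(key, dict):
        # A mapping key becomes a tuple of its (key, value) pairs.
        return tuple((freeze_key(k), freeze_key(v)) for k, v in key.items())
    if isinstance(key, list):
        return tuple(freeze_key(item) for item in key)
    return key


try:
    {[1, 2]: "pair"}  # a list is unhashable, so this raises TypeError
except TypeError as exc:
    print("rejected:", exc)

table = {freeze_key([1, 2]): "pair", freeze_key({"a": 1}): "mapping"}
print(table[(1, 2)])  # pair
print(table[(("a", 1),)])  # mapping
```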
Knowing how a string was written: __style__
A YAMLRocksAnnotatedStr also carries __style__, the source style of the scalar:
"plain", "single" ('...'), "double" ("..."), "literal" (|), or
"folded" (>). This lets a tool tell a block scalar from an inline one, for
example to offset a generated #line directive to the block body, since a block
scalar’s __line__ points at the |/> indicator line and the content begins on
the next line. The vocabulary matches round-trip
YAMLRocksNode.style.
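The offset rule described above is small enough to write down. In this sketch content_start_line and BLOCK_STYLES are illustrative names; the function takes the values you would read from __line__ and __style__ and does not touch yamlrocks itself.

```python
BLOCK_STYLES = {"literal", "folded"}


def content_start_line(line: int, style: str) -> int:
    """First line holding scalar content, given a node's __line__ and __style__.

    Block scalars report the line of the | or > indicator, so their content
    begins one line later; inline styles start on the reported line.
    """
    return line + 1 if style in BLOCK_STYLES else line


print(content_start_line(1, "plain"))  # 1
print(content_start_line(2, "literal"))  # 3
```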
import yamlrocks
source = """inline: hi
block: |
  body
"""
data = yamlrocks.loads(source, option=yamlrocks.OPT_ANNOTATED)
data["inline"].__style__  # 'plain'
data["block"].__style__  # 'literal' (a | block; content starts at __line__ + 1)
Knowing which tag produced a value: __source_tag__
Every annotated node carries __source_tag__: the tag that produced it, or
None for a plain inline scalar. It is the originating config directive
("!secret", "!env_var", an "!include" family tag) when the value came from
one, or the node’s own custom application tag ("!mytag") otherwise. Core
!!type tags are not provenance and report None.
For the three built-in config tags there are convenience predicates,
is_secret, is_env_var, and is_include (the last covers all five
!include* variants), so a tool can react without string-matching:
import os
import tempfile
import yamlrocks

workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "secrets.yaml"), "w") as f:
    f.write("api_key: s3cr3t\n")
with open(os.path.join(workdir, "configuration.yaml"), "w") as f:
    f.write("api_key: !secret api_key\ntitle: My App\n")

opt = yamlrocks.OPT_ANNOTATED | yamlrocks.OPT_SECRETS | yamlrocks.OPT_INCLUDES
data = yamlrocks.load(os.path.join(workdir, "configuration.yaml"), option=opt)

data["api_key"].is_secret  # True (from `api_key: !secret api_key`)
data["api_key"].__source_tag__  # '!secret'
data["api_key"].__source_target__  # 'api_key' (the directive's argument)
data["title"].__source_tag__  # None (a plain inline value)
__source_target__ carries the directive’s argument: the secret name for
!secret, the path for !include, the variable spec for !env_var (or None
when there is no directive). Together with __source_tag__ it reconstructs the
original directive, e.g. f"{n.__source_tag__} {n.__source_target__}" gives back
"!secret api_key", which is what a tool needs to redact a value and re-emit
the reference rather than the resolved secret.
This is what lets a viewer or linter working on the parsed tree redact
secret-derived values, or flag where an env var or include fed a value, without a
separate bookkeeping pass. The same attribute and predicates are on the
round-trip YAMLRocksNode.
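A redaction pass over a parsed tree might look like the sketch below. It is not yamlrocks code: TaggedStr is a hypothetical stand-in exposing the three names described above (is_secret, __source_tag__, __source_target__), and redact is an invented helper that returns a plain copy of the tree.

```python
class TaggedStr(str):
    """Hypothetical stand-in for an annotated string with provenance."""

    def __new__(cls, value, tag=None, target=None):
        node = super().__new__(cls, value)
        node.__source_tag__ = tag
        node.__source_target__ = target
        return node

    @property
    def is_secret(self):
        return self.__source_tag__ == "!secret"


def redact(node):
    """Return a plain copy of the tree with secret-derived values replaced by their directive."""
    if isinstance(node, dict):
        return {key: redact(value) for key, value in node.items()}
    if isinstance(node, list):
        return [redact(item) for item in node]
    if getattr(node, "is_secret", False):
        # Re-emit the reference instead of the resolved secret.
        return f"{node.__source_tag__} {node.__source_target__}"
    return node


tree = {
    "api_key": TaggedStr("s3cr3t", tag="!secret", target="api_key"),
    "title": TaggedStr("My App"),
}
print(redact(tree))
```

Using getattr with a default keeps the walker safe on plain values such as integers, which carry no provenance attributes.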
Sequence elements are annotated individually, so you can locate any item:
import yamlrocks
data = yamlrocks.loads(b"items:\n  - a\n  - b\n", option=yamlrocks.OPT_ANNOTATED)

type(data["items"]).__name__  # 'YAMLRocksAnnotatedList'
data["items"].__line__  # 2
data["items"][0].__line__  # 2
data["items"][1].__line__  # 3
Tracking the originating file
__file__ is None for input parsed from a string or bytes, since there is no
file behind it. It becomes meaningful when a value is pulled in from another file
through an !include directive: each annotated node then
reports the file it physically came from, which is exactly what you need to send a
user to the right place in a split configuration.
The example below is self-contained: it writes a small two-file configuration to a temporary directory, then reads back the file each node belongs to.
import os
import tempfile
import yamlrocks

config = tempfile.mkdtemp()

with open(os.path.join(config, "configuration.yaml"), "wb") as handle:
    handle.write(b"automation: !include automations.yaml\n")
with open(os.path.join(config, "automations.yaml"), "wb") as handle:
    handle.write(b"- alias: night\n  trigger: time\n")

data = yamlrocks.load(
    os.path.join(config, "configuration.yaml"),
    option=yamlrocks.OPT_ANNOTATED | yamlrocks.OPT_INCLUDES,
)

# The included list and its items report the file they came from.
assert data["automation"].__file__.endswith("automations.yaml")
assert data["automation"][0].__file__.endswith("automations.yaml")
Aliases share their anchor’s object
An anchor (&a) and every alias (*a) that references it resolve to the same
annotated object, exactly as PyYAML does. They are not independent copies, so a
mutation made through one reference is visible through all of them:
import yamlrocks
source = """base: &a
  k: 1
ref: *a
"""
data = yamlrocks.loads(source, option=yamlrocks.OPT_ANNOTATED)

data["base"] is data["ref"]  # True, the same object
data["base"]["k"] = 99
data["ref"]["k"]  # 99, seen through the shared reference
This matters for tools that define a block once under an anchor and reuse it in
several places, then process the result in place: the work is done once and seen
everywhere. The plain fast path (loads with no options)
instead gives each alias an independent copy, which is faster when you do not need
shared identity.
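The difference between the two behaviors can be shown with ordinary dicts. The sketch below does not call yamlrocks; process_once is a hypothetical helper illustrating how an in-place pass can use object identity to do its work a single time per shared block.

```python
import copy

base = {"k": 1}

# Shared identity: both entries refer to one object, as described for annotated mode.
shared = {"base": base, "ref": base}
shared["base"]["k"] = 99
print(shared["ref"]["k"])  # 99

# Independent copies: a change through one entry does not reach the other.
copied = {"base": {"k": 1}, "ref": copy.deepcopy({"k": 1})}
copied["base"]["k"] = 99
print(copied["ref"]["k"])  # 1


def process_once(nodes, transform):
    """Apply transform to each distinct object, skipping repeats by identity."""
    seen = set()
    for node in nodes:
        if id(node) not in seen:
            seen.add(id(node))
            transform(node)


calls = []
process_once(shared.values(), calls.append)
print(len(calls))  # 1
```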
Home Assistant compatibility
Annotated mode mirrors the node classes Home Assistant uses internally for exactly
this purpose. YAMLRocksAnnotatedDict, YAMLRocksAnnotatedList, and YAMLRocksAnnotatedStr stand in for
Home Assistant’s NodeDictClass, NodeListClass, and NodeStrClass, exposing
the same __line__, __column__, and __file__ attributes. Code that inspects
those attributes to produce friendly, location-aware error messages works against
YAMLRocks’s annotated objects unchanged.
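Code of that kind typically reads the attributes defensively, because plain values carry none. The sketch below is a generic illustration, not Home Assistant's or yamlrocks's implementation: describe_location and LocatedStr are hypothetical names, and the "<string>" and "<unknown>" placeholders are arbitrary choices.

```python
class LocatedStr(str):
    """Hypothetical stand-in exposing __line__, __column__, and __file__."""

    def __new__(cls, value, line, column, file=None):
        node = super().__new__(cls, value)
        node.__line__ = line
        node.__column__ = column
        node.__file__ = file
        return node


def describe_location(node, default="<unknown>"):
    """Format 'file:line:column' for a node, tolerating values without a position."""
    line = getattr(node, "__line__", None)
    if line is None:
        return default
    path = getattr(node, "__file__", None) or "<string>"
    column = getattr(node, "__column__", None)
    return f"{path}:{line}" if column is None else f"{path}:{line}:{column}"


host = LocatedStr("localhost", 3, 9, file="configuration.yaml")
print(describe_location(host))  # configuration.yaml:3:9
print(describe_location(LocatedStr("x", 1, 1)))  # <string>:1:1
print(describe_location(8080))  # <unknown>
```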
See also
- Loading YAML: plain parsing without annotations.
- Round-trip editing: editable positions via range().
- Includes: where __file__ comes from.
- Home Assistant recipe: annotated mode in practice.
- API reference and options.