Schema validation

YAMLRocks can validate a document against a JSON Schema. Pass the schema as a Python dict to loads (or load) through the schema= keyword. If the document conforms, you get the same parsed value you would get without a schema. If it does not, YAMLRocks raises YAMLRocksDecodeError with a precise source location and a JSON path to the offending node.

Validation runs against the rich syntax tree (the same structure that powers round-trip mode), so every node still knows its source line and column. That is how a schema failure can point at an exact line and column rather than just “somewhere in your data”.

import yamlrocks
schema = {
    "type": "object",
    "required": ["name", "port"],
    "properties": {
        "name": {"type": "string", "minLength": 1},
        "port": {"type": "integer", "minimum": 1, "maximum": 65535},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
    "additionalProperties": False,
}
source = """
name: app
port: 8080
"""
yamlrocks.loads(source, schema=schema)
# {'name': 'app', 'port': 8080}

When a value is out of range, the error names both the JSON path ($.port) and the line and column in the original YAML:

import yamlrocks
schema = {
    "type": "object",
    "required": ["name", "port"],
    "properties": {
        "name": {"type": "string", "minLength": 1},
        "port": {"type": "integer", "minimum": 1, "maximum": 65535},
    },
    "additionalProperties": False,
}
source = """\
name: app
port: 70000
"""
yamlrocks.loads(source, schema=schema)
# yamlrocks.YAMLRocksDecodeError: schema validation failed: value 70000 is greater than
# maximum 65535 at $.port (line 2, column 7)
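
The pairing of a path with a source position is easier to see in miniature. The sketch below is not YAMLRocks's implementation: Node, SketchError, and validate are hypothetical names, and only type integer, minimum, maximum, and properties are handled. It assumes each node records the 1-based line and column of its value, and it stops at the first failing check.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical syntax-tree node: a value plus its source position."""
    value: object  # a scalar, or dict[str, Node] for a mapping
    line: int
    column: int


class SketchError(Exception):
    """Stand-in for a decode error; not a YAMLRocks class."""


def is_integer(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def validate(node, schema, path="$"):
    """Raise SketchError for the first violation found, depth first."""
    where = f"at {path} (line {node.line}, column {node.column})"
    value = node.value
    if schema.get("type") == "integer" and not is_integer(value):
        raise SketchError(f"expected type integer {where}")
    if is_integer(value):
        if "minimum" in schema and value < schema["minimum"]:
            raise SketchError(
                f"value {value} is less than minimum {schema['minimum']} {where}"
            )
        if "maximum" in schema and value > schema["maximum"]:
            raise SketchError(
                f"value {value} is greater than maximum {schema['maximum']} {where}"
            )
    if isinstance(value, dict):
        # Descend into mapping entries, extending the path one key at a time.
        for key, subschema in schema.get("properties", {}).items():
            if key in value:
                validate(value[key], subschema, f"{path}.{key}")


# Hand-built tree for "name: app\nport: 70000\n"; positions are typed in by hand.
tree = Node({"name": Node("app", 1, 7), "port": Node(70000, 2, 7)}, 1, 1)
port_schema = {
    "type": "object",
    "properties": {"port": {"type": "integer", "minimum": 1, "maximum": 65535}},
}
try:
    validate(tree, port_schema)
except SketchError as exc:
    print(exc)
# value 70000 is greater than maximum 65535 at $.port (line 2, column 7)
```

Because the position travels with the node, the validator never has to search the source text to say where the failure is.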

Schemas nest the same way your data does. A properties entry can itself be an object schema with its own required and properties:

import yamlrocks
schema = {
    "type": "object",
    "properties": {
        "server": {
            "type": "object",
            "required": ["host"],
            "properties": {
                "host": {"type": "string"},
                "port": {"type": "integer", "minimum": 1, "maximum": 65535},
            },
        },
    },
}
source = """
server:
  host: db
  port: 5432
"""
yamlrocks.loads(source, schema=schema)
# {'server': {'host': 'db', 'port': 5432}}

A violation deep in the tree reports the full path to it:

import yamlrocks
schema = {
    "type": "object",
    "properties": {
        "server": {
            "type": "object",
            "properties": {
                "port": {"type": "integer", "minimum": 1},
            },
        },
    },
}
source = """\
server:
  port: 0
"""
yamlrocks.loads(source, schema=schema)
# yamlrocks.YAMLRocksDecodeError: schema validation failed: value 0 is less than
# minimum 1 at $.server.port (line 2, column 9)

Use items to validate every element of a sequence against one schema, and minItems / maxItems to bound its length:

import yamlrocks
schema = {
    "type": "array",
    "items": {"type": "integer", "minimum": 0},
    "minItems": 1,
    "maxItems": 3,
}
source = """
- 1
- 2
"""
yamlrocks.loads(source, schema=schema)
# [1, 2]

When an element fails, the path uses array index notation ($[1]):

import yamlrocks
schema = {"type": "array", "items": {"type": "integer", "minimum": 0}}
source = """\
- 1
- -5
"""
yamlrocks.loads(source, schema=schema)
# yamlrocks.YAMLRocksDecodeError: schema validation failed: value -5 is less than
# minimum 0 at $[1] (line 2, column 3)

enum restricts a value to a fixed set; const pins it to exactly one value:

import yamlrocks
schema = {
    "type": "object",
    "properties": {
        "level": {"enum": ["debug", "info", "warning", "error"]},
        "version": {"const": 1},
    },
}
source = """
level: info
version: 1
"""
yamlrocks.loads(source, schema=schema)
# {'level': 'info', 'version': 1}

A value outside the enum is rejected at its exact location:

import yamlrocks
schema = {
    "type": "object",
    "properties": {"level": {"enum": ["debug", "info", "warning", "error"]}},
}
yamlrocks.loads(b"level: verbose\n", schema=schema)
# yamlrocks.YAMLRocksDecodeError: schema validation failed: value is not one of the
# allowed enum values at $.level (line 1, column 8)

allOf, anyOf, oneOf, and not compose smaller schemas. A common pattern is “this field is either a string or an integer”:

import yamlrocks
schema = {
    "type": "object",
    "properties": {
        "id": {"anyOf": [{"type": "string"}, {"type": "integer"}]},
    },
}
yamlrocks.loads(b"id: 7\n", schema=schema) # {'id': 7}
yamlrocks.loads(b"id: abc123\n", schema=schema) # {'id': 'abc123'}

If the value matches none of the branches, validation fails:

import yamlrocks
schema = {"anyOf": [{"type": "string"}, {"type": "integer"}]}
yamlrocks.loads(b"3.14", schema=schema)
# yamlrocks.YAMLRocksDecodeError: schema validation failed: value does not match any of
# the anyOf schemas at $ (line 1, column 1)
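
The four combinators differ only in how many branches must match. As a rough illustration of the standard JSON Schema semantics, here is a small stand-alone sketch. It is not YAMLRocks's code: matches_type and matches are hypothetical helpers that understand a single scalar type name and the four combinators, and ignore every other keyword.

```python
def matches_type(value, name):
    """Check a Python value against one JSON Schema scalar type name."""
    if name == "null":
        return value is None
    if name == "boolean":
        return isinstance(value, bool)
    if name == "integer":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    if name == "number":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if name == "string":
        return isinstance(value, str)
    raise ValueError(f"type not handled in this sketch: {name}")


def matches(value, schema):
    """Evaluate type, allOf, anyOf, oneOf, and not; ignore other keywords."""
    if "type" in schema and not matches_type(value, schema["type"]):
        return False
    if "allOf" in schema and not all(matches(value, s) for s in schema["allOf"]):
        return False  # every branch must match
    if "anyOf" in schema and not any(matches(value, s) for s in schema["anyOf"]):
        return False  # at least one branch must match
    if "oneOf" in schema and sum(matches(value, s) for s in schema["oneOf"]) != 1:
        return False  # exactly one branch must match
    if "not" in schema and matches(value, schema["not"]):
        return False  # the branch must not match
    return True


int_or_number = [{"type": "integer"}, {"type": "number"}]
print(matches(7, {"anyOf": int_or_number}))        # True
print(matches(7, {"oneOf": int_or_number}))        # False
print(matches(3.14, {"oneOf": int_or_number}))     # True
print(matches("x", {"not": {"type": "string"}}))   # False
```

In JSON Schema every integer is also a number, so oneOf over those two types rejects 7 because both branches match. When branches can overlap, anyOf is usually the safer choice.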

YAMLRocks implements a practical subset of JSON Schema, roughly at the draft 7 level: enough to express the constraints configuration files actually need, without pulling in a full validator. The supported keywords are:

| Group | Keywords |
| --- | --- |
| Types | type (null, boolean, integer, number, string, array, object) |
| Values | enum, const |
| Objects | properties, required, additionalProperties (boolean or schema) |
| Arrays | items, minItems, maxItems |
| Numbers | minimum, maximum, exclusiveMinimum, exclusiveMaximum |
| Strings | minLength, maxLength |
| Combinators | allOf, anyOf, oneOf, not |

Three boundaries are worth knowing, and each is a reason to reach for a dedicated JSON Schema library when you need what it rules out:

  • enum and const compare scalars. They are reliable for strings, numbers, booleans, and null. Using them to pin an array or object value is not supported and may reject an otherwise-matching value, so do not rely on structural const/enum.
  • Object rules apply to scalar keys. properties, required, and additionalProperties match string keys. A YAML collection key ([a, b]: ...) is not a JSON object key and is not subject to these rules, so it neither satisfies required nor trips additionalProperties: false.
  • The first error is reported. Validation stops at the first violation and raises it with its path, line, and column. It does not accumulate every problem in one pass, so fixing one error may reveal the next on the following run.
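
Because the supported set is small, it can be worth checking a schema against the keyword table before relying on it. The helper below is a hypothetical sketch, not part of YAMLRocks: it walks a schema dict and lists every keyword the table does not mention, including annotation keywords such as title or description. How YAMLRocks treats such keywords is not stated here, so read a non-empty result as a prompt to check, not as proof of a problem.

```python
# Keywords copied from the supported-keywords table.
SUPPORTED = {
    "type", "enum", "const",
    "properties", "required", "additionalProperties",
    "items", "minItems", "maxItems",
    "minimum", "maximum", "exclusiveMinimum", "exclusiveMaximum",
    "minLength", "maxLength",
    "allOf", "anyOf", "oneOf", "not",
}


def unsupported_keywords(schema, path="#"):
    """Return (location, keyword) pairs for keywords outside SUPPORTED."""
    found = []
    if not isinstance(schema, dict):
        return found  # e.g. additionalProperties given as a boolean
    for key, value in schema.items():
        here = f"{path}/{key}"
        if key not in SUPPORTED:
            found.append((path, key))
        elif key == "properties":
            # Keys under properties are property names, not keywords.
            for name, sub in value.items():
                found += unsupported_keywords(sub, f"{here}/{name}")
        elif key in ("items", "additionalProperties", "not"):
            found += unsupported_keywords(value, here)
        elif key in ("allOf", "anyOf", "oneOf"):
            for index, sub in enumerate(value):
                found += unsupported_keywords(sub, f"{here}/{index}")
    return found


candidate = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "pattern": "^[a-z]+$"},
        "tags": {"type": "array", "items": {"type": "string", "format": "hostname"}},
    },
}
print(unsupported_keywords(candidate))
# [('#/properties/name', 'pattern'), ('#/properties/tags/items', 'format')]
```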

Editors built on yaml-language-server, such as VS Code with the Red Hat YAML extension, let a document declare its own schema with a comment, conventionally on the first line:

# yaml-language-server: $schema=https://example.com/config.schema.json
name: app
port: 8080

YAMLRocks recognizes this directive, but deliberately treats detecting it and acting on it as two separate steps.

schema_ref reads the leading comment block and returns the declared reference, or None if the document does not declare one. It only inspects comments at the top of the file; it never parses the body and never performs any I/O, so it is always cheap and safe to call:

import yamlrocks
doc = b"# yaml-language-server: $schema=https://example.com/config.schema.json\nport: 8080\n"
yamlrocks.schema_ref(doc)
# 'https://example.com/config.schema.json'
yamlrocks.schema_ref(b"port: 8080\n")
# None
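
What "only inspects comments at the top of the file" means in practice can be sketched without the library. This is a hypothetical stand-alone approximation, not the schema_ref implementation: find_schema_directive is an invented name, and the sketch assumes the directive sits in the leading run of comment lines, skips blank lines, and gives up at the first non-comment line. YAMLRocks may treat blank lines or alternative spellings differently.

```python
import re

# Matches "# yaml-language-server: $schema=<reference>" on a single comment line.
DIRECTIVE = re.compile(rb"^#\s*yaml-language-server:\s*\$schema=(\S+)")


def find_schema_directive(doc: bytes):
    """Return the declared schema reference, or None if there is none."""
    for line in doc.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # assumption: blank lines do not end the comment block
        if not stripped.startswith(b"#"):
            return None  # the body has started; never look any further
        match = DIRECTIVE.match(stripped)
        if match:
            return match.group(1).decode("utf-8")
    return None


print(find_schema_directive(
    b"# yaml-language-server: $schema=https://example.com/config.schema.json\nport: 8080\n"
))
# https://example.com/config.schema.json
print(find_schema_directive(b"port: 8080\n"))
# None
```

The scan touches only bytes and never opens a file or a socket, which is the property that makes this kind of detection cheap and safe.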

To validate against the in-file reference, pass schema="auto" together with a schema_resolver, a callable that receives the reference string and returns a schema dict (or None to decline). YAMLRocks detects the directive, calls your resolver, and validates against whatever it returns. If there is no directive, or the resolver returns None, validation is skipped and the parsed value is returned as usual.

import yamlrocks
# A real resolver might read from a local cache, a bundled file, or an
# allow-listed fetch. Here we just map known references to schemas.
SCHEMAS = {
    "https://example.com/config.schema.json": {
        "type": "object",
        "required": ["name", "port"],
        "properties": {
            "name": {"type": "string"},
            "port": {"type": "integer", "minimum": 1, "maximum": 65535},
        },
    },
}
def resolve(ref):
    return SCHEMAS.get(ref)
doc = b"# yaml-language-server: $schema=https://example.com/config.schema.json\nname: app\nport: 8080\n"
yamlrocks.loads(doc, schema="auto", schema_resolver=resolve)
# {'name': 'app', 'port': 8080}

A document that declares a schema and violates it fails exactly like the explicit schema= path, with a line-accurate error:

import yamlrocks
SCHEMAS = {
    "https://example.com/config.schema.json": {
        "type": "object",
        "properties": {"port": {"type": "integer"}},
    },
}
doc = b"# yaml-language-server: $schema=https://example.com/config.schema.json\nport: not-a-number\n"
yamlrocks.loads(doc, schema="auto", schema_resolver=SCHEMAS.get)
# yamlrocks.YAMLRocksDecodeError: schema validation failed: expected type integer,
# found string at $.port (line 2, column 7)

This keeps the network decision where it belongs: in your hands. A resolver can consult a local cache, load a schema bundled with your application, or perform a fetch restricted to hosts you trust.
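
As one possible shape for such a resolver, the sketch below serves schemas from files bundled with the application and declines everything else. make_resolver and the file path are illustrative assumptions, not YAMLRocks names. The only contract taken from the text above is that the resolver receives the reference string and returns a schema dict, or None to decline.

```python
import json
from pathlib import Path


def make_resolver(bundled):
    """Build a resolver from a mapping of schema reference -> local JSON file."""
    def resolve(ref):
        path = bundled.get(ref)
        if path is None:
            return None  # unknown reference: decline, and never fetch
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    return resolve


# The path is hypothetical; it is only opened when a known reference is resolved.
resolver = make_resolver(
    {"https://example.com/config.schema.json": Path("schemas/config.schema.json")}
)
print(resolver("https://untrusted.example/schema.json"))
# None

# Then, following the schema="auto" examples:
# yamlrocks.loads(doc, schema="auto", schema_resolver=resolver)
```

Keying the mapping on the exact reference string doubles as an allow-list: a document cannot make the application read a file or contact a host that was not listed in advance.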