Real-world verification
YAMLRocks is tested against YAML that people actually write, review, and keep in version control. The real-world corpus pulls public configuration repositories in as pinned git submodules and runs them through the same parser and round-trip emitter users get from the Python package.
The promise is deliberately narrow and testable: every standalone YAML file in
the corpus must parse and re-emit byte-for-byte in OPT_ROUND_TRIP mode. For
Home Assistant, selected repositories are also loaded through configuration.yaml
with native !include resolution enabled, and the test then checks that every
unmodified source file writes back byte-for-byte.
Current corpus
The corpus currently spans 95 public repositories across 25 ecosystems and
roughly 22,700 YAML files. Each submodule is pinned to a specific commit by
this repository, so failures are reproducible until the corpus is deliberately
refreshed. The Files column counts *.yaml and *.yml files after excluding
Helm chart templates.
See the test-suite notes in
tests/realworld/README.md
for the exact layout and update workflow.
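The counting rule described above can be sketched as a short file-discovery helper. This is an illustration only, not the harness's actual code: the function names are hypothetical, and treating "a directory that contains Chart.yaml" as the marker of a Helm chart is an assumption about how the exclusion might be implemented.

```python
from pathlib import Path

YAML_SUFFIXES = {".yaml", ".yml"}


def is_helm_template(path: Path, root: Path) -> bool:
    """Return True if path lies under the templates/ directory of a Helm chart.

    Assumption: a chart is any directory that directly contains Chart.yaml.
    """
    parts = path.relative_to(root).parts
    # Look at every directory component (the last part is the file name).
    for index, part in enumerate(parts[:-1]):
        if part == "templates":
            chart_dir = root.joinpath(*parts[:index])
            if (chart_dir / "Chart.yaml").is_file():
                return True
    return False


def discover_yaml_files(root: Path) -> list[Path]:
    """Collect *.yaml and *.yml files under root, skipping Helm chart templates."""
    candidates = (
        p for p in root.rglob("*") if p.is_file() and p.suffix in YAML_SUFFIXES
    )
    return sorted(p for p in candidates if not is_helm_template(p, root))
```

Note that a templates/ directory outside a chart is kept under this rule; only templates that belong to a chart are skipped.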
What is verified
Per-file parse and round-trip
Every *.yaml and *.yml file discovered in the checked-out corpus is parsed in
round-trip mode and immediately emitted again. The emitted bytes must exactly
match the original file.
Custom tags such as !include, !secret, !vault, and other application tags
are preserved in this mode rather than resolved, so the test checks whether the
file is valid YAML and whether YAMLRocks can keep its comments, anchors, scalar
styles, layout, and tags intact.
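The per-file check amounts to a byte comparison between the file on disk and the emitter's output. The sketch below shows the shape of that assertion. The parse and emit parameters are hypothetical stand-ins for YAMLRocks's round-trip parser and emitter, whose real function names are not given on this page; they are passed in as callables so the sketch does not guess at the library's API.

```python
from pathlib import Path
from typing import Any, Callable, Optional


def first_mismatch(expected: bytes, actual: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if equal."""
    for offset, (left, right) in enumerate(zip(expected, actual)):
        if left != right:
            return offset
    if len(expected) != len(actual):
        # One input is a strict prefix of the other.
        return min(len(expected), len(actual))
    return None


def round_trips(
    path: Path,
    parse: Callable[[bytes], Any],  # hypothetical: round-trip parser
    emit: Callable[[Any], bytes],   # hypothetical: round-trip emitter
) -> bool:
    """Parse the file, emit it again, and compare against the original bytes."""
    original = path.read_bytes()
    document = parse(original)
    return first_mismatch(original, emit(document)) is None
```

Comparing bytes rather than decoded text means that line endings, trailing whitespace, and a byte-order mark all count toward the result.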
Home Assistant include graphs
Home Assistant configurations get an additional test because split
configuration is one of YAMLRocks’s core use cases. Repositories with a
configuration.yaml are loaded with OPT_INCLUDES | OPT_ROUND_TRIP, which
resolves their !include tree. The test then verifies two write-back properties:
- the root document re-emits with its include directives restored;
- every resolved source file re-emits byte-for-byte when it was not modified.
Some public Home Assistant repositories reference files that were intentionally not committed, such as secrets or generated credentials. Those include graphs are marked as strict expected failures, so they cannot hide a parser regression.
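To see why some include graphs cannot load, it helps to list which referenced files are actually present. The sketch below walks !include references with a deliberately naive regular expression and separates files that exist from files that are missing. It is not YAMLRocks's include resolver: the function name is hypothetical, the regex also matches !include inside comments or strings, and resolving each path relative to the including file is an assumption about the convention in use.

```python
import re
from pathlib import Path

# Matches "!include some/file.yaml" but not "!include_dir_list ...",
# because whitespace must follow the tag name directly.
INCLUDE_RE = re.compile(r"!include\s+([^\s#]+)")


def walk_includes(root_file: Path) -> tuple[set[Path], set[Path]]:
    """Return (found, missing) files reachable from root_file via !include."""
    found: set[Path] = set()
    missing: set[Path] = set()
    pending = [root_file.resolve()]
    while pending:
        current = pending.pop()
        if current in found or current in missing:
            continue  # already visited; this also guards against include cycles
        if not current.is_file():
            missing.add(current)
            continue
        found.add(current)
        text = current.read_text(encoding="utf-8")
        for match in INCLUDE_RE.finditer(text):
            target = match.group(1).strip("'\"")
            # Assumption: paths are relative to the file containing the tag.
            pending.append((current.parent / target).resolve())
    return found, missing
```

A non-empty missing set for a repository corresponds to the situation described above, where a referenced file was intentionally left out of version control.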
Scope by ecosystem
The corpus proves that YAMLRocks handles the YAML shapes in these public repositories. It does not claim to replace each ecosystem’s own validation, rendering, or runtime semantics.
| Ecosystem | Verified | Not claimed |
|---|---|---|
| Home Assistant | Standalone files round-trip; selected include graphs load and write back byte-for-byte. | Validation of Home Assistant integrations or unavailable private secrets. |
| ESPHome | Device configuration YAML parses and preserves application tags. | Execution of ESPHome substitutions, code generation, or !lambda semantics. |
| Ansible | Playbooks, roles, inventories, and custom tags such as !vault are preserved. | Replacement of Ansible’s loader, inventory plugins, or task execution semantics. |
| Kubernetes | Public manifests and examples parse and round-trip byte-for-byte. | Kubernetes API schema validation or kubectl behavior. |
| Docker Compose | Compose examples parse and preserve layout. | Compose model validation or container runtime behavior. |
| CloudFormation | Template YAML parses and round-trips. | AWS resource validation or CloudFormation intrinsic execution. |
| GitOps | Argo CD, Flux, Fleet, and related GitOps examples parse and round-trip. | Controller reconciliation or rendered cluster state. |
| Helm | Non-template YAML in charts and examples parses and round-trips. | Raw Go template files under chart templates/, which are not standalone YAML until Helm renders them. |
| OpenAPI | Specification and example YAML parses and round-trips. | OpenAPI semantic validation. |
| dbt | Project and package YAML parses and round-trips. | dbt model compilation or project validation. |
| CircleCI | Pipeline configuration YAML parses and round-trips. | CircleCI configuration validation or job execution. |
| GitHub Actions | Workflow examples parse and round-trip. | GitHub Actions workflow validation or runner behavior. |
| Serverless | Framework examples parse and round-trip. | Provider-specific deployment validation. |
| Tekton | Task and pipeline YAML parses and round-trips. | Kubernetes admission or Tekton controller behavior. |
| Prometheus | Prometheus, Alertmanager, and operator configuration YAML parses and round-trips. | Prometheus rule validation, query validation, or operator behavior. |
| Argo | Argo CD, Argo Workflows, and Argo Rollouts YAML parses and round-trips. | Argo controller behavior or workflow execution. |
| OpenTelemetry | Collector, contrib, and demo configuration YAML parses and round-trips. | Collector component validation or telemetry processing behavior. |
| Azure Pipelines | Pipeline example YAML parses and round-trips. | Azure DevOps pipeline validation or job execution. |
| Cloud Foundry | BOSH and Cloud Foundry deployment YAML parses and round-trips. | BOSH deployment semantics or Cloud Foundry runtime behavior. |
| Concourse | Concourse pipeline and deployment YAML parses and round-trips. | Concourse pipeline validation or worker behavior. |
| Crossplane | Crossplane package and platform YAML parses and round-trips. | Crossplane schema validation, composition rendering, or controller behavior. |
| MkDocs | MkDocs project configuration YAML parses and round-trips. | MkDocs plugin loading or site build behavior. |
| Woodpecker | Woodpecker pipeline and plugin YAML parses and round-trips. | Woodpecker CI validation or job execution. |
| cloud-init | cloud-init YAML examples parse and round-trip. | cloud-init module validation or boot-time behavior. |
| goss | goss YAML tests parse and round-trip. | goss assertion execution. |
Reproduce it locally
Fetch the corpus once, then run the real-world category:
    git submodule update --init
    uv run pytest tests/realworld -m realworld
Run a single ecosystem by filtering the test ids:
    uv run pytest tests/realworld -k ansible
    uv run pytest tests/realworld -k kubernetes
The category auto-skips when the submodules are not checked out, so everyday contributors can still run the normal test suite without downloading the full corpus.
Known invalid files
A small number of third-party files use a .yaml or .yml extension but are not
valid standalone YAML, usually because they are templates that another tool must
render first. These files are recorded as strict expected failures in the test
harness. If YAMLRocks ever starts accepting one unexpectedly, the suite fails so
the behavior change is visible.
Helm chart templates are excluded for the same reason: files under a chart’s
templates/ directory are Go text/template source and are not YAML documents
until Helm renders them.
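The "strict expected failure" rule can be stated as a small decision function. The sketch below is a plain-Python illustration of that rule, not the harness itself; the function name, the result strings, and the example path are hypothetical. The behavior matches what pytest's xfail marker does with strict=True, though whether the harness uses that marker is an assumption.

```python
def judge(path: str, parsed_ok: bool, known_invalid: set[str]) -> str:
    """Classify one file's outcome under strict expected-failure rules."""
    expected_to_fail = path in known_invalid
    if expected_to_fail and parsed_ok:
        # A known-invalid file suddenly parses: surface the behavior change.
        return "fail: unexpected pass"
    if expected_to_fail:
        return "ok: expected failure"
    return "ok" if parsed_ok else "fail: parse error"


# Hypothetical entry; real entries live in the test harness.
known_invalid = {"vendor/example/template.yaml"}
print(judge("vendor/example/template.yaml", parsed_ok=True, known_invalid=known_invalid))
```

The first branch is what separates a strict expected failure from a plain skip: a skipped file would never reveal that the parser started accepting it.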
See also
- Round-trip editing: the byte-for-byte editing promise.
- Includes: native !include resolution and write-back.
- Performance: reproducible benchmark commands.
- Projects using YAMLRocks: actual public adopters.