Skip to content

Add SnapshotHash::Canonical machinery (no behavior change)#7419

Merged
AntoineToussaint merged 9 commits into
mainfrom
pr/canonical-hash-snapshot
May 4, 2026
Merged

Add SnapshotHash::Canonical machinery (no behavior change)#7419
AntoineToussaint merged 9 commits into
mainfrom
pr/canonical-hash-snapshot

Conversation

@AntoineToussaint
Copy link
Copy Markdown
Member

Summary

Foundation for the config-in-database work: introduce a content-stable structural hash (SnapshotHash::Canonical) that hashes StoredConfig via canonical JSON encoding, alongside the existing canonical-TOML-bytes hash (now tagged LegacyToml).

Pure addition. No DB columns, no wire format changes for legacy hashes, nothing on the inference path is affected. Subsequent PRs introduce the storage columns, wire-up, and default switch.

Why — context for the config-in-database work

The legacy SnapshotHash is blake3 over canonical-TOML bytes. That works fine when configs only ever live as TOML files on disk, but it's fragile in the world we're moving into where config rows are persisted across:

  • TOML crate version bumps (e.g. float reformatting 0.70.6999999988…)
  • Serializer / canonicalization rule changes
  • Round-trips through the planned config_jsonb JSONB column (where the input is a serde_json::Value, not TOML bytes)

Any of these can drift the hash and silently invalidate every inferences.snapshot_hash reference — which would break every historical-config view in the UI and every audit query that joins inferences back to the config they ran against.

The structural hash decouples identity from textual encoding: the same logical config produces the same hash regardless of which serializer round-trips it last. That property is what makes the JSONB column safe to add (PR 2) and what lets us flip Config.hash to canonical without breaking historical references (PR 4).

What this PR adds

Type-level changes (tensorzero-types/src/snapshot.rs)

  • SnapshotHashScheme { LegacyToml, Canonical } enum.
  • SnapshotHash carries a scheme tag; Display / FromStr / serde use a self-describing wire form: can:DECIMAL for canonical, bare decimal for legacy. The legacy bare-decimal form is unchanged from main, so existing inferences.snapshot_hash rows, autopilot tags, log lines, and URLs continue to parse without modification.
  • from_biguint_canonical / from_canonical_bytes constructors.
  • serialize_hash_bare_decimal / serialize_optional_hash_bare_decimal helpers for DB-row structs whose backing storage cannot accept the prefix (CH UInt256, PG NUMERIC(78,0)). Defined here, applied in PR 4.

The hash itself (config/snapshot/canonical_hash.rs)

Walks serde_json::Value with a self-describing tagged encoding:

  • 1-byte type tag per node (null/bool/number/string/array/object) — string "42" cannot collide with number 42.
  • 8-byte BE length prefix on strings/arrays/objects — ["ab"] cannot collide with ["a","b"].
  • Sorted object keys — JSON's lack of ordering can't drift the hash.
  • Numbers hashed as IEEE 754 BE bit pattern — robust to serde_json::Number text round-trips that may pick different decimal representations.

StoredConfig::canonical_hash() -> Result<SnapshotHash> is the entry point. Public but unused at runtime; tests are the only callers. The first runtime caller lands in the follow-up PR that adds the canonical_hash column to config_snapshots.

Tests

  • 11 inline unit tests in canonical_hash.rs: determinism, JSON/TOML/TOML→JSON roundtrip stability, key-order independence, type discrimination (string vs number), array-concat collision resistance, and divergence from the legacy hash.
  • Fixture-based round-trip tests (fixtures_tests.rs + 6 TOML fixtures): lock the canonical hash + serialization round-trip against committed configs (kitchen sink, multi-variant, models with multiple providers, tools + metrics, etc.). Any future change to canonicalization rules will fail these fixtures rather than silently drifting the persisted hash.
  • 7 SnapshotHash type tests in tensorzero-types/src/snapshot.rs: scheme distinguishability, FromStr round-trip, bare-decimal helper output, Display/serde scheme preservation.

51 tests pass total. No DB needed.

What this PR does NOT do

  • Does not change Config.hash (still legacy TOML-bytes).
  • Does not change ConfigSnapshot::hash (still legacy).
  • Does not add or modify any DB column.
  • Does not change how any inference / feedback / datapoint row is written.
  • Does not apply the bare-decimal serializers to any DB-row struct.
  • Does not change any wire API for inference.

In production, after merging this PR, no code path produces a Canonical-scheme hash. The Display impl still emits bare decimal for every legacy hash that exists today — so nothing downstream observes a behavior change.

Tiny supporting fix

Two call sites in db/clickhouse/workflow_evaluation_queries.rs relied on the now-removed Deref<Target=str> impl on SnapshotHash (&**snapshot_hash). They're updated to use the explicit to_decimal_string() accessor — same byte output, no scheme prefix because the surrounding toUInt256OrNull(...) literal couldn't accept one anyway.

Roadmap context

Part 1 of the config-in-database series. The remaining PRs build directly on this foundation:

PR Scope
1 (this PR) Canonical hash machinery
2 PG config_snapshots JSONB + canonical_hash columns
3 CH parity for the canonical_hash column
4 Config.hash → canonical by default; inference / feedback / datapoint rows reference canonical
5a Per-object metadata table + writers
5b REST CRUD endpoints over per-object tables
6 Config-in-database boot mode + zero-config bootstrap + new CI lanes
7 Functions UI overhaul + ModelPicker + TOML-editor cleanup

Each PR ships an independently testable deliverable. This one is fully covered by unit tests; later PRs add e2e coverage as they introduce DB and HTTP surfaces.

Test plan

  • cargo test -p tensorzero-types snapshot:: — 7 tests on the type
  • cargo test -p tensorzero-core config::snapshot:: — 51 tests including the new fixture suite
  • cargo clippy --all-targets --all-features -- -D warnings — clean
  • cargo fmt --all -- --check — clean
  • No production code path produces a Canonical-scheme hash; legacy hash wire form bit-identical to main

🤖 Generated with Claude Code

Foundation for the config-in-database work: introduce a content-stable
structural hash (`SnapshotHash::Canonical`) that hashes `StoredConfig`
via canonical JSON encoding, alongside the existing canonical-TOML-bytes
hash (now tagged `LegacyToml`). Pure addition — no DB columns, no wire
format changes for legacy hashes, nothing on the inference path is
affected. Subsequent PRs introduce the storage columns, wire-up, and
default switch.

## Why

The legacy hash is `blake3` over canonical-TOML bytes. That's fragile in
ways that matter when config rows are persisted across:

- TOML crate version bumps (e.g. float reformatting `0.7` →
  `0.6999999988…`)
- Serializer / canonicalization rule changes
- Round-trips through the planned `config_jsonb` JSONB column (where
  the input is a `serde_json::Value`, not TOML bytes)

Any of these can drift the hash and silently invalidate every
`inferences.snapshot_hash` reference — which would break every
historical-config view in the UI and every audit query. The structural
hash decouples identity from textual encoding: same logical config →
same hash, regardless of which serializer round-trips it last.

## What this PR adds

- `SnapshotHashScheme { LegacyToml, Canonical }` enum.
- `SnapshotHash` carries a scheme tag; `Display` / `FromStr` / serde
  use a self-describing wire form: `can:DECIMAL` for canonical, bare
  decimal for legacy. Bare-decimal is the *unchanged* legacy wire form,
  so existing `inferences.snapshot_hash` rows, autopilot tags, log
  lines, and URLs continue to parse without modification.
- `from_biguint_canonical` / `from_canonical_bytes` constructors.
- `serialize_hash_bare_decimal` / `serialize_optional_hash_bare_decimal`
  helpers for DB-row structs whose backing storage cannot accept the
  prefix (CH `UInt256`, PG `NUMERIC(78,0)`). Defined here, applied by a
  later PR.
- `crates/tensorzero-core/src/config/snapshot/canonical_hash.rs` —
  the structural-hash implementation. Walks `serde_json::Value` with a
  self-describing tagged encoding (1-byte type tag + length-prefixed
  strings/arrays/objects + sorted object keys + IEEE 754 BE bit pattern
  for numbers).
- `StoredConfig::canonical_hash() -> Result<SnapshotHash>` — the entry
  point. Public but currently unused at runtime; tests are the only
  callers. The first runtime call site lands in the follow-up PR that
  adds the `canonical_hash` column to `config_snapshots`.
- 11 inline unit tests covering determinism, JSON / TOML / TOML→JSON
  roundtrip stability, key-order independence, type discrimination
  (string vs number), array-concat collision resistance, and divergence
  from the legacy hash.
- Fixture-based round-trip tests (`fixtures_tests.rs` + 6 TOML
  fixtures) that lock the canonical hash + serialization round-trip
  against committed configs (kitchen sink, multi-variant, models with
  multiple providers, tools + metrics, etc.). Any future change to
  canonicalization rules will fail these fixtures rather than silently
  drifting the persisted hash.

## What it does NOT do

- Does not change `Config.hash` (still legacy TOML-bytes).
- Does not change `ConfigSnapshot::hash` (still legacy).
- Does not add or modify any DB column.
- Does not change how any inference / feedback / datapoint row is
  written.
- Does not apply the bare-decimal serializers to any DB-row struct.
- Does not change any wire API for inference.

In production, after merging this PR, no code path produces a
`Canonical`-scheme hash. The `Display` impl still emits bare decimal
for every legacy hash that exists today — so nothing downstream
observes a behavior change.

## How to test

```
cargo test -p tensorzero-types snapshot::
cargo test -p tensorzero-core config::snapshot::
```

51 tests pass with no DB needed.

## Tiny supporting fix

Two call sites in `db/clickhouse/workflow_evaluation_queries.rs`
relied on the now-removed `Deref<Target=str>` impl on `SnapshotHash`
(`&**snapshot_hash`). They're updated to use the explicit
`to_decimal_string()` accessor — same byte output, no scheme prefix
because the surrounding `toUInt256OrNull(...)` literal couldn't accept
one anyway.

## Roadmap context

Part 1 of the config-in-database series:

- **PR 1 (this one)**: canonical hash machinery
- PR 2: PG `config_snapshots` JSONB + `canonical_hash` columns
- PR 3: CH parity for the `canonical_hash` column
- PR 4: `Config.hash` → canonical by default; inference / feedback /
  datapoint rows reference canonical
- PR 5a: Per-object metadata table + writers
- PR 5b: REST CRUD endpoints over per-object tables
- PR 6: Config-in-database boot mode + zero-config bootstrap + new CI
  lanes
- PR 7: Functions UI overhaul + ModelPicker + TOML-editor cleanup

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…zers

Review feedback on the canonical-hash machinery:

- Fix doc comments that claimed `v1:` / `v2:` Display prefixes — the
  actual wire form is bare decimal (legacy) and `can:` (canonical).
  Updated `SnapshotHashScheme` variant docs and the `SnapshotHash`
  module-level doc to match what `prefix()` returns.

- Replace `(x.len() as u64).to_be_bytes()` length prefixes in the
  canonical encoding with an explicit `len_u64` helper that uses
  `u64::try_from(...).expect(...)`. The canonical hash bytes are the
  persisted identity of every config snapshot, so a silently-truncating
  cast on a future >64-bit platform would be unrecoverable. Annotated
  with `#[expect(clippy::expect_used)]` and a reason explaining why
  the panic is the safe choice.

- Hand-roll `PartialEq`, `Eq`, and `Hash` on `SnapshotHash` so
  identity is `(scheme, bytes)` only. `decimal_str` is a derived cache
  of `bytes`, so including it was both ~78 bytes of redundant hashing
  per call and an implicit invariant that the cache and bytes agree
  (which the constructors enforce by convention but nothing else
  prevents). Added a regression test that constructs the same hash
  via two different paths and asserts both `eq` and `hash` agree.

- Group the two `#[serde(serialize_with = ...)]` opt-out helpers into
  `pub mod serializers` (`bare_decimal` / `optional_bare_decimal`).
  Removes two free functions from the crate root and gives the helpers
  a discoverable home with collective module docs. Updated the inline
  test's `serialize_with = "..."` path to match.

- Replace `Deserialize`'s `String::deserialize(...)` with a
  `Visitor<'_>` implementing `visit_str` (with `visit_string` as
  fallback). Borrowed deserializers (serde_json, sqlx) now skip the
  per-row `String` allocation; owning deserializers still work
  unchanged via `visit_string`. This is the per-inference-row read
  path, so the saving compounds at scale.

- Drop the unused `SnapshotHashScheme::name()`. Doc-commented use case
  ("log/error messages") has no callers in this PR — `pub` API on a
  published crate that has no users ages poorly. Will reintroduce in a
  later PR if a real call-site needs it.

- Bump `from_biguint_with_scheme` and `from_bytes_with_scheme` from
  module-private (`fn`) to `pub(crate)`. The intra-module test calls
  them today; raising visibility makes the testing intent explicit
  and lets future integration tests under `tests/` use them without
  another visibility bump.

- Tiny: `hex::encode(&*self.bytes)` → `hex::encode(self.as_bytes())`,
  `entries.sort_by(|(a,_),(b,_)| a.cmp(b))` → `sort_unstable_by_key`.

8 SnapshotHash type tests + 51 snapshot module tests pass; clippy clean
under workspace lints.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds the new scheme-aware SnapshotHash foundation for config-in-database work by introducing a canonical structural hash alongside the existing legacy TOML-byte hash, while keeping current runtime behavior on the legacy path.

Changes:

  • Added SnapshotHashScheme and updated SnapshotHash parsing/formatting/serde to distinguish legacy vs canonical hashes.
  • Introduced canonical structural hashing for StoredConfig via serde_json::Value traversal plus new unit tests and fixture-based round-trip tests.
  • Updated two ClickHouse workflow-evaluation query call sites to use bare decimal accessors after removing Deref<Target = str> from SnapshotHash.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
crates/tensorzero-types/src/snapshot.rs Adds scheme-aware snapshot hashes, new constructors/accessors, serde/display parsing changes, and type tests.
crates/tensorzero-core/src/db/clickhouse/workflow_evaluation_queries.rs Replaces implicit string deref usage with to_decimal_string() in ClickHouse parameter binding.
crates/tensorzero-core/src/config/snapshot/mod.rs Wires in the new canonical hash module and fixture test module.
crates/tensorzero-core/src/config/snapshot/fixtures/tools_and_metrics.toml Adds fixture coverage for tools and metrics config shapes.
crates/tensorzero-core/src/config/snapshot/fixtures/multi_variant_types.toml Adds fixture coverage for multiple function variant types.
crates/tensorzero-core/src/config/snapshot/fixtures/models_multi_provider.toml Adds fixture coverage for multiple model and embedding providers.
crates/tensorzero-core/src/config/snapshot/fixtures/kitchen_sink.toml Adds broad mixed-section fixture coverage.
crates/tensorzero-core/src/config/snapshot/fixtures/empty.toml Adds empty-config baseline fixture.
crates/tensorzero-core/src/config/snapshot/fixtures/chat_function_unversioned.toml Adds baseline chat-function fixture for stable round-tripping.
crates/tensorzero-core/src/config/snapshot/fixtures_tests.rs Adds fixture-driven round-trip assertions over representative stored configs.
crates/tensorzero-core/src/config/snapshot/canonical_hash.rs Implements canonical structural hashing for StoredConfig and adds focused unit tests.

Comment thread crates/tensorzero-core/src/config/snapshot/canonical_hash.rs Outdated
Comment thread crates/tensorzero-core/src/config/snapshot/fixtures_tests.rs
Comment thread crates/tensorzero-core/src/db/clickhouse/workflow_evaluation_queries.rs Outdated
Comment thread crates/tensorzero-core/src/db/clickhouse/workflow_evaluation_queries.rs Outdated
…tale comments

Three Copilot review findings on PR #7419, all real:

1. **`u64` values above 2^53 silently aliased through `as_f64()`**.
   `Number::as_f64()` is lossy for any integer beyond f64's 53-bit
   mantissa, so two distinct `u64` values that share an `f64`
   representation hashed to the same canonical bytes. `StoredConfig`
   already has `u64` fields (timeouts, capacities), so this would have
   collided in production once a value crossed 9 quadrillion.

   Fix: encode integers exactly. The number sub-tag layout is now:
   - `NUMBER_KIND_INT`   + 8 bytes BE for `as_i64()` results
   - `NUMBER_KIND_UINT`  + 8 bytes BE for `as_u64()` results that
     didn't fit `i64` (the upper half of the unsigned range)
   - `NUMBER_KIND_FLOAT` + 8 bytes BE for fractional `f64` values
   - `NUMBER_KIND_STRING` + length + bytes for serde_json's
     `arbitrary_precision` fallback (we don't enable it; the
     encoding stays well-defined if anyone does)

   Each kind carries its own sub-tag, so an `i64`-shaped 1 and an
   `f64`-shaped 1.0 hash differently — that's correct, they're
   different types in `StoredConfig`. Two new regression tests:
   - `large_u64_values_do_not_collide_via_f64_truncation` (uses
     `2^53` and `2^53 + 1`, which `as_f64()` collapses)
   - `integer_and_float_with_same_magnitude_hash_differently`

2. **Fixture suite never asserted the hash bytes**, so a canonical
   encoding change that preserved JSON shape would silently drift
   every persisted hash. Added `assert_canonical_hash_matches` and
   committed expected hex values for all 6 fixtures (empty,
   chat_function_unversioned, multi_variant_types,
   models_multi_provider, tools_and_metrics, kitchen_sink). Any
   future change to canonicalization rules in
   `canonical_hash.rs` now fails these assertions explicitly,
   forcing the author to either revert the encoding change or
   update the expected hexes deliberately (and review the
   inference-row impact).

3. **Stale `vN:DECIMAL` comments in `workflow_evaluation_queries.rs`**.
   The actual `Display` impl emits `can:DECIMAL` for canonical hashes;
   the `vN:` form was never shipped. Updated both call-sites' comments
   to match, mirroring the snapshot.rs doc-comment fix already in the
   polish commit.

53 snapshot tests pass (was 51 before the regression tests). Clippy
clean under workspace lints; fmt clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@AntoineToussaint AntoineToussaint enabled auto-merge May 4, 2026 18:25
@virajmehta
Copy link
Copy Markdown
Member

can you add some documentation over how this is expected to handle schema changes?

AntoineToussaint added a commit that referenced this pull request May 4, 2026
…inators

The new PG e2e tests inherited TOML fixtures with `version = N` lines
(both function- and variant-level). `version` doesn't exist on
`UninitializedFunctionConfig` / `UninitializedVariantConfig` until the
per-object metadata PR (#5a) adds it, so these fixtures fail to parse
on this branch with `unknown field 'version'`.

Fixes:

- `config_jsonb_column_is_populated_on_write_postgres` — use
  `description = "v4"` on the function as the JSONB-roundtrip
  discriminator instead of `version = 4`. Drop the variant-level
  `version` assertion (variants don't have a `description` field, and
  this test only needs to verify "JSONB column matches in-memory
  config"; the path-level assertion is enough).

- `snapshots_with_path_value_finds_matching_function_version` —
  renamed to `..._function_description`. Replaces the `version`
  discriminator with `description` and updates the JSONPath query
  accordingly. Same shape, same coverage, same edge cases (multiple
  matches, single match, no matches, decoy on a different function
  name).

- `snapshots_containing_finds_variant_version_via_explicit_fragment` —
  switches variants from `version` to `temperature` (an `f32` field
  that already exists on `UninitializedVariantConfig`). The JSONB
  containment query semantics are identical for floats vs ints.

- Decorative `version = N` lines in 4 other tests just go (no
  assertion referenced them).

Also folds in the merge of PR #7419's tip (which itself merged main's
Mistral-test disable from PR #7418), so this branch picks up the
upstream-test fix that unblocks the CH live-tests rollup.

`cargo clippy --all-targets --all-features -- -D warnings` clean;
`cargo fmt --all -- --check` clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per @virajmehta's review request: spell out exactly what drifts the
canonical hash, what doesn't, what drift means in production, and the
rules for changing `StoredConfig` safely.

Two main sections added to `canonical_hash.rs`'s module docs:

1. **Drift table.** Categorizes every kind of change to `StoredConfig`
   (add Option, add non-Option, rename, retype, etc.) as drifting or
   not drifting the hash, with one-line rationales. Calls out the
   non-obvious cases (key reorder doesn't drift because we sort; float
   format doesn't drift because we use IEEE bits; `Option<T>` with
   `skip_serializing_if` and a `None` default doesn't drift).

2. **What drift means in production.**
   - Old `config_snapshots` rows are unaffected — their persisted
     `canonical_hash` column was written by the old serializer and is
     preserved verbatim. The read path (`from_stored_with_hash`) pairs
     row + stored hash; it does NOT recompute.
   - Old rows look up by their old hash, not by recomputation.
   - New writes get the new hash; old + new are different rows for
     the "same" logical config — fine, they reflect what was seen.
   - JSONB containment queries operate over the stored shape, so
     historical search across a cutover requires care.

Plus a "Rules for changing `StoredConfig`" subsection that points at
`stored/AGENTS.md`, recommends purely-additive changes, and reminds
PRs to run the fixture suite (which now locks the hash bytes — any
unintended drift fails CI).

No code change. Documentation-only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@AntoineToussaint
Copy link
Copy Markdown
Member Author

Good call — added schema-change docs in 61652f3. Covers the drift table (additive Option fields don't drift; renames / retypes / non-Option additions / mapping changes do), what drift actually means in production (old rows unaffected because the read path pairs row + stored hash via from_stored_with_hash and never recomputes; new writes get the new hash; old and new live as separate rows), and the rules for changing StoredConfig (purely additive when possible, run the fixture suite which now locks the hash bytes, follow stored/AGENTS.md). The doc also explains why we rely on "the stored hash is authoritative" rather than automatic migration of historical hashes — that model survives any schema evolution as long as old rows look up by their stored hash.

@Aaron1011
Copy link
Copy Markdown
Member

Any of these can drift the hash and silently invalidate every inferences.snapshot_hash reference — which would break every historical-config view in the UI and every audit query that joins inferences back to the config they ran against.

My understanding was that the config-snapshot writing and the inferences.snapshot_hash will always be performed from the same gateway (the snapshot could already exist from another gateway). If the gateway starts up and computes a slightly different hash (e.g. due to a new gateway having a chaned toml crate version), won't that have the same effect as a deliberate config modification by the user?

The opening claimed that hash drift "invalidates every
inferences.snapshot_hash reference" — that's wrong. As Aaron pointed
out: drift just writes a new snapshot row, equivalent to the user
editing the config file. Old rows keep their old hashes, old
inference references keep resolving via legacy-hash lookup, nothing
breaks.

Rewrite the opening to lead with what drift actually costs:

  1. Content-addressed dedupe — two gateway versions running the same
     logical config should write one row, not one per toml-crate
     version.
  2. Multi-gateway consistency — two gateways with different transitive
     dependencies running the same config against the same DB should
     agree on the hash.
  3. JSONB roundtrip identity — once `config_jsonb` is in the schema,
     re-deriving the hash from the stored JSON must match the stored
     hash, otherwise verification flows can't tell whether they got
     the same content back.

The structural hash addresses all three by operating on logical
content rather than serialized text.

The "what drift means in production" section below already had this
right (points 1-3 explicitly cover "old rows unaffected, new writes
get new hash, both live as separate rows for the same logical
config"); the opening just had a misleading lead.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@AntoineToussaint
Copy link
Copy Markdown
Member Author

@Aaron1011 you're right — drift just writes a new row, equivalent to the user editing the config; doesn't invalidate anything. Updated the doc framing in 9fe823723.

The real motivation is upstream of drift. Once we settled on storing the config as a structured JSON document (config_jsonb in #7422, plus the upcoming DB-config mode where the persisted form has no TOML at all), defining the hash over the structure is the obvious choice — and the easiest one to keep correct. The legacy basis goes Stored → toml::to_string → text → sort/canonicalize → blake3, which is a long path through a textual format we're moving away from anyway, with several spots a bug can hide (float reformatting, key-order normalization, escaping rules). The structural hash is Stored → serde_json::Value → walk the tree with one tight encoding → blake3. Same content, fewer steps, easier to reason about.

So drift-handling is a side benefit, not the motivation. The motivation is "we're storing structure, so we should hash structure."

AntoineToussaint and others added 3 commits May 4, 2026 17:12
CI reports the empty-fixture canonical hash drifts on Linux x86_64 but
matches on macOS (the hex was committed from a Mac dev machine). The
expected drift sources (Cargo.lock, StoredConfig shape, canonical
encoding rules) are all identical between local and CI. Push an
eprintln of the actual `serde_json::to_value(&empty StoredConfig)` so
the CI failure shows what Linux is producing — then I can reverse the
divergence vs the Mac dump.

Will revert once the cause is found.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…dition

PR #7413 (`Propagate status code in FatalStreamError`) landed on main
and added `global_outbound_http_intra_stream_read_timeout_ms: Option<u64>`
to `StoredGatewayConfig`. With `Option::None` serializing as `null` (no
`skip_serializing_if`), the empty/default StoredConfig JSON gains a new
field — and every fixture's canonical hash shifts.

This is the fixture suite **working as designed**: a real schema change
on main drifted the hash, and CI's hash-stability assertions caught it
explicitly rather than letting it land silently. The dual-dispatch
read path means old `inferences.snapshot_hash` references continue to
resolve via the legacy hash column unchanged; only NEW canonical-hash
references will use the post-drift values.

Refreshed all 6 expected hashes to match the current `StoredConfig`
shape (locally + matches CI Linux Docker output bit-for-bit). Also
reverts the TEMP `EMPTY_JSON=…` eprintln debug in `fixture_empty`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@virajmehta
Copy link
Copy Markdown
Member

OK LGTM

@AntoineToussaint AntoineToussaint added this pull request to the merge queue May 4, 2026
Merged via the queue into main with commit cfebce3 May 4, 2026
83 checks passed
@AntoineToussaint AntoineToussaint deleted the pr/canonical-hash-snapshot branch May 4, 2026 23:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants