tricu/docs/arborix-bundle-cbor-v1.md
James Eversole e3117e3ac8 Switch manifest serialization to CBOR
Replace JSON-based bundle manifest with a CBOR-encoded format. The manifest
is now a canonical CBOR map with order-strict key decoding, raw 32-byte hash
payloads (instead of hex-encoded JSON), and compact binary representation.
2026-05-07 21:41:50 -05:00


Arborix Portable Bundle v1 (CBOR Manifest Profile)

Status: Draft, implementation-aligned (derived from src/Wire.hs as of 2026-05-07)

This document specifies the actual on-wire format and validation behavior currently implemented by tricu for Arborix bundles, with a focus on the newer CBOR manifest path.


1. Scope

This profile defines:

  1. The binary container envelope (header + section directory + section payloads).
  2. The CBOR manifest section format.
  3. The Merkle node section format.
  4. Decode/verify/import behavior in Wire.hs.
  5. Known gaps and sane resolutions.

Non-goals:

  • tricu source parsing/lambda elimination/module semantics.
  • Signature systems / trust policy.
  • Compression codecs beyond none.

2. Container format

A bundle is a byte stream:

[32-byte header]
[section directory: section_count * 60 bytes]
[section payload bytes...]

2.1 Header (32 bytes)

| Field | Size (bytes) | Encoding | Value / Notes |
|---|---|---|---|
| Magic | 8 | raw bytes | 41 52 42 4f 52 49 58 00 ("ARBORIX\0") |
| Major | 2 | u16 BE | Must be 1 |
| Minor | 2 | u16 BE | Currently 0 |
| SectionCount | 4 | u32 BE | Number of section directory entries |
| Flags | 8 | u64 BE | Currently emitted as 0; not interpreted |
| DirectoryOffset | 8 | u64 BE | Offset of section directory (currently 32) |

Reader behavior:

  • Reject if total bytes < 32.
  • Reject bad magic.
  • Reject major != 1.
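
The header layout and the three reader checks above can be sketched in Python as follows. This is an illustrative reader written against the table in this section, not a transcription of Wire.hs; the names parse_header and Header are hypothetical.

```python
import struct
from typing import NamedTuple

MAGIC = b"ARBORIX\x00"
HEADER_SIZE = 32
# magic (8 raw bytes), major (u16), minor (u16), section count (u32),
# flags (u64), directory offset (u64); ">" means big-endian, no padding.
HEADER_STRUCT = struct.Struct(">8sHHIQQ")


class Header(NamedTuple):
    major: int
    minor: int
    section_count: int
    flags: int
    directory_offset: int


def parse_header(data: bytes) -> Header:
    """Parse the 32-byte header, applying the reader checks from section 2.1."""
    if len(data) < HEADER_SIZE:
        raise ValueError("bundle is shorter than the 32-byte header")
    magic, major, minor, count, flags, dir_offset = HEADER_STRUCT.unpack_from(data, 0)
    if magic != MAGIC:
        raise ValueError("bad magic")
    if major != 1:
        raise ValueError(f"unsupported major version {major}")
    # Flags are returned but not interpreted, matching the table above.
    return Header(major, minor, count, flags, dir_offset)
```

A caller would pass the whole bundle and then read section_count directory entries starting at directory_offset.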

2.2 Section directory entry (60 bytes each)

| Field | Size (bytes) | Encoding | Notes |
|---|---|---|---|
| Type | 4 | u32 BE | e.g. 1=manifest, 2=nodes |
| Version | 2 | u16 BE | Currently emitted as 1; not enforced on read |
| Flags | 2 | u16 BE | bit0 = critical |
| Compression | 2 | u16 BE | 0 = none (required) |
| DigestAlgorithm | 2 | u16 BE | 1 = SHA-256 (required) |
| Offset | 8 | u64 BE | Absolute byte offset |
| Length | 8 | u64 BE | Section payload length |
| Digest | 32 | raw bytes | SHA-256 of section bytes |

Reader behavior:

  • Reject unknown critical section types.
  • Reject compression != 0.
  • Reject digest algorithm != 1.
  • Reject out-of-bounds sections.
  • Reject digest mismatch.
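
A minimal Python sketch of a directory reader that applies the five checks above. It is an illustration, not the Wire.hs decoder: the names read_sections and Section are hypothetical, and applying the compression and digest checks to every entry (including unknown non-critical ones) is an assumption this document does not confirm.

```python
import hashlib
import struct
from typing import List, NamedTuple

ENTRY_SIZE = 60
# type (u32), version (u16), flags (u16), compression (u16), digest algorithm (u16),
# offset (u64), length (u64), digest (32 raw bytes); all big-endian.
ENTRY_STRUCT = struct.Struct(">IHHHHQQ32s")
KNOWN_TYPES = {1, 2}      # 1 = manifest, 2 = nodes
FLAG_CRITICAL = 0x0001    # bit0 of the entry flags


class Section(NamedTuple):
    type: int
    version: int
    payload: bytes


def read_sections(data: bytes, directory_offset: int, count: int) -> List[Section]:
    """Read `count` directory entries and return the verified section payloads."""
    sections = []
    for i in range(count):
        start = directory_offset + i * ENTRY_SIZE
        if start + ENTRY_SIZE > len(data):
            raise ValueError("truncated section directory")
        (typ, version, flags, compression, digest_alg,
         offset, length, digest) = ENTRY_STRUCT.unpack_from(data, start)
        if (typ not in KNOWN_TYPES) and (flags & FLAG_CRITICAL):
            raise ValueError(f"unknown critical section type {typ}")
        if compression != 0:
            raise ValueError("unsupported compression")
        if digest_alg != 1:
            raise ValueError("unsupported digest algorithm")
        if offset + length > len(data):
            raise ValueError("section out of bounds")
        payload = data[offset:offset + length]
        if hashlib.sha256(payload).digest() != digest:
            raise ValueError("section digest mismatch")
        sections.append(Section(typ, version, payload))
    return sections
```

The version field is carried through unchecked, mirroring the "not enforced on read" note in the table.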

2.3 Required section types

| Type | Name | Required |
|---|---|---|
| 1 | manifest | yes |
| 2 | nodes | yes |

The decoder currently rejects a bundle that contains more than one section of type 1 or more than one section of type 2.


3. Manifest section (CBOR)

Manifest bytes are CBOR-encoded map data (using cborg).

3.1 Top-level manifest schema

The top-level map has exactly 8 keys, and the current implementation decodes them in this exact order:

  1. schema (text)
  2. bundleType (text)
  3. tree (map)
  4. runtime (map)
  5. closure (text: "complete"|"partial")
  6. roots (array)
  7. exports (array)
  8. metadata (map)

Important: Current decoder is order-strict; it expects keys in this sequence.

3.2 Nested structures

tree map (3 keys, order-strict)

  • calculus: text
  • nodeHash: map
  • nodePayload: text

nodeHash map (2 keys, order-strict):

  • algorithm: text
  • domain: text

runtime map (4 keys, order-strict)

  • semantics: text
  • evaluation: text
  • abi: text
  • capabilities: array(text)

roots array of maps

Each root map has 2 keys (order-strict):

  • hash: bytes (raw 32-byte hash payload encoded as CBOR byte string)
  • role: text

exports array of maps

Each export map has 4 keys (order-strict):

  • name: text
  • root: bytes (32-byte hash)
  • kind: text
  • abi: text

metadata map

Flexible key set; decoded as map(text -> text), then projected into optional fields:

  • package
  • version
  • description
  • license
  • createdBy

Unknown metadata keys are ignored.

3.3 Default emitted manifest values

Writers in Wire.hs currently emit:

  • schema = "arborix.bundle.manifest.v1"
  • bundleType = "tree-calculus-executable-object"
  • tree.calculus = "tree-calculus.v1"
  • tree.nodeHash.algorithm = "sha256"
  • tree.nodeHash.domain = "arborix.merkle.node.v1"
  • tree.nodePayload = "arborix.merkle.payload.v1"
  • runtime.semantics = "tree-calculus.v1"
  • runtime.evaluation = "normal-order"
  • runtime.abi = "arborix.abi.tree.v1"
  • runtime.capabilities = []
  • closure = "complete"
  • metadata.createdBy = "arborix"
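
To make the field order and default values concrete, here is a small Python sketch that builds the default manifest and encodes it with a hand-written CBOR subset (text, byte strings, arrays, maps). It assumes definite-length maps and arrays with shortest-form heads; this document does not state which forms the cborg-based writer emits, so the bytes produced by Wire.hs may differ. The names encode and default_manifest are hypothetical, and the caller supplies the roots and exports entries because their role, kind, and abi values are not specified here.

```python
import struct


def _head(major: int, n: int) -> bytes:
    """CBOR initial byte for `major` type with argument `n`, shortest form."""
    if n < 24:
        return bytes([(major << 5) | n])
    if n < 0x100:
        return bytes([(major << 5) | 24, n])
    if n < 0x10000:
        return bytes([(major << 5) | 25]) + struct.pack(">H", n)
    if n < 0x100000000:
        return bytes([(major << 5) | 26]) + struct.pack(">I", n)
    return bytes([(major << 5) | 27]) + struct.pack(">Q", n)


def encode(value) -> bytes:
    """Encode str, bytes, list, and dict values; dict keys keep insertion order."""
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return _head(3, len(raw)) + raw
    if isinstance(value, bytes):
        return _head(2, len(value)) + value
    if isinstance(value, list):
        return _head(4, len(value)) + b"".join(encode(v) for v in value)
    if isinstance(value, dict):
        body = b"".join(encode(k) + encode(v) for k, v in value.items())
        return _head(5, len(value)) + body
    raise TypeError(f"unsupported type: {type(value).__name__}")


def default_manifest(roots: list, exports: list) -> dict:
    """Manifest with the default values of section 3.3, keys in section 3.1 order."""
    return {
        "schema": "arborix.bundle.manifest.v1",
        "bundleType": "tree-calculus-executable-object",
        "tree": {
            "calculus": "tree-calculus.v1",
            "nodeHash": {"algorithm": "sha256", "domain": "arborix.merkle.node.v1"},
            "nodePayload": "arborix.merkle.payload.v1",
        },
        "runtime": {
            "semantics": "tree-calculus.v1",
            "evaluation": "normal-order",
            "abi": "arborix.abi.tree.v1",
            "capabilities": [],
        },
        "closure": "complete",
        "roots": roots,
        "exports": exports,
        "metadata": {"createdBy": "arborix"},
    }
```

Python dictionaries preserve insertion order, which is what lets this sketch mirror an order-strict writer without a separate key list.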

4. Nodes section (binary)

Node section payload layout:

node_count: u64 BE
repeat node_count times:
  hash: 32 bytes
  payload_len: u32 BE
  payload: payload_len bytes

Node payload grammar:

  • 0x00 => Leaf
  • 0x01 || child_hash(32) => Stem
  • 0x02 || left_hash(32) || right_hash(32) => Fork

Section decoder rejects:

  • duplicate node hashes,
  • truncated entries,
  • payload overruns,
  • trailing bytes after final node.
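
The layout, payload grammar, and rejection rules above can be sketched in Python as follows. The names parse_nodes and payload_kind are hypothetical. The payload grammar check is kept as a separate helper because this document lists only the four rejection cases for the section decoder itself.

```python
import struct
from typing import Dict

HASH_LEN = 32


def payload_kind(payload: bytes) -> str:
    """Classify a node payload by tag byte and exact length."""
    if payload == b"\x00":
        return "leaf"
    if payload[:1] == b"\x01" and len(payload) == 1 + HASH_LEN:
        return "stem"
    if payload[:1] == b"\x02" and len(payload) == 1 + 2 * HASH_LEN:
        return "fork"
    raise ValueError("malformed node payload")


def parse_nodes(section: bytes) -> Dict[bytes, bytes]:
    """Parse the nodes section into a map from node hash to raw payload."""
    if len(section) < 8:
        raise ValueError("truncated node count")
    (count,) = struct.unpack_from(">Q", section, 0)
    pos = 8
    nodes: Dict[bytes, bytes] = {}
    for _ in range(count):
        if pos + HASH_LEN + 4 > len(section):
            raise ValueError("truncated node entry")
        node_hash = section[pos:pos + HASH_LEN]
        (payload_len,) = struct.unpack_from(">I", section, pos + HASH_LEN)
        pos += HASH_LEN + 4
        if pos + payload_len > len(section):
            raise ValueError("payload overruns section")
        if node_hash in nodes:
            raise ValueError("duplicate node hash")
        nodes[node_hash] = section[pos:pos + payload_len]
        pos += payload_len
    if pos != len(section):
        raise ValueError("trailing bytes after final node")
    return nodes
```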

5. Verification behavior (verifyBundle)

verifyBundle enforces all of:

  1. bundle version >= 1.
  2. bundle has at least one node.
  3. manifest constants match hardcoded v1 values (schema/type/calculus/hash algo/domain/payload/runtime semantics/ABI).
  4. runtime capabilities must be empty.
  5. closure must be complete.
  6. manifest has at least one root and one export.
  7. root sets in bundleRoots and manifest.roots must match exactly.
  8. each root and export root exists in node map.
  9. each node payload deserializes and re-hashes to declared node hash.
  10. all referenced child hashes exist.
  11. full closure reachability from roots succeeds.

importBundle runs decode + verify before storing nodes.
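
Checks 8, 10, and 11 amount to a graph traversal over the node map. The Python sketch below illustrates that traversal only; it omits the hash recomputation of check 9 because this document does not give the exact hash input layout. The names child_hashes and check_closure are hypothetical.

```python
from typing import Dict, Iterable, List, Set

HASH_LEN = 32


def child_hashes(payload: bytes) -> List[bytes]:
    """Return the child hashes referenced by a Leaf, Stem, or Fork payload."""
    tag, body = payload[:1], payload[1:]
    if tag == b"\x00" and len(body) == 0:
        return []
    if tag == b"\x01" and len(body) == HASH_LEN:
        return [body]
    if tag == b"\x02" and len(body) == 2 * HASH_LEN:
        return [body[:HASH_LEN], body[HASH_LEN:]]
    raise ValueError("malformed node payload")


def check_closure(nodes: Dict[bytes, bytes], roots: Iterable[bytes]) -> Set[bytes]:
    """Walk from the roots; fail if any root or referenced child is missing."""
    seen: Set[bytes] = set()
    stack = list(roots)
    while stack:
        node_hash = stack.pop()
        if node_hash in seen:
            continue
        if node_hash not in nodes:
            raise ValueError(f"missing node {node_hash.hex()}")
        seen.add(node_hash)
        stack.extend(child_hashes(nodes[node_hash]))
    return seen
```

The returned set contains every node reachable from the roots, so a caller could also compare it against the full node map to detect unreferenced nodes.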


6. Export/import semantics

6.1 Export

exportNamedBundle:

  • Traverses reachable nodes for each requested root hash.
  • Builds node map.
  • Builds default manifest and CBOR bytes.
  • Emits two sections (manifest, nodes).

exportBundle auto-names exports:

  • 1 root => root
  • N>1 => root0, root1, ...
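
The auto-naming rule is small enough to state as code. This Python sketch uses a hypothetical function name and only restates the two cases above.

```python
from typing import List


def auto_export_names(root_count: int) -> List[str]:
    """One root is named "root"; several roots are named root0, root1, ..."""
    if root_count == 1:
        return ["root"]
    return [f"root{i}" for i in range(root_count)]
```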

6.2 Import

importBundle:

  1. Decode bundle.
  2. Verify bundle.
  3. Insert all node payloads into content store.
  4. For each manifest export: reconstruct tree by export root and store name binding in DB.
  5. Return bundle root list.

7. Determinism properties

Current implementation is deterministic for identical logical input because:

  • Node map serialized in ascending hash order (Map.toAscList).
  • Field order in manifest encoding is fixed by code.
  • Section ordering is fixed: manifest then nodes.

Repeated exports of the same roots therefore produce byte-identical bundles.
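
The first property can be illustrated with a Python sketch of a nodes-section writer that sorts by hash before emitting entries. The function name encode_nodes is hypothetical, and the sketch assumes that bytewise ordering of raw hashes matches the order produced by Map.toAscList in the implementation.

```python
import struct
from typing import Dict


def encode_nodes(nodes: Dict[bytes, bytes]) -> bytes:
    """Serialize the node map in ascending hash order, per the section 4 layout."""
    parts = [struct.pack(">Q", len(nodes))]
    for node_hash in sorted(nodes):          # bytes sort lexicographically by byte value
        payload = nodes[node_hash]
        parts.append(node_hash)
        parts.append(struct.pack(">I", len(payload)))
        parts.append(payload)
    return b"".join(parts)
```

Because the output depends only on the set of (hash, payload) pairs, two maps built in different insertion orders serialize to the same bytes.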


8. Known gaps and sane resolutions

These are important design gaps visible from current code.

Gap A: Node hash domain mismatch risk (critical)

Status: resolved in current codebase.

What was wrong:

  • Manifest declared tree.nodeHash.domain = "arborix.merkle.node.v1".
  • Hashing implementation previously used "tricu.merkle.node.v1".

Current state:

  • Haskell hashing now uses "arborix.merkle.node.v1".
  • JS reference runtime hashing now uses "arborix.merkle.node.v1".
  • JS manifest validation now requires "arborix.merkle.node.v1".

Remaining recommendation:

  • Keep hash-domain constants centralized/shared to prevent future drift.
  • Add explicit test vectors for Leaf/Stem/Fork hashes under the Arborix domain.

Gap B: CBOR decode is order-strict, not generic-map tolerant

Observed:

  • Decoder expects exact key order for most maps.

Impact:

  • Another canonical CBOR writer that reorders keys may decode-fail even if semantically equivalent.

Sane resolution:

  • For v1 compatibility, decode maps as unordered key/value collections, require key presence and types, and reject unknown keys only where desired.
  • Keep writer deterministic, but relax reader.
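
A relaxed reader along these lines could be sketched in Python as below. It operates on key/value pairs already produced by some CBOR decoder; the names read_map and ROOT_SCHEMA, and the allow_unknown switch, are hypothetical illustrations of the resolution rather than an existing API.

```python
from typing import Any, Dict, List, Tuple


def read_map(pairs: List[Tuple[str, Any]], schema: Dict[str, type],
             allow_unknown: bool = False) -> Dict[str, Any]:
    """Accept keys in any order; require presence, types, and no duplicates."""
    seen = set()
    out: Dict[str, Any] = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key {key!r}")
        seen.add(key)
        if key not in schema:
            if allow_unknown:
                continue
            raise ValueError(f"unknown key {key!r}")
        if not isinstance(value, schema[key]):
            raise ValueError(f"wrong type for key {key!r}")
        out[key] = value
    missing = sorted(set(schema) - set(out))
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return out


# Schema for one entry of the roots array (section 3.2).
ROOT_SCHEMA = {"hash": bytes, "role": str}
```

With this shape, a root map encoded as role-then-hash decodes to the same value as hash-then-role, while a missing or mistyped key still fails.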

Gap C: “Canonical CBOR” claim is stronger than implementation

Observed:

  • Writer uses fixed order but does not explicitly sort keys per RFC 8949 canonical ordering rules.

Sane resolution:

  • Either (a) rename the profile to “deterministic CBOR”, or (b) implement explicit canonical key ordering, plus checks that lengths and integers use their shortest encodings.
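
To show what option (b) would change, the Python sketch below sorts the top-level manifest keys by the bytewise lexicographic order of their encoded forms, which is the map key ordering rule of RFC 8949 section 4.2.1. The helper names are hypothetical, and the key encoder handles only text keys shorter than 24 bytes, which covers every key in this profile.

```python
from typing import List


def encode_text_key(key: str) -> bytes:
    """CBOR encoding of a short text string: 0x60 | length, then UTF-8 bytes."""
    raw = key.encode("utf-8")
    if len(raw) >= 24:
        raise ValueError("sketch only handles keys shorter than 24 bytes")
    return bytes([0x60 | len(raw)]) + raw


def deterministic_order(keys: List[str]) -> List[str]:
    """Sort keys by the bytewise lexicographic order of their encodings."""
    return sorted(keys, key=encode_text_key)


# Top-level key order from section 3.1.
WRITER_ORDER = ["schema", "bundleType", "tree", "runtime",
                "closure", "roots", "exports", "metadata"]

print(deterministic_order(WRITER_ORDER))
# ['tree', 'roots', 'schema', 'closure', 'exports', 'runtime', 'metadata', 'bundleType']
```

Because the length is part of the first encoded byte, shorter keys sort first, so the sorted order differs from the fixed writer order in section 3.1.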

Gap D: Extra section preservation

Observed:

  • Decoder tolerates unknown non-critical sections, but Bundle model/encoder drops them on re-encode.

Sane resolution:

  • Add a field such as bundleExtraSections :: [(SectionEntry, ByteString)], pairing each unknown directory entry with its raw bytes, if round-trip preservation is desired.

Gap E: Section version not enforced

Observed:

  • Section entry Version is parsed but unused.

Sane resolution:

  • Enforce known version matrix (e.g., manifest v1, nodes v1), or explicitly document “advisory only”.

Gap F: Runtime capability policy is hard fail

Observed:

  • Any non-empty capabilities list is rejected.

Sane resolution:

  • Keep strict for now, but define capability negotiation strategy for v1.1+ (unknown capabilities => reject unless explicitly allowed by host policy).

Gap G: Error handling style in import/export path

Observed:

  • Several paths throw error for malformed data/store misses.

Sane resolution:

  • Return Either-style typed errors through public API (decode, verify, import), reserve exceptions for truly internal faults.

9. Conformance checklist (v1 current)

A conforming v1 reader/writer for this profile should:

  • Implement the 32-byte header and 60-byte section records exactly.
  • Support required sections 1 and 2.
  • Verify section digests with SHA-256.
  • Decode/encode manifest CBOR matching the field model above.
  • Parse nodes section and validate node payload structure.
  • Recompute and verify node hashes.
  • Enforce complete closure for roots.
  • Enforce manifest/runtime constants used by v1.

10. Suggested follow-up docs

To stabilize interoperability, add:

  1. docs/arborix-bundle-test-vectors.md (golden header/manifest/nodes + expected hashes).
  2. docs/arborix-bundle-errors.md (normative error codes/strings).
  3. docs/arborix-bundle-evolution.md (rules for minor/major upgrades, capability negotiation, extra sections).