# Arborix Portable Bundle v1 (CBOR Manifest Profile)

Status: **Draft, implementation-aligned** (derived from `src/Wire.hs` as of 2026-05-07)

This document specifies the **actual on-wire format and validation behavior** currently implemented by `tricu` for Arborix bundles, with a focus on the newer CBOR manifest path.

---

## 1. Scope

This profile defines:

1. The binary container envelope (header + section directory + section payloads).
2. The CBOR manifest section format.
3. The Merkle node section format.
4. Decode/verify/import behavior in `Wire.hs`.
5. Known gaps and sane resolutions.

Non-goals:

- tricu source parsing/lambda elimination/module semantics.
- Signature systems / trust policy.
- Compression codecs beyond `none`.

---

## 2. Container format

A bundle is a byte stream:

```
[32-byte header]
[section directory: section_count * 60 bytes]
[section payload bytes...]
```

### 2.1 Header (32 bytes)

| Field | Size | Encoding | Value / Notes |
|---|---:|---|---|
| Magic | 8 | raw bytes | `41 52 42 4f 52 49 58 00` (`"ARBORIX\0"`) |
| Major | 2 | u16 BE | Must be `1` |
| Minor | 2 | u16 BE | Currently `0` |
| SectionCount | 4 | u32 BE | Number of section directory entries |
| Flags | 8 | u64 BE | Currently emitted as `0`; not interpreted |
| DirectoryOffset | 8 | u64 BE | Offset of section directory (currently `32`) |

Reader behavior:

- Reject if total bytes < 32.
- Reject bad magic.
- Reject major != 1.

### 2.2 Section directory entry (60 bytes each)

| Field | Size | Encoding | Notes |
|---|---:|---|---|
| Type | 4 | u32 BE | e.g. 1=manifest, 2=nodes |
| Version | 2 | u16 BE | Currently emitted as `1`; not enforced on read |
| Flags | 2 | u16 BE | bit0 = critical |
| Compression | 2 | u16 BE | `0` = none (required) |
| DigestAlgorithm | 2 | u16 BE | `1` = SHA-256 (required) |
| Offset | 8 | u64 BE | Absolute byte offset |
| Length | 8 | u64 BE | Section payload length |
| Digest | 32 | raw bytes | SHA-256 of section bytes |

Reader behavior:

- Reject unknown **critical** section types.
- Reject compression != 0.
- Reject digest algorithm != 1.
- Reject out-of-bounds sections.
- Reject digest mismatch.

### 2.3 Required section types

| Type | Name | Required |
|---:|---|---|
| 1 | manifest | yes |
| 2 | nodes | yes |

Decode currently rejects duplicate section type 1 or 2.

---

## 3. Manifest section (CBOR)

Manifest bytes are CBOR-encoded map data (using `cborg`).

### 3.1 Top-level manifest schema

Top-level map has **exactly 8 keys** in this exact decode order in current implementation:

1. `schema` (text)
2. `bundleType` (text)
3. `tree` (map)
4. `runtime` (map)
5. `closure` (text: `"complete"|"partial"`)
6. `roots` (array)
7. `exports` (array)
8. `metadata` (map)

> Important: Current decoder is order-strict; it expects keys in this sequence.
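The order-strict rule can be illustrated with a small sketch. It assumes the manifest has already been decoded by some CBOR library into a list of `(key, value)` pairs that preserves wire order; `TOP_LEVEL_KEYS` and `check_top_level` are hypothetical names used for illustration and are not taken from `Wire.hs`.

```python
# Hypothetical sketch: order-strict check of the top-level manifest keys.
# Assumption: the CBOR map was decoded into (key, value) pairs in wire order.

TOP_LEVEL_KEYS = [
    "schema", "bundleType", "tree", "runtime",
    "closure", "roots", "exports", "metadata",
]


def check_top_level(pairs):
    """Return the pairs as a dict, or raise ValueError on a count or order mismatch."""
    keys = [key for key, _ in pairs]
    if len(keys) != len(TOP_LEVEL_KEYS):
        raise ValueError(f"expected {len(TOP_LEVEL_KEYS)} keys, got {len(keys)}")
    for position, (got, want) in enumerate(zip(keys, TOP_LEVEL_KEYS)):
        if got != want:
            raise ValueError(f"key {position}: expected {want!r}, got {got!r}")
    return dict(pairs)
```

A manifest whose keys are all present but arrive in a different order (for example `bundleType` before `schema`) fails this check, which is the interoperability concern an order-strict reader raises.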
### 3.2 Nested structures

#### `tree` map (3 keys, order-strict)

- `calculus`: text
- `nodeHash`: map
- `nodePayload`: text

`nodeHash` map (2 keys, order-strict):

- `algorithm`: text
- `domain`: text

#### `runtime` map (4 keys, order-strict)

- `semantics`: text
- `evaluation`: text
- `abi`: text
- `capabilities`: array(text)

#### `roots` array of maps

Each root map has 2 keys (order-strict):

- `hash`: bytes (raw 32-byte hash payload encoded as CBOR byte string)
- `role`: text

#### `exports` array of maps

Each export map has 4 keys (order-strict):

- `name`: text
- `root`: bytes (32-byte hash)
- `kind`: text
- `abi`: text

#### `metadata` map

Flexible key set; decoded as map(text -> text), then projected into optional fields:

- `package`
- `version`
- `description`
- `license`
- `createdBy`

Unknown metadata keys are ignored.

### 3.3 Default emitted manifest values

Writers in `Wire.hs` currently emit:

- `schema = "arborix.bundle.manifest.v1"`
- `bundleType = "tree-calculus-executable-object"`
- `tree.calculus = "tree-calculus.v1"`
- `tree.nodeHash.algorithm = "sha256"`
- `tree.nodeHash.domain = "arborix.merkle.node.v1"`
- `tree.nodePayload = "arborix.merkle.payload.v1"`
- `runtime.semantics = "tree-calculus.v1"`
- `runtime.evaluation = "normal-order"`
- `runtime.abi = "arborix.abi.tree.v1"`
- `runtime.capabilities = []`
- `closure = "complete"`
- `metadata.createdBy = "arborix"`

---

## 4. Nodes section (binary)

Node section payload layout:

```
node_count: u64 BE
repeat node_count times:
  hash: 32 bytes
  payload_len: u32 BE
  payload: payload_len bytes
```

Node payload grammar:

- `0x00` => Leaf
- `0x01 || child_hash(32)` => Stem
- `0x02 || left_hash(32) || right_hash(32)` => Fork

Section decoder rejects:

- duplicate node hashes,
- truncated entries,
- payload overruns,
- trailing bytes after final node.

---

## 5. Verification behavior (`verifyBundle`)

`verifyBundle` enforces all of:

1. bundle version >= 1.
2. bundle has at least one node.
3. manifest constants match hardcoded v1 values (schema/type/calculus/hash algo/domain/payload/runtime semantics/ABI).
4. runtime capabilities must be empty.
5. closure must be `complete`.
6. manifest has at least one root and one export.
7. root sets in `bundleRoots` and `manifest.roots` must match exactly.
8. each root and export root exists in node map.
9. each node payload deserializes and re-hashes to declared node hash.
10. all referenced child hashes exist.
11. full closure reachability from roots succeeds.

`importBundle` runs decode + verify before storing nodes.

---

## 6. Export/import semantics

### 6.1 Export

`exportNamedBundle`:

- Traverses reachable nodes for each requested root hash.
- Builds node map.
- Builds default manifest and CBOR bytes.
- Emits two sections (manifest, nodes).

`exportBundle` auto-names exports:

- 1 root => `root`
- N > 1 roots => `root0`, `root1`, ...

### 6.2 Import

`importBundle`:

1. Decode bundle.
2. Verify bundle.
3. Insert all node payloads into content store.
4. For each manifest export: reconstruct tree by export root and store name binding in DB.
5. Return bundle root list.

---

## 7. Determinism properties

Current implementation is deterministic for identical logical input because:

- Node map serialized in ascending hash order (`Map.toAscList`).
- Field order in manifest encoding is fixed by code.
- Section ordering is fixed: manifest then nodes.

So repeated exports of the same roots produce byte-identical bundles.

---

## 8. Known gaps and sane resolutions

These are important design gaps visible from current code.

### Gap A: Node hash domain mismatch risk (critical)

Status: **resolved in current codebase**.

What was wrong:

- Manifest declared `tree.nodeHash.domain = "arborix.merkle.node.v1"`.
- Hashing implementation previously used `"tricu.merkle.node.v1"`.

Current state:

- Haskell hashing now uses `"arborix.merkle.node.v1"`.
- JS reference runtime hashing now uses `"arborix.merkle.node.v1"`.
- JS manifest validation now requires `"arborix.merkle.node.v1"`.

Remaining recommendation:

- Keep hash-domain constants centralized/shared to prevent future drift.
- Add explicit test vectors for Leaf/Stem/Fork hashes under the Arborix domain.

### Gap B: CBOR decode is order-strict, not generic-map tolerant

Observed:

- Decoder expects exact key order for most maps.

Impact:

- Another canonical CBOR writer that reorders keys may decode-fail even if semantically equivalent.

Sane resolution:

- For v1 compatibility, decode maps as unordered key/value collections, require key presence and types, and reject unknown keys only where desired.
- Keep writer deterministic, but relax reader.

### Gap C: “Canonical CBOR” claim is stronger than implementation

Observed:

- Writer uses fixed order but does not explicitly sort keys per RFC 8949 canonical ordering rules.

Sane resolution:

- Either (a) rename as “deterministic CBOR” profile, or (b) implement explicit canonical key ordering and canonical-length/minimal integer forms checks.

### Gap D: Extra section preservation

Observed:

- Decoder tolerates unknown non-critical sections, but `Bundle` model/encoder drops them on re-encode.

Sane resolution:

- Add `bundleExtraSections :: [SectionEntry+Bytes]` if round-trip preservation is desired.

### Gap E: Section version not enforced

Observed:

- Section entry `Version` is parsed but unused.

Sane resolution:

- Enforce known version matrix (e.g., manifest v1, nodes v1), or explicitly document “advisory only”.

### Gap F: Runtime capability policy is hard fail

Observed:

- Any non-empty capabilities list is rejected.

Sane resolution:

- Keep strict for now, but define capability negotiation strategy for v1.1+ (unknown capabilities => reject unless explicitly allowed by host policy).

### Gap G: Error handling style in import/export path

Observed:

- Several paths throw `error` for malformed data/store misses.

Sane resolution:

- Return `Either`-style typed errors through public API (`decode`, `verify`, `import`), reserve exceptions for truly internal faults.

---

## 9. Conformance checklist (v1 current)

A conforming v1 reader/writer for this profile should:

- Implement the 32-byte header and 60-byte section records exactly.
- Support required sections 1 and 2.
- Verify section digests with SHA-256.
- Decode/encode manifest CBOR matching the field model above.
- Parse nodes section and validate node payload structure.
- Recompute and verify node hashes.
- Enforce complete closure for roots.
- Enforce manifest/runtime constants used by v1.

---

## 10. Suggested follow-up docs

To stabilize interoperability, add:

1. `docs/arborix-bundle-test-vectors.md` (golden header/manifest/nodes + expected hashes).
2. `docs/arborix-bundle-errors.md` (normative error codes/strings).
3. `docs/arborix-bundle-evolution.md` (rules for minor/major upgrades, capability negotiation, extra sections).
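
As a possible starting point for the golden header vectors in item 1, the following sketch exercises the container layout from Section 2: it builds a bundle, then re-reads the header and directory and applies the reader checks listed there. The names `build_bundle`, `read_header`, and `read_sections` are hypothetical and not derived from `Wire.hs`. The payloads are placeholders rather than a valid manifest or nodes section, and setting the critical flag on emitted sections is an assumption.

```python
import hashlib
import struct

MAGIC = b"ARBORIX\x00"
# magic, major, minor, section count, flags, directory offset (all big-endian)
HEADER = struct.Struct(">8sHHIQQ")
# type, version, flags, compression, digest algorithm, offset, length, digest
ENTRY = struct.Struct(">IHHHHQQ32s")
assert HEADER.size == 32 and ENTRY.size == 60

KNOWN_TYPES = (1, 2)  # 1 = manifest, 2 = nodes


def build_bundle(sections):
    """Encode [(type, payload)] as header + directory + payloads."""
    offset = HEADER.size + ENTRY.size * len(sections)
    directory, body = b"", b""
    for sec_type, payload in sections:
        digest = hashlib.sha256(payload).digest()
        # version=1, flags=1 (critical; assumed), compression=0, digest algorithm=1
        directory += ENTRY.pack(sec_type, 1, 1, 0, 1, offset, len(payload), digest)
        body += payload
        offset += len(payload)
    header = HEADER.pack(MAGIC, 1, 0, len(sections), 0, HEADER.size)
    return header + directory + body


def read_header(data):
    """Apply the Section 2.1 checks and return (section_count, directory_offset)."""
    if len(data) < HEADER.size:
        raise ValueError("bundle shorter than 32-byte header")
    magic, major, _minor, count, _flags, dir_offset = HEADER.unpack_from(data, 0)
    if magic != MAGIC:
        raise ValueError("bad magic")
    if major != 1:
        raise ValueError(f"unsupported major version {major}")
    return count, dir_offset


def read_sections(data):
    """Apply the Section 2.2 and 2.3 checks and return {type: payload}."""
    count, dir_offset = read_header(data)
    sections = {}
    for i in range(count):
        start = dir_offset + i * ENTRY.size
        if start + ENTRY.size > len(data):
            raise ValueError("directory entry out of bounds")
        (sec_type, _version, flags, compression, digest_alg,
         offset, length, digest) = ENTRY.unpack_from(data, start)
        if sec_type not in KNOWN_TYPES and flags & 1:
            raise ValueError(f"unknown critical section type {sec_type}")
        if sec_type in KNOWN_TYPES and sec_type in sections:
            raise ValueError(f"duplicate section type {sec_type}")
        if compression != 0:
            raise ValueError("unsupported compression")
        if digest_alg != 1:
            raise ValueError("unsupported digest algorithm")
        if offset + length > len(data):
            raise ValueError("section out of bounds")
        payload = data[offset:offset + length]
        if hashlib.sha256(payload).digest() != digest:
            raise ValueError("section digest mismatch")
        sections[sec_type] = payload
    for required in KNOWN_TYPES:
        if required not in sections:
            raise ValueError(f"missing required section type {required}")
    return sections


# Placeholder payloads: real bundles carry CBOR manifest bytes and a nodes section.
bundle = build_bundle([(1, b"manifest-bytes"), (2, b"nodes-bytes")])
sections = read_sections(bundle)
```

Freezing the bytes such a writer produces, together with inputs that trip each rejection path, would give readers in other languages a fixed target to test against.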