Arborix Portable Bundle v1 (CBOR Manifest Profile)
Status: Draft, implementation-aligned (derived from src/Wire.hs as of 2026-05-07)
This document specifies the actual on-wire format and validation behavior currently implemented by tricu for Arborix bundles, with a focus on the newer CBOR manifest path.
1. Scope
This profile defines:
- The binary container envelope (header + section directory + section payloads).
- The CBOR manifest section format.
- The Merkle node section format.
- Decode/verify/import behavior in `Wire.hs`.
- Known gaps and sane resolutions.
Non-goals:
- tricu source parsing/lambda elimination/module semantics.
- Signature systems / trust policy.
- Compression codecs beyond `none`.
2. Container format
A bundle is a byte stream:
[32-byte header]
[section directory: section_count * 60 bytes]
[section payload bytes...]
2.1 Header (32 bytes)
| Field | Size | Encoding | Value / Notes |
|---|---|---|---|
| Magic | 8 | raw bytes | 41 52 42 4f 52 49 58 00 ("ARBORIX\0") |
| Major | 2 | u16 BE | Must be 1 |
| Minor | 2 | u16 BE | Currently 0 |
| SectionCount | 4 | u32 BE | Number of section directory entries |
| Flags | 8 | u64 BE | Currently emitted as 0; not interpreted |
| DirectoryOffset | 8 | u64 BE | Offset of section directory (currently 32) |
Reader behavior:
- Reject if total bytes < 32.
- Reject bad magic.
- Reject major != 1.
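The header table and reader rules above can be sketched with Python's `struct` module. This is an illustrative reader, not the `Wire.hs` code; the function name `parse_header` and the returned dictionary keys are assumptions made for this example.

```python
import struct

MAGIC = b"ARBORIX\x00"
# magic(8) major(u16) minor(u16) section_count(u32) flags(u64) directory_offset(u64)
HEADER_FORMAT = ">8sHHIQQ"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 32 bytes, big-endian, no padding


def parse_header(data: bytes) -> dict:
    """Parse and validate the 32-byte bundle header (hypothetical helper)."""
    if len(data) < HEADER_SIZE:
        raise ValueError("bundle is shorter than the 32-byte header")
    magic, major, minor, count, flags, dir_offset = struct.unpack_from(
        HEADER_FORMAT, data, 0
    )
    if magic != MAGIC:
        raise ValueError("bad magic")
    if major != 1:
        raise ValueError(f"unsupported major version {major}")
    # Flags are carried through untouched: the profile does not interpret them.
    return {
        "major": major,
        "minor": minor,
        "section_count": count,
        "flags": flags,
        "directory_offset": dir_offset,
    }
```

The minor version is returned but not checked, matching the reader rules listed above.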
2.2 Section directory entry (60 bytes each)
| Field | Size | Encoding | Notes |
|---|---|---|---|
| Type | 4 | u32 BE | e.g. 1=manifest, 2=nodes |
| Version | 2 | u16 BE | Currently emitted as 1; not enforced on read |
| Flags | 2 | u16 BE | bit0 = critical |
| Compression | 2 | u16 BE | 0 = none (required) |
| DigestAlgorithm | 2 | u16 BE | 1 = SHA-256 (required) |
| Offset | 8 | u64 BE | Absolute byte offset |
| Length | 8 | u64 BE | Section payload length |
| Digest | 32 | raw bytes | SHA-256 of section bytes |
Reader behavior:
- Reject unknown critical section types.
- Reject compression != 0.
- Reject digest algorithm != 1.
- Reject out-of-bounds sections.
- Reject digest mismatch.
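A minimal sketch of reading one directory entry and applying the checks above, in the order listed. The helper name `read_section` is hypothetical, and treating "bit0" as the least significant bit (`0x0001`) is an assumption; the source does not state the bit numbering convention.

```python
import hashlib
import struct

# type(u32) version(u16) flags(u16) compression(u16) digest_alg(u16)
# offset(u64) length(u64) digest(32 raw bytes)
ENTRY_FORMAT = ">IHHHHQQ32s"
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)  # 60 bytes
KNOWN_TYPES = {1, 2}      # 1 = manifest, 2 = nodes
FLAG_CRITICAL = 0x0001    # assumption: bit0 is the least significant bit


def read_section(bundle: bytes, entry_offset: int):
    """Return (section_type, payload) for the entry at entry_offset."""
    (stype, _version, flags, compression, digest_alg,
     offset, length, digest) = struct.unpack_from(ENTRY_FORMAT, bundle, entry_offset)
    if stype not in KNOWN_TYPES and flags & FLAG_CRITICAL:
        raise ValueError(f"unknown critical section type {stype}")
    if compression != 0:
        raise ValueError("unsupported compression")
    if digest_alg != 1:
        raise ValueError("unsupported digest algorithm")
    if offset + length > len(bundle):
        raise ValueError("section out of bounds")
    payload = bundle[offset:offset + length]
    if hashlib.sha256(payload).digest() != digest:
        raise ValueError("section digest mismatch")
    return stype, payload
```

The section version is unpacked but deliberately ignored, mirroring the "not enforced on read" note in the table.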
2.3 Required section types
| Type | Name | Required |
|---|---|---|
| 1 | manifest | yes |
| 2 | nodes | yes |
Decode currently rejects duplicate section type 1 or 2.
3. Manifest section (CBOR)
Manifest bytes are CBOR-encoded map data (using cborg).
3.1 Top-level manifest schema
The top-level map has exactly 8 keys, which the current implementation decodes in this exact order:
- `schema` (text)
- `bundleType` (text)
- `tree` (map)
- `runtime` (map)
- `closure` (text: `"complete"` | `"partial"`)
- `roots` (array)
- `exports` (array)
- `metadata` (map)
Important: Current decoder is order-strict; it expects keys in this sequence.
3.2 Nested structures
tree map (3 keys, order-strict)
- `calculus`: text
- `nodeHash`: map
- `nodePayload`: text
nodeHash map (2 keys, order-strict):
- `algorithm`: text
- `domain`: text
runtime map (4 keys, order-strict)
- `semantics`: text
- `evaluation`: text
- `abi`: text
- `capabilities`: array(text)
roots array of maps
Each root map has 2 keys (order-strict):
- `hash`: bytes (raw 32-byte hash payload encoded as CBOR byte string)
- `role`: text
exports array of maps
Each export map has 4 keys (order-strict):
- `name`: text
- `root`: bytes (32-byte hash)
- `kind`: text
- `abi`: text
metadata map
Flexible key set; decoded as map(text -> text), then projected into optional fields:
- `package`
- `version`
- `description`
- `license`
- `createdBy`
Unknown metadata keys are ignored.
3.3 Default emitted manifest values
Writers in Wire.hs currently emit:
- `schema = "arborix.bundle.manifest.v1"`
- `bundleType = "tree-calculus-executable-object"`
- `tree.calculus = "tree-calculus.v1"`
- `tree.nodeHash.algorithm = "sha256"`
- `tree.nodeHash.domain = "arborix.merkle.node.v1"`
- `tree.nodePayload = "arborix.merkle.payload.v1"`
- `runtime.semantics = "tree-calculus.v1"`
- `runtime.evaluation = "normal-order"`
- `runtime.abi = "arborix.abi.tree.v1"`
- `runtime.capabilities = []`
- `closure = "complete"`
- `metadata.createdBy = "arborix"`
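To make the manifest layout concrete, the sketch below builds a default manifest and encodes it with a hand-written encoder for the small CBOR subset the manifest needs (text, byte strings, arrays, maps). It is not the cborg-based writer in `Wire.hs`. Definite-length containers and shortest-form lengths are assumptions, and the `role` and `kind` values, plus the export `abi`, are placeholders because the source does not list their defaults.

```python
import struct


def _head(major: int, n: int) -> bytes:
    """CBOR initial byte plus the shortest-form argument for n."""
    if n < 24:
        return bytes([(major << 5) | n])
    if n < 1 << 8:
        return bytes([(major << 5) | 24, n])
    if n < 1 << 16:
        return bytes([(major << 5) | 25]) + struct.pack(">H", n)
    if n < 1 << 32:
        return bytes([(major << 5) | 26]) + struct.pack(">I", n)
    return bytes([(major << 5) | 27]) + struct.pack(">Q", n)


def encode(value) -> bytes:
    """Encode bytes, str, list, or dict; dict keys keep insertion order."""
    if isinstance(value, bytes):
        return _head(2, len(value)) + value
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return _head(3, len(raw)) + raw
    if isinstance(value, list):
        return _head(4, len(value)) + b"".join(encode(v) for v in value)
    if isinstance(value, dict):
        return _head(5, len(value)) + b"".join(
            encode(k) + encode(v) for k, v in value.items()
        )
    raise TypeError(f"unsupported type {type(value).__name__}")


def default_manifest(root_hash: bytes, export_name: str) -> dict:
    """Manifest with the documented defaults; dict order is the wire key order."""
    return {
        "schema": "arborix.bundle.manifest.v1",
        "bundleType": "tree-calculus-executable-object",
        "tree": {
            "calculus": "tree-calculus.v1",
            "nodeHash": {"algorithm": "sha256", "domain": "arborix.merkle.node.v1"},
            "nodePayload": "arborix.merkle.payload.v1",
        },
        "runtime": {
            "semantics": "tree-calculus.v1",
            "evaluation": "normal-order",
            "abi": "arborix.abi.tree.v1",
            "capabilities": [],
        },
        "closure": "complete",
        # "example-role" and "example-kind" are placeholders, not documented values.
        "roots": [{"hash": root_hash, "role": "example-role"}],
        "exports": [{"name": export_name, "root": root_hash,
                     "kind": "example-kind", "abi": "arborix.abi.tree.v1"}],
        "metadata": {"createdBy": "arborix"},
    }
```

Calling `encode(default_manifest(root_hash, "root"))` yields bytes that start with `0xa8` (a map of 8 pairs), and each 32-byte hash appears as `0x58 0x20` followed by the raw hash rather than as hex text.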
4. Nodes section (binary)
Node section payload layout:
node_count: u64 BE
repeat node_count times:
  hash: 32 bytes
  payload_len: u32 BE
  payload: payload_len bytes
Node payload grammar:
- `0x00` => Leaf
- `0x01 || child_hash(32)` => Stem
- `0x02 || left_hash(32) || right_hash(32)` => Fork
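The payload grammar is small enough to decode with a single tag check. The sketch below is one possible reading of it; the function name `decode_payload` and the tuple representation are choices made for this example, not names from `Wire.hs`.

```python
def decode_payload(payload: bytes) -> tuple:
    """Decode a node payload into ("leaf",), ("stem", child) or ("fork", left, right)."""
    if not payload:
        raise ValueError("empty node payload")
    tag, body = payload[0], payload[1:]
    if tag == 0x00 and len(body) == 0:
        return ("leaf",)
    if tag == 0x01 and len(body) == 32:
        return ("stem", body)
    if tag == 0x02 and len(body) == 64:
        return ("fork", body[:32], body[32:])
    raise ValueError(f"malformed node payload (tag 0x{tag:02x}, {len(body)} body bytes)")
```

Checking the exact body length for each tag means a payload is either 1, 33, or 65 bytes long; anything else is rejected.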
Section decoder rejects:
- duplicate node hashes,
- truncated entries,
- payload overruns,
- trailing bytes after final node.
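The layout and the four rejection rules above can be sketched as a single pass over the section bytes. This is an illustration under the stated layout; `parse_nodes` is a hypothetical name and the error messages are not taken from the implementation.

```python
import struct


def parse_nodes(section: bytes) -> dict:
    """Parse a nodes section into {hash: payload}, applying the rejection rules."""
    if len(section) < 8:
        raise ValueError("truncated node count")
    (count,) = struct.unpack_from(">Q", section, 0)
    pos = 8
    nodes = {}
    for _ in range(count):
        # Each entry starts with a 32-byte hash and a 4-byte payload length.
        if pos + 36 > len(section):
            raise ValueError("truncated node entry")
        node_hash = section[pos:pos + 32]
        (payload_len,) = struct.unpack_from(">I", section, pos + 32)
        pos += 36
        if pos + payload_len > len(section):
            raise ValueError("node payload overruns section")
        if node_hash in nodes:
            raise ValueError("duplicate node hash")
        nodes[node_hash] = section[pos:pos + payload_len]
        pos += payload_len
    if pos != len(section):
        raise ValueError("trailing bytes after final node")
    return nodes
```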
5. Verification behavior (verifyBundle)
verifyBundle enforces all of:
- bundle version >= 1.
- bundle has at least one node.
- manifest constants match hardcoded v1 values (schema/type/calculus/hash algo/domain/payload/runtime semantics/ABI).
- runtime capabilities must be empty.
- closure must be `complete`.
- manifest has at least one root and one export.
- root sets in `bundleRoots` and `manifest.roots` must match exactly.
- each root and export root exists in node map.
- each node payload deserializes and re-hashes to declared node hash.
- all referenced child hashes exist.
- full closure reachability from roots succeeds.
importBundle runs decode + verify before storing nodes.
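The child-existence and closure-reachability checks can be sketched as a graph walk from the roots. Hash recomputation is left out on purpose: this document does not spell out how the domain string and payload are combined into the node hash, so the sketch covers only the structural checks. The names `children` and `check_closure` are hypothetical.

```python
def children(payload: bytes) -> list:
    """Child hashes referenced by a node payload (Leaf: 0, Stem: 1, Fork: 2)."""
    tag, body = payload[:1], payload[1:]
    if tag == b"\x00" and not body:
        return []
    if tag == b"\x01" and len(body) == 32:
        return [body]
    if tag == b"\x02" and len(body) == 64:
        return [body[:32], body[32:]]
    raise ValueError("malformed node payload")


def check_closure(nodes: dict, roots: list) -> set:
    """Walk from the roots; fail if any referenced hash is missing from nodes."""
    if not nodes:
        raise ValueError("bundle has no nodes")
    seen = set()
    stack = list(roots)
    while stack:
        node_hash = stack.pop()
        if node_hash in seen:
            continue
        if node_hash not in nodes:
            raise ValueError(f"missing node {node_hash.hex()}")
        seen.add(node_hash)
        stack.extend(children(nodes[node_hash]))
    return seen
```

The returned set is the reachable closure; comparing it with the full node map would also reveal nodes that no root references, which the rules above do not forbid.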
6. Export/import semantics
6.1 Export
exportNamedBundle:
- Traverses reachable nodes for each requested root hash.
- Builds node map.
- Builds default manifest and CBOR bytes.
- Emits two sections (manifest, nodes).
exportBundle auto-names exports:
- 1 root => `root`
- N>1 => `root0`, `root1`, ...
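The auto-naming rule is easy to state as a function. The sketch below assumes the names follow root order and that numbering starts at 0, as the `root0`, `root1` example suggests; `auto_export_names` is a hypothetical name.

```python
def auto_export_names(root_count: int) -> list:
    """Export names for root_count roots: "root" for one, "root0".. for several."""
    if root_count == 1:
        return ["root"]
    return [f"root{i}" for i in range(root_count)]
```

For example, `auto_export_names(3)` returns `["root0", "root1", "root2"]`.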
6.2 Import
importBundle:
- Decode bundle.
- Verify bundle.
- Insert all node payloads into content store.
- For each manifest export: reconstruct tree by export root and store name binding in DB.
- Return bundle root list.
7. Determinism properties
Current implementation is deterministic for identical logical input because:
- Node map serialized in ascending hash order (`Map.toAscList`).
- Field order in manifest encoding is fixed by code.
- Section ordering is fixed: manifest then nodes.
So repeated exports of same roots produce byte-identical bundles.
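The first property can be illustrated by a nodes-section writer that sorts by hash before emitting. This sketch assumes ascending order means bytewise order of the raw 32-byte hashes; `encode_nodes` is a hypothetical name.

```python
import struct


def encode_nodes(nodes: dict) -> bytes:
    """Serialize {hash: payload} in ascending hash order."""
    parts = [struct.pack(">Q", len(nodes))]
    for node_hash in sorted(nodes):  # bytes compare bytewise, lexicographically
        payload = nodes[node_hash]
        parts.append(node_hash)
        parts.append(struct.pack(">I", len(payload)))
        parts.append(payload)
    return b"".join(parts)
```

Because the output depends only on the set of entries, two maps built in different insertion orders serialize to the same bytes, and therefore to the same section digest.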
8. Known gaps and sane resolutions
These are important design gaps visible from current code.
Gap A: Node hash domain mismatch risk (critical)
Status: resolved in current codebase.
What was wrong:
- Manifest declared `tree.nodeHash.domain = "arborix.merkle.node.v1"`.
- Hashing implementation previously used `"tricu.merkle.node.v1"`.
Current state:
- Haskell hashing now uses `"arborix.merkle.node.v1"`.
- JS reference runtime hashing now uses `"arborix.merkle.node.v1"`.
- JS manifest validation now requires `"arborix.merkle.node.v1"`.
Remaining recommendation:
- Keep hash-domain constants centralized/shared to prevent future drift.
- Add explicit test vectors for Leaf/Stem/Fork hashes under the Arborix domain.
Gap B: CBOR decode is order-strict, not generic-map tolerant
Observed:
- Decoder expects exact key order for most maps.
Impact:
- Another canonical CBOR writer that reorders keys may produce a manifest that fails to decode even though it is semantically equivalent.
Sane resolution:
- For v1 compatibility, decode maps as unordered key/value collections, require key presence and types, and reject unknown keys only where desired.
- Keep writer deterministic, but relax reader.
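A relaxed reader along these lines could treat each map as a set of key/value pairs and validate presence and types afterwards. The sketch below does this for a root map, taking the pairs in whatever order they appeared on the wire. `decode_root` is a hypothetical name, and rejecting unknown keys here is one possible policy, not the documented one.

```python
def decode_root(pairs: list) -> dict:
    """Validate a root map given as (key, value) pairs in any order."""
    required = {"hash": bytes, "role": str}
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key {key!r}")
        if key not in required:
            raise ValueError(f"unknown key {key!r}")
        if not isinstance(value, required[key]):
            raise ValueError(f"wrong type for key {key!r}")
        seen[key] = value
    missing = required.keys() - seen.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if len(seen["hash"]) != 32:
        raise ValueError("hash must be exactly 32 bytes")
    return seen
```

Both `[("hash", h), ("role", r)]` and `[("role", r), ("hash", h)]` decode to the same result, so a writer that emits a different key order no longer breaks the reader.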
Gap C: “Canonical CBOR” claim is stronger than implementation
Observed:
- Writer uses fixed order but does not explicitly sort keys per RFC 8949 canonical ordering rules.
Sane resolution:
- Either (a) rename it the “deterministic CBOR” profile, or (b) implement explicit canonical key ordering and add checks that lengths and integers use their shortest encodings.
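To see the size of the gap, the sketch below sorts the eight top-level keys by the bytewise order of their encoded form, which is the map-key ordering rule of RFC 8949's core deterministic encoding. It handles only text keys shorter than 24 bytes, which covers every key in this manifest; the helper names are hypothetical.

```python
def encode_text_key(key: str) -> bytes:
    """CBOR encoding of a short text key: one head byte, then UTF-8 bytes."""
    raw = key.encode("utf-8")
    if len(raw) >= 24:
        raise ValueError("sketch only handles keys shorter than 24 bytes")
    return bytes([0x60 | len(raw)]) + raw


def deterministic_key_order(keys: list) -> list:
    """Sort keys by the bytewise lexicographic order of their encodings."""
    return sorted(keys, key=encode_text_key)


EMITTED_ORDER = ["schema", "bundleType", "tree", "runtime",
                 "closure", "roots", "exports", "metadata"]
```

Here `deterministic_key_order(EMITTED_ORDER)` gives `tree`, `roots`, `schema`, `closure`, `exports`, `runtime`, `metadata`, `bundleType`. Shorter keys sort first because the head byte carries the length, so the documented decode order does not match this ordering.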
Gap D: Extra section preservation
Observed:
- Decoder tolerates unknown non-critical sections, but the `Bundle` model/encoder drops them on re-encode.
Sane resolution:
- Add `bundleExtraSections :: [SectionEntry+Bytes]` if round-trip preservation is desired.
Gap E: Section version not enforced
Observed:
- Section entry `Version` is parsed but unused.
Sane resolution:
- Enforce known version matrix (e.g., manifest v1, nodes v1), or explicitly document “advisory only”.
Gap F: Runtime capability policy is hard fail
Observed:
- Any non-empty capabilities list is rejected.
Sane resolution:
- Keep strict for now, but define capability negotiation strategy for v1.1+ (unknown capabilities => reject unless explicitly allowed by host policy).
Gap G: Error handling style in import/export path
Observed:
- Several paths throw `error` for malformed data/store misses.
Sane resolution:
- Return `Either`-style typed errors through the public API (`decode`, `verify`, `import`), and reserve exceptions for truly internal faults.
9. Conformance checklist (v1 current)
A conforming v1 reader/writer for this profile should:
- Implement the 32-byte header and 60-byte section records exactly.
- Support required sections 1 and 2.
- Verify section digests with SHA-256.
- Decode/encode manifest CBOR matching the field model above.
- Parse nodes section and validate node payload structure.
- Recompute and verify node hashes.
- Enforce complete closure for roots.
- Enforce manifest/runtime constants used by v1.
10. Suggested follow-up docs
To stabilize interoperability, add:
- `docs/arborix-bundle-test-vectors.md` (golden header/manifest/nodes + expected hashes).
- `docs/arborix-bundle-errors.md` (normative error codes/strings).
- `docs/arborix-bundle-evolution.md` (rules for minor/major upgrades, capability negotiation, extra sections).