Arborix -> Arboricx rename

2026-05-08 09:12:20 -05:00
parent e3117e3ac8
commit 343ecbf4c4
29 changed files with 315 additions and 324 deletions


@@ -0,0 +1,339 @@
# Arboricx Portable Bundle v1 (CBOR Manifest Profile)
Status: **Draft, implementation-aligned** (derived from `src/Wire.hs` as of 2026-05-07)

This document specifies the **actual on-wire format and validation behavior** currently implemented by `tricu` for Arboricx bundles, with a focus on the newer CBOR manifest path.

---
## 1. Scope
This profile defines:
1. The binary container envelope (header + section directory + section payloads).
2. The CBOR manifest section format.
3. The Merkle node section format.
4. Decode/verify/import behavior in `Wire.hs`.
5. Known gaps and sane resolutions.

Non-goals:
- tricu source parsing/lambda elimination/module semantics.
- Signature systems / trust policy.
- Compression codecs beyond `none`.
---
## 2. Container format
A bundle is a byte stream:
```
[32-byte header]
[section directory: section_count * 60 bytes]
[section payload bytes...]
```
### 2.1 Header (32 bytes)
| Field | Size | Encoding | Value / Notes |
|---|---:|---|---|
| Magic | 8 | raw bytes | `41 52 42 4f 52 49 58 00` (ASCII `"ARBORIX"` followed by a `0x00` byte) |
| Major | 2 | u16 BE | Must be `1` |
| Minor | 2 | u16 BE | Currently `0` |
| SectionCount | 4 | u32 BE | Number of section directory entries |
| Flags | 8 | u64 BE | Currently emitted as `0`; not interpreted |
| DirectoryOffset | 8 | u64 BE | Offset of section directory (currently `32`) |

Reader behavior:
- Reject if total bytes < 32.
- Reject bad magic.
- Reject major != 1.
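
The header rules can be sketched in Python with the standard `struct` module. This is a minimal illustration, not the `Wire.hs` code: `pack_header` and `parse_header` are hypothetical names, and the magic value is copied from the hex column of the table above. The format string `>8sHHIQQ` is big-endian with no padding, so it packs to exactly 32 bytes.

```python
import struct

# Big-endian, no padding: magic(8) major(2) minor(2) section_count(4) flags(8) dir_offset(8)
HEADER = struct.Struct(">8sHHIQQ")
# Magic copied verbatim from the hex column of the header table.
MAGIC = bytes.fromhex("41 52 42 4f 52 49 58 00")


def pack_header(section_count: int) -> bytes:
    """Build a header the way the current writer does: v1.0, flags 0, directory at byte 32."""
    return HEADER.pack(MAGIC, 1, 0, section_count, 0, HEADER.size)


def parse_header(data: bytes) -> dict:
    """Apply the reader checks listed above and return the decoded fields."""
    if len(data) < HEADER.size:
        raise ValueError("bundle is shorter than the 32-byte header")
    magic, major, minor, count, flags, dir_offset = HEADER.unpack_from(data, 0)
    if magic != MAGIC:
        raise ValueError("bad magic")
    if major != 1:
        raise ValueError(f"unsupported major version {major}")
    # Flags are returned but not interpreted, matching the current reader.
    return {"major": major, "minor": minor, "section_count": count,
            "flags": flags, "directory_offset": dir_offset}
```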
### 2.2 Section directory entry (60 bytes each)
| Field | Size | Encoding | Notes |
|---|---:|---|---|
| Type | 4 | u32 BE | e.g. 1=manifest, 2=nodes |
| Version | 2 | u16 BE | Currently emitted as `1`; not enforced on read |
| Flags | 2 | u16 BE | bit0 = critical |
| Compression | 2 | u16 BE | `0` = none (required) |
| DigestAlgorithm | 2 | u16 BE | `1` = SHA-256 (required) |
| Offset | 8 | u64 BE | Absolute byte offset |
| Length | 8 | u64 BE | Section payload length |
| Digest | 32 | raw bytes | SHA-256 of section bytes |

Reader behavior:
- Reject unknown **critical** section types.
- Reject compression != 0.
- Reject digest algorithm != 1.
- Reject out-of-bounds sections.
- Reject digest mismatch.
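
A per-entry check might look like the sketch below. It is an illustrative Python reading of the rules above, not the actual decoder: `read_section` and `make_entry` are hypothetical helpers, and the order in which the checks run is an assumption. The known types are the two from the table in 2.3.

```python
import hashlib
import struct

# type(4) version(2) flags(2) compression(2) digest_alg(2) offset(8) length(8) digest(32)
ENTRY = struct.Struct(">IHHHHQQ32s")
KNOWN_TYPES = {1: "manifest", 2: "nodes"}
CRITICAL = 0x0001  # bit0 of the entry Flags field


def make_entry(stype: int, offset: int, payload: bytes, flags: int = 0) -> bytes:
    """Hypothetical writer helper: version 1, no compression, SHA-256 digest."""
    return ENTRY.pack(stype, 1, flags, 0, 1, offset, len(payload),
                      hashlib.sha256(payload).digest())


def read_section(data: bytes, entry: bytes):
    """Validate one 60-byte entry against the whole bundle.

    Returns (type, payload), or None for an unknown non-critical section.
    """
    stype, _version, flags, compression, digest_alg, offset, length, digest = ENTRY.unpack(entry)
    if stype not in KNOWN_TYPES and flags & CRITICAL:
        raise ValueError(f"unknown critical section type {stype}")
    if compression != 0:
        raise ValueError("unsupported compression")
    if digest_alg != 1:
        raise ValueError("unsupported digest algorithm")
    if offset + length > len(data):
        raise ValueError("section out of bounds")
    payload = data[offset:offset + length]
    if hashlib.sha256(payload).digest() != digest:
        raise ValueError("section digest mismatch")
    # The version field is parsed but unused, as in the current reader.
    return None if stype not in KNOWN_TYPES else (stype, payload)
```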
### 2.3 Required section types
| Type | Name | Required |
|---:|---|---|
| 1 | manifest | yes |
| 2 | nodes | yes |

Decoding currently rejects a bundle that contains more than one section of type 1, or more than one of type 2.

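
Once entries have been validated individually, the directory-level rule can be checked over the surviving `(type, payload)` pairs. A minimal sketch with a hypothetical `pick_required` helper; rejecting a missing manifest or nodes section is our reading of the *Required* column, not a quoted decoder rule.

```python
from typing import Iterable, Tuple

REQUIRED = (1, 2)  # manifest, nodes


def pick_required(sections: Iterable[Tuple[int, bytes]]) -> Tuple[bytes, bytes]:
    """Return (manifest_bytes, nodes_bytes), rejecting duplicates and missing sections."""
    found = {}
    for stype, payload in sections:
        if stype not in REQUIRED:
            continue  # other (non-critical) sections do not take part in this check
        if stype in found:
            raise ValueError(f"duplicate section type {stype}")
        found[stype] = payload
    missing = [t for t in REQUIRED if t not in found]
    if missing:
        raise ValueError(f"missing required section type(s) {missing}")
    return found[1], found[2]
```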
---
## 3. Manifest section (CBOR)
Manifest bytes are CBOR-encoded map data (using `cborg`).
### 3.1 Top-level manifest schema
The top-level map has **exactly 8 keys**, which the current implementation decodes in this fixed order:
1. `schema` (text)
2. `bundleType` (text)
3. `tree` (map)
4. `runtime` (map)
5. `closure` (text: `"complete"|"partial"`)
6. `roots` (array)
7. `exports` (array)
8. `metadata` (map)
> Important: The current decoder is order-strict; it expects the keys in exactly this sequence.
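
To make the order-strict behavior concrete, assume a CBOR decoder that returns each map as a list of `(key, value)` pairs in wire order (Python's standard library has no CBOR decoder, so that input shape is an assumption). The current rule then amounts to comparing the key sequence against a fixed tuple; `check_order_strict` is a hypothetical name.

```python
from typing import Any, Dict, List, Sequence, Tuple

MANIFEST_KEYS = ("schema", "bundleType", "tree", "runtime",
                 "closure", "roots", "exports", "metadata")


def check_order_strict(pairs: List[Tuple[str, Any]],
                       expected: Sequence[str] = MANIFEST_KEYS) -> Dict[str, Any]:
    """Accept the map only if its keys appear exactly as `expected`, in that order."""
    keys = tuple(k for k, _ in pairs)
    if keys != tuple(expected):
        raise ValueError(f"expected keys {tuple(expected)}, got {keys}")
    return dict(pairs)
```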
### 3.2 Nested structures
#### `tree` map (3 keys, order-strict)
- `calculus`: text
- `nodeHash`: map
- `nodePayload`: text

`nodeHash` map (2 keys, order-strict):
- `algorithm`: text
- `domain`: text
#### `runtime` map (4 keys, order-strict)
- `semantics`: text
- `evaluation`: text
- `abi`: text
- `capabilities`: array(text)
#### `roots` array of maps
Each root map has 2 keys (order-strict):
- `hash`: bytes (raw 32-byte hash encoded as a CBOR byte string)
- `role`: text
#### `exports` array of maps
Each export map has 4 keys (order-strict):
- `name`: text
- `root`: bytes (32-byte hash)
- `kind`: text
- `abi`: text
#### `metadata` map
Flexible key set; decoded as map(text -> text), then projected into optional fields:
- `package`
- `version`
- `description`
- `license`
- `createdBy`

Unknown metadata keys are ignored.
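
The metadata projection can be sketched as follows, assuming the map has already been decoded into a Python `dict`. The helper name `project_metadata` is hypothetical; absent fields become `None`, standing in for the optional fields described above.

```python
from typing import Dict, Optional

METADATA_FIELDS = ("package", "version", "description", "license", "createdBy")


def project_metadata(raw: Dict[object, object]) -> Dict[str, Optional[str]]:
    """Check map(text -> text), keep the known fields, drop everything else."""
    if not all(isinstance(k, str) and isinstance(v, str) for k, v in raw.items()):
        raise ValueError("metadata must be a map from text to text")
    return {field: raw.get(field) for field in METADATA_FIELDS}
```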
### 3.3 Default emitted manifest values
Writers in `Wire.hs` currently emit:
- `schema = "arboricx.bundle.manifest.v1"`
- `bundleType = "tree-calculus-executable-object"`
- `tree.calculus = "tree-calculus.v1"`
- `tree.nodeHash.algorithm = "sha256"`
- `tree.nodeHash.domain = "arboricx.merkle.node.v1"`
- `tree.nodePayload = "arboricx.merkle.payload.v1"`
- `runtime.semantics = "tree-calculus.v1"`
- `runtime.evaluation = "normal-order"`
- `runtime.abi = "arboricx.abi.tree.v1"`
- `runtime.capabilities = []`
- `closure = "complete"`
- `metadata.createdBy = "arboricx"`
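
The defaults above, together with the key order from 3.1 and 3.2, can be reproduced with a small hand-written CBOR encoder that covers only text strings, byte strings, arrays, and maps. This is a sketch, not `cborg`: it writes map keys in insertion order (Python dicts preserve it), which mirrors the current fixed-by-code order rather than RFC 8949 sorted order. This document does not list default values for a root's `role` or an export's `kind`/`abi`, so `default_manifest` (a hypothetical helper) takes the `roots` and `exports` lists from the caller.

```python
from typing import Any, Dict, List


def _head(major: int, n: int) -> bytes:
    """CBOR initial byte plus argument, always using the shortest form."""
    if n < 24:
        return bytes([(major << 5) | n])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if n < 1 << (8 * size):
            return bytes([(major << 5) | info]) + n.to_bytes(size, "big")
    raise ValueError("argument too large")


def encode(value: Any) -> bytes:
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return _head(3, len(raw)) + raw      # major type 3: text string
    if isinstance(value, bytes):
        return _head(2, len(value)) + value  # major type 2: byte string
    if isinstance(value, list):              # major type 4: array
        return _head(4, len(value)) + b"".join(encode(v) for v in value)
    if isinstance(value, dict):              # major type 5: keys in insertion order
        return _head(5, len(value)) + b"".join(
            encode(k) + encode(v) for k, v in value.items())
    raise TypeError(f"unsupported type {type(value).__name__}")


def default_manifest(roots: List[Dict[str, Any]],
                     exports: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Default manifest values as listed above, in the decoder's key order."""
    return {
        "schema": "arboricx.bundle.manifest.v1",
        "bundleType": "tree-calculus-executable-object",
        "tree": {
            "calculus": "tree-calculus.v1",
            "nodeHash": {"algorithm": "sha256", "domain": "arboricx.merkle.node.v1"},
            "nodePayload": "arboricx.merkle.payload.v1",
        },
        "runtime": {
            "semantics": "tree-calculus.v1",
            "evaluation": "normal-order",
            "abi": "arboricx.abi.tree.v1",
            "capabilities": [],
        },
        "closure": "complete",
        "roots": roots,
        "exports": exports,
        "metadata": {"createdBy": "arboricx"},
    }
```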
---
## 4. Nodes section (binary)
Node section payload layout:
```
node_count: u64 BE
repeat node_count times:
hash: 32 bytes
payload_len: u32 BE
payload: payload_len bytes
```
Node payload grammar:
- `0x00` => Leaf
- `0x01 || child_hash(32)` => Stem
- `0x02 || left_hash(32) || right_hash(32)` => Fork

Section decoder rejects:
- duplicate node hashes,
- truncated entries,
- payload overruns,
- trailing bytes after final node.
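
A Python sketch of the nodes-section reader and the payload grammar follows. The names are hypothetical. The four structural rejections above run while splitting the section; payload grammar is checked by a separate `decode_payload`, since section 5 (step 9) is where payloads are deserialized.

```python
import struct
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Leaf:
    pass


@dataclass(frozen=True)
class Stem:
    child: bytes


@dataclass(frozen=True)
class Fork:
    left: bytes
    right: bytes


def decode_payload(p: bytes) -> Union[Leaf, Stem, Fork]:
    """0x00 => Leaf, 0x01 || child => Stem, 0x02 || left || right => Fork."""
    if p == b"\x00":
        return Leaf()
    if len(p) == 33 and p[0] == 0x01:
        return Stem(p[1:])
    if len(p) == 65 and p[0] == 0x02:
        return Fork(p[1:33], p[33:])
    raise ValueError("malformed node payload")


def parse_nodes_section(data: bytes) -> Dict[bytes, bytes]:
    """Split the section into hash -> payload, applying the structural checks."""
    if len(data) < 8:
        raise ValueError("truncated node count")
    (count,) = struct.unpack_from(">Q", data, 0)
    pos, nodes = 8, {}
    for _ in range(count):
        if pos + 36 > len(data):  # 32-byte hash + u32 payload length
            raise ValueError("truncated node entry")
        node_hash = data[pos:pos + 32]
        (plen,) = struct.unpack_from(">I", data, pos + 32)
        pos += 36
        if pos + plen > len(data):
            raise ValueError("payload overruns section")
        if node_hash in nodes:
            raise ValueError("duplicate node hash")
        nodes[node_hash] = data[pos:pos + plen]
        pos += plen
    if pos != len(data):
        raise ValueError("trailing bytes after final node")
    return nodes
```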
---
## 5. Verification behavior (`verifyBundle`)
`verifyBundle` enforces all of:
1. bundle version >= 1.
2. bundle has at least one node.
3. manifest constants match hardcoded v1 values (schema/type/calculus/hash algo/domain/payload/runtime semantics/ABI).
4. runtime capabilities must be empty.
5. closure must be `complete`.
6. manifest has at least one root and one export.
7. root sets in `bundleRoots` and `manifest.roots` must match exactly.
8. each root and export root exists in node map.
9. each node payload deserializes and re-hashes to declared node hash.
10. all referenced child hashes exist.
11. full closure reachability from roots succeeds.

`importBundle` runs decode + verify before storing nodes.

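
Steps 8 to 11 operate only on the node map, so they can be sketched independently of the manifest checks. The hash function is a parameter because this document names the hash domain but not the exact byte construction; the test below uses plain SHA-256 as a stand-in, which is **not** the real node hash. `verify_nodes` and `children` are hypothetical names.

```python
from typing import Callable, Dict, Iterable, List


def children(payload: bytes) -> List[bytes]:
    """Child hashes referenced by a Leaf/Stem/Fork payload."""
    if payload == b"\x00":
        return []
    if len(payload) == 33 and payload[0] == 0x01:
        return [payload[1:]]
    if len(payload) == 65 and payload[0] == 0x02:
        return [payload[1:33], payload[33:]]
    raise ValueError("malformed node payload")


def verify_nodes(nodes: Dict[bytes, bytes], roots: Iterable[bytes],
                 node_hash: Callable[[bytes], bytes]) -> None:
    # Steps 9 and 10: each payload re-hashes to its key, parses, and has its children present.
    for h, payload in nodes.items():
        if node_hash(payload) != h:
            raise ValueError(f"hash mismatch for node {h.hex()}")
        for child in children(payload):
            if child not in nodes:
                raise ValueError(f"missing child {child.hex()}")
    # Steps 8 and 11: every root (including export roots) exists and its closure is walkable.
    seen, stack = set(), list(roots)
    while stack:
        h = stack.pop()
        if h in seen:
            continue
        if h not in nodes:
            raise ValueError(f"missing node {h.hex()}")
        seen.add(h)
        stack.extend(children(nodes[h]))
```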
---
## 6. Export/import semantics
### 6.1 Export
`exportNamedBundle`:
- Traverses reachable nodes for each requested root hash.
- Builds node map.
- Builds default manifest and CBOR bytes.
- Emits two sections (manifest, nodes).

`exportBundle` auto-names exports:
- 1 root => `root`
- N>1 => `root0`, `root1`, ...
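
The export-side traversal and auto-naming can be sketched over a plain `dict` standing in for the content store (hash to payload bytes). `reachable` and `auto_export_names` are hypothetical names; the naming rule is the one listed above.

```python
from typing import Dict, List


def reachable(store: Dict[bytes, bytes], roots: List[bytes]) -> Dict[bytes, bytes]:
    """Collect every node reachable from `roots` (iterative DFS, each node once)."""
    out: Dict[bytes, bytes] = {}
    stack = list(roots)
    while stack:
        h = stack.pop()
        if h in out:
            continue
        payload = store[h]  # KeyError if the store is missing a node
        out[h] = payload
        tag = payload[:1]
        if tag == b"\x01":    # Stem: one child hash
            stack.append(payload[1:33])
        elif tag == b"\x02":  # Fork: left and right child hashes
            stack.extend((payload[1:33], payload[33:65]))
    return out


def auto_export_names(roots: List[bytes]) -> List[str]:
    """One root => "root"; several => "root0", "root1", ..."""
    if len(roots) == 1:
        return ["root"]
    return [f"root{i}" for i in range(len(roots))]
```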
### 6.2 Import
`importBundle`:
1. Decode bundle.
2. Verify bundle.
3. Insert all node payloads into content store.
4. For each manifest export, reconstruct the tree from the export root and store the name binding in the DB.
5. Return bundle root list.
---
## 7. Determinism properties
The current implementation is deterministic for identical logical input because:
- Node map serialized in ascending hash order (`Map.toAscList`).
- Field order in manifest encoding is fixed by code.
- Section ordering is fixed: manifest then nodes.

As a result, repeated exports of the same roots produce byte-identical bundles.

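
The node-map half of that argument can be checked directly: writing entries in ascending hash order makes the section bytes independent of insertion order. A minimal sketch, assuming the Haskell map key orders like the raw hash bytes (Python `bytes` compare lexicographically); `encode_nodes_section` is a hypothetical name and the layout follows section 4.

```python
import struct
from typing import Dict


def encode_nodes_section(nodes: Dict[bytes, bytes]) -> bytes:
    """node_count, then (hash, payload_len, payload) per node in ascending hash order."""
    parts = [struct.pack(">Q", len(nodes))]
    for h in sorted(nodes):  # analogous to Map.toAscList
        payload = nodes[h]
        parts += [h, struct.pack(">I", len(payload)), payload]
    return b"".join(parts)
```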
---
## 8. Known gaps and sane resolutions
These are important design gaps visible in the current code.
### Gap A: Node hash domain mismatch risk (critical)
Status: **resolved in the current codebase**.

What was wrong:
- The manifest declared `tree.nodeHash.domain = "arboricx.merkle.node.v1"`.
- The hashing implementation previously used `"tricu.merkle.node.v1"`.

Current state:
- Haskell hashing now uses `"arboricx.merkle.node.v1"`.
- JS reference runtime hashing now uses `"arboricx.merkle.node.v1"`.
- JS manifest validation now requires `"arboricx.merkle.node.v1"`.

Remaining recommendations:
- Keep hash-domain constants centralized/shared to prevent future drift.
- Add explicit test vectors for Leaf/Stem/Fork hashes under the Arboricx domain.
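
A golden-vector generator could be as small as the sketch below; it keeps the domain in one constant, per the first recommendation, and running the Haskell and JS hashers against its recorded output would flag any future domain drift. **Assumption:** the byte construction in `node_hash` (SHA-256 over the ASCII domain followed by the payload) is a placeholder. This document names the domain but not how it is combined with the payload, so substitute the `Wire.hs` construction before recording any values. `node_hash` and `vectors` are hypothetical names.

```python
import hashlib
from typing import Dict

NODE_HASH_DOMAIN = "arboricx.merkle.node.v1"  # one shared constant for all hashers


def node_hash(payload: bytes, domain: str = NODE_HASH_DOMAIN) -> bytes:
    # ASSUMED construction: sha256(domain || payload). Replace with the real one.
    return hashlib.sha256(domain.encode("ascii") + payload).digest()


def vectors(domain: str = NODE_HASH_DOMAIN) -> Dict[str, str]:
    """Hex hashes for one Leaf, one Stem over it, and one Fork over both."""
    leaf = node_hash(b"\x00", domain)
    stem = node_hash(b"\x01" + leaf, domain)
    fork = node_hash(b"\x02" + leaf + stem, domain)
    return {"leaf": leaf.hex(), "stem": stem.hex(), "fork": fork.hex()}


if __name__ == "__main__":
    for name, digest in vectors().items():
        print(name, digest)
```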
### Gap B: CBOR decode is order-strict, not generic-map tolerant
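Observed:
- The decoder expects an exact key order for every map except `metadata`.

Impact:
- A different CBOR writer that emits the same keys in another order (for example, one that sorts keys per RFC 8949) may fail to decode, even though the manifest is semantically equivalent.

Sane resolution: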
- For v1 compatibility, decode maps as unordered key/value collections, require key presence and types, and reject unknown keys only where desired.
- Keep writer deterministic, but relax reader.
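
A relaxed reader along those lines might look like this. As before, it assumes a CBOR decoder that yields `(key, value)` pairs in wire order; `decode_map` and `RUNTIME_SCHEMA` are hypothetical. Duplicate keys are still rejected, since RFC 8949 does not treat a map with duplicate keys as valid.

```python
from typing import Any, Dict, List, Tuple

RUNTIME_SCHEMA = {"semantics": str, "evaluation": str, "abi": str, "capabilities": list}


def decode_map(pairs: List[Tuple[str, Any]], schema: Dict[str, type],
               allow_unknown: bool = False) -> Dict[str, Any]:
    """Order-insensitive: require every schema key, with the right type."""
    seen: Dict[str, Any] = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key {key!r}")
        seen[key] = value
    for key, expected in schema.items():
        if key not in seen:
            raise ValueError(f"missing key {key!r}")
        if not isinstance(seen[key], expected):
            raise ValueError(f"key {key!r} should be {expected.__name__}")
    unknown = sorted(set(seen) - set(schema))
    if unknown and not allow_unknown:
        raise ValueError(f"unknown keys {unknown}")
    return seen
```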
### Gap C: The “canonical CBOR” claim is stronger than the implementation
Observed:
- The writer uses a fixed, code-defined key order but does not sort keys according to the RFC 8949 deterministic (canonical) ordering rules.

Sane resolution:
- Either (a) rename the profile to “deterministic CBOR”, or (b) implement explicit canonical key ordering plus checks that integers and lengths use their shortest (preferred) encodings.
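
For option (b), RFC 8949 §4.2.1 (core deterministic encoding) sorts map keys by the bytewise lexicographic order of their encoded form. For text keys this works out to shorter keys (in UTF-8 bytes) first, then byte order. The sketch below applies that rule to the top-level manifest keys; `encode_text_key` is a hypothetical helper that handles only text keys shorter than 65536 bytes.

```python
from typing import Iterable, List


def encode_text_key(key: str) -> bytes:
    """Shortest-form CBOR text string (major type 3)."""
    raw = key.encode("utf-8")
    n = len(raw)
    if n < 24:
        head = bytes([0x60 | n])
    elif n < 256:
        head = bytes([0x78, n])
    elif n < 65536:
        head = b"\x79" + n.to_bytes(2, "big")
    else:
        raise ValueError("key too long for this sketch")
    return head + raw


def deterministic_key_order(keys: Iterable[str]) -> List[str]:
    """RFC 8949 core deterministic order: sort by the encoded key bytes."""
    return sorted(keys, key=encode_text_key)
```

Because this order differs from the current writer's order, option (b) would change the manifest bytes, and with them the manifest section digest.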
### Gap D: Extra section preservation
Observed:
- Decoder tolerates unknown non-critical sections, but `Bundle` model/encoder drops them on re-encode.

Sane resolution:
- Add `bundleExtraSections :: [SectionEntry+Bytes]` if round-trip preservation is desired.
### Gap E: Section version not enforced
Observed:
- Section entry `Version` is parsed but unused.

Sane resolution:
- Enforce known version matrix (e.g., manifest v1, nodes v1), or explicitly document “advisory only”.
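
Enforcing a version matrix is a small lookup. The table below is hypothetical and encodes only the “manifest v1, nodes v1” example; unknown section types are left to the critical-flag rule from 2.2.

```python
from typing import Dict, FrozenSet

SECTION_VERSIONS: Dict[int, FrozenSet[int]] = {
    1: frozenset({1}),  # manifest
    2: frozenset({1}),  # nodes
}


def check_section_version(stype: int, version: int) -> None:
    """Reject a known section type whose version is not in the matrix."""
    allowed = SECTION_VERSIONS.get(stype)
    if allowed is not None and version not in allowed:
        raise ValueError(f"section type {stype}: unsupported version {version}")
```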
### Gap F: Runtime capability policy is hard fail
Observed:
- Any non-empty capabilities list is rejected.

Sane resolution:
- Keep the strict rule for now, but define a capability negotiation strategy for v1.1+ (unknown capabilities => reject unless explicitly allowed by host policy).
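
One way to express that policy: an empty host allowlist reproduces today's hard failure, and a host opts in by listing capabilities explicitly. `check_capabilities` is hypothetical, and so is the capability name used in the test; this document defines no capability names.

```python
from typing import AbstractSet, Iterable


def check_capabilities(requested: Iterable[str],
                       host_allowed: AbstractSet[str] = frozenset()) -> None:
    """Reject any requested capability the host has not explicitly allowed."""
    denied = [c for c in requested if c not in host_allowed]
    if denied:
        raise ValueError(f"capabilities not allowed by host policy: {denied}")
```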
### Gap G: Error handling style in import/export path
Observed:
- Several paths throw `error` for malformed data/store misses.

Sane resolution:
- Return `Either`-style typed errors through public API (`decode`, `verify`, `import`), reserve exceptions for truly internal faults.
---
## 9. Conformance checklist (v1 current)
A conforming v1 reader/writer for this profile should:
- Implement the 32-byte header and 60-byte section records exactly.
- Support required sections 1 and 2.
- Verify section digests with SHA-256.
- Decode/encode manifest CBOR matching the field model above.
- Parse nodes section and validate node payload structure.
- Recompute and verify node hashes.
- Enforce complete closure for roots.
- Enforce manifest/runtime constants used by v1.
---
## 10. Suggested follow-up docs
To stabilize interoperability, add:
1. `docs/arboricx-bundle-test-vectors.md` (golden header/manifest/nodes + expected hashes).
2. `docs/arboricx-bundle-errors.md` (normative error codes/strings).
3. `docs/arboricx-bundle-evolution.md` (rules for minor/major upgrades, capability negotiation, extra sections).