Drop CBOR for simple custom manifest
@@ -1,339 +0,0 @@
# Arboricx Portable Bundle v1 (CBOR Manifest Profile)

Status: **Draft, implementation-aligned** (derived from `src/Wire.hs` as of 2026-05-07)

This document specifies the **actual on-wire format and validation behavior** currently implemented by `tricu` for Arboricx bundles, with a focus on the newer CBOR manifest path.

---

## 1. Scope

This profile defines:

1. The binary container envelope (header + section directory + section payloads).
2. The CBOR manifest section format.
3. The Merkle node section format.
4. Decode/verify/import behavior in `Wire.hs`.
5. Known gaps and sane resolutions.

Non-goals:

- tricu source parsing/lambda elimination/module semantics.
- Signature systems / trust policy.
- Compression codecs beyond `none`.

---

## 2. Container format

A bundle is a byte stream:

```
[32-byte header]
[section directory: section_count * 60 bytes]
[section payload bytes...]
```

### 2.1 Header (32 bytes)

| Field | Size | Encoding | Value / Notes |
|---|---:|---|---|
| Magic | 8 | raw bytes | `41 52 42 4f 52 49 43 58` (`"ARBORICX"`) |
| Major | 2 | u16 BE | Must be `1` |
| Minor | 2 | u16 BE | Currently `0` |
| SectionCount | 4 | u32 BE | Number of section directory entries |
| Flags | 8 | u64 BE | Currently emitted as `0`; not interpreted |
| DirectoryOffset | 8 | u64 BE | Offset of section directory (currently `32`) |

Reader behavior:

- Reject if total bytes < 32.
- Reject bad magic.
- Reject major != 1.

### 2.2 Section directory entry (60 bytes each)

| Field | Size | Encoding | Notes |
|---|---:|---|---|
| Type | 4 | u32 BE | e.g. 1=manifest, 2=nodes |
| Version | 2 | u16 BE | Currently emitted as `1`; not enforced on read |
| Flags | 2 | u16 BE | bit0 = critical |
| Compression | 2 | u16 BE | `0` = none (required) |
| DigestAlgorithm | 2 | u16 BE | `1` = SHA-256 (required) |
| Offset | 8 | u64 BE | Absolute byte offset |
| Length | 8 | u64 BE | Section payload length |
| Digest | 32 | raw bytes | SHA-256 of section bytes |

Reader behavior:

- Reject unknown **critical** section types.
- Reject compression != 0.
- Reject digest algorithm != 1.
- Reject out-of-bounds sections.
- Reject digest mismatch.

### 2.3 Required section types

| Type | Name | Required |
|---:|---|---|
| 1 | manifest | yes |
| 2 | nodes | yes |

Decode currently rejects a bundle that contains more than one section of type 1 or type 2.

---

## 3. Manifest section (CBOR)

Manifest bytes are CBOR-encoded map data (using `cborg`).

### 3.1 Top-level manifest schema

The top-level map has **exactly 8 keys**, and the current decoder expects them in this order:

1. `schema` (text)
2. `bundleType` (text)
3. `tree` (map)
4. `runtime` (map)
5. `closure` (text: `"complete"|"partial"`)
6. `roots` (array)
7. `exports` (array)
8. `metadata` (map)

> Important: Current decoder is order-strict; it expects keys in this sequence.

### 3.2 Nested structures

#### `tree` map (3 keys, order-strict)

- `calculus`: text
- `nodeHash`: map
- `nodePayload`: text

`nodeHash` map (2 keys, order-strict):

- `algorithm`: text
- `domain`: text

#### `runtime` map (4 keys, order-strict)

- `semantics`: text
- `evaluation`: text
- `abi`: text
- `capabilities`: array(text)

#### `roots` array of maps

Each root map has 2 keys (order-strict):

- `hash`: bytes (raw 32-byte hash payload encoded as CBOR byte string)
- `role`: text

#### `exports` array of maps

Each export map has 4 keys (order-strict):

- `name`: text
- `root`: bytes (32-byte hash)
- `kind`: text
- `abi`: text

#### `metadata` map

Flexible key set; decoded as map(text -> text), then projected into optional fields:

- `package`
- `version`
- `description`
- `license`
- `createdBy`

Unknown metadata keys are ignored.

### 3.3 Default emitted manifest values

Writers in `Wire.hs` currently emit:

- `schema = "arboricx.bundle.manifest.v1"`
- `bundleType = "tree-calculus-executable-object"`
- `tree.calculus = "tree-calculus.v1"`
- `tree.nodeHash.algorithm = "sha256"`
- `tree.nodeHash.domain = "arboricx.merkle.node.v1"`
- `tree.nodePayload = "arboricx.merkle.payload.v1"`
- `runtime.semantics = "tree-calculus.v1"`
- `runtime.evaluation = "normal-order"`
- `runtime.abi = "arboricx.abi.tree.v1"`
- `runtime.capabilities = []`
- `closure = "complete"`
- `metadata.createdBy = "arboricx"`

---

## 4. Nodes section (binary)

Node section payload layout:

```
node_count: u64 BE
repeat node_count times:
  hash: 32 bytes
  payload_len: u32 BE
  payload: payload_len bytes
```

Node payload grammar:

- `0x00` => Leaf
- `0x01 || child_hash(32)` => Stem
- `0x02 || left_hash(32) || right_hash(32)` => Fork

Section decoder rejects:

- duplicate node hashes,
- truncated entries,
- payload overruns,
- trailing bytes after final node.

---

## 5. Verification behavior (`verifyBundle`)

`verifyBundle` enforces all of:

1. bundle version >= 1.
2. bundle has at least one node.
3. manifest constants match hardcoded v1 values (schema/type/calculus/hash algo/domain/payload/runtime semantics/ABI).
4. runtime capabilities must be empty.
5. closure must be `complete`.
6. manifest has at least one root and one export.
7. root sets in `bundleRoots` and `manifest.roots` must match exactly.
8. each root and export root exists in node map.
9. each node payload deserializes and re-hashes to declared node hash.
10. all referenced child hashes exist.
11. full closure reachability from roots succeeds.

`importBundle` runs decode + verify before storing nodes.

---

## 6. Export/import semantics

### 6.1 Export

`exportNamedBundle`:

- Traverses reachable nodes for each requested root hash.
- Builds node map.
- Builds default manifest and CBOR bytes.
- Emits two sections (manifest, nodes).

`exportBundle` auto-names exports:

- 1 root => `root`
- N>1 => `root0`, `root1`, ...

### 6.2 Import

`importBundle`:

1. Decode bundle.
2. Verify bundle.
3. Insert all node payloads into content store.
4. For each manifest export: reconstruct tree by export root and store name binding in DB.
5. Return bundle root list.

---

## 7. Determinism properties

The current implementation is deterministic for identical logical input because:

- Node map serialized in ascending hash order (`Map.toAscList`).
- Field order in manifest encoding is fixed by code.
- Section ordering is fixed: manifest then nodes.

Repeated exports of the same roots therefore produce byte-identical bundles.

---

## 8. Known gaps and sane resolutions

These are important design gaps visible from current code.

### Gap A: Node hash domain mismatch risk (critical)

Status: **resolved in current codebase**.

What was wrong:

- Manifest declared `tree.nodeHash.domain = "arboricx.merkle.node.v1"`.
- Hashing implementation previously used `"tricu.merkle.node.v1"`.

Current state:

- Haskell hashing now uses `"arboricx.merkle.node.v1"`.
- JS reference runtime hashing now uses `"arboricx.merkle.node.v1"`.
- JS manifest validation now requires `"arboricx.merkle.node.v1"`.

Remaining recommendation:

- Keep hash-domain constants centralized/shared to prevent future drift.
- Add explicit test vectors for Leaf/Stem/Fork hashes under the Arboricx domain.

### Gap B: CBOR decode is order-strict, not generic-map tolerant

Observed:

- Decoder expects exact key order for most maps.

Impact:

- Another canonical CBOR writer that reorders keys may fail to decode even when its output is semantically equivalent.

Sane resolution:

- For v1 compatibility, decode maps as unordered key/value collections, require key presence and types, and reject unknown keys only where desired.
- Keep writer deterministic, but relax reader.

### Gap C: “Canonical CBOR” claim is stronger than implementation

Observed:

- Writer uses fixed order but does not explicitly sort keys per RFC 8949 canonical ordering rules.

Sane resolution:

- Either (a) rename as “deterministic CBOR” profile, or (b) implement explicit canonical key ordering and canonical-length/minimal integer forms checks.

### Gap D: Extra section preservation

Observed:

- Decoder tolerates unknown non-critical sections, but `Bundle` model/encoder drops them on re-encode.

Sane resolution:

- Add a `bundleExtraSections` field holding each extra section's directory entry together with its raw bytes (for example `[(SectionEntry, ByteString)]`) if round-trip preservation is desired.

### Gap E: Section version not enforced

Observed:

- Section entry `Version` is parsed but unused.

Sane resolution:

- Enforce known version matrix (e.g., manifest v1, nodes v1), or explicitly document “advisory only”.

### Gap F: Runtime capability policy is hard fail

Observed:

- Any non-empty capabilities list is rejected.

Sane resolution:

- Keep strict for now, but define capability negotiation strategy for v1.1+ (unknown capabilities => reject unless explicitly allowed by host policy).

### Gap G: Error handling style in import/export path

Observed:

- Several paths throw `error` for malformed data/store misses.

Sane resolution:

- Return `Either`-style typed errors through the public API (`decode`, `verify`, `import`), and reserve exceptions for truly internal faults.

---

## 9. Conformance checklist (v1 current)

A conforming v1 reader/writer for this profile should:

- Implement the 32-byte header and 60-byte section records exactly.
- Support required sections 1 and 2.
- Verify section digests with SHA-256.
- Decode/encode manifest CBOR matching the field model above.
- Parse nodes section and validate node payload structure.
- Recompute and verify node hashes.
- Enforce complete closure for roots.
- Enforce manifest/runtime constants used by v1.

---

## 10. Suggested follow-up docs

To stabilize interoperability, add:

1. `docs/arboricx-bundle-test-vectors.md` (golden header/manifest/nodes + expected hashes).
2. `docs/arboricx-bundle-errors.md` (normative error codes/strings).
3. `docs/arboricx-bundle-evolution.md` (rules for minor/major upgrades, capability negotiation, extra sections).

419
docs/arboricx-bundle-format.md
Normal file
@@ -0,0 +1,419 @@
# Arboricx Portable Bundle Format Specification

**Version:** 0.1
**Status:** Exploratory
**Author:** A range of slopmachines guided by James Eversole
**Human Review Status:** 5-minute scan-through; this is an evolving and malleable document

The Arboricx Portable Bundle is a self-contained, content-addressed binary format for distributing Tree Calculus programs and their associated Merkle DAGs. It provides:

- A fixed binary container with header, section directory, and typed sections
- A language-neutral Merkle node layer for content-addressed tree values
- A fixed-order binary manifest for semantic metadata, exports, and optional extensions

## Table of Contents

1. [Top-Level Container Layout](#1-top-level-container-layout)
2. [Header](#2-header)
3. [Section Directory](#3-section-directory)
4. [Section: Manifest (type 1)](#4-section-manifest-type-1)
5. [Section: Nodes (type 2)](#5-section-nodes-type-2)
6. [Merkle Node Payload Format](#6-merkle-node-payload-format)
7. [Merkle Hash Computation](#7-merkle-hash-computation)
8. [Tree Calculus Reduction Semantics](#8-tree-calculus-reduction-semantics)
9. [Binary Primitives](#9-binary-primitives)
10. [Bundle Verification](#10-bundle-verification)
11. [Known Section Types](#11-known-section-types)

---

## 1. Top-Level Container Layout

An Arboricx bundle is a flat binary blob with the following layout:

```
+------------------+------------------+------------------+------------------+
| Header           | Section Directory| Manifest Section | Nodes Section    |
| (32 bytes)       | (N × 60 bytes)   | (variable)       | (variable)       |
+------------------+------------------+------------------+------------------+
```

The container uses **big-endian** byte order for all multi-byte integers.

For the two-section layout shown above, the total bundle size is `32 + (sectionCount × 60) + manifestSize + nodesSize`.

---

## 2. Header

| Offset | Size | Field | Description |
|--------|------|-------|-------------|
| 0 | 8 bytes | Magic | ASCII `"ARBORICX"` (`0x41 0x52 0x42 0x4F 0x52 0x49 0x43 0x58`) |
| 8 | 2 bytes | Major version | `u16` BE. Currently `1` |
| 10 | 2 bytes | Minor version | `u16` BE. Currently `0` |
| 12 | 4 bytes | Section count | `u32` BE. Number of entries in the section directory |
| 16 | 8 bytes | Flags | `u64` BE. Reserved; currently all zeros |
| 24 | 8 bytes | Directory offset | `u64` BE. Byte offset from the start of the bundle to the section directory |

**Constraints:**

- Major version must be `1`. Bundles with unsupported major versions are rejected.
- The directory offset must point to a valid location within the bundle.
- The directory offset is always `32` for bundles with the current layout (header immediately followed by the directory).
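
A minimal Python sketch of the header checks above, assuming the whole bundle is already loaded into a `bytes` value. `parse_header` is an illustrative name, not a function from `tricu` or the JS runtime, and checking that the entire directory fits inside the bundle is one possible reading of the "valid location" constraint.

```python
import struct

MAGIC = b"ARBORICX"
HEADER_SIZE = 32
DIR_ENTRY_SIZE = 60

def parse_header(data: bytes) -> dict:
    """Decode the 32-byte bundle header and apply the v1 constraints."""
    if len(data) < HEADER_SIZE:
        raise ValueError("bundle shorter than 32-byte header")
    # '>' = big-endian, no padding: 8s magic, H major, H minor,
    # I section count, Q flags, Q directory offset (8+2+2+4+8+8 = 32 bytes)
    magic, major, minor, count, flags, dir_off = struct.unpack_from(">8sHHIQQ", data, 0)
    if magic != MAGIC:
        raise ValueError("bad magic")
    if major != 1:
        raise ValueError(f"unsupported major version {major}")
    # Assumption: the whole directory, not just its first byte, must be in bounds
    if dir_off + count * DIR_ENTRY_SIZE > len(data):
        raise ValueError("section directory out of bounds")
    return {"major": major, "minor": minor, "section_count": count,
            "flags": flags, "directory_offset": dir_off}
```

Using `struct` keeps the field widths and byte order in one format string that mirrors the table row by row.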

---

## 3. Section Directory

The section directory is an array of `N` entries, where `N` is the section count from the header. Each entry is exactly **60 bytes**.

| Offset (within entry) | Size | Field | Description |
|----------------------|------|-------|-------------|
| 0 | 4 bytes | Type | `u32` BE. Section type identifier (see [Known Section Types](#11-known-section-types)) |
| 4 | 2 bytes | Version | `u16` BE. Section-specific version |
| 6 | 2 bytes | Flags | `u16` BE. Bit flags: bit 0 (`0x0001`) = critical section |
| 8 | 2 bytes | Compression | `u16` BE. Compression codec (currently only `0` = none) |
| 10 | 2 bytes | Digest algorithm | `u16` BE. Hash algorithm (currently only `1` = SHA-256) |
| 12 | 8 bytes | Offset | `u64` BE. Byte offset from the start of the bundle to the section data |
| 20 | 8 bytes | Length | `u64` BE. Length of the section data in bytes |
| 28 | 32 bytes | SHA-256 digest | Raw digest of the section data |

**Verification:**

- Unknown critical sections (flags & `0x0001`) are rejected.
- Compression must be `0` (none).
- Digest algorithm must be `1` (SHA-256).
- The SHA-256 digest in the directory entry must match `SHA256(section_data)`.
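
The directory rules can be sketched the same way. This assumes types 1 and 2 are the only known section types (as listed in Section 11) and adds an out-of-bounds check that this section does not spell out; `parse_directory` is a hypothetical helper.

```python
import hashlib
import struct

# Type, Version, Flags, Compression, DigestAlg, Offset, Length, Digest
ENTRY_FMT = ">IHHHHQQ32s"   # 4+2+2+2+2+8+8+32 = 60 bytes
KNOWN_TYPES = {1, 2}        # manifest, nodes

def parse_directory(data: bytes, dir_off: int, count: int) -> list:
    """Parse `count` directory entries and verify each section's digest."""
    entries = []
    for i in range(count):
        (typ, version, flags, compression, digest_alg,
         offset, length, digest) = struct.unpack_from(ENTRY_FMT, data, dir_off + i * 60)
        if typ not in KNOWN_TYPES and flags & 0x0001:
            raise ValueError(f"unknown critical section type {typ}")
        if compression != 0:
            raise ValueError("unsupported compression")
        if digest_alg != 1:
            raise ValueError("unsupported digest algorithm")
        if offset + length > len(data):  # assumption: bounds check
            raise ValueError("section out of bounds")
        body = data[offset:offset + length]
        if hashlib.sha256(body).digest() != digest:
            raise ValueError(f"digest mismatch in section type {typ}")
        entries.append({"type": typ, "version": version, "flags": flags, "data": body})
    return entries
```

Unknown non-critical sections are still digest-checked and returned, so the caller can choose to ignore them.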

---

## 4. Section: Manifest (type 1)

The manifest is a binary encoding of bundle metadata. It uses a **fixed-order core** layout followed by an optional **TLV tail** for extensibility.

### 4.1 Format

```
Manifest =
  magic                 8 bytes    "ARBMNFST"
  major                 u16 BE     Manifest major version (1)
  minor                 u16 BE     Manifest minor version (0)

  schema                string     Length-prefixed UTF-8 text
  bundleType            string     Length-prefixed UTF-8 text

  treeCalculus          string     Length-prefixed UTF-8 text
  treeHashAlgorithm     string     Length-prefixed UTF-8 text
  treeHashDomain        string     Length-prefixed UTF-8 text
  treeNodePayload       string     Length-prefixed UTF-8 text

  runtimeSemantics      string     Length-prefixed UTF-8 text
  runtimeEvaluation     string     Length-prefixed UTF-8 text
  runtimeAbi            string     Length-prefixed UTF-8 text
  capabilityCount       u32 BE     Number of capability strings
  capabilities          string[]   Array of length-prefixed UTF-8 capability strings

  closure               u8         0 = complete, 1 = partial
  rootCount             u32 BE     Number of root entries
  roots                 Root[]     Array of root entries
  exportCount           u32 BE     Number of export entries
  exports               Export[]   Array of export entries

  metadataFieldCount    u32 BE     Number of metadata TLV entries
  metadataFields        TLV[]      Metadata tag-value entries
  extensionFieldCount   u32 BE     Number of extension TLV entries
  extensionFields       TLV[]      Extension tag-value entries (skipped by parsers)
```

The manifest must end immediately after the last extension field; any trailing bytes in the section are rejected.

### 4.2 String Format

Every `string` field uses the same encoding:

```
string =
  length   u32 BE         Number of UTF-8 bytes in the string (not the number of characters)
  bytes    byte[length]   UTF-8 encoded string content
```

The length field carries the byte count, so parsers can skip strings without decoding UTF-8.
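
A sketch of reading and writing one `string` field in Python, assuming strict UTF-8 decoding; `read_string` and `write_string` are hypothetical helper names. The prefix for `"é"` is 2 rather than 1 because it counts bytes, not characters.

```python
import struct

def read_string(buf: bytes, pos: int) -> tuple:
    """Return (text, new_pos) for a u32-BE length-prefixed UTF-8 string at pos."""
    if pos + 4 > len(buf):
        raise ValueError("truncated string length")
    (n,) = struct.unpack_from(">I", buf, pos)
    start, end = pos + 4, pos + 4 + n
    if end > len(buf):
        raise ValueError("truncated string bytes")
    return buf[start:end].decode("utf-8"), end

def write_string(text: str) -> bytes:
    raw = text.encode("utf-8")
    return struct.pack(">I", len(raw)) + raw  # length counts bytes, not characters
```

For example, `write_string("é")` yields `b"\x00\x00\x00\x02\xc3\xa9"`.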

### 4.3 Root Entry

```
Root =
  hash   32 bytes   Raw SHA-256 hash of the Merkle node
  role   string     Length-prefixed UTF-8 text ("default" for the first root, "root" for others)
```

The hash is stored as **raw bytes** (not hex-encoded). It corresponds to the Merkle hash of the node.

### 4.4 Export Entry

```
Export =
  name   string     Length-prefixed UTF-8 text (export identifier)
  root   32 bytes   Raw SHA-256 hash of the Merkle node
  kind   string     Length-prefixed UTF-8 text (currently "term")
  abi    string     Length-prefixed UTF-8 text (ABI string)
```
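
Root and export entries are plain concatenations of `string` fields and raw 32-byte hashes, so a reader can be assembled from two small helpers. This Python sketch repeats a local string reader so it stands alone; all names are illustrative.

```python
import struct

def _string(buf, pos):
    # u32 BE byte length, then UTF-8 bytes (struct.error if the prefix is truncated)
    (n,) = struct.unpack_from(">I", buf, pos)
    end = pos + 4 + n
    if end > len(buf):
        raise ValueError("truncated string")
    return buf[pos + 4:end].decode("utf-8"), end

def _hash(buf, pos):
    if pos + 32 > len(buf):
        raise ValueError("truncated hash")
    return buf[pos:pos + 32], pos + 32  # raw bytes, not hex

def read_root(buf, pos):
    h, pos = _hash(buf, pos)
    role, pos = _string(buf, pos)
    return {"hash": h, "role": role}, pos

def read_export(buf, pos):
    name, pos = _string(buf, pos)
    root, pos = _hash(buf, pos)
    kind, pos = _string(buf, pos)
    abi, pos = _string(buf, pos)
    return {"name": name, "root": root, "kind": kind, "abi": abi}, pos
```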

### 4.5 TLV Entry

```
TLV =
  tag      u16 BE         Tag identifier (type)
  length   u32 BE         Number of bytes in the value
  value    byte[length]   Raw bytes
```

TLV entries support variable-length values and are skippable by parsers that do not recognize a tag: read the `u32` length and advance `2 + 4 + length` bytes from the start of the entry.
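
The skip rule makes a generic TLV list reader straightforward. A sketch, assuming the list is preceded by its `u32` count as in the manifest layout of Section 4.1; `read_tlvs` is a hypothetical helper that keeps unknown tags as raw bytes so the caller can decide whether to ignore them.

```python
import struct

def read_tlvs(buf: bytes, pos: int) -> tuple:
    """Read a u32 count followed by that many TLV entries; return (entries, new_pos)."""
    if pos + 4 > len(buf):
        raise ValueError("truncated TLV count")
    (count,) = struct.unpack_from(">I", buf, pos)
    pos += 4
    entries = []
    for _ in range(count):
        if pos + 6 > len(buf):
            raise ValueError("truncated TLV header")
        tag, length = struct.unpack_from(">HI", buf, pos)
        end = pos + 6 + length  # 2-byte tag + 4-byte length + value
        if end > len(buf):
            raise ValueError("truncated TLV value")
        entries.append((tag, buf[pos + 6:end]))
        pos = end
    return entries, pos
```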

### 4.6 Metadata Tags

| Tag | Name | Value |
|-----|------|-------|
| 1 | package | UTF-8 text: package name |
| 2 | version | UTF-8 text: version string |
| 3 | description | UTF-8 text: description |
| 4 | license | UTF-8 text: license identifier or text |
| 5 | createdBy | UTF-8 text: creator identifier |

Unknown metadata tags are ignored. Unknown extension tags are skipped by length.
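
Projecting metadata TLVs onto named fields then reduces to a table lookup. In this sketch unknown tags are dropped as described above; duplicate-tag handling is not specified here, and the code simply lets the last occurrence win (an assumption). `project_metadata` is an illustrative name.

```python
METADATA_TAGS = {1: "package", 2: "version", 3: "description", 4: "license", 5: "createdBy"}

def project_metadata(entries) -> dict:
    """Map (tag, value) metadata TLVs to named fields; unknown tags are ignored."""
    meta = {}
    for tag, value in entries:
        name = METADATA_TAGS.get(tag)
        if name is not None:
            meta[name] = value.decode("utf-8")  # assumption: last duplicate wins
    return meta
```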

### 4.7 Semantic Constraints

A valid bundle manifest must satisfy:

| Constraint | Value |
|-----------|-------|
| `schema` | `"arboricx.bundle.manifest.v1"` |
| `bundleType` | `"tree-calculus-executable-object"` |
| `treeCalculus` | `"tree-calculus.v1"` |
| `treeHashAlgorithm` | `"sha256"` |
| `treeHashDomain` | `"arboricx.merkle.node.v1"` |
| `treeNodePayload` | `"arboricx.merkle.payload.v1"` |
| `runtimeSemantics` | `"tree-calculus.v1"` |
| `runtimeAbi` | `"arboricx.abi.tree.v1"` |
| `runtimeCapabilities` | Empty array |
| `closure` | `0` (complete) |
| `rootCount` | At least 1 |
| `exportCount` | At least 1 |
| Export names | Non-empty |
| Export roots | Non-empty (32 bytes each) |
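
A sketch of these checks in Python, assuming the manifest has already been decoded into a plain `dict` keyed by the names in the table above (`runtimeCapabilities` for the capability list). The dict shape and `check_manifest` are illustrative only; `runtimeEvaluation` is left unchecked because the table does not constrain it.

```python
REQUIRED = {
    "schema": "arboricx.bundle.manifest.v1",
    "bundleType": "tree-calculus-executable-object",
    "treeCalculus": "tree-calculus.v1",
    "treeHashAlgorithm": "sha256",
    "treeHashDomain": "arboricx.merkle.node.v1",
    "treeNodePayload": "arboricx.merkle.payload.v1",
    "runtimeSemantics": "tree-calculus.v1",
    "runtimeAbi": "arboricx.abi.tree.v1",
}

def check_manifest(m: dict) -> None:
    """Raise ValueError if a decoded manifest violates the v1 constraints."""
    for field, expected in REQUIRED.items():
        if m.get(field) != expected:
            raise ValueError(f"{field}: expected {expected!r}, got {m.get(field)!r}")
    if m.get("runtimeCapabilities") != []:
        raise ValueError("runtime capabilities must be an empty list")
    if m.get("closure") != 0:
        raise ValueError("closure must be 0 (complete)")
    if len(m.get("roots", [])) < 1:
        raise ValueError("at least one root required")
    exports = m.get("exports", [])
    if len(exports) < 1:
        raise ValueError("at least one export required")
    for e in exports:
        if not e.get("name"):
            raise ValueError("export name must be non-empty")
        if len(e.get("root", b"")) != 32:
            raise ValueError("export root must be 32 bytes")
```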

---

## 5. Section: Nodes (type 2)

The nodes section contains all Merkle DAG nodes referenced by the manifest. It is a sequence of node entries preceded by a count.

```
NodesSection =
  nodeCount   u64 BE        Total number of node entries
  entries     NodeEntry[]
```

Each node entry:

```
NodeEntry =
  hash         32 bytes           Raw SHA-256 hash of this node
  payloadLen   u32 BE             Length of the payload in bytes
  payload      byte[payloadLen]   Node payload (see Section 6)
```

The node count is `u64` to support large bundles. Entries are stored in the order produced by the exporter (typically sorted by hash for determinism).
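
A Python sketch of a nodes-section reader that returns a hash-to-payload map. Rejecting duplicate hashes follows Section 10; rejecting truncated entries and trailing bytes is an assumption, since this section does not state it. `parse_nodes` is a hypothetical name, and it does not yet check that hashes match payloads.

```python
import struct

def parse_nodes(section: bytes) -> dict:
    """Return {hash: payload}; reject duplicates, truncation, and trailing bytes."""
    if len(section) < 8:
        raise ValueError("truncated node count")
    (count,) = struct.unpack_from(">Q", section, 0)
    pos, nodes = 8, {}
    for _ in range(count):
        if pos + 36 > len(section):          # 32-byte hash + u32 length
            raise ValueError("truncated node entry")
        h = section[pos:pos + 32]
        (plen,) = struct.unpack_from(">I", section, pos + 32)
        start, end = pos + 36, pos + 36 + plen
        if end > len(section):
            raise ValueError("payload overruns section")
        if h in nodes:
            raise ValueError("duplicate node hash")
        nodes[h] = section[start:end]
        pos = end
    if pos != len(section):                  # assumption: no trailing bytes
        raise ValueError("trailing bytes after final node")
    return nodes
```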

---

## 6. Merkle Node Payload Format

Each node in the Merkle DAG is one of three types. The payload is a single type-tag byte followed by zero, one, or two 32-byte child hashes:

### Leaf

```
Payload = 0x00
```

A leaf has no children. The payload is exactly 1 byte.

### Stem

```
Payload = 0x01 || child_hash (32 bytes raw)
```

A stem has exactly one child. The payload is 33 bytes.

### Fork

```
Payload = 0x02 || left_hash (32 bytes raw) || right_hash (32 bytes raw)
```

A fork has exactly two children. The payload is 65 bytes.

**Validation:**

- Leaf payloads must be exactly 1 byte (`0x00`).
- Stem payloads must be exactly 33 bytes.
- Fork payloads must be exactly 65 bytes.
- Unknown type bytes are rejected.
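
The validation rules reduce to a tag byte plus an exact length check. A sketch; `decode_payload` and its tuple return shape are arbitrary choices for illustration.

```python
def decode_payload(payload: bytes) -> tuple:
    """Return ("leaf",), ("stem", child) or ("fork", left, right) for a node payload."""
    if payload == b"\x00":
        return ("leaf",)
    if len(payload) == 33 and payload[0] == 0x01:
        return ("stem", payload[1:33])
    if len(payload) == 65 and payload[0] == 0x02:
        return ("fork", payload[1:33], payload[33:65])
    # wrong length for the tag, empty payload, or unknown type byte
    raise ValueError("invalid node payload")
```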

---

## 7. Merkle Hash Computation

Each node is identified by a SHA-256 hash of its canonical payload:

```
hash = SHA256( domain_tag || 0x00 || payload )
```

Where:

| Component | Value |
|-----------|-------|
| `domain_tag` | `"arboricx.merkle.node.v1"` as UTF-8 bytes |
| Separator | `0x00` (one zero byte) |
| `payload` | The node's canonical serialization from Section 6 |

**Examples:**

- **Leaf:** `SHA256("arboricx.merkle.node.v1" || 0x00 || 0x00)`
- **Stem:** `SHA256("arboricx.merkle.node.v1" || 0x00 || 0x01 || child_hash_bytes)`
- **Fork:** `SHA256("arboricx.merkle.node.v1" || 0x00 || 0x02 || left_hash_bytes || right_hash_bytes)`

The resulting 32-byte digest is stored as raw bytes (not hex-encoded) both in the manifest's root and export entries and in the nodes section.
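
The hash formula translates directly to Python's `hashlib`. A sketch, assuming child hashes are passed as 32 raw bytes; `node_hash`, `stem`, and `fork` are illustrative names, and no expected digest values are given because this document does not publish test vectors.

```python
import hashlib

DOMAIN = b"arboricx.merkle.node.v1"

def node_hash(payload: bytes) -> bytes:
    # SHA256(domain_tag || 0x00 || payload), returned as 32 raw bytes
    return hashlib.sha256(DOMAIN + b"\x00" + payload).digest()

LEAF = node_hash(b"\x00")

def stem(child: bytes) -> bytes:
    return node_hash(b"\x01" + child)

def fork(left: bytes, right: bytes) -> bytes:
    return node_hash(b"\x02" + left + right)
```

When a human-readable form is needed, `LEAF.hex()` gives the 64-character hexadecimal string.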

---

## 8. Tree Calculus Reduction Semantics

The bundle represents a **Tree Calculus** term as a Merkle DAG. The reduction rules are:

### Apply Rules

```
apply(Fork(Leaf, a), _)                = a
apply(Fork(Stem(a), b), c)             = apply(apply(a, c), apply(b, c))
apply(Fork(Fork(w, x), y), Leaf)       = w
apply(Fork(Fork(w, x), y), Stem(u))    = apply(x, u)
apply(Fork(Fork(w, x), y), Fork(u, v)) = apply(apply(y, u), v)
apply(Leaf, b)                         = Stem(b)
apply(Stem(a), b)                      = Fork(a, b)
```

### Internal Representation

In the reduction engine, Fork nodes use a `[right, left]` (stack) ordering:

- `Fork = [right_child, left_child]`
- `Stem = [child]`
- `Leaf = []`

This ordering supports stack-based reduction: pop two terms, apply, push results back.

### Closure

The manifest's `closure` field must be `0` (complete), meaning all nodes reachable from the export roots are present in the nodes section. No external references exist.

---

## 9. Binary Primitives

All multi-byte integers use **big-endian** byte order.

### u16 (2 bytes)

```
byte[0] | byte[1]
value = (byte[0] << 8) | byte[1]
```

### u32 (4 bytes)

```
byte[0] | byte[1] | byte[2] | byte[3]
value = (byte[0] << 24) | (byte[1] << 16) | (byte[2] << 8) | byte[3]
```

### u64 (8 bytes)

```
byte[0] ... byte[7]
value = (byte[0] << 56) | ... | byte[7]
```

### u8 (1 byte)

A single byte, value `0-255`.
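
The shift-and-or formulas above generalize to any width. A sketch of one reader for all four sizes; `be_uint` is a hypothetical helper, and in practice Python's built-in `int.from_bytes(chunk, "big")` or the `struct` module does the same job.

```python
def be_uint(data: bytes, offset: int, size: int) -> int:
    """Read an unsigned big-endian integer of `size` bytes (1, 2, 4 or 8)."""
    chunk = data[offset:offset + size]
    if len(chunk) != size:
        raise ValueError("truncated integer")
    value = 0
    for byte in chunk:
        # after the loop: (b[0] << 8*(size-1)) | ... | b[size-1]
        value = (value << 8) | byte
    return value
```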

---

## 10. Bundle Verification

A complete bundle verification proceeds in this order:

1. **Magic check:** First 8 bytes must be `"ARBORICX"`.
2. **Version check:** Major version must be `1`.
3. **Section directory:** Parse all entries; reject unknown critical sections.
4. **Digest verification:** For each section, compute `SHA256(section_data)` and compare with the digest in the directory entry.
5. **Manifest parsing:** Decode the fixed-order manifest; validate semantic constraints.
6. **Node section:** Parse all node entries; reject duplicates.
7. **Root verification:** All root hashes from the manifest must exist in the node map.
8. **Export verification:** All export root hashes must exist in the node map.
9. **Node hash verification:** For each node, compute `SHA256(domain || 0x00 || payload)` and compare with the stored hash.
10. **Children verification:** For each Stem or Fork node, every referenced child hash (one for a Stem, two for a Fork) must exist in the node map.
11. **Closure verification:** Starting from each root hash, traverse the DAG and confirm all reachable nodes are present.
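
Steps 7 to 11 can be sketched over a `{hash: payload}` map such as the one a nodes-section parser would return. This Python version assumes `roots` and `export_roots` are lists of raw 32-byte hashes; `verify_nodes` is hypothetical, and it walks the DAG with an explicit stack rather than recursion so deep DAGs do not hit Python's recursion limit.

```python
import hashlib

DOMAIN = b"arboricx.merkle.node.v1"

def children(payload: bytes) -> list:
    if payload == b"\x00":
        return []
    if len(payload) == 33 and payload[0] == 1:
        return [payload[1:]]
    if len(payload) == 65 and payload[0] == 2:
        return [payload[1:33], payload[33:]]
    raise ValueError("invalid node payload")

def verify_nodes(nodes: dict, roots: list, export_roots: list) -> None:
    # Steps 7-8: every root and export root is present
    for h in list(roots) + list(export_roots):
        if h not in nodes:
            raise ValueError(f"missing root {h.hex()}")
    # Steps 9-10: hashes recompute, children are present
    for h, payload in nodes.items():
        if hashlib.sha256(DOMAIN + b"\x00" + payload).digest() != h:
            raise ValueError(f"hash mismatch for {h.hex()}")
        for c in children(payload):
            if c not in nodes:
                raise ValueError(f"missing child {c.hex()}")
    # Step 11: iterative DFS from the roots
    seen, stack = set(), list(roots)
    while stack:
        h = stack.pop()
        if h in seen:
            continue
        if h not in nodes:
            raise ValueError(f"reachable node {h.hex()} missing")
        seen.add(h)
        stack.extend(children(nodes[h]))
```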

---

## 11. Known Section Types

| Type | Name | Required | Version | Description |
|------|------|----------|---------|-------------|
| 1 | Manifest | Yes | 1 | Bundle metadata in fixed-order binary format |
| 2 | Nodes | Yes | 1 | Merkle DAG node entries |

Unknown section types are permitted if not marked as critical (flags bit 0 is not set).

---

## Appendix A: Complete Example Layout (id.arboricx)

A minimal `id.arboricx` bundle has:

```
+---------------------------------------------------+
| Header (32 bytes)                                 |
|   Magic: "ARBORICX"                               |
|   Major: 1, Minor: 0                              |
|   Section count: 2                                |
|   Flags: 0                                        |
|   Dir offset: 32                                  |
+---------------------------------------------------+
| Section Directory (120 bytes = 2 × 60)            |
|   Entry 0: type=1 (manifest), offset=152, len=375 |
|   Entry 1: type=2 (nodes), offset=527, len=284    |
+---------------------------------------------------+
| Manifest Section (375 bytes)                      |
|   Magic: "ARBMNFST"                               |
|   Version: 1.0                                    |
|   Core strings (schema, bundleType, tree spec,    |
|     runtime spec, capabilities, closure, roots,   |
|     exports, metadata TLVs, extension fields)     |
+---------------------------------------------------+
| Nodes Section (284 bytes)                         |
|   Node count: 4                                   |
|   Node entries (hash + payload each):             |
|     1 Leaf, 2 Stem, 1 Fork (in hash order)        |
+---------------------------------------------------+
```

The manifest section starts at byte 152 (0x98, i.e. 32 + 120) and the nodes section at byte 527 (0x20F, i.e. 152 + 375).
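
The offsets and lengths in this example follow from the fixed sizes defined earlier; the short Python check below reproduces them. Each node entry costs 36 bytes (hash plus `u32` length) in addition to its payload, so a 284-byte nodes section decomposes as 8 + 37 (one Leaf) + 2 × 69 (two Stems) + 101 (one Fork). Because a deduplicated DAG has at most one Leaf node, this is the only possible split, i.e. four entries; the identity tree `Fork(Stem(Stem(Leaf)), Stem(Leaf))` has exactly this shape. Names are illustrative.

```python
HEADER, DIR_ENTRY = 32, 60
NODE_ENTRY_OVERHEAD = 32 + 4                  # raw hash + u32 payload length
PAYLOAD = {"leaf": 1, "stem": 33, "fork": 65}

def nodes_section_size(kinds) -> int:
    # u64 node count, then one entry per node
    return 8 + sum(NODE_ENTRY_OVERHEAD + PAYLOAD[k] for k in kinds)

manifest_offset = HEADER + 2 * DIR_ENTRY      # header + two directory entries
nodes_offset = manifest_offset + 375          # manifest length from the directory
nodes_len = nodes_section_size(["leaf", "stem", "stem", "fork"])
```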

---

## Appendix B: File Extension

Bundles produced by the `tricu` tool use the `.arboricx` file extension. The `.tri` extension is used for plain source files; the `.arboricx` extension identifies the portable binary format.