Drop CBOR for simple custom manifest

This commit is contained in:
2026-05-09 12:30:30 -05:00
parent 343ecbf4c4
commit 6dd4c3e607
13 changed files with 939 additions and 863 deletions

AGENTS.md

@@ -128,114 +128,18 @@ hash = SHA256("arboricx.merkle.node.v1" <> 0x00 <> serialized_node)
This is stored in SQLite via `ContentStore.hs`. Hash suffixes on identifiers (e.g., `foo_abc123...`) are validated: 64 hex characters (SHA256).
## 7. Arboricx Portable Wire Format
## 7. Arboricx Portable Bundles (`.arboricx`)
The **Arboricx wire format** (module `Wire.hs`) defines a portable binary bundle for exchanging Tree Calculus terms, their Merkle DAGs, and associated metadata. It is versioned and schema-driven.
Portable executable bundles are generated via `Wire.hs`. See `docs/arboricx-bundle-format.md` for the full binary format spec.
### Header
```bash
# Export a bundle from the content store
./result/bin/tricu export -o myterm.arboricx myterm
# Import a source file, then export a bundle using an explicit DB path
./result/bin/tricu import -f lib/list.tri
TRICU_DB_PATH=/tmp/tricu.db ./result/bin/tricu export -o list_ops.arboricx append
```
+------------------+-----------------+------------------+----------------+
| Magic (8 bytes)  | Major (2 bytes) | Minor (2 bytes)  | Section Count  |
|                  |                 |                  | (4 bytes)      |
+------------------+-----------------+------------------+----------------+
| Flags (8 bytes)  | Dir Offset (8 bytes)                                |
+------------------+-----------------------------------------------------+
```
- **Magic**: `ARBORICX` (`0x41 0x52 0x42 0x4f 0x52 0x49 0x43 0x58`)
- **Header length**: 32 bytes
- **Major version**: `1` | **Minor version**: `0`
### Section Directory
Immediately follows the header. Each section entry is 60 bytes:
```
+------------------+------------------+-----------------+------------------+
| Type (4 bytes) | Version (2 bytes)| Flags (2 bytes) | Compression (2) |
+------------------+------------------+-----------------+------------------+
| Digest Algo (2) | Offset (8 bytes) | Length (8 bytes)| SHA256 digest (32)|
+------------------+------------------+-----------------+------------------+
```
Known section types:
| Type | Name | Required | Description |
|------|-----------|----------|-------------|
| 1 | manifest | Yes | JSON manifest metadata |
| 2 | nodes | Yes | Binary Merkle node payloads |
### Section 1 — Manifest (JSON)
The manifest describes the bundle's semantics, exports, and schema. Key fields:
| Field | Value | Description |
|-------|-------|-------------|
| `schema` | `"arboricx.bundle.manifest.v1"` | Manifest schema version |
| `bundleType` | `"tree-calculus-executable-object"` | Bundle category |
| `tree.calculus` | `"tree-calculus.v1"` | Tree calculus version |
| `tree.nodeHash.algorithm` | `"sha256"` | Hash algorithm |
| `tree.nodeHash.domain` | `"arboricx.merkle.node.v1"` | Hash domain string |
| `tree.nodePayload` | `"arboricx.merkle.payload.v1"` | Payload encoding |
| `runtime.semantics` | `"tree-calculus.v1"` | Evaluation semantics |
| `runtime.abi` | `"arboricx.abi.tree.v1"` | Runtime ABI |
| `closure` | `"complete"` | Bundle must be a complete DAG |
| `roots` | `[{"hash": "...", "role": "..."}]` | Named root hashes |
| `exports` | `[{"name": "...", "root": "..."}]` | Export aliases for roots |
| `metadata.createdBy` | `"arboricx"` | Originator |
### Section 2 — Nodes (Binary)
```
+------------------+
| Node Count (8)   |
+------------------+-------------------+-----------------+
| Hash (32 bytes)  | Payload Len (4)   | Payload (N)     |  × Node Count
+------------------+-------------------+-----------------+
```
Each node entry contains:
- 32-byte Merkle hash (hex-encoded in identifiers, raw in binary)
- 4-byte big-endian payload length
- N bytes of serialized node payload (`0x00` for Leaf, `0x01 || hash` for Stem, `0x02 || left || right` for Fork)
### Bundle verification flow
1. Check magic bytes
2. Validate major version
3. Parse section directory
4. For each section: verify SHA256 digest against actual bytes
5. Decode JSON manifest
6. Decode binary node entries into Merkle DAG
7. Verify all root hashes present in manifest exist in node map
8. Verify export root hashes present
9. Verify children references are complete (no dangling nodes)
10. Reject unknown critical sections
### Data types (Wire.hs)
| Type | Purpose |
|------|---------|
| `Bundle` | Top-level bundle: version, roots, nodes map, manifest |
| `BundleManifest` | JSON metadata: schema, tree spec, runtime spec, roots, exports |
| `TreeSpec` | Tree calculus version + hash algorithm + payload encoding |
| `NodeHashSpec` | Hash algorithm and domain string |
| `RuntimeSpec` | Semantics, evaluation order, ABI, capabilities |
| `BundleRoot` | Root hash + role (`"default"` or `"root"`) |
| `BundleExport` | Export name + root hash + kind + ABI |
| `BundleMetadata` | Optional package, version, description, license, createdBy |
| `ClosureMode` | `ClosureComplete` or `ClosurePartial` |
### Key functions
| Function | Signature | Purpose |
|----------|-----------|---------|
| `encodeBundle` | `Bundle → ByteString` | Serialize bundle to wire bytes |
| `decodeBundle` | `ByteString → Either String Bundle` | Parse wire bytes into Bundle |
| `verifyBundle` | `Bundle → Either String ()` | Validate DAG, manifest, roots |
| `collectReachableNodes` | `Connection → MerkleHash → IO [(MerkleHash, ByteString)]` | Traverse DAG from root |
| `exportBundle` | `Connection → [MerkleHash] → IO ByteString` | Build bundle from content store |
| `exportNamedBundle` | `Connection → [(Text, MerkleHash)] → IO ByteString` | Build with named roots |
| `importBundle` | `Connection → ByteString → IO [MerkleHash]` | Import bundle into content store |
## 8. Directory Layout
@@ -273,12 +177,12 @@ tricu/
## 9. JS Arboricx Runtime
A JavaScript implementation of the Arboricx portable bundle runtime lives in `ext/js/`.
It is a reference implementation — not a tricu source parser. It reads `.tri.bundle` files produced by the Haskell toolchain, verifies Merkle node hashes, reconstructs tree values, and reduces them.
It is a reference implementation — not a tricu source parser. It reads `.arboricx` files produced by the Haskell toolchain, verifies Merkle node hashes, reconstructs tree values, and reduces them.
From project root:
```bash
node ext/js/src/cli.js inspect test/fixtures/id.tri.bundle
node ext/js/src/cli.js run test/fixtures/true.tri.bundle
node ext/js/src/cli.js inspect test/fixtures/id.arboricx
node ext/js/src/cli.js run test/fixtures/true.arboricx
```
The JS runtime implements:


@@ -1,339 +0,0 @@
# Arboricx Portable Bundle v1 (CBOR Manifest Profile)
Status: **Draft, implementation-aligned** (derived from `src/Wire.hs` as of 2026-05-07)
This document specifies the **actual on-wire format and validation behavior** currently implemented by `tricu` for Arboricx bundles, with a focus on the newer CBOR manifest path.
---
## 1. Scope
This profile defines:
1. The binary container envelope (header + section directory + section payloads).
2. The CBOR manifest section format.
3. The Merkle node section format.
4. Decode/verify/import behavior in `Wire.hs`.
5. Known gaps and sane resolutions.
Non-goals:
- tricu source parsing/lambda elimination/module semantics.
- Signature systems / trust policy.
- Compression codecs beyond `none`.
---
## 2. Container format
A bundle is a byte stream:
```
[32-byte header]
[section directory: section_count * 60 bytes]
[section payload bytes...]
```
### 2.1 Header (32 bytes)
| Field | Size | Encoding | Value / Notes |
|---|---:|---|---|
| Magic | 8 | raw bytes | `41 52 42 4f 52 49 43 58` (`"ARBORICX"`) |
| Major | 2 | u16 BE | Must be `1` |
| Minor | 2 | u16 BE | Currently `0` |
| SectionCount | 4 | u32 BE | Number of section directory entries |
| Flags | 8 | u64 BE | Currently emitted as `0`; not interpreted |
| DirectoryOffset | 8 | u64 BE | Offset of section directory (currently `32`) |
Reader behavior:
- Reject if total bytes < 32.
- Reject bad magic.
- Reject major != 1.
### 2.2 Section directory entry (60 bytes each)
| Field | Size | Encoding | Notes |
|---|---:|---|---|
| Type | 4 | u32 BE | e.g. 1=manifest, 2=nodes |
| Version | 2 | u16 BE | Currently emitted as `1`; not enforced on read |
| Flags | 2 | u16 BE | bit0 = critical |
| Compression | 2 | u16 BE | `0` = none (required) |
| DigestAlgorithm | 2 | u16 BE | `1` = SHA-256 (required) |
| Offset | 8 | u64 BE | Absolute byte offset |
| Length | 8 | u64 BE | Section payload length |
| Digest | 32 | raw bytes | SHA-256 of section bytes |
Reader behavior:
- Reject unknown **critical** section types.
- Reject compression != 0.
- Reject digest algorithm != 1.
- Reject out-of-bounds sections.
- Reject digest mismatch.
### 2.3 Required section types
| Type | Name | Required |
|---:|---|---|
| 1 | manifest | yes |
| 2 | nodes | yes |
Decode currently rejects duplicate section type 1 or 2.
---
## 3. Manifest section (CBOR)
Manifest bytes are CBOR-encoded map data (using `cborg`).
### 3.1 Top-level manifest schema
The top-level map has **exactly 8 keys**, which the current implementation decodes in this exact order:
1. `schema` (text)
2. `bundleType` (text)
3. `tree` (map)
4. `runtime` (map)
5. `closure` (text: `"complete"|"partial"`)
6. `roots` (array)
7. `exports` (array)
8. `metadata` (map)
> Important: Current decoder is order-strict; it expects keys in this sequence.
### 3.2 Nested structures
#### `tree` map (3 keys, order-strict)
- `calculus`: text
- `nodeHash`: map
- `nodePayload`: text
`nodeHash` map (2 keys, order-strict):
- `algorithm`: text
- `domain`: text
#### `runtime` map (4 keys, order-strict)
- `semantics`: text
- `evaluation`: text
- `abi`: text
- `capabilities`: array(text)
#### `roots` array of maps
Each root map has 2 keys (order-strict):
- `hash`: bytes (raw 32-byte hash payload encoded as CBOR byte string)
- `role`: text
#### `exports` array of maps
Each export map has 4 keys (order-strict):
- `name`: text
- `root`: bytes (32-byte hash)
- `kind`: text
- `abi`: text
#### `metadata` map
Flexible key set; decoded as map(text -> text), then projected into optional fields:
- `package`
- `version`
- `description`
- `license`
- `createdBy`
Unknown metadata keys are ignored.
### 3.3 Default emitted manifest values
Writers in `Wire.hs` currently emit:
- `schema = "arboricx.bundle.manifest.v1"`
- `bundleType = "tree-calculus-executable-object"`
- `tree.calculus = "tree-calculus.v1"`
- `tree.nodeHash.algorithm = "sha256"`
- `tree.nodeHash.domain = "arboricx.merkle.node.v1"`
- `tree.nodePayload = "arboricx.merkle.payload.v1"`
- `runtime.semantics = "tree-calculus.v1"`
- `runtime.evaluation = "normal-order"`
- `runtime.abi = "arboricx.abi.tree.v1"`
- `runtime.capabilities = []`
- `closure = "complete"`
- `metadata.createdBy = "arboricx"`
---
## 4. Nodes section (binary)
Node section payload layout:
```
node_count: u64 BE
repeat node_count times:
hash: 32 bytes
payload_len: u32 BE
payload: payload_len bytes
```
Node payload grammar:
- `0x00` => Leaf
- `0x01 || child_hash(32)` => Stem
- `0x02 || left_hash(32) || right_hash(32)` => Fork
Section decoder rejects:
- duplicate node hashes,
- truncated entries,
- payload overruns,
- trailing bytes after final node.
---
## 5. Verification behavior (`verifyBundle`)
`verifyBundle` enforces all of:
1. bundle version >= 1.
2. bundle has at least one node.
3. manifest constants match hardcoded v1 values (schema/type/calculus/hash algo/domain/payload/runtime semantics/ABI).
4. runtime capabilities must be empty.
5. closure must be `complete`.
6. manifest has at least one root and one export.
7. root sets in `bundleRoots` and `manifest.roots` must match exactly.
8. each root and export root exists in node map.
9. each node payload deserializes and re-hashes to declared node hash.
10. all referenced child hashes exist.
11. full closure reachability from roots succeeds.
`importBundle` runs decode + verify before storing nodes.
---
## 6. Export/import semantics
### 6.1 Export
`exportNamedBundle`:
- Traverses reachable nodes for each requested root hash.
- Builds node map.
- Builds default manifest and CBOR bytes.
- Emits two sections (manifest, nodes).
`exportBundle` auto-names exports:
- 1 root => `root`
- N>1 => `root0`, `root1`, ...
### 6.2 Import
`importBundle`:
1. Decode bundle.
2. Verify bundle.
3. Insert all node payloads into content store.
4. For each manifest export: reconstruct tree by export root and store name binding in DB.
5. Return bundle root list.
---
## 7. Determinism properties
Current implementation is deterministic for identical logical input because:
- Node map serialized in ascending hash order (`Map.toAscList`).
- Field order in manifest encoding is fixed by code.
- Section ordering is fixed: manifest then nodes.
So repeated exports of same roots produce byte-identical bundles.
---
## 8. Known gaps and sane resolutions
These are important design gaps visible from current code.
### Gap A: Node hash domain mismatch risk (critical)
Status: **resolved in current codebase**.
What was wrong:
- Manifest declared `tree.nodeHash.domain = "arboricx.merkle.node.v1"`.
- Hashing implementation previously used `"tricu.merkle.node.v1"`.
Current state:
- Haskell hashing now uses `"arboricx.merkle.node.v1"`.
- JS reference runtime hashing now uses `"arboricx.merkle.node.v1"`.
- JS manifest validation now requires `"arboricx.merkle.node.v1"`.
Remaining recommendation:
- Keep hash-domain constants centralized/shared to prevent future drift.
- Add explicit test vectors for Leaf/Stem/Fork hashes under the Arboricx domain.
### Gap B: CBOR decode is order-strict, not generic-map tolerant
Observed:
- Decoder expects exact key order for most maps.
Impact:
- Another canonical CBOR writer that reorders keys may decode-fail even if semantically equivalent.
Sane resolution:
- For v1 compatibility, decode maps as unordered key/value collections, require key presence and types, and reject unknown keys only where desired.
- Keep writer deterministic, but relax reader.
### Gap C: “Canonical CBOR” claim is stronger than implementation
Observed:
- Writer uses fixed order but does not explicitly sort keys per RFC 8949 canonical ordering rules.
Sane resolution:
- Either (a) rename as “deterministic CBOR” profile, or (b) implement explicit canonical key ordering and canonical-length/minimal integer forms checks.
### Gap D: Extra section preservation
Observed:
- Decoder tolerates unknown non-critical sections, but `Bundle` model/encoder drops them on re-encode.
Sane resolution:
- Add `bundleExtraSections :: [SectionEntry+Bytes]` if round-trip preservation is desired.
### Gap E: Section version not enforced
Observed:
- Section entry `Version` is parsed but unused.
Sane resolution:
- Enforce known version matrix (e.g., manifest v1, nodes v1), or explicitly document “advisory only”.
### Gap F: Runtime capability policy is hard fail
Observed:
- Any non-empty capabilities list is rejected.
Sane resolution:
- Keep strict for now, but define capability negotiation strategy for v1.1+ (unknown capabilities => reject unless explicitly allowed by host policy).
### Gap G: Error handling style in import/export path
Observed:
- Several paths throw `error` for malformed data/store misses.
Sane resolution:
- Return `Either`-style typed errors through public API (`decode`, `verify`, `import`), reserve exceptions for truly internal faults.
---
## 9. Conformance checklist (v1 current)
A conforming v1 reader/writer for this profile should:
- Implement the 32-byte header and 60-byte section records exactly.
- Support required sections 1 and 2.
- Verify section digests with SHA-256.
- Decode/encode manifest CBOR matching the field model above.
- Parse nodes section and validate node payload structure.
- Recompute and verify node hashes.
- Enforce complete closure for roots.
- Enforce manifest/runtime constants used by v1.
---
## 10. Suggested follow-up docs
To stabilize interoperability, add:
1. `docs/arboricx-bundle-test-vectors.md` (golden header/manifest/nodes + expected hashes).
2. `docs/arboricx-bundle-errors.md` (normative error codes/strings).
3. `docs/arboricx-bundle-evolution.md` (rules for minor/major upgrades, capability negotiation, extra sections).


@@ -0,0 +1,419 @@
# Arboricx Portable Bundle Format Specification
**Version:** 0.1
**Status:** Exploratory
**Author:** A range of slopmachines guided by James Eversole
**Human Review Status:** 5-minute scan-through; this is an evolving and malleable document
The Arboricx Portable Bundle is a self-contained, content-addressed binary format for distributing Tree Calculus programs and their associated Merkle DAGs. It provides:
- A fixed binary container with header, section directory, and typed sections
- A language-neutral Merkle node layer for content-addressed tree values
- A fixed-order binary manifest for semantic metadata, exports, and optional extensions
## Table of Contents
1. [Top-Level Container Layout](#1-top-level-container-layout)
2. [Header](#2-header)
3. [Section Directory](#3-section-directory)
4. [Section: Manifest (type 1)](#4-section-manifest-type-1)
5. [Section: Nodes (type 2)](#5-section-nodes-type-2)
6. [Merkle Node Payload Format](#6-merkle-node-payload-format)
7. [Merkle Hash Computation](#7-merkle-hash-computation)
8. [Tree Calculus Reduction Semantics](#8-tree-calculus-reduction-semantics)
9. [Binary Primitives](#9-binary-primitives)
10. [Bundle Verification](#10-bundle-verification)
11. [Known Section Types](#11-known-section-types)
---
## 1. Top-Level Container Layout
An Arboricx bundle is a flat binary blob with the following layout:
```
+------------------+------------------+------------------+------------------+
| Header | Section Directory| Manifest Section | Nodes Section |
| (32 bytes) | (N × 60 bytes) | (variable) | (variable) |
+------------------+------------------+------------------+------------------+
```
The container uses **big-endian** byte order for all multi-byte integers.
Total bundle size = 32 + (sectionCount × 60) + manifestSize + nodesSize
---
## 2. Header
| Offset | Size | Field | Description |
|--------|------|-------|-------------|
| 0 | 8 bytes | Magic | ASCII `"ARBORICX"` (`0x41 0x52 0x42 0x4F 0x52 0x49 0x43 0x58`) |
| 8 | 2 bytes | Major version | `u16` BE. Currently `1` |
| 10 | 2 bytes | Minor version | `u16` BE. Currently `0` |
| 12 | 4 bytes | Section count | `u32` BE. Number of entries in the section directory |
| 16 | 8 bytes | Flags | `u64` BE. Reserved; currently all zeros |
| 24 | 8 bytes | Directory offset | `u64` BE. Byte offset from the start of the bundle to the section directory |
**Constraints:**
- Major version must be `1`. Bundles with unsupported major versions are rejected.
- The directory offset must point to a valid location within the bundle.
- The directory offset is always `32` for bundles with the current layout (header immediately followed by the directory).
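A minimal sketch of a header parser under the layout above, using Node's `Buffer` readers; the function and field names are illustrative, not the runtime's actual API:

```javascript
// Parse the 32-byte Arboricx header. Offsets follow the header table.
const MAGIC = Buffer.from("ARBORICX", "ascii");

function parseHeader(buf) {
  if (buf.length < 32) throw new Error("bundle too short for header");
  if (!buf.subarray(0, 8).equals(MAGIC)) throw new Error("bad magic");
  const major = buf.readUInt16BE(8);
  if (major !== 1) throw new Error(`unsupported major version ${major}`);
  return {
    major,
    minor: buf.readUInt16BE(10),
    sectionCount: buf.readUInt32BE(12),
    flags: buf.readBigUInt64BE(16),     // u64: read as BigInt
    dirOffset: buf.readBigUInt64BE(24), // currently always 32n
  };
}
```

The `u64` fields are read as BigInt because they can exceed JavaScript's safe integer range.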
---
## 3. Section Directory
The section directory is an array of `N` entries, where `N` is the section count from the header. Each entry is exactly **60 bytes**.
| Offset (within entry) | Size | Field | Description |
|----------------------|------|-------|-------------|
| 0 | 4 bytes | Type | `u32` BE. Section type identifier (see [Known Section Types](#11-known-section-types)) |
| 4 | 2 bytes | Version | `u16` BE. Section-specific version |
| 6 | 2 bytes | Flags | `u16` BE. Bit flags: bit 0 (`0x0001`) = critical section |
| 8 | 2 bytes | Compression | `u16` BE. Compression codec (currently only `0` = none) |
| 10 | 2 bytes | Digest algorithm | `u16` BE. Hash algorithm (currently only `1` = SHA-256) |
| 12 | 8 bytes | Offset | `u64` BE. Byte offset from the start of the bundle to the section data |
| 20 | 8 bytes | Length | `u64` BE. Length of the section data in bytes |
| 28 | 32 bytes | SHA-256 digest | Raw digest of the section data |
**Verification:**
- Unknown critical sections (flags & `0x0001`) are rejected.
- Compression must be `0` (none).
- Digest algorithm must be `1` (SHA-256).
- The SHA-256 digest in the directory entry must match `SHA256(section_data)`.
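The entry layout and checks above can be sketched as follows; this is an illustrative parser, not the runtime's code, and the names are assumptions:

```javascript
import { createHash } from "node:crypto";

// Parse one 60-byte section directory entry and verify its payload digest.
function parseSectionEntry(buf, at) {
  const e = {
    type: buf.readUInt32BE(at),
    version: buf.readUInt16BE(at + 4),
    flags: buf.readUInt16BE(at + 6),
    compression: buf.readUInt16BE(at + 8),
    digestAlgorithm: buf.readUInt16BE(at + 10),
    offset: Number(buf.readBigUInt64BE(at + 12)),
    length: Number(buf.readBigUInt64BE(at + 20)),
    digest: buf.subarray(at + 28, at + 60),
  };
  if (e.compression !== 0) throw new Error("unsupported compression");
  if (e.digestAlgorithm !== 1) throw new Error("unsupported digest algorithm");
  if (e.offset + e.length > buf.length) throw new Error("section out of bounds");
  const data = buf.subarray(e.offset, e.offset + e.length);
  if (!createHash("sha256").update(data).digest().equals(e.digest)) {
    throw new Error("section digest mismatch");
  }
  return e;
}
```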
---
## 4. Section: Manifest (type 1)
The manifest is a binary encoding of bundle metadata. It uses a **fixed-order core** layout followed by an optional **TLV tail** for extensibility.
### 4.1 Format
```
Manifest =
magic 8 bytes "ARBMNFST"
major u16 BE Manifest major version (1)
minor u16 BE Manifest minor version (0)
schema string Length-prefixed UTF-8 text
bundleType string Length-prefixed UTF-8 text
treeCalculus string Length-prefixed UTF-8 text
treeHashAlgorithm string Length-prefixed UTF-8 text
treeHashDomain string Length-prefixed UTF-8 text
treeNodePayload string Length-prefixed UTF-8 text
runtimeSemantics string Length-prefixed UTF-8 text
runtimeEvaluation string Length-prefixed UTF-8 text
runtimeAbi string Length-prefixed UTF-8 text
capabilityCount u32 BE Number of capability strings
capabilities string[] Array of length-prefixed UTF-8 capability strings
closure u8 0 = complete, 1 = partial
rootCount u32 BE Number of root entries
roots Root[] Array of root entries
exportCount u32 BE Number of export entries
exports Export[] Array of export entries
metadataFieldCount u32 BE Number of metadata TLV entries
metadataFields TLV[] Metadata tag-value entries
extensionFieldCount u32 BE Number of extension TLV entries
extensionFields TLV[] Extension tag-value entries (skipped by parsers)
```
**The manifest must consume its section exactly:** no trailing bytes may remain after the final field.
### 4.2 String Format
Every `string` field uses the same encoding:
```
string =
length u32 BE Number of UTF-8 bytes in the string (not the number of characters)
bytes byte[length] UTF-8 encoded string content
```
The length field carries the byte count, so parsers can skip strings without decoding UTF-8.
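A sketch of a string reader under this encoding (illustrative names; returns the decoded text plus the offset just past it):

```javascript
// Read one length-prefixed UTF-8 string at `offset`.
function readString(buf, offset) {
  const len = buf.readUInt32BE(offset); // byte count, not character count
  const end = offset + 4 + len;
  if (end > buf.length) throw new Error("truncated string");
  return { value: buf.toString("utf-8", offset + 4, end), next: end };
}
```

A parser that only needs to skip a string can advance by `4 + len` without decoding the UTF-8 at all.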
### 4.3 Root Entry
```
Root =
hash 32 bytes Raw SHA-256 hash of the Merkle node
role string Length-prefixed UTF-8 text ("default" for the first root, "root" for others)
```
The hash is stored as **raw bytes** (not hex-encoded). It corresponds to the Merkle hash of the node.
### 4.4 Export Entry
```
Export =
name string Length-prefixed UTF-8 text (export identifier)
root 32 bytes Raw SHA-256 hash of the Merkle node
kind string Length-prefixed UTF-8 text (currently "term")
abi string Length-prefixed UTF-8 text (ABI string)
```
### 4.5 TLV Entry
```
TLV =
tag u16 BE Tag identifier (type)
length u32 BE Number of bytes in the value
value byte[length] Raw bytes
```
TLV entries support variable-length values and are skippable by parsers that do not recognize a tag: read the `u32` length and advance by `2 + 4 + length` bytes.
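The skip-by-length behavior can be sketched like this; `knownTags` and the return shape are illustrative assumptions:

```javascript
// Walk `count` TLV entries starting at `offset`, collecting known tags
// and skipping unknown ones by their declared length.
function readTlvEntries(buf, offset, count, knownTags) {
  const entries = [];
  for (let i = 0; i < count; i++) {
    const tag = buf.readUInt16BE(offset);
    const len = buf.readUInt32BE(offset + 2);
    const value = buf.subarray(offset + 6, offset + 6 + len);
    if (knownTags.has(tag)) entries.push({ tag, value });
    offset += 6 + len; // 2 (tag) + 4 (length) + value bytes
  }
  return { entries, next: offset };
}
```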
### 4.6 Metadata Tags
| Tag | Name | Value |
|-----|------|-------|
| 1 | package | UTF-8 text: package name |
| 2 | version | UTF-8 text: version string |
| 3 | description | UTF-8 text: description |
| 4 | license | UTF-8 text: license identifier or text |
| 5 | createdBy | UTF-8 text: creator identifier |
Unknown metadata tags are ignored. Unknown extension tags are skipped by length.
### 4.7 Semantic Constraints
A valid bundle manifest must satisfy:
| Constraint | Value |
|-----------|-------|
| `schema` | `"arboricx.bundle.manifest.v1"` |
| `bundleType` | `"tree-calculus-executable-object"` |
| `treeCalculus` | `"tree-calculus.v1"` |
| `treeHashAlgorithm` | `"sha256"` |
| `treeHashDomain` | `"arboricx.merkle.node.v1"` |
| `treeNodePayload` | `"arboricx.merkle.payload.v1"` |
| `runtimeSemantics` | `"tree-calculus.v1"` |
| `runtimeAbi` | `"arboricx.abi.tree.v1"` |
| `runtimeCapabilities` | Empty array |
| `closure` | `0` (complete) |
| `rootCount` | At least 1 |
| `exportCount` | At least 1 |
| Export names | Non-empty |
| Export roots | Non-empty (32 bytes each) |
---
## 5. Section: Nodes (type 2)
The nodes section contains all Merkle DAG nodes referenced by the manifest. It is a sequence of node entries preceded by a count.
```
NodesSection =
nodeCount u64 BE Total number of node entries
entries NodeEntry[]
```
Each node entry:
```
NodeEntry =
hash 32 bytes Raw SHA-256 hash of this node
payloadLen u32 BE Length of the payload in bytes
payload byte[payloadLen] Node payload (see Section 6)
```
The node count is `u64` to support large bundles. Entries are stored in the order produced by the exporter (typically sorted by hash for determinism).
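A decoding sketch for this section, mirroring the duplicate/truncation/trailing-byte checks the Haskell decoder performs (function name and return type are illustrative):

```javascript
// Decode the nodes section into a Map of hex hash -> payload bytes.
function decodeNodesSection(buf) {
  const count = buf.readBigUInt64BE(0);
  const nodes = new Map();
  let offset = 8;
  for (let i = 0n; i < count; i++) {
    const hash = buf.subarray(offset, offset + 32).toString("hex");
    const payloadLen = buf.readUInt32BE(offset + 32);
    const payload = buf.subarray(offset + 36, offset + 36 + payloadLen);
    if (payload.length !== payloadLen) throw new Error("truncated node entry");
    if (nodes.has(hash)) throw new Error(`duplicate node hash ${hash}`);
    nodes.set(hash, payload);
    offset += 36 + payloadLen;
  }
  if (offset !== buf.length) throw new Error("trailing bytes after final node");
  return nodes;
}
```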
---
## 6. Merkle Node Payload Format
Each node in the Merkle DAG is one of three types. The payload is a single byte type tag followed by hash references:
### Leaf
```
Payload = 0x00
```
A leaf has no children. The payload is exactly 1 byte.
### Stem
```
Payload = 0x01 || child_hash (32 bytes raw)
```
A stem has exactly one child. The payload is 33 bytes.
### Fork
```
Payload = 0x02 || left_hash (32 bytes raw) || right_hash (32 bytes raw)
```
A fork has exactly two children. The payload is 65 bytes.
**Validation:**
- Leaf payloads must be exactly 1 byte (`0x00`).
- Stem payloads must be exactly 33 bytes.
- Fork payloads must be exactly 65 bytes.
- Unknown type bytes are rejected.
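A payload decoder enforcing these exact lengths might look like the following sketch (the `{type, children}` shape is an illustrative choice):

```javascript
// Decode a node payload into its type and child hashes (hex strings).
function decodeNodePayload(payload) {
  switch (payload[0]) {
    case 0x00:
      if (payload.length !== 1) throw new Error("bad leaf payload");
      return { type: "leaf", children: [] };
    case 0x01:
      if (payload.length !== 33) throw new Error("bad stem payload");
      return { type: "stem", children: [payload.subarray(1, 33).toString("hex")] };
    case 0x02:
      if (payload.length !== 65) throw new Error("bad fork payload");
      return {
        type: "fork",
        children: [
          payload.subarray(1, 33).toString("hex"),
          payload.subarray(33, 65).toString("hex"),
        ],
      };
    default:
      throw new Error(`unknown node type byte ${payload[0]}`);
  }
}
```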
---
## 7. Merkle Hash Computation
Each node is identified by a SHA-256 hash of its canonical payload:
```
hash = SHA256( domain_tag || 0x00 || payload )
```
Where:
| Component | Value |
|-----------|-------|
| `domain_tag` | `"arboricx.merkle.node.v1"` as UTF-8 bytes |
| Separator | `0x00` (one zero byte) |
| `payload` | The node's canonical serialization from Section 6 |
**Examples:**
- **Leaf:** `SHA256("arboricx.merkle.node.v1" || 0x00 || 0x00)`
- **Stem:** `SHA256("arboricx.merkle.node.v1" || 0x00 || 0x01 || child_hash_bytes)`
- **Fork:** `SHA256("arboricx.merkle.node.v1" || 0x00 || 0x02 || left_hash_bytes || right_hash_bytes)`
The resulting SHA-256 hash is stored as a hex-encoded string in the manifest (64 hex characters). Within the nodes section, it is stored as raw bytes.
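The hash construction above can be sketched directly with `node:crypto` (the helper name is illustrative):

```javascript
import { createHash } from "node:crypto";

const DOMAIN = Buffer.from("arboricx.merkle.node.v1", "utf-8");

// Compute a node's Merkle hash: SHA256(domain || 0x00 || payload).
function merkleHash(payload) {
  return createHash("sha256")
    .update(DOMAIN)
    .update(Buffer.from([0x00])) // one-byte separator
    .update(payload)
    .digest(); // 32 raw bytes; hex-encode for identifiers
}
```

Domain separation means a node payload and, say, a manifest section with identical bytes can never collide under the same hash function.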
---
## 8. Tree Calculus Reduction Semantics
The bundle represents a **Tree Calculus** term as a Merkle DAG. The reduction rules are:
### Apply Rules
```
apply(Fork(Leaf, a), _)                = a
apply(Fork(Stem(a), b), c)             = apply(apply(a, c), apply(b, c))
apply(Fork(Fork(w, x), y), Leaf)       = w
apply(Fork(Fork(w, x), y), Stem(u))    = apply(x, u)
apply(Fork(Fork(w, x), y), Fork(u, v)) = apply(apply(y, u), v)
apply(Leaf, b)                         = Stem(b)
apply(Stem(a), b)                      = Fork(a, b)
```
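A minimal sketch of these apply rules over a plain object representation (deliberately not the engine's `[right, left]` stack ordering described below; constructors and names are illustrative):

```javascript
const Leaf = { t: "leaf" };
const Stem = (a) => ({ t: "stem", a });
const Fork = (l, r) => ({ t: "fork", l, r });

function apply(f, x) {
  if (f.t === "leaf") return Stem(x);      // Leaf applied: build a Stem
  if (f.t === "stem") return Fork(f.a, x); // Stem applied: build a Fork
  const { l, r } = f;
  if (l.t === "leaf") return r;                                 // K rule
  if (l.t === "stem") return apply(apply(l.a, x), apply(r, x)); // S rule
  // Left child is a Fork: triage on the argument's shape.
  if (x.t === "leaf") return l.l;
  if (x.t === "stem") return apply(l.r, x.a);
  return apply(apply(r, x.l), x.r);
}
```

For example, `Stem(Leaf)` behaves as K: applying it to `a` gives `Fork(Leaf, a)`, and applying that to `b` returns `a` by the first fork rule.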
### Internal Representation
In the reduction engine, Fork nodes use a `[right, left]` (stack) ordering:
- `Fork = [right_child, left_child]`
- `Stem = [child]`
- `Leaf = []`
This ordering supports stack-based reduction: pop two terms, apply, push results back.
### Closure
The bundle declares `closure = "complete"`, meaning all nodes reachable from export roots are present in the nodes section. No external references exist.
---
## 9. Binary Primitives
All multi-byte integers use **big-endian** byte order.
### u16 (2 bytes)
```
byte[0] | byte[1]
value = (byte[0] << 8) | byte[1]
```
### u32 (4 bytes)
```
byte[0] | byte[1] | byte[2] | byte[3]
value = (byte[0] << 24) | (byte[1] << 16) | (byte[2] << 8) | byte[3]
```
### u64 (8 bytes)
```
byte[0] ... byte[7]
value = (byte[0] << 56) | ... | byte[7]
```
### u8 (1 byte)
A single byte, value `0-255`.
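In a JavaScript reader these primitives map directly onto Node's built-in `Buffer` accessors; a small sketch:

```javascript
// Big-endian readers matching the primitives above.
function readU16(buf, off) { return buf.readUInt16BE(off); }
function readU32(buf, off) { return buf.readUInt32BE(off); }
// u64 is returned as BigInt: values can exceed Number.MAX_SAFE_INTEGER.
function readU64(buf, off) { return buf.readBigUInt64BE(off); }
```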
---
## 10. Bundle Verification
A complete bundle verification proceeds in this order:
1. **Magic check:** First 8 bytes must be `"ARBORICX"`.
2. **Version check:** Major version must be `1`.
3. **Section directory:** Parse all entries; reject unknown critical sections.
4. **Digest verification:** For each section, compute `SHA256(section_data)` and compare with the digest in the directory entry.
5. **Manifest parsing:** Decode the fixed-order manifest; validate semantic constraints.
6. **Node section:** Parse all node entries; reject duplicates.
7. **Root verification:** All root hashes from the manifest must exist in the node map.
8. **Export verification:** All export root hashes must exist in the node map.
9. **Node hash verification:** For each node, compute `SHA256(domain || 0x00 || payload)` and compare with the stored hash.
10. **Children verification:** For each Stem/Fork node, both child hashes must exist in the node map.
11. **Closure verification:** Starting from each root hash, traverse the DAG and confirm all reachable nodes are present.
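The closure step can be sketched as an iterative traversal over the node map; `children` stands in for a payload decoder returning child hashes per Section 6, and all names are illustrative:

```javascript
// Verify closure: every node reachable from the roots must be present.
function verifyClosure(roots, nodes, children) {
  const seen = new Set();
  const stack = [...roots];
  while (stack.length > 0) {
    const hash = stack.pop();
    if (seen.has(hash)) continue;
    const payload = nodes.get(hash);
    if (payload === undefined) throw new Error(`missing node ${hash}`);
    seen.add(hash);
    stack.push(...children(payload));
  }
  return seen; // the reachable set; no external references were found
}
```

Because nodes form a DAG (shared subtrees are deduplicated by hash), the `seen` set prevents revisiting shared children.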
---
## 11. Known Section Types
| Type | Name | Required | Version | Description |
|------|------|----------|---------|-------------|
| 1 | Manifest | Yes | 1 | Bundle metadata in fixed-order binary format |
| 2 | Nodes | Yes | 1 | Merkle DAG node entries |
Unknown section types are permitted if not marked as critical (flags bit 0 is not set).
---
## Appendix A: Complete Example Layout (id.arboricx)
A minimal `id.arboricx` bundle has:
```
+---------------------------------------------------+
| Header (32 bytes) |
| Magic: "ARBORICX" |
| Major: 1, Minor: 0 |
| Section count: 2 |
| Flags: 0 |
| Dir offset: 32 |
+---------------------------------------------------+
| Section Directory (120 bytes = 2 × 60) |
| Entry 0: type=1 (manifest), offset=152, len=375 |
| Entry 1: type=2 (nodes), offset=527, len=284 |
+---------------------------------------------------+
| Manifest Section (375 bytes) |
| Magic: "ARBMNFST" |
| Version: 1.0 |
| Core strings (schema, bundleType, tree spec, |
| runtime spec, capabilities, closure, roots, |
| exports, metadata TLVs, extension fields) |
+---------------------------------------------------+
| Nodes Section (284 bytes) |
| Node count: 2 |
| Node entry 1: hash + payload (Leaf) |
| Node entry 2: hash + payload (Fork) |
+---------------------------------------------------+
```
The manifest section starts at byte 152 (0x98) and the nodes section at byte 527 (0x20F).
---
## Appendix B: File Extension
Bundles produced by the `tricu` tool use the `.arboricx` file extension. The `.tri` extension is used for plain source files; the `.arboricx` extension identifies the portable binary format.


@@ -18,12 +18,12 @@
* Offset 8B u64 BE
* Length 8B u64 BE
* SHA256Digest 32B raw
* Manifest: canonical CBOR-encoded map (cborg output from Haskell)
* Manifest: fixed-order core + TLV tail (ARBMNFST magic)
* Nodes: binary section
*/
import { createHash } from "node:crypto";
import { decodeCbor } from "./cbor.js";
import { decodeManifest } from "./manifest.js";
// ── Constants ───────────────────────────────────────────────────────────────
@@ -173,37 +173,12 @@ export function parseBundle(buffer) {
}
/**
* Post-process a CBOR-decoded manifest to normalize hash fields
* from raw bytes to hex strings (matching the old JSON wire format).
*/
function normalizeManifest(raw) {
const tree = raw.tree;
if (tree && tree.nodeHash && tree.nodeHash.domain) {
tree.nodeHash.domain = tree.nodeHash.domain;
}
// Convert root hashes from raw bytes to hex
const roots = (raw.roots || []).map((r) => ({
...r,
hash: r.hash instanceof Uint8Array ? Buffer.from(r.hash).toString("hex") : r.hash,
}));
// Convert export root hashes from raw bytes to hex
const exports = (raw.exports || []).map((e) => ({
...e,
root: e.root instanceof Uint8Array ? Buffer.from(e.root).toString("hex") : e.root,
}));
return { ...raw, roots, exports };
}
/**
* Convenience: parse and return the manifest from CBOR.
* Convenience: parse and return the manifest from the fixed-order binary format.
*/
export function parseManifest(buffer) {
const bundle = parseBundle(buffer);
const manifestEntry = bundle.sections.get(SECTION_MANIFEST);
return normalizeManifest(decodeCbor(manifestEntry.data));
return decodeManifest(manifestEntry.data);
}
/**


@@ -1,130 +0,0 @@
/**
* cbor.js — Minimal CBOR decoder for the Arboricx manifest format.
*
* Decodes the canonical CBOR produced by the Haskell cborg library:
* - Maps: major type 5 (0xa0 + length)
* - Arrays: major type 4 (0x80 + length)
* - Text strings: major type 3, UTF-8 encoded
* - Byte strings: major type 2
* - Unsigned ints: major type 0
* - Simple values: 0xf4 = false, 0xf5 = true
*
* Only covers the subset needed for the manifest.
*/
// ── Decoding state ──────────────────────────────────────────────────────────
/**
* @param {Buffer} data
* @returns {object} stateful decoder with read helpers
*/
function makeDecoder(data) {
let offset = 0;
return {
/** @returns {number} current offset */
getPos() { return offset; },
/** @returns {number} remaining bytes */
remaining() { return data.length - offset; },
/** @returns {number} total length */
length() { return data.length; },
/** Read N bytes and advance */
read(n) {
if (offset + n > data.length) {
throw new Error(`CBOR read: expected ${n} bytes, ${data.length - offset} remaining at offset ${offset}`);
}
const slice = data.slice(offset, offset + n);
offset += n;
return slice;
},
/** Read a single byte */
readByte() {
if (offset >= data.length) {
throw new Error(`CBOR readByte: no bytes remaining at offset ${offset}`);
}
return data[offset++];
},
};
}
// ── CBOR helpers ────────────────────────────────────────────────────────────
/**
* Read a CBOR length (major type initial byte encodes length for values < 24).
* For 24+, reads additional bytes per spec.
* @returns {number}
*/
function cborReadLength(dec, startByte) {
const additional = startByte & 0x1f;
if (additional < 24) return additional;
if (additional === 24) return dec.read(1)[0];
if (additional === 25) return dec.read(2).readUint16BE(0);
if (additional === 26) return dec.read(4).readUint32BE(0);
throw new Error(`CBOR: unsupported additional info ${additional}`);
}
// ── Top-level decode ────────────────────────────────────────────────────────
/**
* Decode a single CBOR value from buffer bytes.
* @param {Buffer} buf
* @returns {*}
*/
export function decodeCbor(buf) {
const dec = makeDecoder(buf);
const result = cborDecode(dec);
return result;
}
function cborDecode(dec) {
const first = dec.readByte();
const major = (first >> 5) & 0x07;
const info = first & 0x1f;
switch (major) {
case 0: // unsigned int
case 1: // negative int
return cborReadLength(dec, first);
case 2: // byte string
return dec.read(cborReadLength(dec, first));
case 3: // text string (UTF-8)
const len = cborReadLength(dec, first);
return dec.read(len).toString("utf-8");
case 4: // array
const arrLen = cborReadLength(dec, first);
const arr = [];
for (let i = 0; i < arrLen; i++) {
arr.push(cborDecode(dec));
}
return arr;
case 5: // map
const mapLen = cborReadLength(dec, first);
const map = {};
for (let i = 0; i < mapLen; i++) {
const key = cborDecode(dec);
const val = cborDecode(dec);
map[key] = val;
}
return map;
    case 7: // simple values / floats
      if (info === 20) return false;
      if (info === 21) return true;
      if (info === 22) return null; // null
      if (info === 23) return null; // undefined (mapped to null)
      // 0xf9-0xfb are half/single/double-precision floats, not used by our writer
      throw new Error(`CBOR: unsupported simple value ${info}`);
default:
// Tags (major 6) and break (0xff) — not used in our manifest
throw new Error(`CBOR: unsupported major type ${major}, info ${info}`);
}
}


@@ -1,13 +1,220 @@
/**
* manifest.js — Minimal manifest parsing and export lookup.
* manifest.js — Fixed-order manifest parsing and export lookup.
*
* The manifest is a JSON object with fields:
* schema, bundleType, tree, runtime, closure, roots, exports,
* imports, sections, metadata
* The manifest binary format (ManifestV1):
* magic(8) + major(u16) + minor(u16)
* + schema(string) + bundleType(string)
* + treeCalculus(string) + treeHashAlgorithm(string) + treeHashDomain(string) + treeNodePayload(string)
* + runtimeSemantics(string) + runtimeEvaluation(string) + runtimeAbi(string)
* + capabilityCount(u32) + capabilities(string[])
* + closure(u8)
* + rootCount(u32) + roots[]
* + exportCount(u32) + exports[]
* + metadataFieldCount(u32) + metadataTLVs[]
* + extensionFieldCount(u32) + extensionTLVs[]
*
* We parse only what we need for runtime entrypoint selection.
* String format: u32 BE length + UTF-8 bytes.
* Root: 32 bytes raw hash + role(string).
* Export: name(string) + 32 bytes raw root hash + kind(string) + abi(string).
* TLV: u16 tag + u32 length + value bytes.
*/
// ── Constants ───────────────────────────────────────────────────────────────
const MANIFEST_MAGIC = "ARBMNFST";
const MANIFEST_MAJOR = 1;
const MANIFEST_MINOR = 0;
// Metadata TLV tags
const TAG_PACKAGE = 1;
const TAG_VERSION = 2;
const TAG_DESCRIPTION = 3;
const TAG_LICENSE = 4;
const TAG_CREATED_BY = 5;
// Closure bytes
const CLOSURE_COMPLETE = 0;
const CLOSURE_PARTIAL = 1;
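The TLV layout used for metadata and extensions (u16 BE tag + u32 BE length + raw value bytes) can be exercised in isolation. A minimal sketch, not part of manifest.js; `encodeTLV`/`decodeTLV` are hypothetical names, and the tag value mirrors `TAG_VERSION` above:

```javascript
// Sketch only: round-trip one metadata TLV entry.
const TAG_VERSION_LOCAL = 2; // mirrors TAG_VERSION above

function encodeTLV(tag, value) {
  const head = Buffer.alloc(6);
  head.writeUint16BE(tag, 0);          // u16 tag
  head.writeUint32BE(value.length, 2); // u32 value length
  return Buffer.concat([head, value]);
}

function decodeTLV(buf, off) {
  const tag = buf.readUint16BE(off);
  const len = buf.readUint32BE(off + 2);
  if (off + 6 + len > buf.length) throw new Error("TLV value extends beyond input");
  const value = buf.slice(off + 6, off + 6 + len);
  return { tag, value, next: off + 6 + len };
}

const entry = encodeTLV(TAG_VERSION_LOCAL, Buffer.from("1.0.0", "utf-8"));
const decoded = decodeTLV(entry, 0);
// decoded.tag === 2, decoded.value.toString("utf-8") === "1.0.0", decoded.next === 11
```

Because the length already travels in the TLV header, unknown tags can be skipped without understanding their payload, which is how the extension tail stays forward-compatible.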
// ── Binary helpers ──────────────────────────────────────────────────────────
function u16(buf, off) {
if (off + 2 > buf.length) throw new Error("manifest: not enough bytes for u16");
return { value: buf.readUint16BE(off), next: off + 2 };
}
function u32(buf, off) {
if (off + 4 > buf.length) throw new Error("manifest: not enough bytes for u32");
return { value: buf.readUint32BE(off), next: off + 4 };
}
function u8(buf, off) {
if (off >= buf.length) throw new Error("manifest: not enough bytes for u8");
return { value: buf.readUint8(off), next: off + 1 };
}
/**
* Read a length-prefixed UTF-8 string: u32 BE length + UTF-8 bytes.
* Returns { text, next }.
*/
function readStr(buf, off) {
const { value: len, next: afterLen } = u32(buf, off);
if (afterLen + len > buf.length) throw new Error("manifest: string extends beyond input");
return { text: buf.toString("utf-8", afterLen, afterLen + len), next: afterLen + len };
}
/**
* Read raw bytes of given length.
 * Returns { value, next }.
*/
function readRaw(buf, off, n) {
if (off + n > buf.length) throw new Error(`manifest: not enough bytes for ${n}-byte read`);
return { value: buf.slice(off, off + n), next: off + n };
}
// ── Manifest decoder ────────────────────────────────────────────────────────
/**
* Decode the manifest binary from a Buffer.
*
* Returns a normalized manifest object matching the shape expected
* by validateManifest / selectExport.
*/
export function decodeManifest(buf) {
let off = 0;
// Magic (8 bytes)
const magic = buf.toString("utf-8", 0, 8);
if (magic !== MANIFEST_MAGIC) {
throw new Error(`invalid manifest magic: expected ${MANIFEST_MAGIC}, got "${magic}"`);
}
off = 8;
// Version
const { value: major } = u16(buf, off);
if (major !== MANIFEST_MAJOR) throw new Error(`unsupported manifest major version: ${major}`);
off += 4; // u16 major + u16 minor
// Helper: read length-prefixed text
const readText = () => {
const { text, next } = readStr(buf, off);
off = next;
return text;
};
// Core strings
const schema = readText();
const bundleType = readText();
const treeCalculus = readText();
const treeHashAlgorithm = readText();
const treeHashDomain = readText();
const treeNodePayload = readText();
const runtimeSemantics = readText();
const runtimeEvaluation = readText();
const runtimeAbi = readText();
// Capabilities (u32 count + string[])
const { value: capCount } = u32(buf, off);
off += 4;
const capabilities = [];
for (let i = 0; i < capCount; i++) {
capabilities.push(readText());
}
// Closure (u8)
const { value: closureByte } = u8(buf, off);
off += 1;
const closure = closureByte === CLOSURE_COMPLETE ? "complete" : "partial";
// Roots (u32 count + Root[])
// Root: 32 bytes raw hash + role(string)
const { value: rootCount } = u32(buf, off);
off += 4;
const roots = [];
for (let i = 0; i < rootCount; i++) {
const { value: hashRaw } = readRaw(buf, off, 32);
off += 32;
const { text: role, next: rOff } = readStr(buf, off);
off = rOff;
roots.push({ hash: hashRaw.toString("hex"), role });
}
// Exports (u32 count + Export[])
// Export: name(string) + 32 bytes raw root hash + kind(string) + abi(string)
const { value: exportCount } = u32(buf, off);
off += 4;
const exports = [];
for (let i = 0; i < exportCount; i++) {
const { text: name, next: nOff } = readStr(buf, off);
off = nOff;
const { value: expHashRaw } = readRaw(buf, off, 32);
off += 32;
const { text: kind, next: kOff } = readStr(buf, off);
off = kOff;
const { text: abi, next: aOff } = readStr(buf, off);
off = aOff;
exports.push({ name, root: expHashRaw.toString("hex"), kind, abi });
}
// Metadata (u32 count + TLV[])
// TLV: u16 tag + u32 length + value bytes
const { value: metaCount } = u32(buf, off);
off += 4;
const metadata = {};
for (let i = 0; i < metaCount; i++) {
const { value: tag } = u16(buf, off);
off += 2;
const { value: tlvLen } = u32(buf, off);
off += 4;
const { value: tlvRaw } = readRaw(buf, off, tlvLen);
off += tlvLen;
const val = tlvRaw.toString("utf-8");
switch (tag) {
case TAG_PACKAGE: metadata.package = val; break;
case TAG_VERSION: metadata.version = val; break;
case TAG_DESCRIPTION: metadata.description = val; break;
case TAG_LICENSE: metadata.license = val; break;
case TAG_CREATED_BY: metadata.createdBy = val; break;
}
}
// Extensions (u32 count + TLV[] — skip all)
const { value: extCount } = u32(buf, off);
off += 4;
for (let i = 0; i < extCount; i++) {
const { value: _tag } = u16(buf, off);
off += 2;
const { value: tlvLen } = u32(buf, off);
off += 4;
off += tlvLen; // skip value
}
return {
schema,
bundleType,
tree: {
calculus: treeCalculus,
nodeHash: {
algorithm: treeHashAlgorithm,
domain: treeHashDomain,
},
nodePayload: treeNodePayload,
},
runtime: {
semantics: runtimeSemantics,
evaluation: runtimeEvaluation,
abi: runtimeAbi,
capabilities,
},
closure,
roots,
exports,
metadata: Object.keys(metadata).length > 0 ? metadata : undefined,
};
}
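For orientation, the smallest well-formed ManifestV1 buffer can be assembled by hand. This is illustrative only: the magic, version, and field order come from the format description above, `sha256` and the `arboricx.merkle.node.v1` hash domain come from the repository docs, and every other field value here is hypothetical:

```javascript
// Sketch: hand-assemble an empty-but-valid ManifestV1 byte layout.
function str(s) {
  const utf8 = Buffer.from(s, "utf-8");
  const len = Buffer.alloc(4);
  len.writeUint32BE(utf8.length, 0);
  return Buffer.concat([len, utf8]);
}
const u16be = (n) => { const b = Buffer.alloc(2); b.writeUint16BE(n, 0); return b; };
const u32be = (n) => { const b = Buffer.alloc(4); b.writeUint32BE(n, 0); return b; };

const manifest = Buffer.concat([
  Buffer.from("ARBMNFST", "utf-8"),    // magic
  u16be(1), u16be(0),                  // major, minor
  str("arboricx.bundle.v1"),           // schema (hypothetical value)
  str("executable"),                   // bundleType (hypothetical value)
  str("tree-calculus"),                // treeCalculus (hypothetical value)
  str("sha256"),                       // treeHashAlgorithm
  str("arboricx.merkle.node.v1"),      // treeHashDomain
  str("node"),                         // treeNodePayload (hypothetical value)
  str("pure"), str("lazy"), str("v1"), // runtime semantics/evaluation/abi (hypothetical)
  u32be(0),                            // capabilityCount
  Buffer.from([0]),                    // closure byte: complete
  u32be(0),                            // rootCount
  u32be(0),                            // exportCount
  u32be(0),                            // metadataFieldCount
  u32be(0),                            // extensionFieldCount
]);
// decodeManifest's first checks pass on this buffer:
// manifest.toString("utf-8", 0, 8) === "ARBMNFST" and major version 1.
```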
// ── Validation ──────────────────────────────────────────────────────────────
/**
* Validate the manifest against the runtime profile requirements.
* Throws on violation.


@@ -24,40 +24,22 @@ module Wire
import ContentStore (getNodeMerkle, loadTree, putMerkleNode, storeTerm)
import Research
import Codec.CBOR.Decoding ( Decoder
, decodeString
, decodeBytes
, decodeListLen
, decodeMapLen
)
import Control.Monad (replicateM, forM)
import Codec.CBOR.Encoding ( Encoding
, encodeMapLen
, encodeListLen
, encodeString
, encodeBytes
)
import Codec.CBOR.Write (toLazyByteString)
import Data.Monoid (mconcat)
import Codec.CBOR.Read (deserialiseFromBytes, DeserialiseFailure(..))
import Control.Exception (SomeException, evaluate, try)
import Control.Monad (foldM, unless, when)
import Crypto.Hash (Digest, SHA256, hash)
import Data.Bits ((.&.), (.|.), shiftL, shiftR)
import Data.Bits ((.|.), (.&.), shiftL, shiftR)
import Data.ByteArray (convert)
import Data.ByteString (ByteString)
import Data.Foldable (traverse_)
import Data.Map (Map)
import Data.Text (Text, unpack)
import Data.Text.Encoding (decodeUtf8, encodeUtf8)
import Data.Word (Word16, Word32, Word64)
import Data.Text.Encoding (decodeUtf8, decodeUtf8', encodeUtf8)
import Data.Word (Word16, Word32, Word64, Word8)
import Database.SQLite.Simple (Connection)
import GHC.Generics (Generic)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Base16 as Base16
import qualified Data.ByteString.Lazy as BL
import qualified Data.Map as Map
import qualified Data.Set as Set
import qualified Data.Text as T
@@ -91,92 +73,316 @@ compressionNone = 0
digestSha256 = 1
-- ---------------------------------------------------------------------------
-- CBOR encoding helpers
-- Manifest binary constants
-- ---------------------------------------------------------------------------
-- | Canonical CBOR map length encoder.
cmkLen :: Int -> Encoding
cmkLen n = encodeMapLen (fromIntegral n)
-- | Magic prefix identifying the fixed-order manifest v1 format.
manifestMagic :: ByteString
manifestMagic = "ARBMNFST"
-- | Decode a CBOR array of n elements.
decodeListN :: Decoder s a -> Int -> Decoder s [a]
decodeListN dec n = replicateM n dec
-- | Manifest major version.
manifestMajorVersion :: Word16
manifestMajorVersion = 1
-- | Decode a CBOR map (sequence of key-value pairs).
decodeMapN :: Decoder s a -> Decoder s b -> Int -> Decoder s [(a, b)]
decodeMapN keyDec valDec n = forM [1..n] $ \_ ->
keyDec >>= \k -> valDec >>= \v -> pure (k, v)
-- | Manifest minor version.
manifestMinorVersion :: Word16
manifestMinorVersion = 0
decodeKey :: Text -> Decoder s ()
decodeKey expected = do
actual <- decodeString
unless (actual == expected) $
fail $ "expected key " ++ show expected ++ ", got " ++ show actual
-- | Closure mode to byte.
closureToByte :: ClosureMode -> Word8
closureToByte = \case
ClosureComplete -> 0
ClosurePartial -> 1
-- | Canonical CBOR array length encoder.
cakLen :: Int -> Encoding
cakLen n = encodeListLen (fromIntegral n)
closureFromByte :: Word8 -> Either String ClosureMode
closureFromByte = \case
0 -> Right ClosureComplete
1 -> Right ClosurePartial
n -> Left $ "unsupported closure byte: " ++ show n
-- | Encode a canonical CBOR map with key-value pairs as flat sequence.
cmkPairs :: [(Text, Encoding)] -> Encoding
cmkPairs [] = cmkLen 0
cmkPairs kvs = cmkLen (length kvs) <> mconcat [encodeString k <> v | (k, v) <- kvs]
-- | Encode a canonical CBOR array.
cakSeq :: [Encoding] -> Encoding
cakSeq [] = cakLen 0
cakSeq xs = cakLen (length xs) <> mconcat xs
-- | Encode a canonical CBOR text string.
encText :: Text -> Encoding
encText = encodeString
-- | Encode a canonical CBOR byte string.
encBytes :: ByteString -> Encoding
encBytes = encodeBytes
-- | Metadata tag constants.
tagPackage, tagVersion, tagDescription, tagLicense, tagCreatedBy :: Word16
tagPackage = 1
tagVersion = 2
tagDescription = 3
tagLicense = 4
tagCreatedBy = 5
-- ---------------------------------------------------------------------------
-- Data types with CBOR instances
-- Fixed-order manifest binary helpers
-- ---------------------------------------------------------------------------
-- | Encode a UTF-8 text string as: u32 length + UTF-8 bytes.
encodeLengthPrefixedText :: Text -> ByteString
encodeLengthPrefixedText t = encode32 (fromIntegral $ BS.length bs) <> bs
where bs = encodeUtf8 t
-- | Decode a length-prefixed UTF-8 text string.
-- Returns the decoded Text and the remaining ByteString.
decodeLengthPrefixedText :: ByteString -> Either String (Text, ByteString)
decodeLengthPrefixedText bs =
case decode32be "text_length" bs of
Left err -> Left $ "decodeLengthPrefixedText: " ++ err
Right (len, rest) -> do
let payloadLen = fromIntegral len
when (BS.length rest < payloadLen) $
Left "decodeLengthPrefixedText: string extends beyond input"
let (textBytes, after) = BS.splitAt payloadLen rest
case decodeUtf8' textBytes of
Right txt -> Right (txt, after)
Left _ -> Left "decodeLengthPrefixedText: invalid UTF-8"
-- | Encode a metadata value as a TLV entry: u16 tag + u32 length + raw bytes.
encodeMetadataTLV :: Word16 -> ByteString -> ByteString
encodeMetadataTLV tag val = encode16 tag <> encode32 (fromIntegral $ BS.length val) <> val
-- ---------------------------------------------------------------------------
-- Fixed-order manifest encoders
-- ---------------------------------------------------------------------------
-- | Encode the entire manifest in fixed-order core + TLV tail layout.
encodeManifest :: BundleManifest -> ByteString
encodeManifest m =
manifestMagic
<> encode16 manifestMajorVersion
<> encode16 manifestMinorVersion
<> encodeLengthPrefixedText (manifestSchema m)
<> encodeLengthPrefixedText (manifestBundleType m)
<> encodeLengthPrefixedText (treeCalculus (manifestTree m))
<> encodeLengthPrefixedText (nodeHashAlgorithm (treeNodeHash (manifestTree m)))
<> encodeLengthPrefixedText (nodeHashDomain (treeNodeHash (manifestTree m)))
<> encodeLengthPrefixedText (treeNodePayload (manifestTree m))
<> encodeLengthPrefixedText (runtimeSemantics (manifestRuntime m))
<> encodeLengthPrefixedText (runtimeEvaluation (manifestRuntime m))
<> encodeLengthPrefixedText (runtimeAbi (manifestRuntime m))
<> encode32 (fromIntegral $ length (runtimeCapabilities (manifestRuntime m)))
<> encodeCapabilities (runtimeCapabilities (manifestRuntime m))
<> BS.pack [closureToByte (manifestClosure m)]
<> encode32 (fromIntegral $ length (manifestRoots m))
<> encodeRoots (manifestRoots m)
<> encode32 (fromIntegral $ length (manifestExports m))
<> encodeExports (manifestExports m)
<> encodeMetadataTLVs (manifestMetadata m)
<> encode32 0 -- zero extension fields
encodeCapabilities :: [Text] -> ByteString
encodeCapabilities caps = mconcat (map encodeLengthPrefixedText caps)
encodeRoots :: [BundleRoot] -> ByteString
encodeRoots = mconcat . map encodeRoot
encodeRoot :: BundleRoot -> ByteString
encodeRoot root =
merkleHashToRaw (rootHash root)
<> encodeLengthPrefixedText (rootRole root)
encodeExports :: [BundleExport] -> ByteString
encodeExports = mconcat . map encodeExport
encodeExport :: BundleExport -> ByteString
encodeExport e =
  encodeLengthPrefixedText (exportName e)
    <> merkleHashToRaw (exportRoot e)
    <> encodeLengthPrefixedText (exportKind e)
    <> encodeLengthPrefixedText (exportAbi e)
-- | Encode metadata as: u32 field count + TLV entries for present fields.
-- Metadata TLV values are raw UTF-8 bytes; the TLV length already carries size.
encodeMetadataTLVs :: BundleMetadata -> ByteString
encodeMetadataTLVs m =
let entries = metadataTLVEntries m
in encode32 (fromIntegral $ length entries) <> encodeTLVs entries
metadataTLVEntries :: BundleMetadata -> [(Word16, ByteString)]
metadataTLVEntries m =
maybeEntry tagPackage (metadataPackage m)
++ maybeEntry tagVersion (metadataVersion m)
++ maybeEntry tagDescription (metadataDescription m)
++ maybeEntry tagLicense (metadataLicense m)
++ maybeEntry tagCreatedBy (metadataCreatedBy m)
where
maybeEntry _ Nothing = []
maybeEntry tag (Just value) = [(tag, encodeUtf8 value)]
encodeTLVs :: [(Word16, ByteString)] -> ByteString
encodeTLVs tlvs = mconcat (map (uncurry encodeMetadataTLV) tlvs)
-- ---------------------------------------------------------------------------
-- Fixed-order manifest decoders
-- ---------------------------------------------------------------------------
-- | Decode the manifest from fixed-order core + TLV tail bytes.
-- All remaining bytes after the core fields are treated as the TLV tail.
decodeManifest :: ByteString -> Either String BundleManifest
decodeManifest bs = do
-- Header
when (BS.length bs < 8) $ Left "manifest too short for magic"
when (BS.take 8 bs /= manifestMagic) $ Left "invalid manifest magic"
let rest = BS.drop 8 bs
(major, rest') <- decode16be "major" rest
when (major /= manifestMajorVersion) $ Left $ "unsupported manifest major version: " ++ show major
(_minor, rest'') <- decode16be "minor" rest'
-- Core strings
(schema, rest''') <- decodeLengthPrefixedText rest''
(bundleType, rest'''') <- decodeLengthPrefixedText rest'''
-- Tree spec fields (flat)
(calc, rest1) <- decodeLengthPrefixedText rest''''
(alg, rest2) <- decodeLengthPrefixedText rest1
(domain, rest3) <- decodeLengthPrefixedText rest2
(payload, rest4) <- decodeLengthPrefixedText rest3
-- Runtime spec fields (flat)
(sem, restR1) <- decodeLengthPrefixedText rest4
(eval, restR2) <- decodeLengthPrefixedText restR1
(abi, restR3) <- decodeLengthPrefixedText restR2
(capCount, restR4) <- decode32be "capability_count" restR3
let capLen = fromIntegral capCount
(caps, restR5) <- decodeCapabilities capLen restR4
-- Closure
when (BS.length restR5 < 1) $ Left "manifest truncated: missing closure byte"
let (closureByte, restR6) = BS.splitAt 1 restR5
  closure <- closureFromByte (BS.head closureByte)
-- Roots
(rootCount, restR7) <- decode32be "root_count" restR6
let rootCountInt = fromIntegral rootCount
(roots, restR8) <- decodeRoots rootCountInt restR7
-- Exports
(exportCount, restR9) <- decode32be "export_count" restR8
let exportCountInt = fromIntegral exportCount
(exports, restR10) <- decodeExports exportCountInt restR9
-- TLV tail
(metadata, _ext) <- decodeMetadataAndExtensions restR10
pure BundleManifest
{ manifestSchema = schema
, manifestBundleType = bundleType
, manifestTree = TreeSpec
{ treeCalculus = calc
, treeNodeHash = NodeHashSpec
{ nodeHashAlgorithm = alg
, nodeHashDomain = domain
}
, treeNodePayload = payload
}
, manifestRuntime = RuntimeSpec
{ runtimeSemantics = sem
, runtimeEvaluation = eval
, runtimeAbi = abi
, runtimeCapabilities = caps
}
, manifestClosure = closure
, manifestRoots = roots
, manifestExports = exports
, manifestMetadata = metadata
}
-- | Decode length-prefixed capability strings.
decodeCapabilities :: Int -> ByteString -> Either String ([Text], ByteString)
decodeCapabilities 0 bs = Right ([], bs)
decodeCapabilities n bs = do
(txt, rest) <- decodeLengthPrefixedText bs
(restTxts, restFinal) <- decodeCapabilities (n - 1) rest
Right (txt : restTxts, restFinal)
-- | Decode root entries.
decodeRoots :: Int -> ByteString -> Either String ([BundleRoot], ByteString)
decodeRoots 0 bs = Right ([], bs)
decodeRoots n bs = do
  when (BS.length bs < 32) $ Left "decodeRoots: truncated root hash"
  let (hashBytes, rest) = BS.splitAt 32 bs
  (role, rest') <- decodeLengthPrefixedText rest
  (restRoots, restFinal) <- decodeRoots (n - 1) rest'
  Right (BundleRoot (rawToMerkleHash hashBytes) role : restRoots, restFinal)
-- | Decode export entries.
decodeExports :: Int -> ByteString -> Either String ([BundleExport], ByteString)
decodeExports 0 bs = Right ([], bs)
decodeExports n bs = do
  (name, rest1) <- decodeLengthPrefixedText bs
  when (BS.length rest1 < 32) $ Left "decodeExports: truncated export root hash"
  let (hashBytes, rest2) = BS.splitAt 32 rest1
  (kind, rest3) <- decodeLengthPrefixedText rest2
  (abi, rest4) <- decodeLengthPrefixedText rest3
  (restExports, restFinal) <- decodeExports (n - 1) rest4
  Right (BundleExport name (rawToMerkleHash hashBytes) kind abi : restExports, restFinal)
-- | Decode TLV tail into metadata and extensions.
-- Layout: u32 metadata-count, metadata TLVs, u32 extension-count, extension TLVs.
-- For now, known metadata tags are decoded and extension TLVs are skipped.
decodeMetadataAndExtensions :: ByteString -> Either String (BundleMetadata, ByteString)
decodeMetadataAndExtensions bs = do
(metadataCount, rest1) <- decode32be "metadata_field_count" bs
(metadataTlvs, rest2) <- decodeTLVs (fromIntegral metadataCount) rest1
metadata <- decodeMetadataTLVs metadataTlvs
(extensionCount, rest3) <- decode32be "extension_field_count" rest2
(_extensionTlvs, rest4) <- decodeTLVs (fromIntegral extensionCount) rest3
unless (BS.null rest4) $ Left "trailing bytes after manifest TLV tail"
Right (metadata, rest4)
-- | Decode a fixed number of TLV entries.
decodeTLVs :: Int -> ByteString -> Either String ([TLVEntry], ByteString)
decodeTLVs 0 bs = Right ([], bs)
decodeTLVs n bs = do
(tag, rest1) <- decode16be "tlv_tag" bs
(len, rest2) <- decode32be "tlv_length" rest1
let payloadLen = fromIntegral len
when (BS.length rest2 < payloadLen) $ Left "TLV value extends beyond input"
let (value, after) = BS.splitAt payloadLen rest2
(restTlvs, restFinal) <- decodeTLVs (n - 1) after
Right ((tag, value) : restTlvs, restFinal)
-- | Decode known metadata TLV entries into BundleMetadata.
-- Unknown tags are ignored.
decodeMetadataTLVs :: [(Word16, ByteString)] -> Either String BundleMetadata
decodeMetadataTLVs tlvs = do
pkg <- decodeOptionalMetadataText tagPackage
ver <- decodeOptionalMetadataText tagVersion
desc <- decodeOptionalMetadataText tagDescription
lic <- decodeOptionalMetadataText tagLicense
by <- decodeOptionalMetadataText tagCreatedBy
pure BundleMetadata
{ metadataPackage = pkg
, metadataVersion = ver
, metadataDescription = desc
, metadataLicense = lic
, metadataCreatedBy = by
}
where
lookupTag t = go t tlvs
go _ [] = Nothing
go t ((tag, val):rest)
| tag == t = Just val
| otherwise = go t rest
decodeOptionalMetadataText tag =
case lookupTag tag of
Nothing -> Right Nothing
Just raw -> case decodeUtf8' raw of
Right txt -> Right (Just txt)
Left _ -> Left $ "metadata TLV has invalid UTF-8 for tag " ++ show tag
type TLVEntry = (Word16, ByteString)
-- ---------------------------------------------------------------------------
-- Data types
-- ---------------------------------------------------------------------------
-- | Closure declaration.
data ClosureMode = ClosureComplete | ClosurePartial
deriving (Show, Eq, Ord, Generic)
toCBORClosure :: ClosureMode -> Encoding
toCBORClosure = encText . \case
ClosureComplete -> "complete"
ClosurePartial -> "partial"
closureFromCBOR :: Decoder s ClosureMode
closureFromCBOR = decodeString >>= \case
"complete" -> pure ClosureComplete
"partial" -> pure ClosurePartial
other -> fail $ "ClosureMode: " ++ show other
-- | Hash specification (algorithm + domain strings).
data NodeHashSpec = NodeHashSpec
{ nodeHashAlgorithm :: Text
, nodeHashDomain :: Text
} deriving (Show, Eq, Ord, Generic)
toCBORNodeHashSpec :: NodeHashSpec -> Encoding
toCBORNodeHashSpec (NodeHashSpec alg dom) =
cmkPairs
[ ("algorithm", encText alg)
, ("domain", encText dom)
]
nodeHashSpecFromCBOR :: Decoder s NodeHashSpec
nodeHashSpecFromCBOR = do
n <- decodeMapLen
unless (n == 2) $ fail "NodeHashSpec: must have exactly 2 entries"
decodeKey "algorithm"
alg <- decodeString
decodeKey "domain"
dom <- decodeString
pure (NodeHashSpec alg dom)
-- | Tree specification.
data TreeSpec = TreeSpec
{ treeCalculus :: Text
@@ -184,26 +390,6 @@ data TreeSpec = TreeSpec
, treeNodePayload :: Text
} deriving (Show, Eq, Ord, Generic)
toCBORTreeSpec :: TreeSpec -> Encoding
toCBORTreeSpec (TreeSpec calc hspec payload) =
cmkPairs
[ ("calculus", encText calc)
, ("nodeHash", toCBORNodeHashSpec hspec)
, ("nodePayload", encText payload)
]
treeSpecFromCBOR :: Decoder s TreeSpec
treeSpecFromCBOR = do
n <- decodeMapLen
unless (n == 3) $ fail "TreeSpec: must have exactly 3 entries"
decodeKey "calculus"
calc <- decodeString
decodeKey "nodeHash"
hspec <- nodeHashSpecFromCBOR
decodeKey "nodePayload"
payload <- decodeString
pure (TreeSpec calc hspec payload)
-- | Runtime specification.
data RuntimeSpec = RuntimeSpec
{ runtimeSemantics :: Text
@@ -212,53 +398,12 @@ data RuntimeSpec = RuntimeSpec
, runtimeCapabilities :: [Text]
} deriving (Show, Eq, Ord, Generic)
toCBORRuntimeSpec :: RuntimeSpec -> Encoding
toCBORRuntimeSpec (RuntimeSpec sem eval abi caps) =
cmkPairs
[ ("semantics", encText sem)
, ("evaluation", encText eval)
, ("abi", encText abi)
, ("capabilities", cakSeq (map encText caps))
]
runtimeSpecFromCBOR :: Decoder s RuntimeSpec
runtimeSpecFromCBOR = do
n <- decodeMapLen
unless (n == 4) $ fail "RuntimeSpec: must have exactly 4 entries"
decodeKey "semantics"
sem <- decodeString
decodeKey "evaluation"
eval <- decodeString
decodeKey "abi"
abi <- decodeString
decodeKey "capabilities"
clen <- decodeListLen
caps <- decodeListN decodeString clen
pure (RuntimeSpec sem eval abi caps)
-- | A root hash reference.
data BundleRoot = BundleRoot
{ rootHash :: MerkleHash
, rootRole :: Text
} deriving (Show, Eq, Ord, Generic)
toCBORBundleRoot :: BundleRoot -> Encoding
toCBORBundleRoot (BundleRoot h role) =
cmkPairs
[ ("hash", encBytes (merkleHashToRaw h))
, ("role", encText role)
]
bundleRootFromCBOR :: Decoder s BundleRoot
bundleRootFromCBOR = do
n <- decodeMapLen
unless (n == 2) $ fail "BundleRoot: must have exactly 2 entries"
decodeKey "hash"
hRaw <- decodeBytes
decodeKey "role"
role <- decodeString
pure (BundleRoot (rawToMerkleHash hRaw) role)
-- | An export entry.
data BundleExport = BundleExport
{ exportName :: Text
@@ -267,29 +412,6 @@ data BundleExport = BundleExport
, exportAbi :: Text
} deriving (Show, Eq, Ord, Generic)
toCBORBundleExport :: BundleExport -> Encoding
toCBORBundleExport (BundleExport name h kind abi) =
cmkPairs
[ ("name", encText name)
, ("root", encBytes (merkleHashToRaw h))
, ("kind", encText kind)
, ("abi", encText abi)
]
bundleExportFromCBOR :: Decoder s BundleExport
bundleExportFromCBOR = do
n <- decodeMapLen
unless (n == 4) $ fail "BundleExport: must have exactly 4 entries"
decodeKey "name"
name <- decodeString
decodeKey "root"
hRaw <- decodeBytes
decodeKey "kind"
kind <- decodeString
decodeKey "abi"
abi <- decodeString
pure (BundleExport name (rawToMerkleHash hRaw) kind abi)
-- | Optional package metadata.
data BundleMetadata = BundleMetadata
{ metadataPackage :: Maybe Text
@@ -299,33 +421,6 @@ data BundleMetadata = BundleMetadata
, metadataCreatedBy :: Maybe Text
} deriving (Show, Eq, Ord, Generic)
metadataFromCBOR :: Decoder s BundleMetadata
metadataFromCBOR = do
mlen <- decodeMapLen
entries <- decodeMapN decodeString decodeString mlen
let lookupText k = go k entries
go _ [] = Nothing
go k ((k', v):rest)
| k == k' = Just v
| otherwise = go k rest
pure BundleMetadata
{ metadataPackage = lookupText "package"
, metadataVersion = lookupText "version"
, metadataDescription = lookupText "description"
, metadataLicense = lookupText "license"
, metadataCreatedBy = lookupText "createdBy"
}
metadataToCBOR :: BundleMetadata -> Encoding
metadataToCBOR (BundleMetadata pkg ver desc lic by) =
let pairs =
maybe [] (\v -> [("package", encText v)]) pkg
++ maybe [] (\v -> [("version", encText v)]) ver
++ maybe [] (\v -> [("description", encText v)]) desc
++ maybe [] (\v -> [("license", encText v)]) lic
++ maybe [] (\v -> [("createdBy", encText v)]) by
in cmkPairs pairs
-- | The manifest: top-level bundle metadata.
data BundleManifest = BundleManifest
{ manifestSchema :: Text
@@ -338,43 +433,6 @@ data BundleManifest = BundleManifest
, manifestMetadata :: BundleMetadata
} deriving (Show, Eq, Generic)
manifestToCBOR :: BundleManifest -> Encoding
manifestToCBOR m =
cmkPairs
[ ("schema", encText (manifestSchema m))
, ("bundleType", encText (manifestBundleType m))
, ("tree", toCBORTreeSpec (manifestTree m))
, ("runtime", toCBORRuntimeSpec (manifestRuntime m))
, ("closure", toCBORClosure (manifestClosure m))
, ("roots", cakSeq (map toCBORBundleRoot (manifestRoots m)))
, ("exports", cakSeq (map toCBORBundleExport (manifestExports m)))
, ("metadata", metadataToCBOR (manifestMetadata m))
]
manifestFromCBOR :: Decoder s BundleManifest
manifestFromCBOR = do
n <- decodeMapLen
unless (n == 8) $ fail "BundleManifest: must have exactly 8 entries"
decodeKey "schema"
schema <- decodeString
decodeKey "bundleType"
bundleType <- decodeString
decodeKey "tree"
tree <- treeSpecFromCBOR
decodeKey "runtime"
runtime <- runtimeSpecFromCBOR
decodeKey "closure"
closure <- closureFromCBOR
decodeKey "roots"
rlen <- decodeListLen
roots <- decodeListN bundleRootFromCBOR rlen
decodeKey "exports"
elen <- decodeListLen
exports <- decodeListN bundleExportFromCBOR elen
decodeKey "metadata"
metadata <- metadataFromCBOR
pure (BundleManifest schema bundleType tree runtime closure roots exports metadata)
-- | Portable executable-object bundle.
--
-- Merkle node payloads remain the language-neutral executable core:
@@ -388,28 +446,12 @@ data Bundle = Bundle
, bundleManifestBytes :: ByteString
} deriving (Show, Eq)
-- ---------------------------------------------------------------------------
-- CBOR manifest serialization
-- ---------------------------------------------------------------------------
-- | Encode the manifest as canonical CBOR.
encodeManifest :: BundleManifest -> ByteString
encodeManifest m = BL.toStrict (toLazyByteString (manifestToCBOR m))
-- | Decode a manifest from CBOR bytes.
decodeManifest :: ByteString -> Either String BundleManifest
decodeManifest bs =
case deserialiseFromBytes manifestFromCBOR (BL.fromStrict bs) of
Right (rest, m)
| BS.null (BL.toStrict rest) -> Right m
| otherwise -> Left "trailing bytes after manifest CBOR"
Left (DeserialiseFailure _ msg) -> Left msg
-- ---------------------------------------------------------------------------
-- Bundle encoding
-- ---------------------------------------------------------------------------
-- | Encode a Bundle to portable Bundle v1 bytes.
-- The manifest is serialized using the fixed-order core + TLV tail format.
encodeBundle :: Bundle -> ByteString
encodeBundle bundle =
let nodeSection = encodeNodeSection (bundleNodes bundle)

(5 binary files changed; contents not shown)

@@ -41,7 +41,6 @@ executable tricu
, base16-bytestring
, base64-bytestring
, bytestring
, cborg
, cmdargs
, containers
, cryptonite
@@ -94,7 +93,6 @@ test-suite tricu-tests
, base16-bytestring
, base64-bytestring
, bytestring
, cborg
, cmdargs
, containers
, cryptonite