diff --git a/AGENTS.md b/AGENTS.md index bcb4467..e7a2594 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -128,114 +128,18 @@ hash = SHA256("arboricx.merkle.node.v1" <> 0x00 <> serialized_node) This is stored in SQLite via `ContentStore.hs`. Hash suffixes on identifiers (e.g., `foo_abc123...`) are validated: 16–64 hex characters (SHA256). -## 7. Arboricx Portable Wire Format +## 7. Arboricx Portable Bundles (`.arboricx`) -The **Arboricx wire format** (module `Wire.hs`) defines a portable binary bundle for exchanging Tree Calculus terms, their Merkle DAGs, and associated metadata. It is versioned and schema-driven. +Portable executable bundles are generated via `Wire.hs`. See `docs/arboricx-bundle-format.md` for the full binary format spec. -### Header +```bash +# Export a bundle from the content store +./result/bin/tricu export -o myterm.arboricx myterm +# Run a bundle (requires TRICU_DB_PATH) +./result/bin/tricu import -f lib/list.tri +TRICU_DB_PATH=/tmp/tricu.db ./result/bin/tricu export -o list_ops.arboricx append ``` -+------------------+-----------------+------------------+----------------+ -| Magic (8 bytes) | Major (2 bytes) | Minor (2 bytes) | Section Count | -| | | | (4 bytes) | -+------------------+-----------------+------------------+----------------+ -| Flags (8 bytes) | Dir Offset (8 bytes) -+------------------+-----------------+------------------+ -``` - -- **Magic**: `ARBORICX` (`0x41 0x52 0x42 0x4f 0x52 0x49 0x43 0x58`) -- **Header length**: 32 bytes -- **Major version**: `1` | **Minor version**: `0` - -### Section Directory - -Immediately follows the header. 
Each section entry is 60 bytes: - -``` -+------------------+------------------+-----------------+------------------+ -| Type (4 bytes) | Version (2 bytes)| Flags (2 bytes) | Compression (2) | -+------------------+------------------+-----------------+------------------+ -| Digest Algo (2) | Offset (8 bytes) | Length (8 bytes)| SHA256 digest (32)| -+------------------+------------------+-----------------+------------------+ -``` - -Known section types: - -| Type | Name | Required | Description | -|------|-----------|----------|-------------| -| 1 | manifest | Yes | JSON manifest metadata | -| 2 | nodes | Yes | Binary Merkle node payloads | - -### Section 1 — Manifest (JSON) - -The manifest describes the bundle's semantics, exports, and schema. Key fields: - -| Field | Value | Description | -|-------|-------|-------------| -| `schema` | `"arboricx.bundle.manifest.v1"` | Manifest schema version | -| `bundleType` | `"tree-calculus-executable-object"` | Bundle category | -| `tree.calculus` | `"tree-calculus.v1"` | Tree calculus version | -| `tree.nodeHash.algorithm` | `"sha256"` | Hash algorithm | -| `tree.nodeHash.domain` | `"arboricx.merkle.node.v1"` | Hash domain string | -| `tree.nodePayload` | `"arboricx.merkle.payload.v1"` | Payload encoding | -| `runtime.semantics` | `"tree-calculus.v1"` | Evaluation semantics | -| `runtime.abi` | `"arboricx.abi.tree.v1"` | Runtime ABI | -| `closure` | `"complete"` | Bundle must be a complete DAG | -| `roots` | `[{"hash": "...", "role": "..."}]` | Named root hashes | -| `exports` | `[{"name": "...", "root": "..."}]` | Export aliases for roots | -| `metadata.createdBy` | `"arboricx"` | Originator | - -### Section 2 — Nodes (Binary) - -``` -+------------------+-------------------+-------------------+-----------------+ -| Node Count (8) | Hash (32 bytes) | Payload Len (4) | Payload (N) | -+------------------+-------------------+-------------------+-----------------+ -``` - -Each node entry contains: -- 32-byte Merkle hash 
(hex-encoded in identifiers, raw in binary) -- 4-byte big-endian payload length -- N bytes of serialized node payload (`0x00` for Leaf, `0x01 || hash` for Stem, `0x02 || left || right` for Fork) - -### Bundle verification flow - -1. Check magic bytes -2. Validate major version -3. Parse section directory -4. For each section: verify SHA256 digest against actual bytes -5. Decode JSON manifest -6. Decode binary node entries into Merkle DAG -7. Verify all root hashes present in manifest exist in node map -8. Verify export root hashes present -9. Verify children references are complete (no dangling nodes) -10. Reject unknown critical sections - -### Data types (Wire.hs) - -| Type | Purpose | -|------|---------| -| `Bundle` | Top-level bundle: version, roots, nodes map, manifest | -| `BundleManifest` | JSON metadata: schema, tree spec, runtime spec, roots, exports | -| `TreeSpec` | Tree calculus version + hash algorithm + payload encoding | -| `NodeHashSpec` | Hash algorithm and domain string | -| `RuntimeSpec` | Semantics, evaluation order, ABI, capabilities | -| `BundleRoot` | Root hash + role (`"default"` or `"root"`) | -| `BundleExport` | Export name + root hash + kind + ABI | -| `BundleMetadata` | Optional package, version, description, license, createdBy | -| `ClosureMode` | `ClosureComplete` or `ClosurePartial` | - -### Key functions - -| Function | Signature | Purpose | -|----------|-----------|---------| -| `encodeBundle` | `Bundle → ByteString` | Serialize bundle to wire bytes | -| `decodeBundle` | `ByteString → Either String Bundle` | Parse wire bytes into Bundle | -| `verifyBundle` | `Bundle → Either String ()` | Validate DAG, manifest, roots | -| `collectReachableNodes` | `Connection → MerkleHash → IO [(MerkleHash, ByteString)]` | Traverse DAG from root | -| `exportBundle` | `Connection → [MerkleHash] → IO ByteString` | Build bundle from content store | -| `exportNamedBundle` | `Connection → [(Text, MerkleHash)] → IO ByteString` | Build with named roots | 
-| `importBundle` | `Connection → ByteString → IO [MerkleHash]` | Import bundle into content store | ## 8. Directory Layout @@ -273,12 +177,12 @@ tricu/ ## 9. JS Arboricx Runtime A JavaScript implementation of the Arboricx portable bundle runtime lives in `ext/js/`. -It is a reference implementation — not a tricu source parser. It reads `.tri.bundle` files produced by the Haskell toolchain, verifies Merkle node hashes, reconstructs tree values, and reduces them. +It is a reference implementation — not a tricu source parser. It reads `.arboricx` files produced by the Haskell toolchain, verifies Merkle node hashes, reconstructs tree values, and reduces them. From project root: ```bash -node ext/js/src/cli.js inspect test/fixtures/id.tri.bundle -node ext/js/src/cli.js run test/fixtures/true.tri.bundle +node ext/js/src/cli.js inspect test/fixtures/id.arboricx +node ext/js/src/cli.js run test/fixtures/true.arboricx ``` The JS runtime implements: diff --git a/docs/arboricx-bundle-cbor-v1.md b/docs/arboricx-bundle-cbor-v1.md deleted file mode 100644 index d2cf1bc..0000000 --- a/docs/arboricx-bundle-cbor-v1.md +++ /dev/null @@ -1,339 +0,0 @@ -# Arboricx Portable Bundle v1 (CBOR Manifest Profile) - -Status: **Draft, implementation-aligned** (derived from `src/Wire.hs` as of 2026-05-07) - -This document specifies the **actual on-wire format and validation behavior** currently implemented by `tricu` for Arboricx bundles, with a focus on the newer CBOR manifest path. - ---- - -## 1. Scope - -This profile defines: - -1. The binary container envelope (header + section directory + section payloads). -2. The CBOR manifest section format. -3. The Merkle node section format. -4. Decode/verify/import behavior in `Wire.hs`. -5. Known gaps and sane resolutions. - -Non-goals: - -- tricu source parsing/lambda elimination/module semantics. -- Signature systems / trust policy. -- Compression codecs beyond `none`. - ---- - -## 2. 
Container format - -A bundle is a byte stream: - -``` -[32-byte header] -[section directory: section_count * 60 bytes] -[section payload bytes...] -``` - -### 2.1 Header (32 bytes) - -| Field | Size | Encoding | Value / Notes | -|---|---:|---|---| -| Magic | 8 | raw bytes | `41 52 42 4f 52 49 58 00` (`"ARBORICX"`) | -| Major | 2 | u16 BE | Must be `1` | -| Minor | 2 | u16 BE | Currently `0` | -| SectionCount | 4 | u32 BE | Number of section directory entries | -| Flags | 8 | u64 BE | Currently emitted as `0`; not interpreted | -| DirectoryOffset | 8 | u64 BE | Offset of section directory (currently `32`) | - -Reader behavior: -- Reject if total bytes < 32. -- Reject bad magic. -- Reject major != 1. - -### 2.2 Section directory entry (60 bytes each) - -| Field | Size | Encoding | Notes | -|---|---:|---|---| -| Type | 4 | u32 BE | e.g. 1=manifest, 2=nodes | -| Version | 2 | u16 BE | Currently emitted as `1`; not enforced on read | -| Flags | 2 | u16 BE | bit0 = critical | -| Compression | 2 | u16 BE | `0` = none (required) | -| DigestAlgorithm | 2 | u16 BE | `1` = SHA-256 (required) | -| Offset | 8 | u64 BE | Absolute byte offset | -| Length | 8 | u64 BE | Section payload length | -| Digest | 32 | raw bytes | SHA-256 of section bytes | - -Reader behavior: -- Reject unknown **critical** section types. -- Reject compression != 0. -- Reject digest algorithm != 1. -- Reject out-of-bounds sections. -- Reject digest mismatch. - -### 2.3 Required section types - -| Type | Name | Required | -|---:|---|---| -| 1 | manifest | yes | -| 2 | nodes | yes | - -Decode currently rejects duplicate section type 1 or 2. - ---- - -## 3. Manifest section (CBOR) - -Manifest bytes are CBOR-encoded map data (using `cborg`). - -### 3.1 Top-level manifest schema - -Top-level map has **exactly 8 keys** in this exact decode order in current implementation: - -1. `schema` (text) -2. `bundleType` (text) -3. `tree` (map) -4. `runtime` (map) -5. `closure` (text: `"complete"|"partial"`) -6. 
`roots` (array) -7. `exports` (array) -8. `metadata` (map) - -> Important: Current decoder is order-strict; it expects keys in this sequence. - -### 3.2 Nested structures - -#### `tree` map (3 keys, order-strict) -- `calculus`: text -- `nodeHash`: map -- `nodePayload`: text - -`nodeHash` map (2 keys, order-strict): -- `algorithm`: text -- `domain`: text - -#### `runtime` map (4 keys, order-strict) -- `semantics`: text -- `evaluation`: text -- `abi`: text -- `capabilities`: array(text) - -#### `roots` array of maps -Each root map has 2 keys (order-strict): -- `hash`: bytes (raw 32-byte hash payload encoded as CBOR byte string) -- `role`: text - -#### `exports` array of maps -Each export map has 4 keys (order-strict): -- `name`: text -- `root`: bytes (32-byte hash) -- `kind`: text -- `abi`: text - -#### `metadata` map -Flexible key set; decoded as map(text -> text), then projected into optional fields: -- `package` -- `version` -- `description` -- `license` -- `createdBy` - -Unknown metadata keys are ignored. - -### 3.3 Default emitted manifest values - -Writers in `Wire.hs` currently emit: - -- `schema = "arboricx.bundle.manifest.v1"` -- `bundleType = "tree-calculus-executable-object"` -- `tree.calculus = "tree-calculus.v1"` -- `tree.nodeHash.algorithm = "sha256"` -- `tree.nodeHash.domain = "arboricx.merkle.node.v1"` -- `tree.nodePayload = "arboricx.merkle.payload.v1"` -- `runtime.semantics = "tree-calculus.v1"` -- `runtime.evaluation = "normal-order"` -- `runtime.abi = "arboricx.abi.tree.v1"` -- `runtime.capabilities = []` -- `closure = "complete"` -- `metadata.createdBy = "arboricx"` - ---- - -## 4. 
Nodes section (binary) - -Node section payload layout: - -``` -node_count: u64 BE -repeat node_count times: - hash: 32 bytes - payload_len: u32 BE - payload: payload_len bytes -``` - -Node payload grammar: - -- `0x00` => Leaf -- `0x01 || child_hash(32)` => Stem -- `0x02 || left_hash(32)||right(32)` => Fork - -Section decoder rejects: -- duplicate node hashes, -- truncated entries, -- payload overruns, -- trailing bytes after final node. - ---- - -## 5. Verification behavior (`verifyBundle`) - -`verifyBundle` enforces all of: - -1. bundle version >= 1. -2. bundle has at least one node. -3. manifest constants match hardcoded v1 values (schema/type/calculus/hash algo/domain/payload/runtime semantics/ABI). -4. runtime capabilities must be empty. -5. closure must be `complete`. -6. manifest has at least one root and one export. -7. root sets in `bundleRoots` and `manifest.roots` must match exactly. -8. each root and export root exists in node map. -9. each node payload deserializes and re-hashes to declared node hash. -10. all referenced child hashes exist. -11. full closure reachability from roots succeeds. - -`importBundle` runs decode + verify before storing nodes. - ---- - -## 6. Export/import semantics - -### 6.1 Export - -`exportNamedBundle`: -- Traverses reachable nodes for each requested root hash. -- Builds node map. -- Builds default manifest and CBOR bytes. -- Emits two sections (manifest, nodes). - -`exportBundle` auto-names exports: -- 1 root => `root` -- N>1 => `root0`, `root1`, ... - -### 6.2 Import - -`importBundle`: -1. Decode bundle. -2. Verify bundle. -3. Insert all node payloads into content store. -4. For each manifest export: reconstruct tree by export root and store name binding in DB. -5. Return bundle root list. - ---- - -## 7. Determinism properties - -Current implementation is deterministic for identical logical input because: -- Node map serialized in ascending hash order (`Map.toAscList`). 
-- Field order in manifest encoding is fixed by code. -- Section ordering is fixed: manifest then nodes. - -So repeated exports of same roots produce byte-identical bundles. - ---- - -## 8. Known gaps and sane resolutions - -These are important design gaps visible from current code. - -### Gap A: Node hash domain mismatch risk (critical) - -Status: **resolved in current codebase**. - -What was wrong: -- Manifest declared `tree.nodeHash.domain = "arboricx.merkle.node.v1"`. -- Hashing implementation previously used `"tricu.merkle.node.v1"`. - -Current state: -- Haskell hashing now uses `"arboricx.merkle.node.v1"`. -- JS reference runtime hashing now uses `"arboricx.merkle.node.v1"`. -- JS manifest validation now requires `"arboricx.merkle.node.v1"`. - -Remaining recommendation: -- Keep hash-domain constants centralized/shared to prevent future drift. -- Add explicit test vectors for Leaf/Stem/Fork hashes under the Arboricx domain. - -### Gap B: CBOR decode is order-strict, not generic-map tolerant - -Observed: -- Decoder expects exact key order for most maps. - -Impact: -- Another canonical CBOR writer that reorders keys may decode-fail even if semantically equivalent. - -Sane resolution: -- For v1 compatibility, decode maps as unordered key/value collections, require key presence and types, and reject unknown keys only where desired. -- Keep writer deterministic, but relax reader. - -### Gap C: “Canonical CBOR” claim is stronger than implementation - -Observed: -- Writer uses fixed order but does not explicitly sort keys per RFC 8949 canonical ordering rules. - -Sane resolution: -- Either (a) rename as “deterministic CBOR” profile, or (b) implement explicit canonical key ordering and canonical-length/minimal integer forms checks. - -### Gap D: Extra section preservation - -Observed: -- Decoder tolerates unknown non-critical sections, but `Bundle` model/encoder drops them on re-encode. 
- -Sane resolution: -- Add `bundleExtraSections :: [SectionEntry+Bytes]` if round-trip preservation is desired. - -### Gap E: Section version not enforced - -Observed: -- Section entry `Version` is parsed but unused. - -Sane resolution: -- Enforce known version matrix (e.g., manifest v1, nodes v1), or explicitly document “advisory only”. - -### Gap F: Runtime capability policy is hard fail - -Observed: -- Any non-empty capabilities list is rejected. - -Sane resolution: -- Keep strict for now, but define capability negotiation strategy for v1.1+ (unknown capabilities => reject unless explicitly allowed by host policy). - -### Gap G: Error handling style in import/export path - -Observed: -- Several paths throw `error` for malformed data/store misses. - -Sane resolution: -- Return `Either`-style typed errors through public API (`decode`, `verify`, `import`), reserve exceptions for truly internal faults. - ---- - -## 9. Conformance checklist (v1 current) - -A conforming v1 reader/writer for this profile should: - -- Implement the 32-byte header and 60-byte section records exactly. -- Support required sections 1 and 2. -- Verify section digests with SHA-256. -- Decode/encode manifest CBOR matching the field model above. -- Parse nodes section and validate node payload structure. -- Recompute and verify node hashes. -- Enforce complete closure for roots. -- Enforce manifest/runtime constants used by v1. - ---- - -## 10. Suggested follow-up docs - -To stabilize interoperability, add: - -1. `docs/arboricx-bundle-test-vectors.md` (golden header/manifest/nodes + expected hashes). -2. `docs/arboricx-bundle-errors.md` (normative error codes/strings). -3. `docs/arboricx-bundle-evolution.md` (rules for minor/major upgrades, capability negotiation, extra sections). 
diff --git a/docs/arboricx-bundle-format.md b/docs/arboricx-bundle-format.md new file mode 100644 index 0000000..f567f6a --- /dev/null +++ b/docs/arboricx-bundle-format.md @@ -0,0 +1,419 @@ +# Arboricx Portable Bundle Format Specification + +**Version:** 0.1 +**Status:** Exploratory +**Author:** A range of slopmachines guided by James Eversole +**Human Review Status:** 5 minute scan-through - this is an evolving and malleable document + +The Arboricx Portable Bundle is a self-contained, content-addressed binary format for distributing Tree Calculus programs and their associated Merkle DAGs. It provides: + +- A fixed binary container with header, section directory, and typed sections +- A language-neutral Merkle node layer for content-addressed tree values +- A fixed-order binary manifest for semantic metadata, exports, and optional extensions + +## Table of Contents + +1. [Top-Level Container Layout](#1-top-level-container-layout) +2. [Header](#2-header) +3. [Section Directory](#3-section-directory) +4. [Section: Manifest (type 1)](#4-section-manifest-type-1) +5. [Section: Nodes (type 2)](#5-section-nodes-type-2) +6. [Merkle Node Payload Format](#6-merkle-node-payload-format) +7. [Merkle Hash Computation](#7-merkle-hash-computation) +8. [Tree Calculus Reduction Semantics](#8-tree-calculus-reduction-semantics) +9. [Binary Primitives](#9-binary-primitives) +10. [Bundle Verification](#10-bundle-verification) +11. [Known Section Types](#11-known-section-types) + +--- + +## 1. Top-Level Container Layout + +An Arboricx bundle is a flat binary blob with the following layout: + +``` ++------------------+------------------+------------------+------------------+ +| Header | Section Directory| Manifest Section | Nodes Section | +| (32 bytes) | (N × 60 bytes) | (variable) | (variable) | ++------------------+------------------+------------------+------------------+ +``` + +The container uses **big-endian** byte order for all multi-byte integers. 
+ +Total bundle size = 32 + (sectionCount × 60) + manifestSize + nodesSize + +--- + +## 2. Header + +| Offset | Size | Field | Description | +|--------|------|-------|-------------| +| 0 | 8 bytes | Magic | ASCII `"ARBORICX"` (`0x41 0x52 0x42 0x4F 0x52 0x49 0x43 0x58`) | +| 8 | 2 bytes | Major version | `u16` BE. Currently `1` | +| 10 | 2 bytes | Minor version | `u16` BE. Currently `0` | +| 12 | 4 bytes | Section count | `u32` BE. Number of entries in the section directory | +| 16 | 8 bytes | Flags | `u64` BE. Reserved; currently all zeros | +| 24 | 8 bytes | Directory offset | `u64` BE. Byte offset from the start of the bundle to the section directory | + +**Constraints:** +- Major version must be `1`. Bundles with unsupported major versions are rejected. +- The directory offset must point to a valid location within the bundle. +- The directory offset is always `32` for bundles with the current layout (header immediately followed by the directory). + +--- + +## 3. Section Directory + +The section directory is an array of `N` entries, where `N` is the section count from the header. Each entry is exactly **60 bytes**. + +| Offset (within entry) | Size | Field | Description | +|----------------------|------|-------|-------------| +| 0 | 4 bytes | Type | `u32` BE. Section type identifier (see [Known Section Types](#11-known-section-types)) | +| 4 | 2 bytes | Version | `u16` BE. Section-specific version | +| 6 | 2 bytes | Flags | `u16` BE. Bit flags: bit 0 (`0x0001`) = critical section | +| 8 | 2 bytes | Compression | `u16` BE. Compression codec (currently only `0` = none) | +| 10 | 2 bytes | Digest algorithm | `u16` BE. Hash algorithm (currently only `1` = SHA-256) | +| 12 | 8 bytes | Offset | `u64` BE. Byte offset from the start of the bundle to the section data | +| 20 | 8 bytes | Length | `u64` BE. 
Length of the section data in bytes |
+| 28 | 32 bytes | SHA-256 digest | Raw digest of the section data |
+
+**Verification:**
+- Unknown critical sections (flags & `0x0001`) are rejected.
+- Compression must be `0` (none).
+- Digest algorithm must be `1` (SHA-256).
+- The SHA-256 digest in the directory entry must match `SHA256(section_data)`.
+
+---
+
+## 4. Section: Manifest (type 1)
+
+The manifest is a binary encoding of bundle metadata. It uses a **fixed-order core** layout followed by an optional **TLV tail** for extensibility.
+
+### 4.1 Format
+
+```
+Manifest =
+  magic                 8 bytes    "ARBMNFST"
+  major                 u16 BE     Manifest major version (1)
+  minor                 u16 BE     Manifest minor version (0)
+
+  schema                string     Length-prefixed UTF-8 text
+  bundleType            string     Length-prefixed UTF-8 text
+
+  treeCalculus          string     Length-prefixed UTF-8 text
+  treeHashAlgorithm     string     Length-prefixed UTF-8 text
+  treeHashDomain        string     Length-prefixed UTF-8 text
+  treeNodePayload       string     Length-prefixed UTF-8 text
+
+  runtimeSemantics      string     Length-prefixed UTF-8 text
+  runtimeEvaluation     string     Length-prefixed UTF-8 text
+  runtimeAbi            string     Length-prefixed UTF-8 text
+  capabilityCount       u32 BE     Number of capability strings
+  capabilities          string[]   Array of length-prefixed UTF-8 capability strings
+
+  closure               u8         0 = complete, 1 = partial
+  rootCount             u32 BE     Number of root entries
+  roots                 Root[]     Array of root entries
+  exportCount           u32 BE     Number of export entries
+  exports               Export[]   Array of export entries
+
+  metadataFieldCount    u32 BE     Number of metadata TLV entries
+  metadataFields        TLV[]      Metadata tag-value entries
+  extensionFieldCount   u32 BE     Number of extension TLV entries
+  extensionFields       TLV[]      Extension tag-value entries (skipped by parsers)
+```
+
+**The manifest must end immediately after the last extension field**: the number of trailing bytes must be zero (no leftover data).
+
+### 4.2 String Format
+
+Every `string` field uses the same encoding:
+
+```
+string =
+  length   u32 BE         Number of UTF-8 bytes in the string (not the number of characters)
+  bytes    byte[length]   UTF-8 encoded string content
+```
+
+The length field carries the byte count, so parsers can skip strings without decoding UTF-8.
+
+### 4.3 Root Entry
+
+```
+Root =
+  hash   32 bytes   Raw SHA-256 hash of the Merkle node
+  role   string     Length-prefixed UTF-8 text ("default" for the first root, "root" for others)
+```
+
+The hash is stored as **raw bytes** (not hex-encoded). It corresponds to the Merkle hash of the node.
+
+### 4.4 Export Entry
+
+```
+Export =
+  name   string     Length-prefixed UTF-8 text (export identifier)
+  root   32 bytes   Raw SHA-256 hash of the Merkle node
+  kind   string     Length-prefixed UTF-8 text (currently "term")
+  abi    string     Length-prefixed UTF-8 text (ABI string)
+```
+
+### 4.5 TLV Entry
+
+```
+TLV =
+  tag      u16 BE         Tag identifier (type)
+  length   u32 BE         Number of bytes in the value
+  value    byte[length]   Raw bytes
+```
+
+TLV entries support variable-length values and are skippable by parsers that do not recognize a tag: after reading the `u16` tag and the `u32` length, skip `length` value bytes. Each entry occupies `2 + 4 + length` bytes in total.
+
+### 4.6 Metadata Tags
+
+| Tag | Name | Value |
+|-----|------|-------|
+| 1 | package | UTF-8 text: package name |
+| 2 | version | UTF-8 text: version string |
+| 3 | description | UTF-8 text: description |
+| 4 | license | UTF-8 text: license identifier or text |
+| 5 | createdBy | UTF-8 text: creator identifier |
+
+Unknown metadata tags are ignored. Unknown extension tags are skipped by length.
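The string and TLV encodings from Sections 4.2, 4.5 and 4.6 can be sketched in a few lines of Node.js. This is a minimal illustration under stated assumptions, not the contents of `ext/js/src/manifest.js`: the helper names `readString`, `readTlv` and `readMetadata` are hypothetical, and the only library calls used are the standard `Buffer` read methods.

```javascript
// Sketch only: helper names are hypothetical, not taken from ext/js/src/manifest.js.

// Read a length-prefixed UTF-8 string (Section 4.2) starting at `offset`.
function readString(buf, offset) {
  if (offset + 4 > buf.length) throw new Error("truncated string length");
  const length = buf.readUInt32BE(offset); // byte count, not character count
  const start = offset + 4;
  if (start + length > buf.length) throw new Error("truncated string bytes");
  return { value: buf.toString("utf8", start, start + length), next: start + length };
}

// Read one TLV entry (Section 4.5): u16 tag, u32 length, then `length` raw bytes.
function readTlv(buf, offset) {
  if (offset + 6 > buf.length) throw new Error("truncated TLV header");
  const tag = buf.readUInt16BE(offset);
  const length = buf.readUInt32BE(offset + 2);
  const start = offset + 6;
  if (start + length > buf.length) throw new Error("truncated TLV value");
  return { tag, value: buf.subarray(start, start + length), next: start + length };
}

// Metadata tags from Section 4.6. Entries with unknown tags are skipped by length.
const METADATA_TAGS = { 1: "package", 2: "version", 3: "description", 4: "license", 5: "createdBy" };

function readMetadata(buf, offset, count) {
  const metadata = {};
  let pos = offset;
  for (let i = 0; i < count; i++) {
    const { tag, value, next } = readTlv(buf, pos);
    const name = METADATA_TAGS[tag];
    if (name !== undefined) metadata[name] = value.toString("utf8");
    pos = next; // advance past the whole entry, recognized or not
  }
  return { metadata, next: pos };
}

// Example input: createdBy = "arboricx", followed by an unknown tag 99 with 2 value bytes.
const sample = Buffer.concat([
  Buffer.from([0x00, 0x05, 0x00, 0x00, 0x00, 0x08]), Buffer.from("arboricx", "utf8"),
  Buffer.from([0x00, 0x63, 0x00, 0x00, 0x00, 0x02, 0xaa, 0xbb]),
]);
const parsed = readMetadata(sample, 0, 2);
console.log(parsed.metadata, parsed.next);
```

Returning the next offset from every reader keeps the parser a simple cursor walk, which is what makes the "skip by length" rule for unknown tags cheap to honor.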
+ +### 4.7 Semantic Constraints + +A valid bundle manifest must satisfy: + +| Constraint | Value | +|-----------|-------| +| `schema` | `"arboricx.bundle.manifest.v1"` | +| `bundleType` | `"tree-calculus-executable-object"` | +| `treeCalculus` | `"tree-calculus.v1"` | +| `treeHashAlgorithm` | `"sha256"` | +| `treeHashDomain` | `"arboricx.merkle.node.v1"` | +| `treeNodePayload` | `"arboricx.merkle.payload.v1"` | +| `runtimeSemantics` | `"tree-calculus.v1"` | +| `runtimeAbi` | `"arboricx.abi.tree.v1"` | +| `runtimeCapabilities` | Empty array | +| `closure` | `0` (complete) | +| `rootCount` | At least 1 | +| `exportCount` | At least 1 | +| Export names | Non-empty | +| Export roots | Non-empty (32 bytes each) | + +--- + +## 5. Section: Nodes (type 2) + +The nodes section contains all Merkle DAG nodes referenced by the manifest. It is a sequence of node entries preceded by a count. + +``` +NodesSection = + nodeCount u64 BE Total number of node entries + entries NodeEntry[] +``` + +Each node entry: + +``` +NodeEntry = + hash 32 bytes Raw SHA-256 hash of this node + payloadLen u32 BE Length of the payload in bytes + payload byte[payloadLen] Node payload (see Section 6) +``` + +The node count is `u64` to support large bundles. Entries are stored in the order produced by the exporter (typically sorted by hash for determinism). + +--- + +## 6. Merkle Node Payload Format + +Each node in the Merkle DAG is one of three types. The payload is a single byte type tag followed by hash references: + +### Leaf + +``` +Payload = 0x00 +``` + +A leaf has no children. The payload is exactly 1 byte. + +### Stem + +``` +Payload = 0x01 || child_hash (32 bytes raw) +``` + +A stem has exactly one child. The payload is 33 bytes. + +### Fork + +``` +Payload = 0x02 || left_hash (32 bytes raw) || right_hash (32 bytes raw) +``` + +A fork has exactly two children. The payload is 65 bytes. + +**Validation:** +- Leaf payloads must be exactly 1 byte (`0x00`). +- Stem payloads must be exactly 33 bytes. 
+- Fork payloads must be exactly 65 bytes.
+- Unknown type bytes are rejected.
+
+---
+
+## 7. Merkle Hash Computation
+
+Each node is identified by a SHA-256 hash over its domain-tagged canonical payload:
+
+```
+hash = SHA256( domain_tag || 0x00 || payload )
+```
+
+Where:
+
+| Component | Value |
+|-----------|-------|
+| `domain_tag` | `"arboricx.merkle.node.v1"` as UTF-8 bytes |
+| Separator | `0x00` (one zero byte) |
+| `payload` | The node's canonical serialization from Section 6 |
+
+**Examples:**
+
+- **Leaf:** `SHA256("arboricx.merkle.node.v1" || 0x00 || 0x00)`
+- **Stem:** `SHA256("arboricx.merkle.node.v1" || 0x00 || 0x01 || child_hash_bytes)`
+- **Fork:** `SHA256("arboricx.merkle.node.v1" || 0x00 || 0x02 || left_hash_bytes || right_hash_bytes)`
+
+The resulting 32-byte hash is stored as raw bytes in both the manifest (Sections 4.3 and 4.4) and the nodes section. The 64-character hex form is a textual rendering only and does not appear in the binary format.
+
+---
+
+## 8. Tree Calculus Reduction Semantics
+
+The bundle represents a **Tree Calculus** term as a Merkle DAG. The reduction rules are:
+
+### Apply Rules
+
+```
+apply(Fork(Leaf, a), _)                = a
+apply(Fork(Stem(a), b), c)             = apply(apply(a, c), apply(b, c))
+apply(Fork(Fork(a, b), c), Leaf)       = a
+apply(Fork(Fork(a, b), c), Stem(u))    = apply(b, u)
+apply(Fork(Fork(a, b), c), Fork(u, v)) = apply(apply(c, u), v)
+apply(Leaf, b)                         = Stem(b)
+apply(Stem(a), b)                      = Fork(a, b)
+```
+
+### Internal Representation
+
+In the reduction engine, Fork nodes use a `[right, left]` (stack) ordering:
+- `Fork = [right_child, left_child]`
+- `Stem = [child]`
+- `Leaf = []`
+
+This ordering supports stack-based reduction: pop two terms, apply, push results back.
+
+### Closure
+
+The manifest declares `closure = 0` (complete), meaning all nodes reachable from export roots are present in the nodes section. No external references exist.
+
+---
+
+## 9. Binary Primitives
+
+All multi-byte integers use **big-endian** byte order.
+
+### u16 (2 bytes)
+
+```
+byte[0] | byte[1]
+value = (byte[0] << 8) | byte[1]
+```
+
+### u32 (4 bytes)
+
+```
+byte[0] | byte[1] | byte[2] | byte[3]
+value = (byte[0] << 24) | (byte[1] << 16) | (byte[2] << 8) | byte[3]
+```
+
+### u64 (8 bytes)
+
+```
+byte[0] ... byte[7]
+value = (byte[0] << 56) | ... | byte[7]
+```
+
+### u8 (1 byte)
+
+A single byte, value `0` to `255`.
+
+---
+
+## 10. Bundle Verification
+
+A complete bundle verification proceeds in this order:
+
+1. **Magic check:** First 8 bytes must be `"ARBORICX"`.
+2. **Version check:** Major version must be `1`.
+3. **Section directory:** Parse all entries; reject unknown critical sections.
+4. **Digest verification:** For each section, compute `SHA256(section_data)` and compare with the digest in the directory entry.
+5. **Manifest parsing:** Decode the fixed-order manifest; validate semantic constraints.
+6. **Node section:** Parse all node entries; reject duplicates.
+7. **Root verification:** All root hashes from the manifest must exist in the node map.
+8. **Export verification:** All export root hashes must exist in the node map.
+9. **Node hash verification:** For each node, compute `SHA256(domain || 0x00 || payload)` and compare with the stored hash.
+10. **Children verification:** For each Stem/Fork node, every referenced child hash (one for a Stem, two for a Fork) must exist in the node map.
+11. **Closure verification:** Starting from each root hash, traverse the DAG and confirm all reachable nodes are present.
+
+---
+
+## 11. Known Section Types
+
+| Type | Name | Required | Version | Description |
+|------|------|----------|---------|-------------|
+| 1 | Manifest | Yes | 1 | Bundle metadata in fixed-order binary format |
+| 2 | Nodes | Yes | 1 | Merkle DAG node entries |
+
+Unknown section types are permitted as long as they are not marked critical (flags bit 0 is not set).
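Steps 9 and 10 of the verification order in Section 10, together with the payload rules of Section 6, can be sketched as follows. This is a hedged illustration rather than the reference runtime: the function names (`nodeHashPreimage`, `childHashes`, `findDanglingChildren`) and the choice of a `Map` keyed by hex hash are assumptions made for the example.

```javascript
// Sketch only: names and data shapes are hypothetical, not the API of ext/js/src/.
const DOMAIN_TAG = Buffer.from("arboricx.merkle.node.v1", "utf8");

// Bytes that are fed to SHA-256 for a node: domain_tag || 0x00 || payload (Section 7).
function nodeHashPreimage(payload) {
  return Buffer.concat([DOMAIN_TAG, Buffer.from([0x00]), payload]);
}

// Validate a payload per Section 6 and return the child hashes it references, as hex.
function childHashes(payload) {
  if (payload.length === 0) throw new Error("empty payload");
  const tag = payload[0];
  if (tag === 0x00 && payload.length === 1) return []; // Leaf
  if (tag === 0x01 && payload.length === 33) {
    return [payload.subarray(1, 33).toString("hex")]; // Stem: one child
  }
  if (tag === 0x02 && payload.length === 65) {
    return [
      payload.subarray(1, 33).toString("hex"), // Fork: left child
      payload.subarray(33, 65).toString("hex"), // Fork: right child
    ];
  }
  throw new Error(`invalid node payload: tag ${tag}, length ${payload.length}`);
}

// Children check (step 10). `nodes` is assumed to be a Map from hex hash to payload Buffer.
function findDanglingChildren(nodes) {
  const missing = [];
  for (const [hash, payload] of nodes) {
    for (const child of childHashes(payload)) {
      if (!nodes.has(child)) missing.push({ parent: hash, child });
    }
  }
  return missing;
}
```

For step 9, a Node.js reader would typically hash the preimage with `createHash("sha256").update(nodeHashPreimage(payload)).digest()` from `node:crypto` and compare the result with the stored 32-byte hash. An empty result from `findDanglingChildren` means every Stem and Fork reference resolves inside the bundle.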
+ +--- + +## Appendix A: Complete Example Layout (id.arboricx) + +A minimal `id.arboricx` bundle has: + +``` ++---------------------------------------------------+ +| Header (32 bytes) | +| Magic: "ARBORICX" | +| Major: 1, Minor: 0 | +| Section count: 2 | +| Flags: 0 | +| Dir offset: 32 | ++---------------------------------------------------+ +| Section Directory (120 bytes = 2 × 60) | +| Entry 0: type=1 (manifest), offset=152, len=375 | +| Entry 1: type=2 (nodes), offset=527, len=284 | ++---------------------------------------------------+ +| Manifest Section (375 bytes) | +| Magic: "ARBMNFST" | +| Version: 1.0 | +| Core strings (schema, bundleType, tree spec, | +| runtime spec, capabilities, closure, roots, | +| exports, metadata TLVs, extension fields) | ++---------------------------------------------------+ +| Nodes Section (284 bytes) | +| Node count: 2 | +| Node entry 1: hash + payload (Leaf) | +| Node entry 2: hash + payload (Fork) | ++---------------------------------------------------+ +``` + +The manifest section starts at byte 152 (0x98) and the nodes section at byte 527 (0x20F). + +--- + +## Appendix B: File Extension + +Bundles produced by the `tricu` tool use the `.arboricx` file extension. The `.tri` extension is used for plain source files; the `.arboricx` extension identifies the portable binary format. 
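The offsets in Appendix A follow directly from the header and directory sizes in Sections 2 and 3: with two sections, the first payload starts at `32 + 2 × 60 = 152`. The sketch below shows one way a reader might parse those two structures in Node.js. It is an illustration under assumptions, not the code in `ext/js/src/bundle.js`: `parseHeader` and `parseDirectory` are hypothetical names, and `u64` offsets are narrowed to `Number` on the assumption that real bundles stay below 2^53 bytes.

```javascript
// Sketch only: parseHeader/parseDirectory are hypothetical names.
const MAGIC = Buffer.from("ARBORICX", "ascii");
const HEADER_SIZE = 32;
const ENTRY_SIZE = 60;

function parseHeader(buf) {
  if (buf.length < HEADER_SIZE) throw new Error("bundle shorter than 32-byte header");
  if (!buf.subarray(0, 8).equals(MAGIC)) throw new Error("bad magic");
  const major = buf.readUInt16BE(8);
  if (major !== 1) throw new Error(`unsupported major version ${major}`);
  return {
    major,
    minor: buf.readUInt16BE(10),
    sectionCount: buf.readUInt32BE(12),
    flags: buf.readBigUInt64BE(16), // reserved; kept as BigInt
    // Assumption: offsets fit in 53 bits, so Number is safe here.
    dirOffset: Number(buf.readBigUInt64BE(24)),
  };
}

function parseDirectory(buf, header) {
  const entries = [];
  for (let i = 0; i < header.sectionCount; i++) {
    const base = header.dirOffset + i * ENTRY_SIZE;
    if (base + ENTRY_SIZE > buf.length) throw new Error("truncated section directory");
    entries.push({
      type: buf.readUInt32BE(base),
      version: buf.readUInt16BE(base + 4),
      critical: (buf.readUInt16BE(base + 6) & 0x0001) !== 0,
      compression: buf.readUInt16BE(base + 8),
      digestAlgorithm: buf.readUInt16BE(base + 10),
      offset: Number(buf.readBigUInt64BE(base + 12)),
      length: Number(buf.readBigUInt64BE(base + 20)),
      digest: buf.subarray(base + 28, base + 60), // raw SHA-256 of the section data
    });
  }
  return entries;
}
```

JavaScript bitwise operators work on 32-bit integers, so the `byte[0] << 56` form from Section 9 cannot be used directly for `u64` fields; `readBigUInt64BE` is the usual way around that in Node.js.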
diff --git a/ext/js/src/bundle.js b/ext/js/src/bundle.js index 911b1a5..1179ac7 100644 --- a/ext/js/src/bundle.js +++ b/ext/js/src/bundle.js @@ -18,12 +18,12 @@ * Offset 8B u64 BE * Length 8B u64 BE * SHA256Digest 32B raw - * Manifest: canonical CBOR-encoded map (cborg output from Haskell) + * Manifest: fixed-order core + TLV tail (ARBMNFST magic) * Nodes: binary section */ import { createHash } from "node:crypto"; -import { decodeCbor } from "./cbor.js"; +import { decodeManifest } from "./manifest.js"; // ── Constants ─────────────────────────────────────────────────────────────── @@ -173,37 +173,12 @@ export function parseBundle(buffer) { } /** - * Post-process a CBOR-decoded manifest to normalize hash fields - * from raw bytes to hex strings (matching the old JSON wire format). - */ -function normalizeManifest(raw) { - const tree = raw.tree; - if (tree && tree.nodeHash && tree.nodeHash.domain) { - tree.nodeHash.domain = tree.nodeHash.domain; - } - - // Convert root hashes from raw bytes to hex - const roots = (raw.roots || []).map((r) => ({ - ...r, - hash: r.hash instanceof Uint8Array ? Buffer.from(r.hash).toString("hex") : r.hash, - })); - - // Convert export root hashes from raw bytes to hex - const exports = (raw.exports || []).map((e) => ({ - ...e, - root: e.root instanceof Uint8Array ? Buffer.from(e.root).toString("hex") : e.root, - })); - - return { ...raw, roots, exports }; -} - -/** - * Convenience: parse and return the manifest from CBOR. + * Convenience: parse and return the manifest from the fixed-order binary format. 
*/ export function parseManifest(buffer) { const bundle = parseBundle(buffer); const manifestEntry = bundle.sections.get(SECTION_MANIFEST); - return normalizeManifest(decodeCbor(manifestEntry.data)); + return decodeManifest(manifestEntry.data); } /** diff --git a/ext/js/src/cbor.js b/ext/js/src/cbor.js deleted file mode 100644 index 352619f..0000000 --- a/ext/js/src/cbor.js +++ /dev/null @@ -1,130 +0,0 @@ -/** - * cbor.js — Minimal CBOR decoder for the Arboricx manifest format. - * - * Decodes the canonical CBOR produced by the Haskell cborg library: - * - Maps: major type 5 (0xa0 + length) - * - Arrays: major type 4 (0x80 + length) - * - Text strings: major type 3, UTF-8 encoded - * - Byte strings: major type 2 - * - Unsigned ints: major type 0 - * - Simple values: 0xc2 = false, 0xc3 = true - * - * Only covers the subset needed for the manifest. - */ - -// ── Decoding state ────────────────────────────────────────────────────────── - -/** - * @param {Buffer} data - * @returns {number} remaining buffer - */ -function makeDecoder(data) { - let offset = 0; - - return { - /** @returns {number} current offset */ - getPos() { return offset; }, - - /** @returns {number} remaining bytes */ - remaining() { return data.length - offset; }, - - /** @returns {number} total length */ - length() { return data.length; }, - - /** Read N bytes and advance */ - read(n) { - if (offset + n > data.length) { - throw new Error(`CBOR read: expected ${n} bytes, ${data.length - offset} remaining at offset ${offset}`); - } - const slice = data.slice(offset, offset + n); - offset += n; - return slice; - }, - - /** Read a single byte */ - readByte() { - if (offset >= data.length) { - throw new Error(`CBOR readByte: no bytes remaining at offset ${offset}`); - } - return data[offset++]; - }, - }; -} - -// ── CBOR helpers ──────────────────────────────────────────────────────────── - -/** - * Read a CBOR length (major type initial byte encodes length for values < 24). 
- * For 24+, reads additional bytes per spec. - * @returns {number} - */ -function cborReadLength(dec, startByte) { - const additional = startByte & 0x1f; - if (additional < 24) return additional; - if (additional === 24) return dec.read(1)[0]; - if (additional === 25) return dec.read(2).readUint16BE(0); - if (additional === 26) return dec.read(4).readUint32BE(0); - throw new Error(`CBOR: unsupported additional info ${additional}`); -} - -// ── Top-level decode ──────────────────────────────────────────────────────── - -/** - * Decode a single CBOR value from buffer bytes. - * @param {Buffer} buf - * @returns {*} - */ -export function decodeCbor(buf) { - const dec = makeDecoder(buf); - const result = cborDecode(dec); - return result; -} - -function cborDecode(dec) { - const first = dec.readByte(); - const major = (first >> 5) & 0x07; - const info = first & 0x1f; - - switch (major) { - case 0: // unsigned int - case 1: // negative int - return cborReadLength(dec, first); - - case 2: // byte string - return dec.read(cborReadLength(dec, first)); - - case 3: // text string (UTF-8) - const len = cborReadLength(dec, first); - return dec.read(len).toString("utf-8"); - - case 4: // array - const arrLen = cborReadLength(dec, first); - const arr = []; - for (let i = 0; i < arrLen; i++) { - arr.push(cborDecode(dec)); - } - return arr; - - case 5: // map - const mapLen = cborReadLength(dec, first); - const map = {}; - for (let i = 0; i < mapLen; i++) { - const key = cborDecode(dec); - const val = cborDecode(dec); - map[key] = val; - } - return map; - - case 7: // simple values / floats - if (info === 20) return false; - if (info === 21) return true; - if (info === 22) return null; // undefined - if (info === 23) return null; // break (shouldn't appear in definite-length) - // 0xf9-fb are half/float/double floats — not used by our writer - throw new Error(`CBOR: unsupported simple value ${info}`); - - default: - // Tags (major 6) and break (0xff) — not used in our manifest - 
throw new Error(`CBOR: unsupported major type ${major}, info ${info}`); - } -} diff --git a/ext/js/src/manifest.js b/ext/js/src/manifest.js index f6bef29..4a55b3b 100644 --- a/ext/js/src/manifest.js +++ b/ext/js/src/manifest.js @@ -1,13 +1,220 @@ /** - * manifest.js — Minimal manifest parsing and export lookup. + * manifest.js — Fixed-order manifest parsing and export lookup. * - * The manifest is a JSON object with fields: - * schema, bundleType, tree, runtime, closure, roots, exports, - * imports, sections, metadata + * The manifest binary format (ManifestV1): + * magic(8) + major(u16) + minor(u16) + * + schema(string) + bundleType(string) + * + treeCalculus(string) + treeHashAlgorithm(string) + treeHashDomain(string) + treeNodePayload(string) + * + runtimeSemantics(string) + runtimeEvaluation(string) + runtimeAbi(string) + * + capabilityCount(u32) + capabilities(string[]) + * + closure(u8) + * + rootCount(u32) + roots[] + * + exportCount(u32) + exports[] + * + metadataFieldCount(u32) + metadataTLVs[] + * + extensionFieldCount(u32) + extensionTLVs[] * - * We parse only what we need for runtime entrypoint selection. + * String format: u32 BE length + UTF-8 bytes. + * Root: 32 bytes raw hash + role(string). + * Export: name(string) + 32 bytes raw root hash + kind(string) + abi(string). + * TLV: u16 tag + u32 length + value bytes. 
*/ +// ── Constants ─────────────────────────────────────────────────────────────── + +const MANIFEST_MAGIC = "ARBMNFST"; +const MANIFEST_MAJOR = 1; +const MANIFEST_MINOR = 0; + +// Metadata TLV tags +const TAG_PACKAGE = 1; +const TAG_VERSION = 2; +const TAG_DESCRIPTION = 3; +const TAG_LICENSE = 4; +const TAG_CREATED_BY = 5; + +// Closure bytes +const CLOSURE_COMPLETE = 0; +const CLOSURE_PARTIAL = 1; + +// ── Binary helpers ────────────────────────────────────────────────────────── + +function u16(buf, off) { + if (off + 2 > buf.length) throw new Error("manifest: not enough bytes for u16"); + return { value: buf.readUint16BE(off), next: off + 2 }; +} + +function u32(buf, off) { + if (off + 4 > buf.length) throw new Error("manifest: not enough bytes for u32"); + return { value: buf.readUint32BE(off), next: off + 4 }; +} + +function u8(buf, off) { + if (off >= buf.length) throw new Error("manifest: not enough bytes for u8"); + return { value: buf.readUint8(off), next: off + 1 }; +} + +/** + * Read a length-prefixed UTF-8 string: u32 BE length + UTF-8 bytes. + * Returns { text, next }. + */ +function readStr(buf, off) { + const { value: len, next: afterLen } = u32(buf, off); + if (afterLen + len > buf.length) throw new Error("manifest: string extends beyond input"); + return { text: buf.toString("utf-8", afterLen, afterLen + len), next: afterLen + len }; +} + +/** + * Read raw bytes of given length. + * Returns { value, next }. + */ +function readRaw(buf, off, n) { + if (off + n > buf.length) throw new Error(`manifest: not enough bytes for ${n}-byte read`); + return { value: buf.slice(off, off + n), next: off + n }; +} + +// ── Manifest decoder ──────────────────────────────────────────────────────── + +/** + * Decode the manifest binary from a Buffer. + * + * Returns a normalized manifest object matching the shape expected + * by validateManifest / selectExport. 
+ */ +export function decodeManifest(buf) { + let off = 0; + + // Magic (8 bytes) + const magic = buf.toString("utf-8", 0, 8); + if (magic !== MANIFEST_MAGIC) { + throw new Error(`invalid manifest magic: expected ${MANIFEST_MAGIC}, got "${magic}"`); + } + off = 8; + + // Version + const { value: major } = u16(buf, off); + if (major !== MANIFEST_MAJOR) throw new Error(`unsupported manifest major version: ${major}`); + off += 4; // u16 major + u16 minor + + // Helper: read length-prefixed text + const readText = () => { + const { text, next } = readStr(buf, off); + off = next; + return text; + }; + + // Core strings + const schema = readText(); + const bundleType = readText(); + const treeCalculus = readText(); + const treeHashAlgorithm = readText(); + const treeHashDomain = readText(); + const treeNodePayload = readText(); + const runtimeSemantics = readText(); + const runtimeEvaluation = readText(); + const runtimeAbi = readText(); + + // Capabilities (u32 count + string[]) + const { value: capCount } = u32(buf, off); + off += 4; + const capabilities = []; + for (let i = 0; i < capCount; i++) { + capabilities.push(readText()); + } + + // Closure (u8) + const { value: closureByte } = u8(buf, off); + off += 1; + const closure = closureByte === CLOSURE_COMPLETE ? 
"complete" : "partial"; + + // Roots (u32 count + Root[]) + // Root: 32 bytes raw hash + role(string) + const { value: rootCount } = u32(buf, off); + off += 4; + const roots = []; + for (let i = 0; i < rootCount; i++) { + const { value: hashRaw } = readRaw(buf, off, 32); + off += 32; + const { text: role, next: rOff } = readStr(buf, off); + off = rOff; + roots.push({ hash: hashRaw.toString("hex"), role }); + } + + // Exports (u32 count + Export[]) + // Export: name(string) + 32 bytes raw root hash + kind(string) + abi(string) + const { value: exportCount } = u32(buf, off); + off += 4; + const exports = []; + for (let i = 0; i < exportCount; i++) { + const { text: name, next: nOff } = readStr(buf, off); + off = nOff; + const { value: expHashRaw } = readRaw(buf, off, 32); + off += 32; + const { text: kind, next: kOff } = readStr(buf, off); + off = kOff; + const { text: abi, next: aOff } = readStr(buf, off); + off = aOff; + exports.push({ name, root: expHashRaw.toString("hex"), kind, abi }); + } + + // Metadata (u32 count + TLV[]) + // TLV: u16 tag + u32 length + value bytes + const { value: metaCount } = u32(buf, off); + off += 4; + const metadata = {}; + for (let i = 0; i < metaCount; i++) { + const { value: tag } = u16(buf, off); + off += 2; + const { value: tlvLen } = u32(buf, off); + off += 4; + const { value: tlvRaw } = readRaw(buf, off, tlvLen); + off += tlvLen; + const val = tlvRaw.toString("utf-8"); + switch (tag) { + case TAG_PACKAGE: metadata.package = val; break; + case TAG_VERSION: metadata.version = val; break; + case TAG_DESCRIPTION: metadata.description = val; break; + case TAG_LICENSE: metadata.license = val; break; + case TAG_CREATED_BY: metadata.createdBy = val; break; + } + } + + // Extensions (u32 count + TLV[] — skip all) + const { value: extCount } = u32(buf, off); + off += 4; + for (let i = 0; i < extCount; i++) { + const { value: _tag } = u16(buf, off); + off += 2; + const { value: tlvLen } = u32(buf, off); + off += 4; + off += tlvLen; // skip 
value + } + + return { + schema, + bundleType, + tree: { + calculus: treeCalculus, + nodeHash: { + algorithm: treeHashAlgorithm, + domain: treeHashDomain, + }, + nodePayload: treeNodePayload, + }, + runtime: { + semantics: runtimeSemantics, + evaluation: runtimeEvaluation, + abi: runtimeAbi, + capabilities, + }, + closure, + roots, + exports, + metadata: Object.keys(metadata).length > 0 ? metadata : undefined, + }; +} + +// ── Validation ────────────────────────────────────────────────────────────── + /** * Validate the manifest against the runtime profile requirements. * Throws on violation. diff --git a/src/Wire.hs b/src/Wire.hs index b6ed741..1211a33 100644 --- a/src/Wire.hs +++ b/src/Wire.hs @@ -24,40 +24,22 @@ module Wire import ContentStore (getNodeMerkle, loadTree, putMerkleNode, storeTerm) import Research -import Codec.CBOR.Decoding ( Decoder - , decodeString - , decodeBytes - , decodeListLen - , decodeMapLen - ) -import Control.Monad (replicateM, forM) -import Codec.CBOR.Encoding ( Encoding - , encodeMapLen - , encodeListLen - , encodeString - , encodeBytes - ) -import Codec.CBOR.Write (toLazyByteString) -import Data.Monoid (mconcat) -import Codec.CBOR.Read (deserialiseFromBytes, DeserialiseFailure(..)) - import Control.Exception (SomeException, evaluate, try) import Control.Monad (foldM, unless, when) import Crypto.Hash (Digest, SHA256, hash) -import Data.Bits ((.&.), (.|.), shiftL, shiftR) +import Data.Bits ((.|.), (.&.), shiftL, shiftR) import Data.ByteArray (convert) import Data.ByteString (ByteString) import Data.Foldable (traverse_) import Data.Map (Map) import Data.Text (Text, unpack) -import Data.Text.Encoding (decodeUtf8, encodeUtf8) -import Data.Word (Word16, Word32, Word64) +import Data.Text.Encoding (decodeUtf8, decodeUtf8', encodeUtf8) +import Data.Word (Word16, Word32, Word64, Word8) import Database.SQLite.Simple (Connection) import GHC.Generics (Generic) import qualified Data.ByteString as BS import qualified Data.ByteString.Base16 as Base16 
-import qualified Data.ByteString.Lazy as BL import qualified Data.Map as Map import qualified Data.Set as Set import qualified Data.Text as T @@ -91,92 +73,316 @@ compressionNone = 0 digestSha256 = 1 -- --------------------------------------------------------------------------- --- CBOR encoding helpers +-- Manifest binary constants -- --------------------------------------------------------------------------- --- | Canonical CBOR map length encoder. -cmkLen :: Int -> Encoding -cmkLen n = encodeMapLen (fromIntegral n) +-- | Magic prefix identifying the fixed-order manifest v1 format. +manifestMagic :: ByteString +manifestMagic = "ARBMNFST" --- | Decode a CBOR array of n elements. -decodeListN :: Decoder s a -> Int -> Decoder s [a] -decodeListN dec n = replicateM n dec +-- | Manifest major version. +manifestMajorVersion :: Word16 +manifestMajorVersion = 1 --- | Decode a CBOR map (sequence of key-value pairs). -decodeMapN :: Decoder s a -> Decoder s b -> Int -> Decoder s [(a, b)] -decodeMapN keyDec valDec n = forM [1..n] $ \_ -> - keyDec >>= \k -> valDec >>= \v -> pure (k, v) +-- | Manifest minor version. +manifestMinorVersion :: Word16 +manifestMinorVersion = 0 -decodeKey :: Text -> Decoder s () -decodeKey expected = do - actual <- decodeString - unless (actual == expected) $ - fail $ "expected key " ++ show expected ++ ", got " ++ show actual +-- | Closure mode to byte. +closureToByte :: ClosureMode -> Word8 +closureToByte = \case + ClosureComplete -> 0 + ClosurePartial -> 1 --- | Canonical CBOR array length encoder. -cakLen :: Int -> Encoding -cakLen n = encodeListLen (fromIntegral n) +closureFromByte :: Word8 -> Either String ClosureMode +closureFromByte = \case + 0 -> Right ClosureComplete + 1 -> Right ClosurePartial + n -> Left $ "unsupported closure byte: " ++ show n --- | Encode a canonical CBOR map with key-value pairs as flat sequence. 
-cmkPairs :: [(Text, Encoding)] -> Encoding -cmkPairs [] = cmkLen 0 -cmkPairs kvs = cmkLen (length kvs) <> mconcat [encodeString k <> v | (k, v) <- kvs] - --- | Encode a canonical CBOR array. -cakSeq :: [Encoding] -> Encoding -cakSeq [] = cakLen 0 -cakSeq xs = cakLen (length xs) <> mconcat xs - --- | Encode a canonical CBOR text string. -encText :: Text -> Encoding -encText = encodeString - --- | Encode a canonical CBOR byte string. -encBytes :: ByteString -> Encoding -encBytes = encodeBytes +-- | Metadata tag constants. +tagPackage, tagVersion, tagDescription, tagLicense, tagCreatedBy :: Word16 +tagPackage = 1 +tagVersion = 2 +tagDescription = 3 +tagLicense = 4 +tagCreatedBy = 5 -- --------------------------------------------------------------------------- --- Data types with CBOR instances +-- Fixed-order manifest binary helpers +-- --------------------------------------------------------------------------- + +-- | Encode a UTF-8 text string as: u32 length + UTF-8 bytes. +encodeLengthPrefixedText :: Text -> ByteString +encodeLengthPrefixedText t = encode32 (fromIntegral $ BS.length bs) <> bs + where bs = encodeUtf8 t + +-- | Decode a length-prefixed UTF-8 text string. +-- Returns the decoded Text and the remaining ByteString. +decodeLengthPrefixedText :: ByteString -> Either String (Text, ByteString) +decodeLengthPrefixedText bs = + case decode32be "text_length" bs of + Left err -> Left $ "decodeLengthPrefixedText: " ++ err + Right (len, rest) -> do + let payloadLen = fromIntegral len + when (BS.length rest < payloadLen) $ + Left "decodeLengthPrefixedText: string extends beyond input" + let (textBytes, after) = BS.splitAt payloadLen rest + case decodeUtf8' textBytes of + Right txt -> Right (txt, after) + Left _ -> Left "decodeLengthPrefixedText: invalid UTF-8" + +-- | Encode a metadata value as a TLV entry: u16 tag + u32 length + raw bytes. 
+encodeMetadataTLV :: Word16 -> ByteString -> ByteString +encodeMetadataTLV tag val = encode16 tag <> encode32 (fromIntegral $ BS.length val) <> val + +-- --------------------------------------------------------------------------- +-- Fixed-order manifest encoders +-- --------------------------------------------------------------------------- + +-- | Encode the entire manifest in fixed-order core + TLV tail layout. +encodeManifest :: BundleManifest -> ByteString +encodeManifest m = + manifestMagic + <> encode16 manifestMajorVersion + <> encode16 manifestMinorVersion + <> encodeLengthPrefixedText (manifestSchema m) + <> encodeLengthPrefixedText (manifestBundleType m) + <> encodeLengthPrefixedText (treeCalculus (manifestTree m)) + <> encodeLengthPrefixedText (nodeHashAlgorithm (treeNodeHash (manifestTree m))) + <> encodeLengthPrefixedText (nodeHashDomain (treeNodeHash (manifestTree m))) + <> encodeLengthPrefixedText (treeNodePayload (manifestTree m)) + <> encodeLengthPrefixedText (runtimeSemantics (manifestRuntime m)) + <> encodeLengthPrefixedText (runtimeEvaluation (manifestRuntime m)) + <> encodeLengthPrefixedText (runtimeAbi (manifestRuntime m)) + <> encode32 (fromIntegral $ length (runtimeCapabilities (manifestRuntime m))) + <> encodeCapabilities (runtimeCapabilities (manifestRuntime m)) + <> BS.pack [closureToByte (manifestClosure m)] + <> encode32 (fromIntegral $ length (manifestRoots m)) + <> encodeRoots (manifestRoots m) + <> encode32 (fromIntegral $ length (manifestExports m)) + <> encodeExports (manifestExports m) + <> encodeMetadataTLVs (manifestMetadata m) + <> encode32 0 -- zero extension fields + +encodeCapabilities :: [Text] -> ByteString +encodeCapabilities caps = mconcat (map encodeLengthPrefixedText caps) + +encodeRoots :: [BundleRoot] -> ByteString +encodeRoots = mconcat . 
map encodeRoot + +encodeRoot :: BundleRoot -> ByteString +encodeRoot root = + merkleHashToRaw (rootHash root) + <> encodeLengthPrefixedText (rootRole root) + +encodeExports :: [BundleExport] -> ByteString +encodeExports = mconcat . map encodeExport + +encodeExport :: BundleExport -> ByteString +encodeExport exp = + encodeLengthPrefixedText (exportName exp) + <> merkleHashToRaw (exportRoot exp) + <> encodeLengthPrefixedText (exportKind exp) + <> encodeLengthPrefixedText (exportAbi exp) + +-- | Encode metadata as: u32 field count + TLV entries for present fields. +-- Metadata TLV values are raw UTF-8 bytes; the TLV length already carries size. +encodeMetadataTLVs :: BundleMetadata -> ByteString +encodeMetadataTLVs m = + let entries = metadataTLVEntries m + in encode32 (fromIntegral $ length entries) <> encodeTLVs entries + +metadataTLVEntries :: BundleMetadata -> [(Word16, ByteString)] +metadataTLVEntries m = + maybeEntry tagPackage (metadataPackage m) + ++ maybeEntry tagVersion (metadataVersion m) + ++ maybeEntry tagDescription (metadataDescription m) + ++ maybeEntry tagLicense (metadataLicense m) + ++ maybeEntry tagCreatedBy (metadataCreatedBy m) + where + maybeEntry _ Nothing = [] + maybeEntry tag (Just value) = [(tag, encodeUtf8 value)] + +encodeTLVs :: [(Word16, ByteString)] -> ByteString +encodeTLVs tlvs = mconcat (map (uncurry encodeMetadataTLV) tlvs) + +-- --------------------------------------------------------------------------- +-- Fixed-order manifest decoders +-- --------------------------------------------------------------------------- + +-- | Decode the manifest from fixed-order core + TLV tail bytes. +-- All remaining bytes after the core fields are treated as the TLV tail. 
+decodeManifest :: ByteString -> Either String BundleManifest +decodeManifest bs = do + -- Header + when (BS.length bs < 8) $ Left "manifest too short for magic" + when (BS.take 8 bs /= manifestMagic) $ Left "invalid manifest magic" + let rest = BS.drop 8 bs + (major, rest') <- decode16be "major" rest + when (major /= manifestMajorVersion) $ Left $ "unsupported manifest major version: " ++ show major + (_minor, rest'') <- decode16be "minor" rest' + + -- Core strings + (schema, rest''') <- decodeLengthPrefixedText rest'' + (bundleType, rest'''') <- decodeLengthPrefixedText rest''' + + -- Tree spec fields (flat) + (calc, rest1) <- decodeLengthPrefixedText rest'''' + (alg, rest2) <- decodeLengthPrefixedText rest1 + (domain, rest3) <- decodeLengthPrefixedText rest2 + (payload, rest4) <- decodeLengthPrefixedText rest3 + + -- Runtime spec fields (flat) + (sem, restR1) <- decodeLengthPrefixedText rest4 + (eval, restR2) <- decodeLengthPrefixedText restR1 + (abi, restR3) <- decodeLengthPrefixedText restR2 + + (capCount, restR4) <- decode32be "capability_count" restR3 + let capLen = fromIntegral capCount + (caps, restR5) <- decodeCapabilities capLen restR4 + + -- Closure + when (BS.length restR5 < 1) $ Left "manifest truncated: missing closure byte" + let (closureByte, restR6) = BS.splitAt 1 restR5 + closure <- closureFromByte (head $ BS.unpack closureByte) + + -- Roots + (rootCount, restR7) <- decode32be "root_count" restR6 + let rootCountInt = fromIntegral rootCount + (roots, restR8) <- decodeRoots rootCountInt restR7 + + -- Exports + (exportCount, restR9) <- decode32be "export_count" restR8 + let exportCountInt = fromIntegral exportCount + (exports, restR10) <- decodeExports exportCountInt restR9 + + -- TLV tail + (metadata, _ext) <- decodeMetadataAndExtensions restR10 + + pure BundleManifest + { manifestSchema = schema + , manifestBundleType = bundleType + , manifestTree = TreeSpec + { treeCalculus = calc + , treeNodeHash = NodeHashSpec + { nodeHashAlgorithm = alg + , 
nodeHashDomain = domain + } + , treeNodePayload = payload + } + , manifestRuntime = RuntimeSpec + { runtimeSemantics = sem + , runtimeEvaluation = eval + , runtimeAbi = abi + , runtimeCapabilities = caps + } + , manifestClosure = closure + , manifestRoots = roots + , manifestExports = exports + , manifestMetadata = metadata + } + +-- | Decode length-prefixed capability strings. +decodeCapabilities :: Int -> ByteString -> Either String ([Text], ByteString) +decodeCapabilities 0 bs = Right ([], bs) +decodeCapabilities n bs = do + (txt, rest) <- decodeLengthPrefixedText bs + (restTxts, restFinal) <- decodeCapabilities (n - 1) rest + Right (txt : restTxts, restFinal) + +-- | Decode root entries. +decodeRoots :: Int -> ByteString -> Either String ([BundleRoot], ByteString) +decodeRoots 0 bs = Right ([], bs) +decodeRoots n bs = do + when (BS.length bs < 32) $ Left "decodeRoots: truncated root hash" + let (hashBytes, rest) = BS.splitAt 32 bs + role <- decodeLengthPrefixedText rest + (restRoots, restFinal) <- decodeRoots (n - 1) (snd role) + Right (BundleRoot (rawToMerkleHash hashBytes) (fst role) : restRoots, restFinal) + +-- | Decode export entries. +decodeExports :: Int -> ByteString -> Either String ([BundleExport], ByteString) +decodeExports 0 bs = Right ([], bs) +decodeExports n bs = do + name <- decodeLengthPrefixedText bs + when (BS.length (snd name) < 32) $ Left "decodeExports: truncated export root hash" + let (hashBytes, rest) = BS.splitAt 32 (snd name) + kind <- decodeLengthPrefixedText rest + abi <- decodeLengthPrefixedText (snd kind) + (restExports, restFinal) <- decodeExports (n - 1) (snd abi) + Right (BundleExport (fst name) (rawToMerkleHash hashBytes) (fst kind) (fst abi) : restExports, restFinal) + +-- | Decode TLV tail into metadata and extensions. +-- Layout: u32 metadata-count, metadata TLVs, u32 extension-count, extension TLVs. +-- For now, known metadata tags are decoded and extension TLVs are skipped. 
+decodeMetadataAndExtensions :: ByteString -> Either String (BundleMetadata, ByteString) +decodeMetadataAndExtensions bs = do + (metadataCount, rest1) <- decode32be "metadata_field_count" bs + (metadataTlvs, rest2) <- decodeTLVs (fromIntegral metadataCount) rest1 + metadata <- decodeMetadataTLVs metadataTlvs + (extensionCount, rest3) <- decode32be "extension_field_count" rest2 + (_extensionTlvs, rest4) <- decodeTLVs (fromIntegral extensionCount) rest3 + unless (BS.null rest4) $ Left "trailing bytes after manifest TLV tail" + Right (metadata, rest4) + +-- | Decode a fixed number of TLV entries. +decodeTLVs :: Int -> ByteString -> Either String ([TLVEntry], ByteString) +decodeTLVs 0 bs = Right ([], bs) +decodeTLVs n bs = do + (tag, rest1) <- decode16be "tlv_tag" bs + (len, rest2) <- decode32be "tlv_length" rest1 + let payloadLen = fromIntegral len + when (BS.length rest2 < payloadLen) $ Left "TLV value extends beyond input" + let (value, after) = BS.splitAt payloadLen rest2 + (restTlvs, restFinal) <- decodeTLVs (n - 1) after + Right ((tag, value) : restTlvs, restFinal) + +-- | Decode known metadata TLV entries into BundleMetadata. +-- Unknown tags are ignored. 
+decodeMetadataTLVs :: [(Word16, ByteString)] -> Either String BundleMetadata +decodeMetadataTLVs tlvs = do + pkg <- decodeOptionalMetadataText tagPackage + ver <- decodeOptionalMetadataText tagVersion + desc <- decodeOptionalMetadataText tagDescription + lic <- decodeOptionalMetadataText tagLicense + by <- decodeOptionalMetadataText tagCreatedBy + pure BundleMetadata + { metadataPackage = pkg + , metadataVersion = ver + , metadataDescription = desc + , metadataLicense = lic + , metadataCreatedBy = by + } + where + lookupTag t = go t tlvs + go _ [] = Nothing + go t ((tag, val):rest) + | tag == t = Just val + | otherwise = go t rest + decodeOptionalMetadataText tag = + case lookupTag tag of + Nothing -> Right Nothing + Just raw -> case decodeUtf8' raw of + Right txt -> Right (Just txt) + Left _ -> Left $ "metadata TLV has invalid UTF-8 for tag " ++ show tag + +type TLVEntry = (Word16, ByteString) + +-- --------------------------------------------------------------------------- +-- Data types -- --------------------------------------------------------------------------- -- | Closure declaration. data ClosureMode = ClosureComplete | ClosurePartial deriving (Show, Eq, Ord, Generic) -toCBORClosure :: ClosureMode -> Encoding -toCBORClosure = encText . \case - ClosureComplete -> "complete" - ClosurePartial -> "partial" - -closureFromCBOR :: Decoder s ClosureMode -closureFromCBOR = decodeString >>= \case - "complete" -> pure ClosureComplete - "partial" -> pure ClosurePartial - other -> fail $ "ClosureMode: " ++ show other - -- | Hash specification (algorithm + domain strings). 
data NodeHashSpec = NodeHashSpec { nodeHashAlgorithm :: Text , nodeHashDomain :: Text } deriving (Show, Eq, Ord, Generic) -toCBORNodeHashSpec :: NodeHashSpec -> Encoding -toCBORNodeHashSpec (NodeHashSpec alg dom) = - cmkPairs - [ ("algorithm", encText alg) - , ("domain", encText dom) - ] - -nodeHashSpecFromCBOR :: Decoder s NodeHashSpec -nodeHashSpecFromCBOR = do - n <- decodeMapLen - unless (n == 2) $ fail "NodeHashSpec: must have exactly 2 entries" - decodeKey "algorithm" - alg <- decodeString - decodeKey "domain" - dom <- decodeString - pure (NodeHashSpec alg dom) - -- | Tree specification. data TreeSpec = TreeSpec { treeCalculus :: Text @@ -184,26 +390,6 @@ data TreeSpec = TreeSpec , treeNodePayload :: Text } deriving (Show, Eq, Ord, Generic) -toCBORTreeSpec :: TreeSpec -> Encoding -toCBORTreeSpec (TreeSpec calc hspec payload) = - cmkPairs - [ ("calculus", encText calc) - , ("nodeHash", toCBORNodeHashSpec hspec) - , ("nodePayload", encText payload) - ] - -treeSpecFromCBOR :: Decoder s TreeSpec -treeSpecFromCBOR = do - n <- decodeMapLen - unless (n == 3) $ fail "TreeSpec: must have exactly 3 entries" - decodeKey "calculus" - calc <- decodeString - decodeKey "nodeHash" - hspec <- nodeHashSpecFromCBOR - decodeKey "nodePayload" - payload <- decodeString - pure (TreeSpec calc hspec payload) - -- | Runtime specification. 
data RuntimeSpec = RuntimeSpec { runtimeSemantics :: Text @@ -212,53 +398,12 @@ data RuntimeSpec = RuntimeSpec , runtimeCapabilities :: [Text] } deriving (Show, Eq, Ord, Generic) -toCBORRuntimeSpec :: RuntimeSpec -> Encoding -toCBORRuntimeSpec (RuntimeSpec sem eval abi caps) = - cmkPairs - [ ("semantics", encText sem) - , ("evaluation", encText eval) - , ("abi", encText abi) - , ("capabilities", cakSeq (map encText caps)) - ] - -runtimeSpecFromCBOR :: Decoder s RuntimeSpec -runtimeSpecFromCBOR = do - n <- decodeMapLen - unless (n == 4) $ fail "RuntimeSpec: must have exactly 4 entries" - decodeKey "semantics" - sem <- decodeString - decodeKey "evaluation" - eval <- decodeString - decodeKey "abi" - abi <- decodeString - decodeKey "capabilities" - clen <- decodeListLen - caps <- decodeListN decodeString clen - pure (RuntimeSpec sem eval abi caps) - -- | A root hash reference. data BundleRoot = BundleRoot { rootHash :: MerkleHash , rootRole :: Text } deriving (Show, Eq, Ord, Generic) -toCBORBundleRoot :: BundleRoot -> Encoding -toCBORBundleRoot (BundleRoot h role) = - cmkPairs - [ ("hash", encBytes (merkleHashToRaw h)) - , ("role", encText role) - ] - -bundleRootFromCBOR :: Decoder s BundleRoot -bundleRootFromCBOR = do - n <- decodeMapLen - unless (n == 2) $ fail "BundleRoot: must have exactly 2 entries" - decodeKey "hash" - hRaw <- decodeBytes - decodeKey "role" - role <- decodeString - pure (BundleRoot (rawToMerkleHash hRaw) role) - -- | An export entry. 
data BundleExport = BundleExport { exportName :: Text @@ -267,29 +412,6 @@ data BundleExport = BundleExport , exportAbi :: Text } deriving (Show, Eq, Ord, Generic) -toCBORBundleExport :: BundleExport -> Encoding -toCBORBundleExport (BundleExport name h kind abi) = - cmkPairs - [ ("name", encText name) - , ("root", encBytes (merkleHashToRaw h)) - , ("kind", encText kind) - , ("abi", encText abi) - ] - -bundleExportFromCBOR :: Decoder s BundleExport -bundleExportFromCBOR = do - n <- decodeMapLen - unless (n == 4) $ fail "BundleExport: must have exactly 4 entries" - decodeKey "name" - name <- decodeString - decodeKey "root" - hRaw <- decodeBytes - decodeKey "kind" - kind <- decodeString - decodeKey "abi" - abi <- decodeString - pure (BundleExport name (rawToMerkleHash hRaw) kind abi) - -- | Optional package metadata. data BundleMetadata = BundleMetadata { metadataPackage :: Maybe Text @@ -299,33 +421,6 @@ data BundleMetadata = BundleMetadata , metadataCreatedBy :: Maybe Text } deriving (Show, Eq, Ord, Generic) -metadataFromCBOR :: Decoder s BundleMetadata -metadataFromCBOR = do - mlen <- decodeMapLen - entries <- decodeMapN decodeString decodeString mlen - let lookupText k = go k entries - go _ [] = Nothing - go k ((k', v):rest) - | k == k' = Just v - | otherwise = go k rest - pure BundleMetadata - { metadataPackage = lookupText "package" - , metadataVersion = lookupText "version" - , metadataDescription = lookupText "description" - , metadataLicense = lookupText "license" - , metadataCreatedBy = lookupText "createdBy" - } - -metadataToCBOR :: BundleMetadata -> Encoding -metadataToCBOR (BundleMetadata pkg ver desc lic by) = - let pairs = - maybe [] (\v -> [("package", encText v)]) pkg - ++ maybe [] (\v -> [("version", encText v)]) ver - ++ maybe [] (\v -> [("description", encText v)]) desc - ++ maybe [] (\v -> [("license", encText v)]) lic - ++ maybe [] (\v -> [("createdBy", encText v)]) by - in cmkPairs pairs - -- | The manifest: top-level bundle metadata. 
data BundleManifest = BundleManifest { manifestSchema :: Text @@ -338,43 +433,6 @@ data BundleManifest = BundleManifest , manifestMetadata :: BundleMetadata } deriving (Show, Eq, Generic) -manifestToCBOR :: BundleManifest -> Encoding -manifestToCBOR m = - cmkPairs - [ ("schema", encText (manifestSchema m)) - , ("bundleType", encText (manifestBundleType m)) - , ("tree", toCBORTreeSpec (manifestTree m)) - , ("runtime", toCBORRuntimeSpec (manifestRuntime m)) - , ("closure", toCBORClosure (manifestClosure m)) - , ("roots", cakSeq (map toCBORBundleRoot (manifestRoots m))) - , ("exports", cakSeq (map toCBORBundleExport (manifestExports m))) - , ("metadata", metadataToCBOR (manifestMetadata m)) - ] - -manifestFromCBOR :: Decoder s BundleManifest -manifestFromCBOR = do - n <- decodeMapLen - unless (n == 8) $ fail "BundleManifest: must have exactly 8 entries" - decodeKey "schema" - schema <- decodeString - decodeKey "bundleType" - bundleType <- decodeString - decodeKey "tree" - tree <- treeSpecFromCBOR - decodeKey "runtime" - runtime <- runtimeSpecFromCBOR - decodeKey "closure" - closure <- closureFromCBOR - decodeKey "roots" - rlen <- decodeListLen - roots <- decodeListN bundleRootFromCBOR rlen - decodeKey "exports" - elen <- decodeListLen - exports <- decodeListN bundleExportFromCBOR elen - decodeKey "metadata" - metadata <- metadataFromCBOR - pure (BundleManifest schema bundleType tree runtime closure roots exports metadata) - -- | Portable executable-object bundle. -- -- Merkle node payloads remain the language-neutral executable core: @@ -388,28 +446,12 @@ data Bundle = Bundle , bundleManifestBytes :: ByteString } deriving (Show, Eq) --- --------------------------------------------------------------------------- --- CBOR manifest serialization --- --------------------------------------------------------------------------- - --- | Encode the manifest as canonical CBOR. 
-encodeManifest :: BundleManifest -> ByteString
-encodeManifest m = BL.toStrict (toLazyByteString (manifestToCBOR m))
-
--- | Decode a manifest from CBOR bytes.
-decodeManifest :: ByteString -> Either String BundleManifest
-decodeManifest bs =
-  case deserialiseFromBytes manifestFromCBOR (BL.fromStrict bs) of
-    Right (rest, m)
-      | BS.null (BL.toStrict rest) -> Right m
-      | otherwise -> Left "trailing bytes after manifest CBOR"
-    Left (DeserialiseFailure _ msg) -> Left msg
-
 -- ---------------------------------------------------------------------------
 -- Bundle encoding
 -- ---------------------------------------------------------------------------
 
 -- | Encode a Bundle to portable Bundle v1 bytes.
+-- The manifest is serialized using the fixed-order core + TLV tail format.
 encodeBundle :: Bundle -> ByteString
 encodeBundle bundle =
   let nodeSection = encodeNodeSection (bundleNodes bundle)
diff --git a/test/fixtures/false.arboricx b/test/fixtures/false.arboricx
index 8ee3588..7b9335d 100644
Binary files a/test/fixtures/false.arboricx and b/test/fixtures/false.arboricx differ
diff --git a/test/fixtures/id.arboricx b/test/fixtures/id.arboricx
index 1e289cb..873873f 100644
Binary files a/test/fixtures/id.arboricx and b/test/fixtures/id.arboricx differ
diff --git a/test/fixtures/map.arboricx b/test/fixtures/map.arboricx
index 3f1d02e..62c4c71 100644
Binary files a/test/fixtures/map.arboricx and b/test/fixtures/map.arboricx differ
diff --git a/test/fixtures/notQ.arboricx b/test/fixtures/notQ.arboricx
index 80c94be..a91ad48 100644
Binary files a/test/fixtures/notQ.arboricx and b/test/fixtures/notQ.arboricx differ
diff --git a/test/fixtures/true.arboricx b/test/fixtures/true.arboricx
index 565d1ab..a87c11d 100644
Binary files a/test/fixtures/true.arboricx and b/test/fixtures/true.arboricx differ
diff --git a/tricu.cabal b/tricu.cabal
index 46f5fd2..048b6f8 100644
--- a/tricu.cabal
+++ b/tricu.cabal
@@ -41,7 +41,6 @@ executable tricu
     , base16-bytestring
     , base64-bytestring
     , bytestring
-    , cborg
     , cmdargs
     , containers
     , cryptonite
@@ -94,7 +93,6 @@ test-suite tricu-tests
     , base16-bytestring
     , base64-bytestring
     , bytestring
-    , cborg
     , cmdargs
     , containers
     , cryptonite