# Arboricx Portable Bundle Format Specification

**Version:** 0.1

**Status:** Exploratory

**Author:** A range of slopmachines guided by James Eversole

**Human Review Status:** 5-minute scan-through; this is an evolving and malleable document.

The Arboricx Portable Bundle is a self-contained, content-addressed binary format for distributing Tree Calculus programs and their associated Merkle DAGs. It provides:

- A fixed binary container with a header, a section directory, and typed sections
- A language-neutral Merkle node layer for content-addressed tree values
- A fixed-order binary manifest for semantic metadata, exports, and optional extensions

## Table of Contents

1. [Top-Level Container Layout](#1-top-level-container-layout)
2. [Header](#2-header)
3. [Section Directory](#3-section-directory)
4. [Section: Manifest (type 1)](#4-section-manifest-type-1)
5. [Section: Nodes (type 2)](#5-section-nodes-type-2)
6. [Merkle Node Payload Format](#6-merkle-node-payload-format)
7. [Merkle Hash Computation](#7-merkle-hash-computation)
8. [Tree Calculus Reduction Semantics](#8-tree-calculus-reduction-semantics)
9. [Binary Primitives](#9-binary-primitives)
10. [Bundle Verification](#10-bundle-verification)
11. [Known Section Types](#11-known-section-types)

---

## 1. Top-Level Container Layout

An Arboricx bundle is a flat binary blob with the following layout:

```
+------------------+------------------+------------------+------------------+
|      Header      | Section Directory| Manifest Section |  Nodes Section   |
|    (32 bytes)    |  (N × 60 bytes)  |    (variable)    |    (variable)    |
+------------------+------------------+------------------+------------------+
```

The container uses **big-endian** byte order for all multi-byte integers.

For a bundle that contains only the manifest and nodes sections, with no padding between them, the total size is `32 + (sectionCount × 60) + manifestSize + nodesSize`.

---

## 2. Header

| Offset | Size | Field | Description |
|--------|------|-------|-------------|
| 0 | 8 bytes | Magic | ASCII `"ARBORICX"` (`0x41 0x52 0x42 0x4F 0x52 0x49 0x43 0x58`) |
| 8 | 2 bytes | Major version | `u16` BE. Currently `1` |
| 10 | 2 bytes | Minor version | `u16` BE. Currently `0` |
| 12 | 4 bytes | Section count | `u32` BE. Number of entries in the section directory |
| 16 | 8 bytes | Flags | `u64` BE. Reserved; currently all zeros |
| 24 | 8 bytes | Directory offset | `u64` BE. Byte offset from the start of the bundle to the section directory |

**Constraints:**

- Major version must be `1`. Bundles with unsupported major versions are rejected.
- The directory offset must point to a valid location within the bundle.
- The directory offset is always `32` for bundles with the current layout (the header is immediately followed by the directory).

---

## 3. Section Directory

The section directory is an array of `N` entries, where `N` is the section count from the header. Each entry is exactly **60 bytes**.

| Offset (within entry) | Size | Field | Description |
|----------------------|------|-------|-------------|
| 0 | 4 bytes | Type | `u32` BE. Section type identifier (see [Known Section Types](#11-known-section-types)) |
| 4 | 2 bytes | Version | `u16` BE. Section-specific version |
| 6 | 2 bytes | Flags | `u16` BE. Bit flags: bit 0 (`0x0001`) = critical section |
| 8 | 2 bytes | Compression | `u16` BE. Compression codec (currently only `0` = none) |
| 10 | 2 bytes | Digest algorithm | `u16` BE. Hash algorithm (currently only `1` = SHA-256) |
| 12 | 8 bytes | Offset | `u64` BE. Byte offset from the start of the bundle to the section data |
| 20 | 8 bytes | Length | `u64` BE. Length of the section data in bytes |
| 28 | 32 bytes | SHA-256 digest | Raw digest of the section data |

**Verification:**

- Unknown critical sections (`flags & 0x0001`) are rejected.
- Compression must be `0` (none).
- Digest algorithm must be `1` (SHA-256).
- The SHA-256 digest in the directory entry must match `SHA256(section_data)`.

---

## 4. Section: Manifest (type 1)

The manifest is a binary encoding of bundle metadata. It uses a **fixed-order core** layout followed by an optional **TLV tail** for extensibility.

### 4.1 Format

```
Manifest =
  magic                 8 bytes    "ARBMNFST"
  major                 u16 BE     Manifest major version (1)
  minor                 u16 BE     Manifest minor version (0)
  schema                string     Length-prefixed UTF-8 text
  bundleType            string     Length-prefixed UTF-8 text
  treeCalculus          string     Length-prefixed UTF-8 text
  treeHashAlgorithm     string     Length-prefixed UTF-8 text
  treeHashDomain        string     Length-prefixed UTF-8 text
  treeNodePayload       string     Length-prefixed UTF-8 text
  runtimeSemantics      string     Length-prefixed UTF-8 text
  runtimeEvaluation     string     Length-prefixed UTF-8 text
  runtimeAbi            string     Length-prefixed UTF-8 text
  capabilityCount       u32 BE     Number of capability strings
  capabilities          string[]   Array of length-prefixed UTF-8 capability strings
  closure               u8         0 = complete, 1 = partial
  rootCount             u32 BE     Number of root entries
  roots                 Root[]     Array of root entries
  exportCount           u32 BE     Number of export entries
  exports               Export[]   Array of export entries
  metadataFieldCount    u32 BE     Number of metadata TLV entries
  metadataFields        TLV[]      Metadata tag-value entries
  extensionFieldCount   u32 BE     Number of extension TLV entries
  extensionFields       TLV[]      Extension tag-value entries (skipped by parsers)
```

**No trailing bytes are permitted after the manifest:** it must end exactly at the end of the section data.

### 4.2 String Format

Every `string` field uses the same encoding:

```
string =
  length   u32 BE         Number of UTF-8 bytes in the string (not the number of characters)
  bytes    byte[length]   UTF-8 encoded string content
```

The length field carries the byte count, so parsers can skip strings without decoding UTF-8.

### 4.3 Root Entry

```
Root =
  hash   32 bytes   Raw SHA-256 hash of the Merkle node
  role   string     Length-prefixed UTF-8 text ("default" for the first root, "root" for others)
```

The hash is stored as **raw bytes** (not hex-encoded).
It is the node's Merkle hash, as computed in [Merkle Hash Computation](#7-merkle-hash-computation).

### 4.4 Export Entry

```
Export =
  name   string     Length-prefixed UTF-8 text (export identifier)
  root   32 bytes   Raw SHA-256 hash of the Merkle node
  kind   string     Length-prefixed UTF-8 text (currently "term")
  abi    string     Length-prefixed UTF-8 text (ABI string)
```

### 4.5 TLV Entry

```
TLV =
  tag      u16 BE         Tag identifier (type)
  length   u32 BE         Number of bytes in the value
  value    byte[length]   Raw bytes
```

TLV entries support variable-length values, and parsers can skip tags they do not recognize: read the `u32` length and advance `2 + 4 + length` bytes from the start of the entry.

### 4.6 Metadata Tags

| Tag | Name | Value |
|-----|------|-------|
| 1 | package | UTF-8 text: package name |
| 2 | version | UTF-8 text: version string |
| 3 | description | UTF-8 text: description |
| 4 | license | UTF-8 text: license identifier or text |
| 5 | createdBy | UTF-8 text: creator identifier |

Unknown metadata tags are ignored. Unknown extension tags are skipped by length.

### 4.7 Semantic Constraints

A valid bundle manifest must satisfy:

| Constraint | Value |
|-----------|-------|
| `schema` | `"arboricx.bundle.manifest.v1"` |
| `bundleType` | `"tree-calculus-executable-object"` |
| `treeCalculus` | `"tree-calculus.v1"` |
| `treeHashAlgorithm` | `"sha256"` |
| `treeHashDomain` | `"arboricx.merkle.node.v1"` |
| `treeNodePayload` | `"arboricx.merkle.payload.v1"` |
| `runtimeSemantics` | `"tree-calculus.v1"` |
| `runtimeAbi` | `"arboricx.abi.tree.v1"` |
| `capabilities` | Empty (`capabilityCount` = `0`) |
| `closure` | `0` (complete) |
| `rootCount` | At least 1 |
| `exportCount` | At least 1 |
| Export names | Non-empty |
| Export roots | Non-empty (32 bytes each) |

---

## 5. Section: Nodes (type 2)

The nodes section contains all Merkle DAG nodes referenced by the manifest. It is a sequence of node entries preceded by a count.
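As a non-normative illustration, the Python sketch below reads this section into a map from hash to payload, assuming the `NodesSection` and `NodeEntry` layouts defined immediately after it. The function name `parse_nodes_section` is hypothetical, and the sketch only checks framing and duplicate hashes; payload shapes and Merkle hashes are validated by later verification steps.

```python
import struct


def parse_nodes_section(data: bytes) -> dict:
    """Hypothetical reader for a nodes section; returns {hash: payload}.

    Only framing is checked here (bounds and duplicate hashes); payload
    shapes and Merkle hashes are validated separately.
    """
    if len(data) < 8:
        raise ValueError("truncated node count")
    (count,) = struct.unpack_from(">Q", data, 0)  # nodeCount: u64 BE
    pos = 8
    nodes = {}
    for _ in range(count):
        if pos + 36 > len(data):  # 32-byte hash + u32 payloadLen
            raise ValueError("truncated node entry header")
        node_hash = data[pos:pos + 32]
        (payload_len,) = struct.unpack_from(">I", data, pos + 32)  # u32 BE
        pos += 36
        if pos + payload_len > len(data):
            raise ValueError("truncated node payload")
        if node_hash in nodes:
            raise ValueError("duplicate node hash")
        nodes[node_hash] = data[pos:pos + payload_len]
        pos += payload_len
    return nodes
```

Rejecting duplicates while reading folds the "reject duplicates" part of the verification order into the same single pass over the section.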
```
NodesSection =
  nodeCount   u64 BE        Total number of node entries
  entries     NodeEntry[]
```

Each node entry:

```
NodeEntry =
  hash         32 bytes           Raw SHA-256 hash of this node
  payloadLen   u32 BE             Length of the payload in bytes
  payload      byte[payloadLen]   Node payload (see Section 6)
```

The node count is a `u64` to support large bundles. Entries are stored in the order produced by the exporter (typically sorted by hash, for determinism).

---

## 6. Merkle Node Payload Format

Each node in the Merkle DAG is one of three types. The payload is a single type-tag byte followed by zero, one, or two raw 32-byte child hashes:

### Leaf

```
Payload = 0x00
```

A leaf has no children. The payload is exactly 1 byte.

### Stem

```
Payload = 0x01 || child_hash (32 bytes raw)
```

A stem has exactly one child. The payload is 33 bytes.

### Fork

```
Payload = 0x02 || left_hash (32 bytes raw) || right_hash (32 bytes raw)
```

A fork has exactly two children. The payload is 65 bytes.

**Validation:**

- Leaf payloads must be exactly 1 byte (`0x00`).
- Stem payloads must be exactly 33 bytes.
- Fork payloads must be exactly 65 bytes.
- Unknown type bytes are rejected.

---

## 7. Merkle Hash Computation

Each node is identified by a SHA-256 hash of its canonical payload:

```
hash = SHA256( domain_tag || 0x00 || payload )
```

Where:

| Component | Value |
|-----------|-------|
| `domain_tag` | `"arboricx.merkle.node.v1"` as UTF-8 bytes |
| Separator | `0x00` (one zero byte) |
| `payload` | The node's canonical serialization from Section 6 |

**Examples:**

- **Leaf:** `SHA256("arboricx.merkle.node.v1" || 0x00 || 0x00)`
- **Stem:** `SHA256("arboricx.merkle.node.v1" || 0x00 || 0x01 || child_hash_bytes)`
- **Fork:** `SHA256("arboricx.merkle.node.v1" || 0x00 || 0x02 || left_hash_bytes || right_hash_bytes)`

The resulting 32-byte digest is stored as raw bytes everywhere in the bundle: in the manifest's root and export entries and in the nodes section.

---

## 8. Tree Calculus Reduction Semantics

The bundle represents a **Tree Calculus** term as a Merkle DAG. The reduction rules are:

### Apply Rules

```
apply(Fork(Leaf, a), _)                  = a
apply(Fork(Stem(a), b), c)               = apply(apply(a, c), apply(b, c))
apply(Fork(Fork(w, x), _), Leaf)         = w
apply(Fork(Fork(w, x), _), Stem(u))      = apply(x, u)
apply(Fork(Fork(w, x), y), Fork(u, v))   = apply(apply(y, u), v)
apply(Leaf, b)                           = Stem(b)
apply(Stem(a), b)                        = Fork(a, b)
```

In the three rules whose left child is itself a Fork, `w` and `x` are the children of that inner Fork, `y` is the right child of the outer Fork, and the rule is selected by the shape of the argument (Leaf, Stem, or Fork).

### Internal Representation

In the reduction engine, Fork nodes use a `[right, left]` (stack) ordering:

- `Fork = [right_child, left_child]`
- `Stem = [child]`
- `Leaf = []`

This ordering supports stack-based reduction: pop two terms, apply, and push the results back.

### Closure

The manifest declares `closure = 0` (complete), meaning all nodes reachable from the root and export hashes are present in the nodes section. No external references exist.

---

## 9. Binary Primitives

All multi-byte integers use **big-endian** byte order.

### u16 (2 bytes)

```
byte[0] | byte[1]
value = (byte[0] << 8) | byte[1]
```

### u32 (4 bytes)

```
byte[0] | byte[1] | byte[2] | byte[3]
value = (byte[0] << 24) | (byte[1] << 16) | (byte[2] << 8) | byte[3]
```

### u64 (8 bytes)

```
byte[0] ... byte[7]
value = (byte[0] << 56) | (byte[1] << 48) | ... | byte[7]
```

### u8 (1 byte)

A single byte, value `0-255`.

---

## 10. Bundle Verification

A complete bundle verification proceeds in this order:

1. **Magic check:** The first 8 bytes must be `"ARBORICX"`.
2. **Version check:** The major version must be `1`.
3. **Section directory:** Parse all entries; reject unknown critical sections.
4. **Digest verification:** For each section, compute `SHA256(section_data)` and compare it with the digest in the directory entry.
5. **Manifest parsing:** Decode the fixed-order manifest; validate the semantic constraints.
6. **Node section:** Parse all node entries; reject duplicate hashes.
7. **Root verification:** All root hashes from the manifest must exist in the node map.
8. **Export verification:** All export root hashes must exist in the node map.
9. **Node hash verification:** For each node, compute `SHA256(domain || 0x00 || payload)` and compare it with the stored hash.
10. **Children verification:** For each Stem or Fork node, every child hash (one for a Stem, two for a Fork) must exist in the node map.
11. **Closure verification:** Starting from each root hash, traverse the DAG and confirm that all reachable nodes are present.

---

## 11. Known Section Types

| Type | Name | Required | Version | Description |
|------|------|----------|---------|-------------|
| 1 | Manifest | Yes | 1 | Bundle metadata in fixed-order binary format |
| 2 | Nodes | Yes | 1 | Merkle DAG node entries |

Unknown section types are permitted if they are not marked as critical (flags bit 0 is not set).

---

## Appendix A: Complete Example Layout (id.arboricx)

A minimal `id.arboricx` bundle has:

```
+---------------------------------------------------+
| Header (32 bytes)                                 |
|   Magic: "ARBORICX"                               |
|   Major: 1, Minor: 0                              |
|   Section count: 2                                |
|   Flags: 0                                        |
|   Dir offset: 32                                  |
+---------------------------------------------------+
| Section Directory (120 bytes = 2 × 60)            |
|   Entry 0: type=1 (manifest), offset=152, len=375 |
|   Entry 1: type=2 (nodes), offset=527, len=284    |
+---------------------------------------------------+
| Manifest Section (375 bytes)                      |
|   Magic: "ARBMNFST"                               |
|   Version: 1.0                                    |
|   Core strings (schema, bundleType, tree spec,    |
|   runtime spec, capabilities, closure, roots,     |
|   exports, metadata TLVs, extension fields)       |
+---------------------------------------------------+
| Nodes Section (284 bytes)                         |
|   Node count: 4                                   |
|   Entries: 1 Leaf, 2 Stem, 1 Fork                 |
|   Each entry: hash + payloadLen + payload         |
+---------------------------------------------------+
```

The manifest section starts at byte 152 (0x98) and the nodes section at byte 527 (0x20F). The nodes section length follows from the entry sizes: 8 bytes for the count, 37 for the Leaf entry (32 + 4 + 1), 2 × 69 for the Stem entries (32 + 4 + 33 each), and 101 for the Fork entry (32 + 4 + 65), giving 284 bytes.

---

## Appendix B: File Extension

Bundles produced by the `tricu` tool use the `.arboricx` file extension.
The `.tri` extension is used for plain source files; the `.arboricx` extension identifies the portable binary format.
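## Appendix C: Node Verification Sketch (Non-Normative)

The payload rules (Section 6), the hash construction (Section 7), and the node-level checks in steps 7 through 11 of Section 10 fit in a short Python sketch. It assumes the nodes section has already been parsed into a map from 32-byte hash to payload. The helper names (`node_hash`, `child_hashes`, `verify_nodes`) are hypothetical, and this is not the `tricu` implementation.

```python
import hashlib

# Hypothetical helpers; a sketch of the spec's rules, not a reference implementation.
DOMAIN_TAG = b"arboricx.merkle.node.v1"
PAYLOAD_SIZES = {0x00: 1, 0x01: 33, 0x02: 65}  # Leaf, Stem, Fork


def node_hash(payload: bytes) -> bytes:
    # SHA256(domain_tag || 0x00 || payload), as in Section 7
    return hashlib.sha256(DOMAIN_TAG + b"\x00" + payload).digest()


def child_hashes(payload: bytes) -> list:
    """Validate a payload's shape (Section 6) and return its raw child hashes."""
    if not payload or payload[0] not in PAYLOAD_SIZES:
        raise ValueError("unknown node type")
    if len(payload) != PAYLOAD_SIZES[payload[0]]:
        raise ValueError("wrong payload length")
    # Leaf -> [], Stem -> [child], Fork -> [left, right]
    return [payload[i:i + 32] for i in range(1, len(payload), 32)]


def verify_nodes(nodes: dict, roots: list) -> None:
    """Check hashes, children, and closure; raises ValueError on failure."""
    for stored, payload in nodes.items():
        if node_hash(payload) != stored:
            raise ValueError("node hash mismatch")
        for child in child_hashes(payload):
            if child not in nodes:
                raise ValueError("missing child node")
    # Closure: every node reachable from the given root/export hashes must exist.
    seen, stack = set(), list(roots)
    while stack:
        h = stack.pop()
        if h in seen:
            continue
        if h not in nodes:
            raise ValueError("root or reachable node missing")
        seen.add(h)
        stack.extend(child_hashes(nodes[h]))


# Example DAG: Fork(Stem(Stem(Leaf)), Stem(Leaf)), which acts as the
# identity under the apply rules in Section 8.
nodes = {}


def add(payload: bytes) -> bytes:
    h = node_hash(payload)
    nodes[h] = payload
    return h


leaf = add(b"\x00")
k = add(b"\x01" + leaf)              # Stem(Leaf)
stem_k = add(b"\x01" + k)            # Stem(Stem(Leaf))
identity = add(b"\x02" + stem_k + k)  # Fork(Stem(Stem(Leaf)), Stem(Leaf))
verify_nodes(nodes, [identity])      # passes silently
```

In a full verifier, `roots` would hold both the manifest's root hashes and its export root hashes, so the closure walk also covers the root and export existence checks.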