Arboricx Portable Bundle Format Specification
Version: 0.1
Status: Exploratory
Author: A range of slopmachines guided by James Eversole
Human Review Status: 5-minute scan-through; this is an evolving and malleable document
The Arboricx Portable Bundle is a self-contained, content-addressed binary format for distributing Tree Calculus programs and their associated Merkle DAGs. It provides:
- A fixed binary container with header, section directory, and typed sections
- A language-neutral Merkle node layer for content-addressed tree values
- A fixed-order binary manifest for semantic metadata, exports, and optional extensions
Table of Contents
- Top-Level Container Layout
- Header
- Section Directory
- Section: Manifest (type 1)
- Section: Nodes (type 2)
- Merkle Node Payload Format
- Merkle Hash Computation
- Tree Calculus Reduction Semantics
- Binary Primitives
- Bundle Verification
- Known Section Types
1. Top-Level Container Layout
An Arboricx bundle is a flat binary blob with the following layout:
+------------------+------------------+------------------+------------------+
| Header           | Section Directory| Manifest Section | Nodes Section    |
| (32 bytes)       | (N × 60 bytes)   | (variable)       | (variable)       |
+------------------+------------------+------------------+------------------+
The container uses big-endian byte order for all multi-byte integers.
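For the layout shown above (sections stored back to back after the directory, with no padding), total bundle size = 32 + (sectionCount × 60) + the sum of all section lengths. For a bundle with only the manifest and nodes sections, this is 32 + (2 × 60) + manifestSize + nodesSize.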
2. Header
| Offset | Size | Field | Description |
|---|---|---|---|
| 0 | 8 bytes | Magic | ASCII "ARBORICX" (0x41 0x52 0x42 0x4F 0x52 0x49 0x43 0x58) |
| 8 | 2 bytes | Major version | u16 BE. Currently 1 |
| 10 | 2 bytes | Minor version | u16 BE. Currently 0 |
| 12 | 4 bytes | Section count | u32 BE. Number of entries in the section directory |
| 16 | 8 bytes | Flags | u64 BE. Reserved; currently all zeros |
| 24 | 8 bytes | Directory offset | u64 BE. Byte offset from the start of the bundle to the section directory |
Constraints:
- Major version must be `1`. Bundles with unsupported major versions are rejected.
- The directory offset must point to a valid location within the bundle.
- The directory offset is always `32` for bundles with the current layout (header immediately followed by the directory).
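Here is a minimal Python sketch of header decoding under the table and constraints above. `parse_header`, `DIR_ENTRY_SIZE` and the returned dict keys are illustrative names, not part of the format. The bounds check is one assumed reading of "valid location". It requires the directory to start after the header and to have room for every 60-byte entry.

```python
import struct

HEADER_FORMAT = ">8sHHIQQ"  # magic, major, minor, section count, flags, directory offset
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 32: ">" means big-endian with no alignment padding
DIR_ENTRY_SIZE = 60

def parse_header(bundle: bytes) -> dict:
    """Decode the 32-byte header and apply the constraints listed above (hypothetical helper)."""
    if len(bundle) < HEADER_SIZE:
        raise ValueError("bundle shorter than the 32-byte header")
    magic, major, minor, count, flags, dir_offset = struct.unpack_from(HEADER_FORMAT, bundle, 0)
    if magic != b"ARBORICX":
        raise ValueError("bad magic")
    if major != 1:
        raise ValueError(f"unsupported major version {major}")
    # Assumed reading of "valid location": after the header, with room for every entry.
    if dir_offset < HEADER_SIZE or dir_offset + count * DIR_ENTRY_SIZE > len(bundle):
        raise ValueError("section directory out of bounds")
    return {"major": major, "minor": minor, "section_count": count,
            "flags": flags, "directory_offset": dir_offset}
```

For a bundle with the current layout, `parse_header(bundle)["directory_offset"]` is 32.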
3. Section Directory
The section directory is an array of N entries, where N is the section count from the header. Each entry is exactly 60 bytes.
| Offset (within entry) | Size | Field | Description |
|---|---|---|---|
| 0 | 4 bytes | Type | u32 BE. Section type identifier (see Known Section Types) |
| 4 | 2 bytes | Version | u16 BE. Section-specific version |
| 6 | 2 bytes | Flags | u16 BE. Bit flags: bit 0 (0x0001) = critical section |
| 8 | 2 bytes | Compression | u16 BE. Compression codec (currently only 0 = none) |
| 10 | 2 bytes | Digest algorithm | u16 BE. Hash algorithm (currently only 1 = SHA-256) |
| 12 | 8 bytes | Offset | u64 BE. Byte offset from the start of the bundle to the section data |
| 20 | 8 bytes | Length | u64 BE. Length of the section data in bytes |
| 28 | 32 bytes | SHA-256 digest | Raw digest of the section data |
Verification:
- Unknown critical sections (`flags & 0x0001`) are rejected.
- Compression must be `0` (none).
- Digest algorithm must be `1` (SHA-256).
- The SHA-256 digest in the directory entry must match `SHA256(section_data)`.
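The following sketch reads the directory and applies these four checks to every entry. It returns only the sections of known types (1 and 2) and skips unknown non-critical ones. `read_sections` is a hypothetical name. Checking digests of unknown non-critical sections before discarding them is a choice made here, following "for each section" in Section 10.

```python
import hashlib
import struct

ENTRY_FORMAT = ">IHHHHQQ32s"  # type, version, flags, compression, digest alg, offset, length, digest
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)  # 60 bytes
KNOWN_SECTION_TYPES = {1, 2}  # manifest, nodes
CRITICAL = 0x0001

def read_sections(bundle: bytes, dir_offset: int, count: int) -> dict:
    """Parse `count` directory entries and return {section_type: section_bytes} for known types."""
    sections = {}
    for i in range(count):
        (stype, _version, flags, compression, digest_alg,
         offset, length, digest) = struct.unpack_from(ENTRY_FORMAT, bundle, dir_offset + i * ENTRY_SIZE)
        if stype not in KNOWN_SECTION_TYPES and flags & CRITICAL:
            raise ValueError(f"unknown critical section type {stype}")
        if compression != 0:
            raise ValueError("unsupported compression codec")
        if digest_alg != 1:
            raise ValueError("unsupported digest algorithm")
        if offset + length > len(bundle):
            raise ValueError("section data out of bounds")
        data = bundle[offset:offset + length]
        if hashlib.sha256(data).digest() != digest:
            raise ValueError(f"digest mismatch for section type {stype}")
        if stype in KNOWN_SECTION_TYPES:  # unknown non-critical sections are verified, then ignored
            sections[stype] = data
    return sections
```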
4. Section: Manifest (type 1)
The manifest is a binary encoding of bundle metadata. It uses a fixed-order core layout followed by an optional TLV tail for extensibility.
4.1 Format
Manifest =
magic 8 bytes "ARBMNFST"
major u16 BE Manifest major version (1)
minor u16 BE Manifest minor version (0)
schema string Length-prefixed UTF-8 text
bundleType string Length-prefixed UTF-8 text
treeCalculus string Length-prefixed UTF-8 text
treeHashAlgorithm string Length-prefixed UTF-8 text
treeHashDomain string Length-prefixed UTF-8 text
treeNodePayload string Length-prefixed UTF-8 text
runtimeSemantics string Length-prefixed UTF-8 text
runtimeEvaluation string Length-prefixed UTF-8 text
runtimeAbi string Length-prefixed UTF-8 text
capabilityCount u32 BE Number of capability strings
capabilities string[] Array of length-prefixed UTF-8 capability strings
closure u8 0 = complete, 1 = partial
rootCount u32 BE Number of root entries
roots Root[] Array of root entries
exportCount u32 BE Number of export entries
exports Export[] Array of export entries
metadataFieldCount u32 BE Number of metadata TLV entries
metadataFields TLV[] Metadata tag-value entries
extensionFieldCount u32 BE Number of extension TLV entries
extensionFields TLV[] Extension tag-value entries (skipped by parsers)
No bytes may follow the last extension field. A parser must consume the manifest section exactly, leaving no data over.
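Here is a minimal Python sketch of a decoder that reads the manifest fields in the order listed above. `Reader`, `parse_manifest`, `CORE_STRINGS` and the dict keys returned are illustrative names, not part of the format. Invalid UTF-8 simply raises `UnicodeDecodeError`. Semantic validation (Section 4.7) is a separate step.

```python
import struct

CORE_STRINGS = ["schema", "bundleType", "treeCalculus", "treeHashAlgorithm",
                "treeHashDomain", "treeNodePayload", "runtimeSemantics",
                "runtimeEvaluation", "runtimeAbi"]

class Reader:
    """Cursor over the manifest bytes with big-endian primitive readers."""

    def __init__(self, data: bytes):
        self.data, self.pos = data, 0

    def take(self, n: int) -> bytes:
        if self.pos + n > len(self.data):
            raise ValueError("truncated manifest")
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk

    def u8(self) -> int:
        return self.take(1)[0]

    def u16(self) -> int:
        return struct.unpack(">H", self.take(2))[0]

    def u32(self) -> int:
        return struct.unpack(">I", self.take(4))[0]

    def string(self) -> str:
        length = self.u32()  # byte count, not character count
        return self.take(length).decode("utf-8")

    def tlvs(self) -> list:
        entries = []
        for _ in range(self.u32()):
            tag = self.u16()
            length = self.u32()
            entries.append((tag, self.take(length)))
        return entries

def parse_manifest(data: bytes) -> dict:
    r = Reader(data)
    if r.take(8) != b"ARBMNFST":
        raise ValueError("bad manifest magic")
    m = {"major": r.u16(), "minor": r.u16()}
    if m["major"] != 1:
        raise ValueError("unsupported manifest major version")
    for name in CORE_STRINGS:
        m[name] = r.string()
    m["capabilities"] = [r.string() for _ in range(r.u32())]
    m["closure"] = r.u8()
    m["roots"] = []
    for _ in range(r.u32()):
        root_hash = r.take(32)  # raw bytes, not hex
        m["roots"].append((root_hash, r.string()))
    m["exports"] = []
    for _ in range(r.u32()):
        name = r.string()
        root = r.take(32)
        m["exports"].append({"name": name, "root": root,
                             "kind": r.string(), "abi": r.string()})
    m["metadata"] = r.tlvs()
    m["extensions"] = r.tlvs()  # kept raw; callers may ignore them
    if r.pos != len(data):
        raise ValueError("trailing bytes after manifest")
    return m
```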
4.2 String Format
Every string field uses the same encoding:
string =
length u32 BE Number of UTF-8 bytes in the string (not the number of characters)
bytes byte[length] UTF-8 encoded string content
The length field carries the byte count, so parsers can skip strings without decoding UTF-8.
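On the writer side, a string field is the UTF-8 byte count followed by the bytes. A parser can step over a string using only that count. Both helper names below are hypothetical.

```python
import struct

def encode_string(text: str) -> bytes:
    """Length-prefixed UTF-8 string: u32 BE byte count, then the encoded bytes."""
    raw = text.encode("utf-8")
    return struct.pack(">I", len(raw)) + raw

def skip_string(data: bytes, pos: int) -> int:
    """Return the offset just past the string at `pos`, without decoding UTF-8."""
    (length,) = struct.unpack_from(">I", data, pos)
    return pos + 4 + length
```

For example, `"λx"` has two characters but three UTF-8 bytes (`λ` encodes to two bytes), so its length prefix is 3.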
4.3 Root Entry
Root =
hash 32 bytes Raw SHA-256 hash of the Merkle node
role string Length-prefixed UTF-8 text ("default" for the first root, "root" for others)
The hash is stored as raw bytes (not hex-encoded). It corresponds to the Merkle hash of the node.
4.4 Export Entry
Export =
name string Length-prefixed UTF-8 text (export identifier)
root 32 bytes Raw SHA-256 hash of the Merkle node
kind string Length-prefixed UTF-8 text (currently "term")
abi string Length-prefixed UTF-8 text (ABI string)
4.5 TLV Entry
TLV =
tag u16 BE Tag identifier (type)
length u32 BE Number of bytes in the value
value byte[length] Raw bytes
TLV entries support variable-length values and are skippable by parsers that do not recognize a tag: read the u32 length and advance by 2 + 4 + length bytes.
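This sketch reads a `metadataFieldCount` followed by its TLV entries. It decodes the tags from the table in Section 4.6 as UTF-8 text and skips any other tag by advancing 2 + 4 + length bytes. `read_metadata` and `METADATA_TAGS` are illustrative names.

```python
import struct

METADATA_TAGS = {1: "package", 2: "version", 3: "description", 4: "license", 5: "createdBy"}

def read_metadata(data: bytes, pos: int) -> tuple:
    """Decode count + TLV[] at `pos`; return (known fields, offset after the list)."""
    (count,) = struct.unpack_from(">I", data, pos)
    pos += 4
    fields = {}
    for _ in range(count):
        tag, length = struct.unpack_from(">HI", data, pos)  # u16 tag, u32 length
        value = data[pos + 6:pos + 6 + length]
        if len(value) != length:
            raise ValueError("truncated TLV value")
        if tag in METADATA_TAGS:
            fields[METADATA_TAGS[tag]] = value.decode("utf-8")
        pos += 2 + 4 + length  # unknown tags are skipped without inspecting the value
    return fields, pos
```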
4.6 Metadata Tags
| Tag | Name | Value |
|---|---|---|
| 1 | package | UTF-8 text: package name |
| 2 | version | UTF-8 text: version string |
| 3 | description | UTF-8 text: description |
| 4 | license | UTF-8 text: license identifier or text |
| 5 | createdBy | UTF-8 text: creator identifier |
Unknown metadata tags are ignored. Unknown extension tags are skipped by length.
4.7 Semantic Constraints
A valid bundle manifest must satisfy:
| Constraint | Value |
|---|---|
| schema | "arboricx.bundle.manifest.v1" |
| bundleType | "tree-calculus-executable-object" |
| treeCalculus | "tree-calculus.v1" |
| treeHashAlgorithm | "sha256" |
| treeHashDomain | "arboricx.merkle.node.v1" |
| treeNodePayload | "arboricx.merkle.payload.v1" |
| runtimeSemantics | "tree-calculus.v1" |
| runtimeAbi | "arboricx.abi.tree.v1" |
| runtimeCapabilities | Empty array |
| closure | 0 (complete) |
| rootCount | At least 1 |
| exportCount | At least 1 |
| Export names | Non-empty |
| Export roots | Non-empty (32 bytes each) |
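Here is a sketch of the checks in this table, run against a manifest already decoded into a Python dict. The dict keys are illustrative. The `runtimeCapabilities` row is assumed to mean the `capabilities` array from Section 4.1. `runtimeEvaluation` has no constraint in the table, so it is not checked here.

```python
REQUIRED_VALUES = {
    "schema": "arboricx.bundle.manifest.v1",
    "bundleType": "tree-calculus-executable-object",
    "treeCalculus": "tree-calculus.v1",
    "treeHashAlgorithm": "sha256",
    "treeHashDomain": "arboricx.merkle.node.v1",
    "treeNodePayload": "arboricx.merkle.payload.v1",
    "runtimeSemantics": "tree-calculus.v1",
    "runtimeAbi": "arboricx.abi.tree.v1",
}

def check_manifest(m: dict) -> list:
    """Return a list of constraint violations; an empty list means the manifest passes."""
    errors = [f"{key}: expected {want!r}, got {m.get(key)!r}"
              for key, want in REQUIRED_VALUES.items() if m.get(key) != want]
    if m.get("capabilities"):
        errors.append("capabilities must be empty")
    if m.get("closure") != 0:
        errors.append("closure must be 0 (complete)")
    if not m.get("roots"):
        errors.append("at least one root required")
    exports = m.get("exports") or []
    if not exports:
        errors.append("at least one export required")
    for export in exports:
        if not export.get("name"):
            errors.append("export name must be non-empty")
        if len(export.get("root", b"")) != 32:
            errors.append("export root must be 32 bytes")
    return errors
```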
5. Section: Nodes (type 2)
The nodes section contains all Merkle DAG nodes referenced by the manifest. It is a sequence of node entries preceded by a count.
NodesSection =
nodeCount u64 BE Total number of node entries
entries NodeEntry[]
Each node entry:
NodeEntry =
hash 32 bytes Raw SHA-256 hash of this node
payloadLen u32 BE Length of the payload in bytes
payload byte[payloadLen] Node payload (see Section 6)
The node count is u64 to support large bundles. Entries are stored in the order produced by the exporter (typically sorted by hash for determinism).
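Here is a round-trip sketch of the nodes section. `encode_nodes` writes entries sorted by hash, which is one way to get the deterministic order mentioned above rather than a requirement. `parse_nodes` reads them back into a hash-to-payload map and rejects duplicates, as Section 10 requires. Both names are hypothetical.

```python
import struct

def encode_nodes(nodes: dict) -> bytes:
    """Serialize {hash: payload} as u64 count + (hash, u32 payloadLen, payload) entries."""
    out = [struct.pack(">Q", len(nodes))]
    for node_hash in sorted(nodes):  # byte-wise sort by hash for reproducible output
        payload = nodes[node_hash]
        out.append(node_hash + struct.pack(">I", len(payload)) + payload)
    return b"".join(out)

def parse_nodes(section: bytes) -> dict:
    """Parse a nodes section into {hash: payload}, rejecting truncation and duplicates."""
    (count,) = struct.unpack_from(">Q", section, 0)
    pos, nodes = 8, {}
    for _ in range(count):
        if pos + 36 > len(section):
            raise ValueError("truncated node entry")
        node_hash = section[pos:pos + 32]
        (payload_len,) = struct.unpack_from(">I", section, pos + 32)
        payload = section[pos + 36:pos + 36 + payload_len]
        if len(payload) != payload_len:
            raise ValueError("truncated node payload")
        if node_hash in nodes:
            raise ValueError("duplicate node hash")
        nodes[node_hash] = payload
        pos += 36 + payload_len
    return nodes
```

Each entry costs 36 bytes of framing (32-byte hash plus 4-byte length) on top of its payload.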
6. Merkle Node Payload Format
Each node in the Merkle DAG is one of three types. The payload is a one-byte type tag followed by the raw 32-byte hashes of the node's children: none for a Leaf, one for a Stem, two for a Fork.
Leaf
Payload = 0x00
A leaf has no children. The payload is exactly 1 byte.
Stem
Payload = 0x01 || child_hash (32 bytes raw)
A stem has exactly one child. The payload is 33 bytes.
Fork
Payload = 0x02 || left_hash (32 bytes raw) || right_hash (32 bytes raw)
A fork has exactly two children. The payload is 65 bytes.
Validation:
- Leaf payloads must be exactly 1 byte (`0x00`).
- Stem payloads must be exactly 33 bytes.
- Fork payloads must be exactly 65 bytes.
- Unknown type bytes are rejected.
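Here is a small decoder that applies these validation rules and splits out the child hashes. `decode_payload` and the constant names are illustrative.

```python
LEAF, STEM, FORK = 0x00, 0x01, 0x02
EXPECTED_LEN = {LEAF: 1, STEM: 33, FORK: 65}

def decode_payload(payload: bytes) -> tuple:
    """Return (type_byte, [child_hash, ...]) after enforcing the exact payload sizes."""
    if not payload:
        raise ValueError("empty payload")
    kind = payload[0]
    if kind not in EXPECTED_LEN:
        raise ValueError(f"unknown node type byte {kind:#04x}")
    if len(payload) != EXPECTED_LEN[kind]:
        raise ValueError(f"bad payload length {len(payload)} for type {kind}")
    # Children are consecutive 32-byte hashes after the type byte.
    children = [payload[i:i + 32] for i in range(1, len(payload), 32)]
    return kind, children
```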
7. Merkle Hash Computation
Each node is identified by a SHA-256 hash of its canonical payload:
hash = SHA256( domain_tag || 0x00 || payload )
Where:
| Component | Value |
|---|---|
| domain_tag | "arboricx.merkle.node.v1" as UTF-8 bytes |
| Separator | 0x00 (one zero byte) |
| payload | The node's canonical serialization from Section 6 |
Examples:
- Leaf: `SHA256("arboricx.merkle.node.v1" || 0x00 || 0x00)`
- Stem: `SHA256("arboricx.merkle.node.v1" || 0x00 || 0x01 || child_hash_bytes)`
- Fork: `SHA256("arboricx.merkle.node.v1" || 0x00 || 0x02 || left_hash_bytes || right_hash_bytes)`
The resulting 32-byte SHA-256 hash is stored as raw bytes, not hex-encoded. This applies to the manifest's root and export entries (Sections 4.3 and 4.4) and to the nodes section.
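Here is a sketch of the hash computation with Python's `hashlib`. The helper names are illustrative. A parent's payload embeds its children's hashes, so a tree's hashes are computed bottom-up, children first.

```python
import hashlib

DOMAIN_TAG = "arboricx.merkle.node.v1".encode("utf-8")

def node_hash(payload: bytes) -> bytes:
    """SHA256(domain_tag || 0x00 || payload), returned as 32 raw bytes."""
    return hashlib.sha256(DOMAIN_TAG + b"\x00" + payload).digest()

def leaf_hash() -> bytes:
    return node_hash(b"\x00")

def stem_hash(child: bytes) -> bytes:
    return node_hash(b"\x01" + child)

def fork_hash(left: bytes, right: bytes) -> bytes:
    return node_hash(b"\x02" + left + right)
```

Identical subtrees produce identical hashes, so a subtree that occurs several times in a term is stored only once in the nodes section.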
8. Tree Calculus Reduction Semantics
The bundle represents a Tree Calculus term as a Merkle DAG. The reduction rules are:
Apply Rules
apply(Fork(Leaf, a), _) = a
apply(Fork(Stem(a), b), c) = apply(apply(a, c), apply(b, c))
apply(Fork(Fork(w, x), _), Leaf) = w
apply(Fork(Fork(_, x), _), Stem(u)) = apply(x, u)
apply(Fork(Fork(_, _), y), Fork(u, v)) = apply(apply(y, u), v)
apply(Leaf, b) = Stem(b)
apply(Stem(a), b) = Fork(a, b)
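Here is a tiny reference interpreter that applies the rules above directly. It assumes an illustrative encoding of trees as Python tuples: `()` is Leaf, `(a,)` is Stem(a) and `(a, b)` is Fork(a, b). It is not the tricu reduction engine. It recurses eagerly and is only suitable for small terms.

```python
# Trees as nested tuples (illustrative encoding, not the bundle format):
#   Leaf = (), Stem(a) = (a,), Fork(a, b) = (a, b)
LEAF = ()

def apply(f, x):
    """Apply tree f to tree x using the Tree Calculus rules above."""
    if len(f) == 0:                  # apply(Leaf, b) = Stem(b)
        return (x,)
    if len(f) == 1:                  # apply(Stem(a), b) = Fork(a, b)
        return (f[0], x)
    left, right = f
    if len(left) == 0:               # apply(Fork(Leaf, a), _) = a
        return right
    if len(left) == 1:               # apply(Fork(Stem(a), b), c) = apply(apply(a, c), apply(b, c))
        return apply(apply(left[0], x), apply(right, x))
    w, inner_right = left            # outer Fork(Fork(w, x), y): triage on the argument
    if len(x) == 0:                  # argument Leaf -> w
        return w
    if len(x) == 1:                  # argument Stem(u) -> apply(x, u)
        return apply(inner_right, x[0])
    u, v = x                         # argument Fork(u, v) -> apply(apply(y, u), v)
    return apply(apply(right, u), v)

K = (LEAF,)          # Stem(Leaf): apply(apply(K, a), b) = a
I = ((K,), LEAF)     # Fork(Stem(Stem(Leaf)), Leaf): apply(I, c) = c
```

For example, `apply(I, t)` returns `t` for any tree `t`: the Stem rule gives `apply(apply(K, t), Stem(t))`, and the first rule then discards `Stem(t)`.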
Internal Representation
In the reduction engine, Fork nodes use a [right, left] (stack) ordering:
Fork = [right_child, left_child]
Stem = [child]
Leaf = []
This ordering supports stack-based reduction: pop two terms, apply, push results back.
Closure
The manifest's closure field is 0 (complete). This means every node reachable from the export roots is present in the nodes section, and no external references exist.
9. Binary Primitives
All multi-byte integers use big-endian byte order.
u16 (2 bytes)
byte[0] | byte[1]
value = (byte[0] << 8) | byte[1]
u32 (4 bytes)
byte[0] | byte[1] | byte[2] | byte[3]
value = (byte[0] << 24) | (byte[1] << 16) | (byte[2] << 8) | byte[3]
u64 (8 bytes)
byte[0] ... byte[7]
value = (byte[0] << 56) | ... | byte[7]
u8 (1 byte)
A single byte, value 0-255.
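The shift formulas above reduce to a single loop. `read_be` is a hypothetical helper name. Python's built-in `int.from_bytes(chunk, "big")` gives the same result.

```python
def read_be(data: bytes, offset: int, size: int) -> int:
    """Decode an unsigned big-endian integer of `size` bytes (1, 2, 4 or 8) by shifting."""
    chunk = data[offset:offset + size]
    if len(chunk) != size:
        raise ValueError("not enough bytes")
    value = 0
    for byte in chunk:  # most significant byte first
        value = (value << 8) | byte
    return value
```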
10. Bundle Verification
A complete bundle verification proceeds in this order:
- Magic check: The first 8 bytes must be `"ARBORICX"`.
- Version check: The major version must be `1`.
- Section directory: Parse all entries; reject unknown critical sections.
- Digest verification: For each section, compute `SHA256(section_data)` and compare it with the digest in the directory entry.
- Manifest parsing: Decode the fixed-order manifest; validate semantic constraints.
- Node section: Parse all node entries; reject duplicates.
- Root verification: All root hashes from the manifest must exist in the node map.
- Export verification: All export root hashes must exist in the node map.
- Node hash verification: For each node, compute `SHA256(domain || 0x00 || payload)` and compare it with the stored hash.
- Children verification: For each Stem or Fork node, every child hash (one for a Stem, two for a Fork) must exist in the node map.
- Closure verification: Starting from each root hash, traverse the DAG and confirm all reachable nodes are present.
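The DAG-level checks at the end of this list can be sketched over a `{hash: payload}` map like the one the nodes section yields. `verify_dag` and `children_of` are hypothetical names. The closure walk uses an explicit stack instead of recursion, so deep trees do not hit Python's recursion limit. It starts from both root and export hashes.

```python
import hashlib

DOMAIN = b"arboricx.merkle.node.v1"
PAYLOAD_LEN = {0x00: 1, 0x01: 33, 0x02: 65}

def children_of(payload: bytes) -> list:
    """Child hashes of a Leaf/Stem/Fork payload; rejects malformed payloads."""
    if not payload or PAYLOAD_LEN.get(payload[0]) != len(payload):
        raise ValueError("malformed node payload")
    return [payload[i:i + 32] for i in range(1, len(payload), 32)]

def verify_dag(nodes: dict, roots: list, export_roots: list) -> None:
    """Root, export, node-hash, children and closure checks; raises ValueError on failure."""
    for h in roots:
        if h not in nodes:
            raise ValueError(f"missing root {h.hex()}")
    for h in export_roots:
        if h not in nodes:
            raise ValueError(f"missing export root {h.hex()}")
    for h, payload in nodes.items():
        if hashlib.sha256(DOMAIN + b"\x00" + payload).digest() != h:
            raise ValueError(f"hash mismatch for node {h.hex()}")
    for payload in nodes.values():
        for child in children_of(payload):
            if child not in nodes:
                raise ValueError(f"missing child {child.hex()}")
    # Closure: iterative depth-first walk from every root and export root.
    seen, stack = set(), list(roots) + list(export_roots)
    while stack:
        h = stack.pop()
        if h in seen:
            continue
        if h not in nodes:
            raise ValueError(f"reachable node {h.hex()} is missing")
        seen.add(h)
        stack.extend(children_of(nodes[h]))
```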
11. Known Section Types
| Type | Name | Required | Version | Description |
|---|---|---|---|---|
| 1 | Manifest | Yes | 1 | Bundle metadata in fixed-order binary format |
| 2 | Nodes | Yes | 1 | Merkle DAG node entries |
Unknown section types are permitted if not marked as critical (flags bit 0 is not set).
Appendix A: Complete Example Layout (id.arboricx)
A minimal id.arboricx bundle has:
+---------------------------------------------------+
| Header (32 bytes) |
| Magic: "ARBORICX" |
| Major: 1, Minor: 0 |
| Section count: 2 |
| Flags: 0 |
| Dir offset: 32 |
+---------------------------------------------------+
| Section Directory (120 bytes = 2 × 60) |
| Entry 0: type=1 (manifest), offset=152, len=375 |
| Entry 1: type=2 (nodes), offset=527, len=284 |
+---------------------------------------------------+
| Manifest Section (375 bytes) |
| Magic: "ARBMNFST" |
| Version: 1.0 |
| Core strings (schema, bundleType, tree spec, |
| runtime spec, capabilities, closure, roots, |
| exports, metadata TLVs, extension fields) |
+---------------------------------------------------+
| Nodes Section (284 bytes) |
| Node count: 2 |
| Node entry 1: hash + payload (Leaf) |
| Node entry 2: hash + payload (Fork) |
+---------------------------------------------------+
The manifest section starts at byte 152 (0x98) and the nodes section at byte 527 (0x20F).
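The offsets in this example follow from storing sections back to back right after the directory. Here is a short sketch of that arithmetic, assuming no padding between sections as in this example. `section_layout` is a hypothetical helper name.

```python
HEADER_SIZE = 32
DIR_ENTRY_SIZE = 60

def section_layout(lengths):
    """Return (offsets, total_size) for sections stored back to back after the directory."""
    offset = HEADER_SIZE + DIR_ENTRY_SIZE * len(lengths)
    offsets = []
    for length in lengths:
        offsets.append(offset)
        offset += length
    return offsets, offset

offsets, total = section_layout([375, 284])  # manifest and nodes lengths from the example
print(offsets, total)  # [152, 527] 811
```

The directory holds two entries, so the first section starts at 32 + 2 × 60 = 152. The nodes section starts at 152 + 375 = 527 (0x20F), and the bundle ends at 527 + 284 = 811 bytes.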
Appendix B: File Extension
Bundles produced by the tricu tool use the .arboricx file extension. The .tri extension is used for plain source files; the .arboricx extension identifies the portable binary format.