We don't need SHA verification or Merkle DAGs in our transport bundle. Content stores can handle hashing and verification of both bundles and terms.
Arboricx Portable Bundle Format Specification
Version: 1.1 (Indexed)
Status: Stable
Author: Slopmachines guided by James Eversole
The Arboricx Portable Bundle is a self-contained binary format for distributing Tree Calculus programs. It uses topological indexing instead of cryptographic hashing for node identity, making it writable from pure Tree Calculus and verifiable via structural inspection.
Table of Contents
- Design Principles
- Top-Level Container Layout
- Header
- Section Directory
- Section: Manifest (type 1)
- Section: Nodes (type 2)
- Node Payload Format
- Tree Calculus Reduction Semantics
- Binary Primitives
- Bundle Verification
- Canonicalization
- Known Section Types
1. Design Principles
- No cryptographic primitives required. Node identity is topological (array index), not a SHA-256 hash.
- Self-contained. A bundle includes all nodes reachable from its exports. No external references.
- Deterministic. Canonical bundles produce byte-identical output for identical input terms.
- Small. 5 to 13 bytes per node entry (a 4-byte length prefix plus a 1-, 5-, or 9-byte payload) versus ~36 bytes in hash-based formats.
- Verifiable via structure. Bounds checking and acyclicity verification replace hash recomputation.
Global artifact identity (for registries, lockfiles, or content-addressed caches) is achieved by hashing the complete canonical bundle file externally. The bundle format itself knows nothing about this hash.
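A minimal sketch of that external hashing step, assuming SHA-256 as the registry's digest (the format does not mandate an algorithm). The function name `bundle_digest` is hypothetical and not part of the specification.

```python
import hashlib

def bundle_digest(bundle_bytes: bytes) -> str:
    """Return the hex SHA-256 digest of a complete bundle file image.

    The digest is computed over the whole file and stored outside it
    (registry, lockfile, cache key); nothing is written into the bundle.
    """
    return hashlib.sha256(bundle_bytes).hexdigest()
```

Because canonical bundles are byte-identical for identical input terms, two producers that canonicalize the same term should arrive at the same digest.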
2. Top-Level Container Layout
+------------------+------------------+------------------+------------------+
| Header | Section Directory| Manifest Section | Nodes Section |
| (32 bytes) | (N × 32 bytes) | (variable) | (variable) |
+------------------+------------------+------------------+------------------+
Total bundle size = 32 + (sectionCount × 32) + manifestSize + nodesSize
All multi-byte integers use big-endian byte order.
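The size formula can be turned into an offset calculation. This is a sketch that assumes sections are packed back to back in directory order with no padding, which the formula implies but the text does not state outright; `section_offsets` is a hypothetical helper name.

```python
HEADER_SIZE = 32      # fixed header size in bytes
DIR_ENTRY_SIZE = 32   # size of one section directory entry in bytes

def section_offsets(section_lengths):
    """Return (offsets, total_size) for sections laid out contiguously
    after the header and the section directory."""
    offset = HEADER_SIZE + DIR_ENTRY_SIZE * len(section_lengths)
    offsets = []
    for length in section_lengths:
        offsets.append(offset)
        offset += length
    return offsets, offset
```

For a two-section bundle the first section always starts at byte 96 (32 + 2 × 32).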
3. Header
| Offset | Size | Field | Description |
|---|---|---|---|
| 0 | 8 bytes | Magic | ASCII "ARBORICX" |
| 8 | 2 bytes | Major version | u16 BE. Currently 1 |
| 10 | 2 bytes | Minor version | u16 BE. Currently 0 |
| 12 | 4 bytes | Section count | u32 BE. Number of entries in the section directory |
| 16 | 8 bytes | Flags | u64 BE. Reserved; currently all zeros |
| 24 | 8 bytes | Directory offset | u64 BE. Byte offset to the section directory (always 32) |
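The header table maps directly onto a fixed 32-byte big-endian record. The following sketch decodes it with Python's `struct` module; the function name and the returned dictionary keys are illustrative choices, not part of the format.

```python
import struct

# magic (8s), major (H), minor (H), section count (I), flags (Q), directory offset (Q)
HEADER = struct.Struct(">8sHHIQQ")

def parse_header(data: bytes) -> dict:
    """Decode the bundle header and apply the magic and major-version checks."""
    if len(data) < HEADER.size:
        raise ValueError("bundle is shorter than the 32-byte header")
    magic, major, minor, count, flags, dir_offset = HEADER.unpack_from(data, 0)
    if magic != b"ARBORICX":
        raise ValueError("bad magic")
    if major != 1:
        raise ValueError(f"unsupported major version {major}")
    return {
        "major": major,
        "minor": minor,
        "section_count": count,
        "flags": flags,
        "directory_offset": dir_offset,
    }
```

The `>` prefix selects big-endian order and disables alignment padding, so the struct is exactly 32 bytes.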
4. Section Directory
Array of N entries, each exactly 32 bytes.
| Offset (within entry) | Size | Field | Description |
|---|---|---|---|
| 0 | 4 bytes | Type | u32 BE. Section type identifier |
| 4 | 2 bytes | Version | u16 BE. Section-specific version |
| 6 | 2 bytes | Flags | u16 BE. Bit 0 (0x0001) = critical section |
| 8 | 2 bytes | Compression | u16 BE. 0 = none (currently the only value) |
| 10 | 2 bytes | Reserved | u16 BE. Padding; must be zero |
| 12 | 8 bytes | Offset | u64 BE. Byte offset from bundle start to section data |
| 20 | 8 bytes | Length | u64 BE. Length of section data in bytes |
| 28 | 4 bytes | Reserved | Padding; must be zero |
Verification:
- Unknown critical sections are rejected.
- Compression must be 0 (none).
- Reserved fields must be zero.
Note: No per-section digest is stored. Integrity is verified at the distribution layer (e.g. SHA-256 of the complete bundle file) rather than inside the container.
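A sketch of directory parsing with the three verification rules applied. The set of known section types (1 and 2) is taken from the Known Section Types table; the function name and dictionary keys are hypothetical.

```python
import struct

# type (I), version (H), flags (H), compression (H), reserved (H),
# offset (Q), length (Q), reserved (I)
DIR_ENTRY = struct.Struct(">IHHHHQQI")
KNOWN_TYPES = {1, 2}      # manifest, nodes
FLAG_CRITICAL = 0x0001

def parse_directory(data: bytes, count: int, dir_offset: int = 32) -> list:
    """Decode `count` directory entries and reject invalid ones."""
    entries = []
    for i in range(count):
        (stype, version, flags, compression, reserved16,
         offset, length, reserved32) = DIR_ENTRY.unpack_from(
            data, dir_offset + i * DIR_ENTRY.size)
        if compression != 0:
            raise ValueError("unsupported compression")
        if reserved16 != 0 or reserved32 != 0:
            raise ValueError("reserved fields must be zero")
        if stype not in KNOWN_TYPES and flags & FLAG_CRITICAL:
            raise ValueError(f"unknown critical section type {stype}")
        entries.append({"type": stype, "version": version, "flags": flags,
                        "offset": offset, "length": length})
    return entries
```

Unknown sections without the critical bit are returned like any other entry, so a caller can skip them by offset and length.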
5. Section: Manifest (type 1)
Binary encoding of bundle metadata. Fixed-order core layout followed by optional TLV tail.
Manifest =
magic 8 bytes "ARBMNFST"
major u16 BE Manifest major version (1)
minor u16 BE Manifest minor version (1)
schema string "arboricx.bundle.manifest.v1"
bundleType string "tree-calculus-executable-object"
treeCalculus string "tree-calculus.v1"
treeHashAlgorithm string "indexed"
treeHashDomain string "arboricx.indexed.node.v1"
treeNodePayload string "arboricx.indexed.payload.v1"
runtimeSemantics string "tree-calculus.v1"
runtimeEvaluation string "normal-order"
runtimeAbi string "arboricx.abi.tree.v1"
capabilityCount u32 BE Number of capability strings (currently 0)
capabilities string[] Array of length-prefixed UTF-8 strings
closure u8 0 = complete
rootCount u32 BE Number of root entries
roots Root[] Array of root entries
exportCount u32 BE Number of export entries
exports Export[] Array of export entries
metadataFieldCount u32 BE Number of metadata TLV entries
metadataFields TLV[] Metadata tag-value entries
extensionFieldCount u32 BE Number of extension TLV entries (currently 0)
extensionFields TLV[] Extension entries (skipped by parsers)
String Format
string =
length u32 BE Number of UTF-8 bytes
bytes byte[length] UTF-8 content
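As an illustration, a length-prefixed string can be written and read as below. The helper names are hypothetical; the reader returns the position just past the string so that fixed-order fields can be decoded one after another.

```python
import struct

def write_string(text: str) -> bytes:
    """Encode text as a u32 BE byte length followed by UTF-8 bytes."""
    raw = text.encode("utf-8")
    return struct.pack(">I", len(raw)) + raw

def read_string(buf: bytes, pos: int):
    """Decode one string at `pos`; return (text, next_position)."""
    (length,) = struct.unpack_from(">I", buf, pos)
    start = pos + 4
    end = start + length
    if end > len(buf):
        raise ValueError("string runs past end of buffer")
    return buf[start:end].decode("utf-8"), end
```

Note that the length counts UTF-8 bytes, not characters.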
Root Entry
Root =
index u32 BE Node index into the nodes section
role string Length-prefixed UTF-8 ("default" for first root, "root" for others)
Export Entry
Export =
name string Length-prefixed UTF-8 export identifier
root u32 BE Node index into the nodes section
kind string Length-prefixed UTF-8 (currently "term")
abi string Length-prefixed UTF-8 ABI string
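Root and export entries combine u32 indices with length-prefixed strings. A possible reader, with hypothetical function names and dictionary keys, is sketched here; it carries its own string helper so it can be read in isolation.

```python
import struct

def read_string(buf: bytes, pos: int):
    """Decode a u32 BE length-prefixed UTF-8 string; return (text, next_pos)."""
    (length,) = struct.unpack_from(">I", buf, pos)
    end = pos + 4 + length
    if end > len(buf):
        raise ValueError("string runs past end of buffer")
    return buf[pos + 4:end].decode("utf-8"), end

def read_root(buf: bytes, pos: int):
    """Decode one Root entry: node index, then role."""
    (index,) = struct.unpack_from(">I", buf, pos)
    role, pos = read_string(buf, pos + 4)
    return {"index": index, "role": role}, pos

def read_export(buf: bytes, pos: int):
    """Decode one Export entry: name, node index, kind, abi."""
    name, pos = read_string(buf, pos)
    (root,) = struct.unpack_from(">I", buf, pos)
    kind, pos = read_string(buf, pos + 4)
    abi, pos = read_string(buf, pos)
    return {"name": name, "root": root, "kind": kind, "abi": abi}, pos
```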
TLV Entry
TLV =
tag u16 BE Tag identifier
length u32 BE Value length in bytes
value byte[length]
Metadata Tags
| Tag | Name | Value |
|---|---|---|
| 1 | package | UTF-8 text |
| 2 | version | UTF-8 text |
| 3 | description | UTF-8 text |
| 4 | license | UTF-8 text |
| 5 | createdBy | UTF-8 text |
Unknown metadata tags are ignored. Unknown extension tags are skipped by length.
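The skip-by-length behaviour can be sketched as follows. The tag table mirrors the Metadata Tags list above; the function names are hypothetical.

```python
import struct

METADATA_TAGS = {1: "package", 2: "version", 3: "description",
                 4: "license", 5: "createdBy"}

def read_tlvs(buf: bytes, pos: int, count: int):
    """Decode `count` TLV entries; return ([(tag, value)], next_position)."""
    fields = []
    for _ in range(count):
        tag, length = struct.unpack_from(">HI", buf, pos)  # 2 + 4 bytes
        start = pos + 6
        end = start + length
        if end > len(buf):
            raise ValueError("TLV value runs past end of buffer")
        fields.append((tag, buf[start:end]))
        pos = end
    return fields, pos

def decode_metadata(fields) -> dict:
    """Keep known metadata tags as UTF-8 text; ignore unknown tags."""
    return {METADATA_TAGS[tag]: value.decode("utf-8")
            for tag, value in fields if tag in METADATA_TAGS}
```

Because every entry carries its own length, a parser never needs to understand a tag in order to step over it.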
Semantic Constraints
| Constraint | Value |
|---|---|
| schema | "arboricx.bundle.manifest.v1" |
| bundleType | "tree-calculus-executable-object" |
| treeCalculus | "tree-calculus.v1" |
| treeHashAlgorithm | "indexed" |
| treeHashDomain | "arboricx.indexed.node.v1" |
| treeNodePayload | "arboricx.indexed.payload.v1" |
| runtimeSemantics | "tree-calculus.v1" |
| runtimeAbi | "arboricx.abi.tree.v1" |
| closure | 0 (complete) |
| rootCount | At least 1 |
| exportCount | At least 1 |
6. Section: Nodes (type 2)
NodesSection =
nodeCount u64 BE Total number of node entries
entries NodeEntry[]
Node Entry
NodeEntry =
payloadLen u32 BE Length of payload in bytes
payload byte[payloadLen]
There is no hash field. The node is identified solely by its position in the array.
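To make the layout concrete, here is a sketch that serializes a list of payloads into a nodes section. `build_nodes_section` is a hypothetical name; the payload bytes used in the example follow the Leaf and Stem encodings defined in the Node Payload Format section.

```python
import struct

def build_nodes_section(payloads) -> bytes:
    """Serialize payloads as: u64 BE node count, then for each node
    a u32 BE payload length followed by the payload bytes."""
    out = struct.pack(">Q", len(payloads))
    for payload in payloads:
        out += struct.pack(">I", len(payload)) + payload
    return out

# Two nodes: a Leaf at index 0 and a Stem pointing at index 0.
section = build_nodes_section([b"\x00", b"\x01\x00\x00\x00\x00"])
```

Under this layout the two-node section occupies 8 + (4 + 1) + (4 + 5) = 22 bytes.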
7. Node Payload Format
Child references are u32 big-endian indices into the node array. The array must be topologically sorted: every child index must be strictly less than the entry's own position.
Leaf
Payload = 0x00
Exactly 1 byte.
Stem
Payload = 0x01 || child_index (u32 BE)
Exactly 5 bytes.
Fork
Payload = 0x02 || left_index (u32 BE) || right_index (u32 BE)
Exactly 9 bytes.
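The three payload shapes can be encoded and decoded as in this sketch. The function names and the tuple representation returned by `decode_payload` are illustrative assumptions.

```python
import struct

def encode_leaf() -> bytes:
    return b"\x00"

def encode_stem(child: int) -> bytes:
    return b"\x01" + struct.pack(">I", child)

def encode_fork(left: int, right: int) -> bytes:
    return b"\x02" + struct.pack(">II", left, right)

def decode_payload(payload: bytes) -> tuple:
    """Return ("leaf",), ("stem", child) or ("fork", left, right).

    The exact-length rules (1, 5, 9 bytes) are enforced; anything else
    is rejected as malformed.
    """
    if payload == b"\x00":
        return ("leaf",)
    if len(payload) == 5 and payload[0] == 0x01:
        return ("stem",) + struct.unpack(">I", payload[1:])
    if len(payload) == 9 and payload[0] == 0x02:
        return ("fork",) + struct.unpack(">II", payload[1:])
    raise ValueError("malformed node payload")
```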
8. Tree Calculus Reduction Semantics
The bundle represents a Tree Calculus term. Application of the t operator is left-associative, so t a b c reads as ((t a) b) c. The reduction rules are:
1. t t a b -> a
2. t (t a) b c -> a c (b c)
3a. t (t a b) c t -> a
3b. t (t a b) c (t u) -> b u
3c. t (t a b) c (t u v) -> c u v
Closure: The manifest declares closure = 0 (complete), meaning all nodes reachable from export roots are present in the nodes section. No external references exist.
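The five rules can be exercised with a small evaluator. This is a sketch only: it represents trees as nested tuples (an assumption made for illustration, unrelated to the bundle encoding), and it evaluates eagerly by recursion, whereas the manifest names normal-order evaluation. An eager evaluator can fail to terminate on terms where normal order would, and deep terms can exceed Python's recursion limit.

```python
# Tree representation (illustrative): Leaf = (), Stem = (child,), Fork = (left, right)
LEAF = ()

def apply(f, x):
    """Apply tree f to tree x and return the resulting tree."""
    if len(f) == 0:            # t x has no redex: it is the stem t x
        return (x,)
    if len(f) == 1:            # (t a) x has no redex: it is the fork t a x
        return (f[0], x)
    p, q = f                   # f = t p q, so f x = t p q x is a redex
    if len(p) == 0:            # rule 1:  t t q x -> q
        return q
    if len(p) == 1:            # rule 2:  t (t a) q x -> a x (q x)
        a = p[0]
        return apply(apply(a, x), apply(q, x))
    a, b = p                   # rule 3:  p = t a b, dispatch on the shape of x
    if len(x) == 0:            # 3a: x = t       -> a
        return a
    if len(x) == 1:            # 3b: x = t u     -> b u
        return apply(b, x[0])
    return apply(apply(q, x[0]), x[1])   # 3c: x = t u v -> q u v
```

For example, with `K = (LEAF,)` (the term t t), `apply(apply(K, a), b)` returns `a` for any trees `a` and `b`, which is rule 1.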
9. Binary Primitives
u8
Single byte, value 0-255.
u16 (2 bytes)
value = (byte[0] << 8) | byte[1]
u32 (4 bytes)
value = (byte[0] << 24) | (byte[1] << 16) | (byte[2] << 8) | byte[3]
u64 (8 bytes)
value = (byte[0] << 56) | ... | byte[7]
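The shift-and-or formulas generalize to one loop over the bytes. A minimal sketch, with `read_uint` as a hypothetical helper name:

```python
def read_uint(buf: bytes, pos: int, size: int) -> int:
    """Read a big-endian unsigned integer of 1, 2, 4 or 8 bytes at `pos`."""
    if size not in (1, 2, 4, 8):
        raise ValueError("size must be 1, 2, 4 or 8")
    chunk = buf[pos:pos + size]
    if len(chunk) != size:
        raise ValueError("buffer too short")
    value = 0
    for byte in chunk:
        value = (value << 8) | byte   # most significant byte first
    return value
```

In Python the same result is available from the built-in `int.from_bytes(chunk, "big")`; the loop is spelled out here to mirror the formulas.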
10. Bundle Verification
- Magic check: First 8 bytes must be "ARBORICX".
- Version check: Major version must be 1.
- Section directory: Parse all entries; reject unknown critical sections. Verify reserved fields are zero.
- Manifest parsing: Decode fixed-order manifest; validate semantic constraints.
- Nodes section: Parse all entries.
- Bounds checking:
  - Every root index < nodeCount
  - Every export index < nodeCount
  - In every Stem payload, child_index < entry_position and child_index < nodeCount
  - In every Fork payload, both indices < entry_position and < nodeCount
- Acyclicity: Guaranteed by the child < parent rule above.
- Closure: Traverse from all root/export indices; confirm every reached index is valid.
No hash computation is required.
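The node-level checks can be sketched as a single pass over the parsed payloads. The function name `verify_nodes` and its arguments are hypothetical; `referenced` stands for the combined root and export indices from the manifest. Since every child index must be below its entry's position, it is automatically below nodeCount, and once every entry passes, any traversal from a valid root can only reach valid indices.

```python
import struct

def verify_nodes(payloads, referenced) -> None:
    """Raise ValueError if any root/export index or child index is out of bounds.

    payloads:   list of raw node payloads in array order
    referenced: iterable of root and export indices from the manifest
    """
    count = len(payloads)
    for index in referenced:
        if index >= count:
            raise ValueError(f"root/export index {index} out of range")
    for position, payload in enumerate(payloads):
        if payload == b"\x00":                              # Leaf: no children
            continue
        if len(payload) == 5 and payload[0] == 0x01:        # Stem
            children = struct.unpack(">I", payload[1:])
        elif len(payload) == 9 and payload[0] == 0x02:      # Fork
            children = struct.unpack(">II", payload[1:])
        else:
            raise ValueError(f"malformed payload at index {position}")
        for child in children:
            if child >= position:                           # child < parent
                raise ValueError(f"node {position} refers forward to {child}")
```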
11. Canonicalization
A bundle is canonical iff:
- Maximal deduplication. No two entries represent structurally identical subtrees.
- Topological order. Children precede parents.
- Deterministic post-order traversal. Nodes are emitted in the order discovered by a left-to-right recursive post-order walk.
- No trailing bytes in any section.
- Reserved fields are zero.
Canonical bundles produce deterministic bytes and can be file-level hashed for global identity.
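A sketch of canonical node emission for a single root, assuming trees are given as nested tuples (Leaf = `()`, Stem = `(child,)`, Fork = `(left, right)`); that representation and the name `canonical_nodes` are illustrative. How multiple roots are ordered is not covered here. Deduplication works on payload bytes: because children are already replaced by deduplicated indices, two nodes have equal payloads exactly when their subtrees are structurally identical.

```python
import struct

def canonical_nodes(root):
    """Return (payloads, root_index) from a left-to-right post-order walk
    with maximal deduplication."""
    payloads = []      # emitted node payloads, in array order
    index_of = {}      # payload bytes -> index of the node already emitted

    def emit(tree) -> int:
        if len(tree) == 0:
            payload = b"\x00"
        elif len(tree) == 1:
            payload = b"\x01" + struct.pack(">I", emit(tree[0]))
        else:
            left = emit(tree[0])     # left subtree is visited first
            right = emit(tree[1])
            payload = b"\x02" + struct.pack(">II", left, right)
        if payload not in index_of:  # first time this subtree is completed
            index_of[payload] = len(payloads)
            payloads.append(payload)
        return index_of[payload]

    root_index = emit(root)
    return payloads, root_index
```

Children are always emitted before the node that refers to them, so the topological-order requirement holds by construction. The recursion depth equals the tree depth, so very deep terms would need an explicit stack instead.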
12. Known Section Types
| Type | Name | Required | Version | Description |
|---|---|---|---|---|
| 1 | Manifest | Yes | 1 | Bundle metadata |
| 2 | Nodes | Yes | 1 | Topological DAG node entries |
Unknown section types are permitted if not marked critical.
Appendix A: Complete Example Layout
A minimal bundle for Stem(Leaf) (the Tree Calculus encoding of t t):
+---------------------------------------------------+
| Header (32 bytes) |
| Magic: "ARBORICX" |
| Major: 1, Minor: 0 |
| Section count: 2 |
| Flags: 0 |
| Dir offset: 32 |
+---------------------------------------------------+
| Section Directory (64 bytes = 2 × 32) |
| Entry 0: type=1 (manifest), offset=96, len=~200 |
| Entry 1: type=2 (nodes), offset=~296, len=22 |
+---------------------------------------------------+
| Manifest Section (~200 bytes) |
| Magic: "ARBMNFST", Version: 1.1 |
| Schema, bundleType, tree spec, runtime spec |
| Closure: 0, Roots: [1], Exports: ["main" -> 1] |
| Metadata TLVs, zero extension fields |
+---------------------------------------------------+
| Nodes Section (22 bytes) |
| Node count: 2 |
| Entry 0: payloadLen=1, payload=[0x00] |
| Entry 1: payloadLen=5, payload=[0x01, 0,0,0,0] |
+---------------------------------------------------+
Appendix B: File Extension
Bundles use the .arboricx file extension. Plain source files use .tri.