tricu/docs/arboricx-bundle-format.md
James Eversole 31bf7094f4 Arboricx bundle format 1.1
We don't need SHA verification or Merkle dags in our transport bundle. Content
stores can handle both bundle and term verification and hashing.
2026-05-12 15:18:29 -05:00


Arboricx Portable Bundle Format Specification

Version: 1.1 (Indexed)

Status: Stable

Author: Slopmachines guided by James Eversole

The Arboricx Portable Bundle is a self-contained binary format for distributing Tree Calculus programs. It uses topological indexing instead of cryptographic hashing for node identity, making it writable from pure Tree Calculus and verifiable via structural inspection.

Table of Contents

  1. Design Principles
  2. Top-Level Container Layout
  3. Header
  4. Section Directory
  5. Section: Manifest (type 1)
  6. Section: Nodes (type 2)
  7. Node Payload Format
  8. Tree Calculus Reduction Semantics
  9. Binary Primitives
  10. Bundle Verification
  11. Canonicalization
  12. Known Section Types

1. Design Principles

  • No cryptographic primitives required. Node identity is topological (array index), not a SHA-256 hash.
  • Self-contained. A bundle includes all nodes reachable from its exports. No external references.
  • Deterministic. Canonical bundles produce byte-identical output for identical input terms.
  • Small. 5 to 13 bytes per node entry (4-byte length prefix plus a 1-, 5-, or 9-byte payload) versus ~36 bytes in hash-based formats.
  • Verifiable via structure. Bounds checking and acyclicity verification replace hash recomputation.

Global artifact identity (for registries, lockfiles, or content-addressed caches) is achieved by hashing the complete canonical bundle file externally. The bundle format itself knows nothing about this hash.
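Because identity lives outside the container, a registry or lockfile can derive it directly from the file bytes. The following is a minimal sketch, assuming SHA-256 as the digest; the helper name bundle_identity is hypothetical and the format itself does not mandate any particular algorithm.

```python
import hashlib


def bundle_identity(bundle_bytes: bytes) -> str:
    """Return a hex digest of the complete bundle file.

    Assumption: SHA-256 is the digest chosen by the distribution layer.
    The bundle format does not store or require this value.
    """
    return hashlib.sha256(bundle_bytes).hexdigest()
```

Two canonical bundles built from the same term are byte-identical, so they would map to the same digest under this scheme.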


2. Top-Level Container Layout

+------------------+------------------+------------------+------------------+
| Header           | Section Directory| Manifest Section | Nodes Section    |
| (32 bytes)       | (N × 32 bytes)   | (variable)       | (variable)       |
+------------------+------------------+------------------+------------------+

Total bundle size = 32 + (sectionCount × 32) + manifestSize + nodesSize

All multi-byte integers use big-endian byte order.


3. Header

Offset  Size     Field             Description
0       8 bytes  Magic             ASCII "ARBORICX"
8       2 bytes  Major version     u16 BE. Currently 1
10      2 bytes  Minor version     u16 BE. Currently 0
12      4 bytes  Section count     u32 BE. Number of entries in the section directory
16      8 bytes  Flags             u64 BE. Reserved; currently all zeros
24      8 bytes  Directory offset  u64 BE. Byte offset to the section directory (always 32)
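The fixed 32-byte layout maps directly onto a single big-endian struct format. This is a sketch of a header reader under that reading of the table; the function name parse_header and the returned dictionary keys are illustrative choices, not part of the specification.

```python
import struct

# magic (8s), major (H), minor (H), section count (I), flags (Q), directory offset (Q)
HEADER = struct.Struct(">8sHHIQQ")  # ">" = big-endian, no padding; 32 bytes total


def parse_header(data: bytes) -> dict:
    """Decode the 32-byte header and apply the magic and major-version checks."""
    if len(data) < HEADER.size:
        raise ValueError("truncated header")
    magic, major, minor, count, flags, dir_offset = HEADER.unpack_from(data, 0)
    if magic != b"ARBORICX":
        raise ValueError("bad magic")
    if major != 1:
        raise ValueError("unsupported major version")
    return {
        "major": major,
        "minor": minor,
        "section_count": count,
        "flags": flags,
        "directory_offset": dir_offset,
    }
```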

4. Section Directory

Array of N entries, each exactly 32 bytes.

Offset (within entry)  Size     Field        Description
0                      4 bytes  Type         u32 BE. Section type identifier
4                      2 bytes  Version      u16 BE. Section-specific version
6                      2 bytes  Flags        u16 BE. Bit 0 (0x0001) = critical section
8                      2 bytes  Compression  u16 BE. 0 = none (currently the only value)
10                     2 bytes  Reserved     u16 BE. Padding; must be zero
12                     8 bytes  Offset       u64 BE. Byte offset from bundle start to section data
20                     8 bytes  Length       u64 BE. Length of section data in bytes
28                     4 bytes  Reserved     Padding; must be zero

Verification:

  • Unknown critical sections are rejected.
  • Compression must be 0 (none).
  • Reserved fields must be zero.

Note: No per-section digest is stored. Integrity is verified at the distribution layer (e.g. SHA-256 of the complete bundle file) rather than inside the container.
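The entry layout and the three verification rules can be sketched as one pass over the directory. The names parse_directory, KNOWN_TYPES and CRITICAL are hypothetical; the set of known types is taken from the Known Section Types table.

```python
import struct

# type, version, flags, compression, reserved (u16), offset, length, reserved (u32)
ENTRY = struct.Struct(">IHHHHQQI")  # 32 bytes, big-endian
KNOWN_TYPES = {1, 2}                # manifest, nodes
CRITICAL = 0x0001                   # bit 0 of the flags field


def parse_directory(data: bytes, count: int, dir_offset: int = 32) -> list:
    """Decode `count` directory entries and apply the verification rules."""
    entries = []
    for i in range(count):
        fields = ENTRY.unpack_from(data, dir_offset + i * ENTRY.size)
        typ, version, flags, compression, res16, offset, length, res32 = fields
        if compression != 0:
            raise ValueError("compression must be 0 (none)")
        if res16 != 0 or res32 != 0:
            raise ValueError("reserved fields must be zero")
        if typ not in KNOWN_TYPES and flags & CRITICAL:
            raise ValueError("unknown critical section")
        entries.append({"type": typ, "version": version, "flags": flags,
                        "offset": offset, "length": length})
    return entries
```

An unknown section without the critical bit passes through this sketch untouched, which matches the rule that only unknown critical sections are rejected.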


5. Section: Manifest (type 1)

Binary encoding of bundle metadata. Fixed-order core layout followed by optional TLV tail.

Manifest =
  magic            8 bytes    "ARBMNFST"
  major            u16 BE     Manifest major version (1)
  minor            u16 BE     Manifest minor version (1)

  schema           string     "arboricx.bundle.manifest.v1"
  bundleType       string     "tree-calculus-executable-object"

  treeCalculus     string     "tree-calculus.v1"
  treeHashAlgorithm string    "indexed"
  treeHashDomain   string     "arboricx.indexed.node.v1"
  treeNodePayload  string     "arboricx.indexed.payload.v1"

  runtimeSemantics string     "tree-calculus.v1"
  runtimeEvaluation string    "normal-order"
  runtimeAbi       string     "arboricx.abi.tree.v1"
  capabilityCount  u32 BE     Number of capability strings (currently 0)
  capabilities     string[]   Array of length-prefixed UTF-8 strings

  closure          u8         0 = complete
  rootCount        u32 BE     Number of root entries
  roots            Root[]     Array of root entries
  exportCount      u32 BE     Number of export entries
  exports          Export[]   Array of export entries

  metadataFieldCount u32 BE   Number of metadata TLV entries
  metadataFields   TLV[]      Metadata tag-value entries
  extensionFieldCount u32 BE  Number of extension TLV entries (currently 0)
  extensionFields  TLV[]      Extension entries (skipped by parsers)

String Format

string =
  length      u32 BE    Number of UTF-8 bytes
  bytes       byte[length]  UTF-8 content
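As an illustration of the length-prefixed encoding, here is a small round-trip sketch. The helper names encode_string and decode_string are hypothetical.

```python
import struct


def encode_string(text: str) -> bytes:
    """u32 BE byte length followed by the UTF-8 bytes."""
    raw = text.encode("utf-8")
    return struct.pack(">I", len(raw)) + raw


def decode_string(data: bytes, pos: int):
    """Read one string at `pos`; return (text, position after the string)."""
    (length,) = struct.unpack_from(">I", data, pos)
    start = pos + 4
    end = start + length
    if end > len(data):
        raise ValueError("string runs past end of buffer")
    return data[start:end].decode("utf-8"), end
```

Note that the length counts UTF-8 bytes, not characters, so a non-ASCII string has a prefix larger than its character count.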

Root Entry

Root =
  index       u32 BE     Node index into the nodes section
  role        string     Length-prefixed UTF-8 ("default" for first root, "root" for others)

Export Entry

Export =
  name        string     Length-prefixed UTF-8 export identifier
  root        u32 BE     Node index into the nodes section
  kind        string     Length-prefixed UTF-8 (currently "term")
  abi         string     Length-prefixed UTF-8 ABI string

TLV Entry

TLV =
  tag         u16 BE     Tag identifier
  length      u32 BE     Value length in bytes
  value       byte[length]

Metadata Tags

Tag  Name         Value
1    package      UTF-8 text
2    version      UTF-8 text
3    description  UTF-8 text
4    license      UTF-8 text
5    createdBy    UTF-8 text

Unknown metadata tags are ignored. Unknown extension tags are skipped by length.
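A reader can honor both rules with the same loop: every entry is skipped by its length, and only recognized tags are kept. The sketch below assumes the caller has already read metadataFieldCount; parse_metadata and METADATA_TAGS are illustrative names.

```python
import struct

METADATA_TAGS = {1: "package", 2: "version", 3: "description",
                 4: "license", 5: "createdBy"}


def parse_metadata(data: bytes, pos: int, count: int):
    """Read `count` TLV entries at `pos`; return (known fields, new position)."""
    fields = {}
    for _ in range(count):
        tag, length = struct.unpack_from(">HI", data, pos)  # u16 tag, u32 length
        pos += 6
        value = data[pos:pos + length]
        if len(value) != length:
            raise ValueError("TLV value runs past end of buffer")
        pos += length
        name = METADATA_TAGS.get(tag)
        if name is not None:  # unknown tags are ignored but still skipped
            fields[name] = value.decode("utf-8")
    return fields, pos
```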

Semantic Constraints

Constraint         Value
schema             "arboricx.bundle.manifest.v1"
bundleType         "tree-calculus-executable-object"
treeCalculus       "tree-calculus.v1"
treeHashAlgorithm  "indexed"
treeHashDomain     "arboricx.indexed.node.v1"
treeNodePayload    "arboricx.indexed.payload.v1"
runtimeSemantics   "tree-calculus.v1"
runtimeAbi         "arboricx.abi.tree.v1"
closure            0 (complete)
rootCount          At least 1
exportCount        At least 1

6. Section: Nodes (type 2)

NodesSection =
  nodeCount     u64 BE    Total number of node entries
  entries       NodeEntry[]

Node Entry

NodeEntry =
  payloadLen    u32 BE    Length of payload in bytes
  payload       byte[payloadLen]

There is no hash field. The node is identified solely by its position in the array.
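Reading the section is a matter of walking the length-prefixed entries; the list position of each payload is its node index. A minimal sketch, with parse_nodes_section as a hypothetical name:

```python
import struct


def parse_nodes_section(data: bytes) -> list:
    """Return the raw payloads; list position i is node index i."""
    (count,) = struct.unpack_from(">Q", data, 0)  # u64 BE node count
    pos = 8
    payloads = []
    for _ in range(count):
        (payload_len,) = struct.unpack_from(">I", data, pos)  # u32 BE length
        pos += 4
        payload = data[pos:pos + payload_len]
        if len(payload) != payload_len:
            raise ValueError("truncated payload")
        payloads.append(payload)
        pos += payload_len
    return payloads
```

A nodeCount larger than the data can supply fails on the first short read (struct.error or ValueError) rather than allocating up front.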


7. Node Payload Format

Child references are u32 big-endian indices into the node array. The array must be topologically sorted: every child index must be strictly less than the entry's own position.

Leaf

Payload = 0x00

Exactly 1 byte.

Stem

Payload = 0x01 || child_index (u32 BE)

Exactly 5 bytes.

Fork

Payload = 0x02 || left_index (u32 BE) || right_index (u32 BE)

Exactly 9 bytes.
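The three payload shapes can be decoded by checking the tag byte together with the exact length. This sketch returns plain tuples; that representation and the name decode_payload are assumptions made for illustration.

```python
import struct


def decode_payload(payload: bytes):
    """Decode one node payload into ("leaf",), ("stem", i) or ("fork", l, r)."""
    if not payload:
        raise ValueError("empty payload")
    tag = payload[0]
    if tag == 0x00 and len(payload) == 1:
        return ("leaf",)
    if tag == 0x01 and len(payload) == 5:
        (child,) = struct.unpack(">I", payload[1:])
        return ("stem", child)
    if tag == 0x02 and len(payload) == 9:
        left, right = struct.unpack(">II", payload[1:])
        return ("fork", left, right)
    raise ValueError("unknown tag or wrong payload length")
```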


8. Tree Calculus Reduction Semantics

The bundle represents a Tree Calculus term. The reduction rules are:

The t operator is left associative.
1.  t  t      a b       -> a
2.  t (t a)   b c       -> a c (b c)
3a. t (t a b) c t       -> a
3b. t (t a b) c (t u)   -> b u
3c. t (t a b) c (t u v) -> c u v
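To make the rules concrete, here is a small sketch of application on fully built trees. It assumes a tuple encoding chosen for this example only: () is a leaf, (a,) a stem, and (a, b) a fork. It evaluates arguments eagerly, so it is not the normal-order strategy named in the manifest and may fail to terminate on terms where normal order would.

```python
LEAF = ()  # assumed encoding: () leaf, (a,) stem, (a, b) fork


def apply(f, x):
    """Apply tree f to tree x using rules 1, 2 and 3a-3c."""
    if len(f) == 0:          # t x           : leaf becomes a stem
        return (x,)
    if len(f) == 1:          # (t a) x       : stem becomes a fork
        return (f[0], x)
    left, right = f          # f = t left right, now applied to x
    if len(left) == 0:       # rule 1:  t t a b -> a
        return right
    if len(left) == 1:       # rule 2:  t (t a) b c -> a c (b c)
        return apply(apply(left[0], x), apply(right, x))
    a, b = left              # rule 3:  t (t a b) c x, dispatch on x
    if len(x) == 0:          # 3a: x is a leaf
        return a
    if len(x) == 1:          # 3b: x = t u
        return apply(b, x[0])
    return apply(apply(right, x[0]), x[1])  # 3c: x = t u v
```

For example, with K = (LEAF,), which is t t, apply(apply(K, a), b) returns a for any trees a and b, as rule 1 requires.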

Closure: The bundle declares closure = 0 (complete), meaning every node reachable from a root or export index is present in the nodes section. No external references exist.


9. Binary Primitives

u8

Single byte, value 0-255.

u16 (2 bytes)

value = (byte[0] << 8) | byte[1]

u32 (4 bytes)

value = (byte[0] << 24) | (byte[1] << 16) | (byte[2] << 8) | byte[3]

u64 (8 bytes)

value = (byte[0] << 56) | ... | byte[7]
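All four primitives follow the same shift-and-or pattern, so one loop covers them. The helper name read_uint is hypothetical; in Python the built-in int.from_bytes(..., "big") gives the same result.

```python
def read_uint(data: bytes, offset: int, size: int) -> int:
    """Read a big-endian unsigned integer of `size` bytes (1, 2, 4 or 8)."""
    chunk = data[offset:offset + size]
    if len(chunk) != size:
        raise ValueError("not enough bytes")
    value = 0
    for byte in chunk:
        value = (value << 8) | byte  # most significant byte first
    return value
```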

10. Bundle Verification

  1. Magic check: First 8 bytes must be "ARBORICX".
  2. Version check: Major version must be 1.
  3. Section directory: Parse all entries; reject unknown critical sections. Verify reserved fields are zero.
  4. Manifest parsing: Decode fixed-order manifest; validate semantic constraints.
  5. Nodes section: Parse all entries.
  6. Bounds checking:
    • Every root index < nodeCount
    • Every export index < nodeCount
    • In every Stem payload, child_index < entry_position and child_index < nodeCount
    • In every Fork payload, both indices < entry_position and < nodeCount
  7. Acyclicity: Guaranteed by the child < parent rule above.
  8. Closure: Traverse from all root/export indices; confirm every reached index is valid.

No hash computation is required.
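Steps 6 and 7 reduce to a single pass over the node array. The sketch below assumes the payloads have already been split out of the nodes section and that root and export indices are passed together in one list; verify_nodes is an illustrative name.

```python
import struct


def verify_nodes(payloads: list, roots: list) -> None:
    """Raise ValueError if any payload or index breaks the bounds rules."""
    count = len(payloads)
    for position, payload in enumerate(payloads):
        tag = payload[:1]
        if tag == b"\x00" and len(payload) == 1:
            children = ()
        elif tag == b"\x01" and len(payload) == 5:
            children = struct.unpack(">I", payload[1:])
        elif tag == b"\x02" and len(payload) == 9:
            children = struct.unpack(">II", payload[1:])
        else:
            raise ValueError(f"malformed payload at index {position}")
        for child in children:
            # child < position also implies child < nodeCount and rules out cycles
            if child >= position:
                raise ValueError(f"node {position} references index {child}")
    for index in roots:
        if index >= count:
            raise ValueError(f"root or export index {index} out of range")
```

Once every child index is known to be smaller than its parent's position, the closure traversal in step 8 cannot reach an invalid index from a valid root.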


11. Canonicalization

A bundle is canonical iff:

  1. Maximal deduplication. No two entries represent structurally identical subtrees.
  2. Topological order. Children precede parents.
  3. Deterministic post-order traversal. Nodes are emitted in the order discovered by a left-to-right recursive post-order walk.
  4. No trailing bytes in any section.
  5. Reserved fields are zero.

Canonical bundles produce deterministic bytes and can be file-level hashed for global identity.
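Rules 1 to 3 can be satisfied together by a memoized post-order walk. The sketch below assumes terms are given as nested tuples, with () a leaf, (a,) a stem, and (a, b) a fork; that encoding and the name canonical_payloads are choices made for this example. It recurses and re-hashes subtrees, so it is suited to small terms only.

```python
import struct


def canonical_payloads(term):
    """Return (payload list in canonical order, index of the root node)."""
    index_of = {}   # structural subtree -> assigned node index
    payloads = []

    def emit(node):
        if node in index_of:                         # maximal deduplication
            return index_of[node]
        children = [emit(child) for child in node]   # left to right, children first
        # Tag byte equals the child count: 0x00 leaf, 0x01 stem, 0x02 fork.
        payload = bytes([len(node)]) + b"".join(
            struct.pack(">I", child) for child in children)
        index_of[node] = len(payloads)
        payloads.append(payload)
        return index_of[node]

    root = emit(term)
    return payloads, root
```

For the term Fork(Stem(Leaf), Stem(Leaf)) this emits three entries: the leaf at index 0, the shared stem at index 1, and a fork at index 2 whose two children both point at index 1.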


12. Known Section Types

Type  Name      Required  Version  Description
1     Manifest  Yes       1        Bundle metadata
2     Nodes     Yes       1        Topological DAG node entries

Unknown section types are permitted if not marked critical.


Appendix A: Complete Example Layout

A minimal bundle for Stem(Leaf) (the Tree Calculus encoding of t t):

+---------------------------------------------------+
| Header (32 bytes)                                  |
|   Magic: "ARBORICX"                               |
|   Major: 1, Minor: 0                              |
|   Section count: 2                                 |
|   Flags: 0                                         |
|   Dir offset: 32                                   |
+---------------------------------------------------+
| Section Directory (64 bytes = 2 × 32)              |
|   Entry 0: type=1 (manifest), offset=96, len=~200 |
|   Entry 1: type=2 (nodes),  offset=~296, len=22   |
+---------------------------------------------------+
| Manifest Section (~200 bytes)                      |
|   Magic: "ARBMNFST", Version: 1.1                 |
|   Schema, bundleType, tree spec, runtime spec      |
|   Closure: 0, Roots: [1], Exports: ["main" -> 1]  |
|   Metadata TLVs, zero extension fields             |
+---------------------------------------------------+
| Nodes Section (22 bytes)                           |
|   Node count: 2                                    |
|   Entry 0: payloadLen=1, payload=[0x00]            |
|   Entry 1: payloadLen=5, payload=[0x01, 0,0,0,0]   |
+---------------------------------------------------+
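Encoding the two node entries by the rules in sections 6 and 7 gives the exact size of the nodes section: 8 bytes of node count, a 5-byte leaf entry, and a 9-byte stem entry. A short sketch, with encode_nodes_section as a hypothetical helper:

```python
import struct


def encode_nodes_section(payloads: list) -> bytes:
    """u64 BE node count, then each payload with a u32 BE length prefix."""
    out = struct.pack(">Q", len(payloads))
    for payload in payloads:
        out += struct.pack(">I", len(payload)) + payload
    return out


leaf = b"\x00"                         # entry 0: Leaf
stem = b"\x01" + struct.pack(">I", 0)  # entry 1: Stem pointing at index 0
section = encode_nodes_section([leaf, stem])
print(len(section))  # 22
```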

Appendix B: File Extension

Bundles use the .arboricx file extension. Plain source files use .tri.