
Arboricx Portable Bundle Format Specification

Version: 0.1
Status: Exploratory
Author: A range of slopmachines guided by James Eversole
Human Review Status: 5-minute scan-through; this is an evolving and malleable document

The Arboricx Portable Bundle is a self-contained, content-addressed binary format for distributing Tree Calculus programs and their associated Merkle DAGs. It provides:

  • A fixed binary container with header, section directory, and typed sections
  • A language-neutral Merkle node layer for content-addressed tree values
  • A fixed-order binary manifest for semantic metadata, exports, and optional extensions

Table of Contents

  1. Top-Level Container Layout
  2. Header
  3. Section Directory
  4. Section: Manifest (type 1)
  5. Section: Nodes (type 2)
  6. Merkle Node Payload Format
  7. Merkle Hash Computation
  8. Tree Calculus Reduction Semantics
  9. Binary Primitives
  10. Bundle Verification
  11. Known Section Types

1. Top-Level Container Layout

An Arboricx bundle is a flat binary blob with the following layout:

+------------------+------------------+------------------+------------------+
| Header           | Section Directory| Manifest Section | Nodes Section    |
| (32 bytes)       | (N × 60 bytes)   | (variable)       | (variable)       |
+------------------+------------------+------------------+------------------+

The container uses big-endian byte order for all multi-byte integers.

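When the sections are packed contiguously after the directory (the current layout):

Total bundle size = 32 + (sectionCount × 60) + sum of all section lengths

For a bundle containing only the Manifest and Nodes sections, this is 32 + 120 + manifestSize + nodesSize.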


2. Header

Offset  Size     Field             Description
0       8 bytes  Magic             ASCII "ARBORICX" (0x41 0x52 0x42 0x4F 0x52 0x49 0x43 0x58)
8       2 bytes  Major version     u16 BE. Currently 1
10      2 bytes  Minor version     u16 BE. Currently 0
12      4 bytes  Section count     u32 BE. Number of entries in the section directory
16      8 bytes  Flags             u64 BE. Reserved; currently all zeros
24      8 bytes  Directory offset  u64 BE. Byte offset from the start of the bundle to the section directory

Constraints:

  • Major version must be 1. Bundles with unsupported major versions are rejected.
  • The directory offset must point to a valid location within the bundle.
  • The directory offset is always 32 for bundles with the current layout (header immediately followed by the directory).
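The header checks above can be sketched in Python with the standard struct module. This is a minimal sketch. parse_header is a hypothetical helper, not part of tricu. The sketch treats "valid location" as "the whole directory fits inside the bundle", which is an assumption.

```python
import struct

# Section 2 layout: magic, major, minor, section count, flags, directory offset.
HEADER_FORMAT = ">8sHHIQQ"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)   # 32 bytes; ">" means big-endian, no padding
ENTRY_SIZE = 60                                # one section directory entry

def parse_header(bundle: bytes) -> dict:
    """Decode the fixed 32-byte header and apply the Section 2 constraints."""
    if len(bundle) < HEADER_SIZE:
        raise ValueError("bundle is shorter than the header")
    magic, major, minor, count, flags, dir_offset = struct.unpack_from(HEADER_FORMAT, bundle, 0)
    if magic != b"ARBORICX":
        raise ValueError("bad magic")
    if major != 1:
        raise ValueError(f"unsupported major version {major}")
    # Assumption: a "valid location" means the whole directory fits inside the bundle.
    if dir_offset + count * ENTRY_SIZE > len(bundle):
        raise ValueError("section directory extends past the end of the bundle")
    return {"major": major, "minor": minor, "section_count": count,
            "flags": flags, "directory_offset": dir_offset}
```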

3. Section Directory

The section directory is an array of N entries, where N is the section count from the header. Each entry is exactly 60 bytes.

Offset (within entry)  Size      Field             Description
0                      4 bytes   Type              u32 BE. Section type identifier (see Known Section Types)
4                      2 bytes   Version           u16 BE. Section-specific version
6                      2 bytes   Flags             u16 BE. Bit flags: bit 0 (0x0001) = critical section
8                      2 bytes   Compression       u16 BE. Compression codec (currently only 0 = none)
10                     2 bytes   Digest algorithm  u16 BE. Hash algorithm (currently only 1 = SHA-256)
12                     8 bytes   Offset            u64 BE. Byte offset from the start of the bundle to the section data
20                     8 bytes   Length            u64 BE. Length of the section data in bytes
28                     32 bytes  SHA-256 digest    Raw digest of the section data

Verification:

  • Unknown critical sections (flags & 0x0001) are rejected.
  • Compression must be 0 (none).
  • Digest algorithm must be 1 (SHA-256).
  • The SHA-256 digest in the directory entry must match SHA256(section_data).
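The directory can be read and these checks applied as in the following Python sketch. It assumes the whole bundle is in memory as bytes. read_sections and KNOWN_TYPES are hypothetical names. The struct format string ">IHHHHQQ32s" follows the 60-byte entry layout field by field.

```python
import hashlib
import struct

ENTRY_FORMAT = ">IHHHHQQ32s"   # type, version, flags, compression, digest alg, offset, length, digest
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)   # 60
KNOWN_TYPES = {1, 2}           # Manifest and Nodes (Section 11)
CRITICAL = 0x0001

def read_sections(bundle: bytes, dir_offset: int, count: int) -> dict:
    """Return {section_type: section_bytes} after applying the Section 3 checks."""
    sections = {}
    for i in range(count):
        (stype, _version, flags, compression, digest_alg,
         offset, length, digest) = struct.unpack_from(ENTRY_FORMAT, bundle, dir_offset + i * ENTRY_SIZE)
        if stype not in KNOWN_TYPES and flags & CRITICAL:
            raise ValueError(f"unknown critical section type {stype}")
        if compression != 0:
            raise ValueError("unsupported compression codec")
        if digest_alg != 1:
            raise ValueError("unsupported digest algorithm")
        if offset + length > len(bundle):
            raise ValueError("section data extends past the end of the bundle")
        data = bundle[offset:offset + length]
        if hashlib.sha256(data).digest() != digest:
            raise ValueError(f"digest mismatch in section type {stype}")
        sections[stype] = data
    return sections
```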

4. Section: Manifest (type 1)

The manifest is a binary encoding of bundle metadata. It uses a fixed-order core layout followed by an optional TLV tail for extensibility.

4.1 Format

Manifest =
  magic                8 bytes    "ARBMNFST"
  major                u16 BE     Manifest major version (1)
  minor                u16 BE     Manifest minor version (0)

  schema               string     Length-prefixed UTF-8 text
  bundleType           string     Length-prefixed UTF-8 text

  treeCalculus         string     Length-prefixed UTF-8 text
  treeHashAlgorithm    string     Length-prefixed UTF-8 text
  treeHashDomain       string     Length-prefixed UTF-8 text
  treeNodePayload      string     Length-prefixed UTF-8 text

  runtimeSemantics     string     Length-prefixed UTF-8 text
  runtimeEvaluation    string     Length-prefixed UTF-8 text
  runtimeAbi           string     Length-prefixed UTF-8 text
  capabilityCount      u32 BE     Number of capability strings
  capabilities         string[]   Array of length-prefixed UTF-8 capability strings

  closure              u8         0 = complete, 1 = partial
  rootCount            u32 BE     Number of root entries
  roots                Root[]     Array of root entries
  exportCount          u32 BE     Number of export entries
  exports              Export[]   Array of export entries

  metadataFieldCount   u32 BE     Number of metadata TLV entries
  metadataFields       TLV[]      Metadata tag-value entries
  extensionFieldCount  u32 BE     Number of extension TLV entries
  extensionFields      TLV[]      Extension tag-value entries (unrecognized tags are skipped)

The manifest must fill the section exactly. No bytes may follow the last extension field.

4.2 String Format

Every string field uses the same encoding:

string =
  length      u32 BE    Number of UTF-8 bytes in the string (not the number of characters)
  bytes       byte[length]  UTF-8 encoded string content

The length field carries the byte count, so parsers can skip strings without decoding UTF-8.
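A minimal Python sketch of reading and skipping such strings follows. read_string and skip_string are hypothetical helpers. The test string "héllo" has 5 characters but occupies 6 UTF-8 bytes, which shows why the prefix counts bytes.

```python
import struct

def read_string(buf: bytes, pos: int) -> tuple[str, int]:
    """Read a u32-length-prefixed UTF-8 string at `pos`; return (text, next_pos)."""
    (length,) = struct.unpack_from(">I", buf, pos)
    start, end = pos + 4, pos + 4 + length
    if end > len(buf):
        raise ValueError("string runs past the end of the buffer")
    return buf[start:end].decode("utf-8"), end

def skip_string(buf: bytes, pos: int) -> int:
    """Skip a string using only its byte-count prefix, without decoding UTF-8."""
    (length,) = struct.unpack_from(">I", buf, pos)
    return pos + 4 + length
```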

4.3 Root Entry

Root =
  hash        32 bytes    Raw SHA-256 hash of the Merkle node
  role        string      Length-prefixed UTF-8 text ("default" for the first root, "root" for others)

The hash is stored as raw bytes (not hex-encoded). It corresponds to the Merkle hash of the node.

4.4 Export Entry

Export =
  name        string      Length-prefixed UTF-8 text (export identifier)
  root        32 bytes    Raw SHA-256 hash of the Merkle node
  kind        string      Length-prefixed UTF-8 text (currently "term")
  abi         string      Length-prefixed UTF-8 text (ABI string)

4.5 TLV Entry

TLV =
  tag         u16 BE    Tag identifier (type)
  length      u32 BE    Number of bytes in the value
  value       byte[length]  Raw bytes

TLV entries carry variable-length values, and a parser that does not recognize a tag can skip the entry. It reads the u32 length and advances 2 + 4 + length bytes from the start of the entry.
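A Python sketch of reading one counted TLV list follows. It assumes the list is preceded by its u32 count field (metadataFieldCount or extensionFieldCount, per Section 4.1). read_tlvs is a hypothetical helper. Unknown tags need no special handling because every entry is advanced past by its declared length.

```python
import struct

def read_tlvs(buf: bytes, pos: int) -> tuple[list[tuple[int, bytes]], int]:
    """Read a u32 count followed by that many TLV entries; return (entries, next_pos)."""
    (count,) = struct.unpack_from(">I", buf, pos)
    pos += 4
    entries = []
    for _ in range(count):
        tag, length = struct.unpack_from(">HI", buf, pos)   # u16 tag, u32 length
        value = buf[pos + 6:pos + 6 + length]
        if len(value) != length:
            raise ValueError("TLV value runs past the end of the buffer")
        entries.append((tag, value))
        pos += 2 + 4 + length          # the same step skips tags the caller ignores
    return entries, pos
```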

4.6 Metadata Tags

Tag  Name         Value
1    package      UTF-8 text: package name
2    version      UTF-8 text: version string
3    description  UTF-8 text: description
4    license      UTF-8 text: license identifier or text
5    createdBy    UTF-8 text: creator identifier

Unknown metadata tags are ignored. Unknown extension tags are skipped by length.

4.7 Semantic Constraints

A valid bundle manifest must satisfy:

Constraint          Value
schema              "arboricx.bundle.manifest.v1"
bundleType          "tree-calculus-executable-object"
treeCalculus        "tree-calculus.v1"
treeHashAlgorithm   "sha256"
treeHashDomain      "arboricx.merkle.node.v1"
treeNodePayload     "arboricx.merkle.payload.v1"
runtimeSemantics    "tree-calculus.v1"
runtimeAbi          "arboricx.abi.tree.v1"
capabilities        Empty array (capabilityCount = 0)
closure             0 (complete)
rootCount           At least 1
exportCount         At least 1
Export names        Non-empty
Export roots        Non-empty (32 bytes each)
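These constraints can be checked with a short Python sketch. It assumes a hypothetical manifest parser has already produced a dict keyed by the Section 4.1 field names, with roots and exports as lists of dicts. check_manifest and EXPECTED are illustrative names only.

```python
# Fixed string values required by Section 4.7, keyed by Section 4.1 field names.
EXPECTED = {
    "schema": "arboricx.bundle.manifest.v1",
    "bundleType": "tree-calculus-executable-object",
    "treeCalculus": "tree-calculus.v1",
    "treeHashAlgorithm": "sha256",
    "treeHashDomain": "arboricx.merkle.node.v1",
    "treeNodePayload": "arboricx.merkle.payload.v1",
    "runtimeSemantics": "tree-calculus.v1",
    "runtimeAbi": "arboricx.abi.tree.v1",
}

def check_manifest(m: dict) -> list[str]:
    """Return every constraint violation in a decoded manifest (empty list if valid)."""
    errors = [f"{key}: expected {want!r}, got {m.get(key)!r}"
              for key, want in EXPECTED.items() if m.get(key) != want]
    if m.get("capabilities"):
        errors.append("capabilities must be empty")
    if m.get("closure") != 0:
        errors.append("closure must be 0 (complete)")
    if not m.get("roots"):
        errors.append("at least one root is required")
    exports = m.get("exports") or []
    if not exports:
        errors.append("at least one export is required")
    for exp in exports:
        if not exp.get("name"):
            errors.append("export names must be non-empty")
        if len(exp.get("root", b"")) != 32:
            errors.append("export roots must be 32 bytes")
    return errors
```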

5. Section: Nodes (type 2)

The nodes section contains all Merkle DAG nodes referenced by the manifest. It is a sequence of node entries preceded by a count.

NodesSection =
  nodeCount     u64 BE    Total number of node entries
  entries       NodeEntry[]

Each node entry:

NodeEntry =
  hash          32 bytes    Raw SHA-256 hash of this node
  payloadLen    u32 BE    Length of the payload in bytes
  payload       byte[payloadLen]  Node payload (see Section 6)

The node count is u64 to support large bundles. Entries are stored in the order produced by the exporter (typically sorted by hash for determinism).
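This section can be decoded into a hash-to-payload map with a Python sketch like the following. It also rejects duplicate hashes, as the verification order in Section 10 requires. parse_nodes is hypothetical, and payload contents are not validated here.

```python
import struct

def parse_nodes(section: bytes) -> dict[bytes, bytes]:
    """Decode the Nodes section into {hash: payload}, rejecting duplicate hashes."""
    if len(section) < 8:
        raise ValueError("nodes section is too short")
    (count,) = struct.unpack_from(">Q", section, 0)    # nodeCount, u64 BE
    pos = 8
    nodes = {}
    for _ in range(count):
        if pos + 36 > len(section):                     # 32-byte hash + u32 payloadLen
            raise ValueError("truncated node entry")
        node_hash = section[pos:pos + 32]
        (payload_len,) = struct.unpack_from(">I", section, pos + 32)
        start, end = pos + 36, pos + 36 + payload_len
        if end > len(section):
            raise ValueError("truncated node payload")
        if node_hash in nodes:
            raise ValueError(f"duplicate node {node_hash.hex()}")
        nodes[node_hash] = section[start:end]
        pos = end
    return nodes
```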


6. Merkle Node Payload Format

Each node in the Merkle DAG is one of three types. The payload is a single type byte followed by zero, one, or two raw 32-byte child hashes:

Leaf

Payload = 0x00

A leaf has no children. The payload is exactly 1 byte.

Stem

Payload = 0x01 || child_hash (32 bytes raw)

A stem has exactly one child. The payload is 33 bytes.

Fork

Payload = 0x02 || left_hash (32 bytes raw) || right_hash (32 bytes raw)

A fork has exactly two children. The payload is 65 bytes.

Validation:

  • Leaf payloads must be exactly 1 byte (0x00).
  • Stem payloads must be exactly 33 bytes.
  • Fork payloads must be exactly 65 bytes.
  • Unknown type bytes are rejected.
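These validation rules fit in one small Python function, shown here as a sketch. decode_payload is a hypothetical name. It returns the node kind plus its child hashes in payload order (left before right for a Fork).

```python
def decode_payload(payload: bytes) -> tuple[str, list[bytes]]:
    """Return (kind, child_hashes) for a node payload, enforcing exact sizes."""
    if not payload:
        raise ValueError("empty payload")
    layouts = {0x00: ("leaf", 1), 0x01: ("stem", 33), 0x02: ("fork", 65)}
    tag = payload[0]
    if tag not in layouts:
        raise ValueError(f"unknown node type byte {tag:#04x}")
    kind, size = layouts[tag]
    if len(payload) != size:
        raise ValueError(f"{kind} payload must be {size} bytes, got {len(payload)}")
    # Children follow the type byte as consecutive 32-byte hashes.
    children = [payload[i:i + 32] for i in range(1, size, 32)]
    return kind, children
```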

7. Merkle Hash Computation

Each node is identified by a SHA-256 hash of its canonical payload:

hash = SHA256( domain_tag || 0x00 || payload )

Where:

Component   Value
domain_tag  "arboricx.merkle.node.v1" as UTF-8 bytes
Separator   0x00 (one zero byte)
payload     The node's canonical serialization from Section 6

Examples:

  • Leaf: SHA256("arboricx.merkle.node.v1" || 0x00 || 0x00)
  • Stem: SHA256("arboricx.merkle.node.v1" || 0x00 || 0x01 || child_hash_bytes)
  • Fork: SHA256("arboricx.merkle.node.v1" || 0x00 || 0x02 || left_hash_bytes || right_hash_bytes)

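The resulting 32-byte hash is stored as raw bytes throughout the bundle: in manifest root and export entries (Sections 4.3 and 4.4), in node entries, and as child references inside Stem and Fork payloads. It is not hex-encoded anywhere in the binary format.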
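The hash computation can be sketched in Python with hashlib. node_hash, leaf_hash, stem_hash, and fork_hash are hypothetical helpers. Each returns the 32 raw bytes, and .hex() produces the 64-character form for display.

```python
import hashlib

DOMAIN_TAG = b"arboricx.merkle.node.v1"   # UTF-8 bytes of the domain tag

def node_hash(payload: bytes) -> bytes:
    """SHA256(domain_tag || 0x00 || payload), returned as 32 raw bytes."""
    return hashlib.sha256(DOMAIN_TAG + b"\x00" + payload).digest()

def leaf_hash() -> bytes:
    return node_hash(b"\x00")

def stem_hash(child: bytes) -> bytes:
    return node_hash(b"\x01" + child)

def fork_hash(left: bytes, right: bytes) -> bytes:
    return node_hash(b"\x02" + left + right)
```

Because child hashes are embedded in the payload, Fork(a, b) and Fork(b, a) hash differently whenever a and b differ.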


8. Tree Calculus Reduction Semantics

The bundle represents a Tree Calculus term as a Merkle DAG. The reduction rules are:

Apply Rules

apply(Fork(Leaf, a), _)                   = a
apply(Fork(Stem(a), b), c)                = apply(apply(a, c), apply(b, c))
apply(Fork(Fork(w, _), _), Leaf)          = w
apply(Fork(Fork(_, x), _), Stem(u))       = apply(x, u)
apply(Fork(Fork(_, _), y), Fork(u, v))    = apply(apply(y, u), v)
apply(Leaf, b)                            = Stem(b)
apply(Stem(a), b)                         = Fork(a, b)
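A minimal Python sketch of the standard tree-calculus apply function follows. It models trees as hypothetical frozen dataclasses Leaf, Stem, and Fork, which are not tricu's own types. Evaluation is eager and recursive, so deep terms can hit Python's recursion limit. The sketch does not attempt to mirror the stack-based engine described below.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Leaf:
    pass

@dataclass(frozen=True)
class Stem:
    child: "Tree"

@dataclass(frozen=True)
class Fork:
    left: "Tree"
    right: "Tree"

Tree = Union[Leaf, Stem, Fork]

def apply(f: Tree, x: Tree) -> Tree:
    """Apply tree f to argument x (eager, recursive)."""
    if isinstance(f, Leaf):
        return Stem(x)                                      # apply(Leaf, b) = Stem(b)
    if isinstance(f, Stem):
        return Fork(f.child, x)                             # apply(Stem(a), b) = Fork(a, b)
    rule = f.left
    if isinstance(rule, Leaf):
        return f.right                                      # Fork(Leaf, a): discard x
    if isinstance(rule, Stem):
        return apply(apply(rule.child, x), apply(f.right, x))
    # rule is Fork(w, p): branch on the shape of the argument.
    if isinstance(x, Leaf):
        return rule.left                                    # w
    if isinstance(x, Stem):
        return apply(rule.right, x.child)                   # apply(p, u)
    return apply(apply(f.right, x.left), x.right)           # apply(apply(y, u), v)
```

For example, with K = Stem(Leaf()), the identity I = Fork(Stem(K), K) satisfies apply(I, x) == x for any tree x.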

Internal Representation

In the reduction engine, Fork nodes use a [right, left] (stack) ordering:

  • Fork = [right_child, left_child]
  • Stem = [child]
  • Leaf = []

This ordering supports stack-based reduction: pop two terms, apply, push results back.

Closure

The manifest declares closure = 0 (complete), meaning all nodes reachable from export roots are present in the nodes section. No external references exist.


9. Binary Primitives

All multi-byte integers use big-endian byte order.

u16 (2 bytes)

byte[0] | byte[1]
value = (byte[0] << 8) | byte[1]

u32 (4 bytes)

byte[0] | byte[1] | byte[2] | byte[3]
value = (byte[0] << 24) | (byte[1] << 16) | (byte[2] << 8) | byte[3]

u64 (8 bytes)

byte[0] ... byte[7]
value = (byte[0] << 56) | ... | byte[7]

u8 (1 byte)

A single byte, value 0-255.
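The shift formulas above can be expressed as one Python loop. read_be is a hypothetical helper. In practice, Python's built-in int.from_bytes(chunk, "big") and the struct formats ">H", ">I", and ">Q" decode the same values.

```python
def read_be(buf: bytes, pos: int, width: int) -> int:
    """Decode an unsigned big-endian integer (u8/u16/u32/u64 for width 1/2/4/8)."""
    chunk = buf[pos:pos + width]
    if len(chunk) != width:
        raise ValueError("not enough bytes")
    value = 0
    for byte in chunk:
        value = (value << 8) | byte   # equivalent to the explicit shift formulas
    return value
```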


10. Bundle Verification

A complete bundle verification proceeds in this order:

  1. Magic check: First 8 bytes must be "ARBORICX".
  2. Version check: Major version must be 1.
  3. Section directory: Parse all entries; reject unknown critical sections.
  4. Digest verification: For each section, compute SHA256(section_data) and compare with the digest in the directory entry.
  5. Manifest parsing: Decode the fixed-order manifest; validate semantic constraints.
  6. Node section: Parse all node entries; reject duplicates.
  7. Root verification: All root hashes from the manifest must exist in the node map.
  8. Export verification: All export root hashes must exist in the node map.
  9. Node hash verification: For each node, compute SHA256(domain || 0x00 || payload) and compare with the stored hash.
  10. Children verification: For each Stem or Fork node, every child hash (one for a Stem, two for a Fork) must exist in the node map.
  11. Closure verification: Starting from each root hash, traverse the DAG and confirm all reachable nodes are present.
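Steps 7 through 11 operate on the decoded node map and can be sketched in Python. The sketch assumes nodes is a dict from 32-byte hash to payload and that roots holds every manifest root and export hash. verify_nodes and children_of are hypothetical names. It folds steps 7, 8, and 11 into a single traversal that runs after the per-node checks. This should change only which error is reported first.

```python
import hashlib

DOMAIN = b"arboricx.merkle.node.v1"
PAYLOAD_SIZE = {0x00: 1, 0x01: 33, 0x02: 65}   # Leaf, Stem, Fork (Section 6)

def children_of(payload: bytes) -> list[bytes]:
    """Return the child hashes in a payload after checking its type byte and size."""
    if not payload or PAYLOAD_SIZE.get(payload[0]) != len(payload):
        raise ValueError("malformed node payload")
    return [payload[i:i + 32] for i in range(1, len(payload), 32)]

def verify_nodes(nodes: dict[bytes, bytes], roots: list[bytes]) -> None:
    """Raise ValueError unless the node hash, children, and closure checks hold."""
    for h, payload in nodes.items():
        # Step 9: recompute each node hash from its payload.
        if hashlib.sha256(DOMAIN + b"\x00" + payload).digest() != h:
            raise ValueError(f"hash mismatch for node {h.hex()}")
        # Step 10: every referenced child must be present.
        for child in children_of(payload):
            if child not in nodes:
                raise ValueError(f"missing child {child.hex()}")
    # Steps 7, 8, 11: walk the DAG from every root; shared subtrees are visited once.
    seen = set()
    stack = list(roots)
    while stack:
        h = stack.pop()
        if h in seen:
            continue
        if h not in nodes:
            raise ValueError(f"missing reachable node {h.hex()}")
        seen.add(h)
        stack.extend(children_of(nodes[h]))
```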

11. Known Section Types

Type  Name      Required  Version  Description
1     Manifest  Yes       1        Bundle metadata in fixed-order binary format
2     Nodes     Yes       1        Merkle DAG node entries

Unknown section types are permitted if not marked as critical (flags bit 0 is not set).


Appendix A: Complete Example Layout (id.arboricx)

A minimal id.arboricx bundle has:

+---------------------------------------------------+
| Header (32 bytes)                                 |
|   Magic: "ARBORICX"                               |
|   Major: 1, Minor: 0                              |
|   Section count: 2                                |
|   Flags: 0                                        |
|   Dir offset: 32                                  |
+---------------------------------------------------+
| Section Directory (120 bytes = 2 × 60)            |
|   Entry 0: type=1 (manifest), offset=152, len=375 |
|   Entry 1: type=2 (nodes),    offset=527, len=284 |
+---------------------------------------------------+
| Manifest Section (375 bytes)                      |
|   Magic: "ARBMNFST"                               |
|   Version: 1.0                                    |
|   Core strings (schema, bundleType, tree spec,    |
|   runtime spec, capabilities, closure, roots,     |
|   exports, metadata TLVs, extension fields)       |
+---------------------------------------------------+
| Nodes Section (284 bytes)                         |
|   Node count: 4                                   |
|   Node entry: Leaf  (32 + 4 + 1  =  37 bytes)     |
|   Node entry: Stem  (32 + 4 + 33 =  69 bytes)     |
|   Node entry: Stem  (32 + 4 + 33 =  69 bytes)     |
|   Node entry: Fork  (32 + 4 + 65 = 101 bytes)     |
|   Total: 8 + 37 + 69 + 69 + 101 = 284 bytes       |
|   (entry order is chosen by the exporter)         |
+---------------------------------------------------+

The manifest section starts at byte 152 (0x98) and the nodes section at byte 527 (0x20F).
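The byte counts in this example can be checked with a short Python sketch. The 284-byte nodes section works out to one Leaf entry, two Stem entries, and one Fork entry. The sketch assumes the exported id term is the standard tree-calculus identity Fork(Stem(Stem(Leaf)), Stem(Leaf)). The section size depends only on that mix of node kinds. The manifest length of 375 is taken from the directory entry rather than recomputed. node_hash and encode_nodes are hypothetical helpers.

```python
import hashlib
import struct

DOMAIN = b"arboricx.merkle.node.v1"

def node_hash(payload: bytes) -> bytes:
    return hashlib.sha256(DOMAIN + b"\x00" + payload).digest()

def encode_nodes(payloads: list[bytes]) -> bytes:
    """Encode a Nodes section, ordering entries by hash (one deterministic choice)."""
    entries = sorted((node_hash(p), p) for p in payloads)
    out = struct.pack(">Q", len(entries))             # nodeCount (u64 BE)
    for h, p in entries:
        out += h + struct.pack(">I", len(p)) + p      # hash, payloadLen (u32 BE), payload
    return out

# Assumed shape of id: Fork(Stem(Stem(Leaf)), Stem(Leaf)).
leaf_p = b"\x00"
k_p = b"\x01" + node_hash(leaf_p)                     # Stem(Leaf)
sk_p = b"\x01" + node_hash(k_p)                       # Stem(Stem(Leaf))
id_p = b"\x02" + node_hash(sk_p) + node_hash(k_p)     # Fork(Stem(Stem(Leaf)), Stem(Leaf))

nodes_section = encode_nodes([leaf_p, k_p, sk_p, id_p])
manifest_len = 375                                    # from the example directory entry
manifest_off = 32 + 2 * 60                            # header + two directory entries
nodes_off = manifest_off + manifest_len
print(len(nodes_section), manifest_off, nodes_off, nodes_off + len(nodes_section))
# prints: 284 152 527 811
```

The final figure is the total bundle size: 32 + 120 + 375 + 284 = 811 bytes.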


Appendix B: File Extension

Bundles produced by the tricu tool use the .arboricx file extension. The .tri extension is used for plain source files; the .arboricx extension identifies the portable binary format.