James Eversole 31bf7094f4 Arboricx bundle format 1.1
We don't need SHA verification or Merkle dags in our transport bundle. Content
stores can handle both bundle and term verification and hashing.
2026-05-12 15:18:29 -05:00

# Arboricx Portable Bundle Format Specification
**Version:** 1.1 (Indexed)

**Status:** Stable

**Author:** Slopmachines guided by James Eversole

The Arboricx Portable Bundle is a self-contained binary format for distributing Tree Calculus programs. It uses topological indexing instead of cryptographic hashing for node identity, making it writable from pure Tree Calculus and verifiable via structural inspection.
## Table of Contents
1. [Design Principles](#1-design-principles)
2. [Top-Level Container Layout](#2-top-level-container-layout)
3. [Header](#3-header)
4. [Section Directory](#4-section-directory)
5. [Section: Manifest (type 1)](#5-section-manifest-type-1)
6. [Section: Nodes (type 2)](#6-section-nodes-type-2)
7. [Node Payload Format](#7-node-payload-format)
8. [Tree Calculus Reduction Semantics](#8-tree-calculus-reduction-semantics)
9. [Binary Primitives](#9-binary-primitives)
10. [Bundle Verification](#10-bundle-verification)
11. [Canonicalization](#11-canonicalization)
12. [Known Section Types](#12-known-section-types)
---
## 1. Design Principles
- **No cryptographic primitives required.** Node identity is topological (array index), not a SHA-256 hash.
- **Self-contained.** A bundle includes all nodes reachable from its exports. No external references.
- **Deterministic.** Canonical bundles produce byte-identical output for identical input terms.
- **Small.** 5 to 13 bytes per node entry (a 4-byte length prefix plus a 1-, 5-, or 9-byte payload) versus ~36 bytes in hash-based formats.
- **Verifiable via structure.** Bounds checking and acyclicity verification replace hash recomputation.

Global artifact identity (for registries, lockfiles, or content-addressed caches) is achieved by hashing the complete canonical bundle file externally. The bundle format itself knows nothing about this hash.

---
## 2. Top-Level Container Layout
```
+------------------+------------------+------------------+------------------+
| Header           | Section Directory| Manifest Section | Nodes Section    |
| (32 bytes)       | (N × 32 bytes)   | (variable)       | (variable)       |
+------------------+------------------+------------------+------------------+
```

When the bundle contains only the Manifest and Nodes sections, stored back to back as shown:

Total bundle size = 32 + (sectionCount × 32) + manifestSize + nodesSize

All multi-byte integers use **big-endian** byte order.

---
## 3. Header
| Offset | Size | Field | Description |
|--------|------|-------|-------------|
| 0 | 8 bytes | Magic | ASCII `"ARBORICX"` |
| 8 | 2 bytes | Major version | `u16` BE. Currently `1` |
| 10 | 2 bytes | Minor version | `u16` BE. Currently `0` |
| 12 | 4 bytes | Section count | `u32` BE. Number of entries in the section directory |
| 16 | 8 bytes | Flags | `u64` BE. Reserved; currently all zeros |
| 24 | 8 bytes | Directory offset | `u64` BE. Byte offset to the section directory (always `32`) |
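
The header is a fixed-width record, so it maps directly onto a `struct` format string. The following is a minimal Python sketch using only the standard library; `pack_header` and `unpack_header` are hypothetical helper names for illustration, not part of any published Arboricx API.

```python
import struct

# Header fields from the table above; ">" selects big-endian with no padding.
HEADER_FMT = ">8sHHIQQ"  # magic, major, minor, section count, flags, dir offset
HEADER_SIZE = struct.calcsize(HEADER_FMT)  # 32 bytes
MAGIC = b"ARBORICX"


def pack_header(section_count: int, major: int = 1, minor: int = 0) -> bytes:
    # Flags are reserved (all zeros); the directory always starts at byte 32.
    return struct.pack(HEADER_FMT, MAGIC, major, minor, section_count, 0, HEADER_SIZE)


def unpack_header(data: bytes) -> dict:
    if len(data) < HEADER_SIZE:
        raise ValueError("truncated header")
    magic, major, minor, count, flags, dir_offset = struct.unpack_from(HEADER_FMT, data, 0)
    if magic != MAGIC:
        raise ValueError("bad magic")
    if major != 1:
        raise ValueError(f"unsupported major version {major}")
    if dir_offset != HEADER_SIZE:
        raise ValueError(f"unexpected directory offset {dir_offset}")
    return {"major": major, "minor": minor, "section_count": count,
            "flags": flags, "dir_offset": dir_offset}
```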
---
## 4. Section Directory
Array of `N` entries, each exactly **32 bytes**.
| Offset (within entry) | Size | Field | Description |
|----------------------|------|-------|-------------|
| 0 | 4 bytes | Type | `u32` BE. Section type identifier |
| 4 | 2 bytes | Version | `u16` BE. Section-specific version |
| 6 | 2 bytes | Flags | `u16` BE. Bit 0 (`0x0001`) = critical section |
| 8 | 2 bytes | Compression | `u16` BE. `0` = none (currently the only value) |
| 10 | 2 bytes | Reserved | `u16` BE. Padding; must be zero |
| 12 | 8 bytes | Offset | `u64` BE. Byte offset from bundle start to section data |
| 20 | 8 bytes | Length | `u64` BE. Length of section data in bytes |
| 28 | 4 bytes | Reserved | Padding; must be zero |

**Verification:**

- Unknown critical sections are rejected.
- Compression must be `0` (none).
- Reserved fields must be zero.

**Note:** No per-section digest is stored. Integrity is verified at the distribution layer (e.g., a SHA-256 digest of the complete bundle file) rather than inside the container.

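
A sketch of reading the directory and applying the verification rules above, again in Python with `struct`. `parse_directory` is a hypothetical name, and the check that each section lies inside the bundle is an extra sanity check assumed by this sketch rather than a rule stated above.

```python
import struct

# One 32-byte entry: u32 type, u16 version, u16 flags, u16 compression,
# u16 reserved, u64 offset, u64 length, u32 reserved.
ENTRY_FMT = ">IHHHHQQI"
ENTRY_SIZE = struct.calcsize(ENTRY_FMT)  # 32
CRITICAL = 0x0001
KNOWN_TYPES = {1, 2}  # Manifest, Nodes


def parse_directory(data: bytes, dir_offset: int, count: int) -> list[dict]:
    entries = []
    for i in range(count):
        pos = dir_offset + i * ENTRY_SIZE
        if pos + ENTRY_SIZE > len(data):
            raise ValueError(f"directory entry {i} is truncated")
        (stype, version, flags, compression, reserved16,
         offset, length, reserved32) = struct.unpack_from(ENTRY_FMT, data, pos)
        if stype not in KNOWN_TYPES and flags & CRITICAL:
            raise ValueError(f"unknown critical section type {stype}")
        if compression != 0:
            raise ValueError(f"unsupported compression {compression}")
        if reserved16 != 0 or reserved32 != 0:
            raise ValueError("reserved directory fields must be zero")
        # Extra sanity check (assumption): the section must fit in the bundle.
        if offset + length > len(data):
            raise ValueError(f"section type {stype} extends past end of bundle")
        entries.append({"type": stype, "version": version, "flags": flags,
                        "offset": offset, "length": length})
    return entries
```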
---
## 5. Section: Manifest (type 1)
The manifest is a binary encoding of bundle metadata: a fixed-order core layout followed by two counted TLV lists (metadata fields and extension fields).
```
Manifest =
  magic                8 bytes   "ARBMNFST"
  major                u16 BE    Manifest major version (1)
  minor                u16 BE    Manifest minor version (1)
  schema               string    "arboricx.bundle.manifest.v1"
  bundleType           string    "tree-calculus-executable-object"
  treeCalculus         string    "tree-calculus.v1"
  treeHashAlgorithm    string    "indexed"
  treeHashDomain       string    "arboricx.indexed.node.v1"
  treeNodePayload      string    "arboricx.indexed.payload.v1"
  runtimeSemantics     string    "tree-calculus.v1"
  runtimeEvaluation    string    "normal-order"
  runtimeAbi           string    "arboricx.abi.tree.v1"
  capabilityCount      u32 BE    Number of capability strings (currently 0)
  capabilities         string[]  Array of length-prefixed UTF-8 strings
  closure              u8        0 = complete
  rootCount            u32 BE    Number of root entries
  roots                Root[]    Array of root entries
  exportCount          u32 BE    Number of export entries
  exports              Export[]  Array of export entries
  metadataFieldCount   u32 BE    Number of metadata TLV entries
  metadataFields       TLV[]     Metadata tag-value entries
  extensionFieldCount  u32 BE    Number of extension TLV entries (currently 0)
  extensionFields      TLV[]     Extension entries (skipped by parsers)
```
### String Format
```
string =
  length   u32 BE         Number of UTF-8 bytes
  bytes    byte[length]   UTF-8 content
```
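
Because `length` counts UTF-8 bytes rather than characters, non-ASCII text occupies more bytes than it has characters. A minimal Python sketch of the encoding (the helper names are hypothetical):

```python
import struct


def encode_string(s: str) -> bytes:
    # u32 BE byte length, then the UTF-8 bytes themselves.
    raw = s.encode("utf-8")
    return struct.pack(">I", len(raw)) + raw


def decode_string(data: bytes, pos: int) -> tuple[str, int]:
    """Return the decoded string and the position just past it."""
    if pos + 4 > len(data):
        raise ValueError("truncated string length")
    (n,) = struct.unpack_from(">I", data, pos)
    start, end = pos + 4, pos + 4 + n
    if end > len(data):
        raise ValueError("truncated string body")
    return data[start:end].decode("utf-8"), end
```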
### Root Entry
```
Root =
  index   u32 BE   Node index into the nodes section
  role    string   Length-prefixed UTF-8 ("default" for first root, "root" for others)
```
### Export Entry
```
Export =
  name   string   Length-prefixed UTF-8 export identifier
  root   u32 BE   Node index into the nodes section
  kind   string   Length-prefixed UTF-8 (currently "term")
  abi    string   Length-prefixed UTF-8 ABI string
```
### TLV Entry
```
TLV =
  tag      u16 BE         Tag identifier
  length   u32 BE         Value length in bytes
  value    byte[length]
```
### Metadata Tags
| Tag | Name | Value |
|-----|------|-------|
| 1 | package | UTF-8 text |
| 2 | version | UTF-8 text |
| 3 | description | UTF-8 text |
| 4 | license | UTF-8 text |
| 5 | createdBy | UTF-8 text |

Unknown metadata tags are ignored. Unknown extension tags are skipped by length.
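
A sketch of reading a counted TLV list (the shape shared by `metadataFields` and `extensionFields`) and mapping the known metadata tags. Advancing by each entry's `length` is what lets a parser pass over tags it does not recognize; `read_tlvs` and `decode_metadata` are hypothetical names.

```python
import struct

# Metadata tags from the table above.
METADATA_TAGS = {1: "package", 2: "version", 3: "description", 4: "license", 5: "createdBy"}


def read_tlvs(data: bytes, pos: int) -> tuple[list[tuple[int, bytes]], int]:
    """Read a u32 BE count followed by that many TLV entries."""
    (count,) = struct.unpack_from(">I", data, pos)
    pos += 4
    entries = []
    for _ in range(count):
        tag, length = struct.unpack_from(">HI", data, pos)  # u16 tag, u32 length
        pos += 6
        if pos + length > len(data):
            raise ValueError(f"TLV tag {tag} overruns the data")
        entries.append((tag, data[pos:pos + length]))
        pos += length  # skipping by length works even for unknown tags
    return entries, pos


def decode_metadata(entries: list[tuple[int, bytes]]) -> dict[str, str]:
    # Unknown metadata tags are ignored rather than rejected.
    return {METADATA_TAGS[tag]: value.decode("utf-8")
            for tag, value in entries if tag in METADATA_TAGS}
```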
### Semantic Constraints
| Constraint | Value |
|-----------|-------|
| `schema` | `"arboricx.bundle.manifest.v1"` |
| `bundleType` | `"tree-calculus-executable-object"` |
| `treeCalculus` | `"tree-calculus.v1"` |
| `treeHashAlgorithm` | `"indexed"` |
| `treeHashDomain` | `"arboricx.indexed.node.v1"` |
| `treeNodePayload` | `"arboricx.indexed.payload.v1"` |
| `runtimeSemantics` | `"tree-calculus.v1"` |
| `runtimeAbi` | `"arboricx.abi.tree.v1"` |
| `closure` | `0` (complete) |
| `rootCount` | At least 1 |
| `exportCount` | At least 1 |
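
These constraints amount to a fixed table of expected values. A minimal Python sketch, assuming the manifest has already been decoded into a dict keyed by the field names in the layout above (that representation and `manifest_errors` are assumptions of this sketch). Like the table, it does not constrain `runtimeEvaluation`.

```python
# Required values from the "Semantic Constraints" table.
REQUIRED_STRINGS = {
    "schema": "arboricx.bundle.manifest.v1",
    "bundleType": "tree-calculus-executable-object",
    "treeCalculus": "tree-calculus.v1",
    "treeHashAlgorithm": "indexed",
    "treeHashDomain": "arboricx.indexed.node.v1",
    "treeNodePayload": "arboricx.indexed.payload.v1",
    "runtimeSemantics": "tree-calculus.v1",
    "runtimeAbi": "arboricx.abi.tree.v1",
}


def manifest_errors(manifest: dict) -> list[str]:
    """Return constraint violations; an empty list means the manifest passes."""
    errors = [
        f"{field} must be {expected!r}, got {manifest.get(field)!r}"
        for field, expected in REQUIRED_STRINGS.items()
        if manifest.get(field) != expected
    ]
    if manifest.get("closure") != 0:
        errors.append("closure must be 0 (complete)")
    if len(manifest.get("roots", [])) < 1:
        errors.append("at least one root is required")
    if len(manifest.get("exports", [])) < 1:
        errors.append("at least one export is required")
    return errors
```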
---
## 6. Section: Nodes (type 2)
```
NodesSection =
  nodeCount   u64 BE        Total number of node entries
  entries     NodeEntry[]   Exactly nodeCount entries
```
### Node Entry
```
NodeEntry =
  payloadLen   u32 BE             Length of payload in bytes
  payload      byte[payloadLen]
```
There is **no hash field**. A node is identified solely by its position in the array.

---
## 7. Node Payload Format
Child references are `u32` big-endian indices into the node array. The array **must** be topologically sorted: every child index must be strictly less than the entry's own position.
### Leaf
```
Payload = 0x00
```
Exactly 1 byte.
### Stem
```
Payload = 0x01 || child_index (u32 BE)
```
Exactly 5 bytes.
### Fork
```
Payload = 0x02 || left_index (u32 BE) || right_index (u32 BE)
```
Exactly 9 bytes.

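
A Python sketch of encoding and decoding the three payload forms, with tuples standing in for nodes (`("leaf",)`, `("stem", child)`, `("fork", left, right)`); the tuple representation and helper names are assumptions. The decoder also enforces the rule that every child index is strictly less than the entry's own position.

```python
import struct

LEAF, STEM, FORK = 0x00, 0x01, 0x02


def encode_payload(node: tuple) -> bytes:
    kind = node[0]
    if kind == "leaf":
        return bytes([LEAF])                              # 1 byte
    if kind == "stem":
        return struct.pack(">BI", STEM, node[1])          # 5 bytes
    if kind == "fork":
        return struct.pack(">BII", FORK, node[1], node[2])  # 9 bytes
    raise ValueError(f"unknown node kind {kind!r}")


def decode_payload(payload: bytes, position: int) -> tuple:
    """Decode the payload of the entry at `position` in the node array."""
    if payload == bytes([LEAF]):
        return ("leaf",)
    if len(payload) == 5 and payload[0] == STEM:
        children = struct.unpack(">I", payload[1:])
        node = ("stem", children[0])
    elif len(payload) == 9 and payload[0] == FORK:
        children = struct.unpack(">II", payload[1:])
        node = ("fork", children[0], children[1])
    else:
        raise ValueError(f"malformed payload at entry {position}")
    # Topological order: children must appear strictly earlier in the array.
    if any(c >= position for c in children):
        raise ValueError(f"entry {position} references a non-earlier index")
    return node
```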
---
## 8. Tree Calculus Reduction Semantics
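The bundle represents a **Tree Calculus** term: a Leaf node is `t`, a Stem with child `a` is `t a`, and a Fork with children `a` and `b` is `t a b`. Application is left-associative, so `t a b c` means `((t a) b) c`. The reduction rules are:
```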
1. t t a b -> a
2. t (t a) b c -> a c (b c)
3a. t (t a b) c t -> a
3b. t (t a b) c (t u) -> b u
3c. t (t a b) c (t u v) -> c u v
```
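
Reading a Leaf node as `t`, a Stem as `t a`, and a Fork as `t a b`, the rules become a function that applies one tree to another. The following Python sketch is a simple recursive evaluator over trees already in normal form; it illustrates the five rules and is not necessarily the normal-order strategy named by `runtimeEvaluation`. Like any Tree Calculus evaluator it may not terminate on divergent programs. The class names and `apply_tree` are hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Leaf:        # t
    pass


@dataclass(frozen=True)
class Stem:        # t a
    child: "Tree"


@dataclass(frozen=True)
class Fork:        # t a b
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Stem, Fork]


def apply_tree(f: Tree, x: Tree) -> Tree:
    """Apply tree f to tree x using the reduction rules above."""
    if isinstance(f, Leaf):            # t x is already a stem
        return Stem(x)
    if isinstance(f, Stem):            # (t a) x is already a fork
        return Fork(f.child, x)
    left, right = f.left, f.right      # f = t left right
    if isinstance(left, Leaf):         # rule 1: t t a b -> a
        return right
    if isinstance(left, Stem):         # rule 2: t (t a) b c -> a c (b c)
        return apply_tree(apply_tree(left.child, x), apply_tree(right, x))
    # left = t a b; rules 3a-3c dispatch on the shape of the argument x
    if isinstance(x, Leaf):            # 3a: -> a
        return left.left
    if isinstance(x, Stem):            # 3b: -> b u
        return apply_tree(left.right, x.child)
    return apply_tree(apply_tree(right, x.left), x.right)  # 3c: -> c u v
```

For example, `K = Stem(Leaf())` is `t t`, and `Fork(Stem(K), K)` behaves as the identity function under rule 2 followed by rule 1.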
**Closure:** The manifest's `closure` field is `0` (complete), meaning all nodes reachable from the roots and exports are present in the nodes section. No external references exist.

---
## 9. Binary Primitives
### u8
Single byte, value `0-255`.
### u16 (2 bytes)
```
value = (byte[0] << 8) | byte[1]
```
### u32 (4 bytes)
```
value = (byte[0] << 24) | (byte[1] << 16) | (byte[2] << 8) | byte[3]
```
### u64 (8 bytes)
```
value = (byte[0] << 56) | ... | byte[7]
```
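
The formulas above are ordinary big-endian decoding. A short Python sketch that follows the same shift-and-or pattern for any of the four widths (`read_uint` is a hypothetical helper; Python's built-in `int.from_bytes(..., "big")` computes the same value):

```python
def read_uint(data: bytes, pos: int, width: int) -> int:
    """Read a big-endian unsigned integer of `width` bytes (1, 2, 4 or 8)."""
    if width not in (1, 2, 4, 8):
        raise ValueError("width must be 1, 2, 4 or 8")
    if pos + width > len(data):
        raise ValueError("truncated integer")
    value = 0
    for b in data[pos:pos + width]:
        value = (value << 8) | b  # most significant byte first
    return value
```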
---
## 10. Bundle Verification
1. **Magic check:** First 8 bytes must be `"ARBORICX"`.
2. **Version check:** Major version must be `1`.
3. **Section directory:** Parse all entries; reject unknown critical sections. Verify reserved fields are zero.
4. **Manifest parsing:** Decode fixed-order manifest; validate semantic constraints.
5. **Nodes section:** Parse exactly `nodeCount` entries. Each payload must be one of the forms in Section 7: a 1-byte Leaf (`0x00`), a 5-byte Stem (`0x01`), or a 9-byte Fork (`0x02`).
6. **Bounds checking:**
   - Every root index `< nodeCount`
   - Every export index `< nodeCount`
   - In every Stem payload, `child_index < entry_position` and `child_index < nodeCount`
   - In every Fork payload, both indices `< entry_position` and `< nodeCount`
7. **Acyclicity:** Guaranteed by the `child < parent` rule above.
8. **Closure:** Traverse from all root/export indices; confirm every reached index is valid.

No hash computation is required.

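
A Python sketch of steps 5 and 6 for the nodes section, operating on the raw section bytes and a list of root and export indices (the function name and signature are assumptions). Because every child index must be smaller than its parent's position, any traversal from valid root indices can only reach valid indices, so the sketch does not perform a separate closure walk for step 8. The optional `strict` flag also rejects trailing bytes, which Section 11 requires only of canonical bundles.

```python
import struct


def verify_nodes(section: bytes, root_indices: list[int], strict: bool = False) -> int:
    """Check a nodes section plus root/export indices; return nodeCount."""
    if len(section) < 8:
        raise ValueError("nodes section too short")
    (node_count,) = struct.unpack_from(">Q", section, 0)
    pos = 8
    for position in range(node_count):
        if pos + 4 > len(section):
            raise ValueError(f"entry {position}: truncated length")
        (plen,) = struct.unpack_from(">I", section, pos)
        payload = section[pos + 4:pos + 4 + plen]
        if len(payload) != plen:
            raise ValueError(f"entry {position}: truncated payload")
        pos += 4 + plen
        if payload == b"\x00":
            children = ()
        elif plen == 5 and payload[0] == 0x01:
            children = struct.unpack(">I", payload[1:])
        elif plen == 9 and payload[0] == 0x02:
            children = struct.unpack(">II", payload[1:])
        else:
            raise ValueError(f"entry {position}: malformed payload")
        # child < position also implies child < nodeCount and rules out cycles.
        for child in children:
            if child >= position:
                raise ValueError(f"entry {position}: child {child} does not precede it")
    if strict and pos != len(section):
        raise ValueError("trailing bytes in nodes section")
    for index in root_indices:
        if not 0 <= index < node_count:
            raise ValueError(f"root/export index {index} out of range")
    return node_count
```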
---
## 11. Canonicalization
A bundle is **canonical** iff:
1. **Maximal deduplication.** No two entries represent structurally identical subtrees.
2. **Topological order.** Children precede parents.
3. **Deterministic post-order traversal.** Nodes are emitted in the order discovered by a left-to-right recursive post-order walk.
4. **No trailing bytes** in any section.
5. **Reserved fields are zero.**

Canonical bundles produce deterministic bytes and can be file-level hashed for global identity.

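
These rules suggest a straightforward serializer: a left-to-right post-order walk that reuses the index of any structurally identical subtree it has already emitted. A Python sketch, assuming frozen dataclasses for trees (so structural equality and hashing come for free) and assuming multiple roots are walked in manifest order, which this section does not specify. The names are hypothetical; dataclass hashing recomputes recursively and Python limits recursion depth, so this is illustrative rather than efficient.

```python
import struct
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Leaf:
    pass


@dataclass(frozen=True)
class Stem:
    child: "Tree"


@dataclass(frozen=True)
class Fork:
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Stem, Fork]


def canonical_nodes(roots: list[Tree]) -> tuple[bytes, list[int]]:
    """Serialize trees into a deduplicated, post-order nodes section."""
    index = {}      # structurally identical subtree -> position in the array
    payloads = []

    def visit(t: Tree) -> int:
        if t in index:                  # maximal deduplication
            return index[t]
        if isinstance(t, Leaf):
            payload = b"\x00"
        elif isinstance(t, Stem):
            payload = struct.pack(">BI", 1, visit(t.child))
        else:                           # left subtree first, then right
            left = visit(t.left)
            right = visit(t.right)
            payload = struct.pack(">BII", 2, left, right)
        index[t] = len(payloads)        # parent emitted after its children
        payloads.append(payload)
        return index[t]

    root_ids = [visit(r) for r in roots]
    body = b"".join(struct.pack(">I", len(p)) + p for p in payloads)
    return struct.pack(">Q", len(payloads)) + body, root_ids
```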
---
## 12. Known Section Types
| Type | Name | Required | Version | Description |
|------|------|----------|---------|-------------|
| 1 | Manifest | Yes | 1 | Bundle metadata |
| 2 | Nodes | Yes | 1 | Topological DAG node entries |

Unknown section types are permitted if not marked critical.

---
## Appendix A: Complete Example Layout
A minimal bundle for `Stem(Leaf)` (the Tree Calculus encoding of `t t`):
```
+---------------------------------------------------+
| Header (32 bytes)                                 |
|   Magic: "ARBORICX"                               |
|   Major: 1, Minor: 0                              |
|   Section count: 2                                |
|   Flags: 0                                        |
|   Dir offset: 32                                  |
+---------------------------------------------------+
| Section Directory (64 bytes = 2 × 32)             |
|   Entry 0: type=1 (manifest), offset=96, len=M    |
|   Entry 1: type=2 (nodes), offset=96+M, len=22    |
+---------------------------------------------------+
| Manifest Section (M bytes, variable)              |
|   Magic: "ARBMNFST", Version: 1.1                 |
|   Schema, bundleType, tree spec, runtime spec     |
|   Closure: 0, Roots: [1], Exports: ["main" -> 1]  |
|   Metadata TLVs, zero extension fields            |
+---------------------------------------------------+
| Nodes Section (22 bytes)                          |
|   Node count: 2                                   |
|   Entry 0: payloadLen=1, payload=[0x00]           |
|   Entry 1: payloadLen=5, payload=[0x01, 0,0,0,0]  |
+---------------------------------------------------+
```
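
A Python sketch that wraps already-encoded manifest and nodes sections in a header and a two-entry directory, and builds the 22-byte nodes section for this example (an 8-byte `nodeCount` plus entries of 5 and 9 bytes). Marking both sections critical and the `assemble_bundle` name are assumptions of the sketch; a real manifest would be encoded as in Section 5. The resulting size matches the formula in Section 2.

```python
import struct

ENTRY_FMT = ">IHHHHQQI"  # type, version, flags, compression, reserved, offset, length, reserved
CRITICAL = 0x0001


def assemble_bundle(manifest: bytes, nodes: bytes) -> bytes:
    """Header, a two-entry directory, then the manifest and nodes back to back."""
    dir_offset = 32
    manifest_offset = dir_offset + 2 * struct.calcsize(ENTRY_FMT)  # 32 + 64 = 96
    nodes_offset = manifest_offset + len(manifest)                  # 96 + M
    header = struct.pack(">8sHHIQQ", b"ARBORICX", 1, 0, 2, 0, dir_offset)
    directory = (
        struct.pack(ENTRY_FMT, 1, 1, CRITICAL, 0, 0, manifest_offset, len(manifest), 0)
        + struct.pack(ENTRY_FMT, 2, 1, CRITICAL, 0, 0, nodes_offset, len(nodes), 0)
    )
    return header + directory + manifest + nodes


# Nodes section for Stem(Leaf): nodeCount = 2, Leaf at index 0, Stem(0) at index 1.
NODES = (
    struct.pack(">Q", 2)
    + struct.pack(">I", 1) + b"\x00"     # entry 0: payloadLen=1, Leaf
    + struct.pack(">IBI", 5, 0x01, 0)    # entry 1: payloadLen=5, Stem -> 0
)
```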
---
## Appendix B: File Extension
Bundles use the `.arboricx` file extension. Plain source files use `.tri`.