Content Store and Module Format Design
Status: concrete design draft.
This document narrows the higher-level module-system direction into concrete format and storage decisions. It intentionally avoids source/provenance details: modules export usable portable artifacts, not edit history.
Related design overview: docs/module-system-design.md.
1. Scope
This document specifies the first target shape for:
- a neutral filesystem-backed content-addressed store;
- Arboricx Merkle node persistence;
- indexed Arboricx bundle import/export as transport;
- module manifests as immutable export maps;
- workspace aliases as mutable human-facing references;
- View Contract artifact attachment to module exports.
It does not specify:
- package manager semantics;
- dependency solving;
- source-level rebuild/provenance metadata;
- final import syntax;
- garbage collection;
- registry/sync protocol.
2. Non-Negotiable Boundaries
The content store is not tricu-specific and is not Haskell-specific.
The store may contain objects produced by tricu, Haskell, Tree Calculus tools,
Arboricx tooling, or future frontends. The store core only knows object bytes,
object kinds, hashes, aliases, and optionally structural references for known
portable formats.
View Contracts may be first-class artifact references because they are portable Tree Calculus data checked by pure Tree Calculus code. They are not Haskell-private semantics.
Source and build provenance are intentionally excluded from the first module manifest format. A module manifest answers:
What portable artifacts does this module export, and what portable contracts are
paired with them?
It does not answer:
Which source file, parser, frontend, or build command produced these artifacts?
3. Hashing Convention
Objects are content-addressed by SHA-256 over domain-separated canonical bytes.
General rule:
hash = SHA256(domainUtf8 || 0x00 || canonicalPayloadBytes)
This matches the existing Merkle node convention in Research.nodeHash:
SHA256("arboricx.merkle.node.v1" || 0x00 || nodePayload)
The domain string is part of the object format. It prevents identical payload bytes in different formats from accidentally sharing identity.
Hashes are represented externally as 64 lowercase hexadecimal characters.
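As an illustration of the general rule, the following minimal Python sketch computes a domain-separated hash. The function name `object_hash` is hypothetical and not part of any existing tool; only the byte layout comes from the rule above.

```python
import hashlib


def object_hash(domain: str, payload: bytes) -> str:
    """Return SHA256(domainUtf8 || 0x00 || payload) as 64 lowercase hex chars."""
    return hashlib.sha256(domain.encode("utf-8") + b"\x00" + payload).hexdigest()


# The same payload bytes under two different domains get different identities.
as_node = object_hash("arboricx.merkle.node.v1", b"\x00")
as_view = object_hash("arboricx.view-contract.type.v1", b"\x00")
print(as_node)
print(as_node != as_view)
```

Because the domain string is hashed together with the payload, two formats that happen to share payload bytes cannot collide on identity.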
4. Filesystem Store Layout
The canonical filesystem store layout is:
store/
  objects/
    abc/
      abc123... -- object bytes, sharded by first 3 hex chars
  aliases/
    names/
    modules/
    packages/
  manifests/
  tmp/
The three-character shard follows the existing lib/arboricx/server.tri
convention.
4.1 Object paths
For object hash:
abc123...
object bytes live at:
store/objects/abc/abc123...
The object filename is the full hash. The shard directory is the first three hex characters.
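The path rule can be sketched as a small helper. `object_path` is a hypothetical name, and the validation of the 64-character lowercase hex form is an assumption drawn from the hashing convention in section 3.

```python
from pathlib import PurePosixPath

HEX_DIGITS = set("0123456789abcdef")


def object_path(store_root: str, hex_hash: str) -> PurePosixPath:
    """Map a 64-char lowercase hex hash to <root>/objects/<first 3 chars>/<hash>."""
    if len(hex_hash) != 64 or not set(hex_hash) <= HEX_DIGITS:
        raise ValueError("expected 64 lowercase hexadecimal characters")
    return PurePosixPath(store_root) / "objects" / hex_hash[:3] / hex_hash


example = "abc123" + "0" * 58  # placeholder hash, for illustration only
print(object_path("store", example))
```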
4.2 Atomic writes
Writers should first write the object bytes to a temporary path:
store/tmp/<hash>.<nonce>.tmp
then atomically rename into:
store/objects/<shard>/<hash>
Writing an existing object is idempotent if the existing bytes match the hash.
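A minimal sketch of this write path in Python follows. It assumes the object file holds the canonical payload bytes (without the domain prefix) and that `store/tmp` and `store/objects` share a filesystem, which is what makes the rename atomic. `put_object` is a hypothetical helper name.

```python
import hashlib
import os
import secrets
import tempfile
from pathlib import Path


def put_object(store: Path, domain: str, payload: bytes) -> str:
    """Write payload into the store and return its hex hash."""
    prefix = domain.encode("utf-8") + b"\x00"
    digest = hashlib.sha256(prefix + payload).hexdigest()
    final = store / "objects" / digest[:3] / digest

    if final.exists():
        # Idempotent only if the bytes already on disk hash to the same name.
        existing = final.read_bytes()
        if hashlib.sha256(prefix + existing).hexdigest() != digest:
            raise ValueError(f"corrupt object at {final}")
        return digest

    (store / "tmp").mkdir(parents=True, exist_ok=True)
    final.parent.mkdir(parents=True, exist_ok=True)
    tmp = store / "tmp" / f"{digest}.{secrets.token_hex(8)}.tmp"
    tmp.write_bytes(payload)
    os.replace(tmp, final)  # atomic when tmp and final share a filesystem
    return digest


with tempfile.TemporaryDirectory() as root:
    store = Path(root) / "store"
    first = put_object(store, "arboricx.merkle.node.v1", b"\x00")
    second = put_object(store, "arboricx.merkle.node.v1", b"\x00")
    print(first == second)
```

Readers never observe a partially written object, because the final path only appears once the rename succeeds. A production writer would likely also flush the file to disk before renaming; that step is omitted here.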
4.3 Store core metadata
The minimal filesystem store does not require sidecar metadata for every object. Object kind can be known by context or by manifest references.
A later index may cache:
hash -> kind
hash -> size
hash -> references
hash -> createdAt
but this index is not semantic identity.
5. Arboricx Merkle Node Object Format
The persistent Tree Calculus representation is a Merkle DAG of node objects.
Domain:
arboricx.merkle.node.v1
Canonical payloads:
Leaf = 0x00
Stem child = 0x01 || childHashRaw32
Fork left right = 0x02 || leftHashRaw32 || rightHashRaw32
Where childHashRaw32, leftHashRaw32, and rightHashRaw32 are the raw 32-byte
SHA-256 digests corresponding to child node hashes.
This is already implemented conceptually by:
Research.Node
Research.serializeNode
Research.deserializeNode
Research.nodeHash
The filesystem CAS should use this payload/hash convention directly.
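For readers outside the Haskell codebase, the payload and hash rules can be sketched in Python. This is an illustrative re-statement of the byte layout above, not a port of the Research module; the function names are hypothetical.

```python
import hashlib

DOMAIN = b"arboricx.merkle.node.v1"


def node_hash(payload: bytes) -> bytes:
    """Raw 32-byte digest of domain || 0x00 || payload."""
    return hashlib.sha256(DOMAIN + b"\x00" + payload).digest()


def serialize_node(tag: int, children: list) -> bytes:
    """tag 0 = Leaf, 1 = Stem, 2 = Fork; children are raw 32-byte hashes."""
    if tag not in (0, 1, 2) or len(children) != tag:
        raise ValueError("tag must equal the number of children (0, 1, or 2)")
    if any(len(child) != 32 for child in children):
        raise ValueError("child hashes must be raw 32-byte digests")
    return bytes([tag]) + b"".join(children)


def deserialize_node(payload: bytes):
    """Inverse of serialize_node; rejects malformed lengths and tags."""
    if not payload or payload[0] > 2 or len(payload) != 1 + 32 * payload[0]:
        raise ValueError("malformed node payload")
    return payload[0], [payload[i:i + 32] for i in range(1, len(payload), 32)]


leaf = serialize_node(0, [])
leaf_hash = node_hash(leaf)
fork = serialize_node(2, [leaf_hash, leaf_hash])
print(len(leaf), len(fork), node_hash(fork).hex())
```

Note that the tag byte equals the number of children, so payload lengths are always 1, 33, or 65 bytes.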
6. Tree Roots
A Tree Calculus value stored in the CAS is identified by the hash of its root Merkle node.
treeRootHash = hash(rootNodePayload)
The complete tree is reconstructed by recursively loading node objects reachable from the root.
Hydration is an interpretation step, not part of object identity. A client may hydrate a root as a plain tree, a graph with explicit sharing, or another runtime representation as long as the observable Tree Calculus value is the same. The filesystem CAS provides structural dedupe and portable identity; it does not by itself guarantee that a hydrated runtime value is the cheapest representation for all workloads.
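The following sketch shows one possible hydration strategy over an in-memory node table. The tuple representation of trees (`()` for Leaf, `(t,)` for Stem, `(l, r)` for Fork) and the dict standing in for the filesystem CAS are assumptions made for illustration only.

```python
import hashlib

PREFIX = b"arboricx.merkle.node.v1\x00"


def store_tree(tree: tuple, nodes: dict) -> bytes:
    """Store every subtree as a node payload; return the raw root hash."""
    children = b"".join(store_tree(child, nodes) for child in tree)
    payload = bytes([len(tree)]) + children  # tag byte == number of children
    digest = hashlib.sha256(PREFIX + payload).digest()
    nodes[digest] = payload
    return digest


def hydrate(root: bytes, nodes: dict, cache: dict = None) -> tuple:
    """Rebuild a tree from its root hash, reusing already-hydrated subtrees."""
    cache = {} if cache is None else cache
    if root not in cache:
        payload = nodes[root]
        if hashlib.sha256(PREFIX + payload).digest() != root:
            raise ValueError("node bytes do not match their hash")
        if payload[0] > 2 or len(payload) != 1 + 32 * payload[0]:
            raise ValueError("malformed node payload")
        kids = [payload[i:i + 32] for i in range(1, len(payload), 32)]
        cache[root] = tuple(hydrate(kid, nodes, cache) for kid in kids)
    return cache[root]


nodes = {}
tree = (((), ()), ((), ()))  # Fork (Fork Leaf Leaf) (Fork Leaf Leaf)
root = store_tree(tree, nodes)
print(len(nodes), hydrate(root, nodes) == tree)
```

The example tree has seven nodes but only three distinct node objects, which is the structural dedupe the CAS provides. The cache makes the hydrated value share repeated subtrees; a client could equally drop the cache and build a plain tree.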
Merkle nodes are useful for explicit DAG-oriented tooling, audit, and bundle packing. They are not the default representation for module executable exports: storing every subtree as a separate filesystem object is pathologically slow for large normal forms.
For module-backed evaluation and imports, a complete normalized named term is stored as one canonical object:
kind: arboricx.tree-term.v1
hash: <whole-term object hash>
abi: arboricx.abi.tree.v1
The arboricx.tree-term.v1 payload is a prefix encoding:
Leaf = 0x00
Stem t = 0x01 Tree
Fork l r = 0x02 Tree Tree
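A minimal sketch of the prefix encoding, again assuming an illustrative tuple representation of trees (`()` for Leaf, `(t,)` for Stem, `(l, r)` for Fork). The names `encode_term` and `decode_term` are hypothetical.

```python
def encode_term(tree: tuple) -> bytes:
    """Pre-order encoding: one tag byte per node, children follow in order."""
    out = bytearray()
    stack = [tree]
    while stack:
        node = stack.pop()
        out.append(len(node))          # 0x00 Leaf, 0x01 Stem, 0x02 Fork
        stack.extend(reversed(node))   # left child is popped (emitted) first
    return bytes(out)


def decode_term(data: bytes) -> tuple:
    """Inverse of encode_term; rejects unknown tags, truncation, trailing bytes."""
    pos = 0

    def parse() -> tuple:
        nonlocal pos
        if pos >= len(data):
            raise ValueError("truncated tree-term payload")
        tag = data[pos]
        pos += 1
        if tag > 2:
            raise ValueError(f"unknown tag byte {tag:#04x}")
        return tuple(parse() for _ in range(tag))

    tree = parse()
    if pos != len(data):
        raise ValueError("trailing bytes after tree-term payload")
    return tree


term = ((), ((),))  # Fork Leaf (Stem Leaf)
encoded = encode_term(term)
print(encoded.hex(), decode_term(encoded) == term)
```

Because each tag fixes how many subtrees follow, the encoding needs no length fields and a whole term is one object rather than one object per node. The recursive decoder is for clarity only; very deep terms would need an explicit stack.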
7. Arboricx Indexed Bundles
Indexed .arboricx bundles remain the transport/execution format.
They are:
- compact;
- self-contained;
- deterministic;
- suitable for restricted runtimes;
- suitable for HTTP serving and deployment.
They are not the canonical long-lived deduplicated store representation.
7.1 Pack
Packing converts one or more CAS tree roots into an indexed bundle:
CAS tree roots -> indexed Arboricx bundle
The packer traverses reachable Merkle nodes, emits a compact indexed node table, and writes a bundle manifest with export names and root indices.
7.2 Unpack
Unpacking converts a bundle into CAS nodes:
indexed Arboricx bundle -> CAS tree roots
The unpacker verifies the bundle structure, reconstructs each exported tree, and stores the corresponding Merkle nodes. It returns the tree root hash for each bundle export.
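The pack/unpack round trip can be sketched with an in-memory stand-in for the bundle. The table layout used here (one row per node holding the indices of its children, children always before parents) is a hypothetical simplification and not the actual indexed .arboricx wire format; the dict of node payloads stands in for the CAS.

```python
import hashlib

PREFIX = b"arboricx.merkle.node.v1\x00"


def node_hash(payload: bytes) -> bytes:
    return hashlib.sha256(PREFIX + payload).digest()


def pack(nodes: dict, roots: dict):
    """nodes: raw hash -> payload; roots: export name -> raw root hash."""
    index, table = {}, []

    def visit(digest: bytes) -> int:
        if digest not in index:
            payload = nodes[digest]
            kids = tuple(visit(payload[i:i + 32]) for i in range(1, len(payload), 32))
            index[digest] = len(table)  # assigned after children, so kids < parent
            table.append(kids)
        return index[digest]

    exports = {name: visit(digest) for name, digest in sorted(roots.items())}
    return table, exports


def unpack(table: list, exports: dict):
    """Rebuild node payloads and return (nodes, export name -> raw root hash)."""
    nodes, hashes = {}, []
    for kids in table:
        if len(kids) > 2 or any(k >= len(hashes) for k in kids):
            raise ValueError("malformed node table row")
        payload = bytes([len(kids)]) + b"".join(hashes[k] for k in kids)
        digest = node_hash(payload)
        nodes[digest] = payload
        hashes.append(digest)
    return nodes, {name: hashes[i] for name, i in exports.items()}


leaf = b"\x00"
fork = b"\x02" + node_hash(leaf) * 2
stem = b"\x01" + node_hash(fork)
cas = {node_hash(p): p for p in (leaf, fork, stem)}
table, exports = pack(cas, {"a": node_hash(stem), "b": node_hash(fork)})
print(table, exports)
```

Replacing 32-byte hashes with small indices is what makes the bundle compact, and recomputing hashes during unpack means the returned roots are derived from the bundle contents rather than trusted from it.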
8. Module Manifest v1
A module is an immutable manifest object. The module identity is the hash of its canonical manifest bytes.
A module name is not identity. It is a workspace alias to a module manifest hash.
8.1 Domain
Proposed domain:
arboricx.module-manifest.v1
8.2 Purpose
A module manifest pairs human-facing export names with portable content objects and optional portable contracts.
It exists to support:
- reproducible import resolution;
- executable export discovery;
- View Contract lookup for imported symbols;
- module-to-module reference tracking;
- transport/store interop.
It does not describe source provenance.
8.3 Conceptual shape
moduleManifestV1:
imports:
- alias: <text>
kind: <object kind>
hash: <object hash>
exports:
- name: <text>
object:
kind: <object kind>
hash: <object hash>
abi: <abi identifier>
view: optional
kind: <view artifact kind>
hash: <view artifact hash>
catalog: optional
kind: <view catalog kind>
hash: <view catalog hash>
metadata: optional human-facing fields
8.4 Imports/references
The imports section is a manifest reference graph, not a store-level language
dependency graph.
Each entry records direct content-addressed references used by the module:
alias: Prelude
kind: arboricx.module-manifest.v1
hash: <module hash>
This supports reproducibility, partial fetch, and audit. The content store core
stores this object but does not need to understand Prelude or import
semantics.
8.5 Exports
Each export is a record, not a single hash. This is required so executable objects and advertised contracts cannot drift apart.
Minimal executable export:
name: "id"
object:
kind: arboricx.tree-term.v1
hash: <whole-term hash>
abi: arboricx.abi.tree.v1
Export with View Contract:
name: "map"
object:
kind: arboricx.tree-term.v1
hash: <whole-term hash>
abi: arboricx.abi.tree.v1
view:
kind: arboricx.view-contract.type.v1
hash: <view type hash>
The manifest preserves the pairing between exported executable and exported
contract. For workspace modules built from local source, annotated exports are
checked before the manifest is published; only exports that pass producer-side
View Contract checking receive direct arboricx.view-contract.type.v1 refs.
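The pairing can be illustrated with a small sketch. The concrete manifest encoding is still undecided, so canonical JSON (sorted keys, no whitespace) is used here purely as a stand-in, and the placeholder hashes, `manifest_hash`, and `resolve_export` are all hypothetical.

```python
import hashlib
import json


def manifest_hash(manifest: dict) -> str:
    # ASSUMPTION: canonical JSON stands in for the undecided manifest encoding.
    payload = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(b"arboricx.module-manifest.v1\x00" + payload).hexdigest()


def resolve_export(manifest: dict, name: str):
    """Return the executable object ref and its optional view ref together."""
    for export in manifest["exports"]:
        if export["name"] == name:
            return export["object"], export.get("view")
    raise KeyError(name)


TERM = {"kind": "arboricx.tree-term.v1", "abi": "arboricx.abi.tree.v1"}
manifest = {
    "imports": [],
    "exports": [
        {"name": "id", "object": {**TERM, "hash": "0" * 64}},
        {"name": "map", "object": {**TERM, "hash": "1" * 64},
         "view": {"kind": "arboricx.view-contract.type.v1", "hash": "2" * 64}},
    ],
}

obj, view = resolve_export(manifest, "map")
print(obj["hash"][:8], view["hash"][:8], manifest_hash(manifest)[:8])
```

Since both refs live in the same hashed record, changing either the executable or the contract produces a different module hash, which is how the manifest prevents the two from drifting apart.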
8.6 Metadata
Metadata is optional and human-facing. Initial fields may include:
package
version
description
license
createdBy
Metadata is not source provenance and is not required for execution or checking.
9. View Contract Artifacts
View Contract artifacts are portable Arboricx-layer data. They may be stored
as content objects and referenced by module exports. tricu may emit these
objects, but the object kind is not tricu-specific.
Current artifact kind:
arboricx.view-contract.type.v1
arboricx.view-contract.type.v1 is the direct export-view artifact. Its
payload is a canonical prefix binary encoding of the syntactic ViewType:
Name = 0x00 u32be(byte-length) utf8-name
Ref = 0x01 u32be(byte-length) utf8-ref
List = 0x02 ViewType
Maybe = 0x03 ViewType
Pair = 0x04 ViewType ViewType
Result = 0x05 ViewType ViewType
Fn = 0x06 u32be(argument-count) ViewType* ViewType
utf8-ref is tagged text:
i:<decimal-integer> numeric/legacy ref
s:<text> symbolic user ref
Symbolic refs are the preferred user-authored form; numeric refs remain useful for generated code, fixtures, and old low-level examples.
The object hash domain is the object kind:
arboricx.view-contract.type.v1 \0 <payload>
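A minimal encoder for this payload might look as follows. The tuple representation of ViewType values, such as `("Fn", [args], result)`, is an assumption made for the sketch; only the tag bytes, the u32be lengths, and the hash domain come from the format above.

```python
import hashlib
import struct

UNARY = {"List": 0x02, "Maybe": 0x03}
BINARY = {"Pair": 0x04, "Result": 0x05}


def _text(tag: int, text: str) -> bytes:
    raw = text.encode("utf-8")
    return bytes([tag]) + struct.pack(">I", len(raw)) + raw  # u32be byte length


def encode_view(view: tuple) -> bytes:
    kind = view[0]
    if kind == "Name":
        return _text(0x00, view[1])
    if kind == "Ref":
        return _text(0x01, view[1])  # tagged text such as "s:user" or "i:3"
    if kind in UNARY:
        return bytes([UNARY[kind]]) + encode_view(view[1])
    if kind in BINARY:
        return bytes([BINARY[kind]]) + encode_view(view[1]) + encode_view(view[2])
    if kind == "Fn":
        args, result = view[1], view[2]
        encoded_args = b"".join(encode_view(arg) for arg in args)
        return b"\x06" + struct.pack(">I", len(args)) + encoded_args + encode_view(result)
    raise ValueError(f"unknown ViewType constructor: {kind!r}")


def view_hash(view: tuple) -> str:
    payload = encode_view(view)
    return hashlib.sha256(b"arboricx.view-contract.type.v1\x00" + payload).hexdigest()


fn = ("Fn", [("Ref", "s:a")], ("List", ("Name", "Int")))
print(encode_view(fn).hex())
print(view_hash(fn))
```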
9.1 Export-level pairing
The module manifest is the canonical pairing of an executable export and its advertised contract:
export name -> tree-term hash + optional view artifact hash
This avoids drift such as:
map -> tree A
map.view -> contract B
where aliases might be retargeted independently.
9.2 Import checking
When a source file imports a module, a frontend can resolve an imported export,
decode its direct arboricx.view-contract.type.v1 ref, and emit typed program
evidence locally:
imported List.map has view Fn [...]
For locally built workspace modules this is backed by producer-side checking before the module manifest alias is published, including imported view facts from dependencies used by the producer source. External or prebuilt manifests are trusted boundary declarations for now; they are not accompanied by proof objects. The checker still consumes only local numeric symbols and typed-program evidence. Global content hashes do not become checker symbols.
Correct split:
local checker symbol: 3
presentation label: "List.map"
resolved object: sha256:...
exported view: Fn [...]
9.3 Execution hydration versus contract evidence
Execution imports should use a narrow, demand-driven path:
module import -> selected executable exports -> hydrate selected tree-term objects
This path should not compute a dependency closure over other module exports. Each selected executable export is already a complete Tree Calculus value.
Contract-aware checking may use a broader path:
module import -> selected exports -> exported view type refs -> typed-program evidence
That path emits portable evidence and leaves compatibility policy decisions to
the Tree Calculus checker. Typed programs and reusable catalogs do not need their
own binary object kinds today: they are ordinary Tree Calculus data and can be
stored as arboricx.tree-term.v1 when persistence is useful.
10. Workspace Aliases
A workspace is mutable human-facing state over immutable content.
Examples:
List -> module manifest hash
Prelude -> module manifest hash
map -> tree-term hash
httpServer -> bundle hash
Aliases should live under:
store/aliases/
Initial categories:
store/aliases/modules/<name>
store/aliases/names/<name>
store/aliases/packages/<name>
Alias file contents should be simple and explicit, for example:
kind: arboricx.module-manifest.v1
hash: abc123...
Exact encoding can be decided with the first implementation. The important rule is that aliases are mutable pointers, not content identity.
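As one possible shape for alias helpers, the sketch below writes the two-line `kind:`/`hash:` form shown above and retargets aliases with an atomic rename. The file encoding, the helper names, and the reuse of `store/tmp` for alias writes are assumptions, since the exact encoding is deliberately left open.

```python
import os
import secrets
import tempfile
from pathlib import Path


def write_alias(store: Path, category: str, name: str, kind: str, hex_hash: str) -> None:
    """Point store/aliases/<category>/<name> at (kind, hash), replacing any old target."""
    if "/" in name or name in ("", ".", ".."):
        raise ValueError("alias names must be single path components")
    target = store / "aliases" / category / name
    target.parent.mkdir(parents=True, exist_ok=True)
    (store / "tmp").mkdir(parents=True, exist_ok=True)
    tmp = store / "tmp" / f"alias.{secrets.token_hex(8)}.tmp"
    tmp.write_text(f"kind: {kind}\nhash: {hex_hash}\n", encoding="utf-8")
    os.replace(tmp, target)  # readers see either the old or the new pointer


def read_alias(store: Path, category: str, name: str):
    fields = {}
    text = (store / "aliases" / category / name).read_text(encoding="utf-8")
    for line in text.splitlines():
        key, _, value = line.partition(": ")
        fields[key] = value
    return fields["kind"], fields["hash"]


with tempfile.TemporaryDirectory() as root:
    store = Path(root) / "store"
    write_alias(store, "modules", "List", "arboricx.module-manifest.v1", "a" * 64)
    write_alias(store, "modules", "List", "arboricx.module-manifest.v1", "b" * 64)
    print(read_alias(store, "modules", "List"))
```

Retargeting an alias never touches the objects it points to: both manifests remain in the store under their own hashes, and only the mutable pointer changes.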
11. Existing Convention Alignment
This design intentionally preserves existing conventions where they already fit:
- SHA-256 domain-separated Merkle node hashing;
- Leaf/Stem/Fork node payload tags 0x00, 0x01, 0x02;
- three-character object sharding from lib/arboricx/server.tri;
- indexed Arboricx bundles as compact transport objects;
- optional human-facing export names in manifests;
- View Contract checker evidence as portable Tree Calculus data.
It replaces or demotes conventions that do not fit:
- SQLite terms.names comma-separated aliases become workspace aliases/indexes;
- SQLite terms.tags comma-separated tags become optional metadata/indexes;
- file imports as AST flattening become transitional behavior;
- names cease to be semantic identity.
12. Implementation Sketch
A staged implementation can proceed as follows:
- Add filesystem CAS helpers alongside the existing SQLite store.
- Store/load Arboricx Merkle nodes using the filesystem layout.
- Implement tree-term storage and reconstruction from filesystem CAS.
- Implement pack from CAS tree terms/Merkle roots to indexed Arboricx bundle.
- Implement unpack from indexed Arboricx bundle to CAS tree terms/Merkle roots.
- Define a concrete module manifest encoding.
- Store/load module manifests as content-addressed objects.
- Add workspace alias read/write helpers.
- Teach import resolution to target module manifests/exports.
- Attach exported View Contract artifacts to module exports.
- Gradually migrate existing !import users.
13. Deferred Decisions
These are intentionally left out of the first concrete format:
- package version solving;
- registry/remotes protocol;
- garbage collection/reachability;
- source/provenance/build-record objects;
- editor/update workflows;
- rich visibility/export rules;
- final import syntax;
- whether module manifests also need a tree-native encoding.
14. Summary
The concrete v1 direction is:
Store:
filesystem-backed content-addressed objects
Hashing:
SHA256(domain || 0x00 || canonical payload)
Tree persistence:
Arboricx Merkle nodes for DAG-oriented tooling; whole-term arboricx.tree-term.v1 objects for module exports
Transport:
indexed .arboricx bundles, packable from and unpackable to CAS roots
Modules:
immutable manifests pairing export names with object refs and optional View
Contract refs
Workspace:
mutable aliases from human names to immutable content hashes
This keeps the store portable, preserves Arboricx's compact transport role,
restores Merkle DAGs as the persistence model, and gives View Contracts a stable
module/export attachment point without making the store tricu-specific.