tricu/docs/content-store-and-module-format.md
James Eversole fdebb6c13d Tricu 2.0.0
2026-05-25 12:44:24 -05:00

Content Store and Module Format Design

Status: concrete design draft.

This document narrows the higher-level module-system direction into concrete format and storage decisions. It intentionally avoids source/provenance details: modules export usable portable artifacts, not edit history.

Related design overview: docs/module-system-design.md.

1. Scope

This document specifies the first target shape for:

  • a neutral filesystem-backed content-addressed store;
  • Arboricx Merkle node persistence;
  • indexed Arboricx bundle import/export as transport;
  • module manifests as immutable export maps;
  • workspace aliases as mutable human-facing references;
  • View Contract artifact attachment to module exports.

It does not specify:

  • package manager semantics;
  • dependency solving;
  • source-level rebuild/provenance metadata;
  • final import syntax;
  • garbage collection;
  • registry/sync protocol.

2. Non-Negotiable Boundaries

The content store is not tricu-specific and is not Haskell-specific.

The store may contain objects produced by tricu, Haskell, Tree Calculus tools, Arboricx tooling, or future frontends. The store core only knows object bytes, object kinds, hashes, aliases, and optionally structural references for known portable formats.

View Contracts may be first-class artifact references because they are portable Tree Calculus data checked by pure Tree Calculus code. They are not Haskell-private semantics.

Source and build provenance are intentionally excluded from the first module manifest format. A module manifest answers:

What portable artifacts does this module export, and what portable contracts are
paired with them?

It does not answer:

Which source file, parser, frontend, or build command produced these artifacts?

3. Hashing Convention

Objects are content-addressed by SHA-256 over domain-separated canonical bytes.

General rule:

hash = SHA256(domainUtf8 || 0x00 || canonicalPayloadBytes)

This matches the existing Merkle node convention in Research.nodeHash:

SHA256("arboricx.merkle.node.v1" || 0x00 || nodePayload)

The domain string is part of the object format. It prevents identical payload bytes in different formats from accidentally sharing identity.

Hashes are represented externally as 64 lowercase hexadecimal characters.
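
The general rule can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: the function name object_hash is hypothetical, and only the hashing rule itself and the domain strings come from this document.

```python
import hashlib


def object_hash(domain: str, payload: bytes) -> str:
    """Return SHA256(domainUtf8 || 0x00 || payload) as 64 lowercase hex chars."""
    return hashlib.sha256(domain.encode("utf-8") + b"\x00" + payload).hexdigest()


# The same payload byte hashed under two different domains.
leaf_hash = object_hash("arboricx.merkle.node.v1", b"\x00")
term_hash = object_hash("arboricx.tree-term.v1", b"\x00")
```

Because the domain is mixed into the digest, leaf_hash and term_hash differ even though both payloads are the single byte 0x00.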

4. Filesystem Store Layout

The canonical filesystem store layout is:

store/
  objects/
    abc/
      abc123...        -- object bytes, sharded by first 3 hex chars
  aliases/
    names/
    modules/
    packages/
  manifests/
  tmp/

The three-character shard follows the existing lib/arboricx/server.tri convention.

4.1 Object paths

For object hash:

abc123...

object bytes live at:

store/objects/abc/abc123...

The object filename is the full hash. The shard directory is the first three hex characters.
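
As a rough sketch of the path rule, assuming a store root directory passed in by the caller (the helper name object_path and the validation it performs are illustrative, not part of the design):

```python
from pathlib import Path

HEX_DIGITS = set("0123456789abcdef")


def object_path(store_root: Path, hash_hex: str) -> Path:
    """Map a 64-char lowercase hex hash to store/objects/<shard>/<hash>."""
    if len(hash_hex) != 64 or not set(hash_hex) <= HEX_DIGITS:
        raise ValueError("expected 64 lowercase hexadecimal characters")
    shard = hash_hex[:3]  # first three hex characters
    return store_root / "objects" / shard / hash_hex


example = object_path(Path("store"), "abc" + "0" * 61)
```

Rejecting anything that is not exactly 64 lowercase hex characters also keeps path separators and uppercase variants out of the object tree.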

4.2 Atomic writes

Writers should use:

store/tmp/<hash>.<nonce>.tmp

then atomically rename into:

store/objects/<shard>/<hash>

Writing an existing object is idempotent if the existing bytes match the hash.
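
The write protocol might look like the following sketch. It assumes that the object file holds the canonical payload bytes and that store/tmp and store/objects live on the same filesystem, so that the rename is atomic; put_object and the 8-byte random nonce are illustrative choices, not decisions made by this document.

```python
import hashlib
import os
import secrets
import tempfile
from pathlib import Path


def put_object(store_root: Path, domain: str, payload: bytes) -> str:
    """Write payload under its domain-separated hash; return the hex hash."""
    hash_hex = hashlib.sha256(domain.encode("utf-8") + b"\x00" + payload).hexdigest()
    final = store_root / "objects" / hash_hex[:3] / hash_hex
    if final.exists():
        # Idempotent only if the bytes already on disk are the expected ones.
        if final.read_bytes() != payload:
            raise OSError(f"existing object {hash_hex} does not match its hash")
        return hash_hex
    tmp_dir = store_root / "tmp"
    tmp_dir.mkdir(parents=True, exist_ok=True)
    final.parent.mkdir(parents=True, exist_ok=True)
    tmp = tmp_dir / f"{hash_hex}.{secrets.token_hex(8)}.tmp"
    with open(tmp, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # make the bytes durable before publishing
    os.replace(tmp, final)    # atomic rename within one filesystem
    return hash_hex


with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    first = put_object(root, "arboricx.merkle.node.v1", b"\x00")
    second = put_object(root, "arboricx.merkle.node.v1", b"\x00")
    stored = (root / "objects" / first[:3] / first).read_bytes()
    leftover_tmp = list((root / "tmp").iterdir())
```

The second call finds the object already present and returns without writing, which is the idempotence property described above.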

4.3 Store core metadata

The minimal filesystem store does not require sidecar metadata for every object. Object kind can be known by context or by manifest references.

A later index may cache:

hash -> kind
hash -> size
hash -> references
hash -> createdAt

but this index is not semantic identity.

5. Arboricx Merkle Node Object Format

The persistent Tree Calculus representation is a Merkle DAG of node objects.

Domain:

arboricx.merkle.node.v1

Canonical payloads:

Leaf         = 0x00
Stem child   = 0x01 || childHashRaw32
Fork left right
             = 0x02 || leftHashRaw32 || rightHashRaw32

Where childHashRaw32, leftHashRaw32, and rightHashRaw32 are the raw 32-byte SHA-256 digests corresponding to child node hashes.

This is already implemented conceptually by:

Research.Node
Research.serializeNode
Research.deserializeNode
Research.nodeHash

The filesystem CAS should use this payload/hash convention directly.
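
For readers without the Haskell source at hand, the payload and hash convention can be sketched in Python. This is not a port of Research.serializeNode or Research.nodeHash; the function names below are hypothetical and only the byte layout is taken from this section.

```python
import hashlib

DOMAIN = b"arboricx.merkle.node.v1"


def node_hash(payload: bytes) -> bytes:
    """Raw 32-byte digest of SHA256(domain || 0x00 || payload)."""
    return hashlib.sha256(DOMAIN + b"\x00" + payload).digest()


def leaf_payload() -> bytes:
    return b"\x00"


def stem_payload(child: bytes) -> bytes:
    if len(child) != 32:
        raise ValueError("child hash must be a raw 32-byte digest")
    return b"\x01" + child


def fork_payload(left: bytes, right: bytes) -> bytes:
    if len(left) != 32 or len(right) != 32:
        raise ValueError("child hashes must be raw 32-byte digests")
    return b"\x02" + left + right


def parse_payload(payload: bytes) -> tuple:
    """Decode a payload into ("leaf",), ("stem", h) or ("fork", l, r)."""
    if payload == b"\x00":
        return ("leaf",)
    if payload[:1] == b"\x01" and len(payload) == 33:
        return ("stem", payload[1:])
    if payload[:1] == b"\x02" and len(payload) == 65:
        return ("fork", payload[1:33], payload[33:])
    raise ValueError("malformed Merkle node payload")


leaf = node_hash(leaf_payload())
stem = node_hash(stem_payload(leaf))
fork = fork_payload(stem, leaf)
```

Child references are raw 32-byte digests inside payloads, while the 64-character hex form is only the external representation.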

6. Tree Roots

A Tree Calculus value stored in the CAS is identified by the hash of its root Merkle node.

treeRootHash = hash(rootNodePayload)

The complete tree is reconstructed by recursively loading node objects reachable from the root.

Hydration is an interpretation step, not part of object identity. A client may hydrate a root as a plain tree, a graph with explicit sharing, or another runtime representation as long as the observable Tree Calculus value is the same. The filesystem CAS provides structural dedupe and portable identity; it does not by itself guarantee that a hydrated runtime value is the cheapest representation for all workloads.
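
A small sketch of storing and hydrating a tree, using an in-memory dict in place of the filesystem store. The tuple representation (() for Leaf, (t,) for Stem, (l, r) for Fork) and the names store_tree and hydrate are assumptions made for illustration only; this hydrates to a plain tree, one of the permitted interpretations.

```python
import hashlib

DOMAIN = b"arboricx.merkle.node.v1"


def node_hash(payload: bytes) -> bytes:
    return hashlib.sha256(DOMAIN + b"\x00" + payload).digest()


def store_tree(nodes: dict, tree: tuple) -> bytes:
    """Store every node of the tree; return the raw root hash.

    The tuple length (0, 1 or 2) doubles as the payload tag byte.
    """
    if len(tree) > 2:
        raise ValueError("a node has at most two children")
    payload = bytes([len(tree)]) + b"".join(store_tree(nodes, c) for c in tree)
    h = node_hash(payload)
    nodes[h] = payload  # identical subtrees land on the same key
    return h


def hydrate(nodes: dict, root: bytes) -> tuple:
    """Recursively load the nodes reachable from root into a plain tree."""
    payload = nodes[root]
    if node_hash(payload) != root:
        raise ValueError("stored bytes do not match their hash")
    if not payload or payload[0] > 2 or len(payload) != 1 + 32 * payload[0]:
        raise ValueError("malformed Merkle node payload")
    return tuple(hydrate(nodes, payload[i:i + 32]) for i in range(1, len(payload), 32))


nodes = {}
tree = (((),), ((),))            # Fork (Stem Leaf) (Stem Leaf)
root = store_tree(nodes, tree)
restored = hydrate(nodes, root)
```

The example tree has five nodes but only three distinct ones, so the dict ends up with three entries. The recursion is kept for brevity; deep trees would need an explicit stack.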

Merkle nodes are useful for explicit DAG-oriented tooling, audit, and bundle packing. They are not the default representation for module executable exports: storing every subtree as a separate filesystem object is pathologically slow for large normal forms.

For module-backed evaluation and imports, a complete normalized named term is stored as one canonical object:

kind: arboricx.tree-term.v1
hash: <whole-term object hash>
abi:  arboricx.abi.tree.v1

The arboricx.tree-term.v1 payload is a prefix encoding:

Leaf     = 0x00
Stem t   = 0x01 Tree
Fork l r = 0x02 Tree Tree
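
The prefix encoding can be sketched as a pair of encode and decode functions. The tuple representation (() for Leaf, (t,) for Stem, (l, r) for Fork) and the function names are illustrative. The hash shown assumes the whole-term object follows the general rule of section 3 with the object kind as domain, which this section does not state explicitly.

```python
import hashlib


def encode_term(tree: tuple) -> bytes:
    """Prefix-encode a tree; the tuple length (0, 1, 2) is the tag byte."""
    if len(tree) > 2:
        raise ValueError("a node has at most two children")
    return bytes([len(tree)]) + b"".join(encode_term(c) for c in tree)


def decode_term(data: bytes) -> tuple:
    """Decode exactly one tree from data, rejecting truncation and trailing bytes."""
    def go(pos: int):
        if pos >= len(data):
            raise ValueError("truncated tree-term payload")
        tag = data[pos]
        if tag > 2:
            raise ValueError(f"unknown tag byte {tag:#04x}")
        pos += 1
        children = []
        for _ in range(tag):
            child, pos = go(pos)
            children.append(child)
        return tuple(children), pos

    tree, end = go(0)
    if end != len(data):
        raise ValueError("trailing bytes after tree-term payload")
    return tree


term = ((), ((),))                       # Fork Leaf (Stem Leaf)
payload = encode_term(term)              # bytes 02 00 01 00
# Assumption: domain string equals the object kind.
term_hash = hashlib.sha256(b"arboricx.tree-term.v1" + b"\x00" + payload).hexdigest()
```

A whole term is one object regardless of size, which is what avoids the per-subtree filesystem cost described above. A production decoder would likely be iterative to avoid recursion limits on large normal forms.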

7. Arboricx Indexed Bundles

Indexed .arboricx bundles remain the transport/execution format.

They are:

  • compact;
  • self-contained;
  • deterministic;
  • suitable for restricted runtimes;
  • suitable for HTTP serving and deployment.

They are not the canonical long-lived deduplicated store representation.

7.1 Pack

Packing converts one or more CAS tree roots into an indexed bundle:

CAS tree roots -> indexed Arboricx bundle

The packer traverses reachable Merkle nodes, emits a compact indexed node table, and writes a bundle manifest with export names and root indices.
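
The traversal and index assignment can be illustrated as follows. This document does not specify the on-disk .arboricx bundle layout, so the sketch stops at an in-memory node table and an export map; the row shape (tag, child indices), the function name pack, and the sorted export order are all hypothetical.

```python
import hashlib

DOMAIN = b"arboricx.merkle.node.v1"


def node_hash(payload: bytes) -> bytes:
    return hashlib.sha256(DOMAIN + b"\x00" + payload).digest()


def pack(nodes: dict, exports: dict):
    """Build (table, roots) from a hash->payload map and a name->root-hash map.

    Each table row is (tag, child_index...). Children are emitted before
    their parents, so every child index is smaller than the row's own index.
    """
    index = {}
    table = []

    def visit(h: bytes) -> int:
        if h in index:
            return index[h]              # shared node: reuse its row
        payload = nodes[h]
        children = [payload[i:i + 32] for i in range(1, len(payload), 32)]
        row = (payload[0], *[visit(c) for c in children])
        index[h] = len(table)
        table.append(row)
        return index[h]

    # Sorted export names keep the output deterministic.
    roots = {name: visit(h) for name, h in sorted(exports.items())}
    return table, roots


# Hand-built store: Leaf, Stem Leaf, Fork (Stem Leaf) (Stem Leaf).
nodes = {}
leaf_p = b"\x00"
leaf_h = node_hash(leaf_p)
stem_p = b"\x01" + leaf_h
stem_h = node_hash(stem_p)
fork_p = b"\x02" + stem_h + stem_h
fork_h = node_hash(fork_p)
nodes.update({leaf_h: leaf_p, stem_h: stem_p, fork_h: fork_p})

table, roots = pack(nodes, {"pair": fork_h, "unit": leaf_h})
```

Hashes are replaced by small integer indices in the table, which is where the compactness of the transport form comes from.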

7.2 Unpack

Unpacking converts a bundle into CAS nodes:

indexed Arboricx bundle -> CAS tree roots

The unpacker verifies the bundle structure, reconstructs each exported tree, and stores the corresponding Merkle nodes. It returns the tree root hash for each bundle export.

8. Module Manifest v1

A module is an immutable manifest object. The module identity is the hash of its canonical manifest bytes.

A module name is not identity. It is a workspace alias to a module manifest hash.

8.1 Domain

Proposed domain:

arboricx.module-manifest.v1

8.2 Purpose

A module manifest pairs human-facing export names with portable content objects and optional portable contracts.

It exists to support:

  • reproducible import resolution;
  • executable export discovery;
  • View Contract lookup for imported symbols;
  • module-to-module reference tracking;
  • transport/store interop.

It does not describe source provenance.

8.3 Conceptual shape

moduleManifestV1:
  imports:
    - alias: <text>
      kind:  <object kind>
      hash:  <object hash>

  exports:
    - name: <text>
      object:
        kind: <object kind>
        hash: <object hash>
        abi:  <abi identifier>
      view: optional
        kind: <view artifact kind>
        hash: <view artifact hash>
      catalog: optional
        kind: <view catalog kind>
        hash: <view catalog hash>

  metadata: optional human-facing fields
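
One way to picture this shape is as a set of immutable records. The sketch below is illustrative only: the class names are hypothetical, and the sorted-key JSON used to obtain canonical bytes is a stand-in chosen for the example, since the concrete manifest encoding is left to a later implementation step.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class Ref:
    kind: str
    hash: str


@dataclass(frozen=True)
class ObjectRef:
    kind: str
    hash: str
    abi: str


@dataclass(frozen=True)
class Import:
    alias: str
    kind: str
    hash: str


@dataclass(frozen=True)
class Export:
    name: str
    object: ObjectRef
    view: Optional[Ref] = None
    catalog: Optional[Ref] = None


@dataclass(frozen=True)
class ModuleManifestV1:
    imports: tuple = ()
    exports: tuple = ()
    metadata: Optional[dict] = None


def manifest_hash(manifest: ModuleManifestV1) -> str:
    """Hash under the proposed domain. The JSON body is an assumption."""
    body = json.dumps(asdict(manifest), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(
        b"arboricx.module-manifest.v1" + b"\x00" + body.encode("utf-8")
    ).hexdigest()


# Placeholder hashes, not real objects.
term = ObjectRef("arboricx.tree-term.v1", "0" * 64, "arboricx.abi.tree.v1")
plain = ModuleManifestV1(exports=(Export("map", term),))
with_view = ModuleManifestV1(
    exports=(Export("map", term, view=Ref("arboricx.view-contract.type.v1", "1" * 64)),)
)
```

Because the view ref is part of the export record, attaching or changing a contract changes the manifest bytes and therefore the module identity.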

8.4 Imports/references

The imports section is a manifest reference graph, not a store-level language dependency graph.

Each entry records direct content-addressed references used by the module:

alias: Prelude
kind:  arboricx.module-manifest.v1
hash:  <module hash>

This supports reproducibility, partial fetch, and audit. The content store core stores this object but does not need to understand Prelude or import semantics.

8.5 Exports

Each export is a record, not a single hash. This is required so executable objects and advertised contracts cannot drift apart.

Minimal executable export:

name: "id"
object:
  kind: arboricx.tree-term.v1
  hash: <whole-term hash>
  abi:  arboricx.abi.tree.v1

Export with View Contract:

name: "map"
object:
  kind: arboricx.tree-term.v1
  hash: <whole-term hash>
  abi:  arboricx.abi.tree.v1
view:
  kind: arboricx.view-contract.type.v1
  hash: <view type hash>

The manifest preserves the pairing between exported executable and exported contract. For workspace modules built from local source, annotated exports are checked before the manifest is published; only exports that pass producer-side View Contract checking receive direct arboricx.view-contract.type.v1 refs.

8.6 Metadata

Metadata is optional and human-facing. Initial fields may include:

package
version
description
license
createdBy

Metadata is not source provenance and is not required for execution or checking.

9. View Contract Artifacts

View Contract artifacts are portable Arboricx-layer data. They may be stored as content objects and referenced by module exports. tricu may emit these objects, but the object kind is not tricu-specific.

Current artifact kind:

arboricx.view-contract.type.v1

arboricx.view-contract.type.v1 is the direct export-view artifact. Its payload is a canonical prefix binary encoding of the syntactic ViewType:

Name   = 0x00 u32be(byte-length) utf8-name
Ref    = 0x01 u32be(byte-length) utf8-ref
List   = 0x02 ViewType
Maybe  = 0x03 ViewType
Pair   = 0x04 ViewType ViewType
Result = 0x05 ViewType ViewType
Fn     = 0x06 u32be(argument-count) ViewType* ViewType

utf8-ref is tagged text:

i:<decimal-integer>  numeric/legacy ref
s:<text>             symbolic user ref

Symbolic refs are the preferred user-authored form; numeric refs remain useful for generated code, fixtures, and old low-level examples.

The object hash domain is the object kind:

SHA256("arboricx.view-contract.type.v1" || 0x00 || payload)
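
The encoding table can be sketched as a small recursive encoder. The tuple representation of ViewType, the function names, and the example type names "a" and "b" are assumptions for illustration; only the tag bytes, the u32be length prefixes, and the hash domain come from this section.

```python
import hashlib
import struct

KIND = "arboricx.view-contract.type.v1"


def _text(tag: int, text: str) -> bytes:
    """tag || u32be(byte-length) || utf8 bytes."""
    raw = text.encode("utf-8")
    return bytes([tag]) + struct.pack(">I", len(raw)) + raw


def encode_view(v: tuple) -> bytes:
    """Encode ("Name", s), ("Ref", s), ("List", t), ("Maybe", t),
    ("Pair", a, b), ("Result", a, b) or ("Fn", [args], result)."""
    head = v[0]
    if head == "Name":
        return _text(0x00, v[1])
    if head == "Ref":
        return _text(0x01, v[1])     # v[1] is tagged text such as "s:user"
    if head == "List":
        return b"\x02" + encode_view(v[1])
    if head == "Maybe":
        return b"\x03" + encode_view(v[1])
    if head == "Pair":
        return b"\x04" + encode_view(v[1]) + encode_view(v[2])
    if head == "Result":
        return b"\x05" + encode_view(v[1]) + encode_view(v[2])
    if head == "Fn":
        args, result = v[1], v[2]
        encoded_args = b"".join(encode_view(a) for a in args)
        return b"\x06" + struct.pack(">I", len(args)) + encoded_args + encode_view(result)
    raise ValueError(f"unknown ViewType constructor: {head!r}")


def view_hash(v: tuple) -> str:
    payload = encode_view(v)
    return hashlib.sha256(KIND.encode("utf-8") + b"\x00" + payload).hexdigest()


fn_view = ("Fn", [("List", ("Name", "a"))], ("Name", "b"))
payload = encode_view(fn_view)
```

Length prefixes count UTF-8 bytes rather than characters, so a non-ASCII name contributes more than one byte per character to its prefix.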

9.1 Export-level pairing

The module manifest is the canonical pairing of an executable export and its advertised contract:

export name -> tree-term hash + optional view artifact hash

This avoids drift such as:

map -> tree A
map.view -> contract B

where aliases might be retargeted independently.

9.2 Import checking

When a source file imports a module, a frontend can resolve an imported export, decode its direct arboricx.view-contract.type.v1 ref, and emit typed program evidence locally:

imported List.map has view Fn [...]

For locally built workspace modules this is backed by producer-side checking before the module manifest alias is published, including imported view facts from dependencies used by the producer source. External or prebuilt manifests are trusted boundary declarations for now; they are not accompanied by proof objects. The checker still consumes only local numeric symbols and typed-program evidence. Global content hashes do not become checker symbols.

Correct split:

local checker symbol: 3
presentation label:   "List.map"
resolved object:      sha256:...
exported view:        Fn [...]

9.3 Execution hydration versus contract evidence

Execution imports should use a narrow, demand-driven path:

module import -> selected executable exports -> hydrate selected tree-term objects

This path should not compute a dependency closure over other module exports. Each selected executable export is already a complete Tree Calculus value.

Contract-aware checking may use a broader path:

module import -> selected exports -> exported view type refs -> typed-program evidence

That path emits portable evidence and leaves compatibility policy decisions to the Tree Calculus checker. Typed programs and reusable catalogs do not need their own binary object kinds today: they are ordinary Tree Calculus data and can be stored as arboricx.tree-term.v1 when persistence is useful.

10. Workspace Aliases

A workspace is mutable human-facing state over immutable content.

Examples:

List       -> module manifest hash
Prelude    -> module manifest hash
map        -> tree-term hash
httpServer -> bundle hash

Aliases should live under:

store/aliases/

Initial categories:

store/aliases/modules/<name>
store/aliases/names/<name>
store/aliases/packages/<name>

Alias file contents should be simple and explicit, for example:

kind: arboricx.module-manifest.v1
hash: abc123...

Exact encoding can be decided with the first implementation. The important rule is that aliases are mutable pointers, not content identity.
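
As a provisional sketch, alias helpers could read and write the two-line kind/hash form shown in the example. Since the encoding is still open, everything here is an assumption: the helper names, the "key: value" parsing, the name validation, and the temp-file-plus-rename update.

```python
import os
import tempfile
from pathlib import Path

CATEGORIES = ("modules", "names", "packages")


def alias_path(store_root: Path, category: str, name: str) -> Path:
    if category not in CATEGORIES:
        raise ValueError(f"unknown alias category: {category!r}")
    if not name or "/" in name or name in (".", ".."):
        raise ValueError(f"invalid alias name: {name!r}")
    return store_root / "aliases" / category / name


def write_alias(store_root: Path, category: str, name: str, kind: str, hash_hex: str) -> None:
    """Point the alias at (kind, hash), replacing any previous target."""
    path = alias_path(store_root, category, name)
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(f"kind: {kind}\nhash: {hash_hex}\n", encoding="utf-8")
    os.replace(tmp, path)  # readers see the old or the new target, never a partial file


def read_alias(store_root: Path, category: str, name: str) -> tuple:
    text = alias_path(store_root, category, name).read_text(encoding="utf-8")
    fields = dict(line.split(": ", 1) for line in text.splitlines() if line)
    return fields["kind"], fields["hash"]


with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    # Placeholder hashes, not real manifests.
    write_alias(root, "modules", "List", "arboricx.module-manifest.v1", "a" * 64)
    before = read_alias(root, "modules", "List")
    write_alias(root, "modules", "List", "arboricx.module-manifest.v1", "b" * 64)
    after = read_alias(root, "modules", "List")
```

Retargeting List changes only the alias file; both manifest objects would remain in the store under their own hashes.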

11. Existing Convention Alignment

This design intentionally preserves existing conventions where they already fit:

  • SHA-256 domain-separated Merkle node hashing;
  • Leaf / Stem / Fork node payload tags 0x00, 0x01, 0x02;
  • three-character object sharding from lib/arboricx/server.tri;
  • indexed Arboricx bundles as compact transport objects;
  • optional human-facing export names in manifests;
  • View Contract checker evidence as portable Tree Calculus data.

It replaces or demotes conventions that do not fit:

  • SQLite terms.names comma-separated aliases become workspace aliases/indexes;
  • SQLite terms.tags comma-separated tags become optional metadata/indexes;
  • file imports as AST flattening become transitional behavior;
  • names cease to be semantic identity.

12. Implementation Sketch

A staged implementation can proceed as follows:

  1. Add filesystem CAS helpers alongside the existing SQLite store.
  2. Store/load Arboricx Merkle nodes using the filesystem layout.
  3. Implement tree-term storage and reconstruction from filesystem CAS.
  4. Implement pack from CAS tree terms/Merkle roots to indexed Arboricx bundle.
  5. Implement unpack from indexed Arboricx bundle to CAS tree terms/Merkle roots.
  6. Define a concrete module manifest encoding.
  7. Store/load module manifests as content-addressed objects.
  8. Add workspace alias read/write helpers.
  9. Teach import resolution to target module manifests/exports.
  10. Attach exported View Contract artifacts to module exports.
  11. Gradually migrate existing !import users.

13. Deferred Decisions

These are intentionally left out of the first concrete format:

  • package version solving;
  • registry/remotes protocol;
  • garbage collection/reachability;
  • source/provenance/build-record objects;
  • editor/update workflows;
  • rich visibility/export rules;
  • final import syntax;
  • whether module manifests also need a tree-native encoding.

14. Summary

The concrete v1 direction is:

Store:
  filesystem-backed content-addressed objects

Hashing:
  SHA256(domain || 0x00 || canonical payload)

Tree persistence:
  Arboricx Merkle nodes for DAG-oriented tooling; whole-term
  arboricx.tree-term.v1 objects for module exports

Transport:
  indexed .arboricx bundles, packable from and unpackable to CAS roots

Modules:
  immutable manifests pairing export names with object refs and optional View
  Contract refs

Workspace:
  mutable aliases from human names to immutable content hashes

This keeps the store portable, preserves Arboricx's compact transport role, restores Merkle DAGs as the persistence model, and gives View Contracts a stable module/export attachment point without making the store tricu-specific.