# Content Store and Module Format Design

Status: concrete design draft.

This document narrows the higher-level module-system direction into concrete format and storage decisions. It intentionally avoids source/provenance details: modules export usable portable artifacts, not edit history.

Related design overview: `docs/module-system-design.md`.

## 1. Scope

This document specifies the first target shape for:

- a neutral filesystem-backed content-addressed store;
- Arboricx Merkle node persistence;
- indexed Arboricx bundle import/export as transport;
- module manifests as immutable export maps;
- workspace aliases as mutable human-facing references;
- View Contract artifact attachment to module exports.

It does not specify:

- package manager semantics;
- dependency solving;
- source-level rebuild/provenance metadata;
- final import syntax;
- garbage collection;
- registry/sync protocol.

## 2. Non-Negotiable Boundaries

The content store is not `tricu`-specific and is not Haskell-specific.

The store may contain objects produced by `tricu`, Haskell, Tree Calculus tools, Arboricx tooling, or future frontends. The store core only knows object bytes, object kinds, hashes, aliases, and optionally structural references for known portable formats.

View Contracts may be first-class artifact references because they are portable Tree Calculus data checked by pure Tree Calculus code. They are not Haskell-private semantics.

Source and build provenance are intentionally excluded from the first module manifest format. A module manifest answers:

```text
What portable artifacts does this module export, and what portable contracts are paired with them?
```

It does not answer:

```text
Which source file, parser, frontend, or build command produced these artifacts?
```

## 3. Hashing Convention

Objects are content-addressed by SHA-256 over domain-separated canonical bytes.

General rule:

```text
hash = SHA256(domainUtf8 || 0x00 || canonicalPayloadBytes)
```

This matches the existing Merkle node convention in `Research.nodeHash`:

```text
SHA256("arboricx.merkle.node.v1" || 0x00 || nodePayload)
```

The domain string is part of the object format. It prevents identical payload bytes in different formats from accidentally sharing identity.

Hashes are represented externally as 64 lowercase hexadecimal characters.

## 4. Filesystem Store Layout

The canonical filesystem store layout is:

```text
store/
  objects/
    abc/
      abc123...   -- object bytes, sharded by first 3 hex chars
  aliases/
    names/
    modules/
    packages/
  manifests/
  tmp/
```

The three-character shard follows the existing `lib/arboricx/server.tri` convention.

### 4.1 Object paths

For object hash:

```text
abc123...
```

object bytes live at:

```text
store/objects/abc/abc123...
```

The object filename is the full hash. The shard directory is the first three hex characters.

### 4.2 Atomic writes

Writers should use:

```text
store/tmp/..tmp
```

then atomically rename into:

```text
store/objects//
```

Writing an existing object is idempotent if the existing bytes match the hash.

### 4.3 Store core metadata

The minimal filesystem store does not require sidecar metadata for every object. Object kind can be known by context or by manifest references.

A later index may cache:

```text
hash -> kind
hash -> size
hash -> references
hash -> createdAt
```

but this index is not semantic identity.

## 5. Arboricx Merkle Node Object Format

The persistent Tree Calculus representation is a Merkle DAG of node objects.

Domain:

```text
arboricx.merkle.node.v1
```

Canonical payloads:

```text
Leaf            = 0x00
Stem child      = 0x01 || childHashRaw32
Fork left right = 0x02 || leftHashRaw32 || rightHashRaw32
```

Where `childHashRaw32`, `leftHashRaw32`, and `rightHashRaw32` are the raw 32-byte SHA-256 digests corresponding to child node hashes.

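
The hashing rule, node payloads, shard path, and write-then-rename step described above can be sketched together in a few lines of Python. This is a minimal illustration, not the project's implementation: the helper names (`object_hash`, `put_object`, and so on) are hypothetical, the temp-file naming is left to `tempfile.mkstemp`, and it assumes that the bytes stored in an object file are exactly the canonical payload, without the domain prefix.

```python
import hashlib
import os
import tempfile
from pathlib import Path

NODE_DOMAIN = "arboricx.merkle.node.v1"


def object_hash(domain: str, payload: bytes) -> str:
    """SHA256(domainUtf8 || 0x00 || canonicalPayloadBytes) as lowercase hex."""
    return hashlib.sha256(domain.encode("utf-8") + b"\x00" + payload).hexdigest()


def leaf_payload() -> bytes:
    return b"\x00"


def stem_payload(child_hex: str) -> bytes:
    # Child hashes are embedded as raw 32-byte digests, not hex text.
    return b"\x01" + bytes.fromhex(child_hex)


def fork_payload(left_hex: str, right_hex: str) -> bytes:
    return b"\x02" + bytes.fromhex(left_hex) + bytes.fromhex(right_hex)


def object_path(store: Path, hash_hex: str) -> Path:
    """store/objects/<first three hex chars>/<full hash>."""
    return store / "objects" / hash_hex[:3] / hash_hex


def put_object(store: Path, domain: str, payload: bytes) -> str:
    """Hypothetical helper: store a payload under its hash and return the hash.

    Assumption: the file holds the canonical payload only.
    """
    hash_hex = object_hash(domain, payload)
    final = object_path(store, hash_hex)
    if final.exists():
        # Re-writing is idempotent only if the existing bytes match the hash.
        if object_hash(domain, final.read_bytes()) != hash_hex:
            raise ValueError(f"corrupt object at {final}")
        return hash_hex
    tmp_dir = store / "tmp"
    tmp_dir.mkdir(parents=True, exist_ok=True)
    final.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temp file inside the store, then rename into place.
    fd, tmp_name = tempfile.mkstemp(dir=tmp_dir, suffix=".tmp")
    with os.fdopen(fd, "wb") as handle:
        handle.write(payload)
    os.replace(tmp_name, final)
    return hash_hex
```

A caller would store a leaf with `put_object(store, NODE_DOMAIN, leaf_payload())` and pass the returned hash to `stem_payload` or `fork_payload` to build parent nodes. The rename is only atomic when `store/tmp` and `store/objects` sit on the same filesystem, which the layout above makes likely but does not guarantee.
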
This is already implemented conceptually by:

```text
Research.Node
Research.serializeNode
Research.deserializeNode
Research.nodeHash
```

The filesystem CAS should use this payload/hash convention directly.

## 6. Tree Roots

A Tree Calculus value stored in the CAS is identified by the hash of its root Merkle node.

```text
treeRootHash = hash(rootNodePayload)
```

The complete tree is reconstructed by recursively loading node objects reachable from the root.

Hydration is an interpretation step, not part of object identity. A client may hydrate a root as a plain tree, a graph with explicit sharing, or another runtime representation as long as the observable Tree Calculus value is the same. The filesystem CAS provides structural dedupe and portable identity; it does not by itself guarantee that a hydrated runtime value is the cheapest representation for all workloads.

Merkle nodes are useful for explicit DAG-oriented tooling, audit, and bundle packing. They are not the default representation for module executable exports: storing every subtree as a separate filesystem object is pathologically slow for large normal forms.

For module-backed evaluation and imports, a complete normalized named term is stored as one canonical object:

```text
kind: arboricx.tree-term.v1
hash:
abi: arboricx.abi.tree.v1
```

The `arboricx.tree-term.v1` payload is a prefix encoding:

```text
Leaf     = 0x00
Stem t   = 0x01 Tree
Fork l r = 0x02 Tree Tree
```

## 7. Arboricx Indexed Bundles

Indexed `.arboricx` bundles remain the transport/execution format.

They are:

- compact;
- self-contained;
- deterministic;
- suitable for restricted runtimes;
- suitable for HTTP serving and deployment.

They are not the canonical long-lived deduplicated store representation.

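
For reference before the pack and unpack steps, the `arboricx.tree-term.v1` prefix encoding from section 6 can be sketched as an encoder and decoder. This is an illustrative sketch only: the `Leaf`, `Stem`, and `Fork` classes and the function names are hypothetical stand-ins, not the project's types. Both directions are iterative because large normal forms could otherwise exceed Python's recursion limit.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Leaf:
    pass


@dataclass(frozen=True)
class Stem:
    child: "Tree"


@dataclass(frozen=True)
class Fork:
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Stem, Fork]


def encode_tree(tree: Tree) -> bytes:
    """Emit tags in prefix order: 0x00 Leaf, 0x01 Stem, 0x02 Fork."""
    out = bytearray()
    stack: List[Tree] = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, Leaf):
            out.append(0x00)
        elif isinstance(node, Stem):
            out.append(0x01)
            stack.append(node.child)
        elif isinstance(node, Fork):
            out.append(0x02)
            # Push right first so the left subtree is emitted first.
            stack.append(node.right)
            stack.append(node.left)
        else:
            raise TypeError(f"not a tree node: {node!r}")
    return bytes(out)


def decode_tree(data: bytes) -> Tree:
    """Rebuild a tree by scanning the prefix encoding right to left."""
    stack: List[Tree] = []
    for tag in reversed(data):
        if tag == 0x00:
            stack.append(Leaf())
        elif tag == 0x01:
            if len(stack) < 1:
                raise ValueError("Stem tag without a child")
            stack.append(Stem(stack.pop()))
        elif tag == 0x02:
            if len(stack) < 2:
                raise ValueError("Fork tag without two children")
            left = stack.pop()
            right = stack.pop()
            stack.append(Fork(left, right))
        else:
            raise ValueError(f"unknown tag byte: {tag:#04x}")
    if len(stack) != 1:
        raise ValueError("payload does not encode exactly one tree")
    return stack[0]
```

For example, `encode_tree(Fork(Leaf(), Stem(Leaf())))` produces the four bytes `02 00 01 00`, and `decode_tree` rejects truncated payloads or payloads with trailing bytes.
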
### 7.1 Pack

Packing converts one or more CAS tree roots into an indexed bundle:

```text
CAS tree roots -> indexed Arboricx bundle
```

The packer traverses reachable Merkle nodes, emits a compact indexed node table, and writes a bundle manifest with export names and root indices.

### 7.2 Unpack

Unpacking converts a bundle into CAS nodes:

```text
indexed Arboricx bundle -> CAS tree roots
```

The unpacker verifies the bundle structure, reconstructs each exported tree, and stores the corresponding Merkle nodes. It returns the tree root hash for each bundle export.

## 8. Module Manifest v1

A module is an immutable manifest object. The module identity is the hash of its canonical manifest bytes.

A module name is not identity. It is a workspace alias to a module manifest hash.

### 8.1 Domain

Proposed domain:

```text
arboricx.module-manifest.v1
```

### 8.2 Purpose

A module manifest pairs human-facing export names with portable content objects and optional portable contracts.

It exists to support:

- reproducible import resolution;
- executable export discovery;
- View Contract lookup for imported symbols;
- module-to-module reference tracking;
- transport/store interop.

It does not describe source provenance.

### 8.3 Conceptual shape

```text
moduleManifestV1:
  imports:
    - alias:
      kind:
      hash:
  exports:
    - name:
      object:
        kind:
        hash:
        abi:
      view: optional
        kind:
        hash:
      catalog: optional
        kind:
        hash:
  metadata: optional human-facing fields
```

### 8.4 Imports/references

The `imports` section is a manifest reference graph, not a store-level language dependency graph.

Each entry records direct content-addressed references used by the module:

```text
alias: Prelude
kind: arboricx.module-manifest.v1
hash:
```

This supports reproducibility, partial fetch, and audit. The content store core stores this object but does not need to understand `Prelude` or import semantics.

### 8.5 Exports

Each export is a record, not a single hash.
This is required so executable objects and advertised contracts cannot drift apart.

Minimal executable export:

```text
name: "id"
object:
  kind: arboricx.tree-term.v1
  hash:
  abi: arboricx.abi.tree.v1
```

Export with View Contract:

```text
name: "map"
object:
  kind: arboricx.tree-term.v1
  hash:
  abi: arboricx.abi.tree.v1
view:
  kind: arboricx.view-contract.type.v1
  hash:
```

The manifest preserves the pairing between exported executable and exported contract. For workspace modules built from local source, annotated exports are checked before the manifest is published; only exports that pass producer-side View Contract checking receive direct `arboricx.view-contract.type.v1` refs.

### 8.6 Metadata

Metadata is optional and human-facing.

Initial fields may include:

```text
package
version
description
license
createdBy
```

Metadata is not source provenance and is not required for execution or checking.

## 9. View Contract Artifacts

View Contract artifacts are portable Arboricx-layer data. They may be stored as content objects and referenced by module exports. `tricu` may emit these objects, but the object kind is not tricu-specific.

Current artifact kind:

```text
arboricx.view-contract.type.v1
```

`arboricx.view-contract.type.v1` is the direct export-view artifact. Its payload is a canonical prefix binary encoding of the syntactic ViewType:

```text
Name   = 0x00 u32be(byte-length) utf8-name
Ref    = 0x01 u32be(byte-length) utf8-ref
List   = 0x02 ViewType
Maybe  = 0x03 ViewType
Pair   = 0x04 ViewType ViewType
Result = 0x05 ViewType ViewType
Fn     = 0x06 u32be(argument-count) ViewType* ViewType
```

`utf8-ref` is tagged text:

```text
i: numeric/legacy ref
s: symbolic user ref
```

Symbolic refs are the preferred user-authored form; numeric refs remain useful for generated code, fixtures, and old low-level examples.

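
The ViewType prefix encoding above can be illustrated with a small Python encoder. This is a sketch under stated assumptions rather than the real serializer: the class names (`Name`, `Ref`, `ListOf`, and so on) and `encode_view` are hypothetical, the two `Pair` and `Result` fields are simply called `first` and `second` because the source does not name them, and a `Ref` is assumed to carry its already-tagged text (for example `s:List`) verbatim.

```python
import struct
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Name:
    text: str


@dataclass(frozen=True)
class Ref:
    tagged: str  # assumed to already start with "i:" or "s:"


@dataclass(frozen=True)
class ListOf:
    item: "ViewType"


@dataclass(frozen=True)
class MaybeOf:
    item: "ViewType"


@dataclass(frozen=True)
class PairOf:
    first: "ViewType"
    second: "ViewType"


@dataclass(frozen=True)
class ResultOf:
    first: "ViewType"
    second: "ViewType"


@dataclass(frozen=True)
class Fn:
    args: Tuple["ViewType", ...]
    result: "ViewType"


ViewType = Union[Name, Ref, ListOf, MaybeOf, PairOf, ResultOf, Fn]


def _tagged_text(tag: int, text: str) -> bytes:
    """tag || u32be(byte-length) || utf8 bytes."""
    raw = text.encode("utf-8")
    return bytes([tag]) + struct.pack(">I", len(raw)) + raw


def encode_view(view: ViewType) -> bytes:
    if isinstance(view, Name):
        return _tagged_text(0x00, view.text)
    if isinstance(view, Ref):
        return _tagged_text(0x01, view.tagged)
    if isinstance(view, ListOf):
        return b"\x02" + encode_view(view.item)
    if isinstance(view, MaybeOf):
        return b"\x03" + encode_view(view.item)
    if isinstance(view, PairOf):
        return b"\x04" + encode_view(view.first) + encode_view(view.second)
    if isinstance(view, ResultOf):
        return b"\x05" + encode_view(view.first) + encode_view(view.second)
    if isinstance(view, Fn):
        # Argument count, then each argument type, then the result type.
        head = b"\x06" + struct.pack(">I", len(view.args))
        body = b"".join(encode_view(arg) for arg in view.args)
        return head + body + encode_view(view.result)
    raise TypeError(f"not a ViewType: {view!r}")
```

Note that the length prefix counts UTF-8 bytes, not characters, so a non-ASCII name such as `é` is written with length 2.
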
The object hash domain is the object kind:

```text
arboricx.view-contract.type.v1 \0
```

### 9.1 Export-level pairing

The module manifest is the canonical pairing of an executable export and its advertised contract:

```text
export name -> tree-term hash + optional view artifact hash
```

This avoids drift such as:

```text
map      -> tree A
map.view -> contract B
```

where aliases might be retargeted independently.

### 9.2 Import checking

When a source file imports a module, a frontend can resolve an imported export, decode its direct `arboricx.view-contract.type.v1` ref, and emit typed program evidence locally:

```text
imported List.map has view Fn [...]
```

For locally built workspace modules this is backed by producer-side checking before the module manifest alias is published, including imported view facts from dependencies used by the producer source. External or prebuilt manifests are trusted boundary declarations for now; they are not accompanied by proof objects.

The checker still consumes only local numeric symbols and typed-program evidence. Global content hashes do not become checker symbols.

Correct split:

```text
local checker symbol: 3
presentation label: "List.map"
resolved object: sha256:...
exported view: Fn [...]
```

### 9.3 Execution hydration versus contract evidence

Execution imports should use a narrow, demand-driven path:

```text
module import -> selected executable exports -> hydrate selected tree-term objects
```

This path should not compute a dependency closure over other module exports. Each selected executable export is already a complete Tree Calculus value.

Contract-aware checking may use a broader path:

```text
module import -> selected exports -> exported view type refs -> typed-program evidence
```

That path emits portable evidence and leaves compatibility policy decisions to the Tree Calculus checker.

Typed programs and reusable catalogs do not need their own binary object kinds today: they are ordinary Tree Calculus data and can be stored as `arboricx.tree-term.v1` when persistence is useful.

## 10. Workspace Aliases

A workspace is mutable human-facing state over immutable content.

Examples:

```text
List       -> module manifest hash
Prelude    -> module manifest hash
map        -> tree-term hash
httpServer -> bundle hash
```

Aliases should live under:

```text
store/aliases/
```

Initial categories:

```text
store/aliases/modules/
store/aliases/names/
store/aliases/packages/
```

Alias file contents should be simple and explicit, for example:

```text
kind: arboricx.module-manifest.v1
hash: abc123...
```

Exact encoding can be decided with the first implementation. The important rule is that aliases are mutable pointers, not content identity.

## 11. Existing Convention Alignment

This design intentionally preserves existing conventions where they already fit:

- SHA-256 domain-separated Merkle node hashing;
- `Leaf` / `Stem` / `Fork` node payload tags `0x00`, `0x01`, `0x02`;
- three-character object sharding from `lib/arboricx/server.tri`;
- indexed Arboricx bundles as compact transport objects;
- optional human-facing export names in manifests;
- View Contract checker evidence as portable Tree Calculus data.

It replaces or demotes conventions that do not fit:

- SQLite `terms.names` comma-separated aliases become workspace aliases/indexes;
- SQLite `terms.tags` comma-separated tags become optional metadata/indexes;
- file imports as AST flattening become transitional behavior;
- names cease to be semantic identity.

## 12. Implementation Sketch

A staged implementation can proceed as follows:

1. Add filesystem CAS helpers alongside the existing SQLite store.
2. Store/load Arboricx Merkle nodes using the filesystem layout.
3. Implement tree-term storage and reconstruction from filesystem CAS.
4. Implement pack from CAS tree terms/Merkle roots to indexed Arboricx bundle.
5. Implement unpack from indexed Arboricx bundle to CAS tree terms/Merkle roots.
6. Define a concrete module manifest encoding.
7. Store/load module manifests as content-addressed objects.
8. Add workspace alias read/write helpers.
9. Teach import resolution to target module manifests/exports.
10. Attach exported View Contract artifacts to module exports.
11. Gradually migrate existing `!import` users.

## 13. Deferred Decisions

These are intentionally left out of the first concrete format:

- package version solving;
- registry/remotes protocol;
- garbage collection/reachability;
- source/provenance/build-record objects;
- editor/update workflows;
- rich visibility/export rules;
- final import syntax;
- whether module manifests also need a tree-native encoding.

## 14. Summary

The concrete v1 direction is:

```text
Store: filesystem-backed content-addressed objects
Hashing: SHA256(domain || 0x00 || canonical payload)
Tree persistence: Arboricx Merkle nodes
Transport: indexed .arboricx bundles, packable from and unpackable to CAS roots
Modules: immutable manifests pairing export names with object refs and optional View Contract refs
Workspace: mutable aliases from human names to immutable content hashes
```

This keeps the store portable, preserves Arboricx's compact transport role, restores Merkle DAGs as the persistence model, and gives View Contracts a stable module/export attachment point without making the store `tricu`-specific.