# Module System and Content Store Design

Status: design draft.

This document records the intended direction for reworking `tricu` modules, imports, Arboricx storage/transport, and the content store. It is not yet an implementation plan; it is a shared design target.

## 1. Problem Statement

The current module/import/content-store system is useful as a prototype, but it is not coherent enough to build on indefinitely.

Current behavior combines several partially overlapping systems:

- `!import "path.tri" Namespace` and `!import "path.tri" !Local` perform filesystem-relative source preprocessing;
- imported definitions are flattened into one program;
- namespace qualification is implemented by string rewriting;
- evaluation uses a flat `Map String T` environment;
- the Haskell content store stores Tree Calculus Merkle nodes plus an ad hoc `terms` table with comma-separated names and tags;
- the REPL can resolve names from the content store, including multiple versions;
- Arboricx bundles provide compact indexed transport objects;
- `lib/arboricx/server.tri` already sketches a filesystem-backed object store.

This works only as long as users and maintainers stay mindful of its sharp edges:

- names serve too many roles at once;
- modules are not first-class semantic objects;
- imports are closer to AST paste-and-prefix than to name resolution;
- `!Local` imports can create global collisions;
- content identity, human aliases, source files, and evaluated terms are not cleanly separated;
- the SQLite schema is convenient but is not a principled content-addressed store;
- Arboricx transport and long-lived storage are not clearly distinguished.

## 2. Design Principles

### 2.1 Content addressability is foundational

Immutable content should be identified by hashes. Human names should be metadata or workspace aliases over content, not semantic identity.

This follows the core lesson of systems such as Unison: separate stable content identity from ergonomic naming and namespace organization.
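The following is a minimal illustration of this separation. It is sketched in Python for readability, not in `tricu` or Haskell. `ContentStore` and its methods are hypothetical. Hashing raw bytes with SHA-256 is also an assumption, because the canonical byte format is still an open question. Objects are write-once and keyed by their hash. Names live in a separate mutable alias table, so a rename never touches stored content.

```python
import hashlib
from typing import Dict


class ContentStore:
    """Toy store: immutable objects keyed by content hash, plus a separate
    mutable table of human aliases. All names here are hypothetical."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}  # hash -> immutable bytes
        self.aliases: Dict[str, str] = {}    # human name -> hash

    def put(self, data: bytes) -> str:
        digest = "sha256:" + hashlib.sha256(data).hexdigest()
        self.objects.setdefault(digest, data)  # same bytes, same identity
        return digest

    def get(self, digest: str) -> bytes:
        return self.objects[digest]

    def set_alias(self, name: str, digest: str) -> None:
        if digest not in self.objects:
            raise KeyError(f"unknown object {digest}")
        self.aliases[name] = digest

    def rename(self, old: str, new: str) -> None:
        # Renaming edits only the alias table; stored objects never change.
        self.aliases[new] = self.aliases.pop(old)

    def resolve(self, name: str) -> bytes:
        return self.get(self.aliases[name])


store = ContentStore()
h = store.put(b"compiled tree term")
store.set_alias("List.map", h)
store.rename("List.map", "List.fmap")
assert store.resolve("List.fmap") == b"compiled tree term"
assert store.put(b"compiled tree term") == h  # no duplicate object
```

Storing identical bytes a second time returns the same hash and adds nothing. This is the dedupe property that the storage sections below depend on.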
### 2.2 The content store is language-neutral

The content store must not be tied to `tricu` or Haskell. It stores a small set of portable Arboricx artifacts: module manifests, complete tree terms, and direct View Contract types. Lower-level Merkle/bundle formats exist for transport and DAG tooling, but the store core should treat every object as content-addressed bytes tagged with a format or media type.

`tricu` and Haskell are clients and tooling. They are not the semantic owners of the store.

### 2.3 View Contracts are portable enough to integrate

The store may integrate with View Contracts because the checker and the evidence format are pure Tree Calculus / portable tree data. View Contracts are not a Haskell-private or `tricu`-private semantic layer.

The module resolver may emit typed-program evidence, but checker semantics remain unchanged:

```text
Haskell emits evidence.
tricu judges evidence.
```

### 2.4 Modules should reflect definitions as they actually exist

The module system should conform to the reality of immutable content-addressed artifacts and mutable human aliases. We should not contort definitions to fit a traditional text-file module system if that fights the storage model.

### 2.5 Transport and storage are different jobs

Indexed Arboricx bundles are excellent transport/execution objects. Merkle DAGs are better suited to long-lived persistence. These should remain separate but interoperable representations.

## 3. Conceptual Architecture

```text
Content Store
  neutral content-addressed object store

Arboricx CAS / Merkle Store
  Tree Calculus node/object formats suitable for persistence and dedupe

Arboricx Bundle
  compact indexed transport/execution format

View Contract Artifact
  portable evidence/checker data over tree artifacts

Module Manifest
  immutable export map from names to content objects and optional contracts

Workspace
  mutable aliases, selected versions, package pins, and user-facing names

tricu
  one frontend/toolchain that emits/consumes these portable artifacts
```

The content store stores objects. Arboricx defines important object formats. View Contracts define portable checking artifacts. `tricu` produces and consumes those formats.

### 3.1 Execution imports versus contract checking

Import resolution has two intentionally different performance profiles.

For normal execution/evaluation, resolving a module import should hydrate only the executable exports directly demanded by the importing source. Exported Tree Calculus values are complete normal forms: importing `foo` does not require hydrating separate `bar` or `baz` exports that may have helped build it. This is the fast path for `!import`, including `!Local` imports.

View Contract checking is a separate evidence-gathering path. It may load exported direct view types for the symbols that participate in a check. That slower path must remain behind the typed-program boundary:

```text
Haskell emits evidence.
tricu judges evidence.
```

Reusable view catalogs are ordinary `tricu` libraries/tree terms, not a separate core CAS artifact kind.

For locally built workspace modules, advertised direct export views are producer-checked before the manifest alias is written. Producer checking includes advertised views from any imported modules used by that source, so a module cannot publish a local annotated export that contradicts a dependency's exported view. If producer checking fails, the module alias is not written.
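The producer-side gate can be sketched in Python, under several assumptions. Manifests are plain dicts encoded as canonical JSON. Advertised views are compared as opaque strings. `toy_check` is a hypothetical stand-in for the portable checker in `lib/view.tri` and does not reproduce its logic. The sketch makes one point only: ordering. Every advertised export view, including any that must agree with a dependency's exported view, is checked before the alias is written. A failure leaves the alias table untouched.

```python
import hashlib
import json
from typing import Callable, Dict, List

# Hypothetical checker signature: (export name, advertised view,
# views advertised by imported modules) -> list of error messages.
Checker = Callable[[str, str, Dict[str, str]], List[str]]


def toy_check(export: str, view: str, dep_views: Dict[str, str]) -> List[str]:
    """Stand-in for the portable checker: flags an advertised view that
    disagrees with a dependency's advertised view for the same name."""
    if export in dep_views and dep_views[export] != view:
        return [f"{export}: {view!r} contradicts dependency view {dep_views[export]!r}"]
    return []


def publish_local_module(
    objects: Dict[str, bytes],
    aliases: Dict[str, str],
    name: str,
    manifest: dict,
    dep_views: Dict[str, str],
    check: Checker = toy_check,
) -> str:
    """Producer-check every advertised export view, then store the manifest
    and write its alias. On failure nothing is written."""
    errors: List[str] = []
    for export, entry in sorted(manifest["exports"].items()):
        view = entry.get("view")
        if view is not None:  # untyped exports carry no view and are not checked
            errors.extend(check(export, view, dep_views))
    if errors:
        raise ValueError("producer check failed: " + "; ".join(errors))

    data = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")
    digest = "sha256:" + hashlib.sha256(data).hexdigest()
    objects[digest] = data
    aliases[name] = digest  # reached only after every check has passed
    return digest


objects: Dict[str, bytes] = {}
aliases: Dict[str, str] = {}
deps = {"map": "Fn [List, List]"}
ok = {"exports": {"map": {"object": "ref-map", "view": "Fn [List, List]"},
                  "helper": {"object": "ref-helper"}}}
publish_local_module(objects, aliases, "List", ok, deps)

bad = {"exports": {"map": {"object": "ref-map", "view": "Fn [Nat]"}}}
try:
    publish_local_module(objects, aliases, "BadList", bad, deps)
except ValueError as err:
    print(err)
assert "BadList" not in aliases
```

In this sketch the untyped `helper` export passes through unchecked. That matches the requirement that untyped code remains valid.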
Consumer checking then resolves the selected module exports, decodes their `arboricx.view-contract.type.v1` refs, and emits trusted `KnownView` evidence for the local imported symbols. Those facts are module-boundary assumptions: local workspace builds create them after producer-side checking, while external or prebuilt manifests are trusted inputs for now. In all cases, compatibility with local requirements is still judged by the portable checker in `lib/view.tri`.

## 4. Content Store Direction

### 4.1 Store core

The store core should be a content-addressed object store:

```text
hash -> object bytes
hash -> object kind / media type
hash -> optional metadata/index entries
```

The hash should be computed over canonical bytes with domain separation. The object kind or media type determines how a client interprets those bytes.

Current module/check object kinds:

```text
arboricx.module-manifest.v1
arboricx.tree-term.v1
arboricx.view-contract.type.v1
```

Merkle nodes and indexed bundles remain lower-level Arboricx transport/DAG formats, but they are not the module/eval storage model. Typed programs and view catalogs are ordinary tree terms unless a future external tooling use case proves that they need their own object kind.

The store core should not need to know what a `tricu` definition means.

### 4.2 Filesystem-backed layout

The long-term store should converge with the direction already sketched in `lib/arboricx/server.tri`:

```text
store/
  objects/
    abc/
      abc123...object
  aliases/
    names/
    modules/
    packages/
  manifests/
  tmp/
```

SQLite may remain useful as an optional index or cache, but it should not be the canonical store model.

### 4.3 Structural references, not language dependencies

The store may understand structural content references when they are part of an object format. For example, a Merkle node naturally references child hashes:

```text
Leaf
Stem childHash
Fork leftHash rightHash
```

This is not a `tricu` dependency graph. It is content structure.
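To make structural references and domain separation concrete, here is a minimal Python sketch that hashes Tree Calculus nodes bottom-up. Several details are assumptions for illustration and are not the Arboricx wire format: the `example.tree-node` domain prefix, the one-byte node tags, and the raw 32-byte child-hash encoding. Each node's bytes embed only its children's hashes. The store therefore sees content structure without knowing anything about `tricu` definitions.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Union

# Hypothetical domain-separation prefix; not the real Arboricx encoding.
DOMAIN = b"example.tree-node\x00"
LEAF, STEM, FORK = 0, 1, 2  # assumed one-byte node tags


@dataclass(frozen=True)
class Leaf:
    pass


@dataclass(frozen=True)
class Stem:
    child: "Tree"


@dataclass(frozen=True)
class Fork:
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Stem, Fork]


def node_bytes(tag: int, *child_hashes: bytes) -> bytes:
    # Domain prefix, one tag byte, then fixed-width 32-byte child hashes.
    return DOMAIN + bytes([tag]) + b"".join(child_hashes)


def store_tree(t: Tree, objects: Dict[bytes, bytes]) -> bytes:
    """Hash a tree bottom-up, storing each distinct node once by its hash.
    A node's bytes contain only its children's hashes (structural refs)."""
    if isinstance(t, Leaf):
        data = node_bytes(LEAF)
    elif isinstance(t, Stem):
        data = node_bytes(STEM, store_tree(t.child, objects))
    else:
        data = node_bytes(FORK, store_tree(t.left, objects), store_tree(t.right, objects))
    digest = hashlib.sha256(data).digest()
    objects.setdefault(digest, data)
    return digest


objects: Dict[bytes, bytes] = {}
shared = Fork(Leaf(), Stem(Leaf()))
a = store_tree(Fork(shared, Leaf()), objects)
b = store_tree(Fork(Leaf(), shared), objects)
# Leaf, Stem(Leaf), shared, and the two distinct roots: five objects total.
assert a != b and len(objects) == 5
```

Identical subtrees hash identically, so two different roots that share a subtree store it only once. A real implementation would likely avoid deep Python recursion on large trees, but the recursive form keeps the sketch short.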
Language/tool-level relationships such as "compiled from source", "exported by module", or "checked with contract" can live in manifests or indexes. They should not be required by the store core.

## 5. Arboricx Role

Arboricx should be understood as a family of portable Tree Calculus artifact formats, not as a single storage mechanism.

### 5.1 Arboricx Bundle

The existing indexed `.arboricx` format remains the preferred transport and execution object:

- compact;
- self-contained;
- deterministic;
- easy to parse in constrained runtimes;
- suitable for deployment and HTTP serving;
- structurally verifiable without recomputing a hash per node.

In effect, a bundle says:

```text
Here is everything you need, densely packed.
```

### 5.2 Arboricx CAS / Merkle Store

The persistent store should use content-addressed structural objects:

```text
Leaf
Stem childHash
Fork leftHash rightHash
```

This enables dedupe across definitions, modules, packages, and versions. A large program that shares subtrees with other programs should not store those subtrees multiple times.

In effect, the CAS says:

```text
Here are immutable objects, addressable independently.
```

### 5.3 Pack and unpack

Transport and storage should interoperate explicitly:

```text
CAS root(s)     -> pack   -> indexed Arboricx bundle
Arboricx bundle -> unpack -> CAS root(s)
```

The store can treat a bundle as an opaque content-addressed blob, and the bundle can also be unpacked into Merkle nodes for dedupe and partial reuse.

## 6. Modules

### 6.1 Module identity

A module should be an immutable manifest object. Its identity is the hash of its canonical manifest bytes.

A module name is not identity. It is a workspace alias or package-level alias to a module hash.
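Below is one way to derive a module identity from canonical manifest bytes, sketched in Python. Canonical JSON (sorted keys, fixed separators) stands in for the manifest encoding, which the open questions leave undecided. Prefixing the `arboricx.module-manifest.v1` kind string is an assumed form of domain separation. The export refs are placeholders.

```python
import hashlib
import json

# Assumption: the object kind string doubles as a domain-separation prefix.
MANIFEST_KIND = "arboricx.module-manifest.v1"


def canonical_bytes(manifest: dict) -> bytes:
    """Encode a manifest deterministically (assumed format: canonical JSON)."""
    # Sorted keys and fixed separators make the bytes independent of dict
    # insertion order and incidental whitespace.
    body = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return (MANIFEST_KIND + "\x00" + body).encode("utf-8")


def module_hash(manifest: dict) -> str:
    """A module's identity: the hash of its canonical manifest bytes."""
    return "sha256:" + hashlib.sha256(canonical_bytes(manifest)).hexdigest()


# Placeholder export refs; real entries would point at tree-term objects.
v1 = {"exports": {"map": "ref-map-1", "foldl": "ref-foldl-1"}}
v1_reordered = {"exports": {"foldl": "ref-foldl-1", "map": "ref-map-1"}}
assert module_hash(v1) == module_hash(v1_reordered)  # same content, same identity

v2 = {"exports": {"map": "ref-map-2", "foldl": "ref-foldl-1"}}

# The module name is only a mutable workspace alias over immutable hashes.
aliases = {"List": module_hash(v1)}
old = aliases["List"]
aliases["List"] = module_hash(v2)  # "upgrade" List by re-pointing the alias
assert old == module_hash(v1) and aliases["List"] != old
```

Re-pointing the `List` alias changes which module the name selects. Neither manifest's hash changes.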
### 6.2 Module contents

A module manifest should primarily be an export map:

```text
module hash
  exports:
    name -> content reference
  metadata:
    package
    version
    description
    license
    createdBy
  optional:
    view contract artifact refs
    ABI/media type info
    source/provenance refs
```

The manifest should be portable and mostly format-oriented. It should not depend on Haskell data structures or `tricu`-specific internal semantics.

### 6.3 Export entries

An export entry may eventually look something like this:

```text
name:   "map"
object: sha256:...
kind:   arboricx.tree-term.v1
abi:    arboricx.abi.tree.v1
view:   sha256:...   -- optional View Contract artifact
source: sha256:...   -- optional source/provenance object
```

Executable module exports are complete normalized tree terms, stored as one `arboricx.tree-term.v1` object per named export. Merkle-node storage remains available for DAG-oriented tooling, but module/eval imports should not store or hydrate every subtree as a separate filesystem object.

### 6.4 Import behavior

Imports should resolve module aliases or content references to module manifests, then bind selected exports into the local source scope.

Export selection has one intentional aggregator special case:

```text
module with local top-level definitions -> exports only those local definitions
module with only imports                -> reexports the evaluated import env
```

This lets files such as `prelude.tri` act as explicit barrel modules without making every ordinary module reexport its imports. A module that defines even one local top-level name does not implicitly reexport imported names.

The future pipeline should be:

```text
parse source
resolve imports/names to module exports and content refs
lower source using resolved refs
emit a view-tree artifact
check evidence when requested
store/export artifacts
```

It should not be:

```text
paste imported ASTs into one file and rewrite strings
```

## 7. Workspace Layer

Mutable human-facing state belongs in a workspace layer.
Examples:

```text
List                  -> module hash
Http                  -> module hash
map                   -> definition/tree hash
selected List version -> module hash
package pin prelude   -> package/module hash
```

The workspace is where names, selections, pins, and aliases live. Renaming should usually mutate workspace aliases, not immutable content objects.

This gives humans stable, ergonomic names without making names semantic identity.

## 8. Definition Identity

There are two useful identities, and we should support both.

### 8.1 Tree identity

A Tree Calculus value has a Merkle root hash. This identifies the executable tree itself.

This is the right identity for:

- execution;
- dedupe;
- bundle roots;
- low-level artifact sharing.

### 8.2 Module/export identity

The module manifest is the higher-level artifact boundary. It pairs each export name with its compiled tree term and an optional direct View Contract type. The content store should not require extra definition/source/provenance objects, and fully untyped Tree Calculus code must remain valid.

## 9. View Contract Integration

View Contracts should attach to modules/exports as portable artifacts.

An imported definition can be assigned a local numeric symbol while a typed program is being lowered. Its global identity remains a content hash or module export ref.

This is the intended split:

```text
Typed-program local symbol: 3
Debug label:                "List.map"
Resolved object:            sha256:...
Exported view:              Fn [...]
```

De Bruijn-style integer symbols are still appropriate inside a typed program. They are local evidence identifiers, not global content identity. We should not make global objects depend on numeric checker symbols.

Untyped code remains valid with no contract artifact. If a boundary needs to participate in checking but has no view information, it may use `Any` or rely on policy. We should not pretend that every untyped function has an infinite `Any -> Any -> ...` contract.

## 10. Import Syntax Direction

Exact syntax is future work, but the current `!import` form should be considered a transitional mechanism.

Future imports should distinguish:

- path-based source imports for local development;
- workspace/module alias imports;
- explicit content-addressed imports;
- selected/exposed names;
- qualified versus unqualified binding.

Possible directions:

```tri
import "./list.tri" as List
import List exposing (map foldl)
import #abc123... as List
```

The syntax should be designed after the object/module model is clearer.

## 11. Migration Strategy

A plausible migration path:

1. Define the neutral object store model and filesystem layout.
2. Implement Merkle node persistence against that layout.
3. Add pack/unpack between CAS roots and indexed Arboricx bundles.
4. Replace ad hoc SQLite `terms` names/tags with workspace aliases or a clearer index layer.
5. Define module manifest objects.
6. Teach source imports to resolve manifests/exports instead of rewriting ASTs.
7. Attach View Contract artifacts to module exports.
8. Gradually migrate existing `lib/` and `demos/` imports.

Compatibility shims may keep existing `!import` working during migration.

## 12. Open Questions

- What exact canonical byte format should store objects use?
- Should module manifests be binary, tree-encoded, or both?
- What media type/kind registry do we need first?
- How should object references be represented in source syntax?
- How should workspaces be stored and shared?
- What is the minimum useful module manifest?
- Should source files compile directly to module manifests, or should manifests be produced by explicit package commands?
- How much Arboricx bundle metadata should reference CAS roots?
- What GC/reachability model should the store eventually use?

## 13. Summary

The desired design is:

```text
Content store:    portable CAS for immutable objects and structural references
Arboricx bundle:  compact indexed transport/execution object
Arboricx CAS:     persistent Merkle DAG/object representation for dedupe and partial reuse
Modules:          immutable manifests mapping export names to content objects and optional contracts
Workspace:        mutable human aliases, version selections, and package/module pins
View Contracts:   portable evidence artifacts attached to exports and checked by pure Tree Calculus code
```

The key architectural rule is that hashes provide stable identity, while names provide human usability. The module system should be built on that separation.