Module System and Content Store Design
Status: design draft.
This document records the intended direction for reworking tricu modules,
imports, Arboricx storage/transport, and the content store. It is not an
implementation plan yet; it is a shared design target.
1. Problem Statement
The current module/import/content-store system is useful as a prototype, but it is not coherent enough to build on indefinitely.
Current behavior combines several partially overlapping systems:
- !import "path.tri" Namespace and !import "path.tri" !Local perform filesystem-relative source preprocessing;
- imported definitions are flattened into one program;
- namespace qualification is implemented by string rewriting;
- evaluation uses a flat Map String T environment;
- the Haskell content store stores Tree Calculus Merkle nodes plus an ad hoc terms table with comma-separated names and tags;
- the REPL can resolve names from the content store, including multiple versions;
- Arboricx bundles provide compact indexed transport objects;
- lib/arboricx/server.tri already sketches a filesystem-backed object store.
This works only when users and maintainers are mindful of sharp edges:
- names serve too many roles at once;
- modules are not first-class semantic objects;
- imports are closer to AST paste-and-prefix than resolution;
- !Local imports can create global collisions;
- content identity, human aliases, source files, and evaluated terms are not cleanly separated;
- the SQLite schema is convenient but not a principled content-addressed store;
- Arboricx transport and long-lived storage are not clearly distinguished.
2. Design Principles
2.1 Content addressability is foundational
Immutable content should be identified by hashes. Human names should be metadata or workspace aliases over content, not semantic identity.
This follows the core lesson from systems such as Unison: separate stable content identity from ergonomic naming and namespace organization.
2.2 The content store is language-neutral
The content store must not be married to tricu or Haskell.
It stores a small set of portable Arboricx artifacts: module manifests, complete tree terms, and direct View Contract types. Lower-level Merkle/bundle formats exist for transport and DAG tooling, but the store core should treat all objects as content-addressed bytes with formats/media types.
tricu and Haskell are clients/tooling. They are not the semantic owners of the
store.
2.3 View Contracts are portable enough to integrate
The store may integrate with View Contracts because the checker and evidence
format are pure Tree Calculus / portable tree data. View Contracts are not a
Haskell-private or tricu-private semantic layer.
The module resolver may emit typed-program evidence, but checker semantics remain unchanged:
Haskell emits evidence.
tricu judges evidence.
2.4 Modules should reflect definitions as they actually exist
The module system should conform to the reality of content-addressed immutable artifacts and mutable human aliases. We should not contort definitions to fit a traditional text-file module system if that fights the storage model.
2.5 Transport and storage are different jobs
Indexed Arboricx bundles are excellent transport/execution objects. Merkle DAGs are better long-lived persistence objects. These should remain separate but interoperable representations.
3. Conceptual Architecture
Content Store
  neutral content-addressed object store
Arboricx CAS / Merkle Store
  Tree Calculus node/object formats suitable for persistence and dedupe
Arboricx Bundle
  compact indexed transport/execution format
View Contract Artifact
  portable evidence/checker data over tree artifacts
Module Manifest
  immutable export map from names to content objects and optional contracts
Workspace
  mutable aliases, selected versions, package pins, and user-facing names
tricu
  one frontend/toolchain that emits/consumes these portable artifacts
The content store stores objects. Arboricx defines important object formats.
View Contracts define portable checking artifacts. tricu produces and consumes
those formats.
3.1 Execution imports versus contract checking
Import resolution has two intentionally different performance profiles.
For normal execution/evaluation, resolving a module import should hydrate only
the executable exports directly demanded by the importing source. Exported Tree
Calculus values are complete normal forms: importing foo does not require
hydrating separate bar or baz exports that may have helped build it. This is
the fast path for !import, including !Local imports.
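A minimal Python sketch of that fast path, assuming a manifest reduced to a plain map from export name to the hash of one arboricx.tree-term.v1 object. The function hydrate_imports and the in-memory store are hypothetical; the point is only that the importer reads the demanded exports and nothing else.

```python
from typing import Callable, Dict, Set

# Hypothetical manifest shape: export name -> hash of one arboricx.tree-term.v1 object.
Manifest = Dict[str, str]

def hydrate_imports(manifest: Manifest, demanded: Set[str],
                    load_object: Callable[[str], bytes]) -> Dict[str, bytes]:
    """Fetch only the exports the importing source references.

    Each export is a complete normal form, so no helper exports are loaded.
    A name the module does not export raises KeyError.
    """
    return {name: load_object(manifest[name]) for name in sorted(demanded)}

# Usage: record which objects are actually read from a toy in-memory store.
store = {"h-foo": b"<foo term>", "h-bar": b"<bar term>", "h-baz": b"<baz term>"}
reads = []

def load(ref: str) -> bytes:
    reads.append(ref)
    return store[ref]

env = hydrate_imports({"foo": "h-foo", "bar": "h-bar", "baz": "h-baz"}, {"foo"}, load)
```

Importing foo reads a single object; bar and baz are never touched even though they live in the same module.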
View Contract checking is a separate evidence-gathering path. It may load exported direct view types for the symbols that participate in a check. That slower path must remain behind the typed-program boundary:
Haskell emits evidence.
tricu judges evidence.
Reusable view catalogs are ordinary tricu libraries/tree terms, not a separate core CAS artifact kind.
For locally built workspace modules, advertised direct export views are producer-checked before the manifest alias is written. Producer checking includes advertised views from any imported modules used by that source, so a module cannot publish a local annotated export that contradicts a dependency's exported view. If producer checking fails, the module alias is not written.
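The producer-side gate can be sketched as follows. The check callback stands in for the portable checker in lib/view.tri; toy_check is a deliberately naive placeholder, not View Contract semantics. publish_local_module and the alias dictionary are hypothetical names.

```python
from typing import Callable, Dict

View = str  # stand-in for an encoded direct view type
Checker = Callable[[Dict[str, View], Dict[str, View]], bool]

def publish_local_module(aliases: Dict[str, str], alias: str, manifest_hash: str,
                         local_views: Dict[str, View],
                         dependency_views: Dict[str, View],
                         check: Checker) -> bool:
    """Producer-check advertised views; write the alias only on success."""
    if not check(local_views, dependency_views):
        return False  # alias is not written
    aliases[alias] = manifest_hash
    return True

# Toy checker for illustration only: reject a local annotation that
# disagrees with the view a dependency exports under the same name.
def toy_check(local: Dict[str, View], deps: Dict[str, View]) -> bool:
    return all(deps.get(name, view) == view for name, view in local.items())

aliases: Dict[str, str] = {}
ok = publish_local_module(aliases, "List", "sha256:aaa",
                          {"map": "Fn [A, B]"}, {"map": "Fn [A, B]"}, toy_check)
rejected = publish_local_module(aliases, "Bad", "sha256:bbb",
                                {"map": "Fn [C]"}, {"map": "Fn [A, B]"}, toy_check)
```

The rejected build leaves no alias behind, so nothing downstream can import a module whose advertised views failed producer checking.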
Consumer checking then resolves selected module exports, decodes their
arboricx.view-contract.type.v1 refs, and emits trusted KnownView evidence
for the local imported symbols. Those facts are module-boundary assumptions:
local workspace builds create them after producer-side checking, while external
or prebuilt manifests are trusted inputs for now. In all cases, compatibility
with local requirements is still judged by the portable checker in lib/view.tri.
4. Content Store Direction
4.1 Store core
The store core should be a content-addressed object store:
hash -> object bytes
hash -> object kind / media type
hash -> optional metadata/index entries
The hash should be over canonical bytes with domain separation. The object kind or media type determines how a client interprets those bytes.
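A sketch of domain-separated hashing, assuming SHA-256 and an illustrative prefix built from a fixed tag, the object kind, and a length field. Putting the kind inside the hashed prefix is an assumption of this example; the design has not settled the exact construction.

```python
import hashlib

def object_hash(kind: str, canonical: bytes) -> str:
    """Hash canonical bytes under a domain-separation prefix.

    Assumed encoding: a fixed tag, the object kind, and an 8-byte length
    prefix precede the payload.
    """
    h = hashlib.sha256()
    h.update(b"arboricx-object\x00")
    h.update(kind.encode("utf-8") + b"\x00")
    h.update(len(canonical).to_bytes(8, "big"))
    h.update(canonical)
    return "sha256:" + h.hexdigest()

# The store core only maps hashes to (kind, bytes); it never interprets payloads.
store = {}

def put(kind: str, canonical: bytes) -> str:
    ref = object_hash(kind, canonical)
    store[ref] = (kind, canonical)
    return ref

a = put("arboricx.tree-term.v1", b"payload")
b = put("arboricx.module-manifest.v1", b"payload")
```

Identical payload bytes stored under two kinds get two different hashes, so a tree term can never be mistaken for a manifest by hash alone.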
Current module/check object kinds:
arboricx.module-manifest.v1
arboricx.tree-term.v1
arboricx.view-contract.type.v1
Merkle nodes and indexed bundles remain lower-level Arboricx transport/DAG formats, but they are not the module/eval storage model. Typed programs and view catalogs are ordinary tree terms unless a future external tooling use case proves that they need their own object kind.
The store core should not need to know what a tricu definition means.
4.2 Filesystem-backed layout
The long-term store should converge with the direction already sketched in
lib/arboricx/server.tri:
store/
objects/
abc/
abc123...object
aliases/
names/
modules/
packages/
manifests/
tmp/
SQLite may remain useful as an optional index/cache, but it should not be the canonical store model.
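One way a client could write into this layout, sketched in Python. The three-character shard directory mirrors the abc/abc123...object line. The .object suffix, the use of tmp/ for staging, and the helper names are assumptions.

```python
import os
import tempfile
from pathlib import Path

def object_path(store: Path, digest_hex: str) -> Path:
    # Assumed layout: objects/<first 3 hex chars>/<digest>.object
    return store / "objects" / digest_hex[:3] / f"{digest_hex}.object"

def put_object(store: Path, digest_hex: str, data: bytes) -> Path:
    """Write immutable bytes once: stage in tmp/, then rename into place.

    The caller supplies the digest; this sketch does not re-verify it.
    """
    final = object_path(store, digest_hex)
    if final.exists():
        return final  # same hash, same bytes: nothing to do
    final.parent.mkdir(parents=True, exist_ok=True)
    staging = store / "tmp"
    staging.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=staging)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    os.replace(tmp_name, final)  # atomic on POSIX within one filesystem
    return final

with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    first = put_object(root, "abc123", b"hello")
    relpath = first.relative_to(root)
    contents = first.read_bytes()
    second = put_object(root, "abc123", b"hello")
    staging_empty = not any((root / "tmp").iterdir())
```

Because objects are immutable and named by hash, a second write of the same object is a no-op, and readers never observe a half-written file under objects/.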
4.3 Structural references, not language dependencies
The store may understand structural content references when they are part of an object format. For example, a Merkle node naturally references child hashes:
Leaf
Stem childHash
Fork leftHash rightHash
This is not a tricu dependency graph. It is content structure.
Language/tool-level relationships such as "compiled from source", "exported by module", or "checked with contract" can live in manifests or indexes. They should not be required by the store core.
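To illustrate that the store can follow these references without any tricu knowledge, here is a sketch that computes the set of objects reachable from some roots, as a future GC might. The node encoding (one tag byte, then 32-byte child digests) and the helper names are hypothetical.

```python
import hashlib
from typing import Dict, List, Set

# Hypothetical node encoding: one tag byte, then 32-byte child digests.
LEAF, STEM, FORK = 0, 1, 2

def node_hash(data: bytes) -> bytes:
    return hashlib.sha256(b"arboricx-node\x00" + data).digest()

def put(store: Dict[bytes, bytes], data: bytes) -> bytes:
    h = node_hash(data)
    store[h] = data
    return h

def refs(data: bytes) -> List[bytes]:
    """Child hashes read from the object format alone; no tricu semantics needed."""
    tag = data[0]
    if tag == LEAF:
        return []
    if tag == STEM:
        return [data[1:33]]
    if tag == FORK:
        return [data[1:33], data[33:65]]
    raise ValueError(f"unknown node tag {tag}")

def reachable(roots: List[bytes], store: Dict[bytes, bytes]) -> Set[bytes]:
    """Iterative traversal from roots; anything not returned is unreferenced."""
    seen: Set[bytes] = set()
    stack = list(roots)
    while stack:
        h = stack.pop()
        if h not in seen:
            seen.add(h)
            stack.extend(refs(store[h]))
    return seen

store: Dict[bytes, bytes] = {}
leaf = put(store, bytes([LEAF]))
stem = put(store, bytes([STEM]) + leaf)
fork = put(store, bytes([FORK]) + stem + leaf)
orphan = put(store, bytes([STEM]) + stem)
live = reachable([fork], store)
```

The traversal only needs refs, which is part of the object format; nothing about modules, exports, or contracts is involved.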
5. Arboricx Role
Arboricx should be understood as a family of portable Tree Calculus artifact formats, not as a single storage mechanism.
5.1 Arboricx Bundle
The existing indexed .arboricx format remains the preferred transport and
execution object:
- compact;
- self-contained;
- deterministic;
- easy to parse in constrained runtimes;
- suitable for deployment and HTTP serving;
- structurally verifiable without hash recomputation per node.
It says:
Here is everything you need, densely packed.
5.2 Arboricx CAS / Merkle Store
The persistent store should use content-addressed structural objects:
Leaf
Stem childHash
Fork leftHash rightHash
This enables dedupe across definitions, modules, packages, and versions. A large program that shares subtrees with other programs should not store those subtrees multiple times.
It says:
Here are immutable objects, addressable independently.
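A small sketch of that dedupe, using a hypothetical node encoding (an arity byte followed by child digests) and Python tuples as toy trees: () is Leaf, (t,) is Stem t, and (l, r) is Fork l r.

```python
import hashlib
from typing import Dict, Tuple

# Toy trees (assumption): () is Leaf, (t,) is Stem t, (l, r) is Fork l r.
Tree = Tuple

def store_tree(t: Tree, store: Dict[bytes, bytes]) -> bytes:
    """Insert a tree bottom-up; structurally equal subtrees hash to one entry."""
    kids = [store_tree(child, store) for child in t]
    # Hypothetical encoding: arity byte (0, 1, 2) followed by child digests.
    data = bytes([len(kids)]) + b"".join(kids)
    h = hashlib.sha256(b"arboricx-node\x00" + data).digest()
    store.setdefault(h, data)
    return h

shared = (((),), ())       # Fork (Stem Leaf) Leaf
prog_a = (shared, shared)  # Fork shared shared
prog_b = (shared,)         # Stem shared

store: Dict[bytes, bytes] = {}
root_a = store_tree(prog_a, store)
root_b = store_tree(prog_b, store)
```

The two programs together occupy five distinct nodes (Leaf, Stem Leaf, shared, and the two roots), because shared and everything beneath it is stored once however often it occurs.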
5.3 Pack and unpack
Transport and storage should interoperate explicitly:
CAS root(s) -> pack -> indexed Arboricx bundle
Arboricx bundle -> unpack -> CAS root(s)
The bundle can be treated as an opaque content-addressed blob by the store, and it can also be unpacked into Merkle nodes for dedupe and partial reuse.
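The round trip can be sketched with a toy indexed bundle: a list of nodes, children before parents, where each child is an index into that list. This is not the real .arboricx layout. It only shows why an indexed bundle can be checked structurally (every child index points backwards) while unpacking rehashes nodes into the CAS. All helper names are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple

# Toy formats (hypothetical, not the real .arboricx layout):
#   store:  hash -> arity byte + 32-byte child hashes
#   bundle: (nodes, roots); each node lists child indices into `nodes`
Store = Dict[bytes, bytes]
Bundle = Tuple[List[List[int]], List[int]]

def put(store: Store, kids: List[bytes]) -> bytes:
    data = bytes([len(kids)]) + b"".join(kids)
    h = hashlib.sha256(b"arboricx-node\x00" + data).digest()
    store[h] = data
    return h

def children(data: bytes) -> List[bytes]:
    return [data[1 + 32 * i: 33 + 32 * i] for i in range(data[0])]

def pack(store: Store, roots: List[bytes]) -> Bundle:
    """Emit each reachable node once, children before parents."""
    index: Dict[bytes, int] = {}
    nodes: List[List[int]] = []

    def visit(h: bytes) -> int:
        if h not in index:
            kids = [visit(c) for c in children(store[h])]
            index[h] = len(nodes)
            nodes.append(kids)
        return index[h]

    return nodes, [visit(r) for r in roots]

def verify_structure(bundle: Bundle) -> bool:
    """Structural check only: arity <= 2 and every child index points backwards."""
    nodes, roots = bundle
    return (all(len(kids) <= 2 and all(0 <= k < i for k in kids)
                for i, kids in enumerate(nodes))
            and all(0 <= r < len(nodes) for r in roots))

def unpack(bundle: Bundle, store: Store) -> List[bytes]:
    """Rehash nodes in order; children precede parents, so one pass suffices."""
    nodes, roots = bundle
    hashes: List[bytes] = []
    for kids in nodes:
        hashes.append(put(store, [hashes[k] for k in kids]))
    return [hashes[r] for r in roots]

src: Store = {}
leaf = put(src, [])
stem = put(src, [leaf])
fork = put(src, [stem, leaf])
bundle = pack(src, [fork])
dst: Store = {}
roots = unpack(bundle, dst)
```

Here pack yields ([[], [0], [1, 0]], [2]), and unpacking into an empty store reproduces the same root hash and the same node set.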
6. Modules
6.1 Module identity
A module should be an immutable manifest object. Its identity is the hash of its canonical manifest bytes.
A module name is not identity. It is a workspace alias or package-level alias to a module hash.
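A sketch of identity versus naming. It assumes sorted-key, whitespace-free JSON as a stand-in canonical encoding; the real canonical format is still an open question. The helper names are hypothetical.

```python
import hashlib
import json

def canonical_manifest_bytes(manifest: dict) -> bytes:
    # Stand-in canonical form (assumption): sorted keys, no whitespace, UTF-8.
    return json.dumps(manifest, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")

def manifest_hash(manifest: dict) -> str:
    """Module identity: hash of canonical manifest bytes, never its name."""
    data = canonical_manifest_bytes(manifest)
    return "sha256:" + hashlib.sha256(b"arboricx.module-manifest.v1\x00" + data).hexdigest()

m1 = {"exports": {"map": "sha256:aaa", "foldl": "sha256:bbb"}}
m2 = {"exports": {"foldl": "sha256:bbb", "map": "sha256:aaa"}}  # same content, other key order
m3 = {"exports": {"map": "sha256:aaa"}, "metadata": {"version": "2"}}

# Names live outside the object: two aliases may point at one module hash.
aliases = {"List": manifest_hash(m1), "Seq": manifest_hash(m2)}
```

Key order does not change the identity, any change to the manifest content does, and neither alias name appears in the hashed bytes.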
6.2 Module contents
A module manifest should primarily be an export map:
module hash
exports:
  name -> content reference
metadata:
  package
  version
  description
  license
  createdBy
optional:
  view contract artifact refs
  ABI/media type info
  source/provenance refs
The manifest should be portable and mostly format-oriented. It should not depend
on Haskell data structures or tricu-specific internal semantics.
6.3 Export entries
An export entry may eventually look conceptually like:
name: "map"
object: sha256:...
kind: arboricx.tree-term.v1
abi: arboricx.abi.tree.v1
view: sha256:... -- optional View Contract artifact
source: sha256:... -- optional source/provenance object
Executable module exports are complete normalized tree terms stored as one
arboricx.tree-term.v1 object per named export. Merkle-node storage remains
available for DAG-oriented tooling, but module/eval imports should not store or
hydrate every subtree as a separate filesystem object.
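The conceptual entry maps naturally onto a record with optional fields. This dataclass is a hypothetical rendering, not a defined schema. Dropping absent fields means an untyped export simply carries no view.

```python
from dataclasses import asdict, dataclass
from typing import Optional

@dataclass
class ExportEntry:
    """Hypothetical record mirroring the conceptual export entry fields."""
    name: str
    object: str                        # hash of one arboricx.tree-term.v1 object
    kind: str = "arboricx.tree-term.v1"
    abi: str = "arboricx.abi.tree.v1"
    view: Optional[str] = None         # optional View Contract artifact hash
    source: Optional[str] = None       # optional source/provenance object hash

    def to_record(self) -> dict:
        """Omit absent optional fields, so untyped exports carry no view key."""
        return {k: v for k, v in asdict(self).items() if v is not None}

typed = ExportEntry("map", "sha256:aaa", view="sha256:vvv")
untyped = ExportEntry("foldl", "sha256:bbb")
```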
6.4 Import behavior
Imports should resolve module aliases or content references to module manifests, then bind selected exports into the local source scope.
Export selection has one intentional aggregator special case:
module with local top-level definitions -> exports only those local definitions
module with only imports -> reexports the evaluated import env
This lets files such as prelude.tri act as explicit barrel modules without
making every ordinary module reexport its imports. A module that defines even one
local top-level name does not implicitly reexport imported names.
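The export-selection rule is small enough to state as code. select_exports is a hypothetical helper over name-to-value maps.

```python
from typing import Dict, TypeVar

V = TypeVar("V")

def select_exports(local_defs: Dict[str, V], import_env: Dict[str, V]) -> Dict[str, V]:
    """Export rule: local top-level definitions only, unless there are none.

    A module with no local definitions acts as a barrel and reexports its
    evaluated import env; one local definition disables that entirely.
    """
    return dict(local_defs) if local_defs else dict(import_env)

imports = {"map": "<map term>", "foldl": "<foldl term>"}
prelude_exports = select_exports({}, imports)                  # barrel module
list_exports = select_exports({"sum": "<sum term>"}, imports)  # ordinary module
```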
The future pipeline should be:
parse source
resolve imports/names to module exports and content refs
lower source using resolved refs
emit a view-tree artifact
check evidence when requested
store/export artifacts
It should not be:
paste imported ASTs into one file and rewrite strings
7. Workspace Layer
Mutable human-facing state belongs in a workspace layer.
Examples:
List -> module hash
Http -> module hash
map -> definition/tree hash
selected List version -> module hash
package pin prelude -> package/module hash
The workspace is where names, selections, pins, and aliases live. Renaming should usually mutate workspace aliases, not immutable content objects.
This gives humans stable ergonomic names without making names semantic identity.
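A sketch of the workspace as plain mutable maps over hashes. The Workspace class, and the choice that an explicit version selection takes precedence over a plain alias in resolve, are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class Workspace:
    """Hypothetical mutable layer: every value is a hash, no content is edited."""
    aliases: Dict[str, str] = field(default_factory=dict)   # name -> module or tree hash
    selected: Dict[str, str] = field(default_factory=dict)  # module name -> chosen version hash
    pins: Dict[str, str] = field(default_factory=dict)      # package -> package/module hash

    def rename(self, old: str, new: str) -> None:
        if new in self.aliases:
            raise KeyError(f"alias {new!r} already exists")
        self.aliases[new] = self.aliases.pop(old)

    def resolve(self, name: str) -> str:
        # Assumption: an explicit version selection wins over the plain alias.
        if name in self.selected:
            return self.selected[name]
        return self.aliases[name]

ws = Workspace(aliases={"List": "sha256:list-v1", "map": "sha256:map-tree"},
               pins={"prelude": "sha256:prelude-pkg"})
ws.selected["List"] = "sha256:list-v2"
ws.rename("map", "listMap")
```

Renaming map to listMap moves one alias entry; the tree object it points to, and every manifest that exports it, keep the same hash.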
8. Definition Identity
There are two useful identities and we should support both.
8.1 Tree identity
A Tree Calculus value has a Merkle root hash. This identifies the executable tree itself.
This is the right identity for:
- execution;
- dedupe;
- bundle roots;
- low-level artifact sharing.
8.2 Module/export identity
The module manifest is the higher-level artifact boundary. It pairs each export name with its compiled tree term and optional direct View Contract type.
The content store should not require extra definition/source/provenance objects, and fully untyped Tree Calculus code must remain valid.
9. View Contract Integration
View Contracts should attach to modules/exports as portable artifacts.
An imported definition can be assigned a local numeric symbol while lowering a typed program. Its global identity remains a content hash or module export ref.
This is the intended split:
typed program local symbol: 3
Debug label: "List.map"
Resolved object: sha256:...
Exported view: Fn [...]
De Bruijn-style integer symbols are still appropriate inside a typed program. They are local evidence identifiers, not global content identity.
We should not make global objects depend on numeric checker symbols.
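Per-program symbol numbering could look like this sketch; SymbolTable and intern are hypothetical names. The same resolved object can receive different local integers in different typed programs, while its global reference stays the same.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class SymbolTable:
    """Hypothetical per-typed-program numbering: integers are local evidence ids."""
    refs: List[str] = field(default_factory=list)    # symbol -> resolved global ref
    labels: List[str] = field(default_factory=list)  # symbol -> debug label (not identity)
    by_ref: Dict[str, int] = field(default_factory=dict)

    def intern(self, ref: str, label: str) -> int:
        """Return the local symbol for a global ref, allocating the next integer once."""
        if ref not in self.by_ref:
            self.by_ref[ref] = len(self.refs)
            self.refs.append(ref)
            self.labels.append(label)
        return self.by_ref[ref]

p1, p2 = SymbolTable(), SymbolTable()
a = p1.intern("sha256:map", "List.map")
p1.intern("sha256:foldl", "List.foldl")
p2.intern("sha256:foldl", "List.foldl")
b = p2.intern("sha256:map", "List.map")
```

Interning is keyed by the resolved reference, not the label, so importing the same object under another name reuses its existing symbol.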
Untyped code remains valid with no contract artifact. If a boundary needs to
participate in checking but has no information, it may use Any or rely on
policy. We should not pretend all untyped functions have an infinite
Any -> Any -> ... contract.
10. Import Syntax Direction
Exact syntax is future work, but the current !import form should be considered
a transitional mechanism.
Future imports should distinguish:
- path-based source imports for local development;
- workspace/module alias imports;
- explicit content-addressed imports;
- selected/exposed names;
- qualified versus unqualified binding.
Possible directions:
import "./list.tri" as List
import List exposing (map foldl)
import #abc123... as List
The syntax should be designed after the object/module model is clearer.
11. Migration Strategy
A plausible migration path:
- Define the neutral object store model and filesystem layout.
- Implement Merkle node persistence against that layout.
- Add pack/unpack between CAS roots and indexed Arboricx bundles.
- Replace the ad hoc names/tags in the SQLite terms table with workspace aliases or a clearer index layer.
- Define module manifest objects.
- Teach source imports to resolve manifests/exports instead of rewriting ASTs.
- Attach View Contract artifacts to module exports.
- Gradually migrate existing lib/ and demos/ imports.
Compatibility shims may keep existing !import working during migration.
12. Open Questions
- What exact canonical byte format should store objects use?
- Should module manifests be binary, tree-encoded, or both?
- What media type/kind registry do we need first?
- How should object references be represented in source syntax?
- How should workspaces be stored and shared?
- What is the minimum useful module manifest?
- Should source files compile directly to module manifests, or should manifests be produced by explicit package commands?
- How much Arboricx bundle metadata should reference CAS roots?
- What GC/reachability model should the store eventually use?
13. Summary
The desired design is:
Content store:
  portable CAS for immutable objects and structural references
Arboricx bundle:
  compact indexed transport/execution object
Arboricx CAS:
  persistent Merkle DAG/object representation for dedupe and partial reuse
Modules:
  immutable manifests mapping export names to content objects and optional contracts
Workspace:
  mutable human aliases, version selections, and package/module pins
View Contracts:
  portable evidence artifacts attached to exports and checked by pure Tree Calculus code
The key architectural rule is that hashes provide stable identity, while names provide human usability. The module system should be built on that separation.