# Module System and Content Store Design

Status: design draft.

This document records the intended direction for reworking `tricu` modules,
imports, Arboricx storage/transport, and the content store. It is not an
implementation plan yet; it is a shared design target.

## 1. Problem Statement

The current module/import/content-store system is useful as a prototype, but it
is not coherent enough to build on indefinitely.

Current behavior combines several partially-overlapping systems:

- `!import "path.tri" Namespace` and `!import "path.tri" !Local` perform
  filesystem-relative source preprocessing;
- imported definitions are flattened into one program;
- namespace qualification is implemented by string rewriting;
- evaluation uses a flat `Map String T` environment;
- the Haskell content store stores Tree Calculus Merkle nodes plus an ad hoc
  `terms` table with comma-separated names and tags;
- the REPL can resolve names from the content store, including multiple versions;
- Arboricx bundles provide compact indexed transport objects;
- `lib/arboricx/server.tri` already sketches a filesystem-backed object store.

This works only when users and maintainers are mindful of sharp edges:

- names serve too many roles at once;
- modules are not first-class semantic objects;
- imports are closer to AST paste-and-prefix than resolution;
- `!Local` imports can create global collisions;
- content identity, human aliases, source files, and evaluated terms are not
  cleanly separated;
- the SQLite schema is convenient but not a principled content-addressed store;
- Arboricx transport and long-lived storage are not clearly distinguished.

## 2. Design Principles

### 2.1 Content addressability is foundational

Immutable content should be identified by hashes. Human names should be metadata
or workspace aliases over content, not semantic identity.

This follows the core lesson from systems such as Unison: separate stable
content identity from ergonomic naming and namespace organization.

### 2.2 The content store is language-neutral

The content store must not be married to `tricu` or Haskell.

It stores a small set of portable Arboricx artifacts: module manifests,
complete tree terms, and direct View Contract types. Lower-level Merkle/bundle
formats exist for transport and DAG tooling, but the store core should treat all
objects as content-addressed bytes with formats/media types.

`tricu` and Haskell are clients/tooling. They are not the semantic owners of the
store.

### 2.3 View Contracts are portable enough to integrate

The store may integrate with View Contracts because the checker and evidence
format are pure Tree Calculus / portable tree data. View Contracts are not a
Haskell-private or `tricu`-private semantic layer.

The module resolver may emit typed-program evidence, but checker semantics remain
unchanged:

```text
Haskell emits evidence.
tricu judges evidence.
```

### 2.4 Modules should reflect definitions as they actually exist

The module system should conform to the reality of content-addressed immutable
artifacts and mutable human aliases. We should not contort definitions to fit a
traditional text-file module system if that fights the storage model.

### 2.5 Transport and storage are different jobs

Indexed Arboricx bundles are excellent transport/execution objects. Merkle DAGs
are better long-lived persistence objects. These should remain separate but
interoperable representations.

## 3. Conceptual Architecture

```text
Content Store
  neutral content-addressed object store

Arboricx CAS / Merkle Store
  Tree Calculus node/object formats suitable for persistence and dedupe

Arboricx Bundle
  compact indexed transport/execution format

View Contract Artifact
  portable evidence/checker data over tree artifacts

Module Manifest
  immutable export map from names to content objects and optional contracts

Workspace
  mutable aliases, selected versions, package pins, and user-facing names

tricu
  one frontend/toolchain that emits/consumes these portable artifacts
```

The content store stores objects. Arboricx defines important object formats.
View Contracts define portable checking artifacts. `tricu` produces and consumes
those formats.

### 3.1 Execution imports versus contract checking

Import resolution has two intentionally different performance profiles.

For normal execution/evaluation, resolving a module import should hydrate only
the executable exports directly demanded by the importing source. Exported Tree
Calculus values are complete normal forms: importing `foo` does not require
hydrating separate `bar` or `baz` exports that may have helped build it. This is
the fast path for `!import`, including `!Local` imports.

View Contract checking is a separate evidence-gathering path. It may load
exported direct view types for the symbols that participate in a check. That
slower path must remain behind the typed program boundary:

```text
Haskell emits evidence.
tricu judges evidence.
```

Reusable view catalogs are ordinary tricu libraries/tree terms, not a separate
core CAS artifact kind.

For locally built workspace modules, advertised direct export views are
producer-checked before the manifest alias is written. Producer checking includes
advertised views from any imported modules used by that source, so a module
cannot publish a local annotated export that contradicts a dependency's exported
view. If producer checking fails, the module alias is not written.

Consumer checking then resolves selected module exports, decodes their
`arboricx.view-contract.type.v1` refs, and emits trusted `KnownView` evidence
for the local imported symbols. Those facts are module-boundary assumptions:
local workspace builds create them after producer-side checking, while external
or prebuilt manifests are trusted inputs for now. In all cases, compatibility
with local requirements is still judged by the portable checker in `lib/view.tri`.

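The execution fast path described at the start of this subsection can be sketched roughly as follows. This is an illustrative Python model, not the resolver itself: `hydrate_demanded`, the `fetch` callback, and the shape of `manifest_exports` (export name to content reference) are all hypothetical names introduced here.

```python
def hydrate_demanded(manifest_exports, demanded, fetch):
    """Load only the exports that the importing source actually names.

    manifest_exports: dict of export name -> content reference (assumed shape)
    demanded:         set of names used by the importing source
    fetch:            callable that turns a content reference into a tree term
    """
    env = {}
    for name in sorted(demanded):
        ref = manifest_exports[name]  # KeyError if the module does not export it
        env[name] = fetch(ref)        # one complete tree term per demanded export
    return env
```

Exports that are never demanded are never fetched, which is the property the fast path relies on.
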
## 4. Content Store Direction

### 4.1 Store core

The store core should be a content-addressed object store:

```text
hash -> object bytes
hash -> object kind / media type
hash -> optional metadata/index entries
```

The hash should be computed over canonical bytes with domain separation. The
object kind or media type determines how a client interprets those bytes.

Current module/check object kinds:

```text
arboricx.module-manifest.v1
arboricx.tree-term.v1
arboricx.view-contract.type.v1
```

Merkle nodes and indexed bundles remain lower-level Arboricx transport/DAG
formats, but they are not the module/eval storage model. Typed programs and view
catalogs are ordinary tree terms unless a future external tooling use case proves
that they need their own object kind.

The store core should not need to know what a `tricu` definition means.

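As a minimal sketch of the store core, assuming SHA-256 and a made-up domain-separation prefix (the canonical byte format is still an open question, so `DOMAIN`, `object_id`, and `MemoryStore` are hypothetical):

```python
import hashlib

# Hypothetical domain-separation prefix; not a defined Arboricx constant.
DOMAIN = b"arboricx-object-v0\x00"


def object_id(kind, payload):
    """Hash the object kind and payload bytes under a fixed domain prefix."""
    kind_bytes = kind.encode("utf-8")
    h = hashlib.sha256()
    h.update(DOMAIN)
    # Length-prefix the kind so (kind, payload) boundaries are unambiguous.
    h.update(len(kind_bytes).to_bytes(4, "big"))
    h.update(kind_bytes)
    h.update(payload)
    return "sha256:" + h.hexdigest()


class MemoryStore:
    """In-memory stand-in for the store core: hash -> (kind, bytes)."""

    def __init__(self):
        self._objects = {}

    def put(self, kind, payload):
        oid = object_id(kind, payload)
        self._objects.setdefault(oid, (kind, payload))  # writes are idempotent
        return oid

    def get(self, oid):
        return self._objects[oid]
```

The store never inspects `payload`; only the kind string tells a client how to interpret the bytes.
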
### 4.2 Filesystem-backed layout

The long-term store should converge with the direction already sketched in
`lib/arboricx/server.tri`:

```text
store/
  objects/
    abc/
      abc123...object
  aliases/
    names/
    modules/
    packages/
  manifests/
  tmp/
```

SQLite may remain useful as an optional index/cache, but it should not be the
canonical store model.

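One way such a layout could be written to safely is sketched below. Several details are assumptions read off the listing rather than confirmed behavior: sharding on the first three hex characters, a `.object` file suffix, and using `tmp/` as a staging area for atomic renames. `object_path` and `write_object` are hypothetical helper names.

```python
import os
import tempfile
from pathlib import Path


def object_path(root, hex_digest):
    """Shard by the first three hex characters (assumed from the layout)."""
    return Path(root) / "objects" / hex_digest[:3] / (hex_digest + ".object")


def write_object(root, hex_digest, payload):
    """Stage the bytes in tmp/ and rename them into place."""
    final = object_path(root, hex_digest)
    if final.exists():
        return final  # content-addressed: same hash means same bytes
    final.parent.mkdir(parents=True, exist_ok=True)
    tmp_dir = Path(root) / "tmp"
    tmp_dir.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=str(tmp_dir))
    with os.fdopen(fd, "wb") as handle:
        handle.write(payload)
    # os.replace is atomic when tmp/ and objects/ are on the same filesystem.
    os.replace(tmp_name, str(final))
    return final
```

Staging in `tmp/` means a reader never observes a partially written object under `objects/`.
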
### 4.3 Structural references, not language dependencies

The store may understand structural content references when they are part of an
object format. For example, a Merkle node naturally references child hashes:

```text
Leaf
Stem childHash
Fork leftHash rightHash
```

This is not a `tricu` dependency graph. It is content structure.

Language/tool-level relationships such as "compiled from source", "exported by
module", or "checked with contract" can live in manifests or indexes. They
should not be required by the store core.

## 5. Arboricx Role

Arboricx should be understood as a family of portable Tree Calculus artifact
formats, not as a single storage mechanism.

### 5.1 Arboricx Bundle

The existing indexed `.arboricx` format remains the preferred transport and
execution object:

- compact;
- self-contained;
- deterministic;
- easy to parse in constrained runtimes;
- suitable for deployment and HTTP serving;
- structurally verifiable without hash recomputation per node.

It says:

```text
Here is everything you need, densely packed.
```

### 5.2 Arboricx CAS / Merkle Store

The persistent store should use content-addressed structural objects:

```text
Leaf
Stem childHash
Fork leftHash rightHash
```

This enables dedupe across definitions, modules, packages, and versions. A large
program that shares subtrees with other programs should not store those subtrees
multiple times.

It says:

```text
Here are immutable objects, addressable independently.
```

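The dedupe property can be illustrated with a small model. The sketch below represents trees as nested Python tuples and uses one-byte node tags; both choices are hypothetical stand-ins, not the real Arboricx node encoding.

```python
import hashlib

# Trees as nested tuples: () is Leaf, (t,) is Stem, (l, r) is Fork.
# The one-byte tags are illustrative only.


def put_tree(tree, store):
    """Store every node of `tree` once and return the root hash (hex)."""
    if len(tree) == 0:
        node = b"\x00"                                    # Leaf
    elif len(tree) == 1:
        child = put_tree(tree[0], store)
        node = b"\x01" + bytes.fromhex(child)             # Stem childHash
    else:
        left = put_tree(tree[0], store)
        right = put_tree(tree[1], store)
        node = b"\x02" + bytes.fromhex(left) + bytes.fromhex(right)  # Fork
    digest = hashlib.sha256(node).hexdigest()
    store[digest] = node  # identical subtrees collapse onto one key
    return digest


leaf = ()
shared = (leaf, (leaf,))         # Fork Leaf (Stem Leaf)
program_a = (shared, leaf)       # both programs contain `shared`
program_b = ((shared,), shared)

store = {}
root_a = put_tree(program_a, store)
root_b = put_tree(program_b, store)
```

Both programs contain `shared`, but the store holds its nodes only once. The recursion is kept simple for readability; very deep trees would need an iterative traversal.
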
### 5.3 Pack and unpack

Transport and storage should interoperate explicitly:

```text
CAS root(s) -> pack -> indexed Arboricx bundle
Arboricx bundle -> unpack -> CAS root(s)
```

The bundle can be treated as an opaque content-addressed blob by the store, and
it can also be unpacked into Merkle nodes for dedupe and partial reuse.

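A rough model of the round trip, assuming an indexed bundle is a list of nodes in which children are referenced by position and the root comes last. That bundle shape, the node tuples, and the `node_hash` encoding are assumptions made for illustration; the actual `.arboricx` layout is not described in this document.

```python
import hashlib


def node_hash(node):
    """Hash a node tuple whose children are already hex hashes (toy encoding)."""
    return hashlib.sha256("|".join(node).encode("utf-8")).hexdigest()


def pack(root, cas):
    """CAS root -> indexed bundle: children precede parents, root is last."""
    index, bundle = {}, []

    def visit(h):
        if h not in index:
            tag, *children = cas[h]
            entry = (tag, *[visit(c) for c in children])
            index[h] = len(bundle)
            bundle.append(entry)
        return index[h]

    visit(root)
    return bundle


def unpack(bundle, cas):
    """Indexed bundle -> CAS nodes; returns the root hash."""
    hashes = []
    for tag, *children in bundle:
        node = (tag, *[hashes[i] for i in children])
        h = node_hash(node)
        cas[h] = node
        hashes.append(h)
    return hashes[-1]


# Fork Leaf (Stem Leaf), stored as three CAS nodes.
leaf = ("leaf",)
stem = ("stem", node_hash(leaf))
fork = ("fork", node_hash(leaf), node_hash(stem))
cas = {node_hash(n): n for n in (leaf, stem, fork)}

bundle = pack(node_hash(fork), cas)
restored = {}
restored_root = unpack(bundle, restored)
```

Packing replaces hashes with small indices, and unpacking recomputes the hashes, so the same root identity comes back out.
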
## 6. Modules

### 6.1 Module identity

A module should be an immutable manifest object. Its identity is the hash of its
canonical manifest bytes.

A module name is not identity. It is a workspace alias or package-level alias to
a module hash.

### 6.2 Module contents

A module manifest should primarily be an export map:

```text
module hash
exports:
  name -> content reference
metadata:
  package
  version
  description
  license
  createdBy
optional:
  view contract artifact refs
  ABI/media type info
  source/provenance refs
```

The manifest should be portable and mostly format-oriented. It should not depend
on Haskell data structures or `tricu`-specific internal semantics.

### 6.3 Export entries

Conceptually, an export entry may eventually look like:

```text
name:   "map"
object: sha256:...
kind:   arboricx.tree-term.v1
abi:    arboricx.abi.tree.v1
view:   sha256:...   -- optional View Contract artifact
source: sha256:...   -- optional source/provenance object
```

Executable module exports are complete normalized tree terms stored as one
`arboricx.tree-term.v1` object per named export. Merkle-node storage remains
available for DAG-oriented tooling, but module/eval imports should not store or
hydrate every subtree as a separate filesystem object.

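To make "identity is the hash of the canonical manifest bytes" concrete, here is a sketch that uses sorted-key JSON as a stand-in canonical form. The real manifest encoding is undecided (binary, tree-encoded, or both), so the JSON choice, the function names, and the sample values are illustrative assumptions.

```python
import hashlib
import json


def canonical_manifest_bytes(manifest):
    """Stand-in canonical form: sorted keys, no insignificant whitespace."""
    text = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return text.encode("utf-8")


def module_hash(manifest):
    """Module identity: the hash of the canonical manifest bytes."""
    digest = hashlib.sha256(canonical_manifest_bytes(manifest)).hexdigest()
    return "sha256:" + digest


manifest = {
    "exports": {
        "map": {
            "object": "sha256:<tree-term hash>",  # placeholder, not a real hash
            "kind": "arboricx.tree-term.v1",
            "abi": "arboricx.abi.tree.v1",
        },
    },
    "metadata": {"package": "list", "version": "0.1.0"},  # illustrative values
}
```

The module's own name does not appear in the manifest: an alias such as `List` points at `module_hash(manifest)` from the workspace, so renaming the module does not change its identity.
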
### 6.4 Import behavior

Imports should resolve module aliases or content references to module manifests,
then bind selected exports into the local source scope.

Export selection has one intentional aggregator special case:

```text
module with local top-level definitions -> exports only those local definitions
module with only imports -> reexports the evaluated import env
```

This lets files such as `prelude.tri` act as explicit barrel modules without
making every ordinary module reexport its imports. A module that defines even one
local top-level name does not implicitly reexport imported names.

The future pipeline should be:

```text
parse source
resolve imports/names to module exports and content refs
lower source using resolved refs
emit a view-tree artifact
check evidence when requested
store/export artifacts
```

It should not be:

```text
paste imported ASTs into one file and rewrite strings
```

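The export-selection rule from this subsection is small enough to state as code. In this sketch, `select_exports` is a hypothetical helper and both environments are modelled as plain dictionaries from names to content references.

```python
def select_exports(local_defs, imported_env):
    """Apply the aggregator special case for export selection."""
    if local_defs:
        # Any local top-level definition switches off implicit reexport.
        return dict(local_defs)
    # Import-only module (a barrel such as prelude.tri): reexport the imports.
    return dict(imported_env)
```

For example, a module with no local definitions and an import environment containing `map` exports `map`, while a module that also defines `sum` locally exports only `sum`.
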
## 7. Workspace Layer

Mutable human-facing state belongs in a workspace layer.

Examples:

```text
List -> module hash
Http -> module hash
map -> definition/tree hash
selected List version -> module hash
package pin prelude -> package/module hash
```

The workspace is where names, selections, pins, and aliases live. Renaming should
usually mutate workspace aliases, not immutable content objects.

This gives humans stable ergonomic names without making names semantic identity.

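A minimal sketch of the alias table, assuming a workspace is just a mutable mapping from names to content hashes. The `Workspace` class and its method names are hypothetical; how workspaces are stored and shared is still an open question.

```python
class Workspace:
    """Mutable alias table over immutable content hashes."""

    def __init__(self):
        self.aliases = {}

    def bind(self, name, content_hash):
        self.aliases[name] = content_hash

    def rename(self, old, new):
        # Only the alias moves; the object behind the hash is untouched.
        self.aliases[new] = self.aliases.pop(old)

    def resolve(self, name):
        return self.aliases[name]
```

Renaming `List` to `Lists` changes one dictionary key and leaves the module hash, and everything stored under it, exactly as it was.
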
## 8. Definition Identity

There are two useful identities, and we should support both.

### 8.1 Tree identity

A Tree Calculus value has a Merkle root hash. This identifies the executable tree
itself.

This is the right identity for:

- execution;
- dedupe;
- bundle roots;
- low-level artifact sharing.

### 8.2 Module/export identity

The module manifest is the higher-level artifact boundary. It pairs each export
name with its compiled tree term and optional direct View Contract type.

The content store should not require extra definition/source/provenance objects,
and fully untyped Tree Calculus code must remain valid.

## 9. View Contract Integration

View Contracts should attach to modules/exports as portable artifacts.

An imported definition can be assigned a local numeric symbol while lowering a
typed program. Its global identity remains a content hash or module export ref.

This is the intended split:

```text
Typed program local symbol: 3
Debug label: "List.map"
Resolved object: sha256:...
Exported view: Fn [...]
```

De Bruijn-style integer symbols are still appropriate inside a typed program. They
are local evidence identifiers, not global content identity.

We should not make global objects depend on numeric checker symbols.

Untyped code remains valid with no contract artifact. If a boundary needs to
participate in checking but has no information, it may use `Any` or rely on
policy. We should not pretend all untyped functions have an infinite
`Any -> Any -> ...` contract.

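The split between local symbols and global identity can be modelled as a small numbering pass. This is an illustrative sketch only: `assign_local_symbols` and the record fields are hypothetical, and the numbering policy (order of first use) is an assumption.

```python
def assign_local_symbols(imports):
    """Number imported refs in order of first use.

    imports: list of (debug_label, content_ref) pairs.
    Returns (symbols, table): a ref -> local symbol map and one record per symbol.
    """
    symbols, table = {}, []
    for label, ref in imports:
        if ref not in symbols:
            symbols[ref] = len(table)
            table.append({"symbol": symbols[ref], "label": label, "object": ref})
    return symbols, table
```

The integers are meaningful only inside one typed program. The same content reference may receive a different symbol in another program without changing what it identifies.
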
## 10. Import Syntax Direction

Exact syntax is future work, but the current `!import` form should be considered
a transitional mechanism.

Future imports should distinguish:

- path-based source imports for local development;
- workspace/module alias imports;
- explicit content-addressed imports;
- selected/exposed names;
- qualified versus unqualified binding.

Possible directions:

```tri
import "./list.tri" as List
import List exposing (map foldl)
import #abc123... as List
```

The syntax should be designed after the object/module model is clearer.

## 11. Migration Strategy

A plausible migration path:

1. Define the neutral object store model and filesystem layout.
2. Implement Merkle node persistence against that layout.
3. Add pack/unpack between CAS roots and indexed Arboricx bundles.
4. Replace ad hoc SQLite `terms` names/tags with workspace aliases or a clearer
   index layer.
5. Define module manifest objects.
6. Teach source imports to resolve manifests/exports instead of rewriting ASTs.
7. Attach View Contract artifacts to module exports.
8. Gradually migrate existing `lib/` and `demos/` imports.

Compatibility shims may keep existing `!import` working during migration.

## 12. Open Questions

- What exact canonical byte format should store objects use?
- Should module manifests be binary, tree-encoded, or both?
- What media type/kind registry do we need first?
- How should object references be represented in source syntax?
- How should workspaces be stored and shared?
- What is the minimum useful module manifest?
- Should source files compile directly to module manifests, or should manifests
  be produced by explicit package commands?
- How much Arboricx bundle metadata should reference CAS roots?
- What GC/reachability model should the store eventually use?

## 13. Summary

The desired design is:

```text
Content store:
  portable CAS for immutable objects and structural references

Arboricx bundle:
  compact indexed transport/execution object

Arboricx CAS:
  persistent Merkle DAG/object representation for dedupe and partial reuse

Modules:
  immutable manifests mapping export names to content objects and optional
  contracts

Workspace:
  mutable human aliases, version selections, and package/module pins

View Contracts:
  portable evidence artifacts attached to exports and checked by pure Tree
  Calculus code
```

The key architectural rule is that hashes provide stable identity, while names
provide human usability. The module system should be built on that separation.