Tricu 2.0.0

Sorry for squashing all of this but 🤷
2026-05-25 12:43:15 -05:00
parent 2e2db07bd6
commit fdebb6c13d
105 changed files with 10139 additions and 1938 deletions


@@ -0,0 +1,505 @@
# Module System and Content Store Design
Status: design draft.
This document records the intended direction for reworking `tricu` modules,
imports, Arboricx storage/transport, and the content store. It is not an
implementation plan yet; it is a shared design target.
## 1. Problem Statement
The current module/import/content-store system is useful as a prototype, but it
is not coherent enough to build on indefinitely.
Current behavior combines several partially-overlapping systems:
- `!import "path.tri" Namespace` and `!import "path.tri" !Local` perform
filesystem-relative source preprocessing;
- imported definitions are flattened into one program;
- namespace qualification is implemented by string rewriting;
- evaluation uses a flat `Map String T` environment;
- the Haskell content store stores Tree Calculus Merkle nodes plus an ad hoc
`terms` table with comma-separated names and tags;
- the REPL can resolve names from the content store, including multiple versions;
- Arboricx bundles provide compact indexed transport objects;
- `lib/arboricx/server.tri` already sketches a filesystem-backed object store.
This works only when users and maintainers are mindful of sharp edges:
- names serve too many roles at once;
- modules are not first-class semantic objects;
- imports are closer to AST paste-and-prefix than resolution;
- `!Local` imports can create global collisions;
- content identity, human aliases, source files, and evaluated terms are not
cleanly separated;
- the SQLite schema is convenient but not a principled content-addressed store;
- Arboricx transport and long-lived storage are not clearly distinguished.
## 2. Design Principles
### 2.1 Content addressability is foundational
Immutable content should be identified by hashes. Human names should be metadata
or workspace aliases over content, not semantic identity.
This follows the core lesson from systems such as Unison: separate stable
content identity from ergonomic naming and namespace organization.
### 2.2 The content store is language-neutral
The content store must not be married to `tricu` or Haskell.
It stores a small set of portable Arboricx artifacts: module manifests,
complete tree terms, and direct View Contract types. Lower-level Merkle/bundle
formats exist for transport and DAG tooling, but the store core should treat all
objects as content-addressed bytes with formats/media types.
`tricu` and Haskell are clients/tooling. They are not the semantic owners of the
store.
### 2.3 View Contracts are portable enough to integrate
The store may integrate with View Contracts because the checker and evidence
format are pure Tree Calculus / portable tree data. View Contracts are not a
Haskell-private or `tricu`-private semantic layer.
The module resolver may emit typed-program evidence, but checker semantics remain
unchanged:
```text
Haskell emits evidence.
tricu judges evidence.
```
### 2.4 Modules should reflect definitions as they actually exist
The module system should conform to the reality of content-addressed immutable
artifacts and mutable human aliases. We should not contort definitions to fit a
traditional text-file module system if that fights the storage model.
### 2.5 Transport and storage are different jobs
Indexed Arboricx bundles are excellent transport/execution objects. Merkle DAGs
are better long-lived persistence objects. These should remain separate but
interoperable representations.
## 3. Conceptual Architecture
```text
Content Store
  neutral content-addressed object store
Arboricx CAS / Merkle Store
  Tree Calculus node/object formats suitable for persistence and dedupe
Arboricx Bundle
  compact indexed transport/execution format
View Contract Artifact
  portable evidence/checker data over tree artifacts
Module Manifest
  immutable export map from names to content objects and optional contracts
Workspace
  mutable aliases, selected versions, package pins, and user-facing names
tricu
  one frontend/toolchain that emits/consumes these portable artifacts
```
The content store stores objects. Arboricx defines important object formats.
View Contracts define portable checking artifacts. `tricu` produces and consumes
those formats.
### 3.1 Execution imports versus contract checking
Import resolution has two intentionally different performance profiles.
For normal execution/evaluation, resolving a module import should hydrate only
the executable exports directly demanded by the importing source. Exported Tree
Calculus values are complete normal forms: importing `foo` does not require
hydrating separate `bar` or `baz` exports that may have helped build it. This is
the fast path for `!import`, including `!Local` imports.
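A minimal sketch of the fast path described above, assuming a manifest export map from names to content refs and a store lookup function. The names `hydrate_demanded`, `EXPORTS`, `OBJECTS`, and `fetch`, and the `sha256:aaa`-style refs, are illustrative placeholders, not part of `tricu`.

```python
from typing import Callable, Dict, Iterable

def hydrate_demanded(
    exports: Dict[str, str],
    demanded: Iterable[str],
    fetch: Callable[[str], bytes],
) -> Dict[str, bytes]:
    """Fetch only the exports the importing source actually names."""
    hydrated = {}
    for name in demanded:
        if name not in exports:
            raise KeyError(f"module does not export {name!r}")
        hydrated[name] = fetch(exports[name])
    return hydrated

# Hypothetical manifest export map and backing object store.
EXPORTS = {"foo": "sha256:aaa", "bar": "sha256:bbb", "baz": "sha256:ccc"}
OBJECTS = {"sha256:aaa": b"foo-term", "sha256:bbb": b"bar-term", "sha256:ccc": b"baz-term"}
fetched = []

def fetch(ref: str) -> bytes:
    fetched.append(ref)  # record which objects were actually read
    return OBJECTS[ref]

# Importing only `foo` reads one object, even if `bar` and `baz` helped build it.
env = hydrate_demanded(EXPORTS, ["foo"], fetch)
```

Because each export is a complete normal form, the lookup never has to follow references from `foo` to other exports.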
View Contract checking is a separate evidence-gathering path. It may load
exported direct view types for the symbols that participate in a check. That
slower path must remain behind the typed program boundary:
```text
Haskell emits evidence.
tricu judges evidence.
```
Reusable view catalogs are ordinary tricu libraries/tree terms, not a separate
core CAS artifact kind.
For locally built workspace modules, advertised direct export views are
producer-checked before the manifest alias is written. Producer checking includes
advertised views from any imported modules used by that source, so a module
cannot publish a local annotated export that contradicts a dependency's exported
view. If producer checking fails, the module alias is not written.
Consumer checking then resolves selected module exports, decodes their
`arboricx.view-contract.type.v1` refs, and emits trusted `KnownView` evidence
for the local imported symbols. Those facts are module-boundary assumptions:
local workspace builds create them after producer-side checking, while external
or prebuilt manifests are trusted inputs for now. In all cases, compatibility
with local requirements is still judged by the portable checker in `lib/view.tri`.
## 4. Content Store Direction
### 4.1 Store core
The store core should be a content-addressed object store:
```text
hash -> object bytes
hash -> object kind / media type
hash -> optional metadata/index entries
```
The hash should be over canonical bytes with domain separation. The object kind
or media type determines how a client interprets those bytes.
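One way to read "canonical bytes with domain separation" is sketched below. The length-prefixed tag scheme and the `sha256:` prefix are assumptions made for illustration; the actual canonical byte format is still an open question in this document.

```python
import hashlib

def object_hash(kind: str, canonical_bytes: bytes) -> str:
    """Hash canonical bytes under a kind-specific domain tag (hypothetical scheme)."""
    tag = kind.encode("utf-8")
    h = hashlib.sha256()
    # Length-prefix the tag so a tag/payload boundary can never be ambiguous.
    h.update(len(tag).to_bytes(4, "big"))
    h.update(tag)
    h.update(canonical_bytes)
    return "sha256:" + h.hexdigest()

# The same payload hashed under two kinds yields two unrelated identities.
term = object_hash("arboricx.tree-term.v1", b"\x00")
view = object_hash("arboricx.view-contract.type.v1", b"\x00")
```

Domain separation means a tree term and a view type can never collide on identity, even if their byte encodings happen to match.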
Current module/check object kinds:
```text
arboricx.module-manifest.v1
arboricx.tree-term.v1
arboricx.view-contract.type.v1
```
Merkle nodes and indexed bundles remain lower-level Arboricx transport/DAG
formats, but they are not the module/eval storage model. Typed programs and view
catalogs are ordinary tree terms unless a future external tooling use case proves
that they need their own object kind.
The store core should not need to know what a `tricu` definition means.
### 4.2 Filesystem-backed layout
The long-term store should converge with the direction already sketched in
`lib/arboricx/server.tri`:
```text
store/
  objects/
    abc/
      abc123...object
  aliases/
    names/
    modules/
    packages/
  manifests/
  tmp/
SQLite may remain useful as an optional index/cache, but it should not be the
canonical store model.
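The layout above could be driven by something like the following sketch. It assumes a three-character shard directory and a `.object` suffix (one reading of `abc/abc123...object`), uses plain SHA-256 without domain separation for brevity, and stages writes in `tmp/` before an atomic rename. None of these choices are confirmed by `lib/arboricx/server.tri`.

```python
import hashlib
import os
import tempfile
from pathlib import Path

def object_path(root: Path, hex_hash: str) -> Path:
    # Shard by the first three hex characters, mirroring objects/abc/abc123...
    return root / "objects" / hex_hash[:3] / f"{hex_hash}.object"

def put_object(root: Path, data: bytes) -> str:
    """Store bytes under their hash; writing the same bytes twice is a no-op."""
    hex_hash = hashlib.sha256(data).hexdigest()
    dest = object_path(root, hex_hash)
    if dest.exists():
        return hex_hash
    dest.parent.mkdir(parents=True, exist_ok=True)
    tmp_dir = root / "tmp"
    tmp_dir.mkdir(parents=True, exist_ok=True)
    # Write to tmp/ first so readers never observe a half-written object.
    fd, tmp_name = tempfile.mkstemp(dir=tmp_dir)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    os.replace(tmp_name, dest)
    return hex_hash

def get_object(root: Path, hex_hash: str) -> bytes:
    return object_path(root, hex_hash).read_bytes()
```

An optional SQLite index could be rebuilt from this directory at any time, which is what keeps it a cache rather than the canonical model.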
### 4.3 Structural references, not language dependencies
The store may understand structural content references when they are part of an
object format. For example, a Merkle node naturally references child hashes:
```text
Leaf
Stem childHash
Fork leftHash rightHash
```
This is not a `tricu` dependency graph. It is content structure.
Language/tool-level relationships such as "compiled from source", "exported by
module", or "checked with contract" can live in manifests or indexes. They
should not be required by the store core.
## 5. Arboricx Role
Arboricx should be understood as a family of portable Tree Calculus artifact
formats, not as a single storage mechanism.
### 5.1 Arboricx Bundle
The existing indexed `.arboricx` format remains the preferred transport and
execution object:
- compact;
- self-contained;
- deterministic;
- easy to parse in constrained runtimes;
- suitable for deployment and HTTP serving;
- structurally verifiable without hash recomputation per node.
It says:
```text
Here is everything you need, densely packed.
```
### 5.2 Arboricx CAS / Merkle Store
The persistent store should use content-addressed structural objects:
```text
Leaf
Stem childHash
Fork leftHash rightHash
```
This enables dedupe across definitions, modules, packages, and versions. A large
program that shares subtrees with other programs should not store those subtrees
multiple times.
It says:
```text
Here are immutable objects, addressable independently.
```
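The dedupe property can be illustrated with an in-memory sketch. Trees are modeled as Python tuples (`("leaf",)`, `("stem", child)`, `("fork", left, right)`), and `repr` of a node record stands in for a canonical encoding; both are assumptions for illustration, not the Arboricx node format.

```python
import hashlib

LEAF = ("leaf",)

class MerkleStore:
    """Toy content-addressed node store keyed by the hash of each node record."""

    def __init__(self):
        self.nodes = {}

    def put(self, tree) -> str:
        kind = tree[0]
        # Store children first; a node record holds child hashes, not child trees.
        child_hashes = [self.put(child) for child in tree[1:]]
        record = (kind, *child_hashes)
        key = hashlib.sha256(repr(record).encode("utf-8")).hexdigest()
        self.nodes[key] = record  # identical subtrees map to the same key
        return key

    def get(self, key: str):
        kind, *child_hashes = self.nodes[key]
        return (kind, *[self.get(h) for h in child_hashes])

# Two programs that share a subtree store that subtree once.
shared = ("fork", LEAF, ("stem", LEAF))
program_a = ("fork", shared, LEAF)
program_b = ("stem", shared)

store = MerkleStore()
root_a = store.put(program_a)
root_b = store.put(program_b)
```

Here the store ends up with five node records in total, although the two trees contain eleven nodes between them.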
### 5.3 Pack and unpack
Transport and storage should interoperate explicitly:
```text
CAS root(s) -> pack -> indexed Arboricx bundle
Arboricx bundle -> unpack -> CAS root(s)
```
The bundle can be treated as an opaque content-addressed blob by the store, and
it can also be unpacked into Merkle nodes for dedupe and partial reuse.
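As a rough illustration of the pack/unpack relationship, the sketch below flattens a tree into an indexed node table and rebuilds it. The table layout (`nodes` plus `root`, children referenced by index) is a made-up stand-in, not the real `.arboricx` bundle format, and it works on in-memory tuple trees rather than CAS roots.

```python
LEAF = ("leaf",)

def pack(tree) -> dict:
    """Flatten a tree into a node table whose entries reference children by index."""
    table = []
    index_of = {}

    def visit(node) -> int:
        entry = (node[0], *[visit(child) for child in node[1:]])
        if entry not in index_of:
            # Children are visited first, so they always get lower indices.
            index_of[entry] = len(table)
            table.append(entry)
        return index_of[entry]

    root = visit(tree)
    return {"nodes": table, "root": root}

def unpack(bundle: dict):
    """Rebuild the tree from an indexed node table."""
    nodes = bundle["nodes"]

    def build(i: int):
        kind, *children = nodes[i]
        return (kind, *[build(c) for c in children])

    return build(bundle["root"])

tree = ("fork", ("stem", LEAF), ("stem", LEAF))
bundle = pack(tree)
```

The indexed form needs no hashing to traverse, which matches the "structurally verifiable without hash recomputation per node" goal, while the unpacked form is what a Merkle store would dedupe.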
## 6. Modules
### 6.1 Module identity
A module should be an immutable manifest object. Its identity is the hash of its
canonical manifest bytes.
A module name is not identity. It is a workspace alias or package-level alias to
a module hash.
### 6.2 Module contents
A module manifest should primarily be an export map:
```text
module hash
  exports:
    name -> content reference
  metadata:
    package
    version
    description
    license
    createdBy
  optional:
    view contract artifact refs
    ABI/media type info
    source/provenance refs
```
The manifest should be portable and mostly format-oriented. It should not depend
on Haskell data structures or `tricu`-specific internal semantics.
### 6.3 Export entries
Conceptually, an export entry may eventually look like:
```text
name: "map"
object: sha256:...
kind: arboricx.tree-term.v1
abi: arboricx.abi.tree.v1
view: sha256:... -- optional View Contract artifact
source: sha256:... -- optional source/provenance object
```
Executable module exports are complete normalized tree terms stored as one
`arboricx.tree-term.v1` object per named export. Merkle-node storage remains
available for DAG-oriented tooling, but module/eval imports should not store or
hydrate every subtree as a separate filesystem object.
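To make "identity is the hash of canonical manifest bytes" concrete, here is a sketch that uses sorted-key JSON as a stand-in canonical encoding. JSON is an assumption for illustration only (whether manifests are binary or tree-encoded is still open), and the object hash is a placeholder.

```python
import hashlib
import json

KIND = "arboricx.module-manifest.v1"

def canonical_manifest_bytes(manifest: dict) -> bytes:
    # Sorted keys and fixed separators give one byte string per logical manifest.
    text = json.dumps(manifest, sort_keys=True, separators=(",", ":"), ensure_ascii=True)
    return text.encode("utf-8")

def module_hash(manifest: dict) -> str:
    payload = KIND.encode("utf-8") + b"\x00" + canonical_manifest_bytes(manifest)
    return "sha256:" + hashlib.sha256(payload).hexdigest()

PLACEHOLDER_TERM = "sha256:" + "ab" * 32  # placeholder, not a real object hash

manifest = {
    "exports": {
        "map": {"object": PLACEHOLDER_TERM, "kind": "arboricx.tree-term.v1"},
    },
    "metadata": {"package": "list", "version": "0.1.0"},
}

module_id = module_hash(manifest)
```

Note that the manifest carries no module name of its own: a name such as `List` would live in the workspace as an alias to `module_id`.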
### 6.4 Import behavior
Imports should resolve module aliases or content references to module manifests,
then bind selected exports into the local source scope.
Export selection has one intentional aggregator special case:
```text
module with local top-level definitions -> exports only those local definitions
module with only imports -> reexports the evaluated import env
```
This lets files such as `prelude.tri` act as explicit barrel modules without
making every ordinary module reexport its imports. A module that defines even one
local top-level name does not implicitly reexport imported names.
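The export selection rule is small enough to state as a function. This is a sketch; `select_exports` and the dictionary-based environments are hypothetical names, with strings standing in for content refs.

```python
def select_exports(local_defs: dict, import_env: dict) -> dict:
    """Apply the aggregator rule: local definitions win; import-only modules reexport."""
    if local_defs:
        # Any local top-level definition disables implicit reexport.
        return dict(local_defs)
    # A module made only of imports acts as a barrel module.
    return dict(import_env)

imports = {"map": "ref-map", "foldl": "ref-foldl"}

ordinary = select_exports({"sum": "ref-sum"}, imports)
barrel = select_exports({}, imports)
```

Under this rule `ordinary` exposes only `sum`, while `barrel` exposes `map` and `foldl`, which is the `prelude.tri` case.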
The future pipeline should be:
```text
parse source
resolve imports/names to module exports and content refs
lower source using resolved refs
emit a view-tree artifact
check evidence when requested
store/export artifacts
```
It should not be:
```text
paste imported ASTs into one file and rewrite strings
```
## 7. Workspace Layer
Mutable human-facing state belongs in a workspace layer.
Examples:
```text
List -> module hash
Http -> module hash
map -> definition/tree hash
selected List version -> module hash
package pin prelude -> package/module hash
```
The workspace is where names, selections, pins, and aliases live. Renaming should
usually mutate workspace aliases, not immutable content objects.
This gives humans stable ergonomic names without making names semantic identity.
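A minimal sketch of the workspace as a mutable alias table over immutable hashes. The `Workspace` class and its methods are illustrative assumptions, and the hashes are placeholders.

```python
class Workspace:
    """Mutable name -> content ref table; the referenced objects never change."""

    def __init__(self):
        self.aliases = {}

    def bind(self, name: str, ref: str) -> None:
        self.aliases[name] = ref

    def resolve(self, name: str) -> str:
        return self.aliases[name]

    def rename(self, old: str, new: str) -> None:
        if new in self.aliases:
            raise ValueError(f"alias {new!r} already exists")
        # Only the alias moves; the content hash it points at is untouched.
        self.aliases[new] = self.aliases.pop(old)

LIST_V1 = "sha256:" + "11" * 32  # placeholder module hashes
LIST_V2 = "sha256:" + "22" * 32

ws = Workspace()
ws.bind("List", LIST_V1)
ws.rename("List", "Data.List")   # rename: content identity unchanged
ws.bind("Data.List", LIST_V2)    # version selection: alias repointed
```

Both operations edit workspace state only, so any other workspace pinned to `LIST_V1` is unaffected.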
## 8. Definition Identity
There are two useful identities, and we should support both.
### 8.1 Tree identity
A Tree Calculus value has a Merkle root hash. This identifies the executable tree
itself.
This is the right identity for:
- execution;
- dedupe;
- bundle roots;
- low-level artifact sharing.
### 8.2 Module/export identity
The module manifest is the higher-level artifact boundary. It pairs each export
name with its compiled tree term and optional direct View Contract type.
The content store should not require extra definition/source/provenance objects,
and fully untyped Tree Calculus code must remain valid.
## 9. View Contract Integration
View Contracts should attach to modules/exports as portable artifacts.
An imported definition can be assigned a local numeric symbol while lowering a
typed program. Its global identity remains a content hash or module export ref.
This is the intended split:
```text
Typed program local symbol: 3
Debug label: "List.map"
Resolved object: sha256:...
Exported view: Fn [...]
```
De Bruijn-style integer symbols are still appropriate inside a typed program. They
are local evidence identifiers, not global content identity.
We should not make global objects depend on numeric checker symbols.
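The split between local symbols and global identity can be sketched as follows. The table shape and the `assign_local_symbols` helper are hypothetical, and the refs are placeholders rather than real hashes.

```python
def assign_local_symbols(imports):
    """Number imports in order; imports is a list of (debug_label, object_ref) pairs."""
    return [
        {"symbol": i, "label": label, "object": ref}
        for i, (label, ref) in enumerate(imports)
    ]

imports = [
    ("List.map", "sha256:" + "11" * 32),    # placeholder refs
    ("List.foldl", "sha256:" + "22" * 32),
]

table = assign_local_symbols(imports)
# A different import order changes the local numbers but not the global refs.
reordered = assign_local_symbols(list(reversed(imports)))
```

Since the numbering depends on lowering order, storing it inside a global object would make that object's hash depend on an incidental detail of one typed program.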
Untyped code remains valid with no contract artifact. If a boundary needs to
participate in checking but has no information, it may use `Any` or rely on
policy. We should not pretend all untyped functions have an infinite
`Any -> Any -> ...` contract.
## 10. Import Syntax Direction
Exact syntax is future work, but the current `!import` form should be considered
a transitional mechanism.
Future imports should distinguish:
- path-based source imports for local development;
- workspace/module alias imports;
- explicit content-addressed imports;
- selected/exposed names;
- qualified versus unqualified binding.
Possible directions:
```tri
import "./list.tri" as List
import List exposing (map foldl)
import #abc123... as List
```
The syntax should be designed after the object/module model is clearer.
## 11. Migration Strategy
A plausible migration path:
1. Define the neutral object store model and filesystem layout.
2. Implement Merkle node persistence against that layout.
3. Add pack/unpack between CAS roots and indexed Arboricx bundles.
4. Replace ad hoc SQLite `terms` names/tags with workspace aliases or a clearer
index layer.
5. Define module manifest objects.
6. Teach source imports to resolve manifests/exports instead of rewriting ASTs.
7. Attach View Contract artifacts to module exports.
8. Gradually migrate existing `lib/` and `demos/` imports.
Compatibility shims may keep existing `!import` working during migration.
## 12. Open Questions
- What exact canonical byte format should store objects use?
- Should module manifests be binary, tree-encoded, or both?
- What media type/kind registry do we need first?
- How should object references be represented in source syntax?
- How should workspaces be stored and shared?
- What is the minimum useful module manifest?
- Should source files compile directly to module manifests, or should manifests
be produced by explicit package commands?
- How much Arboricx bundle metadata should reference CAS roots?
- What GC/reachability model should the store eventually use?
## 13. Summary
The desired design is:
```text
Content store:
  portable CAS for immutable objects and structural references
Arboricx bundle:
  compact indexed transport/execution object
Arboricx CAS:
  persistent Merkle DAG/object representation for dedupe and partial reuse
Modules:
  immutable manifests mapping export names to content objects and optional
  contracts
Workspace:
  mutable human aliases, version selections, and package/module pins
View Contracts:
  portable evidence artifacts attached to exports and checked by pure Tree
  Calculus code
```
The key architectural rule is that hashes provide stable identity, while names
provide human usability. The module system should be built on that separation.