# Content Store and Module Format Design

Status: concrete design draft.

This document narrows the higher-level module-system direction into concrete
format and storage decisions. It intentionally avoids source/provenance details:
modules export usable portable artifacts, not edit history.

Related design overview: `docs/module-system-design.md`.

## 1. Scope

This document specifies the first target shape for:

- a neutral filesystem-backed content-addressed store;
- Arboricx Merkle node persistence;
- indexed Arboricx bundle import/export as transport;
- module manifests as immutable export maps;
- workspace aliases as mutable human-facing references;
- View Contract artifact attachment to module exports.

It does not specify:

- package manager semantics;
- dependency solving;
- source-level rebuild/provenance metadata;
- final import syntax;
- garbage collection;
- registry/sync protocol.

## 2. Non-Negotiable Boundaries

The content store is not `tricu`-specific and is not Haskell-specific.

The store may contain objects produced by `tricu`, Haskell, Tree Calculus tools,
Arboricx tooling, or future frontends. The store core only knows object bytes,
object kinds, hashes, aliases, and optionally structural references for known
portable formats.

View Contracts may be first-class artifact references because they are portable
Tree Calculus data checked by pure Tree Calculus code. They are not
Haskell-private semantics.

Source and build provenance are intentionally excluded from the first module
manifest format. A module manifest answers:

```text
What portable artifacts does this module export, and what portable contracts are
paired with them?
```

It does not answer:

```text
Which source file, parser, frontend, or build command produced these artifacts?
```

## 3. Hashing Convention

Objects are content-addressed by SHA-256 over domain-separated canonical bytes.

General rule:

```text
hash = SHA256(domainUtf8 || 0x00 || canonicalPayloadBytes)
```

This matches the existing Merkle node convention in `Research.nodeHash`:

```text
SHA256("arboricx.merkle.node.v1" || 0x00 || nodePayload)
```

The domain string is part of the object format. It prevents identical payload
bytes in different formats from accidentally sharing identity.

Hashes are represented externally as 64 lowercase hexadecimal characters.

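A minimal Python sketch of the general rule, using only the standard `hashlib` module. `domain_hash` is a hypothetical helper name, not an existing API; the example shows that the same payload byte (`0x00`, a `Leaf` in both the node and tree-term formats) gets two different identities under two domains.

```python
import hashlib


def domain_hash(domain: str, payload: bytes) -> str:
    """SHA256(domainUtf8 || 0x00 || canonicalPayloadBytes) as 64 lowercase hex chars."""
    return hashlib.sha256(domain.encode("utf-8") + b"\x00" + payload).hexdigest()


# Identical payload bytes under different domains get different identities.
payload = b"\x00"  # a Leaf payload in both the node and tree-term formats
node_id = domain_hash("arboricx.merkle.node.v1", payload)
term_id = domain_hash("arboricx.tree-term.v1", payload)
```

`hexdigest()` already yields lowercase hex, which matches the external representation above.
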
## 4. Filesystem Store Layout

The canonical filesystem store layout is:

```text
store/
  objects/
    abc/
      abc123...   -- object bytes, sharded by first 3 hex chars
  aliases/
    names/
    modules/
    packages/
  manifests/
  tmp/
```

The three-character shard follows the existing `lib/arboricx/server.tri`
convention.

### 4.1 Object paths

For object hash:

```text
abc123...
```

object bytes live at:

```text
store/objects/abc/abc123...
```

The object filename is the full hash. The shard directory is the first three hex
characters.

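The path rule can be sketched with `pathlib`, assuming a local store root; `object_path` is a hypothetical helper. It rejects anything that is not a 64-character lowercase hex hash before deriving the shard, so malformed input can never escape the `objects/` tree.

```python
from pathlib import Path

HEX_DIGITS = frozenset("0123456789abcdef")
SHARD_LEN = 3  # shard directory = first three hex characters


def object_path(store_root: Path, hash_hex: str) -> Path:
    """Map a 64-char lowercase hex hash to store/objects/<shard>/<hash>."""
    if len(hash_hex) != 64 or not set(hash_hex) <= HEX_DIGITS:
        raise ValueError(f"not a 64-character lowercase hex hash: {hash_hex!r}")
    return store_root / "objects" / hash_hex[:SHARD_LEN] / hash_hex
```
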
### 4.2 Atomic writes

Writers should use:

```text
store/tmp/<hash>.<nonce>.tmp
```

then atomically rename into:

```text
store/objects/<shard>/<hash>
```

Writing an existing object is idempotent if the existing bytes match the hash.

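A minimal sketch of this write protocol in Python. It assumes the object file holds only the canonical payload bytes (without the domain prefix) and that `store/tmp` and `store/objects` live on one filesystem, so `os.replace` is an atomic rename on POSIX. `put_object` and the 16-hex-digit nonce are illustrative choices, not part of the format.

```python
import hashlib
import os
import secrets
from pathlib import Path


def _hash(domain: str, payload: bytes) -> str:
    return hashlib.sha256(domain.encode("utf-8") + b"\x00" + payload).hexdigest()


def put_object(store_root: Path, domain: str, payload: bytes) -> str:
    """Write payload to store/objects/<shard>/<hash> via store/tmp; return the hash.

    Assumes the object file holds the canonical payload bytes only.
    """
    hash_hex = _hash(domain, payload)
    final = store_root / "objects" / hash_hex[:3] / hash_hex
    if final.exists():
        # Idempotent: accept the existing file only if its bytes match the hash.
        if _hash(domain, final.read_bytes()) != hash_hex:
            raise RuntimeError(f"existing object {hash_hex} does not match its hash")
        return hash_hex
    tmp_dir = store_root / "tmp"
    tmp_dir.mkdir(parents=True, exist_ok=True)
    final.parent.mkdir(parents=True, exist_ok=True)
    tmp = tmp_dir / f"{hash_hex}.{secrets.token_hex(8)}.tmp"
    with open(tmp, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # make the bytes durable before they become visible
    os.replace(tmp, final)  # atomic rename on POSIX within one filesystem
    return hash_hex
```

Two concurrent writers of the same object both rename identical bytes into place, so the race is harmless.
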
### 4.3 Store core metadata

The minimal filesystem store does not require sidecar metadata for every object.
Object kind can be known by context or by manifest references.

A later index may cache:

```text
hash -> kind
hash -> size
hash -> references
hash -> createdAt
```

but this index is not semantic identity.

## 5. Arboricx Merkle Node Object Format

The persistent Tree Calculus representation is a Merkle DAG of node objects.

Domain:

```text
arboricx.merkle.node.v1
```

Canonical payloads:

```text
Leaf            = 0x00
Stem child      = 0x01 || childHashRaw32
Fork left right = 0x02 || leftHashRaw32 || rightHashRaw32
```

Where `childHashRaw32`, `leftHashRaw32`, and `rightHashRaw32` are the raw 32-byte
SHA-256 digests corresponding to child node hashes.

This is already implemented conceptually by:

```text
Research.Node
Research.serializeNode
Research.deserializeNode
Research.nodeHash
```

The filesystem CAS should use this payload/hash convention directly.

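The payload and hash rules can be mirrored in Python as follows. The names echo `Research.Node`, `Research.serializeNode`, `Research.deserializeNode`, and `Research.nodeHash`, but these are hypothetical Python stand-ins, not that code. Because a parent embeds its children's raw digests, hashes are computed bottom-up.

```python
import hashlib
from dataclasses import dataclass
from typing import Union

NODE_DOMAIN = b"arboricx.merkle.node.v1"


@dataclass(frozen=True)
class Leaf:
    pass


@dataclass(frozen=True)
class Stem:
    child: bytes  # raw 32-byte SHA-256 digest of the child node


@dataclass(frozen=True)
class Fork:
    left: bytes   # raw 32-byte digest
    right: bytes  # raw 32-byte digest


Node = Union[Leaf, Stem, Fork]


def serialize_node(node: Node) -> bytes:
    if isinstance(node, Leaf):
        return b"\x00"
    if isinstance(node, Stem):
        return b"\x01" + node.child
    return b"\x02" + node.left + node.right


def deserialize_node(payload: bytes) -> Node:
    tag, rest = payload[:1], payload[1:]
    if tag == b"\x00" and not rest:
        return Leaf()
    if tag == b"\x01" and len(rest) == 32:
        return Stem(rest)
    if tag == b"\x02" and len(rest) == 64:
        return Fork(rest[:32], rest[32:])
    raise ValueError("malformed Merkle node payload")


def node_hash(node: Node) -> bytes:
    """Raw 32-byte digest of SHA256(domain || 0x00 || payload)."""
    return hashlib.sha256(NODE_DOMAIN + b"\x00" + serialize_node(node)).digest()
```

For example, `Fork(node_hash(Leaf()), node_hash(Stem(node_hash(Leaf()))))` is a 65-byte payload whose two halves are already-stored child hashes.
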
## 6. Tree Roots

A Tree Calculus value stored in the CAS is identified by the hash of its root
Merkle node.

```text
treeRootHash = hash(rootNodePayload)
```

The complete tree is reconstructed by recursively loading node objects reachable
from the root.

Hydration is an interpretation step, not part of object identity. A client may
hydrate a root as a plain tree, a graph with explicit sharing, or another runtime
representation as long as the observable Tree Calculus value is the same. The
filesystem CAS provides structural dedupe and portable identity; it does not by
itself guarantee that a hydrated runtime value is the cheapest representation for
all workloads.

Merkle nodes are useful for explicit DAG-oriented tooling, audit, and bundle
packing. They are not the default representation for module executable exports:
storing every subtree as a separate filesystem object is pathologically slow for
large normal forms.

For module-backed evaluation and imports, a complete normalized named term is
stored as one canonical object:

```text
kind: arboricx.tree-term.v1
hash: <whole-term object hash>
abi: arboricx.abi.tree.v1
```

The `arboricx.tree-term.v1` payload is a prefix encoding:

```text
Leaf     = 0x00
Stem t   = 0x01 Tree
Fork l r = 0x02 Tree Tree
```

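A sketch of the `arboricx.tree-term.v1` prefix encoding in Python. It represents a tree as nested tuples (`()` for `Leaf`, `(t,)` for `Stem t`, `(l, r)` for `Fork l r`), a hypothetical in-memory form; since each tag byte equals the node's arity, the encoder emits `len(node)` directly. Both directions use explicit stacks so large normal forms do not hit Python's recursion limit. The whole-term hash assumes, following the general rule in section 3, that the domain string is the object kind.

```python
import hashlib

# Hypothetical in-memory form: () = Leaf, (t,) = Stem t, (l, r) = Fork l r.
# The tag byte equals the arity: Leaf 0x00, Stem 0x01, Fork 0x02.


def encode_term(tree: tuple) -> bytes:
    """Pre-order prefix encoding of an arboricx.tree-term.v1 payload."""
    out = bytearray()
    stack = [tree]
    while stack:
        node = stack.pop()
        if len(node) > 2:
            raise ValueError("a node has at most two children")
        out.append(len(node))
        stack.extend(reversed(node))  # left child is popped (emitted) first
    return bytes(out)


def decode_term(data: bytes) -> tuple:
    """Inverse of encode_term; rejects bad tags, truncation, and trailing bytes."""
    frames = []  # open nodes: (arity, children decoded so far)
    for offset, tag in enumerate(data):
        if tag > 2:
            raise ValueError(f"bad tag {tag} at offset {offset}")
        frames.append((tag, []))
        # Close every node whose children are now complete.
        while frames and len(frames[-1][1]) == frames[-1][0]:
            _, children = frames.pop()
            node = tuple(children)
            if not frames:
                if offset != len(data) - 1:
                    raise ValueError("trailing bytes after complete term")
                return node
            frames[-1][1].append(node)
    raise ValueError("truncated term")


def tree_term_hash(tree: tuple) -> str:
    # Assumption: domain = object kind, per the general rule in section 3.
    return hashlib.sha256(b"arboricx.tree-term.v1\x00" + encode_term(tree)).hexdigest()
```

For instance, `Fork Leaf (Stem Leaf)` is `((), ((),))` and encodes to the four bytes `02 00 01 00`.
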
## 7. Arboricx Indexed Bundles

Indexed `.arboricx` bundles remain the transport/execution format.

They are:

- compact;
- self-contained;
- deterministic;
- suitable for restricted runtimes;
- suitable for HTTP serving and deployment.

They are not the canonical long-lived deduplicated store representation.

### 7.1 Pack

Packing converts one or more CAS tree roots into an indexed bundle:

```text
CAS tree roots -> indexed Arboricx bundle
```

The packer traverses reachable Merkle nodes, emits a compact indexed node table,
and writes a bundle manifest with export names and root indices.

### 7.2 Unpack

Unpacking converts a bundle into CAS nodes:

```text
indexed Arboricx bundle -> CAS tree roots
```

The unpacker verifies the bundle structure, reconstructs each exported tree, and
stores the corresponding Merkle nodes. It returns the tree root hash for each
bundle export.

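This draft does not fix the bundle's wire format, so the following pack/unpack round trip uses a hypothetical in-memory node table: each row is `(tag, child index, ...)`, children always precede their parents, and a plain dict stands in for the filesystem CAS. Packing deduplicates shared subtrees by Merkle hash; unpacking checks tags, arity, and back-references before recomputing and storing each node, then returns one root hash per export.

```python
import hashlib
from typing import Dict, List, Tuple

NODE_PREFIX = b"arboricx.merkle.node.v1\x00"
Store = Dict[bytes, bytes]  # stand-in CAS: raw 32-byte node hash -> node payload
Row = Tuple[int, ...]       # hypothetical table row: (tag, child index, ...)


def put_node(store: Store, payload: bytes) -> bytes:
    digest = hashlib.sha256(NODE_PREFIX + payload).digest()
    store[digest] = payload
    return digest


def pack(store: Store, exports: Dict[str, bytes]) -> Tuple[List[Row], Dict[str, int]]:
    """CAS roots -> (deduplicated node table, export name -> root index)."""
    index: Dict[bytes, int] = {}
    table: List[Row] = []

    def visit(digest: bytes) -> int:  # recursive for brevity
        if digest not in index:
            payload = store[digest]
            children = [payload[i:i + 32] for i in range(1, len(payload), 32)]
            row = (payload[0], *(visit(c) for c in children))  # children first
            index[digest] = len(table)
            table.append(row)
        return index[digest]

    roots = {name: visit(root) for name, root in exports.items()}
    return table, roots


def unpack(table: List[Row], roots: Dict[str, int], store: Store) -> Dict[str, str]:
    """Verify the table, store each Merkle node, return export name -> root hash."""
    hashes: List[bytes] = []
    for i, (tag, *children) in enumerate(table):
        if tag > 2 or len(children) != tag or any(not 0 <= c < i for c in children):
            raise ValueError(f"malformed bundle row {i}")
        payload = bytes([tag]) + b"".join(hashes[c] for c in children)
        hashes.append(put_node(store, payload))
    for name, i in roots.items():
        if not 0 <= i < len(hashes):
            raise ValueError(f"export {name!r} points outside the table")
    return {name: hashes[i].hex() for name, i in roots.items()}
```

Because children are emitted before parents, the unpacker can verify and hash each row in a single forward pass.
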
## 8. Module Manifest v1

A module is an immutable manifest object. The module identity is the hash of its
canonical manifest bytes.

A module name is not identity. It is a workspace alias to a module manifest hash.

### 8.1 Domain

Proposed domain:

```text
arboricx.module-manifest.v1
```

### 8.2 Purpose

A module manifest pairs human-facing export names with portable content objects
and optional portable contracts.

It exists to support:

- reproducible import resolution;
- executable export discovery;
- View Contract lookup for imported symbols;
- module-to-module reference tracking;
- transport/store interop.

It does not describe source provenance.

### 8.3 Conceptual shape

```text
moduleManifestV1:
  imports:
    - alias: <text>
      kind: <object kind>
      hash: <object hash>

  exports:
    - name: <text>
      object:
        kind: <object kind>
        hash: <object hash>
        abi: <abi identifier>
      view: optional
        kind: <view artifact kind>
        hash: <view artifact hash>
      catalog: optional
        kind: <view catalog kind>
        hash: <view catalog hash>

  metadata: optional human-facing fields
```

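To make the record-per-export shape concrete, here is a hypothetical in-memory model using frozen Python dataclasses. The canonical byte encoding of manifests is still undecided, so this is not a wire format and all class names are illustrative; it only shows that looking up an export by name returns the executable ref and its optional view and catalog refs as one record.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Ref:
    kind: str
    hash: str


@dataclass(frozen=True)
class ExecutableRef:
    kind: str  # e.g. arboricx.tree-term.v1
    hash: str
    abi: str   # e.g. arboricx.abi.tree.v1


@dataclass(frozen=True)
class Export:
    name: str
    object: ExecutableRef
    view: Optional[Ref] = None     # e.g. arboricx.view-contract.type.v1
    catalog: Optional[Ref] = None


@dataclass(frozen=True)
class Import:
    alias: str
    kind: str
    hash: str


@dataclass(frozen=True)
class ModuleManifestV1:
    imports: Tuple[Import, ...] = ()
    exports: Tuple[Export, ...] = ()
    metadata: Tuple[Tuple[str, str], ...] = ()  # optional human-facing fields

    def export(self, name: str) -> Export:
        """Return the whole export record, so object and view travel together."""
        for entry in self.exports:
            if entry.name == name:
                return entry
        raise KeyError(name)
```
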
### 8.4 Imports/references

The `imports` section is a manifest reference graph, not a store-level language
dependency graph.

Each entry records direct content-addressed references used by the module:

```text
alias: Prelude
kind: arboricx.module-manifest.v1
hash: <module hash>
```

This supports reproducibility, partial fetch, and audit. The content store core
stores this object but does not need to understand `Prelude` or import
semantics.

### 8.5 Exports

Each export is a record, not a single hash. This is required so executable
objects and advertised contracts cannot drift apart.

Minimal executable export:

```text
name: "id"
object:
  kind: arboricx.tree-term.v1
  hash: <whole-term hash>
  abi: arboricx.abi.tree.v1
```

Export with View Contract:

```text
name: "map"
object:
  kind: arboricx.tree-term.v1
  hash: <whole-term hash>
  abi: arboricx.abi.tree.v1
view:
  kind: arboricx.view-contract.type.v1
  hash: <view type hash>
```

The manifest preserves the pairing between exported executable and exported
contract. For workspace modules built from local source, annotated exports are
checked before the manifest is published; only exports that pass producer-side
View Contract checking receive direct `arboricx.view-contract.type.v1` refs.

### 8.6 Metadata

Metadata is optional and human-facing. Initial fields may include:

```text
package
version
description
license
createdBy
```

Metadata is not source provenance and is not required for execution or checking.

## 9. View Contract Artifacts

View Contract artifacts are portable Arboricx-layer data. They may be stored
as content objects and referenced by module exports. `tricu` may emit these
objects, but the object kind is not `tricu`-specific.

Current artifact kind:

```text
arboricx.view-contract.type.v1
```

`arboricx.view-contract.type.v1` is the direct export-view artifact. Its
payload is a canonical prefix binary encoding of the syntactic ViewType:

```text
Name   = 0x00 u32be(byte-length) utf8-name
Ref    = 0x01 u32be(byte-length) utf8-ref
List   = 0x02 ViewType
Maybe  = 0x03 ViewType
Pair   = 0x04 ViewType ViewType
Result = 0x05 ViewType ViewType
Fn     = 0x06 u32be(argument-count) ViewType* ViewType
```

`utf8-ref` is tagged text:

```text
i:<decimal-integer>   numeric/legacy ref
s:<text>              symbolic user ref
```

Symbolic refs are the preferred user-authored form; numeric refs remain useful
for generated code, fixtures, and old low-level examples.

The object hash domain is the object kind:

```text
arboricx.view-contract.type.v1 \0 <payload>
```

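A sketch of the ViewType encoder and object hash in Python. The `V*` class names and the `first`/`second` and `left`/`right` field names are hypothetical, since the draft names only the constructors; `VRef` strings are passed through as already-tagged text (`i:...` or `s:...`) without validation. Lengths and argument counts are written as unsigned 32-bit big-endian integers via `struct.pack(">I", ...)`.

```python
import hashlib
import struct
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class VName:
    name: str


@dataclass(frozen=True)
class VRef:
    ref: str  # already-tagged text: "i:<decimal-integer>" or "s:<text>"


@dataclass(frozen=True)
class VList:
    elem: "ViewType"


@dataclass(frozen=True)
class VMaybe:
    elem: "ViewType"


@dataclass(frozen=True)
class VPair:
    first: "ViewType"
    second: "ViewType"


@dataclass(frozen=True)
class VResult:
    left: "ViewType"
    right: "ViewType"


@dataclass(frozen=True)
class VFn:
    args: Tuple["ViewType", ...]
    result: "ViewType"


ViewType = Union[VName, VRef, VList, VMaybe, VPair, VResult, VFn]


def _text(tag: int, s: str) -> bytes:
    data = s.encode("utf-8")
    return bytes([tag]) + struct.pack(">I", len(data)) + data  # u32be byte length


def encode_view(v: ViewType) -> bytes:
    if isinstance(v, VName):
        return _text(0x00, v.name)
    if isinstance(v, VRef):
        return _text(0x01, v.ref)
    if isinstance(v, VList):
        return b"\x02" + encode_view(v.elem)
    if isinstance(v, VMaybe):
        return b"\x03" + encode_view(v.elem)
    if isinstance(v, VPair):
        return b"\x04" + encode_view(v.first) + encode_view(v.second)
    if isinstance(v, VResult):
        return b"\x05" + encode_view(v.left) + encode_view(v.right)
    if isinstance(v, VFn):
        args = b"".join(encode_view(a) for a in v.args)
        return b"\x06" + struct.pack(">I", len(v.args)) + args + encode_view(v.result)
    raise TypeError(f"not a ViewType: {v!r}")


def view_hash(v: ViewType) -> str:
    # Domain is the object kind: kind || 0x00 || payload.
    return hashlib.sha256(b"arboricx.view-contract.type.v1\x00" + encode_view(v)).hexdigest()
```

A `map`-like view with symbolic refs could be written as `VFn((VFn((VRef("s:a"),), VRef("s:b")), VList(VRef("s:a"))), VList(VRef("s:b")))`.
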
### 9.1 Export-level pairing

The module manifest is the canonical pairing of an executable export and its
advertised contract:

```text
export name -> tree-term hash + optional view artifact hash
```

This avoids drift such as:

```text
map      -> tree A
map.view -> contract B
```

where aliases might be retargeted independently.

### 9.2 Import checking

When a source file imports a module, a frontend can resolve an imported export,
decode its direct `arboricx.view-contract.type.v1` ref, and emit typed program
evidence locally:

```text
imported List.map has view Fn [...]
```

For locally built workspace modules this is backed by producer-side checking
before the module manifest alias is published, including imported view facts from
dependencies used by the producer source. External or prebuilt manifests are
trusted boundary declarations for now; they are not accompanied by proof objects.
The checker still consumes only local numeric symbols and typed-program evidence.
Global content hashes do not become checker symbols.

Correct split:

```text
local checker symbol: 3
presentation label:   "List.map"
resolved object:      sha256:...
exported view:        Fn [...]
```

### 9.3 Execution hydration versus contract evidence

Execution imports should use a narrow, demand-driven path:

```text
module import -> selected executable exports -> hydrate selected tree-term objects
```

This path should not compute a dependency closure over other module exports.
Each selected executable export is already a complete Tree Calculus value.

Contract-aware checking may use a broader path:

```text
module import -> selected exports -> exported view type refs -> typed-program evidence
```

That path emits portable evidence and leaves compatibility policy decisions to
the Tree Calculus checker. Typed programs and reusable catalogs do not need their
own binary object kinds today: they are ordinary Tree Calculus data and can be
stored as `arboricx.tree-term.v1` when persistence is useful.

## 10. Workspace Aliases

A workspace is mutable human-facing state over immutable content.

Examples:

```text
List       -> module manifest hash
Prelude    -> module manifest hash
map        -> tree-term hash
httpServer -> bundle hash
```

Aliases should live under:

```text
store/aliases/
```

Initial categories:

```text
store/aliases/modules/<name>
store/aliases/names/<name>
store/aliases/packages/<name>
```

Alias file contents should be simple and explicit, for example:

```text
kind: arboricx.module-manifest.v1
hash: abc123...
```

Exact encoding can be decided with the first implementation. The important rule
is that aliases are mutable pointers, not content identity.

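Since the alias encoding is still open, this Python sketch simply writes the two `kind:`/`hash:` lines from the example above. Reusing the temporary-file-plus-rename pattern from section 4.2 so that retargeting is atomic is an assumption, as are the helper names and the alias-name check.

```python
import os
import secrets
from pathlib import Path
from typing import Tuple

CATEGORIES = ("modules", "names", "packages")


def alias_path(store_root: Path, category: str, name: str) -> Path:
    if category not in CATEGORIES:
        raise ValueError(f"unknown alias category: {category!r}")
    if not name or name in (".", "..") or "/" in name or "\\" in name:
        raise ValueError(f"invalid alias name: {name!r}")
    return store_root / "aliases" / category / name


def write_alias(store_root: Path, category: str, name: str, kind: str, hash_hex: str) -> None:
    """Create or retarget an alias; the content objects themselves never change."""
    path = alias_path(store_root, category, name)
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp_dir = store_root / "tmp"
    tmp_dir.mkdir(parents=True, exist_ok=True)
    tmp = tmp_dir / f"alias.{secrets.token_hex(8)}.tmp"
    tmp.write_text(f"kind: {kind}\nhash: {hash_hex}\n", encoding="utf-8")
    os.replace(tmp, path)  # readers see either the old or the new target


def read_alias(store_root: Path, category: str, name: str) -> Tuple[str, str]:
    """Return (kind, hash) for an alias."""
    fields = {}
    text = alias_path(store_root, category, name).read_text(encoding="utf-8")
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields["kind"], fields["hash"]
```

Retargeting `List` to a new manifest only rewrites `store/aliases/modules/List`; both manifest objects stay in `store/objects/`.
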
## 11. Existing Convention Alignment

This design intentionally preserves existing conventions where they already fit:

- SHA-256 domain-separated Merkle node hashing;
- `Leaf` / `Stem` / `Fork` node payload tags `0x00`, `0x01`, `0x02`;
- three-character object sharding from `lib/arboricx/server.tri`;
- indexed Arboricx bundles as compact transport objects;
- optional human-facing export names in manifests;
- View Contract checker evidence as portable Tree Calculus data.

It replaces or demotes conventions that do not fit:

- SQLite `terms.names` comma-separated aliases become workspace aliases/indexes;
- SQLite `terms.tags` comma-separated tags become optional metadata/indexes;
- file imports as AST flattening become transitional behavior;
- names cease to be semantic identity.

## 12. Implementation Sketch

A staged implementation can proceed as follows:

1. Add filesystem CAS helpers alongside the existing SQLite store.
2. Store/load Arboricx Merkle nodes using the filesystem layout.
3. Implement tree-term storage and reconstruction from filesystem CAS.
4. Implement pack from CAS tree terms/Merkle roots to indexed Arboricx bundle.
5. Implement unpack from indexed Arboricx bundle to CAS tree terms/Merkle roots.
6. Define a concrete module manifest encoding.
7. Store/load module manifests as content-addressed objects.
8. Add workspace alias read/write helpers.
9. Teach import resolution to target module manifests/exports.
10. Attach exported View Contract artifacts to module exports.
11. Gradually migrate existing `!import` users.

## 13. Deferred Decisions

These are intentionally left out of the first concrete format:

- package version solving;
- registry/remotes protocol;
- garbage collection/reachability;
- source/provenance/build-record objects;
- editor/update workflows;
- rich visibility/export rules;
- final import syntax;
- whether module manifests also need a tree-native encoding.

## 14. Summary

The concrete v1 direction is:

```text
Store:
  filesystem-backed content-addressed objects

Hashing:
  SHA256(domain || 0x00 || canonical payload)

Tree persistence:
  Arboricx Merkle nodes

Transport:
  indexed .arboricx bundles, packable from and unpackable to CAS roots

Modules:
  immutable manifests pairing export names with object refs and optional View
  Contract refs

Workspace:
  mutable aliases from human names to immutable content hashes
```

This keeps the store portable, preserves Arboricx's compact transport role,
restores Merkle DAGs as the persistence model, and gives View Contracts a stable
module/export attachment point without making the store `tricu`-specific.