# TRICU MERKLE CONTENT STORE — HANDOFF DOC

## Objective

Replace the current **whole-term content store** with a **Merkle DAG–based content store** for Tree Calculus terms.

Goal:

* Canonical, cross-language, content-addressed representation
* Maximal structural deduplication
* Clean separation of:

  * identity (hash)
  * storage (nodes)
  * transport (packages)
  * execution (runtime graph)

---

## Current State (contentstore branch)

You currently have:

```text
Term (T)
  -> serializeTerm (Cereal)
  -> sha256(bytes)
  -> store full term blob
```

This is:

* canonical at whole-term level
* NOT deduplicated internally
* NOT Merkle

---

## Target Architecture

### Core Concept

Each Tree Calculus node becomes a content-addressed object:

```text
Leaf:
  hash = H( tag_leaf )

Stem:
  hash = H( tag_stem || child_hash )

Fork:
  hash = H( tag_fork || left_hash || right_hash )
```

Content store:

```text
Hash -> Node(tag, child_hashes)
```

A program is:

```text
root_hash
```

---

## Data Model (Introduce)

Define a new canonical node type:

```haskell
data Node
  = NLeaf
  | NStem Hash
  | NFork Hash Hash
```

Define:

```haskell
type Hash = ByteString -- SHA-256
```

---

## Canonical Serialization (CRITICAL)

Define a **strict, minimal, cross-language spec**:

```text
Node payload:
  Leaf: 0x00
  Stem: 0x01 || child_hash
  Fork: 0x02 || left_hash || right_hash

Node hash:
  SHA256( UTF8("tricu.merkle.node.v1") || 0x00 || node_payload )

Store:
  node_hash -> node_payload
```
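
The payload layout above can be sketched in Haskell roughly as follows. This is a minimal sketch under stated assumptions, not the project's implementation: the names `encodeNode`, `decodeNode`, `hashPreimage`, `domainTag`, and `hashLen` are hypothetical, and hashes are assumed to be raw 32-byte SHA-256 digests. The digest step itself is left to whichever SHA-256 binding the project already uses.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC
import Data.ByteString (ByteString)

type Hash = ByteString  -- assumption: raw 32-byte SHA-256 digest

data Node
  = NLeaf
  | NStem Hash
  | NFork Hash Hash
  deriving (Eq, Show)

-- Assumed digest length in bytes (SHA-256).
hashLen :: Int
hashLen = 32

-- Domain-separation prefix from the spec: UTF8("tricu.merkle.node.v1") || 0x00.
-- The tag is pure ASCII, so Char8 packing yields the same bytes as UTF-8.
domainTag :: ByteString
domainTag = BC.pack "tricu.merkle.node.v1" `BS.snoc` 0x00

-- Node payload: one tag byte followed by zero, one, or two child hashes.
-- This sketch does not validate child hash lengths on the way in.
encodeNode :: Node -> ByteString
encodeNode NLeaf       = BS.singleton 0x00
encodeNode (NStem c)   = BS.cons 0x01 c
encodeNode (NFork l r) = BS.cons 0x02 (l <> r)

-- Strict decoder: any payload whose length does not match its tag is rejected,
-- so each node has exactly one accepted encoding.
decodeNode :: ByteString -> Maybe Node
decodeNode bs = case BS.uncons bs of
  Just (0x00, rest) | BS.null rest                  -> Just NLeaf
  Just (0x01, rest) | BS.length rest == hashLen     -> Just (NStem rest)
  Just (0x02, rest) | BS.length rest == 2 * hashLen ->
    let (l, r) = BS.splitAt hashLen rest in Just (NFork l r)
  _ -> Nothing

-- Bytes that would be fed to SHA-256 to obtain the node hash.
hashPreimage :: Node -> ByteString
hashPreimage node = domainTag <> encodeNode node
```

Applying SHA-256 to `hashPreimage node` would give the node hash, while `encodeNode node` alone is what the store keeps as the payload.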

Avoid storing the version inside every node payload unless every node needs to be self-describing. Put the version in the hash preimage and in the store/package metadata instead. That gives versioning without bloating every node.

---

## Required Invariants

These MUST hold:

1. **Determinism**

```text
same tree → same hashes everywhere
```

2. **Structural identity**

```text
identical subtrees → identical hashes
```

3. **No dependence on DAG shape**

Tree identity must not depend on construction order.

4. **Hash correctness**

```text
lookup(hash) -> node
hash(node) == hash
```
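
The hash-correctness invariant can be enforced at read time. The following is a rough sketch only, with several assumptions: the store is modelled as an in-memory `Map`, the hash function is passed in as a parameter (`HashFn`, expected to be SHA-256 over the domain tag plus payload), and `getNodeVerified` and `StoreError` are hypothetical names. Rehashing on every read costs one digest per lookup, which fits the stated priority of determinism over performance.

```haskell
import qualified Data.ByteString as BS
import Data.ByteString (ByteString)
import qualified Data.Map.Strict as Map

type Hash   = ByteString               -- assumption: raw 32-byte digest
type Store  = Map.Map Hash ByteString  -- node_hash -> node_payload
type HashFn = ByteString -> Hash       -- assumption: SHA-256 over tag + payload

data Node = NLeaf | NStem Hash | NFork Hash Hash
  deriving (Eq, Show)

data StoreError
  = MissingNode Hash       -- no entry under this hash
  | HashMismatch Hash      -- stored bytes do not hash back to the key
  | MalformedPayload Hash  -- bytes are not a valid node encoding
  deriving (Eq, Show)

-- Strict decoder for the payload layout: tag byte, then 0, 1, or 2 hashes.
decodeNode :: ByteString -> Maybe Node
decodeNode bs = case BS.uncons bs of
  Just (0x00, rest) | BS.null rest         -> Just NLeaf
  Just (0x01, rest) | BS.length rest == 32 -> Just (NStem rest)
  Just (0x02, rest) | BS.length rest == 64 ->
    let (l, r) = BS.splitAt 32 rest in Just (NFork l r)
  _ -> Nothing

-- Look up a node and check hash(node) == hash before returning it.
getNodeVerified :: HashFn -> Store -> Hash -> Either StoreError Node
getNodeVerified hashFn store h = do
  payload <- maybe (Left (MissingNode h)) Right (Map.lookup h store)
  if hashFn payload == h then Right () else Left (HashMismatch h)
  maybe (Left (MalformedPayload h)) Right (decodeNode payload)
```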

---

## Core Functions to Implement

### 1. Convert Tree → Merkle DAG

```haskell
buildMerkle :: T -> State Store Hash
```

Behavior:

* recursively compute child hashes
* create Node
* store if not exists
* return hash
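
The steps above can be sketched as follows. This is an illustrative sketch, not the intended implementation: it assumes `T` has the shape `Leaf | Stem T | Fork T T`, models `Store` as an in-memory `Map` from hash to payload, and takes the hash function as a parameter so the block stays free of external dependencies. `buildMerkleWith`, `putPayload`, and `HashFn` are hypothetical names. In practice the `HashFn` would be SHA-256 applied to the domain tag followed by the payload.

```haskell
import Control.Monad.State (State, modify', runState)
import qualified Data.ByteString as BS
import Data.ByteString (ByteString)
import qualified Data.Map.Strict as Map

-- Assumed shape of the Tree Calculus term type.
data T = Leaf | Stem T | Fork T T
  deriving (Eq, Show)

type Hash   = ByteString               -- assumption: raw 32-byte digest
type Store  = Map.Map Hash ByteString  -- node_hash -> node_payload
type HashFn = ByteString -> Hash       -- assumption: SHA-256 over tag + payload

-- Hash a payload and store it only if the hash is not already present.
putPayload :: HashFn -> ByteString -> State Store Hash
putPayload hashFn payload = do
  let h = hashFn payload
  modify' (Map.insertWith (\_new old -> old) h payload)
  pure h

-- Bottom-up conversion: children are hashed first, then the parent payload
-- is assembled from the tag byte and the child hashes.
buildMerkleWith :: HashFn -> T -> State Store Hash
buildMerkleWith hashFn t = case t of
  Leaf -> putPayload hashFn (BS.singleton 0x00)
  Stem c -> do
    ch <- buildMerkleWith hashFn c
    putPayload hashFn (BS.cons 0x01 ch)
  Fork l r -> do
    lh <- buildMerkleWith hashFn l
    rh <- buildMerkleWith hashFn r
    putPayload hashFn (BS.cons 0x02 (lh <> rh))
```

Running `runState (buildMerkleWith hashFn term) Map.empty` returns the root hash together with the populated store. Because the hash depends only on the payload bytes, a subtree that occurs twice maps to the same key and is stored once. Note that this naive version still walks every occurrence of a repeated subtree, so it deduplicates storage but not hashing work.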

`buildMerkle` is the entry point that replaces the current storage path.

---

### 2. Store Interface

```haskell
putNode :: Node -> StoreM Hash
getNode :: Hash -> StoreM Node
```

Store layout can be:

```text
/data/<hash>
```

---

### 3. Reconstruct Tree (for execution)

```haskell
loadTree :: Hash -> StoreM T
```

Recursive:

* fetch node
* rebuild T
* optionally cache
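
The recursion above might look like the sketch below. It is only an illustration: it assumes `T` is `Leaf | Stem T | Fork T T`, uses a pure in-memory store of already-decoded nodes instead of `StoreM`, and the names `loadTreeFrom` and `exampleStore` are hypothetical. The keys in `exampleStore` are readable placeholders, not real digests. This version has no cache, so a shared subtree is rebuilt once per reference.

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as BC
import qualified Data.Map.Strict as Map

-- Assumed shape of the Tree Calculus term type.
data T = Leaf | Stem T | Fork T T
  deriving (Eq, Show)

data Node = NLeaf | NStem Hash | NFork Hash Hash
  deriving (Eq, Show)

type Hash  = ByteString
type Store = Map.Map Hash Node  -- assumption: nodes are already decoded

-- Fetch the node for a hash and rebuild the term recursively.
-- Returns Nothing if any node in the closure is missing.
loadTreeFrom :: Store -> Hash -> Maybe T
loadTreeFrom store h = do
  node <- Map.lookup h store
  case node of
    NLeaf     -> Just Leaf
    NStem c   -> Stem <$> loadTreeFrom store c
    NFork l r -> Fork <$> loadTreeFrom store l <*> loadTreeFrom store r

-- Hand-built store with placeholder keys, for illustration only.
exampleStore :: Store
exampleStore = Map.fromList
  [ (key "leaf", NLeaf)
  , (key "stem", NStem (key "leaf"))
  , (key "fork", NFork (key "stem") (key "stem"))
  ]
  where key = BC.pack

-- loadTreeFrom exampleStore (BC.pack "fork")
--   evaluates to Just (Fork (Stem Leaf) (Stem Leaf))
```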

---

### 4. Execution

Reuse existing evaluator:

```haskell
eval :: T -> T
```

No change required.

---

## Phase Plan

### Phase 1 — Minimal Merkle Store

* Implement Node type
* Implement canonical serialization
* Implement `buildMerkle`
* Replace current `put` logic
* Add `loadTree`

Goal: roundtrip correctness

---

### Phase 2 — Dedup Verification

Add diagnostics:

```haskell
countNodes :: Hash -> StoreM Int
```

Test:

* repeated structures are stored only once
* identical subtrees share a hash
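
One way to make the deduplication check concrete is to compare two counts for the same root: the number of distinct nodes reachable in the store and the number of nodes in the fully expanded tree. The sketch below is an assumption-laden illustration, using a pure in-memory store of decoded nodes. `countUniqueNodes`, `countExpandedNodes`, `children`, and `exampleStore` are hypothetical names, and the store keys are placeholders instead of real digests.

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as BC
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

data Node = NLeaf | NStem Hash | NFork Hash Hash
  deriving (Eq, Show)

type Hash  = ByteString
type Store = Map.Map Hash Node  -- assumption: nodes are already decoded

children :: Node -> [Hash]
children NLeaf       = []
children (NStem c)   = [c]
children (NFork l r) = [l, r]

-- Distinct nodes reachable from the root (what the store actually holds).
-- Hashes missing from the store are skipped, not counted.
countUniqueNodes :: Store -> Hash -> Int
countUniqueNodes store root = Set.size (go Set.empty [root])
  where
    go seen [] = seen
    go seen (h : rest)
      | h `Set.member` seen = go seen rest
      | otherwise = case Map.lookup h store of
          Nothing   -> go seen rest
          Just node -> go (Set.insert h seen) (children node ++ rest)

-- Node count of the fully expanded tree (what a whole-term blob would hold).
countExpandedNodes :: Store -> Hash -> Integer
countExpandedNodes store h = case Map.lookup h store of
  Nothing          -> 0
  Just NLeaf       -> 1
  Just (NStem c)   -> 1 + countExpandedNodes store c
  Just (NFork l r) -> 1 + countExpandedNodes store l + countExpandedNodes store r

-- Hand-built store with placeholder keys: Fork (Stem Leaf) (Stem Leaf).
exampleStore :: Store
exampleStore = Map.fromList
  [ (key "leaf", NLeaf)
  , (key "stem", NStem (key "leaf"))
  , (key "fork", NFork (key "stem") (key "stem"))
  ]
  where key = BC.pack

-- countUniqueNodes   exampleStore (BC.pack "fork")  evaluates to 3
-- countExpandedNodes exampleStore (BC.pack "fork")  evaluates to 5
```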

---

### Phase 3 — Wire Format

Define transport:

```text
bundle = compress(
  list of (hash, serialized_node)
)
```

Implement:

```haskell
exportClosure :: Hash -> StoreM Bundle
importBundle :: Bundle -> StoreM ()
```
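
A possible shape for the closure walk and the import check is sketched below, with compression left out. Everything here is an assumption made for illustration: the store is an in-memory `Map` of raw payloads, `Bundle` is a plain list of `(hash, payload)` pairs, the hash function is passed in as `HashFn`, and `exportClosureFrom`, `importBundleWith`, and `childHashes` are hypothetical names. The export emits children before parents, so an importer can verify each entry as it arrives and never holds a parent whose children are absent.

```haskell
import Control.Monad (foldM)
import qualified Data.ByteString as BS
import Data.ByteString (ByteString)
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type Hash   = ByteString               -- assumption: raw 32-byte digest
type Store  = Map.Map Hash ByteString  -- node_hash -> node_payload
type HashFn = ByteString -> Hash       -- assumption: SHA-256 over tag + payload
type Bundle = [(Hash, ByteString)]     -- uncompressed; compression not shown

-- Child hashes read straight from a payload (tag byte, then 32-byte hashes).
childHashes :: ByteString -> [Hash]
childHashes payload = case BS.uncons payload of
  Just (0x01, rest) -> [rest]
  Just (0x02, rest) -> let (l, r) = BS.splitAt 32 rest in [l, r]
  _                 -> []

-- Collect every node reachable from the root, each once, children first.
-- Returns Nothing if the closure is incomplete in the local store.
exportClosureFrom :: Store -> Hash -> Maybe Bundle
exportClosureFrom store root = reverse . snd <$> go (Set.empty, []) root
  where
    go acc@(seen, out) h
      | h `Set.member` seen = Just acc
      | otherwise = do
          payload <- Map.lookup h store
          (seen', out') <- foldM go (Set.insert h seen, out) (childHashes payload)
          Just (seen', (h, payload) : out')

-- Insert bundle entries, rejecting the bundle at the first entry whose
-- payload does not hash to its claimed key (Left carries that key).
importBundleWith :: HashFn -> Bundle -> Store -> Either Hash Store
importBundleWith hashFn bundle store0 = foldM step store0 bundle
  where
    step store (h, payload)
      | hashFn payload /= h = Left h
      | otherwise = Right (Map.insertWith (\_new old -> old) h payload store)
```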

---

### Phase 4 — Runtime Optimization

Optional:

* memoized load
* DAG-preserving runtime
* step counter in evaluator

---

## What NOT To Do

Do NOT:

* hash full trees anymore
* store serialized `T` directly
* allow multiple encodings
* include runtime state in nodes
* depend on evaluation for hashing

---

## Testing Requirements

Add tests for:

### Identity

```text
same term -> same hash
```

### Deduplication

```text
Fork A A stores A once
```

### Roundtrip

```text
T -> hash -> loadTree -> T (equal)
```

### Cross-run stability

The hash of a given term must not change between runs.

---

## Optional Enhancements

Not required for initial implementation:

* lazy loading
* partial fetch (networked store)
* compression at storage layer
* typed wrappers
* DAG-aware evaluator

---

## Key Insight

You are not storing programs anymore.

You are storing:

```text
a canonical graph of computation
```

Everything else (execution, wire, language) sits on top.

---

## Success Criteria

You know this is working when:

* identical subtrees collapse globally
* hashes are stable across runs
* small programs reuse large portions of structure
* the runtime can reconstruct and execute terms correctly
* wire bundles can reconstruct the store elsewhere

---

## Final Mental Model

```text
Authoring:  tricu source
Lowering:   Tree Calculus (T)

Identity:   Merkle hash(root)

Storage:    Merkle DAG (node store)

Wire:       compressed node bundles

Execution:  reconstructed graph → reduce
```

---

If anything is unclear during implementation, prioritize:

```text
determinism > simplicity > performance
```

To run the tests, run `nix build .#`. All tests must pass without modification.