1 Commits

Author SHA1 Message Date
6b97b210ca Full Merkle tree resolution 2026-05-05 14:08:50 -05:00

MERKLE.md

@@ -1,358 +0,0 @@
# TRICU MERKLE CONTENT STORE — HANDOFF DOC
## Objective
Replace the current **whole-term content store** with a **Merkle DAG-based content store** for Tree Calculus terms.
Goal:
* Canonical, cross-language, content-addressed representation
* Maximal structural deduplication
* Clean separation of:
  * identity (hash)
  * storage (nodes)
  * transport (packages)
  * execution (runtime graph)
---
## Current State (contentstore branch)
You currently have:
```text
Term (T)
-> serializeTerm (Cereal)
-> sha256(bytes)
-> store full term blob
```
This is:
* canonical at whole-term level
* NOT deduplicated internally
* NOT Merkle
---
## Target Architecture
### Core Concept
Each Tree Calculus node becomes a content-addressed object:
```text
Leaf:
hash = H( tag_leaf )
Stem:
hash = H( tag_stem || child_hash )
Fork:
hash = H( tag_fork || left_hash || right_hash )
```
Content store:
```text
Hash -> Node(tag, child_hashes)
```
A program is identified by its root hash:
```text
root_hash
```
---
## Data Model (Introduce)
Define a new canonical node type:
```haskell
data Node
= NLeaf
| NStem Hash
| NFork Hash Hash
```
Define:
```haskell
type Hash = ByteString -- SHA-256
```
---
## Canonical Serialization (CRITICAL)
Define a **strict, minimal, cross-language spec**:
```text
Node payload:
Leaf: 0x00
Stem: 0x01 || child_hash
Fork: 0x02 || left_hash || right_hash
Node hash:
SHA256( UTF8("tricu.merkle.node.v1") || 0x00 || node_payload )
Store:
node_hash -> node_payload
```
Do not store the version inside every node payload unless every node must be self-describing. Put it in the hash preimage (as above) and in the store/package metadata instead. That gives versioning without bloating every node.
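The payload and hash rules above could be sketched as follows. This is a minimal sketch, not the project's implementation: it assumes the `cryptohash-sha256` package for `Crypto.Hash.SHA256.hash`, and the names `domainTag`, `serializeNode`, `deserializeNode` and `hashNode` are hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Crypto.Hash.SHA256 as SHA256 -- assumption: cryptohash-sha256 package
import           Data.ByteString (ByteString)
import qualified Data.ByteString as BS

type Hash = ByteString -- raw 32-byte SHA-256 digest

data Node
  = NLeaf
  | NStem Hash
  | NFork Hash Hash
  deriving (Eq, Show)

-- | UTF8("tricu.merkle.node.v1") || 0x00. The tag is pure ASCII, so the
-- literal's bytes are its UTF-8 encoding.
domainTag :: ByteString
domainTag = "tricu.merkle.node.v1" <> BS.singleton 0x00

-- | Node payload: one tag byte followed by the raw child hashes.
serializeNode :: Node -> ByteString
serializeNode NLeaf       = BS.singleton 0x00
serializeNode (NStem c)   = BS.cons 0x01 c
serializeNode (NFork l r) = BS.cons 0x02 (l <> r)

-- | Node hash: SHA256(domainTag || payload).
hashNode :: Node -> Hash
hashNode n = SHA256.hash (domainTag <> serializeNode n)

-- | Inverse of 'serializeNode'. Any unknown tag or wrong length is rejected,
-- so each node has exactly one accepted encoding.
deserializeNode :: ByteString -> Maybe Node
deserializeNode bs = case BS.uncons bs of
  Just (0x00, rest) | BS.null rest         -> Just NLeaf
  Just (0x01, rest) | BS.length rest == 32 -> Just (NStem rest)
  Just (0x02, rest) | BS.length rest == 64 ->
    let (l, r) = BS.splitAt 32 rest in Just (NFork l r)
  _ -> Nothing
```

Payload sizes are fixed at 1, 33 and 65 bytes, which makes a strict decoder easy to port to other languages.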
---
## Required Invariants
These MUST hold:
1. **Determinism**
```text
same tree → same hashes everywhere
```
2. **Structural identity**
```text
identical subtrees → identical hashes
```
3. **No dependence on DAG shape**
Tree identity must not depend on construction order.
4. **Hash correctness**
```text
lookup(hash) -> node
hash(node) == hash
```
---
## Core Functions to Implement
### 1. Convert Tree → Merkle DAG
```haskell
buildMerkle :: T -> State Store Hash
```
Behavior:
* recursively compute child hashes
* create Node
* store if not exists
* return hash
This is the entry point replacing current storage.
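One possible shape for `buildMerkle`, following the behavior listed above. This is a sketch under assumptions: `T` is taken to be `Leaf | Stem T | Fork T T`, `Store` is modeled as an in-memory map from node hash to payload, `State` comes from `mtl`, SHA-256 comes from `cryptohash-sha256`, and `putNode`, `hashPayload` and `storeTerm` are hypothetical helpers.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Control.Monad.State (State, modify', runState) -- assumption: mtl
import qualified Crypto.Hash.SHA256 as SHA256 -- assumption: cryptohash-sha256
import           Data.ByteString (ByteString)
import qualified Data.ByteString as BS
import qualified Data.Map.Strict as Map

-- Assumed shape of the Tree Calculus term type.
data T = Leaf | Stem T | Fork T T deriving (Eq, Show)

type Hash  = ByteString
type Store = Map.Map Hash ByteString -- node_hash -> node_payload

data Node = NLeaf | NStem Hash | NFork Hash Hash deriving (Eq, Show)

serializeNode :: Node -> ByteString
serializeNode NLeaf       = BS.singleton 0x00
serializeNode (NStem c)   = BS.cons 0x01 c
serializeNode (NFork l r) = BS.cons 0x02 (l <> r)

hashPayload :: ByteString -> Hash
hashPayload p = SHA256.hash ("tricu.merkle.node.v1" <> BS.singleton 0x00 <> p)

-- | Store the node unless its hash is already present, then return the hash.
putNode :: Node -> State Store Hash
putNode n = do
  let payload = serializeNode n
      h       = hashPayload payload
  modify' (Map.insertWith (\_new old -> old) h payload)
  pure h

-- | Hash children first, then the node that refers to them.
buildMerkle :: T -> State Store Hash
buildMerkle Leaf       = putNode NLeaf
buildMerkle (Stem t)   = buildMerkle t >>= putNode . NStem
buildMerkle (Fork l r) = do
  hl <- buildMerkle l
  hr <- buildMerkle r
  putNode (NFork hl hr)

-- | Convenience wrapper: add a term to an existing store.
storeTerm :: T -> Store -> (Hash, Store)
storeTerm t = runState (buildMerkle t)
```

The traversal is linear in the size of the expanded tree, not the DAG. If terms already share structure in memory, a cache keyed on subterms would be a later optimization.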
---
### 2. Store Interface
```haskell
putNode :: Node -> StoreM Hash
getNode :: Hash -> StoreM Node
```
Store layout can be:
```text
/data/<hash>
```
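A file-backed version of this interface might look like the sketch below. Everything concrete here is an assumption: `StoreM` is modeled as `ReaderT FilePath IO` carrying the store root, file names are the lowercase hex of the hash, and `getNode` re-hashes what it reads so that `hash(node) == hash` is checked on every load. It relies on the `mtl`, `directory`, `filepath` and `cryptohash-sha256` packages.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Control.Monad (unless)
import           Control.Monad.IO.Class (liftIO)
import           Control.Monad.Reader (ReaderT, ask, runReaderT) -- assumption: mtl
import qualified Crypto.Hash.SHA256 as SHA256 -- assumption: cryptohash-sha256
import           Data.ByteString (ByteString)
import qualified Data.ByteString as BS
import           System.Directory (createDirectoryIfMissing, doesFileExist)
import           System.FilePath ((</>))
import           Text.Printf (printf)

type Hash = ByteString
data Node = NLeaf | NStem Hash | NFork Hash Hash deriving (Eq, Show)

-- | Hypothetical StoreM: IO with the store's root directory in scope.
type StoreM = ReaderT FilePath IO

serializeNode :: Node -> ByteString
serializeNode NLeaf       = BS.singleton 0x00
serializeNode (NStem c)   = BS.cons 0x01 c
serializeNode (NFork l r) = BS.cons 0x02 (l <> r)

deserializeNode :: ByteString -> Maybe Node
deserializeNode bs = case BS.uncons bs of
  Just (0x00, rest) | BS.null rest         -> Just NLeaf
  Just (0x01, rest) | BS.length rest == 32 -> Just (NStem rest)
  Just (0x02, rest) | BS.length rest == 64 ->
    let (l, r) = BS.splitAt 32 rest in Just (NFork l r)
  _ -> Nothing

hashPayload :: ByteString -> Hash
hashPayload p = SHA256.hash ("tricu.merkle.node.v1" <> BS.singleton 0x00 <> p)

-- | <root>/<hash as lowercase hex> (assumed naming scheme).
hashPath :: FilePath -> Hash -> FilePath
hashPath root h = root </> concatMap (printf "%02x") (BS.unpack h)

-- | Write the payload only if no file exists for its hash yet.
putNode :: Node -> StoreM Hash
putNode n = do
  root <- ask
  let payload = serializeNode n
      h       = hashPayload payload
      path    = hashPath root h
  liftIO $ do
    createDirectoryIfMissing True root
    exists <- doesFileExist path
    unless exists (BS.writeFile path payload)
  pure h

-- | Read a node and verify that the stored bytes really hash to the key.
getNode :: Hash -> StoreM Node
getNode h = do
  root    <- ask
  payload <- liftIO (BS.readFile (hashPath root h))
  if hashPayload payload /= h
    then liftIO (ioError (userError "getNode: payload does not match hash"))
    else maybe (liftIO (ioError (userError "getNode: malformed payload"))) pure
               (deserializeNode payload)
```

A store action would then be run with something like `runReaderT (putNode NLeaf) "data"`. A production store would likely write to a temporary file and rename it, so that a crash cannot leave a partial payload behind.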
---
### 3. Reconstruct Tree (for execution)
```haskell
loadTree :: Hash -> StoreM T
```
Recursive:
* fetch node
* rebuild T
* optionally cache
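The recursion could be sketched as below. To stay self-contained it uses a pure in-memory store of decoded nodes instead of `StoreM`, and reports a missing node as `Left`. The `T` definition, the `Store` alias and the `insertNode` helper are assumptions for illustration. Caching is left out.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Crypto.Hash.SHA256 as SHA256 -- assumption: cryptohash-sha256
import           Data.ByteString (ByteString)
import qualified Data.ByteString as BS
import qualified Data.Map.Strict as Map

-- Assumed shape of the Tree Calculus term type.
data T = Leaf | Stem T | Fork T T deriving (Eq, Show)

type Hash = ByteString
data Node = NLeaf | NStem Hash | NFork Hash Hash deriving (Eq, Show)

-- | In-memory stand-in for the node store (hypothetical).
type Store = Map.Map Hash Node

hashNode :: Node -> Hash
hashNode n = SHA256.hash ("tricu.merkle.node.v1" <> BS.singleton 0x00 <> payload)
  where
    payload = case n of
      NLeaf     -> BS.singleton 0x00
      NStem c   -> BS.cons 0x01 c
      NFork l r -> BS.cons 0x02 (l <> r)

-- | Helper for building a store by hand.
insertNode :: Node -> Store -> (Hash, Store)
insertNode n s = let h = hashNode n in (h, Map.insert h n s)

-- | Fetch the node for a hash and rebuild the term below it.
loadTree :: Store -> Hash -> Either String T
loadTree store h = case Map.lookup h store of
  Nothing          -> Left "loadTree: missing node"
  Just NLeaf       -> Right Leaf
  Just (NStem c)   -> Stem <$> loadTree store c
  Just (NFork l r) -> Fork <$> loadTree store l <*> loadTree store r
```

Because `loadTree` expands shared subtrees, the rebuilt `T` can be much larger than the stored DAG. A memo table keyed on `Hash` is the natural place for the optional cache.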
---
### 4. Execution
Reuse existing evaluator:
```haskell
eval :: T -> T
```
No change required.
---
## Phase Plan
### Phase 1 — Minimal Merkle Store
* Implement Node type
* Implement canonical serialization
* Implement `buildMerkle`
* Replace current `put` logic
* Add `loadTree`
Goal: roundtrip correctness
---
### Phase 2 — Dedup Verification
Add diagnostics:
```haskell
countNodes :: Hash -> Int
```
Test:
* repeated structures only stored once
* identical subtrees share hash
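A diagnostic along these lines could count the distinct nodes reachable from a root and compare that with the size of the expanded term. This is a sketch with assumed names: the store is an in-memory map of decoded nodes, `countNodes` takes the store explicitly, and `insertTree` and `sizeT` are hypothetical helpers.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Crypto.Hash.SHA256 as SHA256 -- assumption: cryptohash-sha256
import           Data.ByteString (ByteString)
import qualified Data.ByteString as BS
import           Data.List (foldl')
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Assumed shape of the Tree Calculus term type.
data T = Leaf | Stem T | Fork T T deriving (Eq, Show)

type Hash  = ByteString
data Node  = NLeaf | NStem Hash | NFork Hash Hash deriving (Eq, Show)
type Store = Map.Map Hash Node

hashNode :: Node -> Hash
hashNode n = SHA256.hash ("tricu.merkle.node.v1" <> BS.singleton 0x00 <> payload)
  where
    payload = case n of
      NLeaf     -> BS.singleton 0x00
      NStem c   -> BS.cons 0x01 c
      NFork l r -> BS.cons 0x02 (l <> r)

-- | Pure insert of a whole term, children first.
insertTree :: T -> Store -> (Hash, Store)
insertTree t s0 = case t of
  Leaf     -> put NLeaf s0
  Stem c   -> let (hc, s1) = insertTree c s0 in put (NStem hc) s1
  Fork l r ->
    let (hl, s1) = insertTree l s0
        (hr, s2) = insertTree r s1
    in put (NFork hl hr) s2
  where
    put n s = let h = hashNode n in (h, Map.insert h n s)

-- | Number of distinct nodes reachable from the root (missing nodes are skipped).
countNodes :: Store -> Hash -> Int
countNodes store root = Set.size (go Set.empty root)
  where
    go seen h
      | h `Set.member` seen = seen
      | otherwise = case Map.lookup h store of
          Nothing -> seen
          Just n  -> foldl' go (Set.insert h seen) (children n)
    children NLeaf       = []
    children (NStem c)   = [c]
    children (NFork l r) = [l, r]

-- | Size of the fully expanded term, for comparison.
sizeT :: T -> Int
sizeT Leaf       = 1
sizeT (Stem c)   = 1 + sizeT c
sizeT (Fork l r) = 1 + sizeT l + sizeT r
```

For `a = Fork (Stem Leaf) Leaf`, the term `Fork a a` has `sizeT` 9 but only 4 distinct nodes (`Leaf`, `Stem Leaf`, `a`, and the root), which is the kind of gap a dedup test can assert on.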
---
### Phase 3 — Wire Format
Define transport:
```text
bundle = compress(
list of (hash, serialized_node)
)
```
Implement:
```haskell
exportClosure :: Hash -> Bundle
importBundle :: Bundle -> StoreM ()
```
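The closure walk and import could be sketched as follows, with compression left out. The assumptions: `Bundle` is a plain list of `(hash, payload)` pairs ordered children first, the store is an in-memory map passed explicitly instead of living in `StoreM`, and `importBundle` re-hashes every payload before accepting it. `hashPayload` and `childHashes` are hypothetical helpers.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Control.Monad (foldM)
import qualified Crypto.Hash.SHA256 as SHA256 -- assumption: cryptohash-sha256
import           Data.ByteString (ByteString)
import qualified Data.ByteString as BS
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type Hash   = ByteString
type Store  = Map.Map Hash ByteString -- node_hash -> node_payload
type Bundle = [(Hash, ByteString)]    -- uncompressed, children before parents

hashPayload :: ByteString -> Hash
hashPayload p = SHA256.hash ("tricu.merkle.node.v1" <> BS.singleton 0x00 <> p)

-- | Child hashes referenced by a payload (empty for a leaf or malformed bytes).
childHashes :: ByteString -> [Hash]
childHashes p = case BS.uncons p of
  Just (0x01, rest) | BS.length rest == 32 -> [rest]
  Just (0x02, rest) | BS.length rest == 64 ->
    let (l, r) = BS.splitAt 32 rest in [l, r]
  _ -> []

-- | Collect every node reachable from the root exactly once, in post-order.
exportClosure :: Store -> Hash -> Either String Bundle
exportClosure store root = reverse . snd <$> go (Set.empty, []) root
  where
    go acc@(seen, out) h
      | h `Set.member` seen = Right acc
      | otherwise = case Map.lookup h store of
          Nothing -> Left "exportClosure: missing node"
          Just p  -> do
            (seen', out') <- foldM go (Set.insert h seen, out) (childHashes p)
            Right (seen', (h, p) : out')

-- | Add bundle entries to a store, rejecting any entry whose payload does
-- not hash to its claimed key.
importBundle :: Bundle -> Store -> Either String Store
importBundle bundle store0 = foldM step store0 bundle
  where
    step s (h, p)
      | hashPayload p /= h = Left "importBundle: hash mismatch"
      | otherwise          = Right (Map.insert h p s)
```

Verifying on import means a receiver never has to trust the sender's hashes. This sketch does not check that the imported bundle is closed (every child present), which a real importer would probably want.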
---
### Phase 4 — Runtime Optimization
Optional:
* memoized load
* DAG-preserving runtime
* step counter in evaluator
---
## What NOT To Do
Do NOT:
* hash whole serialized terms anymore (hash nodes only)
* store serialized `T` directly
* allow multiple encodings
* include runtime state in nodes
* depend on evaluation for hashing
---
## Testing Requirements
Add tests for:
### Identity
```text
same term -> same hash
```
### Deduplication
```text
Fork A A stores A once
```
### Roundtrip
```text
T -> hash -> loadTree -> T (equal)
```
### Cross-run stability
Hashes must not change between runs.
---
## Optional Enhancements
Not required for initial implementation:
* lazy loading
* partial fetch (networked store)
* compression at storage layer
* typed wrappers
* DAG-aware evaluator
---
## Key Insight
You are not storing programs anymore.
You are storing:
```text
a canonical graph of computation
```
Everything else (execution, wire, language) sits on top.
---
## Success Criteria
You know this is working when:
* identical subtrees collapse globally
* hashes are stable across runs
* small programs reuse large portions of structure
* runtime can reconstruct and execute correctly
* wire bundles can reconstruct store elsewhere
---
## Final Mental Model
```text
Authoring: tricu source
Lowering: Tree Calculus (T)
Identity: Merkle hash(root)
Storage: Merkle DAG (node store)
Wire: compressed node bundles
Execution: reconstructed graph → reduce
```
---
If anything is unclear during implementation, prioritize:
```text
determinism > simplicity > performance
```
To run the tests, run `nix build .#`. All existing tests must pass without modification.