# TRICU MERKLE CONTENT STORE — HANDOFF DOC

## Objective

Replace the current **whole-term content store** with a **Merkle DAG–based content store** for Tree Calculus terms.

Goal:

* Canonical, cross-language, content-addressed representation
* Maximal structural deduplication
* Clean separation of:
  * identity (hash)
  * storage (nodes)
  * transport (packages)
  * execution (runtime graph)

---
## Current State (contentstore branch)

You currently have:

```text
Term (T)
  -> serializeTerm (Cereal)
  -> sha256(bytes)
  -> store full term blob
```

This is:

* canonical at whole-term level
* NOT deduplicated internally
* NOT Merkle

---
## Target Architecture

### Core Concept

Each Tree Calculus node becomes a content-addressed object:

```text
Leaf:
  hash = H( tag_leaf )

Stem:
  hash = H( tag_stem || child_hash )

Fork:
  hash = H( tag_fork || left_hash || right_hash )
```

Content store:

```text
Hash -> Node(tag, child_hashes)
```

A program is:

```text
root_hash
```

---
## Data Model (Introduce)

Define a new canonical node type:

```haskell
data Node
  = NLeaf
  | NStem Hash
  | NFork Hash Hash
  deriving (Eq, Show)
```

Define:

```haskell
import Data.ByteString (ByteString)

type Hash = ByteString -- raw 32-byte SHA-256 digest
```

---
## Canonical Serialization (CRITICAL)

Define a **strict, minimal, cross-language spec**:

```text
Node payload:
  Leaf: 0x00
  Stem: 0x01 || child_hash
  Fork: 0x02 || left_hash || right_hash

Node hash:
  SHA256( UTF8("tricu.merkle.node.v1") || 0x00 || node_payload )

Store:
  node_hash -> node_payload
```

Because every child hash is a fixed-width 32-byte SHA-256 digest, payloads are exactly 1, 33, or 65 bytes and need no length prefixes.

Avoid storing the version inside every node payload unless every node must be self-describing. Put it in the hash preimage (the `tricu.merkle.node.v1` domain tag) and in the store/package metadata. That gives versioning without bloating every node.
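
A minimal Haskell sketch of the payload encoding and node hash above. It assumes the `cryptohash-sha256` package (`Crypto.Hash.SHA256.hash`) for SHA-256 and a hypothetical stand-in `data T = Leaf | Stem T | Fork T T` for the existing term type; `encodeNode`, `hashNode`, and `merkleHash` are illustrative names, not existing tricu functions.

```haskell
import qualified Crypto.Hash.SHA256 as SHA256 -- from the cryptohash-sha256 package
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC
import Data.ByteString (ByteString)

type Hash = ByteString -- raw 32-byte SHA-256 digest

data Node = NLeaf | NStem Hash | NFork Hash Hash
  deriving (Eq, Show)

-- Hypothetical stand-in for tricu's existing term type.
data T = Leaf | Stem T | Fork T T
  deriving (Eq, Show)

-- Canonical payload: one tag byte, then the child hashes in order.
encodeNode :: Node -> ByteString
encodeNode NLeaf       = BS.singleton 0x00
encodeNode (NStem c)   = BS.cons 0x01 c
encodeNode (NFork l r) = BS.concat [BS.singleton 0x02, l, r]

-- SHA256( UTF8("tricu.merkle.node.v1") || 0x00 || payload ).
-- The domain tag is ASCII, so BC.pack yields exactly its UTF-8 bytes.
hashNode :: Node -> Hash
hashNode n = SHA256.hash (BS.concat [domainTag, BS.singleton 0x00, encodeNode n])
  where
    domainTag = BC.pack "tricu.merkle.node.v1"

-- Hash of a whole term, computed bottom-up without touching a store.
merkleHash :: T -> Hash
merkleHash Leaf       = hashNode NLeaf
merkleHash (Stem c)   = hashNode (NStem (merkleHash c))
merkleHash (Fork l r) = hashNode (NFork (merkleHash l) (merkleHash r))
```

A term that appears twice, as in `Fork a a`, yields the same child hash in both positions, which is what makes store-level deduplication possible.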

---

## Required Invariants

These MUST hold:

1. **Determinism**

   ```text
   same tree → same hashes everywhere
   ```

2. **Structural identity**

   ```text
   identical subtrees → identical hashes
   ```

3. **No dependence on DAG shape**

   Tree identity must not depend on construction order, or on how much sharing the in-memory value happens to have.

4. **Hash correctness**

   ```text
   lookup(hash) -> node
   hash(node) == hash
   ```
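
Invariant 4 is cheap to enforce on every read. The sketch below, under the same assumptions (`cryptohash-sha256` for SHA-256; `decodeNode`, `RawStore`, and `lookupVerified` are hypothetical names), parses a canonical payload and rejects anything that does not re-hash to the key it was stored under.

```haskell
import qualified Crypto.Hash.SHA256 as SHA256 -- from the cryptohash-sha256 package
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC
import Data.ByteString (ByteString)
import qualified Data.Map.Strict as M

type Hash = ByteString

data Node = NLeaf | NStem Hash | NFork Hash Hash
  deriving (Eq, Show)

encodeNode :: Node -> ByteString
encodeNode NLeaf       = BS.singleton 0x00
encodeNode (NStem c)   = BS.cons 0x01 c
encodeNode (NFork l r) = BS.concat [BS.singleton 0x02, l, r]

hashNode :: Node -> Hash
hashNode n = SHA256.hash (BS.concat [BC.pack "tricu.merkle.node.v1", BS.singleton 0x00, encodeNode n])

-- Inverse of encodeNode. Unknown tags and wrong lengths are rejected,
-- so only the single canonical encoding of each node is accepted.
decodeNode :: ByteString -> Maybe Node
decodeNode bs = case BS.uncons bs of
  Just (0x00, rest) | BS.null rest         -> Just NLeaf
  Just (0x01, rest) | BS.length rest == 32 -> Just (NStem rest)
  Just (0x02, rest) | BS.length rest == 64 -> Just (uncurry NFork (BS.splitAt 32 rest))
  _                                        -> Nothing

-- Hypothetical raw store: hash -> payload bytes, as they would sit on disk.
type RawStore = M.Map Hash ByteString

-- lookup(hash) -> node, checking hash(node) == hash before returning it.
lookupVerified :: RawStore -> Hash -> Either String Node
lookupVerified store h = do
  payload <- maybe (Left "missing node") Right (M.lookup h store)
  node    <- maybe (Left "malformed payload") Right (decodeNode payload)
  if hashNode node == h then Right node else Left "hash mismatch"
```

Rejecting trailing bytes and unknown tags in `decodeNode` is what keeps "one node, one encoding" true for data arriving from other implementations.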

---

## Core Functions to Implement

### 1. Convert Tree → Merkle DAG

```haskell
buildMerkle :: T -> State Store Hash
```

Behavior:

* recursively compute child hashes
* create Node
* store if not exists
* return hash

This is the entry point replacing current storage.
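
One way to realize `buildMerkle`, sketched with an in-memory `Map` as the store and the `State` monad from the `transformers` package. `Store`, the stand-in `T`, and the helper names are assumptions, and SHA-256 again comes from `cryptohash-sha256`.

```haskell
import Control.Monad.Trans.State.Strict (State, modify', runState)
import qualified Crypto.Hash.SHA256 as SHA256 -- from the cryptohash-sha256 package
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC
import Data.ByteString (ByteString)
import qualified Data.Map.Strict as M

type Hash = ByteString

data Node = NLeaf | NStem Hash | NFork Hash Hash
  deriving (Eq, Show)

-- Hypothetical stand-in for tricu's existing term type.
data T = Leaf | Stem T | Fork T T
  deriving (Eq, Show)

-- Hypothetical in-memory store: hash -> node.
type Store = M.Map Hash Node

encodeNode :: Node -> ByteString
encodeNode NLeaf       = BS.singleton 0x00
encodeNode (NStem c)   = BS.cons 0x01 c
encodeNode (NFork l r) = BS.concat [BS.singleton 0x02, l, r]

hashNode :: Node -> Hash
hashNode n = SHA256.hash (BS.concat [BC.pack "tricu.merkle.node.v1", BS.singleton 0x00, encodeNode n])

-- Post-order walk: children first, then the node built from their hashes.
buildMerkle :: T -> State Store Hash
buildMerkle t = do
  node <- case t of
    Leaf     -> pure NLeaf
    Stem c   -> NStem <$> buildMerkle c
    Fork l r -> NFork <$> buildMerkle l <*> buildMerkle r
  let h = hashNode node
  -- Keep an existing entry; under content addressing it is identical anyway.
  modify' (M.insertWith (\_new old -> old) h node)
  pure h
```

Running it with `runState (buildMerkle (Fork a a)) M.empty` stores `a`'s nodes once and adds a single fork node whose two child hashes are equal.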

---

### 2. Store Interface

```haskell
putNode :: Node -> StoreM Hash
getNode :: Hash -> StoreM Node
```

Store layout can be:

```text
/data/<hash>
```
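
A sketch of the interface over the `/data/<hash>` layout, with the hash written as lowercase hex in the file name. It assumes `StoreM` is a `ReaderT` carrying the store's root directory (the doc leaves `StoreM` open), uses `cryptohash-sha256` for SHA-256, and re-verifies every node it reads; `nodePath` is a hypothetical helper.

```haskell
import Control.Monad (unless)
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.Reader (ReaderT, ask, runReaderT)
import qualified Crypto.Hash.SHA256 as SHA256 -- from the cryptohash-sha256 package
import qualified Data.ByteString as BS
import Data.ByteString.Builder (byteStringHex, toLazyByteString)
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy.Char8 as BLC
import Data.ByteString (ByteString)
import System.Directory (createDirectoryIfMissing, doesFileExist)
import System.FilePath ((</>))

type Hash = ByteString

data Node = NLeaf | NStem Hash | NFork Hash Hash
  deriving (Eq, Show)

encodeNode :: Node -> ByteString
encodeNode NLeaf       = BS.singleton 0x00
encodeNode (NStem c)   = BS.cons 0x01 c
encodeNode (NFork l r) = BS.concat [BS.singleton 0x02, l, r]

decodeNode :: ByteString -> Maybe Node
decodeNode bs = case BS.uncons bs of
  Just (0x00, rest) | BS.null rest         -> Just NLeaf
  Just (0x01, rest) | BS.length rest == 32 -> Just (NStem rest)
  Just (0x02, rest) | BS.length rest == 64 -> Just (uncurry NFork (BS.splitAt 32 rest))
  _                                        -> Nothing

hashNode :: Node -> Hash
hashNode n = SHA256.hash (BS.concat [BC.pack "tricu.merkle.node.v1", BS.singleton 0x00, encodeNode n])

-- Hypothetical: StoreM only needs to know the store's root directory.
type StoreM = ReaderT FilePath IO

-- <root>/data/<lowercase hex of the hash>
nodePath :: FilePath -> Hash -> FilePath
nodePath root h = root </> "data" </> BLC.unpack (toLazyByteString (byteStringHex h))

putNode :: Node -> StoreM Hash
putNode node = do
  root <- ask
  let h = hashNode node
      path = nodePath root h
  liftIO $ do
    exists <- doesFileExist path
    -- Same hash means same bytes, so an existing file never needs rewriting.
    unless exists $ do
      createDirectoryIfMissing True (root </> "data")
      BS.writeFile path (encodeNode node)
  pure h

getNode :: Hash -> StoreM Node
getNode h = do
  root <- ask
  let path = nodePath root h
  payload <- liftIO (BS.readFile path)
  -- Enforce hash(node) == hash on every read.
  case decodeNode payload of
    Just node | hashNode node == h -> pure node
    _ -> liftIO (ioError (userError ("corrupt node file: " ++ path)))
```

Writes in this sketch are not atomic; a production version would probably write to a temporary file and rename it into place. `getNode` would still catch a truncated file, because it re-hashes what it reads.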

---

### 3. Reconstruct Tree (for execution)

```haskell
loadTree :: Hash -> StoreM T
```

Recursive:

* fetch node
* rebuild T
* optionally cache
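
A sketch of `loadTree` with the optional cache, written against a pure in-memory `Store` rather than `StoreM` to stay self-contained. Hashes are treated as opaque keys here (verification on read is a separate concern), and `Store`, the stand-in `T`, and the cache layout are assumptions.

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.State.Strict (StateT, evalStateT, get, modify')
import qualified Data.ByteString as BS
import qualified Data.Map.Strict as M

type Hash = BS.ByteString

data Node = NLeaf | NStem Hash | NFork Hash Hash
  deriving (Eq, Show)

-- Hypothetical stand-in for tricu's existing term type.
data T = Leaf | Stem T | Fork T T
  deriving (Eq, Show)

-- Hypothetical in-memory store: hash -> node. Hashes are opaque keys here;
-- this sketch does not re-verify them.
type Store = M.Map Hash Node

-- Rebuild a term from its root hash. The cache maps each hash to the term
-- already rebuilt for it, so a shared subtree is fetched and constructed once
-- and every reference to it points at the same heap object.
loadTree :: Store -> Hash -> Either String T
loadTree store root = evalStateT (go root) M.empty
  where
    go :: Hash -> StateT (M.Map Hash T) (Either String) T
    go h = do
      cache <- get
      case M.lookup h cache of
        Just t  -> pure t
        Nothing -> do
          node <- lift (maybe (Left "missing node") Right (M.lookup h store))
          t <- case node of
            NLeaf     -> pure Leaf
            NStem c   -> Stem <$> go c
            NFork l r -> Fork <$> go l <*> go r
          modify' (M.insert h t)
          pure t
```

Memoizing by hash also matters for execution memory: `Fork a a` comes back as a fork whose two children are the same Haskell value rather than two copies.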

---

### 4. Execution

Reuse existing evaluator:

```haskell
eval :: T -> T
```

No change required.

---
## Phase Plan

### Phase 1 — Minimal Merkle Store

* Implement Node type
* Implement canonical serialization
* Implement `buildMerkle`
* Replace current `put` logic
* Add `loadTree`

Goal: roundtrip correctness

---

### Phase 2 — Dedup Verification

Add diagnostics:

```haskell
-- number of distinct nodes reachable from a root (needs store access)
countNodes :: Hash -> StoreM Int
```

Test:

* repeated structures only stored once
* identical subtrees share hash
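
A sketch of the diagnostic over a pure in-memory store (an assumption; the doc's `countNodes` runs in `StoreM`). Comparing it with the size of the same term as an unshared tree gives a direct dedup measure; `treeSize`, `Store`, and the stand-in `T` are hypothetical.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Map.Strict as M
import qualified Data.Set as S

type Hash = BS.ByteString

data Node = NLeaf | NStem Hash | NFork Hash Hash
  deriving (Eq, Show)

-- Hypothetical stand-in for tricu's existing term type.
data T = Leaf | Stem T | Fork T T
  deriving (Eq, Show)

-- Hypothetical in-memory store: hash -> node.
type Store = M.Map Hash Node

-- Number of distinct hashes reachable from a root: what the store actually holds.
countNodes :: Store -> Hash -> Int
countNodes store root = S.size (go S.empty [root])
  where
    go seen [] = seen
    go seen (h : rest)
      | h `S.member` seen = go seen rest
      | otherwise         = go (S.insert h seen) (childrenOf h ++ rest)
    -- NLeaf has no children; a hash missing from the store is counted but not expanded.
    childrenOf h = case M.lookup h store of
      Just (NStem c)   -> [c]
      Just (NFork l r) -> [l, r]
      _                -> []

-- Number of nodes in the term with no sharing: what a whole-term blob encodes.
treeSize :: T -> Int
treeSize Leaf       = 1
treeSize (Stem c)   = 1 + treeSize c
treeSize (Fork l r) = 1 + treeSize l + treeSize r
```

For `Fork (Stem Leaf) (Stem Leaf)` the store holds 3 nodes while the tree has 5, because the repeated `Stem Leaf` subtree is stored once.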

---

### Phase 3 — Wire Format

Define transport:

```text
bundle = compress(
  list of (hash, serialized_node)
)
```

Implement:

```haskell
exportClosure :: Hash -> StoreM Bundle -- reads the closure from the store
importBundle  :: Bundle -> StoreM ()
```
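
A sketch of the closure export and verified import, leaving out compression and the byte-level bundle framing, both of which the doc leaves open. `Bundle` as a plain list, the pure `Store` argument, and the children-before-parents ordering are assumptions; SHA-256 comes from `cryptohash-sha256`.

```haskell
import Control.Monad (foldM)
import qualified Crypto.Hash.SHA256 as SHA256 -- from the cryptohash-sha256 package
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC
import Data.ByteString (ByteString)
import qualified Data.Map.Strict as M
import qualified Data.Set as S

type Hash = ByteString

data Node = NLeaf | NStem Hash | NFork Hash Hash
  deriving (Eq, Show)

type Store = M.Map Hash Node

-- Hypothetical uncompressed bundle: (hash, canonical payload) pairs.
type Bundle = [(Hash, ByteString)]

encodeNode :: Node -> ByteString
encodeNode NLeaf       = BS.singleton 0x00
encodeNode (NStem c)   = BS.cons 0x01 c
encodeNode (NFork l r) = BS.concat [BS.singleton 0x02, l, r]

decodeNode :: ByteString -> Maybe Node
decodeNode bs = case BS.uncons bs of
  Just (0x00, rest) | BS.null rest         -> Just NLeaf
  Just (0x01, rest) | BS.length rest == 32 -> Just (NStem rest)
  Just (0x02, rest) | BS.length rest == 64 -> Just (uncurry NFork (BS.splitAt 32 rest))
  _                                        -> Nothing

hashNode :: Node -> Hash
hashNode n = SHA256.hash (BS.concat [BC.pack "tricu.merkle.node.v1", BS.singleton 0x00, encodeNode n])

children :: Node -> [Hash]
children NLeaf       = []
children (NStem c)   = [c]
children (NFork l r) = [l, r]

-- Every node reachable from the root, once each, children before parents.
exportClosure :: Store -> Hash -> Either String Bundle
exportClosure store root = reverse . snd <$> go (S.empty, []) root
  where
    go acc@(seen, out) h
      | h `S.member` seen = Right acc
      | otherwise = do
          node <- maybe (Left "missing node") Right (M.lookup h store)
          (seen', out') <- foldM go (S.insert h seen, out) (children node)
          Right (seen', (h, encodeNode node) : out')

-- Verify each entry and require its children to be present before adding it.
importBundle :: Bundle -> Store -> Either String Store
importBundle bundle store0 = foldM step store0 bundle
  where
    step store (h, payload) = do
      node <- maybe (Left "malformed payload") Right (decodeNode payload)
      require (hashNode node == h) "hash mismatch"
      require (all (`M.member` store) (children node)) "child missing"
      Right (M.insert h node store)
    require ok err = if ok then Right () else Left err
```

Emitting children before parents lets the importer insist that every referenced child is already present, so a bundle that imports cleanly is a complete closure.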

---

### Phase 4 — Runtime Optimization

Optional:

* memoized load
* DAG-preserving runtime
* step counter in evaluator

---
## What NOT To Do

Do NOT:

* hash whole serialized terms anymore
* store serialized `T` directly
* allow multiple encodings
* include runtime state in nodes
* depend on evaluation for hashing

---
## Testing Requirements

Add tests for:

### Identity

```text
same term -> same hash
```

### Deduplication

```text
Fork A A stores A once
```

### Roundtrip

```text
T -> hash -> loadTree -> T (equal)
```

### Cross-run stability

Hash must not change between runs

---
## Optional Enhancements

Not required for initial implementation:

* lazy loading
* partial fetch (networked store)
* compression at storage layer
* typed wrappers
* DAG-aware evaluator

---
## Key Insight

You are not storing programs anymore.

You are storing:

```text
a canonical graph of computation
```

Everything else (execution, wire, language) sits on top.

---
## Success Criteria

You know this is working when:

* identical subtrees collapse globally
* hashes are stable across runs
* small programs reuse large portions of structure
* runtime can reconstruct and execute correctly
* wire bundles can reconstruct store elsewhere

---
## Final Mental Model

```text
Authoring:  tricu source
Lowering:   Tree Calculus (T)

Identity:   Merkle hash(root)

Storage:    Merkle DAG (node store)

Wire:       compressed node bundles

Execution:  reconstructed graph → reduce
```

---
If anything is unclear during implementation, prioritize:

```text
determinism > simplicity > performance
```

To run the tests, use `nix build .#`. All tests must pass without modification.