TRICU MERKLE CONTENT STORE — HANDOFF DOC
Objective
Replace the current whole-term content store with a Merkle DAG–based content store for Tree Calculus terms.
Goal:
- Canonical, cross-language, content-addressed representation
- Maximal structural deduplication
- Clean separation of:
  - identity (hash)
  - storage (nodes)
  - transport (packages)
  - execution (runtime graph)
Current State (contentstore branch)
You currently have:
Term (T)
-> serializeTerm (Cereal)
-> sha256(bytes)
-> store full term blob
This is:
- canonical at whole-term level
- NOT deduplicated internally
- NOT Merkle
Target Architecture
Core Concept
Each Tree Calculus node becomes a content-addressed object:
Leaf:
hash = H( tag_leaf )
Stem:
hash = H( tag_stem || child_hash )
Fork:
hash = H( tag_fork || left_hash || right_hash )
Content store:
Hash -> Node(tag, child_hashes)
A program is:
root_hash
Data Model (Introduce)
Define a new canonical node type:
data Node
= NLeaf
| NStem Hash
| NFork Hash Hash
Define:
type Hash = ByteString -- SHA-256
Canonical Serialization (CRITICAL)
Define a strict, minimal, cross-language spec:
Node payload:
Leaf: 0x00
Stem: 0x01 || child_hash
Fork: 0x02 || left_hash || right_hash
Node hash:
SHA256( UTF8("tricu.merkle.node.v1") || 0x00 || node_payload )
Store:
node_hash -> node_payload
Avoid storing the version inside every node payload unless every node must be self-describing. Put the version in the hash preimage (the domain tag above) and in the store/package metadata instead. That gives versioning without bloating every node.
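The payload and preimage layout above can be sketched as follows. This is a minimal illustration, not the project's code: the names encodeNode, domainTag, hashPreimage and nodeHash are hypothetical, and the SHA-256 function is passed in as a parameter so the sketch depends only on the bytestring package.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC

type Hash = BS.ByteString

data Node = NLeaf | NStem Hash | NFork Hash Hash
  deriving (Eq, Show)

-- | Canonical payload: one tag byte followed by the raw child hashes.
encodeNode :: Node -> BS.ByteString
encodeNode NLeaf       = BS.singleton 0x00
encodeNode (NStem c)   = BS.cons 0x01 c
encodeNode (NFork l r) = BS.cons 0x02 (l <> r)

-- | Domain tag. It is ASCII-only, so packing the characters as bytes
-- yields the same bytes as its UTF-8 encoding.
domainTag :: BS.ByteString
domainTag = BC.pack "tricu.merkle.node.v1"

-- | Exact bytes to be hashed: tag, a 0x00 separator, then the payload.
hashPreimage :: Node -> BS.ByteString
hashPreimage n = domainTag <> BS.singleton 0x00 <> encodeNode n

-- | Node hash. The first argument stands in for SHA-256 (assumption:
-- supplied by whichever crypto package the project already uses).
nodeHash :: (BS.ByteString -> Hash) -> Node -> Hash
nodeHash sha256 = sha256 . hashPreimage
```

With a 32-byte child hash h, encodeNode (NStem h) is 33 bytes and encodeNode (NFork h h) is 65 bytes; the preimage for NLeaf is 22 bytes (20-byte tag, separator, one payload byte).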
Required Invariants
These MUST hold:
- Determinism
same tree → same hashes everywhere
- Structural identity
identical subtrees → identical hashes
- No dependence on DAG shape
Tree identity must not depend on construction order.
- Hash correctness
lookup(hash) -> node
hash(node) == hash
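One way to enforce the hash-correctness invariant on every read is a strict decoder plus a re-hash check, sketched below. The names decodeNode and verifyEntry are hypothetical, the 32-byte length assumes SHA-256 digests, and hashPayload is a parameter standing in for SHA256(tag || 0x00 || payload).

```haskell
import qualified Data.ByteString as BS

type Hash = BS.ByteString

data Node = NLeaf | NStem Hash | NFork Hash Hash
  deriving (Eq, Show)

-- | Digest length in bytes (assumption: SHA-256).
hashLen :: Int
hashLen = 32

-- | Strict decoder: accepts exactly one byte layout per node kind and
-- rejects unknown tags, trailing bytes and short hashes.
decodeNode :: BS.ByteString -> Maybe Node
decodeNode bs = case BS.uncons bs of
  Just (0x00, rest) | BS.null rest                  -> Just NLeaf
  Just (0x01, rest) | BS.length rest == hashLen     -> Just (NStem rest)
  Just (0x02, rest) | BS.length rest == 2 * hashLen ->
    let (l, r) = BS.splitAt hashLen rest in Just (NFork l r)
  _ -> Nothing

-- | Check a store entry: the payload must hash to its key and must
-- decode canonically. Returns Nothing on any violation.
verifyEntry :: (BS.ByteString -> Hash) -> Hash -> BS.ByteString -> Maybe Node
verifyEntry hashPayload key payload
  | hashPayload payload == key = decodeNode payload
  | otherwise                  = Nothing
```

Rejecting trailing bytes and unknown tags keeps the encoding one-to-one, which is what makes "same tree, same hashes everywhere" checkable from the bytes alone.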
Core Functions to Implement
1. Convert Tree → Merkle DAG
buildMerkle :: T -> State Store Hash
Behavior:
- recursively compute child hashes
- create Node
- store if not exists
- return hash
This is the entry point replacing current storage.
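The behavior listed above could look roughly like the sketch below. It makes several assumptions: T is taken to be data T = Leaf | Stem T | Fork T T, the store is an in-memory Map rather than the real backend, the store is threaded explicitly (the unwrapped form of State Store Hash), and HashFn is a hypothetical parameter standing in for SHA-256.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC
import qualified Data.Map.Strict as Map

type Hash   = BS.ByteString
type HashFn = BS.ByteString -> Hash   -- stand-in for SHA-256 (assumption)
type Store  = Map.Map Hash Node       -- in-memory stand-in for the store

-- Assumed shape of the existing term type.
data T = Leaf | Stem T | Fork T T deriving (Eq, Show)

data Node = NLeaf | NStem Hash | NFork Hash Hash deriving (Eq, Show)

encodeNode :: Node -> BS.ByteString
encodeNode NLeaf       = BS.singleton 0x00
encodeNode (NStem c)   = BS.cons 0x01 c
encodeNode (NFork l r) = BS.cons 0x02 (l <> r)

hashNode :: HashFn -> Node -> Hash
hashNode h n =
  h (BC.pack "tricu.merkle.node.v1" <> BS.singleton 0x00 <> encodeNode n)

-- | Insert a node under its hash; an existing entry is left untouched.
putNode :: HashFn -> Node -> Store -> (Hash, Store)
putNode h n s = (k, Map.insertWith (\_new old -> old) k n s)
  where k = hashNode h n

-- | Hash children first, then the node itself, storing each node once.
buildMerkle :: HashFn -> T -> Store -> (Hash, Store)
buildMerkle h t s0 = case t of
  Leaf   -> putNode h NLeaf s0
  Stem c ->
    let (hc, s1) = buildMerkle h c s0
    in  putNode h (NStem hc) s1
  Fork l r ->
    let (hl, s1) = buildMerkle h l s0
        (hr, s2) = buildMerkle h r s1
    in  putNode h (NFork hl hr) s2
```

Building Fork (Stem Leaf) (Stem Leaf) into an empty store leaves three entries (one leaf, one stem, one fork), since both Stem Leaf subtrees hash to the same key.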
2. Store Interface
putNode :: Node -> StoreM Hash
getNode :: Hash -> StoreM Node
Store layout can be:
/data/<hash>
3. Reconstruct Tree (for execution)
loadTree :: Hash -> StoreM T
Recursive:
- fetch node
- rebuild T
- optionally cache
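A minimal sketch of the recursive reconstruction, without the optional cache. Assumptions: the store is an in-memory Map, T has the shape Leaf | Stem T | Fork T T, and a missing node is reported as a Left value instead of through StoreM.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Map.Strict as Map

type Hash  = BS.ByteString
type Store = Map.Map Hash Node   -- in-memory stand-in for the store

-- Assumed shape of the existing term type.
data T = Leaf | Stem T | Fork T T deriving (Eq, Show)

data Node = NLeaf | NStem Hash | NFork Hash Hash deriving (Eq, Show)

-- | Fetch the node for a hash and rebuild the term below it.
-- Fails with a message if any reachable hash is absent from the store.
loadTree :: Store -> Hash -> Either String T
loadTree s h = case Map.lookup h s of
  Nothing          -> Left ("missing node: " ++ show h)
  Just NLeaf       -> Right Leaf
  Just (NStem c)   -> Stem <$> loadTree s c
  Just (NFork l r) -> Fork <$> loadTree s l <*> loadTree s r
```

Because T is a plain tree, a subterm that is shared in the store is rebuilt once per occurrence here. A memo table keyed by hash would avoid the repeated lookups.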
4. Execution
Reuse existing evaluator:
eval :: T -> T
No change required.
Phase Plan
Phase 1 — Minimal Merkle Store
- Implement Node type
- Implement canonical serialization
- Implement buildMerkle
- Replace current put logic
- Add loadTree
Goal: roundtrip correctness
Phase 2 — Dedup Verification
Add diagnostics:
countNodes :: Hash -> StoreM Int
Test:
- repeated structures only stored once
- identical subtrees share hash
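The diagnostic could be sketched as a walk that counts distinct hashes reachable from a root. This version takes the store as an explicit in-memory Map (an assumption; the real function would read through the store interface), and the helper names children and reachable are hypothetical.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type Hash  = BS.ByteString
type Store = Map.Map Hash Node   -- in-memory stand-in for the store

data Node = NLeaf | NStem Hash | NFork Hash Hash deriving (Eq, Show)

children :: Node -> [Hash]
children NLeaf       = []
children (NStem c)   = [c]
children (NFork l r) = [l, r]

-- | Set of hashes reachable from the root that are present in the store.
-- Each hash is visited at most once; absent hashes are skipped.
reachable :: Store -> Hash -> Set.Set Hash
reachable s root = go [root] Set.empty
  where
    go [] seen = seen
    go (h : hs) seen
      | h `Set.member` seen = go hs seen
      | otherwise = case Map.lookup h s of
          Nothing -> go hs seen
          Just n  -> go (children n ++ hs) (Set.insert h seen)

-- | Number of distinct stored nodes under a root.
countNodes :: Store -> Hash -> Int
countNodes s = Set.size . reachable s
```

For a root of the form Fork A A where A is Stem Leaf, this returns 3, while the expanded tree has 5 nodes; comparing the two numbers is one way to confirm that repeated structure is stored once.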
Phase 3 — Wire Format
Define transport:
bundle = compress(
list of (hash, serialized_node)
)
Implement:
exportClosure :: Hash -> Bundle
importBundle :: Bundle -> StoreM ()
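A sketch of the two transport functions under stated assumptions: the store maps node_hash to node_payload in memory, the bundle is an uncompressed list with children emitted before parents (the compression codec is left open), hashPayload is a parameter standing in for SHA256(tag || 0x00 || payload), and failures are returned as Maybe / Either instead of going through StoreM.

```haskell
import Control.Monad (foldM)
import qualified Data.ByteString as BS
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type Hash    = BS.ByteString
type Payload = BS.ByteString
type Store   = Map.Map Hash Payload   -- node_hash -> node_payload
type Bundle  = [(Hash, Payload)]      -- uncompressed; children first

hashLen :: Int
hashLen = 32   -- assumption: SHA-256 digests

-- | Child hashes of a canonical payload, or Nothing if it is malformed.
childrenOf :: Payload -> Maybe [Hash]
childrenOf p = case BS.uncons p of
  Just (0x00, r) | BS.null r                  -> Just []
  Just (0x01, r) | BS.length r == hashLen     -> Just [r]
  Just (0x02, r) | BS.length r == 2 * hashLen ->
    let (l, rt) = BS.splitAt hashLen r in Just [l, rt]
  _ -> Nothing

-- | All nodes reachable from the root, each once, in post-order.
-- Nothing if a reachable node is missing or malformed.
exportClosure :: Store -> Hash -> Maybe Bundle
exportClosure store root = reverse . snd <$> go root (Set.empty, [])
  where
    go h (seen, acc)
      | h `Set.member` seen = Just (seen, acc)
      | otherwise = do
          p  <- Map.lookup h store
          cs <- childrenOf p
          (seen', acc') <- foldM (flip go) (Set.insert h seen, acc) cs
          Just (seen', (h, p) : acc')

-- | Add bundle entries to a store, re-hashing every payload and
-- requiring each child to be present before its parent is accepted.
importBundle :: (Payload -> Hash) -> Bundle -> Store -> Either String Store
importBundle hashPayload bundle store0 = foldM step store0 bundle
  where
    step s (h, p)
      | hashPayload p /= h = Left "hash mismatch"
      | otherwise = case childrenOf p of
          Nothing -> Left "malformed payload"
          Just cs
            | all (`Map.member` s) cs -> Right (Map.insert h p s)
            | otherwise               -> Left "child missing from store"
```

Emitting children first lets the importer validate entries in a single pass and never leaves a parent in the store without its children. Re-hashing on import means a bundle from an untrusted source cannot inject entries under the wrong key.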
Phase 4 — Runtime Optimization
Optional:
- memoized load
- DAG-preserving runtime
- step counter in evaluator
What NOT To Do
Do NOT:
- hash full trees anymore
- store serialized T directly
- allow multiple encodings
- include runtime state in nodes
- depend on evaluation for hashing
Testing Requirements
Add tests for:
Identity
same term -> same hash
Deduplication
Fork A A stores A once
Roundtrip
T -> hash -> loadTree -> T (equal)
Cross-run stability
Hash must not change between runs
Optional Enhancements
Not required for initial implementation:
- lazy loading
- partial fetch (networked store)
- compression at storage layer
- typed wrappers
- DAG-aware evaluator
Key Insight
You are not storing programs anymore.
You are storing:
a canonical graph of computation
Everything else (execution, wire, language) sits on top.
Success Criteria
You know this is working when:
- identical subtrees collapse globally
- hashes are stable across runs
- small programs reuse large portions of structure
- runtime can reconstruct and execute correctly
- wire bundles can reconstruct store elsewhere
Final Mental Model
Authoring: tricu source
Lowering: Tree Calculus (T)
Identity: Merkle hash(root)
Storage: Merkle DAG (node store)
Wire: compressed node bundles
Execution: reconstructed graph → reduce
If anything is unclear during implementation, prioritize:
determinism > simplicity > performance
To run the tests, run:
nix build .#
All tests must pass without modification.