# TRICU MERKLE CONTENT STORE — HANDOFF DOC

## Objective

Replace the current **whole-term content store** with a **Merkle DAG–based content store** for Tree Calculus terms.

Goal:

* Canonical, cross-language, content-addressed representation
* Maximal structural deduplication
* Clean separation of:

  * identity (hash)
  * storage (nodes)
  * transport (packages)
  * execution (runtime graph)

---

## Current State (contentstore branch)

You currently have:

```text
Term (T)
  -> serializeTerm (Cereal)
  -> sha256(bytes)
  -> store full term blob
```

This is:

* canonical at whole-term level
* NOT deduplicated internally
* NOT Merkle

---

## Target Architecture

### Core Concept

Each Tree Calculus node becomes a content-addressed object:

```text
Leaf:
  hash = H( tag_leaf )

Stem:
  hash = H( tag_stem || child_hash )

Fork:
  hash = H( tag_fork || left_hash || right_hash )
```

Content store:

```text
Hash -> Node(tag, child_hashes)
```

A program is:

```text
root_hash
```

---

## Data Model (Introduce)

Define a new canonical node type:

```haskell
data Node
  = NLeaf
  | NStem Hash
  | NFork Hash Hash
```

Define:

```haskell
type Hash = ByteString -- SHA-256
```

---

## Canonical Serialization (CRITICAL)

Define a **strict, minimal, cross-language spec**:

```text
Node payload:
  Leaf: 0x00
  Stem: 0x01 || child_hash
  Fork: 0x02 || left_hash || right_hash

Node hash:
  SHA256( UTF8("tricu.merkle.node.v1") || 0x00 || node_payload )

Store:
  node_hash -> node_payload
```

The only thing I would avoid is storing the version inside every node payload unless you need every node to be self-describing. Put it in the hash preimage and in the store/package metadata. That gives versioning without bloating every node.

---

## Required Invariants

These MUST hold:

1. **Determinism**

   ```text
   same tree → same hashes everywhere
   ```

2. **Structural identity**

   ```text
   identical subtrees → identical hashes
   ```

3. **No dependence on DAG shape**

   Tree identity must not depend on construction order.

4. **Hash correctness**

   ```text
   lookup(hash) -> node
   hash(node) == hash
   ```

---

## Core Functions to Implement

### 1. Convert Tree → Merkle DAG

```haskell
buildMerkle :: T -> State Store Hash
```

Behavior:

* recursively compute child hashes
* create Node
* store if not exists
* return hash

This is the entry point replacing current storage.

---

### 2. Store Interface

```haskell
putNode :: Node -> StoreM Hash
getNode :: Hash -> StoreM Node
```

Store layout can be:

```text
/data/
```

---

### 3. Reconstruct Tree (for execution)

```haskell
loadTree :: Hash -> StoreM T
```

Recursive:

* fetch node
* rebuild T
* optionally cache

---

### 4. Execution

Reuse existing evaluator:

```haskell
eval :: T -> T
```

No change required.

---

## Phase Plan

### Phase 1 — Minimal Merkle Store

* Implement Node type
* Implement canonical serialization
* Implement `buildMerkle`
* Replace current `put` logic
* Add `loadTree`

Goal: roundtrip correctness

---

### Phase 2 — Dedup Verification

Add diagnostics:

```haskell
countNodes :: Hash -> Int
```

Test:

* repeated structures only stored once
* identical subtrees share hash

---

### Phase 3 — Wire Format

Define transport:

```text
bundle = compress( list of (hash, serialized_node) )
```

Implement:

```haskell
exportClosure :: Hash -> Bundle
importBundle :: Bundle -> StoreM ()
```

---

### Phase 4 — Runtime Optimization

Optional:

* memoized load
* DAG-preserving runtime
* step counter in evaluator

---

## What NOT To Do

Do NOT:

* hash full trees anymore
* store serialized `T` directly
* allow multiple encodings
* include runtime state in nodes
* depend on evaluation for hashing

---

## Testing Requirements

Add tests for:

### Identity

```text
same term -> same hash
```

### Deduplication

```text
Fork A A stores A once
```

### Roundtrip

```text
T -> hash -> loadTree -> T (equal)
```

### Cross-run stability

Hash must not change between runs

---

## Optional Enhancements

Not required for initial implementation:

* lazy loading
* partial fetch (networked store)
* compression at storage layer
* typed wrappers
* DAG-aware evaluator

---

## Key Insight

You are not storing programs anymore.

You are storing:

```text
a canonical graph of computation
```

Everything else (execution, wire, language) sits on top.

---

## Success Criteria

You know this is working when:

* identical subtrees collapse globally
* hashes are stable across runs
* small programs reuse large portions of structure
* runtime can reconstruct and execute correctly
* wire bundles can reconstruct store elsewhere

---

## Final Mental Model

```text
Authoring:   tricu source
Lowering:    Tree Calculus (T)
Identity:    Merkle hash(root)
Storage:     Merkle DAG (node store)
Wire:        compressed node bundles
Execution:   reconstructed graph → reduce
```

---

If anything is unclear during implementation, prioritize:

```text
determinism > simplicity > performance
```

In order to run tests, simply `nix build .#`. All tests must pass without modification.
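
---

## Appendix: Phase 1 Sketch

The Phase 1 pieces described above (node type, canonical serialization, `buildMerkle`, `loadTree`) could be sketched roughly as follows. This is a minimal illustration, not the intended implementation, and it rests on several assumptions: the local `T` definition only approximates tricu's real term type, `Store` is an in-memory `Map` standing in for the on-disk store, `loadTree` takes the store as an argument and returns `Maybe T` instead of running in `StoreM`, `deserializeNode` is a hypothetical helper, and SHA-256 comes from the `cryptohash-sha256` package (with `State` from `mtl`).

```haskell
{-# LANGUAGE OverloadedStrings #-}

import           Control.Monad              (guard)
import           Control.Monad.State.Strict (State, modify, runState)
import qualified Crypto.Hash.SHA256         as SHA256 -- package: cryptohash-sha256 (assumed dependency)
import           Data.ByteString            (ByteString)
import qualified Data.ByteString            as BS
import qualified Data.Map.Strict            as Map

-- Tree Calculus term; assumed to mirror tricu's existing T.
data T = Leaf | Stem T | Fork T T
  deriving (Eq, Show)

type Hash = ByteString -- raw 32-byte SHA-256 digest

data Node = NLeaf | NStem Hash | NFork Hash Hash
  deriving (Eq, Show)

-- In-memory stand-in for the real store: node_hash -> node_payload.
type Store = Map.Map Hash ByteString

-- Canonical payload: one tag byte followed by the raw child hashes.
serializeNode :: Node -> ByteString
serializeNode NLeaf       = BS.singleton 0x00
serializeNode (NStem c)   = BS.cons 0x01 c
serializeNode (NFork l r) = BS.cons 0x02 (l <> r)

-- Hypothetical inverse of serializeNode; rejects non-canonical payloads.
deserializeNode :: ByteString -> Maybe Node
deserializeNode bs = case BS.uncons bs of
  Just (0x00, rest) | BS.null rest         -> Just NLeaf
  Just (0x01, rest) | BS.length rest == 32 -> Just (NStem rest)
  Just (0x02, rest) | BS.length rest == 64 -> Just (uncurry NFork (BS.splitAt 32 rest))
  _                                        -> Nothing

-- SHA256( UTF8("tricu.merkle.node.v1") || 0x00 || node_payload )
hashNode :: Node -> Hash
hashNode n =
  SHA256.hash ("tricu.merkle.node.v1" <> BS.singleton 0x00 <> serializeNode n)

-- Insert is idempotent: an equal hash implies an equal payload.
putNode :: Node -> State Store Hash
putNode n = do
  let h = hashNode n
  modify (Map.insert h (serializeNode n))
  pure h

-- Hash children first, then the parent, storing every node on the way up.
buildMerkle :: T -> State Store Hash
buildMerkle Leaf       = putNode NLeaf
buildMerkle (Stem t)   = buildMerkle t >>= putNode . NStem
buildMerkle (Fork l r) = do
  hl <- buildMerkle l
  hr <- buildMerkle r
  putNode (NFork hl hr)

-- Rebuild a term, checking hash(node) == hash at every step.
loadTree :: Store -> Hash -> Maybe T
loadTree store h = do
  node <- Map.lookup h store >>= deserializeNode
  guard (hashNode node == h)
  case node of
    NLeaf     -> Just Leaf
    NStem c   -> Stem <$> loadTree store c
    NFork l r -> Fork <$> loadTree store l <*> loadTree store r

main :: IO ()
main = do
  let a          = Fork Leaf (Stem Leaf)
      (h, store) = runState (buildMerkle (Fork a a)) Map.empty
  print (Map.size store)                      -- 4 distinct nodes for a 9-node tree
  print (loadTree store h == Just (Fork a a)) -- True: roundtrip holds
```

Keeping `serializeNode` and `hashNode` pure means the hash of a term depends only on its structure, which is what the determinism and structural-identity invariants ask for. The recursive `loadTree` here re-expands shared subtrees, so a memoized or DAG-preserving loader (Phase 4) would still be a separate piece of work.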