merkle exec working 1

This commit is contained in:
James Eversole
2026-05-05 12:43:03 -05:00
parent 72e5810ca9
commit c25170ebd5
4 changed files with 500 additions and 4 deletions

358
MERKLE.md Normal file
View File

@@ -0,0 +1,358 @@
# TRICU MERKLE CONTENT STORE — HANDOFF DOC
## Objective
Replace the current **whole-term content store** with a **Merkle DAGbased content store** for Tree Calculus terms.
Goal:
* Canonical, cross-language, content-addressed representation
* Maximal structural deduplication
* Clean separation of:
* identity (hash)
* storage (nodes)
* transport (packages)
* execution (runtime graph)
---
## Current State (contentstore branch)
You currently have:
```text
Term (T)
-> serializeTerm (Cereal)
-> sha256(bytes)
-> store full term blob
```
This is:
* canonical at whole-term level
* NOT deduplicated internally
* NOT Merkle
---
## Target Architecture
### Core Concept
Each Tree Calculus node becomes a content-addressed object:
```text
Leaf:
hash = H( tag_leaf )
Stem:
hash = H( tag_stem || child_hash )
Fork:
hash = H( tag_fork || left_hash || right_hash )
```
Content store:
```text
Hash -> Node(tag, child_hashes)
```
A program is:
```text
root_hash
```
---
## Data Model (Introduce)
Define a new canonical node type:
```haskell
data Node
= NLeaf
| NStem Hash
| NFork Hash Hash
```
Define:
```haskell
type Hash = ByteString -- SHA-256
```
---
## Canonical Serialization (CRITICAL)
Define a **strict, minimal, cross-language spec**:
```text
Node payload:
Leaf: 0x00
Stem: 0x01 || child_hash
Fork: 0x02 || left_hash || right_hash
Node hash:
SHA256( UTF8("tricu.merkle.node.v1") || 0x00 || node_payload )
Store:
node_hash -> node_payload
```
The only thing I would avoid is storing the version inside every node payload unless you need every node to be self-describing. Put it in the hash preimage and in the store/package metadata. That gives versioning without bloating every node.
---
## Required Invariants
These MUST hold:
1. **Determinism**
```text
same tree → same hashes everywhere
```
2. **Structural identity**
```text
identical subtrees → identical hashes
```
3. **No dependence on DAG shape**
Tree identity must not depend on construction order.
4. **Hash correctness**
```text
lookup(hash) -> node
hash(node) == hash
```
---
## Core Functions to Implement
### 1. Convert Tree → Merkle DAG
```haskell
buildMerkle :: T -> State Store Hash
```
Behavior:
* recursively compute child hashes
* create Node
* store if not exists
* return hash
This is the entry point replacing current storage.
---
### 2. Store Interface
```haskell
putNode :: Node -> StoreM Hash
getNode :: Hash -> StoreM Node
```
Store layout can be:
```text
/data/<hash>
```
---
### 3. Reconstruct Tree (for execution)
```haskell
loadTree :: Hash -> StoreM T
```
Recursive:
* fetch node
* rebuild T
* optionally cache
---
### 4. Execution
Reuse existing evaluator:
```haskell
eval :: T -> T
```
No change required.
---
## Phase Plan
### Phase 1 — Minimal Merkle Store
* Implement Node type
* Implement canonical serialization
* Implement `buildMerkle`
* Replace current `put` logic
* Add `loadTree`
Goal: roundtrip correctness
---
### Phase 2 — Dedup Verification
Add diagnostics:
```haskell
countNodes :: Hash -> Int
```
Test:
* repeated structures only stored once
* identical subtrees share hash
---
### Phase 3 — Wire Format
Define transport:
```text
bundle = compress(
list of (hash, serialized_node)
)
```
Implement:
```haskell
exportClosure :: Hash -> Bundle
importBundle :: Bundle -> StoreM ()
```
---
### Phase 4 — Runtime Optimization
Optional:
* memoized load
* DAG-preserving runtime
* step counter in evaluator
---
## What NOT To Do
Do NOT:
* hash full trees anymore
* store serialized `T` directly
* allow multiple encodings
* include runtime state in nodes
* depend on evaluation for hashing
---
## Testing Requirements
Add tests for:
### Identity
```text
same term -> same hash
```
### Deduplication
```text
Fork A A stores A once
```
### Roundtrip
```text
T -> hash -> loadTree -> T (equal)
```
### Cross-run stability
Hash must not change between runs
---
## Optional Enhancements
Not required for initial implementation:
* lazy loading
* partial fetch (networked store)
* compression at storage layer
* typed wrappers
* DAG-aware evaluator
---
## Key Insight
You are not storing programs anymore.
You are storing:
```text
a canonical graph of computation
```
Everything else (execution, wire, language) sits on top.
---
## Success Criteria
You know this is working when:
* identical subtrees collapse globally
* hashes are stable across runs
* small programs reuse large portions of structure
* runtime can reconstruct and execute correctly
* wire bundles can reconstruct store elsewhere
---
## Final Mental Model
```text
Authoring: tricu source
Lowering: Tree Calculus (T)
Identity: Merkle hash(root)
Storage: Merkle DAG (node store)
Wire: compressed node bundles
Execution: reconstructed graph → reduce
```
---
If anything is unclear during implementation, prioritize:
```text
determinism > simplicity > performance
```
In order to run tests, simply `nix build .#`. All tests must pass without modification.

View File

@@ -20,6 +20,11 @@ import qualified Data.Map as Map
import qualified Data.Serialize as Cereal
import qualified Data.Text as T
data StoredNode = StoredNode ByteString deriving (Show)
instance FromRow StoredNode where
fromRow = StoredNode <$> field
data StoredTerm = StoredTerm
{ termHash :: Text
, termNames :: Text
@@ -60,6 +65,9 @@ initContentStore = do
\tags TEXT DEFAULT '')"
execute_ conn "CREATE INDEX IF NOT EXISTS terms_names_idx ON terms(names)"
execute_ conn "CREATE INDEX IF NOT EXISTS terms_tags_idx ON terms(tags)"
execute_ conn "CREATE TABLE IF NOT EXISTS merkle_nodes (\
\hash TEXT PRIMARY KEY, \
\node_data BLOB NOT NULL)"
return conn
getContentStorePath :: IO FilePath
@@ -92,35 +100,83 @@ deserializeTerm :: ByteString -> Either String T
deserializeTerm = Cereal.decodeLazy . LBS.fromStrict
hashTerm :: T -> Text
hashTerm = T.pack . show . (hash :: ByteString -> Digest SHA256) . serializeTerm
hashTerm = nodeHash . buildMerkle
storeTerm :: Connection -> [String] -> T -> IO Text
storeTerm conn newNamesStrList term = do
putStrLn $ "DEBUG: storeTerm called for names " ++ show newNamesStrList
let termBS = serializeTerm term
termHashText = hashTerm term
newNamesTextList = map T.pack newNamesStrList
metadataText = T.pack "{}"
putStrLn $ "DEBUG: termHash = " ++ T.unpack termHashText
-- Store all Merkle nodes for this term
putStrLn "DEBUG: storing merkle nodes"
_ <- storeMerkleNodes conn term
putStrLn "DEBUG: merkle nodes stored, querying existing"
existingNamesQuery <- query conn
"SELECT names FROM terms WHERE hash = ?"
(Only termHashText) :: IO [Only Text]
putStrLn $ "DEBUG: query result = " ++ show (length existingNamesQuery) ++ " results"
case existingNamesQuery of
[] -> do
putStrLn "DEBUG: inserting new term"
let allNamesToStore = serializeNameList newNamesTextList
execute conn
"INSERT INTO terms (hash, names, term_data, metadata, tags) VALUES (?, ?, ?, ?, ?)"
(termHashText, allNamesToStore, termBS, metadataText, T.pack "")
putStrLn "DEBUG: insert complete"
[(Only currentNamesText)] -> do
putStrLn $ "DEBUG: updating existing term, current names = " ++ T.unpack currentNamesText
let currentNamesList = parseNameList currentNamesText
let combinedNamesList = currentNamesList ++ newNamesTextList
let allNamesToStore = serializeNameList combinedNamesList
execute conn
"UPDATE terms SET names = ?, metadata = ? WHERE hash = ?"
(allNamesToStore, metadataText, termHashText)
putStrLn "DEBUG: update complete"
_ -> error $ "Multiple terms with same hash? " ++ show (length existingNamesQuery)
putStrLn "DEBUG: storeTerm returning"
return termHashText
-- | Store all nodes of a Merkle DAG by traversing the Term and building/storing nodes.
-- Returns the hash of the root node.
-- This function builds the Merkle DAG while storing nodes, ensuring children are
-- stored before their parents.
storeMerkleNodes :: Connection -> T -> IO MerkleHash
storeMerkleNodes _ Leaf = return $ nodeHash NLeaf -- NLeaf hash is constant
storeMerkleNodes conn (Stem t) = do
-- Recursively process child first and get its hash
childHash <- storeMerkleNodes conn t
-- Store this node
let thisNode = NStem childHash
putMerkleNode conn thisNode
return $ nodeHash thisNode
storeMerkleNodes conn (Fork l r) = do
-- Recursively process both children first and get their hashes
leftHash <- storeMerkleNodes conn l
rightHash <- storeMerkleNodes conn r
-- Store this node
let thisNode = NFork leftHash rightHash
putMerkleNode conn thisNode
return $ nodeHash thisNode
-- | Insert a Merkle node into the store (idempotent).
putMerkleNode :: Connection -> Node -> IO ()
putMerkleNode conn node =
execute conn "INSERT OR IGNORE INTO merkle_nodes (hash, node_data) VALUES (?, ?)"
(nodeHash node, serializeNode node)
-- | Retrieve a Merkle node by its hash.
getNodeMerkle :: Connection -> MerkleHash -> IO (Maybe Node)
getNodeMerkle conn h =
queryMaybeOne conn "SELECT node_data FROM merkle_nodes WHERE hash = ?" (Only h) >>= \case
Just (StoredNode bs) -> return $ Just (deserializeNode bs)
Nothing -> return Nothing
hashToTerm :: Connection -> Text -> IO (Maybe StoredTerm)
hashToTerm conn hashText =
queryMaybeOne conn (selectStoredTermFields <> " WHERE hash = ?") (Only hashText)

View File

@@ -1,12 +1,17 @@
module Research where
import Data.ByteArray (convert)
import Data.Char (chr, ord)
import Data.List (intercalate)
import Data.Map (Map)
import Data.Text (Text, replace)
import Data.Text (Text, replace, unpack)
import Data.Word (Word8)
import System.Console.CmdArgs (Data, Typeable)
import qualified Data.ByteString as BS
import qualified Data.Map as Map
import qualified Data.Text as T
import qualified Data.Text as T
import Crypto.Hash (hash, SHA256, Digest)
-- Tree Calculus Types
data T = Leaf | Stem T | Fork T T
@@ -54,6 +59,81 @@ data EvaluatedForm = TreeCalculus | FSL | AST | Ternary | Ascii | Decode
-- Environment containing previously evaluated TC terms
type Env = Map.Map String T
-- Merkle DAG Node types
-- Each Tree Calculus node becomes a content-addressed object.
type MerkleHash = Text
data Node
= NLeaf
| NStem MerkleHash
| NFork MerkleHash MerkleHash
deriving (Show, Eq, Ord)
-- | Canonical serialization of a Node for hashing.
-- Leaf: 0x00
-- Stem: 0x01 || child_hash (32 bytes)
-- Fork: 0x02 || left_hash (32 bytes) || right_hash (32 bytes)
serializeNode :: Node -> BS.ByteString
serializeNode NLeaf = BS.pack [0x00]
serializeNode (NStem h) = BS.pack [0x01] <> hexToBytes h
serializeNode (NFork l r) = BS.pack [0x02] <> hexToBytes l <> hexToBytes r
-- | Hash a node per the Merkle content-addressing spec.
-- hash = SHA256( "tricu.merkle.node.v1" <> 0x00 <> node_payload )
nodeHash :: Node -> MerkleHash
nodeHash node = bytesToHex (sha256WithPrefix (serializeNode node))
where sha256WithPrefix payload =
convert . (hash :: BS.ByteString -> Digest SHA256) $ utf8Tag <> BS.pack [0x00] <> payload
utf8Tag = BS.pack $ map fromIntegral $ BS.unpack "tricu.merkle.node.v1"
-- | Convert a Hex Text hash into raw 32-byte ByteString
hexToBytes :: Text -> BS.ByteString
hexToBytes h = BS.pack $ map (uncurry combinePair) pairs
where
pairs :: [(Char, Char)]
pairs = zip (unpack h) (drop 1 $ unpack h)
combinePair :: Char -> Char -> Word8
combinePair c1 c2 = fromIntegral (nib1 * 16 + nib2)
where nib1 = hexDigitToInt c1
nib2 = hexDigitToInt c2
hexDigitToInt :: Char -> Int
hexDigitToInt c
| '0' <= c && c <= '9' = ord c - ord '0'
| 'a' <= c && c <= 'f' = ord c - ord 'a' + 10
| 'A' <= c && c <= 'F' = ord c - ord 'A' + 10
| otherwise = error $ "Invalid hex digit: " ++ show c
-- | Deserialize a Node from canonical bytes.
deserializeNode :: BS.ByteString -> Node
deserializeNode bs
| BS.null bs || BS.head bs == 0x00 = NLeaf
| BS.head bs == 0x01 = NStem $ bytesToHex (BS.take 32 (BS.drop 1 bs))
| BS.head bs == 0x02 = NFork (bytesToHex (BS.take 32 (BS.drop 1 bs)))
(bytesToHex (BS.take 32 (BS.drop 33 (BS.drop 1 bs))))
| otherwise = error $ "Unknown node type tag: " ++ show (BS.head bs)
-- | Convert 32-byte ByteString back to hex Text
bytesToHex :: BS.ByteString -> Text
bytesToHex bs = T.pack $ concatMap byteToHexChars $ BS.unpack bs
where
byteToHexChars :: Word8 -> String
byteToHexChars w = [hexDigit (fromIntegral w `div` 16), hexDigit (fromIntegral w `mod` 16)]
hexDigit :: Int -> Char
hexDigit n
| n < 10 = chr (ord '0' + n)
| otherwise = chr (ord 'a' + n - 10)
-- | Build a Merkle DAG from a Tree Calculus term.
buildMerkle :: T -> Node
buildMerkle Leaf = NLeaf
buildMerkle (Stem t) = NStem (nodeHash child)
where child = buildMerkle t
buildMerkle (Fork l r) = NFork (nodeHash left) (nodeHash right)
where
left = buildMerkle l
right = buildMerkle r
-- Tree Calculus Reduction Rules
{-
The t operator is left associative.

View File

@@ -39,6 +39,7 @@ executable tricu
, fsnotify
, haskeline
, megaparsec
, memory
, mtl
, sqlite-simple
, tasty
@@ -82,6 +83,7 @@ test-suite tricu-tests
, fsnotify
, haskeline
, megaparsec
, memory
, mtl
, sqlite-simple
, tasty