merkle exec working 1
This commit is contained in:
358
MERKLE.md
Normal file
358
MERKLE.md
Normal file
@@ -0,0 +1,358 @@
|
||||
# TRICU MERKLE CONTENT STORE — HANDOFF DOC
|
||||
|
||||
## Objective
|
||||
|
||||
Replace the current **whole-term content store** with a **Merkle DAG–based content store** for Tree Calculus terms.
|
||||
|
||||
Goal:
|
||||
|
||||
* Canonical, cross-language, content-addressed representation
|
||||
* Maximal structural deduplication
|
||||
* Clean separation of:
|
||||
|
||||
* identity (hash)
|
||||
* storage (nodes)
|
||||
* transport (packages)
|
||||
* execution (runtime graph)
|
||||
|
||||
---
|
||||
|
||||
## Current State (contentstore branch)
|
||||
|
||||
You currently have:
|
||||
|
||||
```text
|
||||
Term (T)
|
||||
-> serializeTerm (Cereal)
|
||||
-> sha256(bytes)
|
||||
-> store full term blob
|
||||
```
|
||||
|
||||
This is:
|
||||
|
||||
* canonical at whole-term level
|
||||
* NOT deduplicated internally
|
||||
* NOT Merkle
|
||||
|
||||
---
|
||||
|
||||
## Target Architecture
|
||||
|
||||
### Core Concept
|
||||
|
||||
Each Tree Calculus node becomes a content-addressed object:
|
||||
|
||||
```text
|
||||
Leaf:
|
||||
hash = H( tag_leaf )
|
||||
|
||||
Stem:
|
||||
hash = H( tag_stem || child_hash )
|
||||
|
||||
Fork:
|
||||
hash = H( tag_fork || left_hash || right_hash )
|
||||
```
|
||||
|
||||
Content store:
|
||||
|
||||
```text
|
||||
Hash -> Node(tag, child_hashes)
|
||||
```
|
||||
|
||||
A program is:
|
||||
|
||||
```text
|
||||
root_hash
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Data Model (Introduce)
|
||||
|
||||
Define a new canonical node type:
|
||||
|
||||
```haskell
|
||||
data Node
|
||||
= NLeaf
|
||||
| NStem Hash
|
||||
| NFork Hash Hash
|
||||
```
|
||||
|
||||
Define:
|
||||
|
||||
```haskell
|
||||
type Hash = ByteString -- SHA-256
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Canonical Serialization (CRITICAL)
|
||||
|
||||
Define a **strict, minimal, cross-language spec**:
|
||||
|
||||
```text
|
||||
Node payload:
|
||||
Leaf: 0x00
|
||||
Stem: 0x01 || child_hash
|
||||
Fork: 0x02 || left_hash || right_hash
|
||||
|
||||
Node hash:
|
||||
SHA256( UTF8("tricu.merkle.node.v1") || 0x00 || node_payload )
|
||||
|
||||
Store:
|
||||
node_hash -> node_payload
|
||||
```
|
||||
|
||||
The only thing I would avoid is storing the version inside every node payload unless you need every node to be self-describing. Put it in the hash preimage and in the store/package metadata. That gives versioning without bloating every node.
|
||||
|
||||
---
|
||||
|
||||
## Required Invariants
|
||||
|
||||
These MUST hold:
|
||||
|
||||
1. **Determinism**
|
||||
|
||||
```text
|
||||
same tree → same hashes everywhere
|
||||
```
|
||||
|
||||
2. **Structural identity**
|
||||
|
||||
```text
|
||||
identical subtrees → identical hashes
|
||||
```
|
||||
|
||||
3. **No dependence on DAG shape**
|
||||
|
||||
Tree identity must not depend on construction order.
|
||||
|
||||
4. **Hash correctness**
|
||||
|
||||
```text
|
||||
lookup(hash) -> node
|
||||
hash(node) == hash
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Core Functions to Implement
|
||||
|
||||
### 1. Convert Tree → Merkle DAG
|
||||
|
||||
```haskell
|
||||
buildMerkle :: T -> State Store Hash
|
||||
```
|
||||
|
||||
Behavior:
|
||||
|
||||
* recursively compute child hashes
|
||||
* create Node
|
||||
* store if not exists
|
||||
* return hash
|
||||
|
||||
This is the entry point replacing current storage.
|
||||
|
||||
---
|
||||
|
||||
### 2. Store Interface
|
||||
|
||||
```haskell
|
||||
putNode :: Node -> StoreM Hash
|
||||
getNode :: Hash -> StoreM Node
|
||||
```
|
||||
|
||||
Store layout can be:
|
||||
|
||||
```text
|
||||
/data/<hash>
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 3. Reconstruct Tree (for execution)
|
||||
|
||||
```haskell
|
||||
loadTree :: Hash -> StoreM T
|
||||
```
|
||||
|
||||
Recursive:
|
||||
|
||||
* fetch node
|
||||
* rebuild T
|
||||
* optionally cache
|
||||
|
||||
---
|
||||
|
||||
### 4. Execution
|
||||
|
||||
Reuse existing evaluator:
|
||||
|
||||
```haskell
|
||||
eval :: T -> T
|
||||
```
|
||||
|
||||
No change required.
|
||||
|
||||
---
|
||||
|
||||
## Phase Plan
|
||||
|
||||
### Phase 1 — Minimal Merkle Store
|
||||
|
||||
* Implement Node type
|
||||
* Implement canonical serialization
|
||||
* Implement `buildMerkle`
|
||||
* Replace current `put` logic
|
||||
* Add `loadTree`
|
||||
|
||||
Goal: roundtrip correctness
|
||||
|
||||
---
|
||||
|
||||
### Phase 2 — Dedup Verification
|
||||
|
||||
Add diagnostics:
|
||||
|
||||
```haskell
|
||||
countNodes :: Hash -> Int
|
||||
```
|
||||
|
||||
Test:
|
||||
|
||||
* repeated structures only stored once
|
||||
* identical subtrees share hash
|
||||
|
||||
---
|
||||
|
||||
### Phase 3 — Wire Format
|
||||
|
||||
Define transport:
|
||||
|
||||
```text
|
||||
bundle = compress(
|
||||
list of (hash, serialized_node)
|
||||
)
|
||||
```
|
||||
|
||||
Implement:
|
||||
|
||||
```haskell
|
||||
exportClosure :: Hash -> Bundle
|
||||
importBundle :: Bundle -> StoreM ()
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Phase 4 — Runtime Optimization
|
||||
|
||||
Optional:
|
||||
|
||||
* memoized load
|
||||
* DAG-preserving runtime
|
||||
* step counter in evaluator
|
||||
|
||||
---
|
||||
|
||||
## What NOT To Do
|
||||
|
||||
Do NOT:
|
||||
|
||||
* hash full trees anymore
|
||||
* store serialized `T` directly
|
||||
* allow multiple encodings
|
||||
* include runtime state in nodes
|
||||
* depend on evaluation for hashing
|
||||
|
||||
---
|
||||
|
||||
## Testing Requirements
|
||||
|
||||
Add tests for:
|
||||
|
||||
### Identity
|
||||
|
||||
```text
|
||||
same term -> same hash
|
||||
```
|
||||
|
||||
### Deduplication
|
||||
|
||||
```text
|
||||
Fork A A stores A once
|
||||
```
|
||||
|
||||
### Roundtrip
|
||||
|
||||
```text
|
||||
T -> hash -> loadTree -> T (equal)
|
||||
```
|
||||
|
||||
### Cross-run stability
|
||||
|
||||
Hash must not change between runs
|
||||
|
||||
---
|
||||
|
||||
## Optional Enhancements
|
||||
|
||||
Not required for initial implementation:
|
||||
|
||||
* lazy loading
|
||||
* partial fetch (networked store)
|
||||
* compression at storage layer
|
||||
* typed wrappers
|
||||
* DAG-aware evaluator
|
||||
|
||||
---
|
||||
|
||||
## Key Insight
|
||||
|
||||
You are not storing programs anymore.
|
||||
|
||||
You are storing:
|
||||
|
||||
```text
|
||||
a canonical graph of computation
|
||||
```
|
||||
|
||||
Everything else (execution, wire, language) sits on top.
|
||||
|
||||
---
|
||||
|
||||
## Success Criteria
|
||||
|
||||
You know this is working when:
|
||||
|
||||
* identical subtrees collapse globally
|
||||
* hashes are stable across runs
|
||||
* small programs reuse large portions of structure
|
||||
* runtime can reconstruct and execute correctly
|
||||
* wire bundles can reconstruct store elsewhere
|
||||
|
||||
---
|
||||
|
||||
## Final Mental Model
|
||||
|
||||
```text
|
||||
Authoring: tricu source
|
||||
Lowering: Tree Calculus (T)
|
||||
|
||||
Identity: Merkle hash(root)
|
||||
|
||||
Storage: Merkle DAG (node store)
|
||||
|
||||
Wire: compressed node bundles
|
||||
|
||||
Execution: reconstructed graph → reduce
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
If anything is unclear during implementation, prioritize:
|
||||
|
||||
```text
|
||||
determinism > simplicity > performance
|
||||
```
|
||||
|
||||
In order to run tests, simply `nix build .#`. All tests must pass without modification.
|
||||
@@ -20,6 +20,11 @@ import qualified Data.Map as Map
|
||||
import qualified Data.Serialize as Cereal
|
||||
import qualified Data.Text as T
|
||||
|
||||
data StoredNode = StoredNode ByteString deriving (Show)
|
||||
|
||||
instance FromRow StoredNode where
|
||||
fromRow = StoredNode <$> field
|
||||
|
||||
data StoredTerm = StoredTerm
|
||||
{ termHash :: Text
|
||||
, termNames :: Text
|
||||
@@ -60,6 +65,9 @@ initContentStore = do
|
||||
\tags TEXT DEFAULT '')"
|
||||
execute_ conn "CREATE INDEX IF NOT EXISTS terms_names_idx ON terms(names)"
|
||||
execute_ conn "CREATE INDEX IF NOT EXISTS terms_tags_idx ON terms(tags)"
|
||||
execute_ conn "CREATE TABLE IF NOT EXISTS merkle_nodes (\
|
||||
\hash TEXT PRIMARY KEY, \
|
||||
\node_data BLOB NOT NULL)"
|
||||
return conn
|
||||
|
||||
getContentStorePath :: IO FilePath
|
||||
@@ -92,35 +100,83 @@ deserializeTerm :: ByteString -> Either String T
|
||||
deserializeTerm = Cereal.decodeLazy . LBS.fromStrict
|
||||
|
||||
hashTerm :: T -> Text
|
||||
hashTerm = T.pack . show . (hash :: ByteString -> Digest SHA256) . serializeTerm
|
||||
hashTerm = nodeHash . buildMerkle
|
||||
|
||||
storeTerm :: Connection -> [String] -> T -> IO Text
|
||||
storeTerm conn newNamesStrList term = do
|
||||
putStrLn $ "DEBUG: storeTerm called for names " ++ show newNamesStrList
|
||||
let termBS = serializeTerm term
|
||||
termHashText = hashTerm term
|
||||
newNamesTextList = map T.pack newNamesStrList
|
||||
metadataText = T.pack "{}"
|
||||
|
||||
putStrLn $ "DEBUG: termHash = " ++ T.unpack termHashText
|
||||
-- Store all Merkle nodes for this term
|
||||
putStrLn "DEBUG: storing merkle nodes"
|
||||
_ <- storeMerkleNodes conn term
|
||||
putStrLn "DEBUG: merkle nodes stored, querying existing"
|
||||
existingNamesQuery <- query conn
|
||||
"SELECT names FROM terms WHERE hash = ?"
|
||||
(Only termHashText) :: IO [Only Text]
|
||||
putStrLn $ "DEBUG: query result = " ++ show (length existingNamesQuery) ++ " results"
|
||||
|
||||
case existingNamesQuery of
|
||||
[] -> do
|
||||
putStrLn "DEBUG: inserting new term"
|
||||
let allNamesToStore = serializeNameList newNamesTextList
|
||||
execute conn
|
||||
"INSERT INTO terms (hash, names, term_data, metadata, tags) VALUES (?, ?, ?, ?, ?)"
|
||||
(termHashText, allNamesToStore, termBS, metadataText, T.pack "")
|
||||
putStrLn "DEBUG: insert complete"
|
||||
[(Only currentNamesText)] -> do
|
||||
putStrLn $ "DEBUG: updating existing term, current names = " ++ T.unpack currentNamesText
|
||||
let currentNamesList = parseNameList currentNamesText
|
||||
let combinedNamesList = currentNamesList ++ newNamesTextList
|
||||
let allNamesToStore = serializeNameList combinedNamesList
|
||||
execute conn
|
||||
"UPDATE terms SET names = ?, metadata = ? WHERE hash = ?"
|
||||
(allNamesToStore, metadataText, termHashText)
|
||||
putStrLn "DEBUG: update complete"
|
||||
_ -> error $ "Multiple terms with same hash? " ++ show (length existingNamesQuery)
|
||||
|
||||
putStrLn "DEBUG: storeTerm returning"
|
||||
return termHashText
|
||||
|
||||
-- | Store all nodes of a Merkle DAG by traversing the Term and building/storing nodes.
|
||||
-- Returns the hash of the root node.
|
||||
-- This function builds the Merkle DAG while storing nodes, ensuring children are
|
||||
-- stored before their parents.
|
||||
storeMerkleNodes :: Connection -> T -> IO MerkleHash
|
||||
storeMerkleNodes _ Leaf = return $ nodeHash NLeaf -- NLeaf hash is constant
|
||||
storeMerkleNodes conn (Stem t) = do
|
||||
-- Recursively process child first and get its hash
|
||||
childHash <- storeMerkleNodes conn t
|
||||
-- Store this node
|
||||
let thisNode = NStem childHash
|
||||
putMerkleNode conn thisNode
|
||||
return $ nodeHash thisNode
|
||||
storeMerkleNodes conn (Fork l r) = do
|
||||
-- Recursively process both children first and get their hashes
|
||||
leftHash <- storeMerkleNodes conn l
|
||||
rightHash <- storeMerkleNodes conn r
|
||||
-- Store this node
|
||||
let thisNode = NFork leftHash rightHash
|
||||
putMerkleNode conn thisNode
|
||||
return $ nodeHash thisNode
|
||||
|
||||
|
||||
-- | Insert a Merkle node into the store (idempotent).
|
||||
putMerkleNode :: Connection -> Node -> IO ()
|
||||
putMerkleNode conn node =
|
||||
execute conn "INSERT OR IGNORE INTO merkle_nodes (hash, node_data) VALUES (?, ?)"
|
||||
(nodeHash node, serializeNode node)
|
||||
|
||||
-- | Retrieve a Merkle node by its hash.
|
||||
getNodeMerkle :: Connection -> MerkleHash -> IO (Maybe Node)
|
||||
getNodeMerkle conn h =
|
||||
queryMaybeOne conn "SELECT node_data FROM merkle_nodes WHERE hash = ?" (Only h) >>= \case
|
||||
Just (StoredNode bs) -> return $ Just (deserializeNode bs)
|
||||
Nothing -> return Nothing
|
||||
|
||||
hashToTerm :: Connection -> Text -> IO (Maybe StoredTerm)
|
||||
hashToTerm conn hashText =
|
||||
queryMaybeOne conn (selectStoredTermFields <> " WHERE hash = ?") (Only hashText)
|
||||
|
||||
@@ -1,12 +1,17 @@
|
||||
module Research where
|
||||
|
||||
import Data.ByteArray (convert)
|
||||
import Data.Char (chr, ord)
|
||||
import Data.List (intercalate)
|
||||
import Data.Map (Map)
|
||||
import Data.Text (Text, replace)
|
||||
import Data.Text (Text, replace, unpack)
|
||||
import Data.Word (Word8)
|
||||
import System.Console.CmdArgs (Data, Typeable)
|
||||
|
||||
import qualified Data.ByteString as BS
|
||||
import qualified Data.Map as Map
|
||||
import qualified Data.Text as T
|
||||
import qualified Data.Text as T
|
||||
import Crypto.Hash (hash, SHA256, Digest)
|
||||
|
||||
-- Tree Calculus Types
|
||||
data T = Leaf | Stem T | Fork T T
|
||||
@@ -54,6 +59,81 @@ data EvaluatedForm = TreeCalculus | FSL | AST | Ternary | Ascii | Decode
|
||||
-- Environment containing previously evaluated TC terms
|
||||
type Env = Map.Map String T
|
||||
|
||||
-- Merkle DAG Node types
|
||||
-- Each Tree Calculus node becomes a content-addressed object.
|
||||
|
||||
type MerkleHash = Text
|
||||
|
||||
data Node
|
||||
= NLeaf
|
||||
| NStem MerkleHash
|
||||
| NFork MerkleHash MerkleHash
|
||||
deriving (Show, Eq, Ord)
|
||||
|
||||
-- | Canonical serialization of a Node for hashing.
|
||||
-- Leaf: 0x00
|
||||
-- Stem: 0x01 || child_hash (32 bytes)
|
||||
-- Fork: 0x02 || left_hash (32 bytes) || right_hash (32 bytes)
|
||||
serializeNode :: Node -> BS.ByteString
|
||||
serializeNode NLeaf = BS.pack [0x00]
|
||||
serializeNode (NStem h) = BS.pack [0x01] <> hexToBytes h
|
||||
serializeNode (NFork l r) = BS.pack [0x02] <> hexToBytes l <> hexToBytes r
|
||||
|
||||
-- | Hash a node per the Merkle content-addressing spec.
|
||||
-- hash = SHA256( "tricu.merkle.node.v1" <> 0x00 <> node_payload )
|
||||
nodeHash :: Node -> MerkleHash
|
||||
nodeHash node = bytesToHex (sha256WithPrefix (serializeNode node))
|
||||
where sha256WithPrefix payload =
|
||||
convert . (hash :: BS.ByteString -> Digest SHA256) $ utf8Tag <> BS.pack [0x00] <> payload
|
||||
utf8Tag = BS.pack $ map fromIntegral $ BS.unpack "tricu.merkle.node.v1"
|
||||
|
||||
-- | Convert a Hex Text hash into raw 32-byte ByteString
|
||||
hexToBytes :: Text -> BS.ByteString
|
||||
hexToBytes h = BS.pack $ map (uncurry combinePair) pairs
|
||||
where
|
||||
pairs :: [(Char, Char)]
|
||||
pairs = zip (unpack h) (drop 1 $ unpack h)
|
||||
combinePair :: Char -> Char -> Word8
|
||||
combinePair c1 c2 = fromIntegral (nib1 * 16 + nib2)
|
||||
where nib1 = hexDigitToInt c1
|
||||
nib2 = hexDigitToInt c2
|
||||
hexDigitToInt :: Char -> Int
|
||||
hexDigitToInt c
|
||||
| '0' <= c && c <= '9' = ord c - ord '0'
|
||||
| 'a' <= c && c <= 'f' = ord c - ord 'a' + 10
|
||||
| 'A' <= c && c <= 'F' = ord c - ord 'A' + 10
|
||||
| otherwise = error $ "Invalid hex digit: " ++ show c
|
||||
|
||||
-- | Deserialize a Node from canonical bytes.
|
||||
deserializeNode :: BS.ByteString -> Node
|
||||
deserializeNode bs
|
||||
| BS.null bs || BS.head bs == 0x00 = NLeaf
|
||||
| BS.head bs == 0x01 = NStem $ bytesToHex (BS.take 32 (BS.drop 1 bs))
|
||||
| BS.head bs == 0x02 = NFork (bytesToHex (BS.take 32 (BS.drop 1 bs)))
|
||||
(bytesToHex (BS.take 32 (BS.drop 33 (BS.drop 1 bs))))
|
||||
| otherwise = error $ "Unknown node type tag: " ++ show (BS.head bs)
|
||||
|
||||
-- | Convert 32-byte ByteString back to hex Text
|
||||
bytesToHex :: BS.ByteString -> Text
|
||||
bytesToHex bs = T.pack $ concatMap byteToHexChars $ BS.unpack bs
|
||||
where
|
||||
byteToHexChars :: Word8 -> String
|
||||
byteToHexChars w = [hexDigit (fromIntegral w `div` 16), hexDigit (fromIntegral w `mod` 16)]
|
||||
hexDigit :: Int -> Char
|
||||
hexDigit n
|
||||
| n < 10 = chr (ord '0' + n)
|
||||
| otherwise = chr (ord 'a' + n - 10)
|
||||
|
||||
-- | Build a Merkle DAG from a Tree Calculus term.
|
||||
buildMerkle :: T -> Node
|
||||
buildMerkle Leaf = NLeaf
|
||||
buildMerkle (Stem t) = NStem (nodeHash child)
|
||||
where child = buildMerkle t
|
||||
buildMerkle (Fork l r) = NFork (nodeHash left) (nodeHash right)
|
||||
where
|
||||
left = buildMerkle l
|
||||
right = buildMerkle r
|
||||
|
||||
-- Tree Calculus Reduction Rules
|
||||
{-
|
||||
The t operator is left associative.
|
||||
|
||||
@@ -39,6 +39,7 @@ executable tricu
|
||||
, fsnotify
|
||||
, haskeline
|
||||
, megaparsec
|
||||
, memory
|
||||
, mtl
|
||||
, sqlite-simple
|
||||
, tasty
|
||||
@@ -82,6 +83,7 @@ test-suite tricu-tests
|
||||
, fsnotify
|
||||
, haskeline
|
||||
, megaparsec
|
||||
, memory
|
||||
, mtl
|
||||
, sqlite-simple
|
||||
, tasty
|
||||
|
||||
Reference in New Issue
Block a user