diff --git a/docs/self-hosted-arboricx-host.md b/docs/self-hosted-arboricx-host.md
new file mode 100644
index 0000000..352246c
--- /dev/null
+++ b/docs/self-hosted-arboricx-host.md
@@ -0,0 +1,384 @@
+# Self-hosted Arboricx Host Prototype
+
+This document describes how to build a minimal host-language shell that can execute Arboricx bundles through the self-hosted tricu Arboricx parser/executor.
+
+The intended reader is an implementation agent building a first prototype in a host language such as PHP. The same approach should generalize to any language that has a small Tree Calculus evaluator.
+
+## Goal
+
+Build a tiny host program that can:
+
+1. Represent Tree Calculus values.
+2. Reduce (evaluate) Tree Calculus terms.
+3. Load or embed the tricu Arboricx runtime kernel.
+4. Read an application `.arboricx` bundle from disk.
+5. Convert host inputs into canonical Tree Calculus values.
+6. Apply the kernel to the application bundle and arguments.
+7. Decode the result back into host values.
+
+A concrete target example:
+
+```tricu
+-- Application bundle root is an unapplied function:
+append "hello "
+```
+
+The host should be able to call that bundle with the host string `"james"` and receive:
+
+```text
+hello james
+```
+
+Conceptually, the host evaluates:
+
+```tricu
+runArboricxArgs bundleBytes ["james"]
+```
+
+where `runArboricxArgs` comes from the self-hosted Arboricx runtime kernel and `bundleBytes` is the application bundle file as a Tree Calculus byte list.
+
+## Architectural overview
+
+There are two Arboricx bundles involved:
+
+1. **Kernel bundle**
+   - Contains the self-hosted Arboricx parser/executor written in tricu.
+   - Exposes ergonomic runtime entrypoints such as `runArboricxArgs`.
+   - Can be hardcoded as a Tree Calculus value in the host, or loaded by a minimal host-side Arboricx parser.
+
+2. **Application bundle**
+   - The bundle the user wants to execute.
+   - Example: a bundle whose exported root is `append "hello "`, waiting for one more string argument.
+   - The host reads this file as raw bytes and encodes those bytes as a Tree Calculus byte list.
+
+The minimal host does **not** need to understand the application bundle format if the kernel is already available as a Tree Calculus value. It only passes the application bundle bytes to the kernel.
+
+## Required host components
+
+### 1. Tree representation
+
+The host needs a representation for the three Tree Calculus constructors:
+
+```text
+Leaf
+Stem child
+Fork left right
+```
+
+Use whatever is idiomatic for the host language. For a PHP prototype, simple classes or tagged arrays are sufficient.
+
+Example shape:
+
+```php
+abstract class T {}
+final class Leaf extends T {}
+final class Stem extends T { public function __construct(public T $child) {} }
+final class Fork extends T { public function __construct(public T $left, public T $right) {} }
+```
+
+or tagged arrays:
+
+```php
+['tag' => 'leaf']
+['tag' => 'stem', 'child' => $t]
+['tag' => 'fork', 'left' => $l, 'right' => $r]
+```
+
+The evaluator and codecs only need these three constructors.
+
+### 2. Tree Calculus evaluator
+
+The host must implement Tree Calculus reduction. This is the core VM.
+
+The evaluator should use normal-order evaluation, matching the runtime semantics expected by Arboricx manifests:
+
+```text
+runtimeEvaluation = "normal-order"
+```
+
+The evaluator only needs the Tree Calculus reduction rules. The host prototype needs no parser if terms are constructed directly as trees.
+
+Implementation notes:
+
+- Evaluation must support application: a tree applied to another tree.
+- In this codebase, application is represented structurally as `Fork function argument` before reduction.
+- The evaluator repeatedly reduces until it reaches normal form or exhausts a configured step (fuel) limit.
+- Add a fuel limit for the first prototype to avoid infinite reductions during debugging.
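A minimal PHP sketch of such a reducer follows. It assumes the seven reduction rules quoted in its comments (the triage-style rules that tricu uses, to be verified against `src/Research.hs` before relying on them). To get normal-order, call-by-need behaviour, it keeps pending applications in a separate, hypothetical `App` node that memoises its head normal form, so that a delayed application is never confused with a `Fork` value. `Reducer`, `whnf()`, `normalize()`, `App` and `OutOfFuel` are illustrative names, not part of any existing runtime.

```php
<?php
declare(strict_types=1);

// The three Tree Calculus values plus a hypothetical lazy application node.
abstract class T {}
final class Leaf extends T {}
final class Stem extends T { public function __construct(public T $child) {} }
final class Fork extends T { public function __construct(public T $left, public T $right) {} }

// A pending application; $value memoises its head normal form (call-by-need).
final class App extends T
{
    public ?T $value = null;
    public function __construct(public T $fn, public T $arg) {}
}

final class OutOfFuel extends RuntimeException {}

function app(T $f, T $x): App { return new App($f, $x); }

final class Reducer
{
    public function __construct(private int $fuel) {}

    // Reduce until the head is Leaf, Stem or Fork; children may stay unevaluated.
    public function whnf(T $t): T
    {
        $chain = [];
        while ($t instanceof App) {
            if ($t->value !== null) { $t = $t->value; break; }
            $chain[] = $t;
            $t = $this->step($this->whnf($t->fn), $t->arg);
        }
        foreach ($chain as $pending) { $pending->value = $t; } // share the result
        return $t;
    }

    // Force every subterm, e.g. before decoding a kernel result.
    public function normalize(T $t): T
    {
        $t = $this->whnf($t);
        if ($t instanceof Stem) { return new Stem($this->normalize($t->child)); }
        if ($t instanceof Fork) { return new Fork($this->normalize($t->left), $this->normalize($t->right)); }
        return $t;
    }

    // One reduction step, assuming these rules:
    //   t a = Stem a;  t a b = Fork a b;  t t a b -> a;  t (t a) b c -> a c (b c);
    //   t (t a b) c t -> a;  t (t a b) c (t u) -> b u;  t (t a b) c (t u v) -> c u v
    private function step(T $f, T $x): T
    {
        if (--$this->fuel < 0) { throw new OutOfFuel('reduction fuel exhausted'); }
        if ($f instanceof Leaf) { return new Stem($x); }
        if ($f instanceof Stem) { return new Fork($f->child, $x); }
        $left = $this->whnf($f->left); // $f is a Fork, so a third argument fires a rule
        if ($left instanceof Leaf) { return $f->right; }
        if ($left instanceof Stem) { return app(app($left->child, $x), app($f->right, $x)); }
        $arg = $this->whnf($x);        // $left is Fork a b: triage on the argument
        if ($arg instanceof Leaf) { return $left->left; }
        if ($arg instanceof Stem) { return app($left->right, $arg->child); }
        return app(app($f->right, $arg->left), $arg->right);
    }
}
```

With a real kernel tree, the `normalize($expr)` calls used elsewhere in this document would become `(new Reducer($fuel))->normalize($expr)`, with a generous fuel budget, because the kernel parses the whole bundle inside Tree Calculus. The recursion in `whnf()` and `normalize()` is acceptable for a prototype but may need to become iterative for large inputs. If the `app = Fork` convention is kept instead, the reducer must interpret `Fork` nodes in the input expression as applications rather than as values.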
+
+Reference implementation locations:
+
+- Haskell evaluator/reduction: `src/Research.hs`
+- JavaScript Arboricx runtime evaluator: `ext/js/src/`, if present in the checkout
+
+Use those as references for exact reduction behavior.
+
+### 3. Kernel availability
+
+The host needs access to the self-hosted Arboricx runtime kernel as a Tree Calculus value.
+
+There are two viable bootstrap strategies.
+
+#### Strategy A: hardcode the kernel tree
+
+This is the recommended strategy for the first host prototype.
+
+Workflow:
+
+1. Compile/export the tricu kernel entrypoint as an Arboricx bundle or tree value.
+2. Convert the selected exported kernel function into a host-language Tree Calculus literal.
+3. Commit/embed that literal in the host implementation.
+
+The host then needs no Arboricx parser of its own for the kernel; it only needs Tree Calculus reduction.
+
+#### Strategy B: bootstrap the kernel from an Arboricx bundle
+
+Alternatively, the host can implement a minimal Arboricx parser that is just sufficient to load the kernel bundle.
+
+This is more work up front, but avoids hardcoding a huge tree literal.
+
+With this strategy, the host-side parser needs to:
+
+1. Parse the Arboricx container.
+2. Parse enough manifest/export data to locate the desired kernel export.
+3. Parse node records.
+4. Reconstruct the selected root Tree Calculus value from the Merkle node DAG.
+
+This duplicates logic that the tricu self-hosted kernel already implements, which is why the hardcoded-kernel path is simpler for early ports.
+
+## Kernel entrypoints
+
+The ergonomic runtime API currently lives in `lib/arboricx.tri`.
+
+Primary entrypoints:
+
+```tricu
+readArboricxExecutableByName nameBytes bundleBytes
+readArboricxExecutable bundleBytes
+runArboricxByName nameBytes bundleBytes arg
+runArboricx bundleBytes arg
+runArboricxArgsByName nameBytes bundleBytes args
+runArboricxArgs bundleBytes args
+```
+
+Recommended host entrypoint:
+
+```tricu
+runArboricxArgs
+```
+
+It accepts:
+
+1. Raw application bundle bytes as a Tree Calculus byte list.
+2. A Tree Calculus list of arguments.
+
+It returns an `ok` / `err` result-wrapped value.
+
+For named exports, use:
+
+```tricu
+runArboricxArgsByName
+```
+
+It accepts:
+
+1. The export name as a Tree Calculus byte list (an empty list falls back to the default export, which is how `runArboricxArgs` is defined).
+2. Raw application bundle bytes as a Tree Calculus byte list.
+3. A Tree Calculus list of arguments.
+
+### Applying the kernel in the host evaluator
+
+If the host has the Tree Calculus value for `runArboricxArgs`, it calls it by constructing nested application trees.
+
+In Tree Calculus application form:
+
+```text
+((runArboricxArgs bundleBytesTree) argsTree)
+```
+
+Structurally, if `app(f, x)` constructs `Fork(f, x)`, then:
+
+```php
+$expr = app(app($kernelRunArboricxArgs, $bundleBytesTree), $argsTree);
+$result = normalize($expr);
+```
+
+For named export execution:
+
+```text
+(((runArboricxArgsByName nameBytesTree) bundleBytesTree) argsTree)
+```
+
+Structurally:
+
+```php
+$expr = app(
+    app(
+        app($kernelRunArboricxArgsByName, $nameBytesTree),
+        $bundleBytesTree
+    ),
+    $argsTree
+);
+$result = normalize($expr);
+```
+
+## Result convention
+
+The runtime API returns results using the tricu `ok` / `err` convention from `lib/binary.tri`:
+
+```tricu
+ok value rest = pair true (pair value rest)
+err code rest = pair false (pair code rest)
+```
+
+The host should unwrap this result before decoding the final value.
+
+Expected success shape:
+
+```tricu
+ok value rest
+```
+
+For typical execution, `value` is the application result. The host shell can usually ignore `rest`; it is mainly useful when debugging parser behavior.
+
+Expected error shape:
+
+```tricu
+err code rest
+```
+
+The error code is a Tree Calculus number. Error constants are defined in:
+
+- `lib/binary.tri`
+- `lib/arboricx-common.tri`
+
+A prototype host can simply report the numeric error code and optionally dump a compact representation of `rest`.
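The helpers used in the example execution flow (`encodeBytes`, `encodeString`, `encodeList`, `unwrapResult`, `decodeString`) could look like the PHP sketch below. Every encoding in it is an assumption to verify against `src/Research.hs` and `lib/binary.tri`: numbers as little-endian bit lists terminated by `Leaf` (bit 0 = `Leaf`, bit 1 = `Stem Leaf`), lists as right-nested `Fork` cells ending in `Leaf`, `pair a b` as `Fork a b`, `true` as `Stem Leaf` and `false` as `Leaf`. Strings are handled as raw bytes, which is only guaranteed to match tricu string literals for ASCII text. The block redeclares the three constructors so it runs on its own; `decodeNumber`, `decodeList` and `decodeBytes` are additional hypothetical helpers.

```php
<?php
declare(strict_types=1);

abstract class T {}
final class Leaf extends T {}
final class Stem extends T { public function __construct(public T $child) {} }
final class Fork extends T { public function __construct(public T $left, public T $right) {} }

// Assumed number encoding: little-endian bits, Leaf terminates, bit 0 = Leaf, bit 1 = Stem(Leaf).
function encodeNumber(int $n): T
{
    if ($n < 0) { throw new InvalidArgumentException('negative numbers are not supported'); }
    if ($n === 0) { return new Leaf(); }
    return new Fork($n % 2 === 1 ? new Stem(new Leaf()) : new Leaf(), encodeNumber(intdiv($n, 2)));
}

function decodeNumber(T $t): int
{
    if ($t instanceof Leaf) { return 0; }
    if (!$t instanceof Fork) { throw new UnexpectedValueException('not a number'); }
    $bit = match (true) {
        $t->left instanceof Leaf => 0,
        $t->left instanceof Stem && $t->left->child instanceof Leaf => 1,
        default => throw new UnexpectedValueException('not a bit'),
    };
    return $bit + 2 * decodeNumber($t->right);
}

// Assumed list encoding: [] = Leaf, x : xs = Fork(x, xs).
function encodeList(array $items): T
{
    $list = new Leaf();
    foreach (array_reverse($items) as $item) { $list = new Fork($item, $list); }
    return $list;
}

function decodeList(T $t): array
{
    $items = [];
    while ($t instanceof Fork) { $items[] = $t->left; $t = $t->right; }
    if (!$t instanceof Leaf) { throw new UnexpectedValueException('not a list'); }
    return $items;
}

// Bytes become a list of numbers, one per byte.
function encodeBytes(string $bytes): T
{
    $items = [];
    for ($i = 0, $n = strlen($bytes); $i < $n; $i++) { $items[] = encodeNumber(ord($bytes[$i])); }
    return encodeList($items);
}

function decodeBytes(T $t): string
{
    return implode('', array_map(fn (T $b): string => chr(decodeNumber($b)), decodeList($t)));
}

// Strings are treated as their raw bytes (safe for ASCII).
function encodeString(string $s): T { return encodeBytes($s); }
function decodeString(T $t): string { return decodeBytes($t); }

// Assumed shapes: ok v r = Fork(Stem(Leaf), Fork(v, r)); err c r = Fork(Leaf, Fork(c, r)).
function unwrapResult(T $t): array
{
    if (!$t instanceof Fork || !$t->right instanceof Fork) {
        throw new UnexpectedValueException('not an ok/err result');
    }
    $ok = $t->left instanceof Stem && $t->left->child instanceof Leaf;
    if (!$ok && !$t->left instanceof Leaf) { throw new UnexpectedValueException('bad result tag'); }
    return [$ok, $t->right->left, $t->right->right];
}
```

The decoders expect a fully normalized tree with no pending applications, so they should run on the output of the normalize step rather than on a partially reduced term.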
+
+## Example execution flow
+
+Suppose the application bundle exports this root:
+
+```tricu
+append "hello "
+```
+
+The bundle root is an unapplied function waiting for one more string argument.
+
+Host flow:
+
+1. Load the kernel entrypoint tree:
+
+   ```php
+   $runArboricxArgs = loadHardcodedKernelEntrypoint('runArboricxArgs');
+   ```
+
+2. Read the application bundle bytes:
+
+   ```php
+   $bytes = file_get_contents('append-hello.arboricx');
+   ```
+
+3. Encode the bundle bytes as a Tree Calculus byte list:
+
+   ```php
+   $bundleBytesTree = encodeBytes($bytes);
+   ```
+
+4. Encode the host argument(s):
+
+   ```php
+   $arg = encodeString('james');
+   $args = encodeList([$arg]);
+   ```
+
+5. Build the application expression:
+
+   ```php
+   $expr = app(app($runArboricxArgs, $bundleBytesTree), $args);
+   ```
+
+6. Evaluate:
+
+   ```php
+   $result = normalize($expr);
+   ```
+
+7. Unwrap the `ok` result (on failure, `$value` holds the numeric error code):
+
+   ```php
+   [$ok, $value, $rest] = unwrapResult($result);
+   if (!$ok) { throw new RuntimeException('Arboricx error code ' . decodeNumber($value)); }
+   ```
+
+8. Decode the value:
+
+   ```php
+   echo decodeString($value); // hello james
+   ```
+
+## What the kernel does internally
+
+`runArboricxArgs` performs the following steps inside Tree Calculus:
+
+1. Parse and validate the raw Arboricx bundle bytes.
+2. Parse the manifest.
+3. Select the default export:
+   - use the export named `main` if present,
+   - otherwise use the sole export if exactly one exists,
+   - otherwise return an error.
+4. Read the nodes section.
+5. Reconstruct the selected root tree from the Merkle DAG.
+6. Apply each host-provided argument in order.
+7. Return `ok result rest` or an `err`.
+
+`runArboricxArgsByName` is identical except that it selects the export whose name matches `nameBytes` and returns an `err` if no such export exists.
+
+## Tests proving the expected behavior
+
+The relevant Haskell tests are in `test/Spec.hs` under `manifestReadingTests`.
+
+Important cases:
+
+- `readArboricxExecutable: reconstructs default export tree`
+- `readArboricxExecutableByName: selects named export`
+- `runArboricx: applies host-provided argument to default export`
+- `runArboricxArgs: applies host-provided argument list in order`
+
+These tests demonstrate the host-shell contract:
+
+- application bundle bytes are supplied as a Tree Calculus byte list,
+- host arguments are supplied as canonical Tree Calculus values,
+- execution returns a result-wrapped Tree Calculus value.
+
+## Minimal PHP prototype checklist
+
+A PHP prototype should implement:
+
+- [ ] Tree data constructors: `Leaf`, `Stem`, `Fork`.
+- [ ] Application helper: `app($f, $x) = Fork($f, $x)`.
+- [ ] Normal-order Tree Calculus reducer.
+- [ ] Fuel/step limit for debugging.
+- [ ] Hardcoded kernel entrypoint tree for `runArboricxArgs`.
+- [ ] Encode application bundle file bytes into a Tree Calculus byte list.
+- [ ] Encode host argument values into Tree Calculus values.
+- [ ] Build expression: `((runArboricxArgs bundleBytes) args)`.
+- [ ] Normalize expression.
+- [ ] Unwrap `ok` / `err` result.
+- [ ] Decode result value into host type.
+
+For exact codec details, reference the Haskell implementation in `src/Research.hs` and the existing JS runtime if available.
+
+## Current recommendation
+
+For the first PHP implementation:
+
+1. Hardcode only the `runArboricxArgs` kernel entrypoint as a Tree Calculus value.
+2. Do not implement host-side Arboricx parsing yet.
+3. Implement only enough codecs for:
+   - bytes,
+   - strings,
+   - lists,
+   - result unwrapping.
+4. Use one test fixture: an Arboricx bundle whose root is `append "hello "`.
+5. Assert that calling it with `"james"` returns `"hello james"`.
+
+Once that works, add named export support via `runArboricxArgsByName` and expand codecs as needed.
diff --git a/lib/arboricx-manifest.tri b/lib/arboricx-manifest.tri
index 09a375e..5fd30e5 100644
--- a/lib/arboricx-manifest.tri
+++ b/lib/arboricx-manifest.tri
@@ -243,35 +243,39 @@ getExportNames_ = y (self acc exports :
 
 getExportNames = (exports : getExportNames_ t exports)
 
--- Select an export: prefer explicit name, then "main", then single, then error.
-selectExport_ = y (self exports name nameBytes :
-  matchBool
-    -- Explicit name given
-    (matchBool
-      nothing
-      (err errMissingSection t)
-      (_ _ : nothing)
-      (findExportByName exports nameBytes))
-    -- No explicit name: try "main"
-    (matchBool
-      nothing
-      (matchBool
-        (equal? (length exports) 1)
-        (ok (head exports) t)
+mainExportName = "main"
+
+maybeExportToResult = (maybeExport :
+  triage
+    (err errMissingSection t)
+    (export : ok export t)
+    (_ _ : err errMissingSection t)
+    maybeExport)
+
+selectSingleExport = (exports :
+  matchList
+    (err errMissingSection t)
+    (export rest :
+      matchBool
+        (ok export t)
         (err errMissingSection t)
-        (bytesEq? (exportName (head exports)) nameBytes))
-      (_ _ : nothing)
-      (findExportByName exports nameBytes))
-    -- Single export: auto-select
-    (matchBool
-      (equal? (length exports) 1)
-      (ok (head exports) t)
-      (err errMissingSection t)
-      (emptyList? exports))
+        (emptyList? rest))
     exports)
 
+selectDefaultExport = (exports :
+  triage
+    (selectSingleExport exports)
+    (export : ok export t)
+    (_ _ : err errMissingSection t)
+    (findExportByName exports mainExportName))
+
+-- Select an export: explicit name if provided, otherwise "main", otherwise
+-- the sole export if the bundle has exactly one export.
 selectExport = (exports nameBytes :
-  selectExport_ exports nameBytes nameBytes)
+  matchBool
+    (selectDefaultExport exports)
+    (maybeExportToResult (findExportByName exports nameBytes))
+    (emptyList? nameBytes))
 
 selectExportOpt = (exports optNameBytes :
   selectExport exports optNameBytes)
@@ -304,7 +308,7 @@ manifestRuntimeAbi = (core : pairFirst (pairSecond (pairSecond (pairSecond (pair
 manifestCapabilities = (core : pairFirst (pairSecond (pairSecond (pairSecond (pairSecond (pairSecond (pairSecond (pairSecond (pairSecond (pairSecond core))))))))))
 manifestClosureByte = (core : pairFirst (pairSecond (pairSecond (pairSecond (pairSecond (pairSecond (pairSecond (pairSecond (pairSecond (pairSecond (pairSecond core)))))))))))
 manifestRoots = (core : pairFirst (pairSecond (pairSecond (pairSecond (pairSecond (pairSecond (pairSecond (pairSecond (pairSecond (pairSecond (pairSecond (pairSecond core))))))))))))
-manifestExports = (core : pairSecond (pairSecond (pairSecond (pairSecond (pairSecond (pairSecond (pairSecond (pairSecond (pairSecond (pairSecond (pairSecond core))))))))))))
+manifestExports = (core : pairSecond (pairSecond (pairSecond (pairSecond (pairSecond (pairSecond (pairSecond (pairSecond (pairSecond (pairSecond (pairSecond (pairSecond core)))))))))))))
 
 -- Helper: compare a manifest field against an expected byte string.
 manifestFieldMatch? = (actual expected : bytesEq? actual expected)
diff --git a/lib/arboricx.tri b/lib/arboricx.tri
index 8c459b6..4fadc3b 100644
--- a/lib/arboricx.tri
+++ b/lib/arboricx.tri
@@ -16,3 +16,38 @@ readArboricxBundle = (bs :
         (validCore _ : ok (pair validCore metadataWithExtensions) afterContainer))
       parsedManifest))
     sections))
+
+-- Select an export from a validated bundle and reconstruct its root tree.
+-- Returns ok executable afterContainer, or propagates parse/selection/node errors.
+readArboricxExecutableByName = (nameBytes bs :
+  bindResult (readArboricxBundle bs)
+    (bundleResult afterBundle :
+      matchPair
+        (validCore _ :
+          bindResult (selectExport (manifestExports validCore) nameBytes)
+            (selectedExport _ :
+              readArboricxTreeFromHash (exportRoot selectedExport) bs))
+        bundleResult))
+
+readArboricxExecutable = (bs :
+  readArboricxExecutableByName [] bs)
+
+applyArgs = (f args :
+  foldl
+    (acc arg : acc arg)
+    f
+    args)
+
+runArboricxByName = (nameBytes bs arg :
+  bindResult (readArboricxExecutableByName nameBytes bs)
+    (executable rest : ok (executable arg) rest))
+
+runArboricx = (bs arg :
+  runArboricxByName [] bs arg)
+
+runArboricxArgsByName = (nameBytes bs args :
+  bindResult (readArboricxExecutableByName nameBytes bs)
+    (executable rest : ok (applyArgs executable args) rest))
+
+runArboricxArgs = (bs args :
+  runArboricxArgsByName [] bs args)
diff --git a/test/Spec.hs b/test/Spec.hs
index c9be3af..a59b971 100644
--- a/test/Spec.hs
+++ b/test/Spec.hs
@@ -2734,4 +2734,65 @@ manifestReadingTests = testGroup "Manifest Reading Tests"
       let env = evalTricu library (parseTricu input)
       let algoT = result env
       toString algoT @?= Right "sha256"
+
+  , testCase "readArboricxExecutable: reconstructs default export tree" $ do
+      (srcConn, termHash, originalTerm) <- storeTermInTempDB $ unlines
+        [ "main = t t" ]
+      wireData <- exportBundle srcConn [termHash]
+      let input = "matchResult "
+            ++ " (code rest : err code rest) "
+            ++ " (tree rest : ok tree []) "
+            ++ " (readArboricxExecutable " ++ bytesExpr (map toInteger $ BS.unpack wireData) ++ ")"
+      library <- evaluateFile "./lib/arboricx.tri"
+      let env = evalTricu library (parseTricu input)
+      result env @?= okT originalTerm (bytesT [])
+      close srcConn
+
+  , testCase "readArboricxExecutableByName: selects named export" $ do
+      srcConn <- newContentStore
+      let parsed = parseTricu $ unlines
+            [ "leaf = t"
+            , "stem = t t"
+            , "main = stem"
+            ]
+          env = evalTricu Map.empty parsed
+          leafTerm = maybe (error "leaf missing") id (Map.lookup "leaf" env)
+          stemTerm = maybe (error "stem missing") id (Map.lookup "stem" env)
+      leafHash <- storeTerm srcConn ["leaf"] leafTerm
+      stemHash <- storeTerm srcConn ["stem"] stemTerm
+      wireData <- exportNamedBundle srcConn [("leaf", leafHash), ("stem", stemHash)]
+      let input = "matchResult "
+            ++ " (code rest : err code rest) "
+            ++ " (tree rest : ok tree []) "
+            ++ " (readArboricxExecutableByName " ++ bytesExpr (map (fromIntegral . fromEnum) "stem") ++ " " ++ bytesExpr (map toInteger $ BS.unpack wireData) ++ ")"
+      library <- evaluateFile "./lib/arboricx.tri"
+      let resultEnv = evalTricu library (parseTricu input)
+      result resultEnv @?= okT stemTerm (bytesT [])
+      close srcConn
+
+  , testCase "runArboricx: applies host-provided argument to default export" $ do
+      (srcConn, termHash, _) <- storeTermInTempDB $ unlines
+        [ "main = (x : x)" ]
+      wireData <- exportBundle srcConn [termHash]
+      let input = "matchResult "
+            ++ " (code rest : err code rest) "
+            ++ " (value rest : value) "
+            ++ " (runArboricx " ++ bytesExpr (map toInteger $ BS.unpack wireData) ++ " \"hello\")"
+      library <- evaluateFile "./lib/arboricx.tri"
+      let env = evalTricu library (parseTricu input)
+      toString (result env) @?= Right "hello"
+      close srcConn
+
+  , testCase "runArboricxArgs: applies host-provided argument list in order" $ do
+      (srcConn, termHash, _) <- storeTermInTempDB $ unlines
+        [ "main = (x y : x)" ]
+      wireData <- exportBundle srcConn [termHash]
+      let input = "matchResult "
+            ++ " (code rest : err code rest) "
+            ++ " (value rest : value) "
+            ++ " (runArboricxArgs " ++ bytesExpr (map toInteger $ BS.unpack wireData) ++ " [(\"left\") (\"right\")])"
+      library <- evaluateFile "./lib/arboricx.tri"
+      let env = evalTricu library (parseTricu input)
+      toString (result env) @?= Right "left"
+      close srcConn
   ]