Self-hosted Arboricx Host Prototype

This document describes how to build a minimal host-language shell that can execute Arboricx bundles through the self-hosted tricu Arboricx parser/executor.

The intended reader is an implementation agent building a first prototype in a host language such as PHP. The same approach should generalize to any language with a small Tree Calculus evaluator.

See also: docs/host-abi.md for the precise host-facing ABI value tags and typed runner contract.

Goal

Build a tiny host program that can:

  1. Represent Tree Calculus values.
  2. Reduce/evaluate Tree Calculus terms.
  3. Load or embed the tricu Arboricx runtime kernel.
  4. Read an application .arboricx bundle from disk.
  5. Convert host inputs into canonical Tree Calculus values.
  6. Apply the kernel to the application bundle and arguments.
  7. Unwrap a standardized host ABI result.
  8. Decode the host ABI payload back into host values.

A concrete target example:

-- Application bundle root is an unapplied function:
append "hello "

The host should be able to call that bundle with the host string "james" and receive:

hello james

With the Host ABI layer, the preferred conceptual call is:

runArboricxToString <applicationBundleBytes> ["james"]

This returns:

ok (hostString "hello james") rest

where runArboricxToString comes from the self-hosted Arboricx runtime kernel.

Architectural overview

There are two Arboricx bundles involved:

  1. Kernel bundle

    • Contains the self-hosted Arboricx parser/executor written in tricu.
    • Exposes ergonomic runtime entrypoints such as runArboricxArgs and Host ABI entrypoints such as runArboricxToString.
    • The kernel can be hardcoded as a Tree Calculus value in the host, or loaded by a minimal host-side Arboricx parser.
  2. Application bundle

    • The bundle the user wants to execute.
    • Example: a bundle whose exported root is append "hello ", waiting for one more string argument.
    • The host reads this file as raw bytes and encodes those bytes as a Tree Calculus byte list.

The minimal host does not need to understand the application bundle format if the kernel is already available as a Tree Calculus value. The host only passes the application bundle bytes to the kernel.
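
As a rough illustration of that encoding step, the sketch below converts a PHP byte string into a tree. It assumes encodings that this document does not specify: a number is a little-endian list of bits (Leaf for 0, Stem Leaf for 1, terminated by Leaf), a list is a right-nested chain of Fork cells ending in Leaf, and a byte list or string is a list of numbers. Confirm these against src/Research.hs before relying on them. All function names are illustrative, and constructor promotion requires PHP 8.0 or later.

```php
<?php
// Minimal tree constructors so the sketch is self-contained.
abstract class T {}
final class Leaf extends T {}
final class Stem extends T { public function __construct(public T $child) {} }
final class Fork extends T { public function __construct(public T $left, public T $right) {} }

// ASSUMPTION: numbers are little-endian bit lists; 0 is Leaf.
function encodeNumber(int $n): T {
    if ($n < 0) { throw new InvalidArgumentException('negative numbers are not supported'); }
    $bits = [];
    while ($n > 0) { $bits[] = $n & 1; $n >>= 1; }   // least significant bit first
    $tree = new Leaf();
    foreach (array_reverse($bits) as $bit) {          // wrap from the most significant bit outward
        $tree = new Fork($bit ? new Stem(new Leaf()) : new Leaf(), $tree);
    }
    return $tree;
}

// ASSUMPTION: lists are right-nested Fork cells terminated by Leaf.
function encodeList(array $items): T {
    $tree = new Leaf();
    foreach (array_reverse($items) as $item) { $tree = new Fork($item, $tree); }
    return $tree;
}

// A byte list is a list of numbers in the range 0..255.
function encodeBytes(string $bytes): T {
    $chars = $bytes === '' ? [] : str_split($bytes);
    return encodeList(array_map(fn(string $c): T => encodeNumber(ord($c)), $chars));
}

// Byte-wise string encoding; this only matches a code-point encoding for ASCII input.
function encodeString(string $s): T { return encodeBytes($s); }
```

Under these assumptions, passing the contents of the bundle file to encodeBytes would produce the bundleBytes argument expected by the kernel entrypoints.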

Required host components

1. Tree representation

The host needs a representation for the three Tree Calculus constructors:

Leaf
Stem child
Fork left right

Use whatever is idiomatic for the host language. In PHP, for a prototype, simple classes or tagged arrays are sufficient.

Example shape:

abstract class T {}
final class Leaf extends T {}
final class Stem extends T {
    public function __construct(public T $child) {}
}
final class Fork extends T {
    public function __construct(public T $left, public T $right) {}
}

or tagged arrays:

['tag' => 'leaf']
['tag' => 'stem', 'child' => $t]
['tag' => 'fork', 'left' => $l, 'right' => $r]

The evaluator and codecs only need these three constructors.

2. Tree Calculus evaluator

The host must implement Tree Calculus reduction. This is the core VM.

The evaluator should use normal-order evaluation, matching the runtime semantics expected by Arboricx manifests:

runtimeEvaluation = "normal-order"

The evaluator only needs the Tree Calculus reduction rules. The host prototype needs no parser if terms are constructed directly as trees.

Implementation notes:

  • Evaluation must support application: a tree applied to another tree.
  • In this codebase, application is represented structurally as Fork function argument before reduction.
  • The evaluator repeatedly reduces until normal form or until a configured step/fuel limit is reached.
  • Add a fuel limit for the first prototype to avoid infinite reductions during debugging.

Reference implementation locations:

  • Haskell evaluator/reduction: src/Research.hs
  • JavaScript Arboricx runtime evaluator: ext/js/src/ if present in the checkout

Use those as references for exact reduction behavior.
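
To make the shape of such an evaluator concrete, here is a sketch of a fuel-limited, call-by-need reducer. It rests on two assumptions that should be checked against src/Research.hs: the reduction rules are the triage-calculus rules as commonly published, and a suspended application is held in a hypothetical App node rather than the Fork-based representation noted above. Call-by-need is used here as a memoized stand-in for normal-order evaluation.

```php
<?php
abstract class T {}
final class Leaf extends T {}
final class Stem extends T { public function __construct(public T $child) {} }
final class Fork extends T { public function __construct(public T $left, public T $right) {} }
// Hypothetical node: a suspended application of $f to $x, memoized once forced.
final class App extends T {
    public ?T $cache = null;
    public function __construct(public T $f, public T $x) {}
}

final class Reducer {
    public function __construct(private int $fuel) {}

    // Reduce until the outermost constructor is known; children may stay suspended.
    public function whnf(T $t): T {
        if (!($t instanceof App)) { return $t; }
        $cur = $t;
        while ($cur instanceof App) {
            if ($cur->cache !== null) { $cur = $cur->cache; break; }
            if ($this->fuel-- <= 0) { throw new RuntimeException('fuel exhausted'); }
            $cur = $this->step($cur);
        }
        return $t->cache = $cur;   // remember the head form for shared arguments
    }

    // One reduction step. ASSUMED rules; verify against the reference evaluator.
    private function step(App $a): T {
        $f = $this->whnf($a->f);
        $x = $a->x;
        if ($f instanceof Leaf) { return new Stem($x); }
        if ($f instanceof Stem) { return new Fork($f->child, $x); }
        $l = $this->whnf($f->left);
        $r = $f->right;
        if ($l instanceof Leaf) { return $r; }                 // argument is discarded unevaluated
        if ($l instanceof Stem) {                              // argument is shared, not copied
            return new App(new App($l->child, $x), new App($r, $x));
        }
        $x = $this->whnf($x);                                  // triage inspects the argument
        if ($x instanceof Leaf) { return $l->left; }
        if ($x instanceof Stem) { return new App($l->right, $x->child); }
        return new App(new App($r, $x->left), $x->right);
    }

    // Force the whole tree so no App nodes remain.
    public function normalize(T $t): T {
        $t = $this->whnf($t);
        if ($t instanceof Stem) { return new Stem($this->normalize($t->child)); }
        if ($t instanceof Fork) {
            return new Fork($this->normalize($t->left), $this->normalize($t->right));
        }
        return $t;
    }
}
```

The fuel counter is decremented once per reduction step, so a runaway reduction surfaces as a RuntimeException instead of a hang.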

3. Kernel availability

The host needs access to the self-hosted Arboricx runtime kernel as a Tree Calculus value.

There are two viable bootstrap strategies.

Strategy A: hardcode the kernel tree

For the first host prototype, this is recommended.

Workflow:

  1. Compile/export the tricu kernel entrypoint as an Arboricx bundle or tree value.
  2. Convert the selected exported kernel function into a host-language Tree Calculus literal.
  3. Commit/embed that literal in the host implementation.

Then the host does not need any Arboricx parser of its own for the kernel. It only needs Tree Calculus reduction.
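
One way to keep the embedded literal compact is to store the kernel as a string and parse it at startup. The sketch below assumes a hypothetical preorder encoding: 0 for Leaf, 1 followed by the child for Stem, and 2 followed by left and right for Fork. Whether the tricu tooling emits exactly this format is not confirmed here. Both directions use explicit stacks so that a very large kernel tree does not depend on PHP's call depth.

```php
<?php
abstract class T {}
final class Leaf extends T {}
final class Stem extends T { public function __construct(public T $child) {} }
final class Fork extends T { public function __construct(public T $left, public T $right) {} }

// Serialize a tree to the hypothetical preorder digit string.
function toTernary(T $t): string {
    $out = '';
    $stack = [$t];
    while ($stack) {
        $n = array_pop($stack);
        if ($n instanceof Leaf) { $out .= '0'; }
        elseif ($n instanceof Stem) { $out .= '1'; $stack[] = $n->child; }
        else { $out .= '2'; $stack[] = $n->right; $stack[] = $n->left; } // left is emitted first
    }
    return $out;
}

// Parse by scanning right to left: operands are on the stack before their constructor.
function parseTernary(string $s): T {
    $stack = [];
    for ($i = strlen($s) - 1; $i >= 0; $i--) {
        switch ($s[$i]) {
            case '0':
                $stack[] = new Leaf();
                break;
            case '1':
                if (count($stack) < 1) { throw new UnexpectedValueException('Stem without child'); }
                $stack[] = new Stem(array_pop($stack));
                break;
            case '2':
                if (count($stack) < 2) { throw new UnexpectedValueException('Fork without children'); }
                $left = array_pop($stack);    // pushed last, so it is the left child
                $right = array_pop($stack);
                $stack[] = new Fork($left, $right);
                break;
            default:
                throw new UnexpectedValueException('unexpected character at offset ' . $i);
        }
    }
    if (count($stack) !== 1) { throw new UnexpectedValueException('trailing or missing nodes'); }
    return $stack[0];
}
```

With this scheme the committed artifact is a single string constant, and the host rebuilds the kernel tree once at startup.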

Strategy B: bootstrap the kernel from an Arboricx bundle

Alternatively, the host can implement a minimal Arboricx parser just sufficient to load the kernel bundle.

This is more work up front, but avoids hardcoding a huge tree literal.

If using this strategy, the host-side parser needs to:

  1. Parse the Arboricx container.
  2. Parse enough manifest/export data to locate the desired kernel export.
  3. Parse node records.
  4. Reconstruct the selected root Tree Calculus value from the Merkle node DAG.

This duplicates the logic that the self-hosted tricu kernel already implements, which is why the hardcoded-kernel path is simpler for early ports.

Kernel entrypoints

The ergonomic runtime API currently lives in lib/arboricx.tri.

Raw execution entrypoints

These return raw application results inside the existing ok / err result protocol:

readArboricxExecutableByName nameBytes bundleBytes
readArboricxExecutable bundleBytes
runArboricxByName nameBytes bundleBytes arg
runArboricx bundleBytes arg
runArboricxArgsByName nameBytes bundleBytes args
runArboricxArgs bundleBytes args

runArboricxArgs accepts:

  1. Raw application bundle bytes as a Tree Calculus byte list.
  2. A Tree Calculus list of arguments.

For named exports, use runArboricxArgsByName, which accepts:

  1. Export name as a Tree Calculus byte list.
  2. Application bundle bytes as a Tree Calculus byte list.
  3. Argument list.

Host ABI typed entrypoints

For host-language ports, prefer the Host ABI typed runners. These wrap successful outputs in a tagged host ABI value so every host can decode the same envelope shape.

Default export variants:

runArboricxToTree bundleBytes args
runArboricxToString bundleBytes args
runArboricxToNumber bundleBytes args
runArboricxToBool bundleBytes args
runArboricxToList bundleBytes args
runArboricxToBytes bundleBytes args

Named export variants:

runArboricxByNameToTree nameBytes bundleBytes args
runArboricxByNameToString nameBytes bundleBytes args
runArboricxByNameToNumber nameBytes bundleBytes args
runArboricxByNameToBool nameBytes bundleBytes args
runArboricxByNameToList nameBytes bundleBytes args
runArboricxByNameToBytes nameBytes bundleBytes args

Recommended first host entrypoint for the append "hello " example:

runArboricxToString

Applying the kernel in the host evaluator

If the host has the Tree Calculus value for runArboricxToString, call it by constructing nested application trees.

In Tree Calculus application form:

((runArboricxToString bundleBytesTree) argsTree)

Structurally, if app(f, x) constructs Fork(f, x), then:

$expr = app(app($kernelRunArboricxToString, $bundleBytesTree), $argsTree);
$result = normalize($expr);

For named export execution:

(((runArboricxByNameToString nameBytesTree) bundleBytesTree) argsTree)

Structurally:

$expr = app(
    app(
        app($kernelRunArboricxByNameToString, $nameBytesTree),
        $bundleBytesTree
    ),
    $argsTree
);
$result = normalize($expr);
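
Both call shapes above are left-nested applications, so a small variadic helper can build them. This sketch assumes, as stated above, that app(f, x) constructs Fork(f, x); appAll is a hypothetical name.

```php
<?php
abstract class T {}
final class Leaf extends T {}
final class Stem extends T { public function __construct(public T $child) {} }
final class Fork extends T { public function __construct(public T $left, public T $right) {} }

// Structural application, following the convention described in this document.
function app(T $f, T $x): T { return new Fork($f, $x); }

// Builds ((f a1) a2) ... an by folding app over the arguments from left to right.
function appAll(T $f, T ...$args): T {
    return array_reduce($args, fn(T $acc, T $arg): T => app($acc, $arg), $f);
}
```

With this helper the named-export call could be written as appAll($kernelRunArboricxByNameToString, $nameBytesTree, $bundleBytesTree, $argsTree).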

Result convention and Host ABI envelope

All runtime APIs wrap their results in the existing tricu ok / err convention from lib/binary.tri:

ok value rest = pair true (pair value rest)
err code rest = pair false (pair code rest)

The host should always unwrap this outer result first.
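
A host-side unwrapper for this convention might look like the sketch below. It assumes encodings this document does not spell out: pair a b is Fork a b, true is Stem Leaf, and false is Leaf. Verify these against the library sources before relying on them. The return shape matches the [$ok, $value, $rest] destructuring used elsewhere in this document.

```php
<?php
abstract class T {}
final class Leaf extends T {}
final class Stem extends T { public function __construct(public T $child) {} }
final class Fork extends T { public function __construct(public T $left, public T $right) {} }

// Splits `pair flag (pair value rest)` into [bool, T, T].
// ASSUMPTIONS: pair = Fork, true = Stem(Leaf), false = Leaf.
function unwrapResult(T $result): array {
    if (!($result instanceof Fork) || !($result->right instanceof Fork)) {
        throw new UnexpectedValueException('value is not an ok/err result');
    }
    $flag = $result->left;
    $isOk = match (true) {
        $flag instanceof Stem && $flag->child instanceof Leaf => true,
        $flag instanceof Leaf => false,
        default => throw new UnexpectedValueException('result flag is not a boolean'),
    };
    // For ok the middle slot is the value; for err it is the error code.
    return [$isOk, $result->right->left, $result->right->right];
}
```

Rejecting malformed shapes early makes evaluator bugs easier to tell apart from genuine err results.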

Raw runners

Raw runners such as runArboricxArgs return:

ok rawApplicationValue rest

Raw runners attach no type tag, so the host must already know how to interpret rawApplicationValue.

Host ABI typed runners

Typed runners such as runArboricxToString return:

ok hostAbiValue rest

A host ABI value has shape:

pair tag payload

The payload is still the canonical/raw Tree Calculus representation for that type.

Initial tags are specified in docs/host-abi.md:

hostTreeTag   = 0
hostStringTag = 1
hostNumberTag = 2
hostBoolTag   = 3
hostListTag   = 4
hostBytesTag  = 5

For example:

runArboricxToString bundleBytes ["james"]

returns:

ok (hostString "hello james") rest

which is structurally:

ok (pair hostStringTag "hello james") rest
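
The envelope can be split on the host side roughly as follows. This is a sketch under assumptions: pair is Fork, and the tag is an ordinary Tree Calculus number in the little-endian bit-list encoding (Leaf for 0). The constant names mirror the tag list above but are otherwise hypothetical. docs/host-abi.md remains the authority on the actual layout.

```php
<?php
abstract class T {}
final class Leaf extends T {}
final class Stem extends T { public function __construct(public T $child) {} }
final class Fork extends T { public function __construct(public T $left, public T $right) {} }

const HOST_TREE_TAG = 0;
const HOST_STRING_TAG = 1;
const HOST_NUMBER_TAG = 2;
const HOST_BOOL_TAG = 3;
const HOST_LIST_TAG = 4;
const HOST_BYTES_TAG = 5;

// ASSUMPTION: numbers are little-endian bit lists (Leaf = 0 bit, Stem(Leaf) = 1 bit).
function decodeNumber(T $t): int {
    $n = 0;
    for ($shift = 0; $t instanceof Fork; $shift++, $t = $t->right) {
        if ($shift > 62) { throw new OverflowException('number does not fit in a PHP int'); }
        $bit = $t->left;
        if ($bit instanceof Stem && $bit->child instanceof Leaf) { $n |= 1 << $shift; }
        elseif (!($bit instanceof Leaf)) { throw new UnexpectedValueException('malformed bit'); }
    }
    if (!($t instanceof Leaf)) { throw new UnexpectedValueException('malformed number'); }
    return $n;
}

// Splits `pair tag payload` into [int tag, T payload].
function unwrapHostValue(T $value): array {
    if (!($value instanceof Fork)) {
        throw new UnexpectedValueException('value is not a Host ABI envelope');
    }
    return [decodeNumber($value->left), $value->right];
}
```

Because the tag comes back as a PHP int, it can be compared to the constants with strict equality.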

Error shape

Expected error shape:

err code rest

The error code is a Tree Calculus number. Error constants are defined in:

  • lib/binary.tri
  • lib/arboricx-common.tri
  • lib/arboricx.tri for Host ABI codec errors, currently errHostCodecFailed = 14

Typed runners return errHostCodecFailed if the application result cannot be interpreted as the requested type.

A prototype host can report the numeric error code and optionally dump a compact representation of rest.

Example execution flow

Suppose the application bundle exports this root:

append "hello "

The bundle root is an unapplied function waiting for one more string argument.

Host flow:

  1. Load kernel entrypoint tree:

    $runArboricxToString = loadHardcodedKernelEntrypoint('runArboricxToString');
    
  2. Read application bundle bytes:

    $bytes = file_get_contents('append-hello.arboricx');
    
  3. Encode bundle bytes as a Tree Calculus byte list:

    $bundleBytesTree = encodeBytes($bytes);
    
  4. Encode host argument(s):

    $arg = encodeString('james');
    $args = encodeList([$arg]);
    
  5. Build application expression:

    $expr = app(app($runArboricxToString, $bundleBytesTree), $args);
    
  6. Evaluate:

    $result = normalize($expr);
    
  7. Unwrap ok result:

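    // On err, the second slot holds the numeric error code instead of a host value.
    [$ok, $hostValue, $rest] = unwrapResult($result);
    if (!$ok) { throw new RuntimeException('Arboricx error code ' . decodeNumber($hostValue)); }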
    
  8. Unwrap Host ABI envelope:

    [$tag, $payload] = unwrapHostValue($hostValue);
    if ($tag !== HOST_STRING_TAG) { throw new RuntimeException('Expected string'); }
    
  9. Decode the payload:

    echo decodeString($payload); // hello james
    

What the kernel does internally

runArboricxToString performs the following steps inside Tree Calculus:

  1. Parse and validate the raw Arboricx bundle bytes.
  2. Parse the manifest.
  3. Select the default export:
    • use export named main if present,
    • otherwise use the sole export if exactly one exists,
    • otherwise return an error.
  4. Read the nodes section.
  5. Reconstruct the selected root tree from the Merkle DAG.
  6. Apply each host-provided argument in order.
  7. Validate that the raw result is string-like.
  8. Return ok (hostString result) rest, or an err.

runArboricxByNameToString follows the same steps, except that step 3 looks up the export by the supplied name instead of applying the default-export rule.

Other typed runners follow the same pattern for their requested output type.
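
The default-export rule in step 3 runs inside the kernel, but a host-side mirror of it can help when predicting which export a bundle will run, or when writing a Strategy B loader. The function below is an illustrative sketch with a hypothetical name. It takes the list of export names as plain PHP strings.

```php
<?php
// Mirrors the rule: prefer "main", otherwise the sole export, otherwise fail.
function selectDefaultExport(array $exportNames): string {
    if (in_array('main', $exportNames, true)) {
        return 'main';
    }
    if (count($exportNames) === 1) {
        return array_values($exportNames)[0];
    }
    throw new RuntimeException('no default export: expected "main" or exactly one export');
}
```

A bundle with several exports and no main therefore needs one of the ByName runners.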

Tests proving the expected behavior

The relevant Haskell tests are in test/Spec.hs under manifestReadingTests.

Important cases:

  • readArboricxExecutable: reconstructs default export tree
  • readArboricxExecutableByName: selects named export
  • runArboricx: applies host-provided argument to default export
  • runArboricxArgs: applies host-provided argument list in order
  • host ABI: constructors expose tag and payload
  • runArboricxToTree: wraps raw result as hostTree
  • runArboricxToString: wraps string result as hostString
  • runArboricxToNumber: wraps number result as hostNumber
  • runArboricxToBool: rejects non-bool result

These tests demonstrate the host-shell contract:

  • application bundle bytes are supplied as a Tree Calculus byte list,
  • host arguments are supplied as canonical Tree Calculus values,
  • execution returns an outer result-wrapped value,
  • Host ABI typed runners return a tagged ABI envelope inside ok.

Minimal PHP prototype checklist

A PHP prototype should implement:

  • Tree data constructors: Leaf, Stem, Fork.
  • Application helper: app($f, $x) = Fork($f, $x).
  • Normal-order Tree Calculus reducer.
  • Fuel/step limit for debugging.
  • Hardcoded kernel entrypoint tree for runArboricxToString for the first string-output prototype.
  • Encode application bundle file bytes into a Tree Calculus byte list.
  • Encode host argument values into Tree Calculus values.
  • Build expression: ((runArboricxToString bundleBytes) args).
  • Normalize expression.
  • Unwrap outer ok / err result.
  • Unwrap Host ABI pair tag payload envelope.
  • Decode payload according to tag.

For exact codec details, reference the Haskell implementation in src/Research.hs and the existing JS runtime if available.
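
As a starting point for the decode side of that checklist, the sketch below turns payload trees back into PHP values. It assumes the same unverified encodings as the rest of these examples: numbers are little-endian bit lists, lists are right-nested Fork cells ending in Leaf, and strings are lists of byte-sized numbers. Function names are illustrative.

```php
<?php
abstract class T {}
final class Leaf extends T {}
final class Stem extends T { public function __construct(public T $child) {} }
final class Fork extends T { public function __construct(public T $left, public T $right) {} }

// ASSUMPTION: Leaf = 0 bit, Stem(Leaf) = 1 bit, least significant bit first.
function decodeNumber(T $t): int {
    $n = 0;
    for ($shift = 0; $t instanceof Fork; $shift++, $t = $t->right) {
        if ($shift > 62) { throw new OverflowException('number does not fit in a PHP int'); }
        $bit = $t->left;
        if ($bit instanceof Stem && $bit->child instanceof Leaf) { $n |= 1 << $shift; }
        elseif (!($bit instanceof Leaf)) { throw new UnexpectedValueException('malformed bit'); }
    }
    if (!($t instanceof Leaf)) { throw new UnexpectedValueException('malformed number'); }
    return $n;
}

// Walks the Fork chain iteratively and returns the element trees in order.
function decodeList(T $t): array {
    $items = [];
    while ($t instanceof Fork) {
        $items[] = $t->left;
        $t = $t->right;
    }
    if (!($t instanceof Leaf)) { throw new UnexpectedValueException('list is not Leaf-terminated'); }
    return $items;
}

// Byte-wise decoding; values above 255 are rejected rather than truncated.
function decodeString(T $t): string {
    $out = '';
    foreach (decodeList($t) as $item) {
        $code = decodeNumber($item);
        if ($code > 255) { throw new UnexpectedValueException('element is not a byte'); }
        $out .= chr($code);
    }
    return $out;
}
```

Note that an empty list, the empty string, and the number 0 all decode from Leaf under these assumptions, which is one reason the Host ABI tag is useful for telling them apart.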

Current recommendation

For the first PHP implementation:

  1. Hardcode only the runArboricxToString kernel entrypoint as a Tree Calculus value.
  2. Do not implement host-side Arboricx parsing yet.
  3. Implement only enough codecs for:
    • bytes,
    • strings,
    • lists,
    • result unwrapping,
    • Host ABI envelope unwrapping.
  4. Use one test fixture: an Arboricx bundle whose root is append "hello ".
  5. Assert that calling it with "james" returns an outer ok, then a hostString, then payload "hello james".

Once that works, add named export support via runArboricxByNameToString and expand Host ABI tags/codecs as needed.