# AGENTS.md - tricu Project Guide
> For AI agents and contributors working in this repository.
## 1. Build & Test
```bash
# Full build + tests
nix build .#
```
### ⚠️ Never call `cabal` directly
> **Rule of thumb:** if it builds, links, or tests, it goes through `nix`.
## 2. Project Overview
**tricu** (pronounced "tree-shoe") is a programming-language experiment written in Haskell. It implements [Triage Calculus](https://olydis.medium.com/a-visual-introduction-to-tree-calculus-2f4a34ceffc2), an extension of Barry Jay's Tree Calculus, with lambda-abstraction sugar that gets eliminated back to pure tree calculus terms.
### Core types (in `src/Research.hs`)
| Type | Description |
|------|-------------|
| `T = Leaf \| Stem T \| Fork T T` | Tree Calculus term (the runtime value) |
| `TricuAST` | Parsed AST with `SDef`, `SApp`, `SLambda`, etc. |
| `LToken` | Lexer tokens |
| `Node` / `MerkleHash` | Content-addressed Merkle DAG nodes |
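To make the `T` type concrete, here is a small Python model of it together with the standard triage-calculus reduction rules. This is an illustrative sketch only; the actual `apply` in `Research.hs` may differ in details (strategy, helper names), so treat the rules below as the textbook presentation rather than the project's exact code.

```python
from dataclasses import dataclass

# Python model of T = Leaf | Stem T | Fork T T

@dataclass(frozen=True)
class Leaf: pass

@dataclass(frozen=True)
class Stem:
    child: object

@dataclass(frozen=True)
class Fork:
    left: object
    right: object

def apply(f, z):
    """One application step f z, per the standard triage-calculus rules."""
    if isinstance(f, Leaf):
        return Stem(z)                       # t z            -> Stem z
    if isinstance(f, Stem):
        return Fork(f.child, z)              # (t x) z        -> Fork x z
    l, r = f.left, f.right
    if isinstance(l, Leaf):                  # K rule: t t y z -> y
        return r
    if isinstance(l, Stem):                  # S rule: (t (t x) y) z -> (x z) (y z)
        return apply(apply(l.child, z), apply(r, z))
    # Triage rule: (t (t w x) y) z inspects the shape of z
    if isinstance(z, Leaf):
        return l.left                        # z = t          -> w
    if isinstance(z, Stem):
        return apply(l.right, z.child)       # z = Stem u     -> x u
    return apply(apply(r, z.left), z.right)  # z = Fork u v   -> y u v
```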
### Source modules
| Module | Purpose |
|--------|---------|
| `Main.hs` | CLI entry point (`cmdargs`), three modes: `repl`, `eval`, `decode` |
| `Eval.hs` | Interpreter: `evalTricu`, `result`, `evalSingle` |
| `Parser.hs` | Megaparsec parser → `TricuAST` |
| `Lexer.hs` | Megaparsec lexer → `LToken` |
| `FileEval.hs` | File loading, module imports, `!import` |
| `REPL.hs` | Interactive Read-Eval-Print Loop (haskeline) |
| `Research.hs` | Core types, `apply` reduction, booleans, marshalling (`ofString`, `ofNumber`), output formatters (`toAscii`, `toTernaryString`, `decodeResult`) |
| `ContentStore.hs` | SQLite-backed term persistence |
| `Wire.hs` | Arborix portable wire format — encode/decode/import/export of Merkle-DAG bundle blobs |
### File extensions
- `.hs` - Haskell source
- `.tri` - tricu language source (used in `lib/`, `test/`, `demos/`)
## 3. Test Suite
Tests live in `test/Spec.hs` and use **Tasty** + **HUnit**.
```bash
nix flake check # or: nix build .#test
```
### Test groups
| Group | What it covers |
|-------|----------------|
| `lexer` | Megaparsec lexer - identifiers, keywords, strings, escapes, invalid tokens |
| `parser` | Parser - defs, lambda, applications, lists, comments, parentheses |
| `simpleEvaluation` | Core `apply` reduction rules, variable substitution, immutability |
| `lambdas` | Lambda elimination, SKI calculus, higher-order functions, currying, shadowing, free vars |
| `providedLibraries` | `lib/list.tri` - triage, booleans, list ops (`head`, `tail`, `map`, `emptyList?`, `append`, `equal?`) |
| `fileEval` | Loading `.tri` files, multi-file context, decode |
| `modules` | `!import`, cyclic deps, namespacing, multi-level imports, unresolved vars, local namespaces |
| `demos` | `demos/*.tri` - structural equality, `toSource`, `size`, level-order traversal |
| `decoding` | `decodeResult` - Leaf, numbers, strings, lists, mixed |
| `elimLambdaSingle` | Lambda elimination: eta reduction, SDef binding, semantics preservation |
| `stressElimLambda` | Lambda elimination stress test: 200 vars, 800-body curried lambda |
### Suggesting tests
You do not write or modify tests. The user writes tests to constrain your outputs. Make your code conform to the existing tests, or suggest changes to the tests instead.
If the user gives you explicit permission to implement a test you may proceed.
## 4. tricu Language Quick Reference
```
t → Leaf (the base term)
t t → Stem Leaf
t t t → Fork Leaf Leaf
x = t → Define term x = Leaf
id = (a : a) → Lambda identity (eliminates to tree calculus)
head (map f xs) → From lib/list.tri
!import "./path.tri" NS → Import file under namespace
-- line comment
```
## 5. Output Formats
The `eval` command accepts `--form` (shorthand `-t`):
| Format | Value | Description |
|--------|-------|-------------|
| `tree` | `TreeCalculus` | Simple `t` form (default) |
| `fsl` | `FSL` | Full show representation |
| `ast` | `AST` | Parsed AST representation |
| `ternary` | `Ternary` | Ternary string encoding |
| `ascii` | `Ascii` | ASCII-art tree diagram |
| `decode` | `Decode` | Human-readable (strings, numbers, lists) |
## 6. Content Addressing
Each `T` term is content-addressed via a Merkle DAG:
```
NLeaf → 0x00
NStem(h) → 0x01 || h (32 bytes)
NFork(l,r) → 0x02 || l (32 bytes) || r (32 bytes)
hash = SHA256("arborix.merkle.node.v1" <> 0x00 <> serialized_node)
```
This is stored in SQLite via `ContentStore.hs`. Hash suffixes on identifiers (e.g., `foo_abc123...`) are validated: 16-64 hex characters (up to a full SHA-256 digest).
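The hashing scheme above can be sketched directly from the payload tags and domain string given in this section. The payload layouts and the `0x00` separator come from this document; the sketch is illustrative rather than a copy of the Haskell code.

```python
import hashlib

DOMAIN = b"arborix.merkle.node.v1"

def node_hash(payload: bytes) -> bytes:
    """hash = SHA256(domain || 0x00 || serialized_node)"""
    return hashlib.sha256(DOMAIN + b"\x00" + payload).digest()

def leaf_payload() -> bytes:
    return b"\x00"

def stem_payload(child_hash: bytes) -> bytes:
    assert len(child_hash) == 32
    return b"\x01" + child_hash

def fork_payload(left_hash: bytes, right_hash: bytes) -> bytes:
    assert len(left_hash) == len(right_hash) == 32
    return b"\x02" + left_hash + right_hash
```

Because each payload embeds the children's hashes, equal subtrees always share a hash, which is what makes the store content-addressed.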
## 7. Arborix Portable Wire Format
The **Arborix wire format** (module `Wire.hs`) defines a portable binary bundle for exchanging Tree Calculus terms, their Merkle DAGs, and associated metadata. It is versioned and schema-driven.
### Header
```
+------------------+-----------------+------------------+----------------+
| Magic (8 bytes) | Major (2 bytes) | Minor (2 bytes) | Section Count |
| | | | (4 bytes) |
+------------------+-----------------+------------------+----------------+
+------------------+----------------------+
| Flags (8 bytes)  | Dir Offset (8 bytes) |
+------------------+----------------------+
```
- **Magic**: `ARBORIX\0` (`0x41 0x52 0x42 0x4f 0x52 0x49 0x58 0x00`)
- **Header length**: 32 bytes
- **Major version**: `1` | **Minor version**: `0`
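A sketch of parsing this 32-byte header. Field order and sizes follow the diagram above; big-endian byte order is an assumption here (the node payload length is documented as big-endian, but header endianness is not stated in this section).

```python
import struct

MAGIC = b"ARBORIX\x00"
# magic (8s), major (H), minor (H), section count (I), flags (Q), dir offset (Q)
HEADER = struct.Struct(">8sHHIQQ")  # 32 bytes total

def parse_header(buf: bytes) -> dict:
    magic, major, minor, nsections, flags, dir_off = HEADER.unpack_from(buf, 0)
    if magic != MAGIC:
        raise ValueError("bad magic")
    if major != 1:
        raise ValueError(f"unsupported major version {major}")
    return {"minor": minor, "sections": nsections,
            "flags": flags, "dir_offset": dir_off}
```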
### Section Directory
Immediately follows the header. Each section entry is 60 bytes:
```
+------------------+------------------+-----------------+------------------+
| Type (4 bytes) | Version (2 bytes)| Flags (2 bytes) | Compression (2) |
+------------------+------------------+-----------------+------------------+
| Digest Algo (2) | Offset (8 bytes) | Length (8 bytes)| SHA256 digest (32)|
+------------------+------------------+-----------------+------------------+
```
Known section types:
| Type | Name | Required | Description |
|------|-----------|----------|-------------|
| 1 | manifest | Yes | JSON manifest metadata |
| 2 | nodes | Yes | Binary Merkle node payloads |
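One 60-byte directory entry can be decoded like so (again assuming big-endian fields, which this section does not state explicitly):

```python
import struct

# type (I), version (H), flags (H), compression (H), digest algo (H),
# offset (Q), length (Q), SHA-256 digest (32s) = 60 bytes
ENTRY = struct.Struct(">IHHHHQQ32s")

def parse_entry(buf: bytes, off: int) -> dict:
    (stype, version, flags, compression,
     digest_algo, s_off, s_len, digest) = ENTRY.unpack_from(buf, off)
    return {"type": stype, "version": version, "flags": flags,
            "compression": compression, "digest_algo": digest_algo,
            "offset": s_off, "length": s_len, "digest": digest}
```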
### Section 1 — Manifest (JSON)
The manifest describes the bundle's semantics, exports, and schema. Key fields:
| Field | Value | Description |
|-------|-------|-------------|
| `schema` | `"arborix.bundle.manifest.v1"` | Manifest schema version |
| `bundleType` | `"tree-calculus-executable-object"` | Bundle category |
| `tree.calculus` | `"tree-calculus.v1"` | Tree calculus version |
| `tree.nodeHash.algorithm` | `"sha256"` | Hash algorithm |
| `tree.nodeHash.domain` | `"arborix.merkle.node.v1"` | Hash domain string |
| `tree.nodePayload` | `"arborix.merkle.payload.v1"` | Payload encoding |
| `runtime.semantics` | `"tree-calculus.v1"` | Evaluation semantics |
| `runtime.abi` | `"arborix.abi.tree.v1"` | Runtime ABI |
| `closure` | `"complete"` | Bundle must be a complete DAG |
| `roots` | `[{"hash": "...", "role": "..."}]` | Named root hashes |
| `exports` | `[{"name": "...", "root": "..."}]` | Export aliases for roots |
| `metadata.createdBy` | `"arborix"` | Originator |
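Assembling the fields in the table above yields a minimal manifest like the following. The hash values are placeholders, not real digests, and optional fields are omitted.

```python
import json

# Minimal manifest built from the documented fields; hashes are dummies.
manifest = {
    "schema": "arborix.bundle.manifest.v1",
    "bundleType": "tree-calculus-executable-object",
    "tree": {
        "calculus": "tree-calculus.v1",
        "nodeHash": {"algorithm": "sha256", "domain": "arborix.merkle.node.v1"},
        "nodePayload": "arborix.merkle.payload.v1",
    },
    "runtime": {"semantics": "tree-calculus.v1", "abi": "arborix.abi.tree.v1"},
    "closure": "complete",
    "roots": [{"hash": "00" * 32, "role": "default"}],
    "exports": [{"name": "main", "root": "00" * 32}],
    "metadata": {"createdBy": "arborix"},
}

blob = json.dumps(manifest).encode("utf-8")
```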
### Section 2 — Nodes (Binary)
```
+------------------+-------------------+-------------------+-----------------+
| Node Count (8) | Hash (32 bytes) | Payload Len (4) | Payload (N) |
+------------------+-------------------+-------------------+-----------------+
```
Each node entry contains:
- 32-byte Merkle hash (hex-encoded in identifiers, raw in binary)
- 4-byte big-endian payload length
- N bytes of serialized node payload (`0x00` for Leaf, `0x01 || hash` for Stem, `0x02 || left || right` for Fork)
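Walking the nodes section per the layout above: an 8-byte count, then for each node a 32-byte hash, a 4-byte big-endian payload length, and the payload itself. A sketch:

```python
import struct

def parse_nodes(buf: bytes) -> dict:
    """Return {hash: payload} for every node entry in a Section 2 blob."""
    (count,) = struct.unpack_from(">Q", buf, 0)
    off, nodes = 8, {}
    for _ in range(count):
        h = buf[off:off + 32]
        (plen,) = struct.unpack_from(">I", buf, off + 32)
        off += 36
        nodes[h] = buf[off:off + plen]
        off += plen
    return nodes
```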
### Bundle verification flow
1. Check magic bytes
2. Validate major version
3. Parse section directory
4. For each section: verify SHA256 digest against actual bytes
5. Decode JSON manifest
6. Decode binary node entries into Merkle DAG
7. Verify all root hashes present in manifest exist in node map
8. Verify export root hashes present
9. Verify children references are complete (no dangling nodes)
10. Reject unknown critical sections
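Step 9 (the closure check) can be sketched by scanning each payload's tag byte and requiring every referenced child hash to be present in the node map. This is an illustration of the check, not the Haskell implementation:

```python
def check_closure(nodes: dict) -> None:
    """Raise if any Stem/Fork payload references a hash missing from the map."""
    for h, payload in nodes.items():
        tag = payload[0:1]
        if tag == b"\x01":                       # Stem: one 32-byte child
            children = [payload[1:33]]
        elif tag == b"\x02":                     # Fork: two 32-byte children
            children = [payload[1:33], payload[33:65]]
        else:                                    # Leaf: no children
            children = []
        for c in children:
            if c not in nodes:
                raise ValueError(f"dangling child {c.hex()} under {h.hex()}")
```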
### Data types (Wire.hs)
| Type | Purpose |
|------|---------|
| `Bundle` | Top-level bundle: version, roots, nodes map, manifest |
| `BundleManifest` | JSON metadata: schema, tree spec, runtime spec, roots, exports |
| `TreeSpec` | Tree calculus version + hash algorithm + payload encoding |
| `NodeHashSpec` | Hash algorithm and domain string |
| `RuntimeSpec` | Semantics, evaluation order, ABI, capabilities |
| `BundleRoot` | Root hash + role (`"default"` or `"root"`) |
| `BundleExport` | Export name + root hash + kind + ABI |
| `BundleMetadata` | Optional package, version, description, license, createdBy |
| `ClosureMode` | `ClosureComplete` or `ClosurePartial` |
### Key functions
| Function | Signature | Purpose |
|----------|-----------|---------|
| `encodeBundle` | `Bundle → ByteString` | Serialize bundle to wire bytes |
| `decodeBundle` | `ByteString → Either String Bundle` | Parse wire bytes into Bundle |
| `verifyBundle` | `Bundle → Either String ()` | Validate DAG, manifest, roots |
| `collectReachableNodes` | `Connection → MerkleHash → IO [(MerkleHash, ByteString)]` | Traverse DAG from root |
| `exportBundle` | `Connection → [MerkleHash] → IO ByteString` | Build bundle from content store |
| `exportNamedBundle` | `Connection → [(Text, MerkleHash)] → IO ByteString` | Build with named roots |
| `importBundle` | `Connection → ByteString → IO [MerkleHash]` | Import bundle into content store |
## 8. Directory Layout
```
tricu/
├── flake.nix # Nix flake: packages, tests, devShell
├── tricu.cabal # Cabal package (used via callCabal2nix)
├── src/ # Haskell modules
│ ├── Main.hs
│ ├── Eval.hs
│ ├── Parser.hs
│ ├── Lexer.hs
│ ├── FileEval.hs
│ ├── REPL.hs
│ ├── Research.hs
│ ├── ContentStore.hs
│ └── Wire.hs # Arborix portable wire format
├── test/
│ ├── Spec.hs # Tasty + HUnit tests
│ ├── *.tri # tricu test programs
│ └── local-ns/ # Module namespace test files
├── lib/
│ ├── base.tri
│ ├── list.tri
│ └── patterns.tri
├── demos/
│ ├── equality.tri
│ ├── size.tri
│ ├── toSource.tri
│ ├── levelOrderTraversal.tri
│ └── patternMatching.tri
└── AGENTS.md # This file
```
## 9. JS Arborix Runtime
A JavaScript implementation of the Arborix portable bundle runtime lives in `ext/js/`.
It is a reference implementation — not a tricu source parser. It reads `.tri.bundle` files produced by the Haskell toolchain, verifies Merkle node hashes, reconstructs tree values, and reduces them.
From project root:
```bash
node ext/js/src/cli.js inspect test/fixtures/id.tri.bundle
node ext/js/src/cli.js run test/fixtures/true.tri.bundle
```
The JS runtime implements:
- Bundle binary format parsing (header, section directory, manifest, nodes)
- SHA-256 Merkle node hash verification against canonical payloads
- Closure verification (all child references present)
- Tree reconstruction from node DAG
- Core `apply` reduction rules
- Basic codecs (decodeResult)
- CLI: `inspect` and `run` commands
## 10. Content Store Workflow (Custom DB)
The content store location is controlled by the `TRICU_DB_PATH` environment variable. When set, `eval` mode automatically loads all stored terms into the initial environment, so you can call any previously imported/evaluated term by name.
```bash
# Use a local DB
export TRICU_DB_PATH=/tmp/tricu-local.db
# Import terms from the standard library
./result/bin/tricu import -f lib/list.tri
# Now use them in eval mode
echo "not? (t t)" | ./result/bin/tricu eval -t decode
# Output: t
echo "not? (t t t)" | ./result/bin/tricu eval -t decode
# Output: Stem Leaf
echo "equal? (t t) (t t t)" | ./result/bin/tricu eval -t decode
# Output: t
# Check what's in the store
./result/bin/tricu
t> !definitions
```
Without `TRICU_DB_PATH` set, `eval` uses only the terms defined in the input file(s).
## 11. Development Tips
- **REPL:** `nix run .#` starts the interactive tricu REPL.
- **Evaluate files:** `nix run .# -- eval -f demos/equality.tri`
- **GHC options:** built with `-threaded -rtsopts -with-rtsopts=-N`, so the binary uses all available cores by default.
- **Upx** is in the devShell for binary compression if needed.