tricu/notes/recursive-consumers.md
James Eversole e8ab61dbaa Data-first recursive consumers in readBytes
Reorder recursive byte-stream consumers so the consumed input is inspected
before loop-control arguments can drive evaluation. Previously, partially
applying `readBytes` to a known count, such as `readBytes 2`, allowed the
evaluator to specialize the recursive worker using known counter values
while the byte stream was still abstract. This caused symbolic recursion
over unknown input and produced an enormous normal form.

The recursive worker now takes the byte stream first and immediately
case-analyzes it. As a result, partial application blocks at the input
boundary instead of unrolling the counter loop.

This preserves the fully-applied behavior of `readBytes`, while making partial
application such as `readBytes 2` normalize safely.
2026-05-07 10:07:43 -05:00


# Recursive Consumer Argument Order
## Core issue
Partial application is generally fine in tricu. The problem appears with recursive consumer functions when loop-control arguments are known before the consumed data is available.
The concrete case was `readBytes`.
This worked:
```tricu
(readBytes 2) [(1) (2) (3)]
```
This used to normalize to an enormous term:
```tricu
readBytes 2
```
At first this looked like a general partial-application problem, but it was not. Other partial applications, such as partially applying `map`, normalized safely. The issue was the argument order and recursive shape of `readBytes_`.
## What went wrong
The original worker had loop-control arguments before the byte stream:
```tricu
readBytes_ = y (self n i bs original acc : ...)
readBytes = (n bs : readBytes_ n 0 bs bs t)
```
After partially applying:
```tricu
readBytes 2
```
the evaluator knew:
```text
n = 2
i = 0
```
but did not know:
```text
bs
original
acc
```
Because the counter values were known, the evaluator could reduce checks like:
```tricu
equal? i n
```
and begin unrolling recursion symbolically before the byte stream existed. That produced a large residual tree describing possible stream cases, rests, and accumulated values.
The bug was not recursion itself. The bug was allowing counters to drive recursion while the consumed structure was still abstract.
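To make the failure concrete, substituting `n = 2` into the original wrapper by hand gives the partially applied form below. This is a sketch of the substitution step only, not evaluator output.

```tricu
-- Original (counter-first) wrapper applied to 2, written out by hand.
-- Both counters are concrete; every use of `bs` is still a bound variable.
bs : readBytes_ 2 0 bs bs t
```

The worker receives `2` and `0` before it receives anything concrete about the stream, so checks on the counters can reduce while `bs`, `original`, and `acc` remain abstract.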
## Why `map`-style partial application is safe
A partially-applied list consumer such as:
```tricu
map (i : append i " world!")
```
is safe because recursion is blocked on the missing list argument. The function cannot recurse until it sees whether the list is empty or a cons cell.
Safe shape:
```text
waiting for input
recursion blocked until input is supplied
```
Unsafe shape:
```text
waiting for input
known counters still allow symbolic recursion
```
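As an illustration of the safe shape, here is a minimal sketch of a list consumer whose worker inspects the list before doing anything else. The names `mapSketch_` and `mapSketch` are hypothetical, and the library's actual `map` may be defined differently; the sketch assumes `matchList` takes the empty case, the cons case, and then the list, as in the `readBytes_` code in these notes.

```tricu
-- Hypothetical sketch, not the library definition of `map`.
-- The worker takes the list first and immediately case-analyzes it.
mapSketch_ = y (self xs f :
  matchList
    t
    (h r : pair (f h) (self r f))
    xs)
mapSketch = (f xs : mapSketch_ xs f)
```

With only `f` supplied, the body reduces to a `matchList` on an unknown `xs`, so there is nothing further for the evaluator to unroll.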
## Fix
Put the consumed data first in the recursive worker and make the first major operation inspect that data.
Corrected shape:
```tricu
readBytes_ = y (self bs n i original acc :
  matchList
    (matchBool
      (ok (reverse acc) bs)
      (err errUnexpectedEof original)
      (equal? i n))
    (h r :
      matchBool
        (ok (reverse acc) bs)
        (self r n (succ i) original (pair h acc))
        (equal? i n))
    bs)
readBytes = (n bs : readBytes_ bs n 0 bs t)
```
Now:
```tricu
readBytes 2
```
becomes:
```tricu
bs : readBytes_ bs 2 0 bs t
```
Since `bs` is abstract and the worker immediately performs:
```tricu
matchList ... bs
```
evaluation blocks at the data boundary instead of unrolling the counter loop.
## General rule
For recursive consumers, the consumed structure should drive evaluation.
Prefer:
```tricu
worker = y (self input control state :
  matchInput
    baseCase
    (piece rest : ... self rest control nextState ...)
    input)
```
Avoid:
```tricu
worker = y (self control state input :
  if controlDone
    done
    (... self nextControl nextState rest ...))
In practice:
```text
worker input control state
```
is safer than:
```text
worker control state input
```
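As a second instance of the preferred shape, the sketch below skips `n` bytes instead of collecting them. `skipBytes_` and `skipBytes` are hypothetical names introduced for illustration, and returning `t` as the success value is an assumption; `ok`, `err`, and `errUnexpectedEof` are used as they appear in `readBytes_`.

```tricu
-- Hypothetical sketch: same data-first shape as readBytes_, without an accumulator.
-- Input first (`bs`), then control (`n`, `i`), then state (`original`).
skipBytes_ = y (self bs n i original :
  matchList
    (matchBool
      (ok t bs)
      (err errUnexpectedEof original)
      (equal? i n))
    (h r :
      matchBool
        (ok t bs)
        (self r n (succ i) original)
        (equal? i n))
    bs)
skipBytes = (n bs : skipBytes_ bs n 0 bs)
```

The counter comparison sits inside both `matchList` branches, so it is only reached after the stream's shape is known.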
## Accumulators
Be careful not to finalize or transform an abstract accumulator too early.
For example:
```tricu
ok (reverse acc) bs
```
is fine when reached after concrete input has driven the recursion, but it can become pathological if reached while `acc` is still abstract.
Guidelines:
- Accumulate cheaply during recursion.
- Finalize, reverse, or validate only after input has forced the function to a concrete success point.
- Do not let counters select a success branch while the accumulator is still abstract.
## Parser guidance
For byte or parser consumers, prefer streaming over global slicing of unknown input.
Prefer:
```text
read one byte
compare or accumulate
recurse on rest
```
Avoid relying on:
```tricu
taken = bytesTake n bs
rest = bytesDrop n bs
enough = equal? (bytesLength taken) n
```
The slice-based version may be correct on concrete input but can behave badly when partially applied over abstract input.
Streaming alone is not enough; the recursive worker must also be data-first.
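A streaming consumer that is also data-first might look like the following sketch, which checks that the stream starts with a given byte sequence. Everything named here is hypothetical: `expectBytes_`, `expectBytes`, and the error tag `errUnexpectedByte` do not come from the notes, and `t` as the success value is an assumption.

```tricu
-- Hypothetical sketch: read one byte, compare, recurse on the rest.
-- `bs` is matched first, so a known `expected` cannot drive recursion alone.
expectBytes_ = y (self bs expected original :
  matchList
    (matchList
      (ok t bs)
      (e es : err errUnexpectedEof original)
      expected)
    (h r :
      matchList
        (ok t bs)
        (e es :
          matchBool
            (self r es original)
            (err errUnexpectedByte original)
            (equal? h e))
        expected)
    bs)
expectBytes = (expected bs : expectBytes_ bs expected bs)
```

Even though `expected` is fully known after partial application, the outer `matchList` on `bs` has to resolve before the inner match on `expected` is examined.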
## Checklist
When writing a recursive consumer, ask:
1. What structure is consumed?
2. What argument should block recursion when unknown?
3. Are counters available before the consumed structure?
4. Could partial application specialize the loop before data arrives?
5. Does any branch process an abstract accumulator or rest value?
6. Does the worker put consumed data before counters and state?
## Safe and unsafe examples
Safe:
```tricu
readU8
bs : readU8 bs
readBytes 2 [(1) (2) (3)]
(readBytes 2) [(1) (2) (3)]
map (i : append i " world!")
```
Unsafe before the data-first rewrite:
```tricu
readBytes 2
readBytes_ 2 0
```
## Implication for Arborix
Arborix parsers will include many recursive consumers:
- read N bytes
- read N section records
- scan records for an ID
- parse node records
- validate closures
These should use data-first recursive workers.
Avoid:
```tricu
readSectionRecords_ count index bs acc
```
Prefer:
```tricu
readSectionRecords_ bs count index acc
```
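A wrapper for the preferred order could mirror `readBytes`. This is a sketch only: the name `readSectionRecords` and the initial values `0` and `t` are assumptions, since the notes give just the worker's argument order.

```tricu
-- Hypothetical wrapper; only the worker's argument order comes from the notes.
readSectionRecords = (count bs : readSectionRecords_ bs count 0 t)
```

Partially applying a count, as in `readSectionRecords 3`, would then leave `bs : readSectionRecords_ bs 3 0 t`, which stays blocked as long as the worker's first operation is a match on `bs`.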
## Short rule
```text
Put consumed data first in recursive workers.
Let data shape drive recursion.
Do not let counters unroll over abstract input.
```