Reorder recursive byte-stream consumers so the consumed input is inspected before loop-control arguments can drive evaluation. Previously, partially applying `readBytes` to a known count, such as `readBytes 2`, allowed the evaluator to specialize the recursive worker using known counter values while the byte stream was still abstract. This caused symbolic recursion over unknown input and produced an enormous normal form. The recursive worker now takes the byte stream first and immediately case-analyzes it. As a result, partial application blocks at the input boundary instead of unrolling the counter loop. This preserves the fully-applied behavior of `readBytes`, while making partial application such as `readBytes 2` normalize safely.
Recursive Consumer Argument Order
Core issue
Partial application is generally fine in tricu. The problem appears with recursive consumer functions when loop-control arguments are known before the consumed data is available.
The concrete case was readBytes.
This worked:
(readBytes 2) [(1) (2) (3)]
This used to explode in space:
readBytes 2
At first this looked like a general partial-application problem, but it was not. Other partial applications, such as partially applying map, normalized safely. The issue was the argument order and recursive shape of readBytes_.
What went wrong
The original worker had loop-control arguments before the byte stream:
readBytes_ = y (self n i bs original acc : ...)
readBytes = (n bs : readBytes_ n 0 bs bs t)
After partially applying:
readBytes 2
the evaluator knew:
n = 2
i = 0
but did not know:
bs
original
acc
Because the counter values were known, the evaluator could reduce checks like:
equal? i n
and begin unrolling recursion symbolically before the byte stream existed. That produced a large residual tree describing every possible stream case, rest-of-stream value, and accumulator state.
The bug was not recursion itself. The bug was allowing counters to drive recursion while the consumed structure was still abstract.
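To see how known counters drive unrolling, here is a toy specializer in Python. This is an illustrative model only, not tricu's actual evaluator: the stream and accumulator stay as symbolic strings, while the decidable counter test is reduced, so the residual term keeps growing even though no input was ever supplied.

```python
# Toy model of counter-driven specialization over an abstract stream.
# `bs` and `acc` are never real values; they remain symbolic names in
# the residual term the "specializer" emits.
def specialize(n, i=0, acc="acc"):
    if i == n:
        # equal? i n is decidable because both counters are known, so
        # the specializer reduces it and emits the success branch.
        return f"ok (reverse {acc}) bs{i}"
    # The stream is abstract, so the matchList residual must be kept,
    # yet the recursive call is still unrolled with the next counter.
    inner = specialize(n, i + 1, f"(pair h{i} {acc})")
    return f"matchList (err eof) (h{i} r{i} : {inner}) bs{i}"

# The residual grows with the counter despite no stream being supplied.
print(len(specialize(2)), len(specialize(16)))
```

The real blowup in tricu was far larger than this linear toy suggests, but the mechanism is the same: counters alone were enough to keep recursion moving.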
Why map-style partial application is safe
A partially-applied list consumer such as:
map (i : append i " world!")
is safe because recursion is blocked on the missing list argument. The function cannot recurse until it sees whether the list is empty or a cons cell.
Safe shape:
waiting for input
recursion blocked until input is supplied
Unsafe shape:
waiting for input
known counters still allow symbolic recursion
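The safe shape can be sketched in Python (a hypothetical translation, not tricu code): the partially applied consumer is just a closure waiting for its list, and no recursion can happen until the list arrives and can be case-analyzed.

```python
def my_map(f):
    # Partial application returns `go`, which can do nothing until it
    # inspects whether its argument is empty or a cons cell.
    def go(xs):
        if not xs:             # empty case: recursion stops here
            return []
        h, *r = xs             # cons case: one element, then the rest
        return [f(h)] + go(r)
    return go

greet = my_map(lambda i: i + " world!")   # safe: blocked on the list
print(greet(["hello", "goodbye"]))
```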
Fix
Put the consumed data first in the recursive worker and make the first major operation inspect that data.
Corrected shape:
readBytes_ = y (self bs n i original acc :
matchList
(matchBool
(ok (reverse acc) bs)
(err errUnexpectedEof original)
(equal? i n))
(h r :
matchBool
(ok (reverse acc) bs)
(self r n (succ i) original (pair h acc))
(equal? i n))
bs)
readBytes = (n bs : readBytes_ bs n 0 bs t)
Now:
readBytes 2
becomes:
bs : readBytes_ bs 2 0 bs t
Since bs is abstract and the worker immediately performs:
matchList ... bs
evaluation blocks at the data boundary instead of unrolling the counter loop.
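A direct Python transcription of the corrected, data-first shape (the names are hypothetical; tricu's ok/err results are modeled as tagged tuples, and since Python appends cheaply, no final reverse is needed):

```python
def read_bytes(n, bs):
    # Data-first worker: the stream is inspected before the counter is
    # consulted, mirroring `readBytes_ bs n i original acc`.
    def go(bs, i, acc):
        if not bs:                       # matchList: empty case
            if i == n:
                return ("ok", acc, [])
            return ("err", "unexpected EOF")
        if i == n:                       # enough bytes consumed
            return ("ok", acc, bs)
        h, *rest = bs                    # matchList: cons case
        return go(rest, i + 1, acc + [h])
    return go(bs, 0, [])

print(read_bytes(2, [1, 2, 3]))   # fully applied success case
```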
General rule
For recursive consumers, the consumed structure should drive evaluation.
Prefer:
worker = y (self input control state :
matchInput
baseCase
(piece rest : ... self rest control nextState ...)
input)
Avoid:
worker = y (self control state input :
if controlDone
done
(... self nextControl nextState rest ...))
In practice:
worker input control state
is safer than:
worker control state input
Accumulators
Be careful not to finalize or transform an abstract accumulator too early.
For example:
ok (reverse acc) bs
is fine when reached after concrete input has driven the recursion, but it can become pathological if reached while acc is still abstract.
Guidelines:
- Accumulate cheaply during recursion.
- Finalize, reverse, or validate only after input has forced the function to a concrete success point.
- Do not let counters select a success branch while the accumulator is still abstract.
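In a language where prepending is the cheap operation, as with tricu's pair, the guidelines look like this sketch (hypothetical helper, modeled on lists): the accumulator is built backwards during recursion and reversed exactly once, at a success point that concrete input has forced.

```python
def collect(n, bs):
    def go(bs, i, acc):
        if not bs:
            # The success branch is only selected after the input itself
            # has been walked; the single reverse happens here, not once
            # per step, and never while the input is unknown.
            return list(reversed(acc)) if i == n else None
        if i == n:
            return list(reversed(acc))
        return go(bs[1:], i + 1, [bs[0]] + acc)   # cheap prepend
    return go(bs, 0, [])
```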
Parser guidance
For byte or parser consumers, prefer streaming over global slicing of unknown input.
Prefer:
read one byte
compare or accumulate
recurse on rest
Avoid relying on:
taken = bytesTake n bs
rest = bytesDrop n bs
enough = equal? (bytesLength taken) n
The slice-based version may be correct on concrete input but can behave badly when partially applied over abstract input.
Streaming alone is not enough; the recursive worker must also be data-first.
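Both points can be sketched in Python over plain lists (the function names are hypothetical stand-ins for the bytes helpers; in strict Python the two versions compute the same result, and the difference only matters to an evaluator normalizing over abstract input):

```python
# Slice-based (avoid): every operation is a global function of the
# whole input, so nothing forces the input before the length check.
def read_n_slicing(n, bs):
    taken, rest = bs[:n], bs[n:]
    if len(taken) != n:
        return ("err", "unexpected EOF")
    return ("ok", taken, rest)

# Streaming and data-first (prefer): the input is inspected first, one
# element is read per step, and recursion follows the input's shape.
def read_n_streaming(n, bs):
    if not bs:                           # inspect the input first
        return ("ok", [], []) if n == 0 else ("err", "unexpected EOF")
    if n == 0:
        return ("ok", [], bs)
    result = read_n_streaming(n - 1, bs[1:])
    if result[0] == "err":
        return result
    return ("ok", [bs[0]] + result[1], result[2])
```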
Checklist
When writing a recursive consumer, ask:
- What structure is consumed?
- What argument should block recursion when unknown?
- Are counters available before the consumed structure?
- Could partial application specialize the loop before data arrives?
- Does any branch process an abstract accumulator or rest value?
- Does the worker put consumed data before counters and state?
Safe and unsafe examples
Safe:
readU8
bs : readU8 bs
readBytes 2 [(1) (2) (3)]
(readBytes 2) [(1) (2) (3)]
map (i : append i " world!")
Unsafe before the data-first rewrite:
readBytes 2
readBytes_ 2 0
Implication for Arborix
Arborix parsers will include many recursive consumers:
- read N bytes
- read N section records
- scan records for an ID
- parse node records
- validate closures
These should use data-first recursive workers.
Avoid:
readSectionRecords_ count index bs acc
Prefer:
readSectionRecords_ bs count index acc
Short rule
Put consumed data first in recursive workers.
Let data shape drive recursion.
Do not let counters unroll over abstract input.