Reorder recursive byte-stream consumers so the consumed input is inspected before loop-control arguments can drive evaluation. Previously, partially applying `readBytes` to a known count, such as `readBytes 2`, allowed the evaluator to specialize the recursive worker using known counter values while the byte stream was still abstract. This caused symbolic recursion over unknown input and produced an enormous normal form. The recursive worker now takes the byte stream first and immediately case-analyzes it. As a result, partial application blocks at the input boundary instead of unrolling the counter loop. This preserves the fully-applied behavior of `readBytes`, while making partial application such as `readBytes 2` normalize safely.
Recursive Consumer Argument Order
Core issue
Partial application is generally fine in tricu. The problem appears with recursive consumer functions when loop-control arguments are known before the consumed data is available.
The concrete case was readBytes.
This worked:
(readBytes 2) [(1) (2) (3)]
This used to explode in space:
readBytes 2
At first this looked like a general partial-application problem, but it was not. Other partial applications, such as partially applying map, normalized safely. The issue was the argument order and recursive shape of readBytes_.
What went wrong
The original worker had loop-control arguments before the byte stream:
readBytes_ = y (self n i bs original acc : ...)
readBytes = (n bs : readBytes_ n 0 bs bs t)
After partially applying:
readBytes 2
the evaluator knew:
n = 2
i = 0
but did not know:
bs
original
acc
Because the counter values were known, the evaluator could reduce checks like:
equal? i n
and begin unrolling recursion symbolically before the byte stream existed. That produced a large residual tree describing every possible stream case, rest-of-stream value, and accumulator state.
The bug was not recursion itself. The bug was allowing counters to drive recursion while the consumed structure was still abstract.
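To see how known counters drive unrolling, here is a toy specializer in Python. This is an illustrative model only, not tricu's actual evaluator: the stream and accumulator stay as symbolic strings, while the decidable counter test is reduced, so the residual term keeps growing even though no input was ever supplied.

```python
# Toy model of counter-driven specialization over an abstract stream.
# `bs` and `acc` are never real values; they remain symbolic names in
# the residual term the "specializer" emits.
def specialize(n, i=0, acc="acc"):
    if i == n:
        # equal? i n is decidable because both counters are known, so
        # the specializer reduces it and emits the success branch.
        return f"ok (reverse {acc}) bs{i}"
    # The stream is abstract, so the matchList residual must be kept,
    # yet the recursive call is still unrolled with the next counter.
    inner = specialize(n, i + 1, f"(pair h{i} {acc})")
    return f"matchList (err eof) (h{i} r{i} : {inner}) bs{i}"

# The residual grows with the counter despite no stream being supplied.
print(len(specialize(2)), len(specialize(16)))
```

The real blowup in tricu was far larger than this linear toy suggests, but the mechanism is the same: counters alone were enough to keep recursion moving.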
Why map-style partial application is safe
A partially-applied list consumer such as:
map (i : append i " world!")
is safe because recursion is blocked on the missing list argument. The function cannot recurse until it sees whether the list is empty or a cons cell.
Safe shape:
waiting for input
recursion blocked until input is supplied
Unsafe shape:
waiting for input
known counters still allow symbolic recursion
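The safe shape can be sketched in Python (a hypothetical translation, not tricu code): the partially applied consumer is just a closure waiting for its list, and no recursion can happen until the list arrives and can be case-analyzed.

```python
def my_map(f):
    # Partial application returns `go`, which can do nothing until it
    # inspects whether its argument is empty or a cons cell.
    def go(xs):
        if not xs:             # empty case: recursion stops here
            return []
        h, *r = xs             # cons case: one element, then the rest
        return [f(h)] + go(r)
    return go

greet = my_map(lambda i: i + " world!")   # safe: blocked on the list
print(greet(["hello", "goodbye"]))
```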
Fix
Put the consumed data first in the recursive worker and make the first major operation inspect that data.
Corrected shape:
readBytes_ = y (self bs n i original acc :
matchList
(matchBool
(ok (reverse acc) bs)
(err errUnexpectedEof original)
(equal? i n))
(h r :
matchBool
(ok (reverse acc) bs)
(self r n (succ i) original (pair h acc))
(equal? i n))
bs)
readBytes = (n bs : readBytes_ bs n 0 bs t)
Now:
readBytes 2
becomes:
bs : readBytes_ bs 2 0 bs t
Since bs is abstract and the worker immediately performs:
matchList ... bs
evaluation blocks at the data boundary instead of unrolling the counter loop.
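A direct Python transcription of the corrected, data-first shape (the names are hypothetical; tricu's ok/err results are modeled as tagged tuples, and since Python appends cheaply, no final reverse is needed):

```python
def read_bytes(n, bs):
    # Data-first worker: the stream is inspected before the counter is
    # consulted, mirroring `readBytes_ bs n i original acc`.
    def go(bs, i, acc):
        if not bs:                       # matchList: empty case
            if i == n:
                return ("ok", acc, [])
            return ("err", "unexpected EOF")
        if i == n:                       # enough bytes consumed
            return ("ok", acc, bs)
        h, *rest = bs                    # matchList: cons case
        return go(rest, i + 1, acc + [h])
    return go(bs, 0, [])

print(read_bytes(2, [1, 2, 3]))   # fully applied success case
```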
General rule
For recursive consumers, the consumed structure should drive evaluation.
Prefer:
worker = y (self input control state :
matchInput
baseCase
(piece rest : ... self rest control nextState ...)
input)
Avoid:
worker = y (self control state input :
if controlDone
done
(... self nextControl nextState rest ...))
In practice:
worker input control state
is safer than:
worker control state input
Accumulators
Be careful not to finalize or transform an abstract accumulator too early.
For example:
ok (reverse acc) bs
is fine when reached after concrete input has driven the recursion, but it can become pathological if reached while acc is still abstract.
Guidelines:
- Accumulate cheaply during recursion.
- Finalize, reverse, or validate only after input has forced the function to a concrete success point.
- Do not let counters select a success branch while the accumulator is still abstract.
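In a language where prepending is the cheap operation, as with tricu's pair, the guidelines look like this sketch (hypothetical helper, modeled on lists): the accumulator is built backwards during recursion and reversed exactly once, at a success point that concrete input has forced.

```python
def collect(n, bs):
    def go(bs, i, acc):
        if not bs:
            # The success branch is only selected after the input itself
            # has been walked; the single reverse happens here, not once
            # per step, and never while the input is unknown.
            return list(reversed(acc)) if i == n else None
        if i == n:
            return list(reversed(acc))
        return go(bs[1:], i + 1, [bs[0]] + acc)   # cheap prepend
    return go(bs, 0, [])
```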
Parser guidance
For byte or parser consumers, prefer streaming over global slicing of unknown input.
Prefer:
read one byte
compare or accumulate
recurse on rest
Avoid relying on:
taken = bytesTake n bs
rest = bytesDrop n bs
enough = equal? (bytesLength taken) n
The slice-based version may be correct on concrete input but can behave badly when partially applied over abstract input.
Streaming alone is not enough; the recursive worker must also be data-first.
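Both points can be sketched in Python over plain lists (the function names are hypothetical stand-ins for the bytes helpers; in strict Python the two versions compute the same result, and the difference only matters to an evaluator normalizing over abstract input):

```python
# Slice-based (avoid): every operation is a global function of the
# whole input, so nothing forces the input before the length check.
def read_n_slicing(n, bs):
    taken, rest = bs[:n], bs[n:]
    if len(taken) != n:
        return ("err", "unexpected EOF")
    return ("ok", taken, rest)

# Streaming and data-first (prefer): the input is inspected first, one
# element is read per step, and recursion follows the input's shape.
def read_n_streaming(n, bs):
    if not bs:                           # inspect the input first
        return ("ok", [], []) if n == 0 else ("err", "unexpected EOF")
    if n == 0:
        return ("ok", [], bs)
    result = read_n_streaming(n - 1, bs[1:])
    if result[0] == "err":
        return result
    return ("ok", [bs[0]] + result[1], result[2])
```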
Checklist
When writing a recursive consumer, ask:
- What structure is consumed?
- What argument should block recursion when unknown?
- Are counters available before the consumed structure?
- Could partial application specialize the loop before data arrives?
- Does any branch process an abstract accumulator or rest value?
- Does the worker put consumed data before counters and state?
Safe and unsafe examples
Safe:
readU8
bs : readU8 bs
readBytes 2 [(1) (2) (3)]
(readBytes 2) [(1) (2) (3)]
map (i : append i " world!")
Unsafe before the data-first rewrite:
readBytes 2
readBytes_ 2 0
Implication for Arborix
Arborix parsers will include many recursive consumers:
- read N bytes
- read N section records
- scan records for an ID
- parse node records
- validate closures
These should use data-first recursive workers.
Avoid:
readSectionRecords_ count index bs acc
Prefer:
readSectionRecords_ bs count index acc
Short rule
Put consumed data first in recursive workers.
Let data shape drive recursion.
Do not let counters unroll over abstract input.