* utils: document binarySearch
* nes-datagen: generate training data from continuous recordings
Continuous enhanced telemetry now ships sliding-window recordings that, unlike per-request alternative-action recordings, carry no requestTime. The datagen pipeline needs a point at which to split each recording's edit history into a "before" and an "after", so this adds a pluggable pivot strategy (starting with Random, selectable via --pivot-strategy) and a new continuous/ pipeline module that replays a recording at the chosen pivot to produce a processed row.
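A minimal sketch of the pivot idea described above, not the code in this change. `RecordingSketch`, `PivotStrategy`, `RandomPivotStrategy`, and `splitAtPivot` are hypothetical names, and the real IContinuousRecording shape is assumed to be richer than a list of edits.

```typescript
// Hypothetical stand-in for a continuous recording: an ordered edit history, oldest first.
interface RecordingSketch {
	readonly edits: readonly string[];
}

// A pivot strategy picks the index at which the history is split.
interface PivotStrategy {
	// Returns an index in [1, editCount - 1], so both sides are non-empty.
	pickPivot(editCount: number): number;
}

// Random strategy with an injectable RNG so a run can be made reproducible.
class RandomPivotStrategy implements PivotStrategy {
	private readonly random: () => number;

	constructor(random: () => number = Math.random) {
		this.random = random;
	}

	pickPivot(editCount: number): number {
		if (editCount < 2) {
			throw new Error(`need at least 2 edits to pick a pivot, got ${editCount}`);
		}
		// random() is in [0, 1), so the result is in [1, editCount - 1].
		return 1 + Math.floor(this.random() * (editCount - 1));
	}
}

// Splits the edit history into the part before the pivot and the part after it.
function splitAtPivot(recording: RecordingSketch, strategy: PivotStrategy) {
	const pivot = strategy.pickPivot(recording.edits.length);
	return {
		pivot,
		before: recording.edits.slice(0, pivot),
		after: recording.edits.slice(pivot),
	};
}
```

Keeping the strategy behind an interface is what would let a flag such as --pivot-strategy select other implementations later without touching the replay step.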
Along the way this consolidates the pipeline's error and index handling: a shared WithRowIndex<T> replaces the ad-hoc { originalRowIndex, ... } pairs, per-record processing returns Result<IProcessedRow, Error> instead of field-presence unions, and failures surface as original Error objects (no string round-tripping). The telemetry sender's continuous payload is now the documented IContinuousRecording type.
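To illustrate the consolidated error and index handling, here is a hedged sketch. The type names follow the message above, but their shapes are assumptions, and `ProcessedRowSketch` and `processRecord` are hypothetical.

```typescript
// Assumed shape: a value paired with the index of the row it came from.
type WithRowIndex<T> = { readonly originalRowIndex: number; readonly value: T };

// Assumed shape: a discriminated union instead of a field-presence union.
type Result<T, E> =
	| { readonly ok: true; readonly value: T }
	| { readonly ok: false; readonly error: E };

// Hypothetical stand-in for IProcessedRow.
interface ProcessedRowSketch {
	readonly text: string;
}

// Processes one raw record and keeps its original row index on success and on failure.
function processRecord(raw: WithRowIndex<string>): WithRowIndex<Result<ProcessedRowSketch, Error>> {
	const { originalRowIndex } = raw;
	try {
		const parsed: unknown = JSON.parse(raw.value);
		const text = typeof parsed === 'object' && parsed !== null
			? (parsed as { text?: unknown }).text
			: undefined;
		if (typeof text !== 'string') {
			throw new Error('record has no string "text" field');
		}
		return { originalRowIndex, value: { ok: true, value: { text } } };
	} catch (e) {
		// Keep the original Error object rather than converting it to a string and back.
		const error = e instanceof Error ? e : new Error(String(e));
		return { originalRowIndex, value: { ok: false, error } };
	}
}
```

Because the failure branch carries the thrown Error itself, callers can still inspect its type and stack when reporting diagnostics.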
Co-authored-by: Copilot App <223556219+Copilot@users.noreply.github.com>
* nes-datagen: label alt-action replay errors by originalRowIndex
Address PR review: the alternative-action path mislabeled diagnostics when
earlier records failed to parse.
- processAllRows: push replay errors with the row's true `originalRowIndex`
instead of its position in the filtered `rows` array (parse failures drop
records from `rows`, so the two diverge).
- loadAndProduceProcessedRows: resolve `languageForRow` via an
`originalRowIndex`-keyed Map rather than positional `rows[i]`, matching how
callers pass `e.originalRowIndex`.
- Clarify the `recordCount` doc: it counts successfully-parsed records (parse
failures are counted separately in `parseErrors`).
- Add a regression spec asserting replay errors carry the row index, not the
array position.
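The divergence between row index and array position can be sketched as follows. `ParsedRowSketch` and the sample data are hypothetical; only the `originalRowIndex` and `languageForRow` names come from the notes above.

```typescript
// Hypothetical parsed row that remembers where it sat in the input.
interface ParsedRowSketch {
	readonly originalRowIndex: number;
	readonly language: string;
}

// Input records 0 and 2 parsed; record 1 failed, so `rows` has only two entries.
const rows: ParsedRowSketch[] = [
	{ originalRowIndex: 0, language: 'typescript' },
	{ originalRowIndex: 2, language: 'python' },
];

// Key the lookup by original index rather than by position in `rows`.
const rowsByOriginalIndex = new Map<number, ParsedRowSketch>(
	rows.map((row): [number, ParsedRowSketch] => [row.originalRowIndex, row])
);

function languageForRow(originalRowIndex: number): string | undefined {
	return rowsByOriginalIndex.get(originalRowIndex)?.language;
}

// A positional lookup such as rows[2] is undefined here, even though
// the record with original index 2 parsed successfully.
```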
---------
* Allow invoking simulationMain with alternative action input
* Address review comments: rename CLI opts, extract pipeline, fix correctness issues
- Rename CLI options with --train- prefix (--train-input, --train-strategy,
--train-out, --train-row-offset, --train-worker) and document all options
- Extract runInputPipeline/runInputPipelineParallel to test/pipeline/trainPipeline.ts
- Preserve original row index through parse/replay/prompt pipeline to fix
sample numbering drift when rows are filtered out
- Fix parseSuggestedEdit: use JSON.parse for escaped text, handle missing delimiter
- Fix line number regex to accept optional space after | (WithoutSpace format)
- Clamp concurrency to >= 1, type samples as ISample[], wrap dispose in try/finally
- Gate verbose logging in loadAndParseInput behind verbose flag
- Use splitLines from existing utility instead of local duplicate
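Two of the smaller fixes in the list above can be sketched like this. The line format ("<number>|<text>" with an optional single space after the bar) and both function names are assumptions for illustration, not the actual implementation.

```typescript
// Assumed line format: "<number>|<text>" or "<number>| <text>".
// The " ?" makes the single space after the bar optional.
const NUMBERED_LINE = /^(\d+)\| ?(.*)$/;

function parseNumberedLine(line: string): { lineNumber: number; text: string } | undefined {
	const match = NUMBERED_LINE.exec(line);
	if (!match) {
		return undefined;
	}
	return { lineNumber: Number(match[1]), text: match[2] ?? '' };
}

// Clamp a requested concurrency to a whole number >= 1.
// Non-finite input (NaN, Infinity) falls back to 1.
function clampConcurrency(requested: number): number {
	return Number.isFinite(requested) ? Math.max(1, Math.floor(requested)) : 1;
}
```

Only one space is consumed after the bar, so any further leading whitespace stays part of the line text.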
* move nes-datagen to a subcommand
* more code reuse around setting promptStrategy and model config
* Address review: use ResponseFormat, Limiter, assertNever, and raw messages
* minor refactor runPipeline
* finalize
* use POT instead of custom code
* move files from script/ to test/pipeline/
---------
Co-authored-by: ulugbekna <ulugbekna@gmail.com>