Ulugbek Abdullaev 6bd7400f1c nes-datagen: generate training data from continuous recordings (#323855)
* utils: document binarySearch

* nes-datagen: generate training data from continuous recordings

Continuous enhanced telemetry now ships sliding-window recordings that, unlike per-request alternative-action recordings, carry no requestTime. The datagen pipeline needs a point at which to split each recording's edit history into "before" and "after", so this adds a pluggable pivot strategy (starting with Random, selectable via --pivot-strategy) and a new continuous/ pipeline module that replays a recording at the chosen pivot to produce a processed row.
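
A minimal sketch of what a pluggable pivot strategy could look like. The names `PivotStrategy`, `RandomPivotStrategy`, and `splitAtPivot` are hypothetical and not taken from the change itself, and the recording is modeled as a plain array of entries rather than the real recording type:

```typescript
// Hypothetical interface: the actual strategy contract in the PR may differ.
interface PivotStrategy {
  /** Returns a split index in [1, entryCount - 1], or undefined if no split is possible. */
  pickPivot(entryCount: number): number | undefined;
}

/** Random strategy with an injectable random source so that tests can be deterministic. */
class RandomPivotStrategy implements PivotStrategy {
  private readonly random: () => number;

  constructor(random: () => number = Math.random) {
    this.random = random;
  }

  pickPivot(entryCount: number): number | undefined {
    if (entryCount < 2) {
      return undefined; // need at least one entry on each side of the pivot
    }
    // random() is in [0, 1), so the result lies in [1, entryCount - 1].
    return 1 + Math.floor(this.random() * (entryCount - 1));
  }
}

/** Splits a recording's entries into the history before and after the chosen pivot. */
function splitAtPivot<T>(
  entries: readonly T[],
  strategy: PivotStrategy,
): { before: T[]; after: T[] } | undefined {
  const pivot = strategy.pickPivot(entries.length);
  if (pivot === undefined) {
    return undefined;
  }
  return { before: entries.slice(0, pivot), after: entries.slice(pivot) };
}
```

Keeping the pivot at least one entry away from either end guarantees that both halves are non-empty; whether the real pipeline imposes that constraint is an assumption of this sketch.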

Along the way this consolidates the pipeline's error and index handling: a shared WithRowIndex<T> replaces the ad-hoc { originalRowIndex, ... } pairs, per-record processing returns Result<IProcessedRow, Error> instead of field-presence unions, and failures surface as original Error objects (no string round-tripping). The telemetry sender's continuous payload is now the documented IContinuousRecording type.
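
For illustration, the consolidated shapes described above might look roughly like this. Only the names `WithRowIndex`, `Result`, `IProcessedRow`, and `originalRowIndex` come from the commit message; the field layout, the placeholder `IProcessedRow`, and the helpers `processRecord` and `processAll` are assumptions:

```typescript
// Assumed shapes: the real definitions in the PR may differ.
type WithRowIndex<T> = { readonly originalRowIndex: number; readonly value: T };

type Result<T, E> =
  | { readonly ok: true; readonly value: T }
  | { readonly ok: false; readonly error: E };

// Placeholder for the real processed-row type.
interface IProcessedRow {
  readonly text: string;
}

/** Hypothetical per-record step: returns a Result instead of a field-presence union. */
function processRecord(raw: string): Result<IProcessedRow, Error> {
  try {
    const parsed = JSON.parse(raw) as { text?: unknown };
    if (typeof parsed.text !== "string") {
      return { ok: false, error: new Error("missing text field") };
    }
    return { ok: true, value: { text: parsed.text } };
  } catch (e) {
    // Keep the original Error object rather than flattening it to a string.
    return { ok: false, error: e instanceof Error ? e : new Error(String(e)) };
  }
}

/** Tags every success and failure with the index of the input line it came from. */
function processAll(lines: readonly string[]): {
  rows: WithRowIndex<IProcessedRow>[];
  errors: WithRowIndex<Error>[];
} {
  const rows: WithRowIndex<IProcessedRow>[] = [];
  const errors: WithRowIndex<Error>[] = [];
  lines.forEach((line, originalRowIndex) => {
    const result = processRecord(line);
    if (result.ok) {
      rows.push({ originalRowIndex, value: result.value });
    } else {
      errors.push({ originalRowIndex, value: result.error });
    }
  });
  return { rows, errors };
}
```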

Co-authored-by: Copilot App <223556219+Copilot@users.noreply.github.com>

* nes-datagen: label alt-action replay errors by originalRowIndex

Address PR review: the alternative-action path mislabeled diagnostics when
earlier records failed to parse.

- processAllRows: push replay errors with the row's true `originalRowIndex`
  instead of its position in the filtered `rows` array (parse failures make
  `rows` sparse, so the two diverge).
- loadAndProduceProcessedRows: resolve `languageForRow` via an
  `originalRowIndex`-keyed Map rather than positional `rows[i]`, matching how
  callers pass `e.originalRowIndex`.
- Clarify the `recordCount` doc: it counts successfully-parsed records (parse
  failures are counted separately in `parseErrors`).
- Add a regression spec asserting replay errors carry the row index, not the
  array position.
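
A small sketch of why positional lookup breaks once parse failures make `rows` sparse. The `ParsedRow` shape and `makeLanguageLookup` helper are hypothetical; only `originalRowIndex` and `languageForRow` are names from this change:

```typescript
// Hypothetical row shape used only for this illustration.
interface ParsedRow {
  originalRowIndex: number;
  language: string;
}

/** Builds a lookup keyed by originalRowIndex instead of array position. */
function makeLanguageLookup(
  rows: readonly ParsedRow[],
): (originalRowIndex: number) => string | undefined {
  const byIndex = new Map<number, string>();
  for (const row of rows) {
    byIndex.set(row.originalRowIndex, row.language);
  }
  return (index) => byIndex.get(index);
}

// Original row 1 failed to parse, so `rows` holds only original rows 0 and 2.
const rows: ParsedRow[] = [
  { originalRowIndex: 0, language: "typescript" },
  { originalRowIndex: 2, language: "python" },
];
const languageForRow = makeLanguageLookup(rows);

// Positional access: rows[2] is out of bounds, although original row 2 exists.
// Keyed access: languageForRow(2) finds the row regardless of its array position.
```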

Co-authored-by: Copilot App <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot App <223556219+Copilot@users.noreply.github.com>
2026-07-01 18:41:11 +05:00