mirror of
https://github.com/microsoft/vscode.git
synced 2026-05-19 14:49:48 +01:00
bca6df1d1d
* fix(nes-datagen): discard xtab-275 oracle edits outside edit window
The NES model can only edit lines inside the prompt's <|code_to_edit|>
window [K, N). formatAsEditWindowOnly was expanding the window to cover
stray oracle edits, so the assistant text spilled out of the window and
duplicated surrounding context when applied.
Instead, keep edits in order and drop the first edit that isn't fully
contained in [K, N) along with every later edit (later offsets assume
earlier edits were applied). Add unit tests covering both the spillover
repro and the all-in-window control case.
* refactor(nes-datagen): extract filterEditsInsideEditWindow helper
Pull the line-containment filter out of formatAsEditWindowOnly into a
reusable helper, plus a small getEditLineRange utility to dedupe the
offset->line conversion. No behavior change.
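As an illustration of the offset->line conversion described above, here
is a minimal sketch. The OffsetEdit and LineRange shapes and the
offsetToLine helper are assumptions made for this example; the real
getEditLineRange signature in nes-datagen may differ.

```typescript
// Hypothetical shapes for illustration only.
interface OffsetEdit {
    start: number;        // inclusive character offset into the original doc
    endExclusive: number; // exclusive character offset
    newText: string;
}

interface LineRange {
    startLine: number;        // 0-based, inclusive
    endLineExclusive: number; // 0-based, exclusive
}

// The 0-based line of an offset is the number of '\n' characters before it.
function offsetToLine(doc: string, offset: number): number {
    let line = 0;
    for (let i = 0; i < offset && i < doc.length; i++) {
        if (doc.charCodeAt(i) === 10) {
            line++;
        }
    }
    return line;
}

function getEditLineRange(doc: string, edit: OffsetEdit): LineRange {
    const startLine = offsetToLine(doc, edit.start);
    // For a non-empty range the last touched character sits at endExclusive - 1;
    // a pure insertion touches only the line of its start offset.
    const lastOffset = edit.endExclusive > edit.start ? edit.endExclusive - 1 : edit.start;
    return { startLine, endLineExclusive: offsetToLine(doc, lastOffset) + 1 };
}
```

With this convention, deleting a whole terminated line (including its
newline) maps to a one-line range rather than spilling into the next
line.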
* refactor(nes-datagen): route xtab-275 dropped-edit warnings through pipeline logger
Replace the console.warn in formatAsEditWindowOnly with a structured
return shape ({ assistant, droppedCount }) and a ResponseLogger threaded
through generateResponse / generateAllResponses. The pipeline now logs
dropped edits via its existing log callback (so warnings are visible to
dataset curators and captured by the e2e test logs), and surfaces the
count on IGeneratedResponse.droppedEditCount.
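A rough sketch of the structured return shape and the threaded logger.
The ResponseLogger signature, the sketch interfaces, and the message
text are assumptions; only the field names assistant, droppedCount, and
droppedEditCount come from the description above.

```typescript
// Assumed signature: the pipeline's log callback takes a plain string.
type ResponseLogger = (message: string) => void;

// Shape returned by the formatter, per the commit description.
interface FormatResultSketch {
    assistant: string;
    droppedCount: number;
}

// Stand-in for IGeneratedResponse; the real interface has more fields.
interface GeneratedResponseSketch {
    assistant: string;
    droppedEditCount: number;
}

// The formatter no longer warns on its own. The caller decides how to
// report dropped edits and copies the count onto the response.
function generateResponseSketch(
    format: () => FormatResultSketch,
    log: ResponseLogger
): GeneratedResponseSketch {
    const { assistant, droppedCount } = format();
    if (droppedCount > 0) {
        log(`dropped ${droppedCount} oracle edit(s) outside the edit window`);
    }
    return { assistant, droppedEditCount: droppedCount };
}
```

Passing the logger in, rather than calling console.warn inside the
formatter, lets tests capture the warnings with a plain array.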
* fix(nes-datagen): correct netLineChange math for full-line deletions
splitLines("L6\n") returns ['L6', ''] (length 2), so deleting a single
terminated line was counted as -2 lines instead of -1. Compute the delta
by counting newlines in the old segment vs. the new text, which gives
the correct line-count delta for any combination of insertions,
deletions, or replacements without the trailing-empty quirk.
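The newline-counting approach can be sketched as follows. The function
names and signatures are assumptions for illustration, and
String.prototype.split stands in for the repository's splitLines helper
to show the trailing-empty quirk.

```typescript
// Count '\n' characters in a piece of text.
function countNewlines(text: string): number {
    let count = 0;
    for (let i = 0; i < text.length; i++) {
        if (text.charCodeAt(i) === 10) {
            count++;
        }
    }
    return count;
}

// Line-count delta caused by replacing `oldSegment` with `newText`.
function netLineChange(oldSegment: string, newText: string): number {
    return countNewlines(newText) - countNewlines(oldSegment);
}

// The quirk: splitting a terminated line yields a trailing empty element.
const parts = "L6\n".split("\n"); // ["L6", ""], length 2

// Deleting one terminated line removes exactly one line.
const deletion = netLineChange("L6\n", "");        // -1
// Inserting two terminated lines adds two lines.
const insertion = netLineChange("", "a\nb\n");     // 2
// Replacing a line with another line leaves the count unchanged.
const replacement = netLineChange("L6\n", "X\n");  // 0
```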
* test(nes-datagen): cover line-changing and boundary-straddling oracle edits
Add cases for in-window inserts that grow the slice, in-window deletes
that shrink it, edits that straddle the window boundary (must be
discarded), and the all-edits-outside fallback (assistant equals the
original window slice, with droppedCount equal to the total number of
edits).
* fix(nes-datagen): filter oracle edits independently against edit window
Oracle edits come from StringEdit.compose().replacements upstream
(processor.ts), and applyEditsToContent applies them all to the original
doc by sorting offset-descending, so each edit's offsets are independent
of the other edits. The earlier drop-and-truncate rule (which discarded
every edit after the first out-of-window one) was over-conservative and
threw away in-window edits that were perfectly applicable.
Filter each edit on its own merits and update the test that relied on
the truncate behavior to assert independent filtering instead.
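To make the difference between the two rules concrete, here is a small
sketch that contrasts them on edits already converted to line ranges.
All type and function names below are hypothetical and do not reflect
the actual filterEditsInsideEditWindow signature.

```typescript
// Hypothetical shapes: edits and the window are half-open line ranges.
interface LineEdit {
    startLine: number;
    endLineExclusive: number;
    newText: string;
}

interface EditWindow {
    startLine: number;        // K
    endLineExclusive: number; // N
}

interface FilterResult {
    kept: LineEdit[];
    droppedCount: number;
}

// An edit qualifies only if it is fully contained in [K, N).
function isInsideWindow(edit: LineEdit, w: EditWindow): boolean {
    return edit.startLine >= w.startLine && edit.endLineExclusive <= w.endLineExclusive;
}

// Earlier rule: stop at the first out-of-window edit and drop the rest.
function truncateAtFirstOutside(edits: readonly LineEdit[], w: EditWindow): FilterResult {
    const firstOutside = edits.findIndex(e => !isInsideWindow(e, w));
    const kept = firstOutside === -1 ? [...edits] : edits.slice(0, firstOutside);
    return { kept, droppedCount: edits.length - kept.length };
}

// Current rule: judge every edit on its own, since all offsets refer to
// the original document.
function filterIndependently(edits: readonly LineEdit[], w: EditWindow): FilterResult {
    const kept = edits.filter(e => isInsideWindow(e, w));
    return { kept, droppedCount: edits.length - kept.length };
}

const editWindow: EditWindow = { startLine: 5, endLineExclusive: 10 };
const sampleEdits: LineEdit[] = [
    { startLine: 5, endLineExclusive: 6, newText: "a" },   // inside
    { startLine: 12, endLineExclusive: 13, newText: "b" }, // outside
    { startLine: 8, endLineExclusive: 9, newText: "c" },   // inside
];

const truncated = truncateAtFirstOutside(sampleEdits, editWindow);
const independent = filterIndependently(sampleEdits, editWindow);
```

In this example the truncate rule keeps only the first edit, while
independent filtering also keeps the third.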
* feat(nes-datagen): surface dropped-edit count in pipeline summary
Sum droppedEditCount across all generated responses and append it to
the [4/5] log line when nonzero, so dataset curators running the
pipeline can see at a glance how many oracle edits were discarded as
out-of-window. Silent omission was the original bug; making it visible
in the summary closes the feedback loop.
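One way the summary aggregation could look, as a sketch. The function
name, the base log text, and the suffix wording are placeholders and
not the pipeline's actual output; only droppedEditCount and the [4/5]
prefix come from the description above.

```typescript
// Stand-in for IGeneratedResponse; only the relevant field is modeled.
interface ResponseWithDropCount {
    droppedEditCount?: number;
}

// Sum dropped edits across responses and append the total only when nonzero.
function formatSummaryLine(base: string, responses: readonly ResponseWithDropCount[]): string {
    const total = responses.reduce((sum, r) => sum + (r.droppedEditCount ?? 0), 0);
    return total > 0
        ? `${base} (${total} oracle edit(s) dropped as out-of-window)`
        : base;
}

// Placeholder base text for the [4/5] step.
const withDrops = formatSummaryLine("[4/5] responses generated", [
    { droppedEditCount: 2 },
    { droppedEditCount: 0 },
    { droppedEditCount: 1 },
]);
const withoutDrops = formatSummaryLine("[4/5] responses generated", [{ droppedEditCount: 0 }, {}]);
```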