mirror of
https://github.com/microsoft/vscode.git
synced 2026-07-01 20:17:05 +01:00
a3e359ab0c
* nes-datagen: add cursor-jump (NCLP) task
Extend nes-datagen with a next-cursor-line prediction task alongside
the existing xtab path. Detects the user's next intentional cursor
move after the request bookmark and emits a training sample with the
production cursor-prediction prompt + the observed jump as the
expected response.
Three sub-modes via --sample-task:
- cursor-same-file: a jump farther than N lines from cursor at
request time
- cursor-cross-file: focus/selection on a different file
- cursor-both: either of the above
Reuses the production cursor-prediction prompt by capturing it via
the telemetry builder and a no-op fetcher; the cross-file target
line is resolved from a request-time content snapshot + post-request
replay so previously-opened targets get a correct line number
instead of being silently labelled :0.
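
To make the three sub-modes concrete, here is a minimal sketch of the classification rule. `classifyJump`, `ObservedJump` and `JumpThresholds` are hypothetical names, not the helpers added by this change. The sketch also assumes the threshold is an inclusive minimum distance, which the commit text does not state.

```typescript
// Hypothetical shapes for illustration; the real detector types are not shown in this commit message.
type SampleTaskMode = 'cursor-same-file' | 'cursor-cross-file' | 'cursor-both';

interface ObservedJump {
	readonly fromFile: string;
	readonly fromLine: number;
	readonly toFile: string;
	readonly toLine: number;
}

interface JumpThresholds {
	readonly minAbove: number; // mirrors --same-file-jump-min-above
	readonly minBelow: number; // mirrors --same-file-jump-min-below
}

type JumpKind = 'same-file' | 'cross-file' | undefined;

function classifyJump(mode: SampleTaskMode, jump: ObservedJump, thresholds: JumpThresholds): JumpKind {
	if (jump.fromFile !== jump.toFile) {
		// A move to another file only counts when the mode allows cross-file samples.
		return mode === 'cursor-same-file' ? undefined : 'cross-file';
	}
	if (mode === 'cursor-cross-file') {
		return undefined;
	}
	const delta = jump.toLine - jump.fromLine;
	// Assumption: a move counts as a jump when its distance is at least the threshold.
	const farEnough = delta < 0 ? -delta >= thresholds.minAbove : delta >= thresholds.minBelow;
	return farEnough ? 'same-file' : undefined;
}
```

With thresholds of 2 above and 5 below, a move from line 10 to line 12 in the same file is not a jump, while a move from line 10 to line 8 is.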
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* nes-datagen: replace SAMPLE_TASK_VALUES tuple with a string enum
Convert the string-union + as-const tuple to a proper NesDatagenSampleTask
string enum. CLI surface is unchanged: string-enum members keep the
kebab-case wire values ('xtab', 'cursor-same-file', ...). All consumers
(dispatch, fixtures, response metadata typing) updated to reference enum
members instead of string literals.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* nes-datagen: lower default --same-file-jump-min-above to 2
Upward cursor jumps (back to a definition, an import, etc.) are
typically tighter than downward jumps after the user has been
writing. Lower the default threshold to match.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* nes-datagen: rename NCLP to cursor-jump throughout
Drop the NCLP abbreviation in favor of the more descriptive
'cursor-jump' name already used in the production xtab provider.
The rename covers the step files (now cursorJumpPromptStep,
cursorJumpResponseStep), the capture request ids,
and all surrounding doc comments / test descriptions.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* nes-datagen: build documentIndexMapping from whole recording
Previously the path map was built from only the
pre-request slice and then re-walked the post-request slice to
backfill any documentEncountered entries that arrived later. Pass the
whole recording into documentIndexMapping instead so the helper sees
every document the user touched in a single pass; the backfill loop is
gone.
splitRecordingAtRequestTime now also returns the full entries array so
both callers can reuse it without re-deriving it from altAction.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* nes-datagen: use shared Result type in cursor-jump detectors
Drop the bespoke { ok, value | reason } discriminated union in
detectJump.ts and reuse the existing Result<T, E> from
src/util/common/result. JumpDetectionResult<T> is now just an alias
for Result<T, string>.
Updates the callers (now using .isOk() and
.err) and the spec file accordingly.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* nes-datagen: strip raw cursor-jump prompt from emitted telemetry
cursorJumpRawMessages and cursorJumpKeptRange were added to
IStatelessNextEditTelemetry so in-process debug / datagen tooling
could read them back via getStatelessNextEditTelemetry(). However
LlmNESTelemetryBuilder.build() spreads ...this._statelessNextEditTelemetry
into the emitted payload, so those two fields would leak to telemetry
sinks. cursorJumpRawMessages can contain full prompt content (source
code), which must never leave the process.
Destructure them out before spreading into the build() payload. They
remain readable via getStatelessNextEditTelemetry() for tooling.
Documented the privacy contract on the IStatelessNextEditTelemetry
field declarations so future edits don't forget.
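
The destructure-before-spread step can be sketched as follows. `SketchTelemetry` and `toEmittedPayload` are hypothetical, heavily reduced stand-ins for IStatelessNextEditTelemetry and LlmNESTelemetryBuilder.build(); only the two cursorJump field names come from this change.

```typescript
// Hypothetical, reduced telemetry shape; the real interface has many more fields.
interface SketchTelemetry {
	readonly requestId: string;
	/** In-process only: may contain source code. */
	readonly cursorJumpRawMessages?: readonly string[];
	/** In-process only. */
	readonly cursorJumpKeptRange?: { readonly start: number; readonly endExclusive: number };
}

type EmittedTelemetry = Omit<SketchTelemetry, 'cursorJumpRawMessages' | 'cursorJumpKeptRange'>;

function toEmittedPayload(telemetry: SketchTelemetry): EmittedTelemetry {
	// Name the private fields explicitly so the rest-spread cannot carry them along.
	const { cursorJumpRawMessages: _rawMessages, cursorJumpKeptRange: _keptRange, ...safeToEmit } = telemetry;
	return safeToEmit;
}
```

The object passed in is left untouched, so in-process tooling can still read both fields from it.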
Addresses copilot-pull-request-reviewer feedback on PR #320113.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* nes-datagen: fail cross-file detection when no selection lands on target
detectCrossFileJump previously returned Result.ok with toLine
undefined when only a focused event was seen for the target doc (no
selectionChanged). That left generateCrossFileResponse to drop the
sample later while the detector still reported a successful jump.
Treat focused-without-selectionChanged as a failed detection
('crossFileTargetNoSelection') so callers can skip early, and tighten
ICrossFileJump.toLine to non-undefined now that ok results always
have a usable line number. Removes the dead error path in
generateCrossFileResponse.
Adds a regression test that focused-only triggers the new error.
Addresses copilot-pull-request-reviewer feedback on PR #320113.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* nes-datagen: capture cursor-jump prompt via logContext, not telemetry
The datagen pipeline previously stashed the raw cursor-jump prompt and
keptRange on IStatelessNextEditTelemetry so cursorJumpPromptStep.ts could
read them back via LlmNESTelemetryBuilder.getStatelessNextEditTelemetry().
That leaked raw prompts into the telemetry payload (worked around by a
destructure-strip hack in LlmNESTelemetryBuilder.build()) and was
asymmetric with the xtab path, which captures via
InlineEditRequestLogContext.rawMessages.
Move the cursor-jump capture vehicle onto InlineEditRequestLogContext to
match xtab:
- Add cursorJumpRawMessages / cursorJumpKeptRange fields and
setCursorJumpPrompt(messages, keptRange) to InlineEditRequestLogContext.
- XtabNextCursorPredictor.predictNextCursorPosition now takes a logContext
parameter and writes to it directly. The xtabProvider callsite passes
the same logContext it already had in scope.
- cursorJumpPromptStep reads from logContext instead of the telemetry
builder.
- Remove cursorJumpRawMessages / cursorJumpKeptRange from
IStatelessNextEditTelemetry, plus the corresponding setter/getter on
StatelessNextEditTelemetryBuilder and the getter on
LlmNESTelemetryBuilder.
- Revert the destructure-strip hack in LlmNESTelemetryBuilder.build().
The pre-existing cursorJumpPrompt telemetry field (JSON-stringified, fed
by setCursorJumpPrompt(messages)) is intentional and unchanged.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* nes-datagen: cursor-jump ground truth is first user EDIT, not cursor landing
Selection-based detection treated peek, navigation, IDE auto-scroll, and
recursive cursor settling as if they were the user's next intended edit
location. The model's job is to predict where the user will EDIT next, so
key off the first 'changed' event after the request bookmark instead.
Same-file detector:
- Walks for the first 'changed' on the active doc; uses the first edit's
start offset to compute toLine; applies the linesAbove/linesBelow
threshold. Bails with editsAnotherFileFirst when a non-active doc is
edited first (lets the cross-file detector claim the sample in
cursor-both mode). 'selectionChanged' is no longer consulted, so the
settle-after-edit filter is gone too; it was a workaround for the
selection-based approach.
Cross-file detector:
- Walks for the first 'changed' on a non-active doc; uses the first
edit's start offset, resolved against the target doc's snapshot
just-before applying the event. Drops focused / selectionChanged
heuristics and the crossFileTargetNoSelection error path (a focused
event without an edit no longer counts; background peek can't
pollute the dataset).
buildLineResolver: tightened i <= entryIndex to i < entryIndex so the
resolver returns the pre-edit line when entryIndex is itself a 'changed'
event. The bound is equivalent for the old selectionChanged caller.
Spec: switched ground-truth events from selChanged to changed; added
coverage for first-edit-of-multi-edit, editsAnotherFileFirst, and
active-doc-then-other-doc ordering.
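
A minimal sketch of the same-file rule described above: walk the post-request entries, ignore everything except 'changed', and resolve the target line against the pre-edit text. The entry shape, `findFirstSameFileEdit`, `lineOfOffset` and the 'noEdit' reason are hypothetical; only 'changed', 'selectionChanged', 'focused' and editsAnotherFileFirst are taken from this message. The real detector returns a Result, which this sketch replaces with a plain union to stay self-contained.

```typescript
// Hypothetical, simplified recording entries; the real recording format is richer.
type SketchEntry =
	| { readonly kind: 'changed'; readonly docId: number; readonly startOffset: number }
	| { readonly kind: 'selectionChanged'; readonly docId: number }
	| { readonly kind: 'focused'; readonly docId: number };

type FirstEditOutcome =
	| { readonly found: true; readonly entryIndex: number; readonly startOffset: number }
	| { readonly found: false; readonly reason: 'editsAnotherFileFirst' | 'noEdit' };

function findFirstSameFileEdit(afterRequest: readonly SketchEntry[], activeDocId: number): FirstEditOutcome {
	for (const [entryIndex, entry] of afterRequest.entries()) {
		if (entry.kind !== 'changed') {
			continue; // selection and focus events are deliberately not consulted
		}
		if (entry.docId !== activeDocId) {
			// Another document was edited first; the cross-file detector may claim the sample.
			return { found: false, reason: 'editsAnotherFileFirst' };
		}
		return { found: true, entryIndex, startOffset: entry.startOffset };
	}
	return { found: false, reason: 'noEdit' };
}

/** Zero-based line containing `offset`, computed against the text as it was before the edit. */
function lineOfOffset(preEditText: string, offset: number): number {
	let line = 0;
	for (let i = 0; i < offset && i < preEditText.length; i++) {
		if (preEditText.charCodeAt(i) === 10 /* \n */) {
			line++;
		}
	}
	return line;
}
```

The threshold check would then compare the resolved line with the cursor line at request time.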
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* xtab: predictNextCursorPosition takes RequestTracingContext
Every other helper in xtabProvider takes RequestTracingContext (the
{ tracer, logContext, telemetry } bundle). The cursor predictor was the
odd one out, taking the three pieces as separate positional params with
the latter two optional; that asymmetry made the new logContext-capture
plumbing look more invasive than it is and forced an awkward
?.setCursorJumpPrompt chain at the use site.
Switch the predictor to take RequestTracingContext directly:
- Export RequestTracingContext from xtabProvider so the predictor can
type-import it (TS-erased to avoid the runtime circular import).
- predictNextCursorPosition signature collapses from 5 params to 3.
- Drop the optional chains; tracing.telemetry / tracing.logContext are
always present in production and the spec constructs a real bundle.
- Spec adds a createTestTracingContext helper using the cheap
InlineEditRequestLogContext / StatelessNextEditTelemetryBuilder
constructors already used by other inlineEdits specs.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* nes-datagen: rename splitRecording 'entries' field to 'wholeRecording'
Review feedback: the field on the splitRecordingAtRequestTime return
shape was named 'entries' but in context it carries the whole unsplit
recording (i.e. before slicing into prior/after parts). 'wholeRecording'
matches the comment at the consumer (documentIndexMapping callsite).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* nes-datagen: inline JumpDetectionResult<T> as Result<T, string>
Review feedback: the one-line alias was used in exactly two places in
the same file and gave nothing over the underlying Result type.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* nes-datagen: discriminated union for sample task + jump metadata
Review feedback: ISampleMetadata had 'task' + an optional 'jump' field
with toFilePath also optional. That let xtab samples accidentally carry
a jump and let cursor-cross-file samples omit toFilePath. Replace with
a discriminated union on task:
- xtab: no jump
- cursorSameFile: jump with fromLine/toLine/distance
- cursorCrossFile: jump with required toFilePath
assembleSample now takes a single SampleClassification arg, removing
the parallel task/jump parameters that callers had to keep in sync.
cursorJumpResponseStep is split into ISameFileGeneratedResponse and
ICrossFileGeneratedResponse so the generator return types map cleanly
to the union variants without a non-null assertion at the assembly
site.
DetectedJump no longer needs an assistantTask hint: the pipeline
constructs the classification directly from the response shape.
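
The shape of the union can be sketched like this. The variant names and the fromLine/toLine/distance/toFilePath fields follow the list above; the exact interface layout, the `toLine` field on the cross-file variant, and `describeSample` are assumptions for illustration.

```typescript
// Hypothetical layout; only the field names listed in the commit message are grounded.
interface SketchSameFileJump {
	readonly fromLine: number;
	readonly toLine: number;
	readonly distance: number;
}

interface SketchCrossFileJump {
	readonly toFilePath: string; // required, never optional
	readonly toLine: number; // assumed field
}

type SketchSampleClassification =
	| { readonly task: 'xtab' }
	| { readonly task: 'cursorSameFile'; readonly jump: SketchSameFileJump }
	| { readonly task: 'cursorCrossFile'; readonly jump: SketchCrossFileJump };

function describeSample(classification: SketchSampleClassification): string {
	// Switching on `task` narrows the type, so `jump` is only reachable where it exists.
	switch (classification.task) {
		case 'xtab':
			return 'xtab edit sample';
		case 'cursorSameFile':
			return `same-file jump ${classification.jump.fromLine} -> ${classification.jump.toLine}`;
		case 'cursorCrossFile':
			return `cross-file jump to ${classification.jump.toFilePath}:${classification.jump.toLine}`;
	}
}
```

With this shape, an xtab value that carries a `jump` or a cursorCrossFile value without `toFilePath` is rejected by the compiler.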
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix dup import
* nes-datagen: address round-4 review feedback on pipeline.ts
Five review threads on pipeline.ts, all addressed in-place.
- modelResponse: cursor samples were emitting an empty string for the
expected response. Populate it with the assistant content (which IS
the expected output) so downstream tooling has the gold label.
- Promise.all unbounded throws: wrap the limiter callback body in
try/catch so an unexpected exception from generateCursorPromptFromRecording
becomes a recorded per-row error instead of aborting the whole batch
via Promise.all's first-rejection semantics.
- Inline import for OffsetRange: replace the inline import('...').OffsetRange
type expression with a regular top-of-file import.
- Duplicated config-override block: both pipelines applied the same
applyConfigFile + four setConfig debounce/cache disables. Extract
into applyBatchModeConfig(configService, configs) and call from both.
- runInputPipeline parallelism + memory: add a doc comment clarifying
that this is the single-process entry point, that cursor-jump tasks
also benefit from runInputPipelineParallel (--sample-task is
propagated to workers), and that loadAndParseInput is in-memory by
design (sized per worker; use --parallelism > 1 for large inputs).
Full architectural unification of the parallel and non-parallel
paths is intentionally left as a follow-up; the surface area is
large and out of scope for this PR.
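
The Promise.all point can be illustrated with a small sketch. `processAllRows` and `RowOutcome` are hypothetical names, and the concurrency limiter used by the real pipeline is left out to keep the example short.

```typescript
// Hypothetical per-row outcome: either a value or a recorded error, never a rejection.
type RowOutcome<T> =
	| { readonly rowIndex: number; readonly value: T }
	| { readonly rowIndex: number; readonly error: string };

async function processAllRows<T>(
	rows: readonly unknown[],
	processRow: (row: unknown, rowIndex: number) => Promise<T>,
): Promise<RowOutcome<T>[]> {
	return Promise.all(rows.map(async (row, rowIndex): Promise<RowOutcome<T>> => {
		try {
			return { rowIndex, value: await processRow(row, rowIndex) };
		} catch (e) {
			// Without this catch, the first rejection would reject the whole Promise.all.
			return { rowIndex, error: e instanceof Error ? e.message : String(e) };
		}
	}));
}
```

Every row produces an entry in the result array, so one bad row costs one sample instead of the whole batch.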
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* nes-datagen: add e2e tests for cursor-jump pipeline
Mirrors the existing xtab pipeline.e2e.spec.ts: drives two fixture rows
(a same-file jump and a cross-file jump) through the full
`runInputPipeline` for each `sampleTask` mode (cursor-same-file,
cursor-cross-file, cursor-both) and asserts on the JSONL output.
Coverage:
- only the matching row is emitted per mode; both rows are emitted in
cursor-both
- emitted samples carry strategy=next-cursor-line-prediction and the
correct discriminated `task` field (cursor-same-file / cursor-cross-file)
- assistant message targets the jumped-to line / file
- metadata.modelResponse mirrors the assistant content (the round-4 fix)
- --row-offset is reflected in metadata.rowIndex
Test fixtures are constructed in
`fixtures/cursorJumpFixtureData.ts` with synthesized recordings: an
explicit no-op edit + selectionChanged before the bookmark so the
cursor-prediction path's recent-edit gating is satisfied, then a
single post-request `changed` event the detector picks up.
The cursor pipeline needs a prompting strategy whose response handler
tolerates an empty stream — use `xtabUnifiedModel` in a dedicated
`cursorJumpConfig.json` (the existing patchBased02 config crashes on
empty output, which is acceptable in production but breaks the
prompt-only capture path).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Strengthen cursor-jump e2e assertions
Replace fuzzy matchers (toMatch(/25/), arrayContaining for tasks) with
exact assertions on assistant content, metadata.task, and metadata.jump.
In cursor-both, locate samples by filePath so a row→classification swap
would now be caught instead of passing.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Make cursor-jump e2e helper accept partial nesDatagen overrides
Helper previously took Partial<RunPipelineOptions>; if a caller passed
`nesDatagen`, the spread fully replaced the default block and the
configured path. Now the helper accepts a partial nesDatagen overlay
and merges field-by-field, so the row-offset test only specifies the
two fields it actually changes and there are no non-null assertions.
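
The field-by-field merge can be sketched as below. The option shapes, default values and `sketchOptions` are hypothetical stand-ins; the real helper and RunPipelineOptions are not shown in this message.

```typescript
// Hypothetical, reduced option shapes for illustration only.
interface SketchNesDatagenOptions {
	readonly input: string;
	readonly output: string | undefined;
	readonly rowOffset: number;
	readonly sampleTask: string;
}

interface SketchRunPipelineOptions {
	readonly verbose: boolean;
	readonly nesDatagen: SketchNesDatagenOptions;
}

const sketchDefaults: SketchRunPipelineOptions = {
	verbose: false,
	nesDatagen: { input: 'fixture.jsonl', output: undefined, rowOffset: 0, sampleTask: 'cursor-both' },
};

function sketchOptions(nesDatagenOverlay: Partial<SketchNesDatagenOptions> = {}): SketchRunPipelineOptions {
	return {
		...sketchDefaults,
		// Merge into the default block instead of replacing it wholesale.
		nesDatagen: { ...sketchDefaults.nesDatagen, ...nesDatagenOverlay },
	};
}
```

A caller that passes only `{ rowOffset: 7 }` keeps the default `input` and `sampleTask`, which a top-level spread would have dropped.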
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Add within-threshold cursor-jump negative fixture
Scenario C: cursor on line 10, post-request edit on line 12 (only 2
lines below). The default threshold for a downward jump is 5 lines, so neither the same-file
nor the cross-file generator should emit a sample for this row.
Asserted in cursor-both via a dedicated 'does not emit a sample for
the within-threshold row' test, and implicitly in cursor-same-file /
cursor-cross-file (their existing count==1 assertions would fail if
the threshold guard regressed).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Ulugbek Abdullaev <ulugbekna@github.com>
374 lines
19 KiB
TypeScript
/*---------------------------------------------------------------------------------------------
 *  Copyright (c) Microsoft Corporation. All rights reserved.
 *  Licensed under the MIT License. See License.txt in the project root for license information.
 *--------------------------------------------------------------------------------------------*/
import minimist from 'minimist';
import { EmbeddingType } from '../../src/platform/embeddings/common/embeddingsComputer';
import { CacheMode } from './simulationContext';

/** Number of runs that are stored in baseline.json */
export const BASELINE_RUN_COUNT = 10;

export enum NesDatagenSampleTask {
	Xtab = 'xtab',
	CursorSameFile = 'cursor-same-file',
	CursorCrossFile = 'cursor-cross-file',
	CursorBoth = 'cursor-both',
}

export type NesDatagen = {
	readonly input: string;
	readonly output: string | undefined;
	readonly rowOffset: number;
	readonly workerMode: boolean;
	readonly sampleTask: NesDatagenSampleTask;
	/** Minimum same-file lines above the request cursor for a move to count as a jump. */
	readonly sameFileJumpMinAbove: number;
	/** Minimum same-file lines below the request cursor for a move to count as a jump. */
	readonly sameFileJumpMinBelow: number;
};

export class SimulationOptions {
	public static fromProcessArgs(): SimulationOptions {
		return new SimulationOptions(process.argv);
	}

	public static fromArray(argv: readonly string[]): SimulationOptions {
		return new SimulationOptions(argv);
	}

	private readonly argv: minimist.ParsedArgs;

	public readonly help: boolean;
	public readonly listModels: boolean;
	public readonly listTests: boolean;
	public readonly listSuites: boolean;
	public readonly jsonOutput: boolean;
	public readonly nRuns: number;
	public readonly chatModel: string | undefined;
	public readonly smartChatModel: string | undefined;
	public readonly fastChatModel: string | undefined;
	public readonly fastRewriteModel: string | undefined;
	public readonly summarizeHistory: boolean;
	public readonly swebenchPrompt: boolean;
	public readonly embeddingType: EmbeddingType | undefined;
	public readonly boost: boolean;
	public readonly parallelism: number;
	public readonly lmCacheMode: CacheMode;
	public readonly modelCacheMode: CacheMode;
	public readonly resourcesCacheMode: CacheMode;
	public readonly cachePath: string | undefined;
	public readonly externalBaseline: string | undefined;
	public readonly externalScenarios: string | undefined;
	public readonly output: string | undefined;
	public readonly inline: boolean;
	public readonly sidebar: boolean;
	public readonly applyChatCodeBlocks: boolean;
	public readonly stageCacheEntries: boolean;
	public readonly ci: boolean;
	public readonly gc: boolean;
	public readonly externalCacheLayersPath: string | undefined;
	public readonly verbose: number | boolean | undefined;
	public readonly grep: string[] | string | undefined;
	public readonly omitGrep: string | undefined;
	public readonly heapSnapshots: boolean | string | undefined;
	/** --scenario-test, --scenarioTest Run tests from provided scenario test file name */
	public readonly scenarioTest: string | undefined;
	public readonly isUpdateBaseline: boolean;
	public readonly noFetch: boolean;
	public readonly noCachePointer: boolean;
	/**
	 * A label for the current simulation run, to be displayed in the UI for distinguishing between runs.
	 */
	public readonly label: string;
	public readonly runServerPoweredNesProvider: boolean;
	public readonly nes: 'external' | 'coffe' | undefined;
	public readonly nesUrl: string | undefined;
	public readonly nesApiKey: string | undefined;

	public readonly nesDatagen: NesDatagen | undefined;

	public readonly subcommand: 'nes-datagen' | undefined;

	public readonly disabledTools: Set<string>;

	/** If true, all tests are run in the extension host */
	public readonly inExtensionHost: boolean;
	/** Extensions to ensure are available in the extension host */
	public readonly installExtensions: string[];
	/** Whether to run headless (defaults to true) */
	public readonly headless: boolean;
	/** @internal Only run a single test number */
	public readonly runNumber: number;
	/** If true, runs the stest in the scenario's workspace folder (--scenario-workspace-folder) */
	public readonly useScenarioWorkspace: boolean;

	/** If true, will try to use code search using our service. */
	public readonly useExperimentalCodeSearchService: boolean;

	public readonly configFile: string | undefined;

	public readonly modelConfigFile: string | undefined;

	protected constructor(processArgv: readonly string[]) {
		const argv = minimist(processArgv.slice(2));
		this.argv = argv;
		this.help = boolean(argv['help'], false);
		this.listModels = boolean(argv['list-models'], false);
		this.listTests = boolean(argv['list-tests'], false);
		this.listSuites = boolean(argv['list-suites'], false);
		this.jsonOutput = boolean(argv['json'], false);
		this.isUpdateBaseline = boolean(argv['update-baseline'] ?? argv['u'], false);
		this.boost = boolean(argv['boost'], false);
		const fetch = boolean(argv['fetch'], true);
		this.noFetch = !fetch; // `--no-fetch` becomes argv[`fetch`] because of how minimist works
		const cachePointer = boolean(argv['cache-pointer'], true);
		this.noCachePointer = !cachePointer; // `--no-cache-pointer` becomes argv[`cache-pointer`] because of how minimist works
		this.nRuns = typeof argv['n'] === 'number' ? argv['n'] : (this.isUpdateBaseline || argv['ci'] ? BASELINE_RUN_COUNT : 10);
		this.chatModel = this.argv['model'];
		this.smartChatModel = this.argv['smart-model'];
		this.fastChatModel = this.argv['fast-model'];
		this.fastRewriteModel = this.argv['fast-rewrite-model'];
		this.summarizeHistory = boolean(argv['summarize-history'], true);
		this.swebenchPrompt = boolean(argv['swebench-prompt'], false);
		this.embeddingType = cliOptionsToWellKnownEmbeddingsType(this.argv['embedding-model']);
		this.parallelism = this.argv['parallelism'] ?? this.argv['p'] ?? 20;
		this.modelCacheMode = this.argv['skip-model-cache'] ? CacheMode.Disable : CacheMode.Default;
		this.lmCacheMode = (
			this.argv['skip-cache'] ? CacheMode.Disable
				: (this.argv['require-cache'] ? CacheMode.Require : CacheMode.Default)
		);
		this.resourcesCacheMode = (
			this.argv['skip-resources-cache'] ? CacheMode.Disable : CacheMode.Default
		);
		this.externalScenarios = this.argv['external-scenarios'];
		this.externalBaseline = this.argv['external-baseline']; // must be set after `externalScenarios`
		this.validateExternalBaseline();
		this.output = this.argv['output'];
		this.cachePath = this.argv['cache-location'];
		this.inline = boolean(this.argv['inline'], false);
		this.sidebar = boolean(this.argv['sidebar'], false);
		this.applyChatCodeBlocks = boolean(this.argv['apply-chat-code-blocks'], false);
		this.stageCacheEntries = boolean(this.argv['stage-cache-entries'], false);
		this.ci = boolean(this.argv['ci'], false);
		this.gc = boolean(this.argv['gc'], false);
		this.externalCacheLayersPath = argv['external-cache-layers-path'];
		this.verbose = this.argv['verbose'];
		this.grep = argv['grep'];
		this.omitGrep = argv['omit-grep'];
		this.heapSnapshots = argv['heap-snapshots'];
		this.scenarioTest = argv['scenarioTest'] ?? argv['scenario-test'];
		this.label = argv['label'] ?? '';

		this.inExtensionHost = boolean(argv['in-extension-host'], false);
		this.installExtensions = argv['install-extension'] ? argv['install-extension'].split(',') : [];
		this.headless = boolean(argv['headless'], true);
		this.runNumber = Number(argv['run-number']) || 0;

		this.runServerPoweredNesProvider = boolean(argv['runServerPoweredNesProvider'], false);

		this.nes = SimulationOptions.validateNesArgument(argv['nes']);

		this.nesUrl = argv['nes-url'];
		// [SuppressMessage("Microsoft.Security", "CS002:SecretInNextLine", Justification="used for local simulation tests")]
		this.nesApiKey = argv['nes-api-key'];
		SimulationOptions.validateNesUrlOverride(this.nesUrl, this.nesApiKey);

		this.disabledTools = argv['disable-tools'] ? new Set(argv['disable-tools'].split(',')) : new Set();
		this.useScenarioWorkspace = boolean(argv['scenario-workspace-folder'], false);

		this.useExperimentalCodeSearchService = boolean(argv['use-experimental-code-search-service'], false);

		const isNesDatagen = (argv._ as string[]).includes('nes-datagen');
		this.subcommand = isNesDatagen ? 'nes-datagen' : undefined;
		this.nesDatagen = isNesDatagen && argv['input']
			? {
				input: argv['input'],
				output: argv['out'],
				rowOffset: typeof argv['row-offset'] === 'number' ? argv['row-offset'] : 0,
				workerMode: boolean(argv['worker'], false),
				sampleTask: SimulationOptions.validateSampleTask(argv['sample-task']),
				sameFileJumpMinAbove: typeof argv['same-file-jump-min-above'] === 'number' ? argv['same-file-jump-min-above'] : 2,
				sameFileJumpMinBelow: typeof argv['same-file-jump-min-below'] === 'number' ? argv['same-file-jump-min-below'] : 5,
			}
			: undefined;

		this.configFile = argv['config-file'];
		this.modelConfigFile = argv['model-config-file'];
	}

	public printHelp(): void {
		console.log([
			`Example usages: `,
			` npm run simulate`,
			` npm run simulate -- --external-scenarios=<path> --inline --output=<path>`,
			` npm run simulate -- --external-scenarios=<path> --sidebar --output=<path>`,
			` npm run simulate -- --external-scenarios=<path> --nes --output=<path>`,
			` npm run simulate -- --update-baseline`,
			``,
			` -u, --update-baseline Updates scores in baseline.json if they change as a result of your changes to prompts sent to the model`,
			` --external-scenarios Path to a directory containing scenarios to run`,
			` --inline Run inline chat external scenarios`,
			` --sidebar Run sidebar chat external scenarios`,
			` --nes Run NES external scenarios`,
			` --output Path to a directory where to generate output`,
			` --n Run each scenario N times`,
			` --ci Equivalent to --n=${BASELINE_RUN_COUNT} but throws if the baseline is not up-to-date`,
			` --gc Used with --require-cache to compact cache layers into the baseline cache`,
			` --external-cache-layers-path Used to specify the path to the external cache layers`,
			` --grep Run a test which contains the passed-in string`,
			` --omit-grep Run a test which does not contain the passed-in string`,
			` --embedding-model Specify the model to use for the embedding endpoint`,
			` Values: text3small, metis`,
			` --list-models List available chat models`,
			` --model Specify the model to use for the chat endpoint (use --list-models to see valid options)`,
			` --smart-model Specify the model to use in place of the smarter slower model, e.g. GPT 4o`,
			` --fast-model Specify the model to use in place of the faster / less smart model, e.g. GPT 4o mini`,
			` --fast-rewrite-model [experimental] Specify the model to use for the fast rewrite endpoint`,
			` -p, --parallelism [experimental] Run tests in parallel (default: 20)`,
			` --skip-cache [experimental] Do not use the cache for language model requests`,
			` --require-cache [experimental] Require cache hits, fail on cache misses`,
			` --regenerate-cache [experimental] Fetch all responses and refresh the cache`,
			` --skip-resources-cache [experimental] Do not use the cache for computed resources`,
			` --skip-model-cache [experimental] Do not use the cache for model metadata`,
			` --stage-cache-entries [experimental] Stage cache files that were used in current simulation run`,
			` --list-tests List tests without running them`,
			` --json Print output in JSONL format`,
			` --verbose Print more information about test and assertion failures`,
			` --scenario-test Run tests from provided scenario test file name, e.g., 'docComment.stest' or 'docComment.stest.ts' (--scenarioTest is supported but will be deprecated in future)`,
			` --no-fetch Do not send requests to the model endpoint (uses cache but doesn't write to it) (useful to make sure prompts are unchanged by observing cache misses)`,
			` --no-cache-pointer [experimental] Do not write files to outcome/`,
			` --label A label for the current simulation run, to be displayed in the UI for distinguishing between runs`,
			` --nes-url To override endpoint URL for NES (must be used with --nes-api-key)`,
			` --nes-api-key API key for endpoint URL provided via NES (must be used with --nes-url)`,
			` --runServerPoweredNesProvider Run stests against the http server powered NES provider (server must be run at port 8001)`,
			` --disable-tools A comma-separated list of tools to disable`,
			` --swebench-prompt Use the headless swebench prompt for agent mode`,
			` --summarize-history Enable experimental conversation history summarization in agent mode`,
			` --scenario-workspace-folder If true, runs the stest inline in the scenario's workspace folder`,
			` --config-file Path to a JSON file containing configuration options`,
			` --model-config-file Path to a JSON file containing model configuration options`,
			``,
			`Subcommands:`,
			` nes-datagen Generate training data from alternative action recordings`,
			` Run 'npm run simulate -- nes-datagen --help' for options`,
			``,
		].join('\n'));
	}

	public printTrainHelp(): void {
		console.log([
			`Usage: npm run simulate -- --config-file=<path> [global options] nes-datagen --input=<path> [options]`,
			``,
			`Generate training data by replaying alternative action recordings through the NES prompt pipeline.`,
			`The prompting strategy is read from the model configuration in --config-file.`,
			``,
			`Options:`,
			` --input Path to a JSON or JSON Lines file with training data recordings (required)`,
			` Format is inferred from the extension: .jsonl/.ndjson → JSON Lines, otherwise JSON array`,
			` --out Output path for the JSON Lines file. Default: <input-path>_output.jsonl`,
			` --sample-task Which target to generate (default: xtab)`,
			` Values: xtab, cursor-same-file, cursor-cross-file, cursor-both`,
			` xtab → edit-prediction sample (assistant = an edit)`,
			` cursor-same-file → next-cursor-line sample restricted to the active file`,
			` cursor-cross-file → next-cursor-line sample for a jump to another file`,
			` cursor-both → tries same-file first, falls back to cross-file (one sample per row)`,
			` --same-file-jump-min-above Minimum lines above request cursor for a same-file move to count as a jump (default: 2)`,
			` --same-file-jump-min-below Minimum lines below request cursor for a same-file move to count as a jump (default: 5)`,
			``,
			`Global options (placed before 'nes-datagen'):`,
			` --config-file Path to a JSON config file (required for nes-datagen)`,
			` Must include "github.copilot.chat.inlineEdits.xtabProvider.modelConfiguration"`,
			` with at least { "modelName", "promptingStrategy", "includeTagsInCurrentFile" }`,
			` -p, --parallelism Number of parallel workers (default: 20)`,
			` --verbose Print detailed progress and error information`,
			` --help Show this help message`,
			``,
			`Examples:`,
			` npm run simulate -- --config-file=config.json nes-datagen --input=data.json`,
			` npm run simulate -- --config-file=config.json --parallelism=10 --verbose nes-datagen --input=data.json`,
			` npm run simulate -- --config-file=config.json nes-datagen --input=data.json --sample-task=cursor-same-file`,
			` npm run simulate -- --config-file=config.json nes-datagen --input=data.json --sample-task=cursor-cross-file`,
			` npm run simulate -- --config-file=config.json nes-datagen --input=data.json --sample-task=cursor-both --same-file-jump-min-above=8 --same-file-jump-min-below=8`,
			``,
		].join('\n'));
	}

	private validateExternalBaseline() {
		if (this.externalBaseline && !this.externalScenarios) {
			throw new Error('External scenarios must be provided for external baseline to work.');
		}
	}

	private static validateNesArgument(nes: unknown): 'external' | 'coffe' | undefined {
		if (nes === undefined || nes === null) {
			return undefined;
		}
		if (typeof nes === 'boolean') { // this is for backward compat because previously it was possible to just pass `--nes` to run external stests against NES
			return 'external';
		}
		if (typeof nes !== 'string') {
			throw new Error(`--nes must be a string, but got: ${typeof nes}`);
		}
		switch (nes) {
			case 'external':
			case 'coffe':
				return nes;
			default:
				throw new Error(`--nes can only be 'external' or 'coffe', but got: ${nes}`);
		}
	}

	private static validateNesUrlOverride(nesUrl: string | undefined, nesApiKey: string | undefined): void {
		if (nesUrl !== undefined && nesApiKey === undefined) {
			throw new Error(`--nes-api-key must be provided when --nes-url is set`);
		}
		if (nesUrl === undefined && nesApiKey !== undefined) {
			throw new Error(`--nes-url must be provided when --nes-api-key is set`);
		}
	}

	private static validateSampleTask(value: unknown): NesDatagenSampleTask {
		if (value === undefined || value === null) {
			return NesDatagenSampleTask.Xtab;
		}
		if (typeof value !== 'string') {
			throw new Error(`--sample-task must be a string, but got: ${typeof value}`);
		}
		const allowed = Object.values(NesDatagenSampleTask) as string[];
		if (!allowed.includes(value)) {
			throw new Error(`--sample-task must be one of [${allowed.join(', ')}], but got: ${value}`);
		}
		return value as NesDatagenSampleTask;
	}
}

function cliOptionsToWellKnownEmbeddingsType(model: string | undefined): EmbeddingType | undefined {
	switch (model) {
		case 'text3small':
		case EmbeddingType.text3small_512.id:
			return EmbeddingType.text3small_512;

		case 'metis':
		case EmbeddingType.metis_1024_I16_Binary.id:
			return EmbeddingType.metis_1024_I16_Binary;

		case undefined:
			return undefined;

		default:
			throw new Error(`Unknown embedding model: ${model}`);
	}
}

function boolean(value: any, defaultValue: boolean): boolean {
	if (typeof value === 'undefined') {
		return defaultValue;
	}
	if (value === 'false') {
		// treat the string 'false' as false
		return false;
	}
	return Boolean(value);
}