47 Commits

Author SHA1 Message Date
Tyler James Leonhardt 8c15ca4852 Move IRequestLogger into common (#309121)
* Move IRequestLogger into common

so that it can be used in common files.

Also addresses two small pieces of review feedback in the Claude code from https://github.com/microsoft/vscode/pull/309119

* format
2026-04-10 17:58:44 -07:00
Christof Marti a2188f2bcc Add support for CLS to setConfigs mid-runtime (#308723)
Co-authored-by: Andrea Mah <andreamah@microsoft.com>
2026-04-09 12:04:17 +02:00
Christof Marti 43f7117748 Surface network errors with proxies (microsoft/vscode#298236) 2026-04-08 08:13:59 +00:00
Andrea Mah 731c50545b Let CLS set ConfigOverrides (#4625)
* let IInlineCompletionsProviderOptions set configOverrides

* Add OverridableConfigurationService to support overrides

* add validators to OverridableConfigurationService

---------

Co-authored-by: Raymond Zhao <7199958+rzhao271@users.noreply.github.com>
2026-03-26 21:31:10 +00:00
Ulugbek Abdullaev 2e22fb812c nes: cursor jump: support using vanilla models from CAPI (#4521)
* nes: cursor jump: support using vanilla models from CAPI

* fix: register IEndpointProvider in NES setupServices
2026-03-20 13:38:41 +00:00
Christof Marti eed329d32b Fix: Log status code and request id on connection error (#4533) 2026-03-19 21:26:27 +00:00
Giuseppe Cianci 593d52ec2a NES: Wait for user idle before sending enhanced telemetry (#4426)
* improve telemetry

* fix timer cleanup

* refactor: simplify _waitForIdleThenSend and fix leak in scheduleSendingEnhancedTelemetry

- Fix map entry overwrite leak: clean up existing entry before overwriting
- Replace manual disposed flag + scattered clearTimeout with DisposableStore
- Remove redundant resetIdleTimer() call (autorunWithChanges fires on construction)
- Eliminate forward-reference to valueUnsub in cleanup closure

* refactor: use RunOnceScheduler and disposableTimeout in _waitForIdleThenSend

Replace hand-rolled idle timer (manual setTimeout/clearTimeout/toDisposable)
with RunOnceScheduler from async.ts, which already handles cancel-and-reschedule.
Replace manual hard-cap timer with disposableTimeout, which auto-clears on dispose.

This reduces ~15 lines of manual timer management to 3 declarative lines.
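The scheduling behavior described above can be modeled as a pure function over activity timestamps. Every activity cancels and re-arms the idle timer, and a separate hard-cap timer bounds the total wait. This is a hedged sketch for reasoning about when the send would fire. `computeSendTime` and its parameters are hypothetical; the actual change uses `RunOnceScheduler` and `disposableTimeout` with real timers.

```typescript
/**
 * Hypothetical model of "wait for user idle, but never longer than a hard cap".
 * Each activity re-arms the idle timer (cancel-and-reschedule); the hard-cap
 * timer is armed once when the wait begins.
 *
 * @param startMs     time the wait begins
 * @param activityMs  timestamps of user activity (any order)
 * @param idleMs      quiet period that counts as "idle"
 * @param hardCapMs   maximum total wait measured from startMs
 * @returns the time at which the send would fire
 */
function computeSendTime(
  startMs: number,
  activityMs: readonly number[],
  idleMs: number,
  hardCapMs: number,
): number {
  const deadline = startMs + hardCapMs;
  let lastActivity = startMs; // the idle timer is first armed at start
  // filter() copies, so sorting does not mutate the caller's array.
  const sorted = activityMs.filter(t => t >= startMs).sort((a, b) => a - b);
  for (const t of sorted) {
    if (t - lastActivity >= idleMs) {
      break; // the idle timer fired before this activity arrived
    }
    lastActivity = t; // activity cancels and re-arms the idle timer
  }
  // Whichever timer fires first wins.
  return Math.min(lastActivity + idleMs, deadline);
}
```

Take a 1 s idle window and a 5 s cap. Activity at 0.5 s and 1.2 s pushes the send to 2.2 s. Continuous typing every 0.5 s is cut off by the cap at 5 s.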

* feat: enhance telemetry by adding sending reasons and updating telemetry sender initialization

* enhance telemetry, track selected lines more efficiently

* fix instantiation of telemetrysender

* add temporary debug logging

* Revert "add temporary debug logging"

This reverts commit ef384f62dfc784fd4e91a239571d35614b3cee8a.

* update doc comment

* share idle detection

* refactor: move idle detector to own class

* add debug logging

* Revert "add debug logging"

This reverts commit ed4551bc1215fa5eae44ed0c41f04c23530eb1a9.

* fix ts error

* fix ref release

* fix copilot findings

* dispose builder when rescheduled

* add debug logging

* Revert "add debug logging"

This reverts commit 9673e2b5b8a602ed5a6f3f822254c74eed895796.

---------

Co-authored-by: ulugbekna <ulugbekna@gmail.com>
2026-03-18 20:20:11 +00:00
Logan Ramos 93525d253a Add telemetry to keep an eye on the number of network requests we're sending (#4321)
* Add telemetry to keep an eye on the number of network requests we're sending

* Fix unreachable code

* Update src/extension/completions-core/vscode-node/lib/src/networking.ts

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix github api call site

* Apply callsite post spread

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-03-10 17:42:45 +00:00
Christof Marti 09126ae9ea WebSocket request/response headers (#4166) 2026-03-03 21:25:38 +00:00
Zhichao Li ddb6f98ce6 feat(otel): Add OpenTelemetry GenAI instrumentation to Copilot Chat (#3917)
* feat: add OTel GenAI instrumentation foundation

Phase 0 complete:
- spec.md: Full spec with decisions, GenAI semconv, dual-write, eval signals,
  lessons from Gemini CLI + Claude Code
- plan.md: E2E demo plan (chat ext + eval repo + Azure backend)
- src/platform/otel/: IOTelService, config, attributes, metrics, events,
  message formatters, NodeOTelService, file exporters
- package.json: Added @opentelemetry/* dependencies

OTel opt-in behind OTEL_EXPORTER_OTLP_ENDPOINT env var.

* refactor: reorder OTel type imports for consistency

* refactor: reorder OTel type imports for consistency

* feat(otel): wire OTel spans into chat extension — Phase 1 core

- Register IOTelService in DI (NodeOTelService when enabled, NoopOTelService when disabled)
- Add OTelContrib lifecycle contribution for OTel init/shutdown
- Add `chat {model}` inference span in ChatMLFetcherImpl._doFetchAndStreamChat()
- Add `execute_tool {name}` span in ToolsService.invokeTool()
- Add `invoke_agent {participant}` parent span in ToolCallingLoop.run()
- Record gen_ai.client.operation.duration, tool call count/duration, agent metrics
- Thread IOTelService through all ToolCallingLoop subclasses
- Update test files with NoopOTelService
- Zero overhead when OTel is disabled (noop providers, no dynamic imports)

* feat(otel): add embeddings span, config UI settings, and unit tests

- Add `embeddings {model}` span in RemoteEmbeddingsComputer.computeEmbeddings()
- Add VS Code settings under github.copilot.chat.otel.* in package.json
  (enabled, exporterType, otlpEndpoint, captureContent, outfile)
- Wire VS Code settings into resolveOTelConfig in services.ts
- Add unit tests for:
  - resolveOTelConfig: env precedence, kill switch, all config paths (16 tests)
  - NoopOTelService: zero-overhead noop behavior (8 tests)
  - GenAiMetrics: metric recording with correct attributes (7 tests)

* test(otel): add unit tests for messageFormatters, genAiEvents, fileExporters

- messageFormatters: 18 tests covering toInputMessages, toOutputMessages,
  toSystemInstructions, toToolDefinitions (edge cases, empty inputs, invalid JSON)
- genAiEvents: 9 tests covering all 4 event emitters, content capture on/off
- fileExporters: 5 tests covering write/read round-trip for span, log, metric
  exporters plus aggregation temporality

Total OTel test suite: 63 tests across 6 files

* feat(otel): record token usage and time-to-first-token metrics

Add gen_ai.client.token.usage (input/output) and copilot_chat.time_to_first_token
histogram metrics at the fetchMany success path where token counts and TTFT
are available from the processSuccessfulResponse result.

* docs: finalize sprint plan with completion status

* style: apply formatter changes to OTel files

* feat(otel): emit gen_ai.client.inference.operation.details event with token usage

Wire emitInferenceDetailsEvent into fetchMany success path where full
token usage (prompt_tokens, completion_tokens), resolved model, request ID,
and finish reasons are available from processSuccessfulResponse.

This follows the OTel GenAI spec pattern:
- Spans: timing + hierarchy + error tracking
- Events: full request/response details including token counts

The data mirrors what RequestLogger captures for chat-export-logs.json.

* feat(otel): add aggregated token usage to invoke_agent span

Per the OTel GenAI agent spans spec, add gen_ai.usage.input_tokens and
gen_ai.usage.output_tokens as Recommended attributes on the invoke_agent span.

Tokens are accumulated across all LLM turns by listening to onDidReceiveResponse
events during the agent loop, then set on the span before it ends.

Ref: https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-agent-spans/
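The accumulation pattern can be sketched minimally, assuming a simple listener-based event source. Per-turn usage is summed while the agent loop runs and written to the span just before it ends. `ResponseEvents`, `AgentSpan`, and `runAgentLoop` are hypothetical stand-ins, not the extension's real types; only the two attribute names come from the text above.

```typescript
interface Usage { promptTokens: number; completionTokens: number; }

// Hypothetical stand-in for an onDidReceiveResponse-style event.
class ResponseEvents {
  private listeners: Array<(u: Usage) => void> = [];
  onDidReceiveResponse(listener: (u: Usage) => void): () => void {
    this.listeners.push(listener);
    // Return an unsubscribe function.
    return () => { this.listeners = this.listeners.filter(l => l !== listener); };
  }
  fire(u: Usage): void {
    for (const l of this.listeners) l(u);
  }
}

// Hypothetical span that ignores attributes set after it has ended.
class AgentSpan {
  readonly attributes = new Map<string, number>();
  ended = false;
  setAttribute(key: string, value: number): void {
    if (!this.ended) this.attributes.set(key, value);
  }
  end(): void { this.ended = true; }
}

function runAgentLoop(events: ResponseEvents, span: AgentSpan, turns: Usage[]): void {
  let input = 0;
  let output = 0;
  const unsubscribe = events.onDidReceiveResponse(u => {
    input += u.promptTokens;
    output += u.completionTokens;
  });
  try {
    for (const turn of turns) events.fire(turn); // each LLM turn reports its usage
  } finally {
    unsubscribe();
    // Totals must be set before end(); later writes would be dropped.
    span.setAttribute('gen_ai.usage.input_tokens', input);
    span.setAttribute('gen_ai.usage.output_tokens', output);
    span.end();
  }
}
```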

* feat(otel): add token usage attributes to chat inference span

Defer the `chat {model}` span completion from _doFetchAndStreamChat to
fetchMany where processSuccessfulResponse has extracted token counts.

The chat span now carries:
- gen_ai.usage.input_tokens (prompt_tokens)
- gen_ai.usage.output_tokens (completion_tokens)
- gen_ai.response.model (resolved model)

The span handle is returned from _doFetchAndStreamChat via the result
object so fetchMany can set attributes and end it after tokens are known.

This matches the chat-export-logs.json pattern where each request entry
carries full usage data alongside the response.
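The deferred-completion pattern can be sketched as follows. The streaming function starts the span and returns its handle inside the result. The caller enriches the span once token counts are known, then ends it. `SpanHandle`, `doFetchAndStream`, and `fetchMany` here are simplified hypothetical stand-ins for the real methods, and the response values are canned.

```typescript
class SpanHandle {
  readonly name: string;
  readonly attributes: Record<string, string | number> = {};
  ended = false;
  constructor(name: string) { this.name = name; }
  setAttribute(key: string, value: string | number): void { this.attributes[key] = value; }
  end(): void { this.ended = true; }
}

interface StreamResult {
  text: string;
  usage: { prompt_tokens: number; completion_tokens: number };
  resolvedModel: string;
  otelSpan: SpanHandle; // handed back so the caller can finish the span
}

// Phase 1: start the span but leave it open; usage is not known yet here.
function doFetchAndStream(model: string): StreamResult {
  const span = new SpanHandle(`chat ${model}`);
  // Canned response data, for illustration only.
  return { text: 'hello', usage: { prompt_tokens: 12, completion_tokens: 3 }, resolvedModel: `${model}-resolved`, otelSpan: span };
}

// Phase 2: once usage has been processed, enrich the same span and end it.
function fetchMany(model: string): StreamResult {
  const result = doFetchAndStream(model);
  const span = result.otelSpan;
  span.setAttribute('gen_ai.usage.input_tokens', result.usage.prompt_tokens);
  span.setAttribute('gen_ai.usage.output_tokens', result.usage.completion_tokens);
  span.setAttribute('gen_ai.response.model', result.resolvedModel);
  span.end();
  return result;
}
```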

* style: apply formatter changes

* fix: correct import paths in otelContrib and add IOTelService to test

* feat: add diagnostic span exporter to log first successful export and failures

* feat: add content capture to OTel spans (messages, responses, tool args/results)

- Chat spans: add copilot.debug_name attribute for identifying orphan spans
- Chat spans: capture gen_ai.input.messages and gen_ai.output.messages when captureContent enabled
- Tool spans: capture gen_ai.tool.call.arguments and gen_ai.tool.call.result when captureContent enabled
- Extension chat endpoint: capture input/output messages when captureContent enabled
- Add CopilotAttr.DEBUG_NAME constant

* fix: register IOTelService in chatLib setupServices for NES test

* fix: register OTel ConfigKey settings in Advanced namespace for configurations test

* fix: register IOTelService in shared test services (createExtensionUnitTestingServices)

* fix: register IOTelService in platform test services

* feat(otel): enhance GenAI span attributes per OTel semantic conventions

- Change gen_ai.provider.name from 'openai' to 'github' for CAPI models
- Rename CopilotAttr to CopilotChatAttr, prefix values with copilot_chat.*
- Add GITHUB to GenAiProviderName enum
- Replace copilot.debug_name with gen_ai.agent.name on chat spans
- Add gen_ai.request.temperature, gen_ai.request.top_p to chat spans
- Add gen_ai.response.id, gen_ai.response.finish_reasons on success
- Add gen_ai.usage.cache_read.input_tokens from cached_tokens
- Add copilot_chat.request.max_prompt_tokens and copilot_chat.time_to_first_token
- Add gen_ai.tool.description to execute_tool spans
- Fix gen_ai.tool.call.id to read chatStreamToolCallId (was reading nonexistent prop)
- Fix tool result capture to handle PromptTsxPart and DataPart (not just TextPart)
- Add gen_ai.input.messages and gen_ai.output.messages to invoke_agent span (opt-in)
- Move gen_ai.tool.definitions from chat spans to invoke_agent span (opt-in)
- Add gen_ai.system_instructions to chat spans (opt-in)
- Fix error.type raw strings to use StdAttr.ERROR_TYPE constant
- Centralize hardcoded copilot.turn_count and copilot.endpoint_type into CopilotChatAttr
- Add COPILOT_OTEL_CAPTURE_CONTENT=true to launch.json for testing
- Document span hierarchy fixes needed in plan.md

* feat(otel): connect subagent spans to parent trace via context propagation

- Add TraceContext type and getActiveTraceContext() to IOTelService
- Add storeTraceContext/getStoredTraceContext for cross-boundary propagation
- Add parentTraceContext option to SpanOptions for explicit parent linking
- Implement in NodeOTelService using OTel remote span context
- Capture trace context when execute_tool runSubagent fires (keyed by toolCallId)
- Restore parent context in subagent invoke_agent span (via subAgentInvocationId)
- Auto-cleanup stored contexts after 5 minutes to prevent memory leaks
- Update test mocks with new IOTelService methods
- Update plan.md with investigation findings

* fix(otel): fix subagent trace context key to use parentRequestId

The previous implementation stored trace context keyed by chatStreamToolCallId
(model-assigned tool call ID), but looked it up by subAgentInvocationId
(VS Code internal invocation.callId UUID). These are different IDs that don't
match across the IPC boundary.

Fix: key by chatRequestId on store side (available on invocation options),
and look up by parentRequestId on subagent side (same value, available on
ChatRequest). Both reference the parent agent's request ID.

Verified: 21-span trace with subagent correctly nested under parent agent.
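The store-and-restore handoff can be sketched as a keyed map with single-use reads and a time-to-live. This sketch makes several assumptions:

- a `TraceContext` consisting only of `traceId`/`spanId`;
- the 5-minute expiry mentioned earlier;
- tracked timers and a 100-entry cap with oldest-first eviction, per the later hardening in this same PR.

`TraceContextStore` and its method names are hypothetical.

```typescript
interface TraceContext { traceId: string; spanId: string; }

interface StoredEntry { ctx: TraceContext; timer: ReturnType<typeof setTimeout>; }

class TraceContextStore {
  // Map iterates in insertion order, so the first key is the oldest entry.
  private readonly entries = new Map<string, StoredEntry>();
  private readonly ttlMs: number;
  private readonly maxEntries: number;

  constructor(ttlMs = 5 * 60_000, maxEntries = 100) {
    this.ttlMs = ttlMs;
    this.maxEntries = maxEntries;
  }

  store(key: string, ctx: TraceContext): void {
    this.remove(key); // overwriting: clear the previous entry's timer first
    if (this.entries.size >= this.maxEntries) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) {
        this.remove(oldest);
      }
    }
    // Expire unclaimed contexts so abandoned handoffs do not leak.
    const timer = setTimeout(() => this.entries.delete(key), this.ttlMs);
    this.entries.set(key, { ctx, timer });
  }

  // Single-use: returns the stored context once, then forgets it.
  take(key: string): TraceContext | undefined {
    const ctx = this.entries.get(key)?.ctx;
    this.remove(key);
    return ctx;
  }

  get size(): number { return this.entries.size; }

  // Clears every pending timer, e.g. on shutdown.
  dispose(): void {
    for (const key of [...this.entries.keys()]) {
      this.remove(key);
    }
  }

  private remove(key: string): void {
    const entry = this.entries.get(key);
    if (entry !== undefined) {
      clearTimeout(entry.timer);
      this.entries.delete(key);
    }
  }
}
```

In the flow above, the tool-invocation side would call `store(chatRequestId, ctx)` and the subagent would call `take(parentRequestId)`. Both hold the parent agent's request ID, which is what makes the lookup succeed.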

* fix(otel): add model attrs to invoke_agent and max_prompt_tokens to BYOK chat

- Set gen_ai.request.model on invoke_agent span from endpoint
- Track gen_ai.response.model from last LLM response resolvedModel
- Add copilot_chat.request.max_prompt_tokens to BYOK chat spans
- Document upstream gaps in plan.md (BYOK token usage, programmatic tool IDs)

* test(otel): add trace context propagation tests for subagent linkage

Tests verify:
- storeTraceContext/getStoredTraceContext round-trip and single-use semantics
- getActiveTraceContext returns context inside startActiveSpan
- parentTraceContext makes child span inherit traceId from parent
- Independent spans get different traceIds without parentTraceContext
- Full subagent flow: store context in tool call, retrieve in subagent

* fix(otel): add finish_reasons and ttft to BYOK chat spans, document orphan spans

- Set gen_ai.response.finish_reasons on BYOK chat success
- Set copilot_chat.time_to_first_token on BYOK chat success
- Document Gap 4: duplicate orphan spans from CopilotLanguageModelWrapper
- Identify all orphan span categories (title, progressMessages, promptCategorization, wrapper)

* docs(otel): update Gap 4 analysis — wrapper spans have actual token usage data

The copilotLanguageModelWrapper orphan spans are the actual CAPI HTTP
handlers, not duplicates. They contain real token usage, cache read tokens,
resolved model names, and temperature — all missing from the consumer-side
extChatEndpoint spans due to VS Code LM API limitations.

Updated plan.md with:
- Side-by-side attribute comparison table
- Three fix approaches (context propagation, span suppression, enrichment)
- Recommendation: Option 1 (propagate trace context through IPC)

* feat(otel): propagate trace context through BYOK IPC to link wrapper spans

- Pass _otelTraceContext through modelOptions alongside _capturingTokenCorrelationId
- Inject IOTelService into CopilotLanguageModelWrapper
- Wrap makeRequest in startActiveSpan with parentTraceContext when available
- This creates a byok-provider bridge span that makes chatMLFetcher's chat span
  a child of the original invoke_agent trace, bringing real token usage data
  into the agent trace hierarchy

* debug(otel): add debug attribute to verify trace context capture in BYOK path

* fix(otel): remove debug attribute, BYOK trace context propagation verified working

Verified: 63-span trace with Azure BYOK (gpt-5) correctly shows:
- byok-provider bridge spans linking wrapper chat spans into agent trace
- Real token usage (in:21458 out:1730 cache:19072) visible on wrapper chat spans
- hasCtx:true on all extChatEndpoint spans confirming context capture
- Two subagent invoke_agent spans correctly nested under main agent
- Zero orphan copilotLanguageModelWrapper spans

* refactor(otel): replace byok-provider bridge span with invisible context propagation

Add runWithTraceContext() to IOTelService — sets parent trace context
without creating a visible span. The wrapper's chat spans now appear
directly as children of invoke_agent, eliminating the noisy
byok-provider intermediary span.

Before: invoke_agent → byok-provider → chat (wrapper)
After:  invoke_agent → chat (wrapper)

* refactor(otel): remove duplicate BYOK consumer-side chat span

The extChatEndpoint no longer creates its own chat span. The wrapper's
chatMLFetcher span (via CopilotLanguageModelWrapper) is the single source
of truth with full token usage, cache data, and resolved model.

Before: invoke_agent → chat (empty, extChatEndpoint) + chat (rich, wrapper)
After:  invoke_agent → chat (rich, wrapper only)

* fix(otel): restore chat span for non-wrapper BYOK providers (Anthropic, Gemini)

The previous commit removed the extChatEndpoint chat span, which was correct
for Azure/OpenAI BYOK (served by CopilotLanguageModelWrapper via chatMLFetcher).
But Anthropic and Gemini BYOK providers call their native SDKs directly,
bypassing CopilotLanguageModelWrapper — so they need the consumer-side span.

Now: always create a chat span in extChatEndpoint with basic metadata
(model, provider, response.id, finish_reasons). For wrapper-based providers,
the chatMLFetcher also creates a richer sibling span with token usage.

* fix(otel): skip consumer chat span for wrapper-based BYOK providers

Only create the extChatEndpoint chat span for non-wrapper providers
(Anthropic, Gemini) that need it as their only span. Wrapper-based
providers (Azure, OpenAI, OpenRouter, Ollama, xAI) get a single rich
span from chatMLFetcher via CopilotLanguageModelWrapper.

Result: 1 chat span per LLM call for all provider types.

* fix: remove unnecessary 'google' from non-wrapper vendor set

* feat(otel): add rich chat span with usage data for Anthropic BYOK provider

Move chat span creation into AnthropicLMProvider where actual API response
data (token usage, cache reads) is available. The span is linked to the
agent trace via runWithTraceContext and enriched with:
- gen_ai.usage.input_tokens / output_tokens
- gen_ai.usage.cache_read.input_tokens
- gen_ai.response.model / response.id / finish_reasons

Remove consumer-side extChatEndpoint span for all vendors (nonWrapperVendors
now empty) since both wrapper-based and Anthropic providers create their
own spans with full data.

Next: apply same pattern to Gemini provider.

* feat(otel): add rich chat span for Gemini BYOK, clean up extChatEndpoint

- Add OTel chat span with full usage data to GeminiNativeBYOKLMProvider
- Remove all consumer-side span code from extChatEndpoint (dead code)
- Each provider now owns its chat span with real API response data:
  * CAPI: chatMLFetcher
  * OpenAI-compat BYOK: CopilotLanguageModelWrapper → chatMLFetcher
  * Anthropic: AnthropicLMProvider
  * Gemini: GeminiNativeBYOKLMProvider
- Fix Gemini test to pass IOTelService

* feat(otel): enrich Anthropic/Gemini chat spans with full metadata

Add to both providers:
- copilot_chat.request.max_prompt_tokens (model.maxInputTokens)
- server.address (api.anthropic.com / generativelanguage.googleapis.com)
- gen_ai.conversation.id (requestId)
- copilot_chat.time_to_first_token (result.ttft)

Now matches CAPI chat span attribute parity.

* feat(otel): add server.address to CAPI/Azure BYOK chat spans

Extract hostname from urlOrRequestMetadata when it's a URL string
and set as server.address on the chat span. Works for both CAPI
and CopilotLanguageModelWrapper (Azure BYOK) paths.
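The hostname extraction can be sketched briefly, assuming `urlOrRequestMetadata` is either a URL string or a non-string metadata object. `serverAddressOf` is a hypothetical helper name; the sketch relies only on the standard WHATWG `URL` class.

```typescript
// Returns the hostname to record as server.address, or undefined when the
// input is request metadata (not a string) or not an absolute URL.
function serverAddressOf(urlOrRequestMetadata: string | object): string | undefined {
  if (typeof urlOrRequestMetadata !== 'string') {
    return undefined;
  }
  try {
    return new URL(urlOrRequestMetadata).hostname;
  } catch {
    return undefined; // the URL constructor throws on unparsable input
  }
}
```

Using `hostname` rather than `host` keeps any port out of `server.address`. OTel defines a separate `server.port` attribute for that.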

* feat(otel): add max_tokens and output_messages to Anthropic/Gemini chat spans

- gen_ai.request.max_tokens from model.maxOutputTokens
- gen_ai.output.messages (opt-in) from response text
- Closes remaining attribute gaps vs CAPI/Azure BYOK spans

* fix(otel): capture tool calls in output_messages for chat spans

When model responds with tool calls instead of text, the output_messages
attribute was empty. Now captures both text parts and tool call parts
in the output_messages, matching the OTel GenAI output messages schema.

Also: Azure BYOK invoke_agent zero tokens is a known upstream gap —
extChatEndpoint returns hardcoded usage:0 since VS Code LM API doesn't
expose actual usage from the provider side.

* fix(otel): capture tool calls in output_messages for Anthropic/Gemini BYOK spans

Same fix as CAPI — when model responds with tool calls, include them
in gen_ai.output.messages alongside text parts. All three provider
paths (CAPI, Anthropic, Gemini) now consistently capture both text
and tool call parts in output messages.
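Below is a hedged sketch of building `gen_ai.output.messages` from a response that may contain text, tool calls, or both. The part shapes follow our reading of the OTel GenAI output-messages schema and should be checked against the current semantic conventions:

- `type: 'text'` with `content`;
- `type: 'tool_call'` with `id`, `name`, and `arguments`;
- a `finish_reason` on the message.

`toOutputMessages` and `ToolCall` are hypothetical names.

```typescript
interface ToolCall { id: string; name: string; arguments: string; } // arguments as raw JSON text

type OutputPart =
  | { type: 'text'; content: string }
  | { type: 'tool_call'; id: string; name: string; arguments: unknown };

interface OutputMessage { role: 'assistant'; parts: OutputPart[]; finish_reason: string; }

function toOutputMessages(text: string, toolCalls: readonly ToolCall[], finishReason: string): OutputMessage[] {
  const parts: OutputPart[] = [];
  if (text.length > 0) {
    parts.push({ type: 'text', content: text });
  }
  // Tool calls are included even when there is no text, so the attribute is never empty.
  for (const call of toolCalls) {
    let args: unknown;
    try {
      args = JSON.parse(call.arguments);
    } catch {
      args = call.arguments; // keep malformed arguments as the raw string
    }
    parts.push({ type: 'tool_call', id: call.id, name: call.name, arguments: args });
  }
  return [{ role: 'assistant', parts, finish_reason: finishReason }];
}
```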

* fix(otel): add input_messages and agent_name to Anthropic/Gemini chat spans

- gen_ai.input.messages (opt-in) captured from provider messages parameter
- gen_ai.agent.name set to AnthropicBYOK / GeminiBYOK for identification

Closes the last attribute gaps vs CAPI/Azure BYOK chat spans.

* fix(otel): fix input_messages serialization for Anthropic/Gemini BYOK

- Map enum role values to names (1→user, 2→assistant, 3→system)
- Extract text from LanguageModelTextPart content arrays instead of
  showing '[complex]' for all messages
- Use OTel GenAI input messages schema with role + parts format
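The serialization fix can be sketched as follows. The numeric roles match the mapping stated above (1→user, 2→assistant, 3→system). `TextPart` and `ChatMessage` are simplified hypothetical stand-ins for `LanguageModelTextPart` and `LanguageModelChatMessage`. Non-text parts are skipped here rather than serialized, which is an assumption of this sketch.

```typescript
// Simplified stand-ins for the VS Code LM API types (hypothetical shapes).
class TextPart {
  readonly value: string;
  constructor(value: string) { this.value = value; }
}
interface ChatMessage { role: number; content: ReadonlyArray<TextPart | object>; }

const ROLE_NAMES: Readonly<Record<number, string>> = { 1: 'user', 2: 'assistant', 3: 'system' };

interface InputMessage { role: string; parts: Array<{ type: 'text'; content: string }>; }

function toInputMessages(messages: readonly ChatMessage[]): InputMessage[] {
  return messages.map(m => ({
    // Map the numeric enum value to its name; fall back to the raw number.
    role: ROLE_NAMES[m.role] ?? String(m.role),
    // Extract text via instanceof instead of an `as any` cast or a '[complex]' placeholder.
    parts: m.content
      .filter((p): p is TextPart => p instanceof TextPart)
      .map(p => ({ type: 'text' as const, content: p.value })),
  }));
}
```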

* docs(otel): add remaining metrics/events work to plan.md

Coverage matrix showing:
- Anthropic/Gemini BYOK missing: operation.duration, token.usage,
  time_to_first_token metrics, and inference.details event
- CAPI and Azure BYOK (via wrapper) fully covered
- Tool/agent/session metrics covered across all providers
- 4 tasks (M1-M4) to close the gap

* feat(otel): add metrics and inference events to Anthropic/Gemini BYOK providers

Both providers now record:
- gen_ai.client.operation.duration histogram
- gen_ai.client.token.usage histograms (input + output)
- copilot_chat.time_to_first_token histogram
- gen_ai.client.inference.operation.details log event

All metrics/events now have full parity across CAPI, Azure BYOK,
Anthropic BYOK, and Gemini BYOK.

* fix(otel): fix LoggerProvider constructor — use 'processors' key (SDK v2)

The OTel SDK v2 changed the LoggerProvider constructor option from
'logRecordProcessors' to 'processors'. The old key was silently
ignored, causing all log records to be dropped.

This is why logs never appeared in Loki despite traces working fine.

* docs: add agent monitoring guide with OTel usage and Claude/Gemini comparison

* docs: remove Claude/Gemini comparison from monitoring guide

* docs: add OTel comparison with Claude Code and Gemini CLI

* docs: reorganize monitoring docs — user guide + dev architecture

- agent_monitoring.md: polished user-facing guide (for VS Code website)
- agent_monitoring_arch.md: developer-facing architecture & instrumentation guide
- Removed internal plan/spec/comparison files from repo (moved to ~/Documents)

* fix(otel): restore _doFetchViaHttp body and _fetchWithInstrumentation after rebase

* fix(otel): propagate otelSpan through WebSocket/HTTP routing paths

The otelSpan was created in _doFetchAndStreamChat but not included
in returns from _doFetchViaWebSocket and _doFetchViaHttp, causing
the caller (fetchMany) to always receive undefined for otelSpan.

Fix: await both routing paths and spread otelSpan into the result.

* docs(otel): improve monitoring docs, add collector setup, fix trace context

- Expand agent_monitoring.md with detailed span/metric/event attribute tables
- Add BYOK provider coverage, subagent trace propagation docs
- Add Backend Considerations: Azure App Insights (via collector), Langfuse, Grafana
- Add End-to-End Setup & Verification section with KQL examples
- Add OTel Collector config + docker-compose for Azure App Insights
- Fix: emit inference details event before span.end() in chatMLFetcher
  (fixes 'No trace ID' log records in App Insights)
- Fix: pass active context in emitLogRecord for trace correlation
- Update launch.json to point at OTel Collector (localhost:4328)

* docs(otel): merge Backend Considerations and E2E sections to remove redundancy

* docs(otel): remove internal dev debug reference from user-facing guide

* docs(otel): remove Grafana section and Jaeger refs from App Insights section

* docs(otel): trim Backend section to factual setup guides, remove claims

* docs(otel): final accuracy audit — fix false claims against code

- Mark copilot_chat.session.start event as 'not yet emitted' (defined but no call site)
- Mark copilot_chat.agent.turn event as 'not yet emitted' (defined but no call site)
- Mark copilot_chat.session.count metric as 'not yet wired up'
- Fix OTEL_EXPORTER_OTLP_PROTOCOL desc: only 'grpc' changes behavior
- Fix telemetry kill switch claim: vscodeTelemetryLevel not wired in services.ts
- Remove false toolCalling.tsx instrumentation point from arch doc
- Fix docker-compose comments: wrong port numbers (16686→16687, 4318→4328)
- Add reference to full collector config file from inline snippet

* docs(otel): remove telemetry.telemetryLevel references — OTel is independent

* feat(otel): wire up session.start event, agent.turn event, and session.count metric

- emitSessionStartEvent + incrementSessionCount at invoke_agent start (top-level only)
- emitAgentTurnEvent per LLM response in onDidReceiveResponse listener
- Remove 'not yet wired' markers from docs

* chore: untrack .playwright-mcp/ and add to .gitignore

* chore: remove otel spec reference files

* chore(otel): remove OpenTelemetry environment variables from launch configurations

* fix(otel): add 64KB truncation limit for content capture attributes

Prevents OTLP batch export failures when large prompts/responses are
captured. Aligned with gemini-cli's limitTotalLength pattern.

Applied truncateForOTel() to all JSON.stringify calls feeding span
attributes across chatMLFetcher, toolCallingLoop, toolsService,
anthropicProvider, geminiNativeProvider, and genAiEvents.
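A length cap for content attributes can be sketched minimally. This sketch assumes the 64KB limit is measured in UTF-16 code units (JavaScript string length) rather than encoded bytes. The truncation marker is invented for illustration. The function name matches `truncateForOTel` above, but this body is a hypothetical sketch, not the repository's implementation.

```typescript
const MAX_OTEL_ATTRIBUTE_LENGTH = 64 * 1024;
const TRUNCATION_MARKER = '...[truncated]';

// Caps a serialized attribute value so that very large prompts or responses
// cannot cause OTLP batch export failures. For limits longer than the marker,
// the result never exceeds maxLength.
function truncateForOTel(value: string, maxLength: number = MAX_OTEL_ATTRIBUTE_LENGTH): string {
  if (value.length <= maxLength) {
    return value;
  }
  let cut = Math.max(0, maxLength - TRUNCATION_MARKER.length);
  // Do not leave a lone high surrogate (half of an emoji) at the cut point.
  const last = value.charCodeAt(cut - 1);
  if (last >= 0xd800 && last <= 0xdbff) {
    cut -= 1;
  }
  return value.slice(0, cut) + TRUNCATION_MARKER;
}
```

At each call site, the `JSON.stringify(...)` output would pass through this function before being set as a span attribute.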

* refactor(otel): make GenAiMetrics methods static to avoid per-call allocations

Aligned with gemini-cli pattern of module-level metric functions.
Eliminates 17+ throwaway GenAiMetrics instances per agent run.

* fix(otel): fix timer leak, cap buffered ops, rate-limit export logs

- storeTraceContext: track timers for clearTimeout on retrieval/shutdown,
  add 100-entry max with LRU eviction
- BufferedSpanHandle: cap _ops at 200 to prevent unbounded growth
- DiagnosticSpanExporter: rate-limit failure logs to once per 60s
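Two of these safeguards can be sketched independently of OTel. The first is a failure logger that emits at most once per interval. The second is a buffer that stops accepting operations past a fixed cap. The 60 s and 200 values come from the text above. The class names, the injectable clock, the suppressed-count suffix, and the drop-newest policy are assumptions for illustration.

```typescript
// Emits at most one message per interval; calls inside the window are counted, not logged.
class RateLimitedLogger {
  private lastLoggedAt = -Infinity;
  private suppressed = 0;
  private readonly log: (msg: string) => void;
  private readonly intervalMs: number;
  private readonly now: () => number;

  constructor(log: (msg: string) => void, intervalMs = 60_000, now: () => number = Date.now) {
    this.log = log;
    this.intervalMs = intervalMs;
    this.now = now;
  }

  error(msg: string): void {
    const t = this.now();
    if (t - this.lastLoggedAt < this.intervalMs) {
      this.suppressed++;
      return;
    }
    const suffix = this.suppressed > 0 ? ` (${this.suppressed} similar failures suppressed)` : '';
    this.log(msg + suffix);
    this.lastLoggedAt = t;
    this.suppressed = 0;
  }
}

// Queues operations for later replay, but never holds more than maxOps.
class BoundedOpBuffer<T> {
  private readonly ops: T[] = [];
  private readonly maxOps: number;

  constructor(maxOps = 200) { this.maxOps = maxOps; }

  push(op: T): boolean {
    if (this.ops.length >= this.maxOps) return false; // drop newest once full
    this.ops.push(op);
    return true;
  }

  drain(apply: (op: T) => void): void {
    for (const op of this.ops.splice(0)) apply(op);
  }

  get size(): number { return this.ops.length; }
}
```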

* docs(otel): fix Jaeger UI port to match docker-compose (16687)

* chore(otel): update sprint plan — mark P0/P1 tasks done

* fix(otel): remove as any casts in BYOK provider content capture

Use proper Array.isArray + instanceof checks instead of as any[]
casts for LanguageModelChatMessage.content iteration.

* refactor(otel): extract OTelModelOptions shared interface

Replaces 3 duplicated inline type assertions for _otelTraceContext
and _capturingTokenCorrelationId with a single shared interface.

* refactor(otel): route OTel logs through ILogService output channel

Replace console.info/error/warn in NodeOTelService with a log callback.
OTelContrib logs essential status to the Copilot Chat output channel
for user troubleshooting (enabled/disabled, exporter config, shutdown).

* fix(otel): remove orphaned OTel ConfigKey definitions

OTel config is read via workspace.getConfiguration in services.ts,
not through IConfigurationService.get(ConfigKey). These constants
were unused dead code.

* test(otel): add comprehensive OTel instrumentation tests

- Agent trace hierarchy (invoke_agent → chat → execute_tool, subagent
  propagation, error states, metrics, events)
- BYOK provider span emission (CLIENT kind, token usage, error.type,
  content capture gating, parentTraceContext linking)
- chatMLFetcher two-phase span lifecycle (create → enrich → end,
  error path, operation duration metric)
- Service robustness (runWithTraceContext, startActiveSpan error
  lifecycle, storeTraceContext overwrite)
- CapturingOTelService reusable test mock for all OTel assertions

* chore: apply formatter import sorting

* chore: remove outdated sprint plan document

* feat(otel): add OTel configuration settings for tracing and logging

* fix(otel): ensure metric reader is flushed and shutdown properly
2026-03-02 20:46:30 +00:00
Christof Marti 8649964a4d Support CAPI WebSocket connections (#4068) (#4069) 2026-02-27 16:58:06 +00:00
Ulugbek Abdullaev 376af91601 nes: support similar files in et (#3938)
* nes: support similar files in et

* fix typing issue

* fix tests
2026-02-24 07:08:36 +00:00
Christof Marti 41a442c2b3 Add network status (#3932) 2026-02-23 21:54:09 +00:00
Ulugbek Abdullaev 0c77656331 nes: feat: more diff merging strategies (#3763)
Also fixes two adjacent line changes producing two separate diff hunks.
2026-02-16 11:15:45 +00:00
Ulugbek Abdullaev b77056774f nes: modelsService: multiple fixes around race conditions (#3749)
* fix: parseModelConfigString accepts value directly instead of re-reading

parseModelConfigStringSetting was re-reading the config from the service
instead of using the observable-tracked value already passed to callers.
This caused a potential inconsistency: the truthy gate in aggregateModels
and determineDefaultModel used the observable value, but the parsing used
a separate read that could return a different value if config changed
between the two reads.

Renamed to parseModelConfigString and changed to accept the config string
value directly as a parameter.

* fix: track useSlashModels as observable dependency in _modelsObs

useSlashModels was read via a non-observable getExperimentBasedConfig()
call inside the derived _modelsObs computation. This meant changes to
the useSlashModels config would not trigger recomputation of the models
list.

Now read through an observable (_useSlashModelsObs) and passed as a
parameter to aggregateModels, so the derived properly tracks it.

* fix: track undesired models as observable dependency in _currentModelObs

_pickModel was calling _undesiredModelsManager.isUndesiredModelId()
directly — a plain synchronous read not tracked by the observable
system. Changes to the undesired models list would not trigger
_currentModelObs to recompute.

Added onDidChange event to IUndesiredModelsManager (and both
implementations), created an observable from it, and read it through
the reader in the _currentModelObs derived computation. _pickModel now
receives the undesiredModelsManager as a parameter.

* fix: serialize UndesiredModels.Manager operations with TaskQueue

addUndesiredModelId and removeUndesiredModelId did read-then-write
without serialization. Concurrent calls could interleave: the second
reads stale state before the first's write completes, overwriting
the first's change.

Now all mutations are serialized through a TaskQueue, ensuring each
operation reads the latest state after any prior write has completed.
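The race and its fix can be sketched with a minimal promise-chaining queue. Each task starts only after the previous one settles, so every read sees the prior write. `SerialQueue` is a hypothetical stand-in for the TaskQueue named above, not its actual implementation. `addUndesired` is a simplified, hypothetical version of the guarded mutation.

```typescript
class SerialQueue {
  private tail: Promise<unknown> = Promise.resolve();

  enqueue<T>(task: () => Promise<T>): Promise<T> {
    const run = this.tail.then(() => task());
    // Keep the chain going even if this task rejects, so later tasks still run.
    this.tail = run.catch(() => undefined);
    return run;
  }
}

const delay = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms));

// Hypothetical persisted state with an async read-modify-write mutation.
let undesiredIds: string[] = [];
const queue = new SerialQueue();

function addUndesired(id: string): Promise<void> {
  return queue.enqueue(async () => {
    const current = [...undesiredIds]; // read
    await delay(5);                    // e.g. awaiting storage I/O
    undesiredIds = [...current, id];   // write
  });
}
```

Without the queue, two concurrent `addUndesired` calls would both read the empty list, and the second write would drop the first ID. With the queue, the final list contains both.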

* fix: serialize setCurrentModelId calls with TaskQueue

setCurrentModelId was async with multiple awaits but no serialization.
Concurrent calls (e.g. rapid model switching in the UI picker) could
interleave, corrupting undesired-models state and writing stale
preferred-model config.

Now serialized through a TaskQueue so only one setCurrentModelId runs
at a time, with subsequent calls queued until the previous completes.

* refactor: remove dead fallback in selectedModelConfiguration

_currentModelObs always returns a Model (never undefined) due to
_pickModel's fallback chain. The truthy check and the fallback to
determineDefaultModel were dead code.

* fix: validate JSON.parse result with MODEL_CONFIGURATION_VALIDATOR

JSON.parse result was cast to ModelConfiguration without validation.
Invalid JSON structures (e.g. a plain string or number) would pass
parsing but fail at runtime when accessed as ModelConfiguration.

Now uses the existing MODEL_CONFIGURATION_VALIDATOR to validate the
parsed result and reports validation errors via telemetry.
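Here is a hedged sketch of validating a parsed value before trusting it. The `ModelConfiguration` shape below (a single `modelName` field) is invented for illustration. The hand-written type guard stands in for MODEL_CONFIGURATION_VALIDATOR, whose real API is not shown here.

```typescript
type ParseResult<T> = { ok: true; value: T } | { ok: false; error: string };

// Hypothetical configuration shape, for illustration only.
interface ModelConfiguration { modelName: string; }

function isModelConfiguration(v: unknown): v is ModelConfiguration {
  return typeof v === 'object' && v !== null && typeof (v as { modelName?: unknown }).modelName === 'string';
}

function parseValidated<T>(text: string, isValid: (v: unknown) => v is T): ParseResult<T> {
  let parsed: unknown;
  try {
    parsed = JSON.parse(text);
  } catch (e) {
    return { ok: false, error: `invalid JSON: ${(e as Error).message}` };
  }
  // JSON.parse also returns strings, numbers, and arrays; a plain cast would hide that.
  return isValid(parsed) ? { ok: true, value: parsed } : { ok: false, error: 'unexpected shape' };
}
```

The `ok: false` branch is where a validation error would be reported to telemetry.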

* fix: do not leak an emitter in UndesiredModels.Manager

* fix test
2026-02-14 23:36:18 +00:00
Ulugbek Abdullaev a8e4b9fe96 ghostText: log network requests to the log tree (#3669)
* ghostText: log network requests to the log tree

* nest requests

* Update src/platform/nesFetch/node/completionsFetchServiceImpl.ts

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update src/platform/nesFetch/node/completionsFetchServiceImpl.ts

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* address Copilot comments

* address Copilot comments

* Revert "Update src/platform/nesFetch/node/completionsFetchServiceImpl.ts"

This reverts commit da984b07119716d8585ca799851ea17cb5b4a7a4.

* fix tests

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-02-12 11:20:56 +00:00
Alexandru Dima d4f9754abd Fall back to node-fetch when Electron network process crashes (#3655)
When Electron's network service process crashes mid-session, all requests
through it fail permanently with net::ERR_FAILED (network_process_crashed).
The process does not recover until the extension host restarts.

This change detects the crash via chromiumDetails.network_process_crashed
on the error object and responds at two levels:

1. FetcherService (fetchWithFallbacks): permanently demotes the crashed
   fetcher and promotes node-fetch as the primary fetcher. All subsequent
   requests (chat, auth, ping, etc.) bypass Electron entirely with zero
   overhead. Gated by FallbackNodeFetchOnNetworkProcessCrash experiment.

2. chatMLFetcher (_retryAfterError): for the first crashing chat request,
   switches the connectivity check and retry to use node-fetch, so the
   request can recover. Also gated by the experiment flag.

Added isNetworkProcessCrashedError() to IFetcher/IFetcherService interfaces.
ElectronFetcher implements structural detection via chromiumDetails.
Error classification methods (isFetcherError, isNetworkProcessCrashedError,
getUserMessageForFetcherError) now check all fetchers so they still work
correctly after a fetcher demotion.

Added isNetworkProcessCrash flag to ChatFetchResponseType.NetworkError.
Added ElectronFetchErrorChromiumDetails type for structured error access.

30 unit tests covering retry logic, fetcher demotion, experiment gating,
and error classification after demotion.
2026-02-11 08:04:36 +00:00
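The fetcher-demotion behavior described in this commit can be sketched as follows. Everything here is an illustrative assumption: `SimpleFetcher`, `FallbackFetcherService`, `hasNetworkProcessCrashed` and the `fallbackOnCrash` flag are hypothetical stand-ins for `IFetcher`, `FetcherService`, the ElectronFetcher check and the `FallbackNodeFetchOnNetworkProcessCrash` experiment. Only the error shape (`chromiumDetails.network_process_crashed`) comes from the commit message.

```typescript
// Hypothetical, minimal fetcher abstraction; the real IFetcher interface is richer.
interface SimpleFetcher {
	readonly name: string;
	fetch(url: string): Promise<string>;
	isNetworkProcessCrashedError(err: unknown): boolean;
}

// Structural check on the error object, following the commit's description of
// chromiumDetails.network_process_crashed. The exact error shape is an assumption.
function hasNetworkProcessCrashed(err: unknown): boolean {
	if (typeof err !== 'object' || err === null) {
		return false;
	}
	const details = (err as { chromiumDetails?: { network_process_crashed?: unknown } }).chromiumDetails;
	return details?.network_process_crashed === true;
}

class FallbackFetcherService {
	private fetchers: SimpleFetcher[];

	constructor(fetchers: SimpleFetcher[], private readonly fallbackOnCrash: boolean) {
		if (fetchers.length === 0) {
			throw new Error('at least one fetcher is required');
		}
		this.fetchers = [...fetchers];
	}

	get primaryName(): string {
		return this.fetchers[0].name;
	}

	async fetch(url: string): Promise<string> {
		const [primary, ...rest] = this.fetchers;
		try {
			return await primary.fetch(url);
		} catch (err) {
			// Only a crashed network process triggers demotion, and only when the flag is on.
			if (!this.fallbackOnCrash || rest.length === 0 || !primary.isNetworkProcessCrashedError(err)) {
				throw err;
			}
			// Permanently demote the crashed fetcher so later requests skip it entirely.
			this.fetchers = [...rest, primary];
			return rest[0].fetch(url);
		}
	}

	// Checks every fetcher, so errors from a demoted fetcher are still classified.
	isNetworkProcessCrashedError(err: unknown): boolean {
		return this.fetchers.some(f => f.isNetworkProcessCrashedError(err));
	}
}
```

Reordering the list, rather than retrying the crashed fetcher per request, is what gives later requests no extra overhead: once demoted, the crashed fetcher is never tried first again. Classification iterates over all fetchers for the reason the commit gives, namely that a demoted fetcher's errors must still be recognized.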
Benjamin Christopher Simmonds 4be85bd221 Add terminal output monitoring with a limit on buffer size (#3522)
* terminal output

* limit the count to 2000

* terminal service missing

* Add ITerminalService to service setup in chatLibMain
2026-02-09 11:40:26 +00:00
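This commit caps buffered terminal output at 2000, but the log does not say what is counted. A minimal sketch follows. It assumes each output event is one entry and that the oldest entries are dropped first. `BoundedOutputBuffer` is a hypothetical name, not the commit's actual class.

```typescript
// Hypothetical bounded buffer for terminal output. The log only says "limit the
// count to 2000"; treating each output event as one counted entry is an assumption.
class BoundedOutputBuffer {
	private readonly entries: string[] = [];

	constructor(private readonly maxEntries = 2000) {}

	append(data: string): void {
		this.entries.push(data);
		// Drop the oldest entries once the cap is exceeded.
		const overflow = this.entries.length - this.maxEntries;
		if (overflow > 0) {
			this.entries.splice(0, overflow);
		}
	}

	get size(): number {
		return this.entries.length;
	}

	contents(): string {
		return this.entries.join('');
	}
}
```

`splice` copies the remaining entries on each overflow. A ring buffer would avoid that copying if output volume were high.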
Logan Ramos 731ade0a0d Attempt to make model service more resilient to network blips (#3523)
* Attempt to make model service more resilient to network blips

* Update src/extension/conversation/vscode-node/languageModelAccess.ts

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update src/extension/prompt/vscode-node/endpointProviderImpl.ts

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-02-06 15:40:10 +00:00
Christof Marti de441dc178 Block power save during chat requests (#3483) 2026-02-05 15:58:22 +00:00
Logan Ramos fedc7bb41e Don't fetch auto mode token on inactive windows (#3442)
* Don't fetch auto mode token on inactive windows

* Fix promise manipulation
2026-02-04 23:44:19 +00:00
Ulugbek Abdullaev e5740895bc ghost: add logs to be able to track down behavior issues in ghost-text (#3271) 2026-01-29 14:20:26 +00:00
Ulugbek Abdullaev 03db2ad8b9 ghost: support isShown and isFromCache for provideInlineEdit (#3250)
* ghost: support isFromCache for provideInlineEdit

* ghost: support isShown for provideInlineEdit

* formatting

* easier way to mark a suggestion as from cache
2026-01-28 23:13:35 +00:00
Ulugbek Abdullaev 1f51cefb42 ghost: logContext: include prompt in ghost logContext (#3124)
* ghost: logContext: include prompt in ghost logContext

* fix issues

* fix issues
2026-01-23 17:10:58 +00:00
Christof Marti f492228270 Move chat-lib tests to main src folder (#3004) 2026-01-20 17:05:51 +00:00
Ulugbek Abdullaev 5aff59830d exp: add a filter for team members (#2979)
* exp: add a filter for team members

* fix test

* fix formatting

* lint-staged: fix: do not lint if there are no files to lint

Otherwise, all files in the workspace are linted, which causes headaches for the committer.

* fix test
2026-01-19 15:36:24 +00:00
Dirk Bäumer 681745ecfb Include diagnostics into completion prompt (#2970)
* Include default diagnostics into the inline completion prompt

* Some minor renames

* Update nls string

* Name setting correctly

* Rename NLS key

* Remove team internal setting from package.json / nls

* Fix VS Code imports

* Use ILanguageDiagnosticsService instead of languages reference

* Import correct experimentation service

* Declare ILanguageDiagnosticsService for chatLib completions
2026-01-19 11:03:41 +00:00
Ulugbek Abdullaev 0558e860b8 chatLib: fix: register completions fetch service for chatLib (#2880) 2026-01-15 19:08:41 +00:00
Jeff Hunter c5285fa9f8 allow speculative requests to be initiated using chat-lib (#2591) 2025-12-16 15:52:05 +00:00
Jeff Hunter a2fd2580ef fix NESProvider in chat-lib with an alternative to requiring a full IVSCodeExtensionContext (#2570) 2025-12-16 15:49:09 +00:00
Ulugbek Abdullaev 985add451e ghost: log tree: do not double log proposed completion (#2603)
* ghost: log tree: do not double log proposed completion

* Update src/extension/completions-core/vscode-node/extension/src/ghostText/ghostText.ts

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update src/lib/node/chatLibMain.ts

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-12-16 14:30:09 +00:00
Ulugbek Abdullaev 524fa4b92c ghost: refactor: adopt service injection in inlineCompletion.ts (#2601) 2025-12-16 12:57:28 +00:00
Ulugbek Abdullaev aee2abe2e1 nes: joint: don't enforce cache delay if document hasn't changed (#2409) 2025-12-04 22:13:16 +00:00
Jeff Hunter 6e0f7df3dd make capi client service optional when creating an inline completions provider (#2369) 2025-12-04 07:51:19 +00:00
Christof Marti 81d78b83a4 Fix chat-lib: Include models lookup (#2366) 2025-12-03 20:32:35 +00:00
Christof Marti 1904e7246e Fix chat-lib: Propagate exp update through config event (#2351) 2025-12-03 11:01:57 +00:00
Ulugbek Abdullaev 71803ce30f nes: support jump-to label (#2248)
* nes: update to latest core API for completions

* nes: support jump-to
2025-11-27 18:41:52 +00:00
Osvaldo Ortega 6c944a5c46 Fetch sessions without nwo (#2143)
* Fetch sessions without nwo

* Revert

* Package update
2025-11-21 23:32:44 +00:00
Jeff Hunter 4cbd1042c0 Inline completions in @vscode/chat-lib (#2131)
* Include inline completions in @vscode/chat-lib

* Follow type imports, * exports without "as", and jsxImportSource pragmas for dependency extraction

* update @vscode/chat-lib test configuration

* update chat lib extraction with new path and add context setup for lib

* initial stubs for inline completions test

* round trip test for getInlineCompletions

* remove unused path mappings

* fix type import

* send only original event names for chat-lib telemetry

* fix wasm loading in chat-lib

* have locateFile default to the current dir if the expected parent directory cannot be found

* update to use service injection with completions in chat-lib

* update citation and ExP handling for completions in chat-lib

* hook up enhanced telemetry for chat-lib

* add missing tsx package

* update post-install script to work with pre-built and unbuilt versions of chat-lib and add missing completions dependencies

* remove unneeded try/catch block

* correct typo

* generate package-lock from correct npm version

---------

Co-authored-by: Christof Marti <chrmarti@microsoft.com>
2025-11-21 15:47:14 +00:00
Sandeep Somavarapu fedde4f1f1 clean up: (#2040)
* clean up:
- simplify namespaces
- simplify defining settings

* fix unused configs
2025-11-17 14:32:52 +00:00
Christof Marti 497c81944b Add INESProvider.updateTreatmentVariables() (#1932)
* Add INESProvider.updateTreatmentVariables()

* Update src/lib/node/chatLibMain.ts

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Dispose

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-11-11 22:07:33 +00:00
Ross Wollman a3e519c7c3 test: support anonymous/BYOK mode (#1491)
* move deviceid to own class

* rename env var

* wip: simplify token manager

* fix test compilation

* address review feedback

* remove extraneous diffs

* switch to lazy token provider

* remove extra diff
2025-10-23 01:14:07 +00:00
Christof Marti feb8766822 Fix telemetry (#1191) 2025-09-29 21:06:05 +00:00
Christof Marti 9e1bdf2c8a Surface logging interface (#1159) 2025-09-26 10:35:12 +00:00
Christof Marti 044915f2fe Surface ITelemetrySender (#979) 2025-09-09 15:58:21 +00:00
Christof Marti 31799df2cc Add missing methods (#881)
* Update @vscode/copilot-api

* Add missing methods

* Omit source
2025-09-03 10:03:36 +00:00
Christof Marti 3b795b1e58 Extract @vscode/chat-lib (#807)
* Extract chat lib

* Extract chat lib

* Add test

* Get test working

* Simulate response

* Fix type issue

* Package

* Cleanup

* Tuck away workspace service

* Include package.json

* Ensure shim is used

* Include tiktoken files in package

* Update @vscode/copilot-api

* Ignore chat-lib
2025-08-29 14:41:27 +00:00