* Expand extended cache TTL model list + add message-breakpoint sub-toggle
Two related changes to Anthropic Messages API prompt caching:
1. Expand modelSupportsExtendedCacheTtl beyond just the 1M context
variants. Per Anthropic docs, the 1h cache TTL is available on all
active models; this widens our opt-in list to Claude Opus 4.5/4.6/4.7
and Sonnet 4.5/4.6 (all variants, not just -1m).
2. Add a new experiment-based setting
chat.anthropic.promptCaching.extendedTtlMessages as a strict
sub-toggle of the existing extendedTtl setting. When both are on, the
rolling message-level breakpoints (last cacheable user / tool-result
blocks set by addMessagesApiCacheControl) also use the 1h TTL
instead of the default 5m. Nested rather than orthogonal because
Anthropic requires longer-TTL breakpoints to appear before shorter
ones in the tools->system->messages prefix order.
Tests: 69 passed. Added a suite for isExtendedCacheTtlMessagesEnabled
(parent on/off x sub on/off matrix + inherited model/location/subagent
gates) and two tests verifying addMessagesApiCacheControl propagates
the new cacheTtl argument.
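A minimal sketch of how a rolling message breakpoint might pick up the
1h TTL. The types and the helper `markLastUserBlock` are hypothetical
stand-ins, not the real addMessagesApiCacheControl; only the
`cache_control` shape follows the Anthropic Messages API.

```typescript
type CacheControl = { type: 'ephemeral'; ttl?: '5m' | '1h' };
interface Block { type: string; text?: string; cache_control?: CacheControl }
interface Message { role: 'user' | 'assistant'; content: Block[] }

// Hypothetical helper: marks the last block of the most recent user message.
function markLastUserBlock(messages: Message[], cacheTtl?: '5m' | '1h'): void {
	for (let i = messages.length - 1; i >= 0; i--) {
		const msg = messages[i];
		if (msg.role === 'user' && msg.content.length > 0) {
			const block = msg.content[msg.content.length - 1];
			// Omitting ttl leaves the API default (5m) in effect.
			block.cache_control = cacheTtl
				? { type: 'ephemeral', ttl: cacheTtl }
				: { type: 'ephemeral' };
			return;
		}
	}
}
```

The sketch ignores tool-result blocks and the second rolling breakpoint;
it only shows the TTL argument flowing into the emitted block.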
* Slim down extended cache TTL tests
- Trim isExtendedCacheTtlEnabled model-list test to just verify the
delegation (full boundaries are covered by modelSupportsExtendedCacheTtl).
- Remove redundant 'inherits gates from parent' test in
isExtendedCacheTtlMessagesEnabled suite — the parent×sub matrix plus
the parent's own gate tests already cover this.
- Merge the two addMessagesApiCacheControl ttl tests into one
parameterized assertion.
* Update stale comment about message breakpoint TTL
The comment claimed message breakpoints 'always use the default 5m TTL',
but that's no longer true when the new extendedTtlMessages sub-toggle is on.
* Refactor extended cache TTL: pass parentEnabled, drop misleading coercions
- isExtendedCacheTtlMessagesEnabled now takes parentEnabled: boolean
instead of re-running the parent gate. Call site passes the resolved
useExtendedCacheTtl directly, eliminating a duplicate experiment-service
lookup per request. Makes the 'sub-toggle of' relationship literal in
the signature.
- Drop the !! coercion on getExperimentBasedConfig<boolean> returns —
the generic guarantees T, so the coercion was misleading defensive
noise.
- Narrow cacheTtl param from '5m' | '1h' to just '1h' on both
addToolsAndSystemCacheControl and addMessagesApiCacheControl. Per
Anthropic docs, { type: 'ephemeral' } already defaults to 5m, so '5m'
is never actually emitted on the wire and call sites never passed it.
- Stronger composition test for isExtendedCacheTtlEnabled — replaces
four single-axis tests with one table-driven matrix that exercises
all four gates simultaneously, catching short-circuit refactors.
- Table-driven 2x2 matrix for isExtendedCacheTtlMessagesEnabled.
- Trim user-facing extendedTtlMessages setting description; team-only
rationale (rolling breakpoints, 2x write premium) lives in the JSDoc.
- Update stale comment claiming message breakpoints always use 5m.
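A minimal sketch of the sub-toggle relationship and the 2x2 table-driven
check. `isMessagesTtlEnabled` is a hypothetical reduction of
isExtendedCacheTtlMessagesEnabled; the real function reads the sub-setting
from the experiment-based config rather than taking it as a parameter.

```typescript
// Hypothetical reduction: the sub-setting only counts when the parent is on.
function isMessagesTtlEnabled(parentEnabled: boolean, subEnabled: boolean): boolean {
	return parentEnabled && subEnabled;
}

// Table-driven 2x2 matrix: [parent, sub, expected].
const cases: Array<[boolean, boolean, boolean]> = [
	[false, false, false],
	[false, true, false],
	[true, false, false],
	[true, true, true],
];

// Collect every row whose result differs from the expectation.
const failures = cases.filter(([parent, sub, want]) => isMessagesTtlEnabled(parent, sub) !== want);
```

Passing the already-resolved parent value keeps the nesting visible in the
signature and avoids evaluating the parent gate a second time.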
* Remove outdated comments about extended cache TTL models
Flips the default for `chat.freezeCustomizationsIndex` from `false` to `true` so the customizations payload (`<instructions>`/`<skills>`/`<agents>`) is frozen at the first turn of a conversation and reused on subsequent turns, stabilizing the system-prompt prefix for cache reuse.
* Freeze customizations-index per conversation to stabilize system prompt cache
Adds the experimental `github.copilot.chat.freezeCustomizationsIndex`
setting (advanced/experimental/onExp, default false). When on, the
bundled <instructions>/<skills>/<agents> listing in the system prompt is
snapshotted on the first turn and reused on every subsequent turn,
preventing per-turn churn (mode swap rewriting the active subagent in
<agents>, async experimentation flipping a when-gated skill in or out)
from invalidating the system prompt cache.
When the live listing drifts from the snapshot, the updated set is
appended to the latest user message inside AgentUserMessage's context
tag — kept inside the captured RenderedUserMessageMetadata so the
historical user message replays byte-identically on later turns. Drift
also fires with an empty value when the live variable disappears, so
the model gets a signal that previously-listed entries are gone.
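A minimal sketch of the freeze-and-drift behaviour described above. The
class and member names are hypothetical, and the sketch compares each
turn against the first-turn snapshot, which may be simpler than the real
bookkeeping.

```typescript
// Hypothetical per-conversation store; names are illustrative only.
class CustomizationsSnapshot {
	private frozen: string | undefined;

	/** Returns the system-prompt text plus any drift to append to the user message. */
	resolve(live: string | undefined): { systemPromptText: string; drift: string | undefined } {
		if (this.frozen === undefined) {
			// First turn: capture the listing and use it as-is.
			this.frozen = live ?? '';
			return { systemPromptText: this.frozen, drift: undefined };
		}
		const current = live ?? '';
		// Later turns: keep the frozen prefix and report the live listing only if it changed.
		// An empty drift value signals that previously listed entries are gone.
		return {
			systemPromptText: this.frozen,
			drift: current === this.frozen ? undefined : current,
		};
	}
}
```

The system prompt text never changes after the first turn, so the cached
prefix stays valid; changes travel in the user message instead.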
Fixes #315408
Fixes #316182
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* Claude agent: add auto permission mode + non-streaming passthrough
- Adds 'auto' permission mode (model-classifier) to Claude sessions, gated
by new setting github.copilot.chat.claudeAgent.allowAutoPermissions
(default true).
- Threads allowAutoPermissions through ClaudeSessionOptionBuilder and the
reactive options pipeline in ClaudeChatSessionContentProvider.
- ClaudeLanguageModelServer: defer response headers until upstream content-type
is known and propagate the SDK's stream flag via requestOptions, so the
classifier's non-streaming (application/json) calls used by 'auto' mode
pass through correctly alongside the existing SSE path.
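A minimal sketch of why the headers are deferred: the relay mode can only
be chosen once the upstream content-type is known. Both helpers are
hypothetical and are not taken from ClaudeLanguageModelServer; treating a
missing content-type as JSON is an assumption.

```typescript
type PassthroughMode = 'sse' | 'json';

// Hypothetical helper: choose how to relay the upstream body.
function pickPassthroughMode(upstreamContentType: string | undefined): PassthroughMode {
	// Drop parameters such as "; charset=utf-8" before comparing.
	const mediaType = (upstreamContentType ?? '').split(';')[0].trim().toLowerCase();
	return mediaType === 'text/event-stream' ? 'sse' : 'json';
}

// Hypothetical helper: headers to send downstream for the chosen mode.
function responseHeadersFor(mode: PassthroughMode): Record<string, string> {
	return mode === 'sse'
		? { 'Content-Type': 'text/event-stream', 'Cache-Control': 'no-cache' }
		: { 'Content-Type': 'application/json' };
}
```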
* add non-streaming passthrough response handling in ClaudeLanguageModelServer
* Claude auto mode: mark setting preview, link aka.ms, honor preview features policy
- Move github.copilot.chat.claudeAgent.allowAutoPermissions into the 'preview' config section and tag it with 'preview'.
- Description now links to https://aka.ms/vscode-claude-auto-mode.
- Auto mode is gated on copilotToken.isEditorPreviewFeaturesEnabled() in addition to the setting, so org-level preview-features policy disables it.
* Claude auto mode: default false, ExperimentBased + onExp tag
- Default off; flipped via experiment.
- Setting tagged with both 'preview' and 'onExp'.
- ConfigKey.ClaudeAgentAllowAutoPermissions is now ExperimentBased; option builder and content provider read it via getExperimentBasedConfig with the injected IExperimentationService.
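A minimal sketch of the resulting gate. The interface and function are
hypothetical, and the precedence (explicit user setting, then experiment
value, then the default of false) is an assumption about how an
experiment-based config resolves.

```typescript
interface AutoModeInputs {
	settingValue: boolean | undefined;     // explicit user setting, if any
	experimentValue: boolean | undefined;  // value assigned by experimentation, if any
	previewFeaturesEnabled: boolean;       // org-level preview-features policy
}

function isAutoPermissionModeAllowed(inputs: AutoModeInputs): boolean {
	// Assumed precedence: user setting, then experiment, then default (false).
	const allow = inputs.settingValue ?? inputs.experimentValue ?? false;
	// The policy acts as a hard gate regardless of the setting.
	return allow && inputs.previewFeaturesEnabled;
}
```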
* Claude auto mode: default to true in the agents window
* Revert "Claude auto mode: default to true in the agents window"
This reverts commit 9083b1c2fd.
* Sessions/Claude: add 'Auto' permission mode item to picker
* Sessions/Claude: gate Auto permission mode item on allowAutoPermissions setting
* Sessions/Claude: also gate Auto on chat entitlement preview features
* Sessions/Claude: use observableConfigValue for allowAutoPermissions
* Sessions/Claude: append Auto mode when enabled instead of filtering
* Sessions/Claude: stub IConfigurationService and IChatEntitlementService in picker tests
---------
Co-authored-by: bhavyaus <bhavyau@microsoft.com>
Re-introduces `cache_control.ttl: "1h"` for the Anthropic Messages
API tools + system breakpoints, gated on the main agent conversation
where the 2x cache-write cost trades favourably against the longer
hit window. Previously reverted from the copilot-chat repo.
All four gates must hold:
- Model is a 1M-context Claude variant (`claude-opus-4-{6,7}-1m...`)
- Setting `github.copilot.chat.anthropic.promptCaching.extendedTtl` is
on (ConfigType.ExperimentBased, default false, advanced/experimental/onExp)
- Location is `ChatLocation.Agent` (Panel/Editor/Terminal/Notebook/
EditingSession/Other and both proxy locations are excluded)
- Request is not a subagent (typed via
`interactionTypeOverride === 'conversation-subagent'`, the same
source of truth as the `X-Interaction-Type` wire header)
When all gates pass:
- The `extended-cache-ttl-2025-04-11` beta header is added
- The last non-deferred tool and the last system block carry
`cache_control: { type: 'ephemeral', ttl: '1h' }`. The two rolling
message breakpoints keep the default 5m TTL, satisfying Anthropic's
longer-TTLs-before-shorter ordering rule.
Tests: messagesApi.spec.ts now at 65 tests (was 59); adds dedicated
`modelSupportsExtendedCacheTtl` and `isExtendedCacheTtlEnabled`
suites covering every gate explicitly.
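A minimal sketch of the four-gate check and what it changes on the wire.
All names here are hypothetical; the sketch takes the gate results as
plain booleans and ignores deferred tools. Only the `cache_control` shape
and the beta header value come from the description above.

```typescript
type CacheControl = { type: 'ephemeral'; ttl?: '1h' };
interface Cacheable { cache_control?: CacheControl }

interface GateInputs {
	modelSupportsExtendedTtl: boolean;
	settingEnabled: boolean;
	isAgentLocation: boolean;
	isSubagent: boolean;
}

function allGatesPass(g: GateInputs): boolean {
	return g.modelSupportsExtendedTtl && g.settingEnabled && g.isAgentLocation && !g.isSubagent;
}

// Marks the last tool and the last system block; adds the beta header only for the 1h TTL.
function applyPrefixCaching(
	tools: Cacheable[],
	system: Cacheable[],
	headers: Record<string, string>,
	extended: boolean,
): void {
	const cc: CacheControl = extended ? { type: 'ephemeral', ttl: '1h' } : { type: 'ephemeral' };
	if (tools.length > 0) { tools[tools.length - 1].cache_control = { ...cc }; }
	if (system.length > 0) { system[system.length - 1].cache_control = { ...cc }; }
	if (extended) {
		const beta = 'extended-cache-ttl-2025-04-11';
		const existing = headers['anthropic-beta'];
		// Preserve any beta flags that are already set.
		headers['anthropic-beta'] = existing ? `${existing},${beta}` : beta;
	}
}
```

Because tools and system precede messages in the prefix, giving them the
1h TTL while messages stay at 5m keeps longer TTLs ahead of shorter ones.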
Tool search is now always enabled for gpt-5.4/gpt-5.5, matching the
messages API path. Aligns the responses API on the same
endpoint.supportsToolSearch capability flag.
Also registers ToolSearchTool for gpt-5.4/gpt-5.5 and the
claude-opus-4.7 variants so model-specific tool gating actually
matches the supported endpoints.
Memory tool is now always enabled. Removes the preview gate, the config
key, the now-unused DI params on MemoryTool/MemoryContextPrompt/
MemoryInstructionsPrompt, and isAnthropicMemoryToolEnabled (replaced by
modelSupportsMemory at the BYOK call site).
Strip Copilot Memory (CAPI) feature entirely
Removes the CAPI-backed Copilot Memory that synced repository-scoped facts
to GitHub. The local file-based MemoryTool with user/session/repo scopes
remains as the sole memory mechanism.
- Delete AgentMemoryService and its test.
- Remove the github.copilot.chat.copilotMemory.enabled setting and its NLS string.
- Remove ConfigKey.CopilotMemoryEnabled.
- Strip all CAPI gating in memoryTool.tsx, memoryContextPrompt.tsx, tools.ts.
- Drop _dispatchRepoCAPI / _repoCreate / _sendRepoTelemetry.
- /memories/repo/ now always routes to local storage.
- Update memoryTool.spec.tsx: remove mock CAPI services and CAPI-only tests.
- Update simulationExtHostToolsService.ts for the new ToolsContribution arity.
* refactor: enhance cache control logic for tools and system prefixes
* refactor: enhance summarization and cache control handling in agent prompts
* refactor: remove unused countCacheControl function from messagesApi tests
* Add tests for summarization and cache control features
- Introduced a new snapshot test for summarization without cache breakpoints in `agentPrompt.spec.tsx`.
- Added a new test suite for `clearAllCacheControl` in `messagesApi.spec.ts` to validate cache control stripping and limits.
- Created a snapshot for summarization without cache breakpoints in the new snapshot file.
* test: pass enableSummarization in summarization.spec helper
The agentPrompt prop split decoupled enableSummarization from
enableCacheBreakpoints, so summarization.spec.tsx — which only set
enableCacheBreakpoints — fell through to the non-summarized branch
and broke 5 snapshot tests. Default the helper's baseProps to
enableSummarization: true so each test exercises the summarized path
as intended; individual tests can still override via otherProps.
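A minimal sketch of the helper pattern, assuming a props type with just
the two flags. `AgentPromptProps` and `makeProps` are hypothetical names;
the real helper in summarization.spec.tsx carries more props.

```typescript
interface AgentPromptProps {
	enableCacheBreakpoints: boolean;
	enableSummarization: boolean;
}

// Defaults put every test on the summarized path; callers override per test.
const baseProps: AgentPromptProps = {
	enableCacheBreakpoints: true,
	enableSummarization: true,
};

function makeProps(otherProps: Partial<AgentPromptProps> = {}): AgentPromptProps {
	// Later spreads win, so otherProps overrides the defaults.
	return { ...baseProps, ...otherProps };
}
```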
* Introduce proposed agentsWindow configuration extension point
Add a new `agentsWindow` property to the configuration contribution point
that allows extensions to declare per-setting default value overrides and
read-only behavior for the Agents window.
Shape: `agentsWindow: { default?: unknown; readOnly?: boolean }`
- Gated behind `agentsWindowConfiguration` proposed API
- SessionsDefaultConfiguration uses agentsWindow.default as the default value
- ReadOnly settings are excluded from user configuration parsing and
cannot be written via updateValue
- Settings editor shows lock indicator for readOnly settings
- `@override:agentsWindow` filter in settings search (agents window only)
- Adopted all existing hardcoded session defaults to use the new schema
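A minimal sketch of how the `agentsWindow` shape could be interpreted.
`SettingSchema`, `effectiveDefault`, and `canWrite` are hypothetical
helpers for illustration, not the actual configuration registry code.

```typescript
interface AgentsWindowOverride { default?: unknown; readOnly?: boolean }
interface SettingSchema { default?: unknown; agentsWindow?: AgentsWindowOverride }

function effectiveDefault(schema: SettingSchema, inAgentsWindow: boolean): unknown {
	const override = schema.agentsWindow;
	// Use the override only when one is declared; `in` separates "absent" from "undefined".
	if (inAgentsWindow && override && 'default' in override) {
		return override.default;
	}
	return schema.default;
}

function canWrite(schema: SettingSchema, inAgentsWindow: boolean): boolean {
	// readOnly applies only inside the Agents window.
	return !(inAgentsWindow && schema.agentsWindow?.readOnly === true);
}
```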
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* revert
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Add the github.copilot.chat.agent.backgroundTodoAgent.enabled setting
to the advanced configuration section, fixing the CI failure in
configurations.spec.ts that requires all experiment-backed config
keys to have a corresponding package.json entry.
Adds `github.copilot.chat.otel.maxAttributeSizeChars` setting and
`COPILOT_OTEL_MAX_ATTRIBUTE_SIZE_CHARS` env var to control truncation
of free-form OTel content attributes (prompts, responses, tool
arguments/results, hook input/output).
Default is `0` (unlimited), matching the OTel spec's
`AttributeValueLengthLimit` default of `Infinity`. Users on backends
with per-attribute size limits can set a positive value to keep OTLP
batches under the backend cap.
Plumbs the resolved limit through every call site that previously hit a
hardcoded 64KB fallback. Drops the `DEFAULT_MAX_OTEL_ATTRIBUTE_LENGTH`
constant; `truncateForOTel`'s default arg is now `0` (unlimited).
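A minimal sketch of the "0 means unlimited" contract. This is a
hypothetical re-creation named `truncateForOTelSketch`; the real
`truncateForOTel` may differ, for example by appending a truncation
marker.

```typescript
// Hypothetical re-creation for illustration only.
function truncateForOTelSketch(value: string, maxChars: number = 0): string {
	// 0 (or any non-positive value) means unlimited.
	if (maxChars <= 0 || value.length <= maxChars) {
		return value;
	}
	return value.slice(0, maxChars);
}
```

With the default argument, callers that pass no limit keep the full
attribute value, which matches the unlimited default described above.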
Refs #299952
* Try to get model and token info to show up for CLI in chat
Co-authored-by: Copilot <copilot@github.com>
* make sure that auto mode persists for CLI
* make ui appear for non-controller api route
* make controller route work without modifying requestHandler
Co-authored-by: Copilot <copilot@github.com>
* dqwdqwdwq
* ship more stuff
Co-authored-by: Copilot <copilot@github.com>
* gate this behind setting
Co-authored-by: Copilot <copilot@github.com>
* cleaner
Co-authored-by: Copilot <copilot@github.com>
* stop messing with tests
Co-authored-by: Copilot <copilot@github.com>
* try fix test
* Make sure the tests pass
Co-authored-by: Copilot <copilot@github.com>
* rename better
Co-authored-by: Copilot <copilot@github.com>
---------
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: justschen <justchen@microsoft.com>