* AH: replace agentSdks {url, sha256} with {urlTemplate}
product.agentSdks.<sdk> now ships {version, urlTemplate} instead of
{version, url, sha256}. The runtime substitutes {sdkTarget} into the
template per-launch via a new IAgentSdkPackage.currentSdkTarget()
hook: Claude appends -musl on musl Linux hosts (detected from Node's
process.report.getReport().header.glibcVersionRuntime, no subprocess);
Codex never does (statically musl-linked, single Linux SKU).
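A minimal sketch of the libc check described above, assuming the glibc version is read from the header of Node's diagnostic report (`process.report.getReport().header`). `detectLibcFromHeader` and `ReportHeader` are hypothetical names used only for illustration; the actual primitive in the agent host may be shaped differently.

```typescript
// Hypothetical shapes for illustration; not the actual agent host types.
type LibcFamily = 'glibc' | 'musl';

interface ReportHeader {
    // Present on glibc Linux builds of Node; absent on musl.
    glibcVersionRuntime?: string;
}

// Pure helper so it can be exercised without touching `process`.
// In production the header would come from process.report.getReport().header.
function detectLibcFromHeader(platform: string, header: ReportHeader | undefined): LibcFamily | undefined {
    if (platform !== 'linux') {
        return undefined; // libc family is only meaningful on Linux
    }
    return header?.glibcVersionRuntime ? 'glibc' : 'musl';
}
```

Keeping the platform and header as parameters means no subprocess (such as `ldd --version`) is needed and the function can be tested from any CI host.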
Why the shape change:
1. macOS Universal bundles ship arm64 + x64 binaries sharing one
product.json — a fixed per-platform {url, sha256} could only be
correct for one of the two halves. The template lets the same
bundle serve both.
2. The sha256 was belt-and-suspenders: product.json is covered by
product.checksums inside the signed app bundle, URLs are HTTPS
to a Microsoft-controlled CDN. The sha only guarded "trusted URL
string, tampered edge bytes" — a much harder attack than
tampering with product.json itself.
Downloader changes: sdkTarget joins the cache key path
(<userDataPath>/agent-host/sdk-cache/<pkg>/<sdkVersion>/<sdkTarget>/)
so Universal launches with different resolved targets get independent
caches. .complete sentinel content is now the source URL (debug-only;
the file's existence is the integrity signal).
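As a rough illustration of the cache key layout, the sketch below builds the cache directory and sentinel path from the segments quoted above. `sdkCacheDir` and `sentinelPath` are hypothetical helpers, and joining with `/` is a simplification; the real downloader presumably goes through the file service and URI helpers.

```typescript
// Hypothetical helper; segment order follows the path layout quoted above.
function sdkCacheDir(userDataPath: string, pkg: string, sdkVersion: string, sdkTarget: string): string {
    return [userDataPath, 'agent-host', 'sdk-cache', pkg, sdkVersion, sdkTarget].join('/');
}

// The sentinel marks a fully extracted cache entry; only its existence matters.
function sentinelPath(cacheDir: string): string {
    return `${cacheDir}/.complete`;
}

// Two launches of the same Universal bundle resolve to different directories.
const arm64Dir = sdkCacheDir('/data', 'claude', '0.3.168', 'darwin-arm64');
const x64Dir = sdkCacheDir('/data', 'claude', '0.3.168', 'darwin-x64');
```

Because `sdkTarget` is the last path segment, an arm64 launch and an x64 launch never see each other's `.complete` sentinel.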
isAvailable() now gates on both product config AND currentSdkTarget()
resolving, so the provider doesn't register on hosts with no SKU
(armhf) even if some future product.json carries an agentSdks block.
Tests: drop sha-mismatch + stale-cache-by-sha tests; add coverage for
{sdkTarget} substitution, separate-cache-dirs-for-different-targets
(the Universal motivating case), currentSdkTarget-undefined gating
isAvailable, and CodexSdkPackage.currentSdkTarget agreeing with the
existing codexPackageSuffix table.
Pairs with the build PR (#321012), which will be rebased to emit the
new shape once this lands. See build/agent-sdk/TODO.md on that branch.
* AH: simplify per-package SDK target resolution
Replaces per-package `currentSdkTarget()` (one method per SDK,
re-implementing the same platform/arch table modulo a musl branch)
with a single boolean `hasSeparateMuslLinuxPackage` on the package
descriptor and a shared `resolveSdkTarget(pkg, host)` in the
downloader. Claude sets it true; Codex sets it false. The supported-
platforms whitelist collapses from three copies (claudeSdkTarget,
codexPackageSuffix, build's getSdkTargetForBuild) to one runtime
resolver paired with the build helper.
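A sketch of what a shared resolver along these lines could look like. The interface shapes, the supported platform and arch sets, and the `<platform>-<arch>[-musl]` target format are assumptions inferred from this description, not the actual downloader code.

```typescript
type LibcFamily = 'glibc' | 'musl';

// Assumed host shape: (platform, arch, libc), with libc undefined off Linux.
interface ISdkTargetHost {
    platform: string;
    arch: string;
    libc: LibcFamily | undefined;
}

interface SdkPackageDescriptor {
    hasSeparateMuslLinuxPackage: boolean;
}

// Assumed whitelist; the real supported set lives in the downloader.
const SUPPORTED_PLATFORMS = new Set(['darwin', 'linux', 'win32']);
const SUPPORTED_ARCHS = new Set(['x64', 'arm64']);

function resolveSdkTarget(pkg: SdkPackageDescriptor, host: ISdkTargetHost): string | undefined {
    if (!SUPPORTED_PLATFORMS.has(host.platform) || !SUPPORTED_ARCHS.has(host.arch)) {
        return undefined; // no SKU for this host (e.g. 32-bit ARM Linux)
    }
    const base = `${host.platform}-${host.arch}`;
    const needsMusl = host.platform === 'linux' && host.libc === 'musl' && pkg.hasSeparateMuslLinuxPackage;
    return needsMusl ? `${base}-musl` : base;
}
```

With this shape, the only per-package difference is the boolean: a Claude-like descriptor (`true`) resolves `linux-x64-musl` on a musl host, while a Codex-like descriptor (`false`) resolves `linux-x64` everywhere.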
Host injection: AgentSdkDownloader gains an optional leading
`ISdkTargetHost` constructor param (per project convention, non-DI
before DI). Production call sites pass `undefined` to derive from
`process`; tests pass synthetic hosts to exercise Universal launches
and musl Linux without touching `process`.
Other simplifications applied from review:
- `_cacheHit` was a one-line passthrough → inlined to
`_fileService.exists(sentinel)` at both call sites.
- `_pendingDownloads` key now uses `cacheDir` directly (already
unique per pkg/version/target) instead of allocating a parallel
`<pkg>/<version>/<target>` key string.
- `.complete` sentinel content is now empty — the file's existence
is the integrity signal, the cache dir path already encodes
`<pkg>/<version>/<sdkTarget>` for debugging.
- `detectLibc()` returns `LibcFamily | undefined` on non-Linux
instead of `'glibc'`-by-convention (drops consumer-specific
phrasing from the primitive).
- Test's `listLeftovers` recursive walker replaced with a direct
`readdir` of the known version dir (the only level where scratch
dirs can land).
- Tests collapse 4 direct `new AgentSdkDownloader(...)` blocks
through `makeDownloader(null, host)`.
- `IAgentSdkProductConfig` JSDoc trimmed to interface contract;
rationale lives in roadmap.md Phase 15.
- `_failureLatch` doc explains why per-id (not per-target) granularity
is intentional.
Tests: 25 passing (3 libc + 3 resolveSdkTarget + 13 downloader +
6 codex paths). New `resolveSdkTarget` suite covers the cross-product
of {claude, codex} × {linux glibc, linux musl, darwin, win32} that
previously lived as scattered table tests.
* AH: drop test-only host injection on AgentSdkDownloader
The previous commit added a constructor param to inject a synthetic
`(platform, arch, libc)` into the downloader so tests could exercise
Universal launches and musl Linux from any CI host. Production passed
`undefined` and the body fell back to a derived host — a test-only
ceremony in production code.
Restructured so the runtime stays clean:
- `resolveSdkTarget(pkg, host?)` keeps its optional `host` param,
defaulting to the real process. Cross-host coverage lives in
dedicated unit tests that call it directly.
- `AgentSdkDownloader` no longer takes a host. Both call sites
revert to `createInstance(AgentSdkDownloader)` with no extras.
- Integration suite `suiteSetup` skips on hosts the downloader
can't target (e.g. linux-armhf), and pins `hostSdkTarget` for
path assertions. The "two-host cache key" assertion becomes a
direct path check on the host's resolved target instead of an
artificial second-host download.
Tests: 23 passing (3 libc + 3 resolveSdkTarget unit + 11 downloader
integration + 6 codex paths).
* AH: address PR review — validate urlTemplate placeholders + honor backpressure
Two findings from #321078 review:
1. `format2()` silently leaves unknown placeholders untouched, so a
vscode-distro typo like `{sdkTaret}` would produce a 404 from the
CDN with no hint at the real cause. Add a `{...}` scan after
substitution that throws an actionable error pointing at the
suspect product.json field. Covered by a new test.
2. The hand-rolled `_fetch` pipe ignored `out.write()`'s return value,
so a slow disk (Windows AV scan, network home dir) could buffer
the entire 70-95MB tarball in memory. Pause the source stream on
write-buffer full, resume on drain. Can't use `stream/promises
.pipeline()` here because `IRequestContext.stream` is a
`VSBufferReadableStream`, not a Node Readable — the source's own
`pause()`/`resume()` is what we have to work with.
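The placeholder scan from finding 1 could look roughly like the sketch below. `expandUrlTemplate` is a hypothetical stand-in that does its own substitution rather than calling `format2()`, and the error wording is illustrative only.

```typescript
// Hypothetical stand-in for format2() plus the post-substitution scan.
function expandUrlTemplate(urlTemplate: string, sdkTarget: string): string {
    // Replace only the known placeholder; unknown ones are left untouched,
    // mirroring the format2() behavior described above.
    const url = urlTemplate.replace(/\{sdkTarget\}/g, sdkTarget);
    // Anything still wrapped in braces is a placeholder nobody filled in.
    const leftover = url.match(/\{[^{}]*\}/);
    if (leftover) {
        throw new Error(
            `product.agentSdks urlTemplate contains unknown placeholder ${leftover[0]}; ` +
            `expected only {sdkTarget}`
        );
    }
    return url;
}
```

Failing at substitution time turns a typo such as `{sdkTaret}` into an immediate, named error instead of an opaque 404 from the CDN.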
Cancellation test still passes; backpressure change is transparent to
the cancel teardown.
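The pause-on-full, resume-on-drain pattern from finding 2 can be sketched against two minimal interfaces. `PausableSource` and `BackpressureSink` are hypothetical simplifications; the real source is a `VSBufferReadableStream` and the real sink a Node writable, both with richer APIs (errors, end, cancellation) that this sketch leaves out.

```typescript
// Minimal hypothetical interfaces, reduced to what backpressure needs.
interface PausableSource<T> {
    onData(listener: (chunk: T) => void): void;
    pause(): void;
    resume(): void;
}

interface BackpressureSink<T> {
    // Returns false when the internal buffer is full and the caller should wait.
    write(chunk: T): boolean;
    onceDrain(listener: () => void): void;
}

function pumpWithBackpressure<T>(source: PausableSource<T>, sink: BackpressureSink<T>): void {
    source.onData(chunk => {
        if (!sink.write(chunk)) {
            // Stop reading until the sink has flushed what it already holds.
            source.pause();
            sink.onceDrain(() => source.resume());
        }
    });
}
```

The chunk that triggers the pause is still written; only subsequent reads are held back, so memory use is bounded by the sink's buffer rather than by the tarball size.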
* AH: per-platform agent SDK build + CDN upload (#7885)
Per-platform VS Code build jobs now produce + upload the Claude and
Codex agent SDK tarballs to main.vscode-cdn.net and stamp the resulting
url/sha into `product.agentSdks` of their own packaged product.json.
The build step (`build/azure-pipelines/common/agent-sdk-produce.yml`)
runs inline in each existing platform job (darwin/linux/win32/alpine),
before the gulp packaging step. It always builds the tarballs. The
AzureCLI credential fetch and the CDN upload are gated on
`VSCODE_PUBLISH=true` — test pipeline runs leave the tarballs as a
pipeline artifact (`agent_sdk_<platform>_<arch>_tarballs`) for
inspection but don't touch the CDN, and ship product.json without
`agentSdks` (same shape as a local dev build).
The REH gulpfile only stamps `agentSdks` for `type === 'reh'`; REH-web
skips it because the agent host is node-only.
* AH: use npm.cmd on Windows in agent SDK build
`spawnSync('npm', ...)` fails on Windows because npm ships as a `.cmd`
shim and Node's child_process doesn't resolve PATHEXT without an
explicit suffix. The Windows pipeline jobs were dying with `exited
null` and no further context.
Also surface `result.error` so a future spawn-resolution failure shows
the actual ENOENT instead of a bare exit-code message.
* AH: pass shell:true when spawning npm.cmd on Windows
Node 20+ (CVE-2024-27980) refuses to spawn `.cmd`/`.bat` files without
`shell: true` and fails with `EINVAL`. The Windows pipeline jobs hit
this after the previous fix swapped `npm` for `npm.cmd`.
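Taken together, the two Windows fixes amount to something like the following sketch. `npmInvocation` and `describeSpawnFailure` are hypothetical helpers; the build script may inline this logic differently.

```typescript
// Hypothetical helper; returns what to hand to child_process.spawnSync.
interface NpmInvocation {
    command: string;
    shell: boolean;
}

function npmInvocation(platform: string): NpmInvocation {
    if (platform === 'win32') {
        // npm ships as a .cmd shim on Windows, and patched Node versions
        // reject .cmd/.bat spawns unless shell is true (EINVAL otherwise).
        return { command: 'npm.cmd', shell: true };
    }
    return { command: 'npm', shell: false };
}

// Surface spawn-resolution failures (e.g. ENOENT) ahead of the exit status.
function describeSpawnFailure(result: { error?: Error; status: number | null }): string | undefined {
    if (result.error) {
        return `spawn failed: ${result.error.message}`;
    }
    return result.status === 0 ? undefined : `exited ${result.status}`;
}
```

A caller would pass the result as `spawnSync(inv.command, args, { shell: inv.shell })` and run the returned object through `describeSpawnFailure`, so a missing binary reports its error message instead of a bare `exited null`.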
* AH: emit {version, urlTemplate} per the runtime shape change
Stacked on top of the runtime PR (tyler/agent-sdk-url-template). With
the runtime now consuming `{version, urlTemplate}` and substituting
`{sdkTarget}` per launch, the build pipeline emits the matching shape:
- `IAgentSdkResults[<sdk>]` drops `{url, sha256}` for
`{version, urlTemplate}`.
- `produce.ts` still uploads its platform's tarballs (idempotent
HEAD-then-skip in upload.ts is unchanged), but the results JSON
every job writes is identical per SDK — only the version differs.
That's the whole point: macOS Universal can ship one product.json
that covers both arm64 and x64 launches because the runtime
resolves {sdkTarget} per launch.
- New `buildCdnUrlTemplate(sdk, version)` mirrors `buildCdnUrl`'s
path but leaves `{sdkTarget}` as the format2 placeholder.
- README updated; TODO.md (the placeholder note left while the
runtime PR was pending) deleted.
Tarballs at the existing CDN paths (e.g.
`agent-sdk/claude/0.3.168/darwin-arm64.tgz`) remain valid and reachable
— no re-upload needed, just a re-stamp of product.json on the next
publish run.
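A sketch of how the two URL builders might relate. The CDN origin and the `agent-sdk/<sdk>/<version>/<sdkTarget>.tgz` layout are inferred from the example path above; the actual signatures in common.ts may differ.

```typescript
// Assumed CDN origin and path layout, inferred from the example path above.
const CDN_BASE = 'https://main.vscode-cdn.net';

function buildCdnUrl(sdk: string, version: string, sdkTarget: string): string {
    return `${CDN_BASE}/agent-sdk/${sdk}/${version}/${sdkTarget}.tgz`;
}

// Same path, but the target segment stays as a literal {sdkTarget}
// placeholder for the runtime to fill in per launch.
function buildCdnUrlTemplate(sdk: string, version: string): string {
    return buildCdnUrl(sdk, version, '{sdkTarget}');
}
```

Deriving the template from the concrete builder keeps the two in lockstep: substituting a target into the template yields exactly the URL the upload step wrote to.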
* AH: address PR review on build/agent-sdk (Copilot)
Five comments from the build PR review:
1. common.ts header named drift-check.ts (deleted during simplification)
and missed produce.ts. Updated.
2. common.ts "single source of truth is package.json optionalDependencies"
was aspirational — getSdkTargetForBuild is a hardcoded table. Reframed
the comment to describe what we actually do (hardcoded table kept in
lockstep by convention) and why (no runtime npm metadata lookup).
3. package.ts header said the library form is consumed by gulpfile
packaging tasks — actually called from produce.ts as its own pipeline
step. Updated.
4. + 5. isCliInvocation() in package.ts and upload.ts compared
import.meta.url to a manually constructed `file://${process.argv[1]}`,
which breaks on Windows (drive letters URL-encoded, spaces escaped).
Repo already established the cleaner `import.meta.filename ===
process.argv[1]` pattern (see build/npm/installStateHash.ts:143).
Pure portability fix — only affects the dev-mode CLIs, the production
pipeline calls these as library functions.
* AH: per-SDK agents/<sdk>/{package.json,package-lock.json} for byte determinism
Build 447090 surfaced sha drift across pipeline runs: same exact-pinned
SDK version, but transitive deps unlocked (`npm install
--no-package-lock`) → different bytes → CDN HEAD-then-fail rejected
the re-upload. Determinism is load-bearing for the security model
(content-addressed CDN URLs, no runtime sha verification — the bytes
at a given URL must be stable).
Fix: ship pinned lockfiles per SDK and use `npm ci`. New layout:
    build/agent-sdk/agents/
      claude/
        package.json       # exact one dep: @anthropic-ai/claude-agent-sdk@0.3.168
        package-lock.json  # full transitive graph
      codex/
        package.json       # exact one dep: @openai/codex@0.134.0
        package-lock.json
Bonus: the folder set IS the SDK list. Drops the hardcoded
`SDKS: readonly Sdk[]` and `PACKAGE_NAME: Record<Sdk, string>` from
common.ts; replaced with `getSdks()` (discovers from `agents/`) and
`getAgentMeta(sdk)` (reads from the agent's own package.json). Adding
a new SDK is now one folder + lockfile gen + commit.
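A sketch of the kind of check `getAgentMeta(sdk)` could perform on an agent's manifest. `getAgentMetaFromManifest` and its types are hypothetical, and the strict `x.y.z` pattern is a simplification that would also reject prerelease tags.

```typescript
// Hypothetical shape of agents/<sdk>/package.json after JSON.parse.
interface AgentPackageJson {
    dependencies?: Record<string, string>;
}

interface AgentMeta {
    packageName: string;
    version: string;
}

function getAgentMetaFromManifest(sdk: string, manifest: AgentPackageJson): AgentMeta {
    const entries = Object.entries(manifest.dependencies ?? {});
    if (entries.length !== 1) {
        throw new Error(`agents/${sdk}/package.json must declare exactly one dependency`);
    }
    const [packageName, version] = entries[0];
    // Reject ranges such as ^1.2.3 or ~1.2.3; only an exact pin is reproducible.
    if (!/^\d+\.\d+\.\d+$/.test(version)) {
        throw new Error(`agents/${sdk}/package.json must pin ${packageName} to an exact version, got ${version}`);
    }
    return { packageName, version };
}
```

Reading the package name and version from the folder's own manifest is what removes the need for a parallel `PACKAGE_NAME` table.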
Verified reproducible locally: two back-to-back runs of `package.ts
--sdk=codex --target=darwin-arm64` produce byte-identical tarballs.
NOTE: existing CDN blobs from build 446990 carry the old drifted
shas. The next publish will fail HEAD-then-skip against them. Need to
delete `agent-sdk/{claude,codex}/{0.3.168,0.134.0}/*.tgz` from the
vscodeweb storage account's $web container before re-publishing, or
the upload step will refuse with "blob already present with DIFFERENT
sha256".
* AH: bump pinned SDK versions to sidestep stale CDN blobs
claude 0.3.168 → 0.3.169 (one point release; 0.3.170/172/173/174/175
all exist upstream, sticking to the next bump for risk minimisation).
codex 0.134.0 → 0.135.0 (next stable; 0.135-0.139 are all stable
releases, picking the immediate successor).
Bumping versions changes the CDN URL path (`agent-sdk/<sdk>/<version>/...`)
so the next publish lands at fresh, never-uploaded blob URLs. Avoids
having to delete the drift-shaped blobs from build 446990 that would
otherwise trip HEAD-then-fail.
Bumped both the per-SDK `agents/<sdk>/package.json` (the build's pin)
and repo-root `package.json` devDeps (the runtime's type-import pin)
in lockstep, with all four lockfiles regenerated. Local reproducibility
re-verified: two back-to-back runs of `package.ts --sdk=codex
--target=darwin-arm64` produce byte-identical tarballs at the new pin.
Runtime typecheck clean — no API changes to either SDK in these point
releases.
* AH: stub usage_EXPERIMENTAL on test Query fakes (SDK 0.3.169)
Claude SDK 0.3.169 added `usage_EXPERIMENTAL_MAY_CHANGE_DO_NOT_RELY_ON_THIS_API_YET`
as a required method on `Query`. Three test files implement the
interface as fakes (FakeQuery, ImmediatelyDoneQuery, RoundTripQuery)
and broke the type-check on tsgo.
Stubbed each as `throw 'not modeled'`, matching the existing pattern
for every other method these fakes don't exercise. The method name
makes it clear the SDK doesn't expect anyone to rely on it yet, so a
"not modeled" stub is honest.
* AH: authenticate npmrc before agent SDK `npm ci`
Build 447232 hit E401 from the private npm mirror: the platform job's
existing "Setup NPM Authentication" step is gated on the node_modules
cache being a miss (it lives in the cache-warming path), so on a cache
hit the user's ~/.npmrc has no auth token, and our agent-sdk `npm ci`
inherits the global registry override + missing auth → E401.
Fix: add an always-on auth step at the top of agent-sdk-produce.yml.
Captures the user's npmrc path, runs npmAuthenticate@0 against it. Now
runs independent of the node_modules cache state.
The previous npm install --no-package-lock path tolerated this because
it fell back to anonymous resolution against npmjs.org. `npm ci`
strictly resolves through the configured (private) registry, which
needs auth. The lockfile + private mirror combination is what we want
for supply-chain auditing — the fix is to make sure auth is set up
unconditionally rather than bypass the mirror.
Also reverts a short-lived intermediate change that added
`--registry=https://registry.npmjs.org/` to the npm ci call. That was
the wrong direction: it would bypass the supply-chain mirror.
* AH: create ~/.npmrc with `npm config set` before authenticating
npmAuthenticate@0 errored on cache-hit runs: the .npmrc path returned
by `npm config get userconfig` is just where npm WOULD write — the
file doesn't exist until something actually writes to it. The platform
job's "Setup NPM" step creates it via `npm config set registry`, but
is skipped on cache hits.
Mirror that pattern in our prep step: run `npm config set registry`
ourselves (idempotent — rewrites the same value the existing config
already has on cache misses) so npmAuthenticate@0 has a real file to
edit.
* AH: move agent SDK step ahead of Download Copilot VSIX
Was: install-builtin → VSIX-background → Compile → … → VSIX-attach → agent-sdk → Build client.
Now: install-builtin → agent-sdk → VSIX-background → Compile → … → VSIX-attach → Build client.
No data dependency between the agent SDK step and the VSIX download
(or Compile, for that matter — agent SDK uses its own scratch dir,
its own npmrc, doesn't read node_modules or anything from out-build).
Benefit: fail-fast. The agent SDK step previously ran after Compile +
both VSIX wait points, so a CDN auth failure or a sha-mismatch
would only surface ~10 minutes into the job. Moving it earlier
catches those failures in seconds, before any heavy work runs.
Applied consistently across darwin/linux/win32/alpine (linux still
gated on `ne(VSCODE_ARCH, 'armhf')`).
build/agent-sdk
Per-platform agent SDK production. Each VS Code build (darwin-arm64,
linux-x64, Alpine REH, etc.) uploads its own platform's SDK tarballs
to main.vscode-cdn.net and stamps agentSdks into the shipped
product.json with a {version, urlTemplate} per SDK. Every platform
job emits the same urlTemplate per SDK — the runtime substitutes
{sdkTarget} per launch via resolveSdkTarget(), which is what lets
macOS Universal bundles share one product.json across arm64 + x64.
The runtime side (src/vs/platform/agentHost/) downloads and caches
the SDK tarball at first use. See IAgentSdkProductConfig in
src/vs/base/common/product.ts for the contract.
How the pipeline uses this
The platform packaging jobs (Linux, macOS, Windows, Alpine) each include
the shared template build/azure-pipelines/common/agent-sdk-produce.yml
before the existing gulp vscode-<platform>-<arch>-min-ci step:
    - template: ../../common/agent-sdk-produce.yml@self
      parameters:
        vscodePlatform: linux
The template runs
node build/agent-sdk/produce.ts --vscode-platform=<x> --arch=$(VSCODE_ARCH),
which iterates the SDKs (discovered from agents/ via getSdks()), resolves
the matching sdkTarget for each (vscode-platform, arch, sdk) via
getSdkTargetForBuild, runs buildOne for each in parallel, and drops the
tarballs in $(Build.SourcesDirectory)/.build/agent-sdk/tarballs/.
Publish vs test runs
produce.ts reads the pipeline variable VSCODE_PUBLISH from env (Azure
auto-injects all non-secret pipeline variables) to decide whether to
hit the CDN:
- VSCODE_PUBLISH=true (real release builds): the AzureCLI@2 step inside
  the template fetches CDN credentials, produce.ts calls uploadOne for
  every tarball (HEAD-then-decide idempotent), writes the results JSON,
  and emits ##vso[task.setvariable variable=AGENT_SDK_RESULTS_FILE]<path>.
  The downstream gulp packaging step then stamps product.agentSdks via
  readAgentSdkResults().
- VSCODE_PUBLISH unset or not 'true' (PR runs, CI runs, manual test runs
  with the publish toggle off): the AzureCLI credential step is skipped,
  the upload is skipped, no results file is written, and
  task.setvariable is not emitted. The tarballs are still produced and
  published as a pipeline artifact named
  agent_sdk_<vscodePlatform>_<arch>_tarballs so you can download and
  inspect them. product.json ships without agentSdks, the same shape as
  a local dev build, so the runtime falls back to the per-provider
  env-var override.
Where the agentSdks gating lives
Inside packageTask's jsonEditor callback (the same one that injects
commit / date / checksums / version), readAgentSdkResults() loads
the results file (returns {} when the env var is unset) and merges
agentSdks into product.json. The REH gulpfile only writes agentSdks
for type === 'reh'; the REH-web variant skips it because the agent host
is node-only and the SDK config has no consumer in a browser-served
server.
Local gulp vscode-darwin-arm64 invocations don't set
AGENT_SDK_RESULTS_FILE and don't have VSCODE_PUBLISH=true, so
readAgentSdkResults() returns {} and product.json ships without
agentSdks — same UX as today's no-config build.
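A sketch of the gulpfile-side reader's fallback behavior. The injected `env` and `readFile` parameters are assumptions made to keep the example self-contained; the real `readAgentSdkResults()` takes no such arguments according to the text above and presumably reads `process.env` and the file system directly.

```typescript
type AgentSdkResults = Record<string, { version: string; urlTemplate: string }>;

// `env` and `readFile` are injected here only for illustration.
function readAgentSdkResults(
    env: Record<string, string | undefined>,
    readFile: (path: string) => string
): AgentSdkResults {
    const file = env['AGENT_SDK_RESULTS_FILE'];
    if (!file) {
        return {}; // local and non-publish builds: nothing to stamp
    }
    return JSON.parse(readFile(file)) as AgentSdkResults;
}
```

Returning an empty object rather than throwing is what lets the same `jsonEditor` callback serve publish runs, test runs, and local builds without branching.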
Why two steps, not inline-in-gulp
The agent SDK work is a distinct concern from the VS Code packaging gulp graph. As its own pipeline step:
- Visible in the build log: operators see a discrete "Agent SDK: build +
  upload" step they can click into instead of grepping inside "Build
  client" output.
- Independently re-triggerable: if the SDK step fails, the operator can
  re-run just the platform job; if it succeeds but the gulp step fails,
  the SDK upload is already idempotent (HEAD-then-skip).
- Doesn't add async-stream complexity to the gulpfile. packageTask stays
  a sync stream-returning function; the only change is one synchronous
  readAgentSdkResults() call inside the existing jsonEditor callback.
Files
- agents/<sdk>/: one folder per SDK we ship. Each contains a
  package.json (single dependency: the SDK's own npm package, pinned to
  an exact version) and a package-lock.json (full transitive graph).
  Folder name = SDK id = key under product.agentSdks = path segment in
  the CDN URL. The set of folders IS the SDK list, with no parallel
  array to keep in sync.
- common.ts: types, getSdks() (discovers SDKs from agents/),
  getAgentMeta() / getSdkVersion() (reads from
  agents/<sdk>/package.json, rejects ^/~ ranges), getSdkTargetForBuild()
  ((vscodePlatform, arch, sdk) → npm-suffix), buildCdnUrl() /
  buildCdnUrlTemplate(), sha256OfFile(), parseFlags() for CLI flag
  parsing, and readAgentSdkResults() for the gulpfile-side reader.
- package.ts: buildOne({ sdk, sdkTarget, outDir }). Runs on any OS:
  copies agents/<sdk>/{package.json,package-lock.json} into a scratch
  dir, npm ci with npm_config_libc/os/cpu fetches the foreign platform
  binary verbatim from the locked graph, then node-tar+gzip with
  reproducible flags. Has a thin CLI at the bottom.
- upload.ts: uploadOne(...). HEAD-then-decide: absent → upload; matching
  sha → skip (idempotent re-runs); different / no-metadata sha → fail
  loud, refusing to overwrite content-addressed history. Thin CLI.
- produce.ts: pipeline-step entry. For one (vscode-platform, arch),
  iterates the SDKs in parallel, calls buildOne + uploadOne for each
  that applies, writes results to AGENT_SDK_RESULTS_FILE, and emits
  ##vso[task.setvariable] so downstream pipeline steps see the path.
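The HEAD-then-decide rule in upload.ts reduces to a three-way decision, sketched below. `RemoteBlob` and `decideUpload` are hypothetical names; how the remote sha is actually obtained (blob metadata, headers) is not shown here.

```typescript
type UploadDecision = 'upload' | 'skip' | 'fail';

// Hypothetical summary of the HEAD response for an existing blob.
interface RemoteBlob {
    exists: boolean;
    sha256?: string; // from blob metadata, if any was recorded
}

function decideUpload(remote: RemoteBlob, localSha256: string): UploadDecision {
    if (!remote.exists) {
        return 'upload';
    }
    if (remote.sha256 === localSha256) {
        return 'skip'; // identical bytes already published; re-runs are no-ops
    }
    return 'fail'; // different or missing sha: never overwrite published content
}
```

Treating a blob with no recorded sha as a failure rather than a skip is the conservative choice: without metadata there is no way to confirm the published bytes match.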
Bumping an SDK version
1. Edit the dependencies version in
   build/agent-sdk/agents/<sdk>/package.json to the new exact version.
2. From that directory: npm install --package-lock-only --ignore-scripts
   to refresh package-lock.json.
3. Also bump the matching devDependencies entry in repo-root
   package.json (the runtime imports types from that copy) so the
   shipped types and the build-time pin stay in lockstep.
4. npm install at repo root to refresh the root lockfile.
5. Commit all four edits together.
The next pipeline run rebuilds + uploads each platform tarball at the
new content-addressed CDN path and re-stamps each product.json with
the new urlTemplate pointing at the bumped version.
No human-paste step into vscode-distro. No coordination between jobs.
Local dev
Build one tarball locally:
node build/agent-sdk/package.ts --sdk=claude --target=darwin-arm64 --out=/tmp/out
For OSS contributors who want to drive the agent host without going through the CDN, point the dev override env vars at a local SDK install:
VSCODE_AGENT_HOST_CLAUDE_SDK_ROOT=/path/to/anthropic-claude-sdk-install \
./scripts/code.sh
(See src/vs/platform/agentHost/common/agentService.ts for env var names.)