diff --git a/.github/prompts/fix-error.prompt.md b/.github/prompts/fix-error.prompt.md new file mode 100644 index 00000000000..1cdc8fa90b4 --- /dev/null +++ b/.github/prompts/fix-error.prompt.md @@ -0,0 +1,17 @@ +--- +agent: agent +description: 'Fix an unhandled error from the VS Code error telemetry dashboard' +argument-hint: Paste the GitHub issue URL for the error-telemetry issue +tools: ['edit', 'search', 'execute/getTerminalOutput', 'execute/runInTerminal', 'read/terminalLastCommand', 'read/terminalSelection', 'execute/createAndRunTask', 'execute/runTask', 'read/getTaskOutput', 'search/usages', 'read/problems', 'search/changes', 'execute/testFailure', 'todo', 'execute/runTests', 'web/fetch', 'web/githubRepo'] +--- + +The user has given you a GitHub issue URL for an unhandled error from the VS Code error telemetry dashboard. Fetch the issue to retrieve its details (error message, stack trace, hit count, affected users). + +Follow the `fix-errors` skill guidelines to fix this error. Key principles: + +1. **Do NOT fix at the crash site.** Do not add guards, try/catch, or fallback values at the bottom of the stack trace. That only masks the problem. +2. **Trace the data flow upward** through the call stack to find the producer of invalid data. +3. **If the producer is cross-process** (e.g., IPC) and cannot be identified from the stack alone, **enrich the error message** with diagnostic context (data type, truncated value, operation name) so the next telemetry cycle reveals the source. Do NOT silently swallow the error. +4. **If the producer is identifiable**, fix it directly. + +After making changes, check for compilation errors via the build task and run relevant unit tests. diff --git a/.github/skills/fix-errors/SKILL.md b/.github/skills/fix-errors/SKILL.md new file mode 100644 index 00000000000..7a03d11ee0c --- /dev/null +++ b/.github/skills/fix-errors/SKILL.md @@ -0,0 +1,71 @@ +--- +name: fix-errors +description: Guidelines for fixing unhandled errors from the VS Code error telemetry dashboard. Use when investigating error-telemetry issues with stack traces, error messages, and hit/user counts. Covers tracing data flow through call stacks, identifying producers of invalid data vs. consumers that crash, enriching error messages for telemetry diagnosis, and avoiding common anti-patterns like silently swallowing errors. +--- + +When fixing an unhandled error from the telemetry dashboard, the issue typically contains an error message, a stack trace, hit count, and affected user count. + +## Approach + +### 1. Do NOT fix at the crash site + +The error manifests at a specific line in the stack trace, but **the fix almost never belongs there**. Fixing at the crash site (e.g., adding a `typeof` guard in a `revive()` function, swallowing the error with a try/catch, or returning a fallback value) only masks the real problem. The invalid data still flows through the system and will cause failures elsewhere. + +### 2. Trace the data flow upward through the call stack + +Read each frame in the stack trace from bottom to top. For each frame, understand: +- What data is being passed and what is expected +- Where that data originated (IPC message, extension API call, storage, user input, etc.) +- Whether the data could have been corrupted or malformed at that point + +The goal is to find the **producer of invalid data**, not the consumer that crashes on it. + +### 3. When the producer cannot be identified from the stack alone + +Sometimes the stack trace only shows the receiving/consuming side (e.g., an IPC server handler). The sending side is in a different process and not in the stack. In this case: + +- **Enrich the error message** at the consuming site with diagnostic context: the type of the invalid data, a truncated representation of its value, and which operation/command received it. This information flows into the error telemetry dashboard automatically via the unhandled error pipeline. +- **Do NOT silently swallow the error** — let it still throw so it remains visible in telemetry, but with enough context to identify the sender in the next telemetry cycle. +- Consider adding the same enrichment to the low-level validation function that throws (e.g., include the invalid value in the error message) so the telemetry captures it regardless of call site. + +### 4. When the producer IS identifiable + +Fix the producer directly: +- Validate or sanitize data before sending it over IPC / storing it / passing it to APIs +- Ensure serialization/deserialization preserves types correctly (e.g., URI objects should serialize as `UriComponents` objects, not as strings) + +## Example + +Given a stack trace like: +``` +at _validateUri (uri.ts) ← validation throws +at new Uri (uri.ts) ← constructor +at URI.revive (uri.ts) ← revive assumes valid UriComponents +at SomeChannel.call (ipc.ts) ← IPC handler receives arg from another process +``` + +**Wrong fix**: Add a `typeof` guard in `URI.revive` to return `undefined` for non-object input. This silences the error but the caller still expects a valid URI and will fail later. + +**Right fix (when producer is unknown)**: Enrich the error at the IPC handler level and in `_validateUri` itself to include the actual invalid value, so telemetry reveals what data is being sent and from where. Example: +```typescript +// In the IPC handler — validate before revive +function reviveUri(data: UriComponents | URI | undefined | null, context: string): URI { + if (data && typeof data !== 'object') { + throw new Error(`[Channel] Invalid URI data for '${context}': type=${typeof data}, value=${String(data).substring(0, 100)}`); + } + // ... +} + +// In _validateUri — include the scheme value +throw new Error(`[UriError]: Scheme contains illegal characters. scheme:"${ret.scheme.substring(0, 50)}" (len:${ret.scheme.length})`); +``` + +**Right fix (when producer is known)**: Fix the code that sends malformed data. For example, if an authentication provider passes a stringified URI instead of a `UriComponents` object to a logger creation call, fix that call site to pass the proper object. + +## Guidelines + +- Prefer enriching error messages over adding try/catch guards +- Truncate any user-controlled values included in error messages (to avoid PII and keep messages bounded) +- Do not change the behavior of shared utility functions (like `URI.revive`) in ways that affect all callers — fix at the specific call site or producer +- Run the relevant unit tests after making changes +- Check for compilation errors via the build task before declaring work complete