mirror of https://github.com/home-assistant/supervisor.git synced 2026-05-18 21:58:52 +01:00
Stefan Agner 97bc19d4b3 Detect container registry rate limits uniformly (#6732)
* Detect container registry rate limits uniformly

Container registry rate limits reach Supervisor in three distinct shapes:

  1. HTTP 429 from the daemon - recognised today, but the exception and
     resolution issue are hardcoded to Docker Hub. Since Core/Supervisor/
     plugin images all live on ghcr.io now, virtually every 429 we see in
     the field is actually a GHCR throttle that we mislabel. The biggest
     Sentry issue (SUPERVISOR-16BK) has >115k events / >93k users, all
     pulling a ghcr.io image, yet each user is told to "log in to
     Docker Hub".
  2. HTTP 500 with 'toomanyrequests' in the body - not recognised. Docker
     daemons before 28.3.0 wrap upstream 429s as 500 (fixed upstream by
     moby/moby 23fa0ae74a, "Cleanup http status error checks"). The large
     fleet on older daemons still produces this shape.
  3. JSON error event during a streaming pull - not recognised. Once the
     daemon starts writing the 200 OK response body the status is locked
     in, so rate limits that land during layer download arrive as plain
     text in the pull stream. Happens on all recent daemon versions -
     SUPERVISOR-13FQ (>16k events) and SUPERVISOR-13E0 (>8k events) are
     two large examples.

Cases 2 and 3 propagate as plain DockerError, bypass the 429 detection in
install() entirely, never produce a DOCKER_RATELIMIT resolution issue, and
generate large amounts of Sentry noise. Case 1 is detected but routes
every GHCR 429 through Docker-Hub-specific messaging and suggestions.
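
The three shapes can be told apart from the status code and the error
text alone. A minimal sketch, not the Supervisor implementation:
classify_rate_limit() and its arguments are hypothetical names, and the
case-insensitive match on the marker is an assumption.

```python
from __future__ import annotations

# Marker string quoted in the commit message for cases 2 and 3.
RATE_LIMIT_MARKER = "toomanyrequests"


def classify_rate_limit(status: int | None, message: str) -> int | None:
    """Return the case number (1, 2 or 3) for a rate limit, else None.

    status is the HTTP status reported by the daemon, or None when the
    error arrived as an event inside an already started pull stream.
    """
    if status == 429:
        return 1  # Case 1: the daemon reports the 429 directly.
    # Assumption: compare case-insensitively to be tolerant of casing.
    has_marker = RATE_LIMIT_MARKER in message.lower()
    if status == 500 and has_marker:
        return 2  # Case 2: older daemons wrap the upstream 429 as a 500.
    if status is None and has_marker:
        return 3  # Case 3: error event in the streaming pull body.
    return None
```

Only case 1 can be recognised from the status code; cases 2 and 3 depend
on the marker text, which is why they previously surfaced as a plain
DockerError.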

Changes:

- Add DockerRegistryRateLimitExceeded as the common base class and
  GithubContainerRegistryRateLimitExceeded alongside the existing
  DockerHubRateLimitExceeded. All extend APITooManyRequests so callers
  and retry logic can key off a single type.
- Add GITHUB_RATELIMIT IssueType so GHCR failures don't show the
  "log in to Docker Hub" suggestion that DOCKER_RATELIMIT carries.
- PullLogEntry.exception now maps stream errors containing
  'toomanyrequests' to DockerRegistryRateLimitExceeded (case 3).
- docker/interface.py:install() routes all three cases through a single
  _registry_rate_limit_exception() helper that picks the right issue
  type, suggestion and exception subclass based on the image's registry.
- utils/sentry.py filters APITooManyRequests (and anything wrapping it
  via __cause__) in capture_exception / async_capture_exception. One
  point of policy, every caller benefits.
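
The hierarchy and the registry-based selection described above could look
roughly like this. The class names come from this change; the bodies,
registry_of() and rate_limit_exception_for() are illustrative assumptions
and are not the actual _registry_rate_limit_exception() helper.

```python
class APITooManyRequests(Exception):
    """Stand-in for Supervisor's existing too-many-requests error."""


class DockerRegistryRateLimitExceeded(APITooManyRequests):
    """Common base for every container registry rate limit."""


class DockerHubRateLimitExceeded(DockerRegistryRateLimitExceeded):
    """Rate limit reported for an image hosted on Docker Hub."""


class GithubContainerRegistryRateLimitExceeded(DockerRegistryRateLimitExceeded):
    """Rate limit reported for an image hosted on ghcr.io."""


def registry_of(image: str) -> str:
    """Return the registry host of an image reference (hypothetical helper).

    Follows the usual Docker convention: the first path component is a
    registry host only if it contains "." or ":" or equals "localhost";
    otherwise the image lives on Docker Hub.
    """
    first, sep, _ = image.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    return "docker.io"


def rate_limit_exception_for(image: str) -> DockerRegistryRateLimitExceeded:
    """Pick the exception subclass from the image's registry."""
    registry = registry_of(image)
    if registry == "ghcr.io":
        return GithubContainerRegistryRateLimitExceeded(image)
    if registry == "docker.io":
        return DockerHubRateLimitExceeded(image)
    # Assumption: other registries fall back to the common base class.
    return DockerRegistryRateLimitExceeded(image)
```

Because every class derives from APITooManyRequests, a caller that only
cares about "rate limited or not" can catch that single type.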

Callers (supervisor.update(), plugin manager, homeassistant core) are
unchanged - UPDATE_FAILED issues still get created alongside the
registry-specific rate limit issue, giving users the full picture.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Consolidate Sentry noise filtering in one before_send hook

Move the APITooManyRequests filter from capture_exception /
async_capture_exception wrappers into the existing filter_data
before_send hook in supervisor/misc/filter.py, alongside the
AddonConfigurationError filter.

One isinstance tuple check instead of multiple layers, and every path
that reaches Sentry (including logging-integration and excepthook
captures, not just our explicit wrappers) now gets the same treatment.
The filter walks the __cause__ chain so wrapped rate-limit errors
(e.g. DockerHubRateLimitExceeded inside SupervisorUpdateError) still
get filtered. A debug log is emitted on each dropped event for
observability.
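
A before_send hook that walks the __cause__ chain might be sketched as
follows. The exception classes are local stand-ins, the hook signature is
simplified to Sentry's plain (event, hint) form, and _chain_contains() is
a hypothetical helper; the real filter_data in supervisor/misc/filter.py
may differ.

```python
import logging

_LOGGER = logging.getLogger(__name__)


class APITooManyRequests(Exception):
    """Stand-in for the rate limit base error."""


class AddonConfigurationError(Exception):
    """Stand-in for the other filtered error type."""


class SupervisorUpdateError(Exception):
    """Stand-in for an error that may wrap a rate limit."""


# One isinstance tuple check covers every filtered type.
IGNORED_EXCEPTIONS = (APITooManyRequests, AddonConfigurationError)


def _chain_contains(exc, types):
    """Return True if exc or anything in its __cause__ chain matches."""
    seen = set()
    while exc is not None and id(exc) not in seen:
        if isinstance(exc, types):
            return True
        seen.add(id(exc))  # Guard against a cyclic cause chain.
        exc = exc.__cause__
    return False


def filter_data(event, hint):
    """Sentry before_send hook: return None to drop the event."""
    exc_info = hint.get("exc_info")
    if exc_info and _chain_contains(exc_info[1], IGNORED_EXCEPTIONS):
        _LOGGER.debug("Dropping Sentry event for %s", type(exc_info[1]).__name__)
        return None
    return event
```

Returning None from before_send is how the Sentry SDK is told to discard
an event, so captures from the logging integration and excepthook pass
through the same check as explicit capture calls.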

Review feedback from mdegat01 on #6732.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Drop GITHUB_RATELIMIT resolution issue

There is no actionable remediation for a GHCR rate limit - logging in
doesn't lift the quota the way it does for Docker Hub, and the cap is
on the authenticated account anyway. A resolution issue that just tells
the user "you were rate limited" adds UI noise without helping them.

Keep the GithubContainerRegistryRateLimitExceeded exception - retry
logic and the Sentry filter still key off it - but don't create a
resolution issue. A log entry from the exception constructor is
sufficient. Docker Hub still gets DOCKER_RATELIMIT + registry-login
suggestion since that is actionable.
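
The resulting policy, as a small sketch: RateLimitOutcome and
outcome_for_registry() are hypothetical names, the issue and suggestion
strings are stand-ins for the DOCKER_RATELIMIT issue and registry-login
suggestion, and the handling of registries other than Docker Hub and
ghcr.io is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RateLimitOutcome:
    """What gets surfaced to the user for one rate limit hit."""

    issue: str | None  # Resolution issue to create, if any.
    suggestion: str | None  # Suggestion attached to that issue, if any.
    log_message: str  # Always logged, regardless of registry.


def outcome_for_registry(registry: str) -> RateLimitOutcome:
    """Decide whether a rate limit deserves a resolution issue."""
    message = f"Rate limit exceeded while pulling from {registry}"
    if registry == "docker.io":
        # Actionable: logging in to Docker Hub raises the quota.
        return RateLimitOutcome("docker_ratelimit", "registry_login", message)
    # ghcr.io: nothing the user can do, so only log.
    # Assumption: any other registry is treated the same way.
    return RateLimitOutcome(None, None, message)
```

The exception type is still raised in both branches; only the user-facing
issue differs.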

Review feedback on #6732.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-22 07:49:01 +02:00