supervisor

mirror of https://github.com/home-assistant/supervisor.git synced 2026-05-20 06:38:53 +01:00

Author	SHA1	Message	Date
Stefan Agner	c772a9bbb0	Replace fixed-duration sleeps after bus events with gather (#6803) * Replace fixed-duration sleeps after bus events with gather Several tests use ``await asyncio.sleep(...)`` to "wait for the listener to run" after firing a bus event. The fixed duration is real wall-clock time and the wait can be nondeterministic: if the handler chain happens to need slightly more time on a busy CI runner, the assertion races the handler. ``Bus.fire_event`` has returned the listener tasks since #6252; capture them and ``await asyncio.gather(*tasks)`` instead of sleeping. Touches test_bus.py (the bus tests were poking scheduling instead of verifying their assertions), test_home_assistant_watchdog.py, test_plugin_base.py, addons/test_manager.py, docker/test_addon.py, and test_store_execute_reload.py. Other cleanups in the same spirit: - ``_fire_test_event`` in addons/test_addon.py becomes ``async def`` and gathers the listener tasks itself, so each of its 17 call sites collapses to a single ``await _fire_test_event(...)``. - The two test_store_execute_reload.py sites that used the private ``_update_connectivity()`` helper are reworked to set the cached connectivity flag directly and fire the event themselves so they can gather the listener tasks the same way. - The two ``sleep(1)`` post-pull drains in docker/test_interface.py collapse to ``sleep(0)`` (handler tasks are already gathered inside pull_image), saving ~2s. - The ``sleep(0.01)`` waits inside ``container_events()`` task bodies (api/test_addons.py, api/test_store.py, backups/test_manager.py) only need a single yield to the parent task and become ``sleep(0)``. Switching to ``gather`` exposes a few latent mock errors whose TypeErrors were previously swallowed silently as background-task failures: - ``CGroup.add_devices_allowed`` is ``async def`` but was patched as a plain MagicMock in docker/test_addon.py; it is now patched via ``new_callable=AsyncMock``. 
- The watchdog does ``await (await self.start())`` / ``await (await self.restart())`` because ``App.start`` / ``App.restart`` return ``asyncio.Task``. The mocks in addons/test_addon.py (test_app_watchdog, test_watchdog_on_stop, test_watchdog_during_attach) needed ``AsyncMock(return_value=<settled future>)`` to mirror that shape rather than a plain MagicMock. Factor bus.fire_event + gather pattern into a helper Per review feedback, the ``await asyncio.gather(*coresys.bus.fire_event(...))`` incantation was scattered across many call sites. Add ``tests.common.fire_bus_event`` that takes the coresys, event and data, fires the event and awaits the spawned listener tasks. Convert all matching sites to use it, including the ``_fire_test_event`` wrapper in addons/test_addon.py which now just builds the ``DockerContainerStateEvent`` and delegates.	2026-05-06 12:02:28 +02:00
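The fire-and-gather helper described above can be sketched with a toy bus. `Bus`, `register_event` and this two-argument `fire_bus_event(bus, ...)` are illustrative stand-ins, not Supervisor's classes. The real `tests.common.fire_bus_event` takes `coresys` instead of a bus. The one idea carried over from the commit is that `fire_event` returns the spawned listener tasks. A test can therefore `await asyncio.gather(*tasks)` instead of sleeping for a guessed duration.

```python
import asyncio
from collections.abc import Callable, Coroutine
from typing import Any

Listener = Callable[[Any], Coroutine[Any, Any, None]]


class Bus:
    """Toy stand-in for an event bus whose fire_event returns listener tasks."""

    def __init__(self) -> None:
        self._listeners: dict[str, list[Listener]] = {}

    def register_event(self, event: str, callback: Listener) -> None:
        self._listeners.setdefault(event, []).append(callback)

    def fire_event(self, event: str, data: Any) -> list[asyncio.Task[None]]:
        # One task per listener; returning them lets callers wait deterministically.
        return [
            asyncio.create_task(callback(data))
            for callback in self._listeners.get(event, [])
        ]


async def fire_bus_event(bus: Bus, event: str, data: Any) -> None:
    """Fire an event and wait for every listener task (hypothetical signature)."""
    await asyncio.gather(*bus.fire_event(event, data))
```

`gather` re-raises the first exception raised by a listener. A mis-typed mock therefore fails the test, where it used to surface only as a background-task error.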
Stefan Agner	0de6d25fed	Drop legacy test classes in favor of module-level functions (#6796) Per CLAUDE.md, plain test_* functions are the project style; class-based test grouping is considered legacy. Convert the 24 test methods in test_pull_progress.py (TestLayerProgress, TestImagePullProgress) to module-level functions; none of them used self, so the rewrite is mechanical. Also rename three helper classes whose names accidentally matched pytest's Test* collection pattern, even though they are fakes/fixtures rather than test cases: - TestAddon -> FakeApp (data holder used as a fake App in pwned tests) - TestDockerInterface -> FakeDockerInterface (fixture/inner helper in docker tests) The two DBusServiceMock subclasses named TestInterface already had __test__ = False and are left alone.	2026-05-04 21:38:22 +02:00
Stefan Agner	97bc19d4b3	Detect container registry rate limits uniformly (#6732) * Detect container registry rate limits uniformly Container registry rate limits reach Supervisor in three distinct shapes: 1. HTTP 429 from the daemon - recognised today, but the exception and resolution issue are hardcoded to Docker Hub. Since Core/Supervisor/plugin images all live on ghcr.io now, virtually every 429 we see in the field is actually a GHCR throttle that we mislabel. The biggest Sentry issue (SUPERVISOR-16BK) has >115k events / >93k users, all pulling a ghcr.io image, yet each user is told to "log into Docker Hub". 2. HTTP 500 with 'toomanyrequests' in the body - not recognised. Docker daemons before 28.3.0 wrap upstream 429s as 500 (fixed upstream by moby/moby 23fa0ae74a, "Cleanup http status error checks"). The large fleet on older daemons still produces this shape. 3. JSON error event during a streaming pull - not recognised. Once the daemon starts writing the 200 OK response body the status is locked in, so rate limits that land during layer download arrive as plain text in the pull stream. Happens on all recent daemon versions - SUPERVISOR-13FQ (>16k events) and SUPERVISOR-13E0 (>8k events) are two large examples. Cases 2 and 3 propagate as plain DockerError, bypass the 429 detection in install() entirely, never produce a DOCKER_RATELIMIT resolution issue, and generate large amounts of Sentry noise. Case 1 is detected but routes every GHCR 429 through Docker-Hub-specific messaging and suggestions. Changes: - Add DockerRegistryRateLimitExceeded as the common base class and GithubContainerRegistryRateLimitExceeded alongside the existing DockerHubRateLimitExceeded. All extend APITooManyRequests so callers and retry logic can key off a single type. - Add GITHUB_RATELIMIT IssueType so GHCR failures don't show the "log in to Docker Hub" suggestion that DOCKER_RATELIMIT carries. 
- PullLogEntry.exception now maps stream errors containing 'toomanyrequests' to DockerRegistryRateLimitExceeded (case 3). - docker/interface.py:install() routes all three cases through a single _registry_rate_limit_exception() helper that picks the right issue type, suggestion and exception subclass based on the image's registry. - utils/sentry.py filters APITooManyRequests (and anything wrapping it via __cause__) in capture_exception / async_capture_exception. One point of policy, every caller benefits. Callers (supervisor.update(), plugin manager, homeassistant core) are unchanged - UPDATE_FAILED issues still get created alongside the registry-specific rate limit issue, giving users the full picture. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Consolidate Sentry noise filtering in one before_send hook Move the APITooManyRequests filter from capture_exception / async_capture_exception wrappers into the existing filter_data before_send hook in supervisor/misc/filter.py, alongside the AddonConfigurationError filter. One isinstance tuple check instead of multiple layers, and every path that reaches Sentry (including logging-integration and excepthook captures, not just our explicit wrappers) now gets the same treatment. The filter walks the __cause__ chain so wrapped rate-limit errors (e.g. DockerHubRateLimitExceeded inside SupervisorUpdateError) still get filtered. A debug log is emitted on each dropped event for observability. Review feedback from mdegat01 on #6732. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Drop GITHUB_RATELIMIT resolution issue There is no actionable remediation for a GHCR rate limit - logging in doesn't lift the quota the way it does for Docker Hub, and the cap is on the authenticated account anyway. A resolution issue that just tells the user "you were rate limited" adds UI noise without helping them. 
Keep the GithubContainerRegistryRateLimitExceeded exception - retry logic and the Sentry filter still key off it - but don't create a resolution issue. A log entry from the exception constructor is sufficient. Docker Hub still gets DOCKER_RATELIMIT + registry-login suggestion since that is actionable. Review feedback on #6732. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-22 07:49:01 +02:00
Stefan Agner	7fb621234e	Add Unix socket support for Core communication with feature flag (#6742) * Use Unix socket for Supervisor to Core communication Reintroduce Unix socket support for Supervisor-to-Core communication (reverted in #6735) with the addition of a feature flag gate. The feature is now controlled by the `core_unix_socket` feature flag and disabled by default. When enabled and the Core version supports it, Supervisor communicates with Core via a Unix socket at /run/os/core.sock instead of TCP. This eliminates the need for access token authentication on the socket path, as Core authenticates the peer by the socket connection itself. Key changes: - Add FeatureFlag.CORE_UNIX_SOCKET to gate the feature - HomeAssistantAPI: transport-aware session/url/websocket management - WSClient: separate connect() (Unix, no auth) and connect_with_auth() (TCP) class methods with proper error handling - APIProxy delegates websocket setup to api.connect_websocket() - Container state tracking for Unix session lifecycle - CI builder mounts /run/supervisor for integration tests Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Sort feature flags alphabetically * Drop per-call max_msg_size from WSClient Hardcode the WebSocket message size cap to 64 MB in WSClient and remove the parameter from WSClient.connect, connect_with_auth, _ws_connect, and HomeAssistantAPI.connect_websocket. This was only ever overridden by APIProxy, so threading it through four layers was unnecessary. max_msg_size is a cap, not a pre-allocation; aiohttp only grows buffers to the size of actual incoming messages. Supervisor's own control channel never approaches 64 MB, so unifying the limit has no runtime cost. Addresses review feedback on #6742. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-21 15:03:05 +02:00
Mike Degatano	ba8c49935b	Refactor internal addon references to app/apps (#6717) * Rename addon→app in docstrings and comments Updates all docstrings and inline comments across supervisor/ and tests/ to use the new app/apps terminology. No runtime behaviour is changed by this commit. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Rename addon→app in code (variables, args, class names, functions) Renames all internal Python identifiers from addon/addons to app/apps: - Variable and argument names - Function and method names - Class names (Addon→App, AddonManager→AppManager, DockerAddon→DockerApp, all exception, check, and fixup classes, etc.) - String literals used as Python identifiers (pytest fixtures, parametrize param names, patch.object attribute strings, URL route match_info keys) External API contracts are preserved: JSON keys, error codes, discovery protocol fields, TypedDict/attr.s field names. Import module paths (supervisor/addons/) are also unchanged. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix partial backup/restore API to remap addons key to apps The external API accepts `addons` as the request body key (since ATTR_APPS = "addons"), but do_backup_partial and do_restore_partial now take an `apps` parameter after the rename. The `**body` expansion in both endpoints would pass `addons=...`, causing a TypeError. Remap the key before expansion in both backup_partial and restore_partial: if ATTR_APPS in body: body["apps"] = body.pop(ATTR_APPS) Also adds test_restore_partial_with_addons_key to verify the restore path correctly receives apps= when addons is passed in the request body. This path had no existing test coverage. 
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Fix merge error * Adjust AppLoggerAdapter to use app_name Co-authored-by: Stefan Agner <stefan@agner.ch> --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: Stefan Agner <stefan@agner.ch>	2026-04-14 16:47:20 +02:00
Stefan Agner	5c5428fde3	Revert "Use Unix socket for Supervisor to Core communication (#6590)" (#6735) This reverts commit `28fa0b35bd`.	2026-04-14 12:28:02 +02:00
Stefan Agner	28fa0b35bd	Use Unix socket for Supervisor to Core communication (#6590) * Use Unix socket for Supervisor to Core communication Switch internal Supervisor-to-Core HTTP and WebSocket communication from TCP (port 8123) to a Unix domain socket. The existing /run/supervisor directory on the host (already mounted at /run/os inside the Supervisor container) is bind-mounted into the Core container at /run/supervisor. Core receives the socket path via the SUPERVISOR_CORE_API_SOCKET environment variable, creates the socket there, and Supervisor connects to it via aiohttp.UnixConnector at /run/os/core.sock. Since the Unix socket is only reachable by processes on the same host, requests arriving over it are implicitly trusted and authenticated as the existing Supervisor system user. This removes the token round-trip where Supervisor had to obtain and send Bearer tokens on every Core API call. WebSocket connections are likewise authenticated implicitly, skipping the auth_required/auth handshake. Key design decisions: - Version-gated by CORE_UNIX_SOCKET_MIN_VERSION so older Core versions transparently continue using TCP with token auth - LANDINGPAGE is explicitly excluded (not a CalVer version) - Hard-fails with a clear error if the socket file is unexpectedly missing when Unix socket communication is expected - WSClient.connect() for Unix socket (no auth) and WSClient.connect_with_auth() for TCP (token auth) separate the two connection modes cleanly - Token refresh always uses the TCP websession since it is inherently a TCP/Bearer-auth operation - Logs which transport (Unix socket vs TCP) is being used on first request Closes #6626 Related Core PR: home-assistant/core#163907 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Close WebSocket on handshake failure and validate auth_required Ensure the underlying WebSocket connection is closed before raising when the handshake produces an unexpected message. 
Also validate that the first TCP message is auth_required before sending credentials. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Fix pylint protected-access warnings in tests Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Check running container env before using Unix socket Split use_unix_socket into two properties to handle the Supervisor upgrade transition where Core is still running with a container started by the old Supervisor (without SUPERVISOR_CORE_API_SOCKET): - supports_unix_socket: version check only, used when creating the Core container to decide whether to set the env var - use_unix_socket: version check + running container env check, used for communication decisions This ensures TCP fallback during the upgrade transition while still hard-failing if the socket is missing after Supervisor configured Core to use it. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Improve Core API communication logging and error handling - Remove transport log from make_request that logged before Core container was attached, causing misleading connection logs - Log "Connected to Core via ..." once on first successful API response in get_api_state, when the transport is actually known - Remove explicit socket existence check from session property, let aiohttp UnixConnector produce natural connection errors during Core startup (same as TCP connection refused) - Add validation in get_core_state matching get_config pattern - Restore make_request docstring Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Guard Core API requests with container running check Add is_running() check to make_request and connect_websocket so no HTTP or WebSocket connection is attempted when the Core container is not running. This avoids misleading connection attempts during Supervisor startup before Core is ready. 
Also make use_unix_socket raise if container metadata is not available instead of silently falling back to TCP. This is a defensive check since is_running() guards should prevent reaching this state. Add attached property to DockerInterface to expose whether container metadata has been loaded. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Reset Core API connection state on container stop Listen for Core container STOPPED/FAILED events to reset the connection state: clear the _core_connected flag so the transport is logged again on next successful connection, and close any stale Unix socket session. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Only mount /run/supervisor if we use it * Fix pytest errors * Remove redundant is_running check from ingress panel update The is_running() guard in update_hass_panel is now redundant since make_request checks is_running() internally. Also mock is_running in the websession test fixture since tests using it need make_request to proceed past the container running check. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Bind mount /run/supervisor to Supervisor /run/os Home Assistant OS (as well as the Supervised run scripts) bind mount /run/supervisor to /run/os in Supervisor. Since we reuse this location for the communication socket between Supervisor and Core, we need to also bind mount /run/supervisor to Supervisor /run/os in CI. * Wrap WebSocket handshake errors in HomeAssistantAPIError Unexpected exceptions during the WebSocket handshake (KeyError, ValueError, TypeError from malformed messages) are now wrapped in HomeAssistantAPIError inside WSClient.connect/connect_with_auth. This means callers only need to catch HomeAssistantAPIError. Remove the now-unnecessary except (RuntimeError, ValueError, TypeError) from proxy _websocket_client and add a proper error message to the APIError per review feedback. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Narrow WebSocket handshake exception handling Replace broad `except Exception` with specific exception types that can actually occur during the WebSocket handshake: KeyError (missing dict keys), ValueError (bad JSON), TypeError (non-text WS message), aiohttp.ClientError (connection errors), and TimeoutError. This avoids silently wrapping programming errors into HomeAssistantAPIError. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Remove unused create_mountpoint from MountBindOptions The field was added but never used. The /run/supervisor host path is guaranteed to exist since HAOS creates it for the Supervisor container mount, so auto-creating the mountpoint is unnecessary. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Clear stale access token before raising on final retry Move token clear before the attempt check in connect_websocket so the stale token is always discarded, even when raising on the final attempt. Without this, the next call would reuse the cached bad token via _ensure_access_token's fast path, wasting a round-trip. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Add tests for Unix socket communication and Core API Add tests for the new Unix socket communication path and improve existing test coverage: - Version-based supports_unix_socket and env-based use_unix_socket - api_url/ws_url transport selection - Connection lifecycle: connected log after restart, ignoring unrelated container events - get_api_state/check_api_state parameterized across versions, responses, and error cases - make_request is_running guard and TCP flow with real token fetch - connect_websocket for both Unix and TCP (with token verification) - WSClient.connect/connect_with_auth handshake success, errors, cleanup on failure, and close with pending futures Consolidate existing tests into parameterized form and drop synthetic tests that covered very little. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-10 15:09:38 +02:00
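The token-clearing order from the "Clear stale access token before raising on final retry" change can be sketched like this. `TokenCache`, `AuthError` and `connect_with_retry()` are hypothetical names. In the commit the logic lives in `connect_websocket()` and `_ensure_access_token()`, whose exact structure is not shown here.

```python
from collections.abc import Callable
from typing import TypeVar

T = TypeVar("T")


class AuthError(Exception):
    """Raised by the connect callable when Core rejects the token."""


class TokenCache:
    def __init__(self, fetch_token: Callable[[], str]) -> None:
        self._fetch_token = fetch_token
        self.access_token: str | None = None
        self.fetches = 0

    def ensure_access_token(self) -> str:
        # Fast path: a cached token is reused without a round-trip.
        if self.access_token is None:
            self.access_token = self._fetch_token()
            self.fetches += 1
        return self.access_token


def connect_with_retry(cache: TokenCache, connect: Callable[[str], T], attempts: int = 2) -> T:
    for attempt in range(attempts):
        token = cache.ensure_access_token()
        try:
            return connect(token)
        except AuthError:
            # Drop the token *before* deciding whether to give up, so the
            # final failure does not leave a stale token for the next call.
            cache.access_token = None
            if attempt == attempts - 1:
                raise
    raise RuntimeError("attempts must be >= 1")
```

If the check ran first, the final failure would return early and leave the bad token cached. The next call would then take the fast path and waste a round-trip on it.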
Stefan Agner	a30f2509a3	Enable IPv6 on Supervisor network by default for all installations (#6720) * Improve Docker network test coverage and infrastructure Add test cases for enable_ipv6=None (no user setting) to test_network_recreation, verifying existing behavior where None leaves the network unchanged. Use pytest.param with descriptive IDs for better test readability. Add create_network_mock side_effect to the docker fixture so network creation returns realistic metadata built from the provided params. Remove redundant manual create mock setups from individual tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Enable IPv6 on Supervisor network by default for all installations Previously, IPv6 was only enabled by default for new installations (when enable_ipv6 config was None). Existing installations with IPv4-only networks were left unchanged unless the user explicitly set enable_ipv6 to true. Now, when no explicit IPv6 setting exists, the network is migrated to dual-stack on next boot. The same safety checks apply: migration is blocked if user containers are running and requires a reboot. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 18:08:30 +02:00
Stefan Agner	a4a17a70a5	Add specific error message for registry authentication failures (#6678) * Add specific error message for registry authentication failures When a Docker image pull fails with 401 Unauthorized and registry credentials are configured, raise DockerRegistryAuthError instead of a generic DockerError. This surfaces a clear message to the user ("Docker registry authentication failed for <registry>. Check your registry credentials") instead of "An unknown error occurred with addon <name>". Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Add tests for registry authentication error handling Test that a 401 during image pull raises DockerRegistryAuthError when credentials are configured, and falls back to generic DockerError when no credentials are present. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Add tests for addon install/update/rebuild auth failure handling Test that DockerRegistryAuthError propagates correctly through addon install, update, and rebuild paths without being wrapped in a generic AddonUnknownError. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-31 09:29:49 +02:00
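The error selection can be sketched with a hypothetical `pull_error()` helper. The sketch assumes `DockerRegistryAuthError` subclasses `DockerError`, which the commit does not say. Only the 401-plus-credentials condition and the user-facing message text come from the commit.

```python
class DockerError(Exception):
    """Generic Docker failure."""


class DockerRegistryAuthError(DockerError):
    def __init__(self, registry: str) -> None:
        super().__init__(
            f"Docker registry authentication failed for {registry}. "
            "Check your registry credentials"
        )
        self.registry = registry


def pull_error(status: int, registry: str, credentials: dict[str, dict[str, str]]) -> DockerError:
    """Map a failed pull to an exception; `credentials` maps registry -> login info."""
    if status == 401 and registry in credentials:
        return DockerRegistryAuthError(registry)
    # No credentials configured (or another status): keep the generic error.
    return DockerError(f"Can't pull image from {registry} (HTTP {status})")
```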
Stefan Agner	1fd78dfc4e	Fix Docker Hub registry auth for containerd image store (#6677) aiodocker derives ServerAddress for X-Registry-Auth by doing image.partition("/"). For Docker Hub images like "homeassistant/amd64-supervisor", this extracts "homeassistant" (the namespace) instead of "docker.io" (the registry). With the classic graphdriver image store, ServerAddress was never checked and credentials were sent regardless. With the containerd image store (default since Docker v29 / HAOS 15), the resolver compares ServerAddress against the actual registry host and silently drops credentials on mismatch, falling back to anonymous access. Fix by prefixing Docker Hub images with "docker.io/" when registry credentials are configured, so aiodocker sets ServerAddress correctly. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 14:43:18 +02:00
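The mismatch and the fix fit in a few lines. `server_address()` reproduces the `image.partition("/")` step that the commit attributes to aiodocker. `image_for_pull()` is a hypothetical helper that adds the `docker.io/` prefix only when Docker Hub credentials are configured.

```python
def server_address(image: str) -> str:
    # The partition("/") derivation described in the commit message.
    return image.partition("/")[0]


def image_for_pull(image: str, registry: str, has_credentials: bool) -> str:
    """Prefix Docker Hub images so the derived ServerAddress is 'docker.io'."""
    if registry == "docker.io" and has_credentials and not image.startswith("docker.io/"):
        return f"docker.io/{image}"
    return image
```

Without credentials the name is left alone, since anonymous pulls do not depend on ServerAddress.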
Stefan Agner	0ef71d1dd1	Drop unsupported architectures and machines, create issue for affected apps (#6607) * Drop unsupported architectures and machines from Supervisor Since #5620, Supervisor no longer updates the version information on unsupported architectures and machines. This means users on these systems cannot update to any Supervisor version released after that PR. Furthermore, since #6347 we also no longer build for these architectures. With this, any code related to these architectures becomes dead code and should be removed. This commit removes all references to the deprecated architectures and machines from Supervisor. This affects the following architectures: - armhf - armv7 - i386 And the following machines: - odroid-xu - qemuarm - qemux86 - raspberrypi - raspberrypi2 - raspberrypi3 - raspberrypi4 - tinker * Create issue if an app using a deprecated architecture is installed This adds a check to the resolution system to detect if an app is installed that uses a deprecated architecture. If so, it will show a warning to the user and recommend that they uninstall the app. * Formally deprecate machine add-on configs as well Not only deprecate add-on configs for unsupported architectures, but also for unsupported machines. * For installed add-ons, architecture must always exist Fail hard in case of missing architecture, as this is a required field for installed add-ons. This will prevent the Supervisor from running with an unsupported configuration and causing further issues down the line.	2026-03-04 10:59:14 +01:00
Mike Degatano	4a1c816b92	Finish dockerpy to aiodocker migration (#6578)	2026-02-18 08:49:15 +01:00
Stefan Agner	0cd668ec77	Fix typeguard errors by explicitly converting IP addresses to strings (#6531) * Fix environment variable type errors by converting IP addresses to strings Environment variables must be strings, but IPv4Address and IPv4Network objects were being passed directly to container environment dictionaries, causing typeguard validation errors. Changes: - Convert IPv4Address objects to strings in homeassistant.py for SUPERVISOR and HASSIO environment variables - Convert IPv4Network object to string in observer.py for NETWORK_MASK environment variable - Update tests to expect string values instead of IP objects in environment dictionaries - Remove unused ip_network import from test_observer.py Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * Use explicit string conversion for extra_hosts IP addresses Use the !s format specifier in the f-string to explicitly convert IPv4Address objects to strings when building the ExtraHosts list. While f-strings implicitly convert objects to strings, using !s makes the conversion explicit and consistent with the environment variable fixes in the previous commit. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-05 11:00:43 +01:00
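Both conversions are easy to see with the standard `ipaddress` module. The helper names `string_env()` and `extra_hosts()` and the example addresses in the tests are illustrative. Only the variable names `SUPERVISOR`, `HASSIO` and `NETWORK_MASK` and the `!s` specifier come from the commit.

```python
from ipaddress import IPv4Address, IPv4Network


def string_env(values: dict[str, object]) -> dict[str, str]:
    """Container environment values must be str, so convert IP objects explicitly."""
    return {key: str(value) for key, value in values.items()}


def extra_hosts(hosts: dict[str, IPv4Address]) -> list[str]:
    # !s makes the str() conversion explicit inside the f-string.
    return [f"{name}:{address!s}" for name, address in hosts.items()]
```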
Stefan Agner	d1a576e711	Fix Docker Hub manifest fetching by using correct registry API endpoint (#6525) The manifest fetcher was using docker.io as the registry API endpoint, but Docker Hub's actual registry API is at registry-1.docker.io. When trying to access https://docker.io/v2/..., requests were being redirected to https://www.docker.com/ (the marketing site), which returned HTML instead of JSON, causing manifest fetching to fail. The fix matches exactly what Docker itself does internally - see daemon/pkg/registry/config.go:49 where Docker hardcodes DefaultRegistryHost = "registry-1.docker.io" for registry operations. Changes: - Add DOCKER_HUB_API constant for the actual API endpoint - Add _get_api_endpoint() helper to translate docker.io to registry-1.docker.io for HTTP API calls - Update _get_auth_token() and _fetch_manifest() to use the API endpoint - Keep docker.io as the registry identifier for naming and credentials - Add tests to verify the API endpoint translation Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-03 19:03:47 +01:00
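The translation can be sketched in a few lines. `DOCKER_HUB_API` and the mapping from docker.io to registry-1.docker.io come from the commit. `get_api_endpoint()` (written here without the leading underscore) and `manifest_url()` are illustrative. The URL path follows the distribution API's `/v2/<name>/manifests/<reference>` form.

```python
DOCKER_HUB = "docker.io"  # registry identifier used for naming and credentials
DOCKER_HUB_API = "registry-1.docker.io"  # host that actually serves /v2/


def get_api_endpoint(registry: str) -> str:
    """Translate the Docker Hub identifier to its API host; others pass through."""
    return DOCKER_HUB_API if registry == DOCKER_HUB else registry


def manifest_url(registry: str, repository: str, reference: str) -> str:
    # Distribution API path: /v2/<name>/manifests/<reference>
    return f"https://{get_api_endpoint(registry)}/v2/{repository}/manifests/{reference}"
```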
Mike Degatano	a122b5f1e9	Migrate info, events and container logs to aiodocker (#6514) * Migrate info and events to aiodocker * Migrate container logs to aiodocker * Fix dns plugin loop test * Fix mocking for docker info * Fixes from feedback * Harden monitor error handling * Deleted failing tests because they were not useful	2026-02-03 18:36:41 +01:00
Stefan Agner	6957341c3e	Refactor Docker pull progress with registry manifest fetcher (#6379) * Use count-based progress for Docker image pulls Refactor Docker image pull progress to use a simpler count-based approach where each layer contributes equally (100% / total_layers) regardless of size. This replaces the previous size-weighted calculation that was susceptible to progress regression. The core issue was that Docker limits concurrent downloads (~3 at a time) and reports layer sizes only when downloading starts. With size-weighted progress, large layers appearing late would cause progress to drop dramatically (e.g., 59% -> 29%) as the total size increased. The new approach: - Each layer contributes equally to overall progress - Per-layer progress: 70% download weight, 30% extraction weight - Progress only starts after first "Downloading" event (when layer count is known) - Always caps at 99% - job completion handles final 100% This simplifies the code by moving progress tracking to a dedicated module (pull_progress.py) and removing complex size-based scaling logic that tried to account for unknown layer sizes. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Exclude already-existing layers from pull progress calculation Layers that already exist locally should not count towards download progress since there's nothing to download for them. Only layers that need pulling are included in the progress calculation. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Add registry manifest fetcher for size-based pull progress Fetch image manifests directly from container registries before pulling to get accurate layer sizes upfront. This enables size-weighted progress tracking where each layer contributes proportionally to its byte size, rather than equal weight per layer. 
Key changes: - Add RegistryManifestFetcher that handles auth discovery via WWW-Authenticate headers, token fetching with optional credentials, and multi-arch manifest list resolution - Update ImagePullProgress to accept manifest layer sizes via set_manifest() and calculate size-weighted progress - Fall back to count-based progress when manifest fetch fails - Pre-populate layer sizes from manifest when creating layer trackers The manifest fetcher supports ghcr.io, Docker Hub, and private registries by using credentials from Docker config when available. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Clamp progress to 100 to prevent floating point precision issues Floating point arithmetic in weighted progress calculations can produce values slightly above 100 (e.g., 100.00000000000001). This causes validation errors when the progress value is checked. Add min(100, ...) clamping to both size-weighted and count-based progress calculations to ensure the result never exceeds 100. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Use sys_websession for manifest fetcher instead of creating new session Reuse the existing CoreSys websession for registry manifest requests instead of creating a new aiohttp session. This improves performance and follows the established pattern used throughout the codebase. 
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Make platform parameter required and warn on missing platform - Make platform a required parameter in get_manifest() and _fetch_manifest() since it's always provided by the calling code - Return None and log warning when requested platform is not found in multi-arch manifest list, instead of falling back to first manifest which could be the wrong architecture 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Log manifest fetch failures at warning level Users will notice degraded progress tracking when manifest fetch fails, so log at warning level to help diagnose issues. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Add pylint disable comments for protected access in manifest tests 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Separate download_current and total_size updates in pull progress Update download_current and total_size independently in the DOWNLOADING handler. This ensures download_current is updated even when total is not yet available. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Reject invalid platform format in manifest selection --------- Co-authored-by: Claude <noreply@anthropic.com>	2026-02-02 15:56:24 +01:00
Mike Degatano	a5c3781f9d	Migrate network interactions to aiodocker (#6505 )	2026-01-30 15:34:12 +01:00
dependabot[bot]	2a4890e2b0	Bump aiodocker from 0.24.0 to 0.25.0 (#6448 ) * Bump aiodocker from 0.24.0 to 0.25.0 Bumps [aiodocker](https://github.com/aio-libs/aiodocker) from 0.24.0 to 0.25.0. - [Release notes](https://github.com/aio-libs/aiodocker/releases) - [Changelog](https://github.com/aio-libs/aiodocker/blob/main/CHANGES.rst) - [Commits](https://github.com/aio-libs/aiodocker/compare/v0.24.0...v0.25.0) --- updated-dependencies: - dependency-name: aiodocker dependency-version: 0.25.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> * Update to new timeout configuration * Fix pytest failure --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Mike Degatano <michael.degatano@gmail.com> Co-authored-by: Stefan Agner <stefan@agner.ch>	2026-01-30 09:39:06 +01:00
AlCalzone	de02bc991a	fix: pull missing images before running (#6500 ) * fix: pull missing images before running * add tests for auto-pull behavior	2026-01-28 13:08:03 +01:00
Mike Degatano	909a2dda2f	Migrate (almost) all docker container interactions to aiodocker (#6489 ) * Migrate all docker container interactions to aiodocker * Remove containers_legacy since it's no longer used * Add back remove color logic * Revert accidental invert of conditional in setup_network * Fix typos found by copilot * Apply suggestions from code review Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Revert "Apply suggestions from code review" This reverts commit `0a475433ea`. --------- Co-authored-by: Stefan Agner <stefan@agner.ch> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-01-27 12:42:17 +01:00
Jan Čermák	753021d4d5	Fix 'DockerMount is not JSON serializable' in DockerAPI.run_command (#6477 )	2026-01-14 15:21:11 +01:00
Mike Degatano	d23bc291d5	Migrate create container to aiodocker (#6415 ) * Migrate create container to aiodocker * Fix extra hosts transformation * Env not Environment * Fix tests * Fixes from feedback --------- Co-authored-by: Jan Čermák <sairon@users.noreply.github.com>	2025-12-15 09:57:30 +01:00
Jan Čermák	cdef1831ba	Add option to Core settings to enable duplicated logs (#6400 ) Introduce new option `duplicate_log_file` to HA Core configuration that will set an environment variable `HA_DUPLICATE_LOG_FILE=1` for the Core container if enabled. This will serve as a flag for Core to enable the legacy log file, along the standard logging which is handled by Systemd Journal.	2025-12-08 16:35:56 +01:00
Stefan Agner	382f0e8aef	Disable timeout for Docker image pull operations (#6391 ) * Disable timeout for Docker image pull operations The aiodocker migration introduced a regression where image pulls could timeout during slow downloads. The session-level timeout (900s total) was being applied to pull operations, but docker-py explicitly sets timeout=None for pulls, allowing them to run indefinitely. When aiodocker receives timeout=None, it converts it to ClientTimeout(total=None), which aiohttp treats as "no timeout" (returns TimerNoop instead of enforcing a timeout). This fixes TimeoutError exceptions that could occur during installation on systems with slow network connections or when pulling large images. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Fix pytests --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-12-03 21:52:46 +01:00
Stefan Agner	d220fa801f	Await aiodocker import_image coroutine (#6378 ) The aiodocker images.import_image() method returns a coroutine that needs to be awaited, but the code was iterating over it directly, causing "TypeError: 'coroutine' object is not iterable". Fixes SUPERVISOR-13D9 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude <noreply@anthropic.com>	2025-12-02 14:11:06 -05:00
Stefan Agner	50d31202ae	Use Docker's official registry domain detection logic (#6360 ) * Use Docker's official registry domain detection logic Replace the custom IMAGE_WITH_HOST regex with a proper implementation based on Docker's reference parser (vendor/github.com/distribution/ reference/normalize.go). Changes: - Change DOCKER_HUB from "hub.docker.com" to "docker.io" (official default) - Add DOCKER_HUB_LEGACY for backward compatibility with "hub.docker.com" - Add IMAGE_DOMAIN_REGEX and get_domain() function that properly detects: - localhost (with optional port) - Domains with "." (e.g., ghcr.io, 127.0.0.1) - Domains with ":" port (e.g., myregistry:5000) - IPv6 addresses (e.g., [::1]:5000) - Update credential handling to support both docker.io and hub.docker.com - Add comprehensive tests for domain detection 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Refactor Docker domain detection to utils module Move get_domain function to supervisor/docker/utils.py and rename it to get_domain_from_image for consistency with get_registry_for_image. Use named group in the regex for better readability. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Rename domain to registry for consistency Use consistent "registry" terminology throughout the codebase: - Rename get_domain_from_image to get_registry_from_image - Rename IMAGE_DOMAIN_REGEX to IMAGE_REGISTRY_REGEX - Update named group from "domain" to "registry" - Update all related comments and variable names 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-12-02 14:30:03 +01:00
Mike Degatano	6302c7d394	Fix progress when using containerd snapshotter (#6357 ) * Fix progress when using containerd snapshotter * Add test for tiny image download under containerd-snapshotter * Fix API tests after progress allocation change * Fix test for auth changes * Apply suggestions from code review Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Stefan Agner <stefan@agner.ch> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-11-27 16:26:22 +01:00
Stefan Agner	8a251e0324	Pass registry credentials to add-on build for private base images (#6356 ) * Pass registry credentials to add-on build for private base images When building add-ons that use a base image from a private registry, the build would fail because credentials configured via the Supervisor API were not passed to the Docker-in-Docker build container. This fix: - Adds get_docker_config_json() to generate a Docker config.json with registry credentials for the base image - Creates a temporary config file and mounts it into the build container at /root/.docker/config.json so BuildKit can authenticate when pulling the base image - Cleans up the temporary file after build completes Fixes #6354 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Fix pylint errors * Apply suggestions from code review Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Refactor registry credential extraction into shared helper Extract duplicate logic for determining which registry matches an image into a shared `get_registry_for_image()` method in `DockerConfig`. This method is now used by both `DockerInterface._get_credentials()` and `AddonBuild.get_docker_config_json()`. Move `DOCKER_HUB` and `IMAGE_WITH_HOST` constants to `docker/const.py` to avoid circular imports between manager.py and interface.py. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Apply suggestions from code review Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * ruff format * Document raises --------- Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: Mike Degatano <michael.degatano@gmail.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-11-27 11:10:17 +01:00
Stefan Agner	ae7700f52c	Fix private registry authentication for aiodocker image pulls (#6355 ) * Fix private registry authentication for aiodocker image pulls After PR #6252 migrated image pulling from dockerpy to aiodocker, private registry authentication stopped working. The old _docker_login() method stored credentials in ~/.docker/config.json via dockerpy, but aiodocker doesn't read that file - it requires credentials passed explicitly via the auth parameter. Changes: - Remove unused _docker_login() method (dockerpy login was ineffective) - Pass credentials directly to pull_image() via new auth parameter - Add auth parameter to DockerAPI.pull_image() method - Add unit tests for Docker Hub and custom registry authentication Fixes #6345 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Ignore protected access in test * Fix plug-in pull test * Fix HA core tests --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-11-26 17:37:24 +01:00
Stefan Agner	63a3dff118	Handle pull events with complete progress details only (#6320 ) * Handle pull events with complete progress details only Under certain circumstances, Docker seems to send pull events with incomplete progress details (i.e., missing 'current' or 'total' fields). In practise, we've observed an empty dictionary for progress details as well as a missing 'total' field (while 'current' was present). All of these events were observed with Docker 28.3.3 using the old, default Docker graph backend. * Fix docstring/comment	2025-11-19 12:21:27 +01:00
Mike Degatano	30cc172199	Migrate images from dockerpy to aiodocker (#6252 ) * Migrate images from dockerpy to aiodocker * Add missing coverage and fix bug in repair * Bind libraries to different files and refactor images.pull * Use the same socket again Try using the same socket again. * Fix pytest --------- Co-authored-by: Stefan Agner <stefan@agner.ch>	2025-11-12 20:54:06 +01:00
Stefan Agner	d85aedc42b	Avoid using deprecated 'id' field in Docker events (#6307 )	2025-11-12 20:44:01 +01:00
Stefan Agner	d96ea9aef9	Fix docker image pull progress blocked by small layers (#6287 ) * Fix docker image pull progress blocked by small layers Small Docker layers (typically <100 bytes) can skip the downloading phase entirely, going directly from "Pulling fs layer" to "Download complete" without emitting any progress events with byte counts. This caused the aggregate progress calculation to block indefinitely, as it required all layer jobs to have their `extra` field populated with byte counts before proceeding. The issue manifested as parent job progress jumping from 0% to 97.9% after long delays, as seen when a 96-byte layer held up progress reporting for ~50 seconds until it finally reached the "Extracting" phase. Set a minimal `extra` field (current=1, total=1) when layers reach "Download complete" without having gone through the downloading phase. This allows the aggregate progress calculation to proceed immediately while still correctly representing the layer as 100% downloaded. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Update test to capture issue correctly * Improve pytest * Fix pytest comment * Fix pylint warning --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-11-06 09:04:55 +01:00
Stefan Agner	1448a33dbf	Remove Codenotary integrity check (#6236 ) * Formally deprecate CodeNotary build config * Remove CodeNotary specific integrity checking The current code is specific to how CodeNotary was doing integrity checking. A future integrity checking mechanism likely will work differently (e.g. through EROFS based containers). Remove the current code to make way for a future implementation. * Drop CodeNotary integrity fixups * Drop unused tests * Fix pytest * Fix pytest * Remove CodeNotary related exceptions and handling Remove CodeNotary related exceptions and handling from the Docker interface. * Drop unnecessary comment * Remove Codenotary specific IssueType/SuggestionType * Drop Codenotary specific environment and secret reference * Remove unused constants * Introduce APIGone exception for removed APIs Introduce a new exception class APIGone to indicate that certain API features have been removed and are no longer available. Update the security integrity check endpoint to raise this new exception instead of a generic APIError, providing clearer communication to clients that the feature has been intentionally removed. * Drop content trust A cosign based signature verification will likely be named differently to avoid confusion with existing implementations. For now, remove the content trust option entirely. * Drop code sign test * Remove source_mods/content_trust evaluations * Remove content_trust reference in bootstrap.py * Fix security tests * Drop unused tests * Drop codenotary from schema Since we have "remove extra" in voluptuous, we can remove the codenotary field from the addon schema. * Remove content_trust from tests * Remove content_trust unsupported reason * Remove unnecessary comment * Remove unrelated pytest * Remove unrelated fixtures	2025-11-03 20:13:15 +01:00
Stefan Agner	53a8044aff	Add support for ulimit in addon config (#6206 ) * Add support for ulimit in addon config Similar to docker-compose, this adds support for setting ulimits for addons via the addon config. This is useful e.g. for InfluxDB which on its own does not support setting higher open file descriptor limits, but recommends increasing limits on the host. * Make soft and hard limit mandatory if ulimit is a dict	2025-10-08 12:43:12 +02:00
Mike Degatano	190b734332	Add progress reporting to addon, HA and Supervisor updates (#6195 ) * Add progress reporting to addon, HA and Supervisor updates * Fix assert in test * Add progress to addon, core, supervisor updates/installs * Fix double install bug in addons install * Remove initial_install and re-arrange order of load	2025-10-07 16:54:11 +02:00
Stefan Agner	f3e1e0f423	Fix CID file handling to prevent directory creation (#6225 ) * Fix CID file handling to prevent directory creation It seems that under certain conditions Docker creates a directory instead of a file for the CID file. This change ensures that the CID file is always created as a file, and any existing directory is removed before creating the file. * Fix tests * Fix pytest	2025-10-02 09:24:19 +02:00
Jan Čermák	bbb9469c1c	Write cidfiles of Docker containers and mount them individually to /run/cid (#6154 ) * Write cidfiles of Docker containers and mount them individually to /run/cid There is no standard way to get the container ID in the container itself, which can be needed for instance for #6006. The usual pattern is to use the --cidfile argument of the Docker CLI and mount the generated file to the container. However, this is a feature of the Docker CLI and we can't use it when creating the containers via the API. To get the container ID to implement native logging in e.g. Core as well, we need the help of the Supervisor. This change implements a similar feature fully in Supervisor's DockerAPI class, which orchestrates the lifetime of all containers managed by Supervisor. The files are created in the SUPERVISOR_DATA directory, as they need to be persisted between reboots, just as the instances of Docker containers are. Supervisor's cidfile must be created when starting the Supervisor itself, for that see home-assistant/operating-system#4276. * Address review comments, fix mounting of the cidfile	2025-09-09 13:38:31 +02:00
Mike Degatano	78be155b94	Handle download restart in pull progress log (#6131 )	2025-08-25 23:20:00 +02:00
Mike Degatano	9900dfc8ca	Do not skip messages in pull progress log due to rounding (#6129 )	2025-08-25 22:25:38 +02:00
Mike Degatano	207b665e1d	Send progress updates during image pull for install/update (#6102 ) * Send progress updates during image pull for install/update * Add extra to tests about job APIs * Sent out of date progress to sentry and combine done event * Pulling container image layer	2025-08-22 10:41:10 +02:00
Mike Degatano	8a82b98e5b	Improved error handling for docker image pulls (#6095 ) * Improved error handling for docker image pulls * Fix mocking in tests due to api use change	2025-08-13 18:05:27 +02:00
Copilot	fd205ce2ef	Add Docker MTU configuration support for networks with non-standard MTU (#6079 ) * Initial plan * Implement Docker MTU support - core functionality Co-authored-by: agners <34061+agners@users.noreply.github.com> * Add comprehensive MTU tests and documentation Co-authored-by: agners <34061+agners@users.noreply.github.com> * Fix final linting issue in test file Co-authored-by: agners <34061+agners@users.noreply.github.com> * Apply suggestions from code review * Implement reboot_required flag pattern and fix MyPy typing issue Co-authored-by: agners <34061+agners@users.noreply.github.com> * Update supervisor/api/docker.py * Update supervisor/docker/manager.py Co-authored-by: Mike Degatano <michael.degatano@gmail.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: agners <34061+agners@users.noreply.github.com> Co-authored-by: Stefan Agner <stefan@agner.ch> Co-authored-by: Mike Degatano <michael.degatano@gmail.com>	2025-08-12 09:19:12 +02:00
Stefan Agner	7dcf5ba631	Enable IPv6 for containers on new installations (#6029 ) * Enable IPv6 by default for new installations Enable IPv6 by default for new Supervisor installations. Let's also make the `enable_ipv6` attribute nullable, so we can distinguish between "not set" and "set to false". * Add pytest * Add log message that system restart is required for IPv6 changes * Fix API pytest * Create resolution center issue when reboot is required * Order log after actual setter call	2025-07-29 15:59:03 +02:00
Felipe Santos	d1c1a2d418	Fix `docker.run_command()` needing `detach` but not enforcing it (#5979 ) * Fix `docker.run_command()` needing `detach` but not enforcing it * Fix test	2025-06-30 16:09:19 +02:00
Felipe Santos	cf32f036c0	Fix `docker_home_assistant_execute_command` not honoring HA version (#5978 ) * Fix `docker_home_assistant_execute_command` not honoring HA version * Change variable name to image_with_tag * Fix test	2025-06-30 16:08:05 +02:00
Felipe Santos	b8852872fe	Remove anonymous volumes when removing containers (#5977 ) * Remove anonymous volumes when removing containers * Add tests for docker.run_command()	2025-06-30 13:31:41 +02:00
David Rapan	d5b5a328d7	feat: Add opt-in IPv6 for containers (#5879 ) Configurable and w/ migrations between IPv4-Only and Dual-Stack Signed-off-by: David Rapan <david@rapan.cz> Co-authored-by: Stefan Agner <stefan@agner.ch>	2025-06-12 11:32:24 +02:00
Mike Degatano	4a00caa2e8	Fix mypy issues in docker, hardware and homeassistant modules (#5805 ) * Fix mypy issues in docker and hardware modules * Fix mypy issues in homeassistant module * Fix async_send_command typing * Fixes from feedback	2025-04-08 12:52:58 -04:00
Mike Degatano	31193abb7b	FileConfiguration uses executor for I/O (#5652 ) * FileConfiguration uses executor for I/O * Fix credentials tests * Remove migrate_system_env as it's very deprecated	2025-02-26 19:11:11 +01:00
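The manifest-based progress commit (weight layers by manifest size, fall back to count-based progress when the manifest fetch fails, clamp with `min(100, ...)`) can be illustrated with a small sketch. This is not the Supervisor's `ImagePullProgress` code: `LayerState` and `pull_progress` are hypothetical names, and the sketch assumes each layer tracker exposes a completion fraction between 0.0 and 1.0.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LayerState:
    """Hypothetical per-layer tracker: fraction of the layer pulled so far."""

    layer_id: str
    fraction: float  # 0.0 (not started) to 1.0 (complete)


def pull_progress(
    layers: list[LayerState], manifest_sizes: dict[str, int] | None
) -> float:
    """Return overall pull progress in percent (0-100).

    Weights each layer by its manifest size when sizes are known for every
    layer; otherwise falls back to a plain count-based average.
    """
    if not layers:
        return 0.0
    if manifest_sizes and all(layer.layer_id in manifest_sizes for layer in layers):
        total = sum(manifest_sizes[layer.layer_id] for layer in layers)
        if total > 0:
            done = sum(
                manifest_sizes[layer.layer_id] * layer.fraction for layer in layers
            )
            # Clamp: float rounding can push the result just above 100.
            return min(100.0, done / total * 100)
    # Count-based fallback, e.g. when the manifest fetch failed.
    average = sum(layer.fraction for layer in layers) / len(layers)
    return min(100.0, average * 100)
```

With one finished 900-byte layer and one untouched 100-byte layer, the size-weighted result is about 90 percent, while the count-based fallback reports 50 percent. The size weighting keeps a tiny layer from holding back or inflating the overall figure.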

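The registry detection commit describes Docker's rule for deciding whether the first path component of an image reference is a registry host: it is one if it contains "." or ":" (which also covers ports and bracketed IPv6 addresses) or is exactly "localhost". Everything else defaults to Docker Hub (`docker.io`). The following is a simplified sketch of that first-component rule only. `registry_from_image` is a hypothetical name rather than the Supervisor's regex-based `get_registry_from_image`, and the sketch omits the rest of the upstream normalization.

```python
from __future__ import annotations

DEFAULT_REGISTRY = "docker.io"  # Docker Hub's canonical domain
LOCALHOST = "localhost"


def registry_from_image(image: str) -> str:
    """Return the registry host of an image reference (simplified).

    The part before the first "/" is treated as a registry only if it
    contains "." or ":" or is exactly "localhost"; otherwise the image
    is assumed to live on Docker Hub.
    """
    first, sep, _rest = image.partition("/")
    if sep and ("." in first or ":" in first or first == LOCALHOST):
        return first
    return DEFAULT_REGISTRY
```

For example, `ghcr.io/home-assistant/amd64-hassio-supervisor` resolves to `ghcr.io`, `myregistry:5000/app` to `myregistry:5000`, and `homeassistant/home-assistant` falls back to `docker.io`.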
86 Commits