* Use Unix socket for Supervisor to Core communication
Reintroduce Unix socket support for Supervisor-to-Core communication
(reverted in #6735) with the addition of a feature flag gate. The
feature is now controlled by the `core_unix_socket` feature flag and
disabled by default.
When enabled and Core version supports it, Supervisor communicates with
Core via a Unix socket at /run/os/core.sock instead of TCP. This
eliminates the need for access token authentication on the socket path,
as Core authenticates the peer by the socket connection itself.
Key changes:
- Add FeatureFlag.CORE_UNIX_SOCKET to gate the feature
- HomeAssistantAPI: transport-aware session/url/websocket management
- WSClient: separate connect() (Unix, no auth) and connect_with_auth()
(TCP) class methods with proper error handling
- APIProxy delegates websocket setup to api.connect_websocket()
- Container state tracking for Unix session lifecycle
- CI builder mounts /run/supervisor for integration tests
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Sort feature flags alphabetically
* Drop per-call max_msg_size from WSClient
Hardcode the WebSocket message size cap to 64 MB in WSClient and remove
the parameter from WSClient.connect, connect_with_auth, _ws_connect,
and HomeAssistantAPI.connect_websocket. This was only ever overridden
by APIProxy, so threading it through four layers was unnecessary.
max_msg_size is a cap, not a pre-allocation; aiohttp only grows buffers
to the size of actual incoming messages. Supervisor's own control
channel never approaches 64 MB, so unifying the limit has no runtime
cost.
Addresses review feedback on #6742.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Add versioned v2 API with apps terminology
Introduce a v2 API sub-app mounted at /v2 that uses 'apps' terminology
throughout, while keeping v1 fully backward-compatible.
Key changes:
- Add ATTR_ADDONS = 'addons' constant alongside ATTR_APPS = 'apps' so
backup file data (which must remain 'addons' for backward compat) and
v2 API responses can use distinct constants
- Add FeatureFlag.SUPERVISOR_V2_API to gate v2 route registration
- Mount aiohttp sub-app at /v2 in RestAPI.load() when flag is enabled
- Add _AppSecurityPatterns frozen dataclass and _V1_PATTERNS/_V2_PATTERNS
with strict per-version regex sets (no cross-version matching)
- Add _register_v2_apps, _register_v2_backups, _register_v2_store route
registration methods
- Add v1 thin wrapper methods (*_v1) for all affected endpoints so
business logic lives in the canonical v2 methods
- Extract _info_data() helper in APIApps so v1 closure can bypass
@api_process and still catch APIAppNotInstalled for store routing
- Add _rename_apps_to_addons_in_backups(), _process_location_in_body(),
_all_store_apps_info() shared helpers to eliminate duplication
- Add api_client_v2, api_client_with_prefix, app_api_client_with_root,
store_app_api_client_with_root parameterized test fixtures
- Add test_v2_api_disabled_without_feature_flag
- Parameterize backup, addons, and store tests to cover both v1 and v2
paths
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Fix pylint false positive for re.Pattern C extension methods
re.Pattern methods (match, search, etc.) are C extension methods.
Pylint cannot detect them via static analysis when re.Pattern is used
as a type annotation in a dataclass field, producing false E1101
no-member errors. Add generated-members to inform pylint these members
exist.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* pylint and feedback fixes
* Copilot suggested fixes
* Minor feedback fixes
---------
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Delay old image cleanup until after health checks on Core update
Move the old Docker image cleanup from inside _update() to after the
post-update health checks (frontend loaded and accessible). This keeps
the previous version's image available locally when a rollback is
needed, avoiding a potentially slow re-download.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Add test assertions for old image cleanup timing on Core update
Verify that the old Docker image is cleaned up only after health checks
pass, and not when a rollback is triggered.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Fix missing rollback when get_config fails after Core update
The early return after setting error_state skipped the rollback block,
leaving the system on a broken new version when the API stopped
responding after update. The other health check failure paths correctly
fall through to the rollback logic; this was the only one that didn't.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Remove double newlines from build output
The log lines from run command already have newline characters, so
joining them with "\n" adds extra newlines. Joining them with an empty
string preserves the original formatting of the logs.
* Remove double newlines from check config output
The log lines from run command already have newline characters, so
joining them with "\n" adds extra newlines. Joining them with an empty
string preserves the original formatting of the logs.
* Fix pytest
* Add experimental feature toggle system
Introduces an ExperimentalFeature enum and feature_flags config to allow
toggling experimental features via the supervisor options API. The first
feature flag is 'supervisor_v2_api' to gate the upcoming V2 API.
Absent keys in options request = no change (partial update, consistent
with existing options APIs). The info endpoint always returns all known
feature flags and their current state for discoverability.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* ExperimentalFeature -> FeatureFlag
* Use explicit value of StrEnum to be typesafe
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Minor comment improvement
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Stefan Agner <stefan@agner.ch>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Rename addon→app in docstrings and comments
Updates all docstrings and inline comments across supervisor/ and
tests/ to use the new app/apps terminology. No runtime behaviour
is changed by this commit.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Rename addon→app in code (variables, args, class names, functions)
Renames all internal Python identifiers from addon/addons to app/apps:
- Variable and argument names
- Function and method names
- Class names (Addon→App, AddonManager→AppManager, DockerAddon→DockerApp,
all exception, check, and fixup classes, etc.)
- String literals used as Python identifiers (pytest fixtures,
parametrize param names, patch.object attribute strings,
URL route match_info keys)
External API contracts are preserved: JSON keys, error codes,
discovery protocol fields, TypedDict/attr.s field names.
Import module paths (supervisor/addons/) are also unchanged.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Fix partial backup/restore API to remap addons key to apps
The external API accepts `addons` as the request body key (since
ATTR_APPS = "addons"), but do_backup_partial and do_restore_partial
now take an `apps` parameter after the rename. The **body expansion
in both endpoints would pass `addons=...` causing a TypeError.
Remap the key before expansion in both backup_partial and
restore_partial:
if ATTR_APPS in body:
body["apps"] = body.pop(ATTR_APPS)
Also adds test_restore_partial_with_addons_key to verify the restore
path correctly receives apps= when addons is passed in the request
body. This path had no existing test coverage.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Fix merge error
* Adjust AppLoggerAdapter to use app_name
Co-authored-by: Stefan Agner <stefan@agner.ch>
---------
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Stefan Agner <stefan@agner.ch>
* Use Unix socket for Supervisor to Core communication
Switch internal Supervisor-to-Core HTTP and WebSocket communication
from TCP (port 8123) to a Unix domain socket.
The existing /run/supervisor directory on the host (already mounted
at /run/os inside the Supervisor container) is bind-mounted into the
Core container at /run/supervisor. Core receives the socket path via
the SUPERVISOR_CORE_API_SOCKET environment variable, creates the
socket there, and Supervisor connects to it via aiohttp.UnixConnector
at /run/os/core.sock.
Since the Unix socket is only reachable by processes on the same host,
requests arriving over it are implicitly trusted and authenticated as
the existing Supervisor system user. This removes the token round-trip
where Supervisor had to obtain and send Bearer tokens on every Core
API call. WebSocket connections are likewise authenticated implicitly,
skipping the auth_required/auth handshake.
Key design decisions:
- Version-gated by CORE_UNIX_SOCKET_MIN_VERSION so older Core
versions transparently continue using TCP with token auth
- LANDINGPAGE is explicitly excluded (not a CalVer version)
- Hard-fails with a clear error if the socket file is unexpectedly
missing when Unix socket communication is expected
- WSClient.connect() for Unix socket (no auth) and
WSClient.connect_with_auth() for TCP (token auth) separate the
two connection modes cleanly
- Token refresh always uses the TCP websession since it is inherently
a TCP/Bearer-auth operation
- Logs which transport (Unix socket vs TCP) is being used on first
request
Closes#6626
Related Core PR: home-assistant/core#163907
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Close WebSocket on handshake failure and validate auth_required
Ensure the underlying WebSocket connection is closed before raising
when the handshake produces an unexpected message. Also validate that
the first TCP message is auth_required before sending credentials.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Fix pylint protected-access warnings in tests
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Check running container env before using Unix socket
Split use_unix_socket into two properties to handle the Supervisor
upgrade transition where Core is still running with a container
started by the old Supervisor (without SUPERVISOR_CORE_API_SOCKET):
- supports_unix_socket: version check only, used when creating the
Core container to decide whether to set the env var
- use_unix_socket: version check + running container env check, used
for communication decisions
This ensures TCP fallback during the upgrade transition while still
hard-failing if the socket is missing after Supervisor configured
Core to use it.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Improve Core API communication logging and error handling
- Remove transport log from make_request that logged before Core
container was attached, causing misleading connection logs
- Log "Connected to Core via ..." once on first successful API response
in get_api_state, when the transport is actually known
- Remove explicit socket existence check from session property, let
aiohttp UnixConnector produce natural connection errors during
Core startup (same as TCP connection refused)
- Add validation in get_core_state matching get_config pattern
- Restore make_request docstring
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Guard Core API requests with container running check
Add is_running() check to make_request and connect_websocket so no
HTTP or WebSocket connection is attempted when the Core container is
not running. This avoids misleading connection attempts during
Supervisor startup before Core is ready.
Also make use_unix_socket raise if container metadata is not available
instead of silently falling back to TCP. This is a defensive check
since is_running() guards should prevent reaching this state.
Add attached property to DockerInterface to expose whether container
metadata has been loaded.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Reset Core API connection state on container stop
Listen for Core container STOPPED/FAILED events to reset the
connection state: clear the _core_connected flag so the transport
is logged again on next successful connection, and close any stale
Unix socket session.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Only mount /run/supervisor if we use it
* Fix pytest errors
* Remove redundant is_running check from ingress panel update
The is_running() guard in update_hass_panel is now redundant since
make_request checks is_running() internally. Also mock is_running
in the websession test fixture since tests using it need make_request
to proceed past the container running check.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Bind mount /run/supervisor to Supervisor /run/os
Home Assistant OS (as well as the Supervised run scripts) bind mount
/run/supervisor to /run/os in Supervisor. Since we reuse this location
for the communication socket between Supervisor and Core, we need to
also bind mount /run/supervisor to Supervisor /run/os in CI.
* Wrap WebSocket handshake errors in HomeAssistantAPIError
Unexpected exceptions during the WebSocket handshake (KeyError,
ValueError, TypeError from malformed messages) are now wrapped in
HomeAssistantAPIError inside WSClient.connect/connect_with_auth.
This means callers only need to catch HomeAssistantAPIError.
Remove the now-unnecessary except (RuntimeError, ValueError,
TypeError) from proxy _websocket_client and add a proper error
message to the APIError per review feedback.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Narrow WebSocket handshake exception handling
Replace broad `except Exception` with specific exception types that
can actually occur during the WebSocket handshake: KeyError (missing
dict keys), ValueError (bad JSON), TypeError (non-text WS message),
aiohttp.ClientError (connection errors), and TimeoutError. This
avoids silently wrapping programming errors into HomeAssistantAPIError.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Remove unused create_mountpoint from MountBindOptions
The field was added but never used. The /run/supervisor host path
is guaranteed to exist since HAOS creates it for the Supervisor
container mount, so auto-creating the mountpoint is unnecessary.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Clear stale access token before raising on final retry
Move token clear before the attempt check in connect_websocket so
the stale token is always discarded, even when raising on the final
attempt. Without this, the next call would reuse the cached bad token
via _ensure_access_token's fast path, wasting a round-trip.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Add tests for Unix socket communication and Core API
Add tests for the new Unix socket communication path and improve
existing test coverage:
- Version-based supports_unix_socket and env-based use_unix_socket
- api_url/ws_url transport selection
- Connection lifecycle: connected log after restart, ignoring
unrelated container events
- get_api_state/check_api_state parameterized across versions,
responses, and error cases
- make_request is_running guard and TCP flow with real token fetch
- connect_websocket for both Unix and TCP (with token verification)
- WSClient.connect/connect_with_auth handshake success, errors,
cleanup on failure, and close with pending futures
Consolidate existing tests into parameterized form and drop synthetic
tests that covered very little.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Change addons to apps in all user-facing strings
* Fix grammar in errors
* Apply suggestions from code review
Co-authored-by: Jan Čermák <sairon@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Stefan Agner <stefan@agner.ch>
---------
Co-authored-by: Jan Čermák <sairon@users.noreply.github.com>
Co-authored-by: Stefan Agner <stefan@agner.ch>
* Remove CLI command hint from unknown error messages
Since #6303 introduced specific error messages for many cases,
the generic "check with 'ha supervisor logs'" hint in unknown
error messages is no longer as useful. Remove the CLI command
part while keeping the "Check supervisor logs for details" rider.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Use consistently "Supervisor logs" with capitalization
Co-authored-by: Jan Čermák <sairon@users.noreply.github.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Jan Čermák <sairon@users.noreply.github.com>
aiohttp's BasicAuth.decode() raises ValueError for any non-Basic auth
method (e.g. Bearer tokens). This propagated as an unhandled exception,
causing a 500 response instead of the expected 401 Unauthorized.
Catch the ValueError in _process_basic() and raise HTTPUnauthorized with
the WWW-Authenticate realm header so clients get a proper 401 response.
Fixes SUPERVISOR-BFG
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* Drop unsupported architectures and machines from Supervisor
Since #5620 Supervisor no longer updates the version information on
unsupported architectures and machines. This means users can no longer
update to newer version of Supervisor since that PR got released.
Furthermore since #6347 we also no longer build for these
architectures. With this, any code related to these architectures
becomes dead code and should be removed.
This commit removes all refrences to the deprecated architectures and
machines from Supervisor.
This affects the following architectures:
- armhf
- armv7
- i386
And the following machines:
- odroid-xu
- qemuarm
- qemux86
- raspberrypi
- raspberrypi2
- raspberrypi3
- raspberrypi4
- tinker
* Create issue if an app using a deprecated architecture is installed
This adds a check to the resolution system to detect if an app is
installed that uses a deprecated architecture. If so, it will show a
warning to the user and recommend them to uninstall the app.
* Formally deprecate machine add-on configs as well
Not only deprecate add-on configs for unsupported architectures, but
also for unsupported machines.
* For installed add-ons architecture must always exist
Fail hard in case of missing architecture, as this is a required field
for installed add-ons. This will prevent the Supervisor from running
with an unsupported configuration and causing further issues down the
line.
* Use verbose log output for plug-ins
All three plug-ins which support logging (dns, multicast and audio)
should use the verbose log format by default to make sure the log lines
are annotated with timestamp. Introduce a new flag default_verbose for
advanced logs.
* Use default_verbose for host logs as well
Use the new default_verbose flag for advanced logs, to make it more
explicit that we want timestamps for host logs as well.
The /os/info API endpoint has been using D-Bus property TimeUSec which got
cached between requests, so the time returned was not always the same as
current time on the host system at the time of the request. Since there's no
reason to use D-Bus API for the time, as Supervisor runs on the same machine
and time is global, simply format current datetime object with Python and
return it in the response.
Fixes#6581
* Handle missing Accept header in host logs
Avoid indexing request headers directly in the host advanced logs handler when Accept is absent, preventing KeyError crashes on valid requests without that header. Fixes SUPERVISOR-1939.
* Add pytest
* Ensure uuid of dismissed suggestion/issue matches an existing one
* Fix lint, test and feedback issues
* Adjust existing tests and remove new ones for not found errors
* fix device access issue usage
* Unify Core user listing with HomeAssistantUser model
Replace the ingress-specific IngressSessionDataUser with a general
HomeAssistantUser dataclass that models the Core config/auth/list WS
response. This deduplicates the WS call (previously in both auth.py
and module.py) into a single HomeAssistant.list_users() method.
- Add HomeAssistantUser dataclass with fields matching Core's user API
- Remove get_users() and its unnecessary 5-minute Job throttle
- Auth and ingress consumers both use HomeAssistant.list_users()
- Auth API endpoint uses typed attribute access instead of dict keys
- Migrate session serialization from legacy "displayname" to "name"
- Accept both keys in schema/deserialization for backwards compat
- Add test for loading persisted sessions with legacy displayname key
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Tighten list_users() to trust Core's auth/list contract
Core's config/auth/list WS command always returns a list, never None.
Replace the silent `if not raw: return []` (which also swallowed empty
lists) with an assert, remove the dead AuthListUsersNoneResponseError
exception class, and document the HomeAssistantWSError contract in the
docstring.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Remove | None from async_send_command return type
The WebSocket result is always set from data["result"] in _receive_json,
never explicitly to None. Remove the misleading | None from the return
type of both WSClient and HomeAssistantWebSocket async_send_command, and
drop the now-unnecessary assert in list_users.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Use HomeAssistantWSConnectionError in _ensure_connected
_ensure_connected and connect_with_auth raise on connection-level
failures, so use the more specific HomeAssistantWSConnectionError
instead of the broad HomeAssistantWSError. This allows callers to
distinguish connection errors from Core API errors (e.g. unsuccessful
WebSocket command responses). Also document that _ensure_connected can
propagate HomeAssistantAuthError from ensure_access_token.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Remove user list cache from _find_user_by_id
Drop the _list_of_users cache to avoid stale auth data in ingress
session creation. The method now fetches users fresh each time and
returns None on any API error instead of serving potentially outdated
cached results.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* Raise HomeAssistantWSError when Core WebSocket is unreachable
Previously, async_send_command silently returned None when Home Assistant
Core was not reachable, leading to misleading error messages downstream
(e.g. "returned invalid response of None instead of a list of users").
Refactor _can_send to _ensure_connected which now raises
HomeAssistantWSError on connection failures while still returning False
for silent-skip cases (shutdown, unsupported version). async_send_message
catches the exception to preserve fire-and-forget behavior.
Update callers that don't handle HomeAssistantWSError: _hardware_events
and addon auto-update in tasks.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Simplify HomeAssistantWebSocket command/message distinction
The WebSocket layer had a confusing split between "messages" (fire-and-forget)
and "commands" (request/response) that didn't reflect Home Assistant Core's
architecture where everything is just a WS command.
- Remove dead WSClient.async_send_message (never called)
- Rename async_send_message → _async_send_command (private, fire-and-forget)
- Rename send_message → send_command (sync wrapper)
- Simplify _ensure_connected: drop message param, always raise on failure
- Simplify async_send_command: always raise on connection errors
- Remove MIN_VERSION gating (minimum supported Core is now 2024.2+)
- Remove begin_backup/end_backup version guards for Core < 2022.1.0
- Add debug logging for silently ignored connection errors
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Wait for Core to come up before backup
This is crucial since the WebSocket command to Core now fails with the
new error handling if Core is not running yet.
* Wait for Core install job instead
* Use CLI to fetch jobs instead of Supervisor API
The Supervisor API needs authentication token, which we have not
available at this point in the workflow. Instead of fetching the token,
we can use the CLI, which is available in the container.
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Replace the dynamic `getattr(self.sys_websession, method)(...)` pattern
with the explicit `self.sys_websession.request(method, ...)` call. This
is type-safe and avoids runtime failures from typos in method names.
Also wrap the timeout parameter in `aiohttp.ClientTimeout` for
consistency with the typed `request()` signature.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* Fix MCP API proxy support for streaming and headers
This commit fixes two issues with using the core API core/api/mcp through
the API proxy:
1. **Streaming support**: The proxy now detects text/event-stream responses
and properly streams them instead of buffering all data. This is required
for MCP's Server-Sent Events (SSE) transport.
2. **Header forwarding**: Added MCP-required headers to the forwarded headers:
- Accept: Required for content negotiation
- Last-Event-ID: Required for resuming broken SSE connections
- Mcp-Session-Id: Required for session management across requests
The proxy now also preserves MCP-related response headers (Mcp-Session-Id)
and sets X-Accel-Buffering to "no" for streaming responses to prevent
buffering by intermediate proxies.
Tests added to verify:
- MCP headers are properly forwarded to Home Assistant
- Streaming responses (text/event-stream) are handled correctly
- Response headers are preserved
* Refactor: reuse stream logic for SSE responses (#3)
* Fix ruff format + cover streaming payload error
* Fix merge error
* Address review comments (headers / streaming proxy) (#4)
* Address review: header handling for streaming/non-streaming
* Forward MCP-Protocol-Version and Origin headers
* Do not forward Origin header through API proxy (#5)
---------
Co-authored-by: Stefan Agner <stefan@agner.ch>
The CpuArch enum was being used inconsistently throughout the codebase,
with some code expecting enum values and other code expecting strings.
This caused type checking issues and potential runtime errors.
Changes:
- Fix match_base() to return CpuArch enum instead of str
- Add explicit string conversions using !s formatting where arch values
are used in f-strings (build.py, model.py)
- Convert CpuArch to str explicitly in contexts requiring strings
(docker/addon.py, misc/filter.py)
- Update all tests to use CpuArch enum values instead of strings
- Update test mocks to return CpuArch enum values
This ensures type consistency and improves MyPy type checking accuracy
across the architecture detection and management code.
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
The Supervisor's /core/api proxy previously only supported GET and POST
methods, returning 405 Method Not Allowed for DELETE requests. This
prevented addons from calling Home Assistant Core REST API endpoints
that require DELETE methods, such as deleting automations, scripts,
or scenes.
The underlying proxy implementation already supported passing through
any HTTP method via request.method.lower(), so only the route
registration was needed.
Fixes#6509
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* Use count-based progress for Docker image pulls
Refactor Docker image pull progress to use a simpler count-based approach
where each layer contributes equally (100% / total_layers) regardless of
size. This replaces the previous size-weighted calculation that was
susceptible to progress regression.
The core issue was that Docker rate-limits concurrent downloads (~3 at a
time) and reports layer sizes only when downloading starts. With size-
weighted progress, large layers appearing late would cause progress to
drop dramatically (e.g., 59% -> 29%) as the total size increased.
The new approach:
- Each layer contributes equally to overall progress
- Per-layer progress: 70% download weight, 30% extraction weight
- Progress only starts after first "Downloading" event (when layer
count is known)
- Always caps at 99% - job completion handles final 100%
This simplifies the code by moving progress tracking to a dedicated
module (pull_progress.py) and removing complex size-based scaling logic
that tried to account for unknown layer sizes.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Exclude already-existing layers from pull progress calculation
Layers that already exist locally should not count towards download
progress since there's nothing to download for them. Only layers that
need pulling are included in the progress calculation.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Add registry manifest fetcher for size-based pull progress
Fetch image manifests directly from container registries before pulling
to get accurate layer sizes upfront. This enables size-weighted progress
tracking where each layer contributes proportionally to its byte size,
rather than equal weight per layer.
Key changes:
- Add RegistryManifestFetcher that handles auth discovery via
WWW-Authenticate headers, token fetching with optional credentials,
and multi-arch manifest list resolution
- Update ImagePullProgress to accept manifest layer sizes via
set_manifest() and calculate size-weighted progress
- Fall back to count-based progress when manifest fetch fails
- Pre-populate layer sizes from manifest when creating layer trackers
The manifest fetcher supports ghcr.io, Docker Hub, and private
registries by using credentials from Docker config when available.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Clamp progress to 100 to prevent floating point precision issues
Floating point arithmetic in weighted progress calculations can produce
values slightly above 100 (e.g., 100.00000000000001). This causes
validation errors when the progress value is checked.
Add min(100, ...) clamping to both size-weighted and count-based
progress calculations to ensure the result never exceeds 100.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Use sys_websession for manifest fetcher instead of creating new session
Reuse the existing CoreSys websession for registry manifest requests
instead of creating a new aiohttp session. This improves performance
and follows the established pattern used throughout the codebase.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Make platform parameter required and warn on missing platform
- Make platform a required parameter in get_manifest() and _fetch_manifest()
since it's always provided by the calling code
- Return None and log warning when requested platform is not found in
multi-arch manifest list, instead of falling back to first manifest
which could be the wrong architecture
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Log manifest fetch failures at warning level
Users will notice degraded progress tracking when manifest fetch fails,
so log at warning level to help diagnose issues.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Add pylint disable comments for protected access in manifest tests
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Separate download_current and total_size updates in pull progress
Update download_current and total_size independently in the DOWNLOADING
handler. This ensures download_current is updated even when total is
not yet available.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Reject invalid platform format in manifest selection
---------
Co-authored-by: Claude <noreply@anthropic.com>
* Check frontend availability after Home Assistant Core updates
Add verification that the frontend is actually accessible at "/" after core
updates to ensure the web interface is serving properly, not just that the
API endpoints respond.
Previously, the update verification only checked API endpoints and whether
the frontend component was loaded. This could miss cases where the API is
responsive but the frontend fails to serve the UI.
Changes:
- Add check_frontend_available() method to HomeAssistantAPI that fetches
the root path and verifies it returns HTML content
- Integrate frontend check into core update verification flow after
confirming the frontend component is loaded
- Trigger automatic rollback if frontend is inaccessible after update
- Fix blocking I/O calls in rollback log file handling to use async
executor
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Avoid checking frontend if config data is None
* Improve pytest tests
* Make sure Core returns a valid config
* Remove Core version check in frontend availability test
The call site already makes sure that an actual Home Assistant Core
instance is running before calling the frontend availability test.
So this is rather redundant. Simplify the code by removing the version
check and update tests accordingly.
* Add test coverage for get_config
---------
Co-authored-by: Claude <noreply@anthropic.com>
* Add route_metric attribute to IpProperties class
Signed-off-by: David Rapan <david@rapan.cz>
* Refactor dbus setting IP constants
Signed-off-by: David Rapan <david@rapan.cz>
* Add route metric
Signed-off-by: David Rapan <david@rapan.cz>
* Merge test_api_network_interface_info
Signed-off-by: David Rapan <david@rapan.cz>
* Add test case for route metric update
Signed-off-by: David Rapan <david@rapan.cz>
---------
Signed-off-by: David Rapan <david@rapan.cz>
* Migrate all docker container interactions to aiodocker
* Remove containers_legacy since its no longer used
* Add back remove color logic
* Revert accidental invert of conditional in setup_network
* Fix typos found by copilot
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Revert "Apply suggestions from code review"
This reverts commit 0a475433ea.
---------
Co-authored-by: Stefan Agner <stefan@agner.ch>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
OS Agent will no longer support migrating to the overlay2 driver due to reasons
explained in home-assistant/os-agent#245. Remove it from the Docker API as
well.
* Remove unknown errors from addons
* Remove customized unknown error types
* Fix docker ratelimit exception and tests
* Fix stats test and add more for known errors
* Add defined error for when build fails
* Fixes from feedback
* Fix mypy issues
* Fix test failure due to rename
* Change auth reset error message
* Fix progress when using containerd snapshotter
* Add test for tiny image download under containerd-snapshotter
* Fix API tests after progress allocation change
* Fix test for auth changes
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Stefan Agner <stefan@agner.ch>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Implement Supervisor API for home-assistant/os-agent#238, adding possibility to
schedule migration either to Containerd overlayfs driver, or migration to the
graph overlay2 driver, once the device is rebooted the next time. While it's
technically in the DBus OS interface, in Supervisor's abstraction it makes more
sense to put it under `/docker` endpoints.
Add support for `no_colors` query parameter on all advanced logs API endpoints,
allowing users to optionally strip ANSI color sequences from log output. This
complements the existing color stripping on /latest endpoints added in #6319.
* Strip ANSI escape color sequences from /latest log responses
Strip ANSI sequences of CSI commands [1] used for log coloring from
/latest log endpoints. These endpoint were primarily designed for log
downloads and colors are mostly not wanted in those. Add optional
argument for stripping the colors from the logs and enable it for the
/latest endpoints.
[1] https://en.wikipedia.org/wiki/ANSI_escape_code#CSIsection
* Refactor advanced logs' tests to use fixture factory
Introduce `advanced_logs_tester` fixture to simplify testing of advanced logs
in the API tests, declaring all the needed fixture in a single place. # Please
enter the commit message for your changes. Lines starting
* Migrate images from dockerpy to aiodocker
* Add missing coverage and fix bug in repair
* Bind libraries to different files and refactor images.pull
* Use the same socket again
Try using the same socket again.
* Fix pytest
---------
Co-authored-by: Stefan Agner <stefan@agner.ch>
* Avoid adding Content-Type to non-body responses
The current code sets the content-type header for all responses
to the result's content_type property if upstream does not set a
content_type. The default value for content_type is
"application/octet-stream".
For responses that do not have a body (like 204 No Content or
304 Not Modified), setting a content-type header is unnecessary and
potentially misleading. Follow HTTP standards by only adding the
content-type header to responses that actually contain a body.
* Add pytest for ingress proxy
* Preserve Content-Type header for HEAD requests in ingress API
* Formally deprecate CodeNotary build config
* Remove CodeNotary specific integrity checking
The current code is specific to how CodeNotary was doing integrity
checking. A future integrity checking mechanism likely will work
differently (e.g. through EROFS based containers). Remove the current
code to make way for a future implementation.
* Drop CodeNotary integrity fixups
* Drop unused tests
* Fix pytest
* Fix pytest
* Remove CodeNotary related exceptions and handling
Remove CodeNotary related exceptions and handling from the Docker
interface.
* Drop unnecessary comment
* Remove Codenotary specific IssueType/SuggestionType
* Drop Codenotary specific environment and secret reference
* Remove unused constants
* Introduce APIGone exception for removed APIs
Introduce a new exception class APIGone to indicate that certain API
features have been removed and are no longer available. Update the
security integrity check endpoint to raise this new exception instead
of a generic APIError, providing clearer communication to clients that
the feature has been intentionally removed.
* Drop content trust
A cosign based signature verification will likely be named differently
to avoid confusion with existing implementations. For now, remove the
content trust option entirely.
* Drop code sign test
* Remove source_mods/content_trust evaluations
* Remove content_trust reference in bootstrap.py
* Fix security tests
* Drop unused tests
* Drop codenotary from schema
Since we have "remove extra" in voluptuous, we can remove the
codenotary field from the addon schema.
* Remove content_trust from tests
* Remove content_trust unsupported reason
* Remove unnecessary comment
* Remove unrelated pytest
* Remove unrelated fixtures
* Add progress reporting to addon, HA and Supervisor updates
* Fix assert in test
* Add progress to addon, core, supervisor updates/installs
* Fix double install bug in addons install
* Remove initial_install and re-arrange order of load
* Fix range header to correctly fetch latest logs
Add a colon before line numbers to indicate that no cursor is used.
This makes the range header work when fetching latest logs from
systemd-journal-gatewayd.
* Fix pytest
* Add endpoint for complete logs of the latest container startup
Add endpoint that returns complete logs of the latest startup of
container, which can be used for downloading Core logs in the frontend.
Realtime filtering header is used for the Journal API and StartedAt
parameter from the Docker API is used as the reference point. This means
that any other Range header is ignored for this parameter, yet the
"lines" query argument can be used to limit the number of lines. By
default "infinite" number of lines is returned.
Closes#6147
* Implement fallback for latest logs for OS older than 16.0
Implement fallback which uses the internal CONTAINER_LOG_EPOCH metadata
added to logs created by the Docker logger. Still prefer the time-based
method, as it has lower overhead and using public APIs.
* Address review comments
* Only use CONTAINER_LOG_EPOCH for latest logs
As pointed out in the review comments, we might not be able to get the
StartedAt for add-ons that are not running. Thus we need to use the only
reliable mechanism available now, which is the container log epoch.
* Remove dead code for 'Range: realtime' header handling
* Store and persist OS upgrade map to fix update path evaluation
The existing logic calculated OS upgrade paths inline during fetch_data,
which will not get reevaluted when the current OS is unsupported
(JobCondition.OS_SUPPORTED). E.g. after updating from 11.4 to 11.5, the
system wouldn't offer the next available update (15.2) because the
upgrade path calculation relied on fresh data from the blocked fetch
operation.
Changes:
- Add ATTR_HASSOS_UPGRADE constant and schema validation
- Store hassos-upgrade map from version JSON in updater data
- Refactor version_hassos property to use stored upgrade map instead of
inline calculation during fetch_data
- Maintain upgrade path logic: upgrade within major version first, then
jump to next major version when at the latest in current major
- Add type safety checks for version.major access
This ensures upgrade paths work correctly even when update data refresh
is blocked due to unsupported OS versions, fixing the scenario where
HAOS 11.5 wouldn't show 15.2 as the next available update.
* Update supervisor/updater.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Address mypy issue
* Fix pytest
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Add availability API for addons
* Add cast back and test for latest version of installed addon
* Make error responses more translation/client library friendly
* Add test cases for install/update APIs
* Add background option to update/install APIs
* Refactor to use common background_task utility in backups too
* Use a validation_complete event rather then looking for bus events