1
0
mirror of https://github.com/home-assistant/supervisor.git synced 2026-05-26 17:45:15 +01:00
Commit Graph

251 Commits

Author SHA1 Message Date
Stefan Agner 7fb621234e Add Unix socket support for Core communication with feature flag (#6742)
* Use Unix socket for Supervisor to Core communication

Reintroduce Unix socket support for Supervisor-to-Core communication
(reverted in #6735) with the addition of a feature flag gate. The
feature is now controlled by the `core_unix_socket` feature flag and
disabled by default.

When enabled and Core version supports it, Supervisor communicates with
Core via a Unix socket at /run/os/core.sock instead of TCP. This
eliminates the need for access token authentication on the socket path,
as Core authenticates the peer by the socket connection itself.

Key changes:
- Add FeatureFlag.CORE_UNIX_SOCKET to gate the feature
- HomeAssistantAPI: transport-aware session/url/websocket management
- WSClient: separate connect() (Unix, no auth) and connect_with_auth()
  (TCP) class methods with proper error handling
- APIProxy delegates websocket setup to api.connect_websocket()
- Container state tracking for Unix session lifecycle
- CI builder mounts /run/supervisor for integration tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Sort feature flags alphabetically

* Drop per-call max_msg_size from WSClient

Hardcode the WebSocket message size cap to 64 MB in WSClient and remove
the parameter from WSClient.connect, connect_with_auth, _ws_connect,
and HomeAssistantAPI.connect_websocket. This was only ever overridden
by APIProxy, so threading it through four layers was unnecessary.

max_msg_size is a cap, not a pre-allocation; aiohttp only grows buffers
to the size of actual incoming messages. Supervisor's own control
channel never approaches 64 MB, so unifying the limit has no runtime
cost.

Addresses review feedback on #6742.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-21 15:03:05 +02:00
Mike Degatano 56abe94d74 Add versioned v2 API with apps terminology (#6741)
* Add versioned v2 API with apps terminology

Introduce a v2 API sub-app mounted at /v2 that uses 'apps' terminology
throughout, while keeping v1 fully backward-compatible.

Key changes:
- Add ATTR_ADDONS = 'addons' constant alongside ATTR_APPS = 'apps' so
  backup file data (which must remain 'addons' for backward compat) and
  v2 API responses can use distinct constants
- Add FeatureFlag.SUPERVISOR_V2_API to gate v2 route registration
- Mount aiohttp sub-app at /v2 in RestAPI.load() when flag is enabled
- Add _AppSecurityPatterns frozen dataclass and _V1_PATTERNS/_V2_PATTERNS
  with strict per-version regex sets (no cross-version matching)
- Add _register_v2_apps, _register_v2_backups, _register_v2_store route
  registration methods
- Add v1 thin wrapper methods (*_v1) for all affected endpoints so
  business logic lives in the canonical v2 methods
- Extract _info_data() helper in APIApps so v1 closure can bypass
  @api_process and still catch APIAppNotInstalled for store routing
- Add _rename_apps_to_addons_in_backups(), _process_location_in_body(),
  _all_store_apps_info() shared helpers to eliminate duplication
- Add api_client_v2, api_client_with_prefix, app_api_client_with_root,
  store_app_api_client_with_root parameterized test fixtures
- Add test_v2_api_disabled_without_feature_flag
- Parameterize backup, addons, and store tests to cover both v1 and v2
  paths

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix pylint false positive for re.Pattern C extension methods

re.Pattern methods (match, search, etc.) are C extension methods.
Pylint cannot detect them via static analysis when re.Pattern is used
as a type annotation in a dataclass field, producing false E1101
no-member errors. Add generated-members to inform pylint these members
exist.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* pylint and feedback fixes

* Copilot suggested fixes

* Minor feedback fixes

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-20 21:19:27 +02:00
Stefan Agner 38ddb3df54 Fix Core update rollback: delay image cleanup and fix missing rollback path (#6726)
* Delay old image cleanup until after health checks on Core update

Move the old Docker image cleanup from inside _update() to after the
post-update health checks (frontend loaded and accessible). This keeps
the previous version's image available locally when a rollback is
needed, avoiding a potentially slow re-download.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add test assertions for old image cleanup timing on Core update

Verify that the old Docker image is cleaned up only after health checks
pass, and not when a rollback is triggered.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix missing rollback when get_config fails after Core update

The early return after setting error_state skipped the rollback block,
leaving the system on a broken new version when the API stopped
responding after update. The other health check failure paths correctly
fall through to the rollback logic; this was the only one that didn't.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 10:57:13 +02:00
Stefan Agner a504d85745 Remove double newlines from build and check config output (#6743)
* Remove double newlines from build output

The log lines from run command already have newline characters, so
joining them with "\n" adds extra newlines. Joining them with an empty
string preserves the original formatting of the logs.

* Remove double newlines from check config output

The log lines from run command already have newline characters, so
joining them with "\n" adds extra newlines. Joining them with an empty
string preserves the original formatting of the logs.

* Fix pytest
2026-04-16 17:18:05 +02:00
Mike Degatano 1218326af3 Add development feature toggle system (#6719)
* Add experimental feature toggle system

Introduces an ExperimentalFeature enum and feature_flags config to allow
toggling experimental features via the supervisor options API. The first
feature flag is 'supervisor_v2_api' to gate the upcoming V2 API.

Absent keys in options request = no change (partial update, consistent
with existing options APIs). The info endpoint always returns all known
feature flags and their current state for discoverability.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* ExperimentalFeature -> FeatureFlag

* Use explicit value of StrEnum to be typesafe

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Minor comment improvement

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Stefan Agner <stefan@agner.ch>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-04-15 13:13:45 +02:00
Mike Degatano ba8c49935b Refactor internal addon references to app/apps (#6717)
* Rename addon→app in docstrings and comments

Updates all docstrings and inline comments across supervisor/ and
tests/ to use the new app/apps terminology. No runtime behaviour
is changed by this commit.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Rename addon→app in code (variables, args, class names, functions)

Renames all internal Python identifiers from addon/addons to app/apps:
- Variable and argument names
- Function and method names
- Class names (Addon→App, AddonManager→AppManager, DockerAddon→DockerApp,
  all exception, check, and fixup classes, etc.)
- String literals used as Python identifiers (pytest fixtures,
  parametrize param names, patch.object attribute strings,
  URL route match_info keys)

External API contracts are preserved: JSON keys, error codes,
discovery protocol fields, TypedDict/attr.s field names.
Import module paths (supervisor/addons/) are also unchanged.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix partial backup/restore API to remap addons key to apps

The external API accepts `addons` as the request body key (since
ATTR_APPS = "addons"), but do_backup_partial and do_restore_partial
now take an `apps` parameter after the rename. The **body expansion
in both endpoints would pass `addons=...` causing a TypeError.

Remap the key before expansion in both backup_partial and
restore_partial:

    if ATTR_APPS in body:
        body["apps"] = body.pop(ATTR_APPS)

Also adds test_restore_partial_with_addons_key to verify the restore
path correctly receives apps= when addons is passed in the request
body. This path had no existing test coverage.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix merge error

* Adjust AppLoggerAdapter to use app_name

Co-authored-by: Stefan Agner <stefan@agner.ch>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Stefan Agner <stefan@agner.ch>
2026-04-14 16:47:20 +02:00
Stefan Agner 5c5428fde3 Revert "Use Unix socket for Supervisor to Core communication (#6590)" (#6735)
This reverts commit 28fa0b35bd.
2026-04-14 12:28:02 +02:00
Stefan Agner 28fa0b35bd Use Unix socket for Supervisor to Core communication (#6590)
* Use Unix socket for Supervisor to Core communication

Switch internal Supervisor-to-Core HTTP and WebSocket communication
from TCP (port 8123) to a Unix domain socket.

The existing /run/supervisor directory on the host (already mounted
at /run/os inside the Supervisor container) is bind-mounted into the
Core container at /run/supervisor. Core receives the socket path via
the SUPERVISOR_CORE_API_SOCKET environment variable, creates the
socket there, and Supervisor connects to it via aiohttp.UnixConnector
at /run/os/core.sock.

Since the Unix socket is only reachable by processes on the same host,
requests arriving over it are implicitly trusted and authenticated as
the existing Supervisor system user. This removes the token round-trip
where Supervisor had to obtain and send Bearer tokens on every Core
API call. WebSocket connections are likewise authenticated implicitly,
skipping the auth_required/auth handshake.

Key design decisions:
- Version-gated by CORE_UNIX_SOCKET_MIN_VERSION so older Core
  versions transparently continue using TCP with token auth
- LANDINGPAGE is explicitly excluded (not a CalVer version)
- Hard-fails with a clear error if the socket file is unexpectedly
  missing when Unix socket communication is expected
- WSClient.connect() for Unix socket (no auth) and
  WSClient.connect_with_auth() for TCP (token auth) separate the
  two connection modes cleanly
- Token refresh always uses the TCP websession since it is inherently
  a TCP/Bearer-auth operation
- Logs which transport (Unix socket vs TCP) is being used on first
  request

Closes #6626
Related Core PR: home-assistant/core#163907

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Close WebSocket on handshake failure and validate auth_required

Ensure the underlying WebSocket connection is closed before raising
when the handshake produces an unexpected message. Also validate that
the first TCP message is auth_required before sending credentials.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix pylint protected-access warnings in tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Check running container env before using Unix socket

Split use_unix_socket into two properties to handle the Supervisor
upgrade transition where Core is still running with a container
started by the old Supervisor (without SUPERVISOR_CORE_API_SOCKET):

- supports_unix_socket: version check only, used when creating the
  Core container to decide whether to set the env var
- use_unix_socket: version check + running container env check, used
  for communication decisions

This ensures TCP fallback during the upgrade transition while still
hard-failing if the socket is missing after Supervisor configured
Core to use it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Improve Core API communication logging and error handling

- Remove transport log from make_request that logged before Core
  container was attached, causing misleading connection logs
- Log "Connected to Core via ..." once on first successful API response
  in get_api_state, when the transport is actually known
- Remove explicit socket existence check from session property, let
  aiohttp UnixConnector produce natural connection errors during
  Core startup (same as TCP connection refused)
- Add validation in get_core_state matching get_config pattern
- Restore make_request docstring

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Guard Core API requests with container running check

Add is_running() check to make_request and connect_websocket so no
HTTP or WebSocket connection is attempted when the Core container is
not running. This avoids misleading connection attempts during
Supervisor startup before Core is ready.

Also make use_unix_socket raise if container metadata is not available
instead of silently falling back to TCP. This is a defensive check
since is_running() guards should prevent reaching this state.

Add attached property to DockerInterface to expose whether container
metadata has been loaded.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Reset Core API connection state on container stop

Listen for Core container STOPPED/FAILED events to reset the
connection state: clear the _core_connected flag so the transport
is logged again on next successful connection, and close any stale
Unix socket session.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Only mount /run/supervisor if we use it

* Fix pytest errors

* Remove redundant is_running check from ingress panel update

The is_running() guard in update_hass_panel is now redundant since
make_request checks is_running() internally. Also mock is_running
in the websession test fixture since tests using it need make_request
to proceed past the container running check.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Bind mount /run/supervisor to Supervisor /run/os

Home Assistant OS (as well as the Supervised run scripts) bind mount
/run/supervisor to /run/os in Supervisor. Since we reuse this location
for the communication socket between Supervisor and Core, we need to
also bind mount /run/supervisor to Supervisor /run/os in CI.

* Wrap WebSocket handshake errors in HomeAssistantAPIError

Unexpected exceptions during the WebSocket handshake (KeyError,
ValueError, TypeError from malformed messages) are now wrapped in
HomeAssistantAPIError inside WSClient.connect/connect_with_auth.
This means callers only need to catch HomeAssistantAPIError.

Remove the now-unnecessary except (RuntimeError, ValueError,
TypeError) from proxy _websocket_client and add a proper error
message to the APIError per review feedback.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Narrow WebSocket handshake exception handling

Replace broad `except Exception` with specific exception types that
can actually occur during the WebSocket handshake: KeyError (missing
dict keys), ValueError (bad JSON), TypeError (non-text WS message),
aiohttp.ClientError (connection errors), and TimeoutError. This
avoids silently wrapping programming errors into HomeAssistantAPIError.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Remove unused create_mountpoint from MountBindOptions

The field was added but never used. The /run/supervisor host path
is guaranteed to exist since HAOS creates it for the Supervisor
container mount, so auto-creating the mountpoint is unnecessary.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Clear stale access token before raising on final retry

Move token clear before the attempt check in connect_websocket so
the stale token is always discarded, even when raising on the final
attempt. Without this, the next call would reuse the cached bad token
via _ensure_access_token's fast path, wasting a round-trip.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add tests for Unix socket communication and Core API

Add tests for the new Unix socket communication path and improve
existing test coverage:

- Version-based supports_unix_socket and env-based use_unix_socket
- api_url/ws_url transport selection
- Connection lifecycle: connected log after restart, ignoring
  unrelated container events
- get_api_state/check_api_state parameterized across versions,
  responses, and error cases
- make_request is_running guard and TCP flow with real token fetch
- connect_websocket for both Unix and TCP (with token verification)
- WSClient.connect/connect_with_auth handshake success, errors,
  cleanup on failure, and close with pending futures

Consolidate existing tests into parameterized form and drop synthetic
tests that covered very little.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 15:09:38 +02:00
Mike Degatano 941f7cd2be Change addons to apps in all user-facing strings (#6696)
* Change addons to apps in all user-facing strings

* Fix grammar in errors

* Apply suggestions from code review

Co-authored-by: Jan Čermák <sairon@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Stefan Agner <stefan@agner.ch>

---------

Co-authored-by: Jan Čermák <sairon@users.noreply.github.com>
Co-authored-by: Stefan Agner <stefan@agner.ch>
2026-04-07 18:54:40 +02:00
Stefan Agner 667bd62742 Remove CLI command hint from unknown error messages (#6684)
* Remove CLI command hint from unknown error messages

Since #6303 introduced specific error messages for many cases,
the generic "check with 'ha supervisor logs'" hint in unknown
error messages is no longer as useful. Remove the CLI command
part while keeping the "Check supervisor logs for details" rider.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Use consistently "Supervisor logs" with capitalization

Co-authored-by: Jan Čermák <sairon@users.noreply.github.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Jan Čermák <sairon@users.noreply.github.com>
2026-03-31 18:09:14 +02:00
Stefan Agner 9e0d3fe461 Return 401 for non-Basic Authorization headers on /auth endpoint (#6612)
aiohttp's BasicAuth.decode() raises ValueError for any non-Basic auth
method (e.g. Bearer tokens). This propagated as an unhandled exception,
causing a 500 response instead of the expected 401 Unauthorized.

Catch the ValueError in _process_basic() and raise HTTPUnauthorized with
the WWW-Authenticate realm header so clients get a proper 401 response.

Fixes SUPERVISOR-BFG

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-04 15:55:49 -05:00
Stefan Agner 0ef71d1dd1 Drop unsupported architectures and machines, create issue for affected apps (#6607)
* Drop unsupported architectures and machines from Supervisor

Since #5620 Supervisor no longer updates the version information on
unsupported architectures and machines. This means users can no longer
update to newer version of Supervisor since that PR got released.
Furthermore since #6347 we also no longer build for these
architectures. With this, any code related to these architectures
becomes dead code and should be removed.

This commit removes all refrences to the deprecated architectures and
machines from Supervisor.

This affects the following architectures:
- armhf
- armv7
- i386

And the following machines:
- odroid-xu
- qemuarm
- qemux86
- raspberrypi
- raspberrypi2
- raspberrypi3
- raspberrypi4
- tinker

* Create issue if an app using a deprecated architecture is installed

This adds a check to the resolution system to detect if an app is
installed that uses a deprecated architecture. If so, it will show a
warning to the user and recommend them to uninstall the app.

* Formally deprecate machine add-on configs as well

Not only deprecate add-on configs for unsupported architectures, but
also for unsupported machines.

* For installed add-ons architecture must always exist

Fail hard in case of missing architecture, as this is a required field
for installed add-ons. This will prevent the Supervisor from running
with an unsupported configuration and causing further issues down the
line.
2026-03-04 10:59:14 +01:00
Stefan Agner 2627d55873 Add default verbose timestamps for plugin logs (#6598)
* Use verbose log output for plug-ins

All three plug-ins which support logging (dns, multicast and audio)
should use the verbose log format by default to make sure the log lines
are annotated with timestamp. Introduce a new flag default_verbose for
advanced logs.

* Use default_verbose for host logs as well

Use the new default_verbose flag for advanced logs, to make it more
explicit that we want timestamps for host logs as well.
2026-03-03 11:58:11 +01:00
Jan Čermák 6a955527f3 Ensure dt_utc in /os/info always returns current time (#6602)
The /os/info API endpoint has been using D-Bus property TimeUSec which got
cached between requests, so the time returned was not always the same as
current time on the host system at the time of the request. Since there's no
reason to use D-Bus API for the time, as Supervisor runs on the same machine
and time is global, simply format current datetime object with Python and
return it in the response.

Fixes #6581
2026-02-27 17:59:11 +01:00
Stefan Agner 7f6327e94e Handle missing Accept header in host logs (#6594)
* Handle missing Accept header in host logs

Avoid indexing request headers directly in the host advanced logs handler when Accept is absent, preventing KeyError crashes on valid requests without that header. Fixes SUPERVISOR-1939.

* Add pytest
2026-02-26 11:30:08 +01:00
Mike Degatano 9f00b6e34f Ensure uuid of dismissed suggestion/issue matches an existing one (#6582)
* Ensure uuid of dismissed suggestion/issue matches an existing one

* Fix lint, test and feedback issues

* Adjust existing tests and remove new ones for not found errors

* fix device access issue usage
2026-02-25 10:26:44 +01:00
Stefan Agner 3147d080a2 Unify Core user handling with HomeAssistantUser model (#6558)
* Unify Core user listing with HomeAssistantUser model

Replace the ingress-specific IngressSessionDataUser with a general
HomeAssistantUser dataclass that models the Core config/auth/list WS
response. This deduplicates the WS call (previously in both auth.py
and module.py) into a single HomeAssistant.list_users() method.

- Add HomeAssistantUser dataclass with fields matching Core's user API
- Remove get_users() and its unnecessary 5-minute Job throttle
- Auth and ingress consumers both use HomeAssistant.list_users()
- Auth API endpoint uses typed attribute access instead of dict keys
- Migrate session serialization from legacy "displayname" to "name"
- Accept both keys in schema/deserialization for backwards compat
- Add test for loading persisted sessions with legacy displayname key

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Tighten list_users() to trust Core's auth/list contract

Core's config/auth/list WS command always returns a list, never None.
Replace the silent `if not raw: return []` (which also swallowed empty
lists) with an assert, remove the dead AuthListUsersNoneResponseError
exception class, and document the HomeAssistantWSError contract in the
docstring.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Remove | None from async_send_command return type

The WebSocket result is always set from data["result"] in _receive_json,
never explicitly to None. Remove the misleading | None from the return
type of both WSClient and HomeAssistantWebSocket async_send_command, and
drop the now-unnecessary assert in list_users.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Use HomeAssistantWSConnectionError in _ensure_connected

_ensure_connected and connect_with_auth raise on connection-level
failures, so use the more specific HomeAssistantWSConnectionError
instead of the broad HomeAssistantWSError. This allows callers to
distinguish connection errors from Core API errors (e.g. unsuccessful
WebSocket command responses). Also document that _ensure_connected can
propagate HomeAssistantAuthError from ensure_access_token.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Remove user list cache from _find_user_by_id

Drop the _list_of_users cache to avoid stale auth data in ingress
session creation. The method now fetches users fresh each time and
returns None on any API error instead of serving potentially outdated
cached results.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-17 18:31:08 +01:00
Stefan Agner da800b8889 Simplify HomeAssistantWebSocket and raise on connection errors (#6553)
* Raise HomeAssistantWSError when Core WebSocket is unreachable

Previously, async_send_command silently returned None when Home Assistant
Core was not reachable, leading to misleading error messages downstream
(e.g. "returned invalid response of None instead of a list of users").

Refactor _can_send to _ensure_connected which now raises
HomeAssistantWSError on connection failures while still returning False
for silent-skip cases (shutdown, unsupported version). async_send_message
catches the exception to preserve fire-and-forget behavior.

Update callers that don't handle HomeAssistantWSError: _hardware_events
and addon auto-update in tasks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Simplify HomeAssistantWebSocket command/message distinction

The WebSocket layer had a confusing split between "messages" (fire-and-forget)
and "commands" (request/response) that didn't reflect Home Assistant Core's
architecture where everything is just a WS command.

- Remove dead WSClient.async_send_message (never called)
- Rename async_send_message → _async_send_command (private, fire-and-forget)
- Rename send_message → send_command (sync wrapper)
- Simplify _ensure_connected: drop message param, always raise on failure
- Simplify async_send_command: always raise on connection errors
- Remove MIN_VERSION gating (minimum supported Core is now 2024.2+)
- Remove begin_backup/end_backup version guards for Core < 2022.1.0
- Add debug logging for silently ignored connection errors

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Wait for Core to come up before backup

This is crucial since the WebSocket command to Core now fails with the
new error handling if Core is not running yet.

* Wait for Core install job instead

* Use CLI to fetch jobs instead of Supervisor API

The Supervisor API needs authentication token, which we have not
available at this point in the workflow. Instead of fetching the token,
we can use the CLI, which is available in the container.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-12 09:20:23 +01:00
Stefan Agner 66228f976d Use session.request() instead of getattr dispatch in HomeAssistantAPI (#6541)
Replace the dynamic `getattr(self.sys_websession, method)(...)` pattern
with the explicit `self.sys_websession.request(method, ...)` call. This
is type-safe and avoids runtime failures from typos in method names.

Also wrap the timeout parameter in `aiohttp.ClientTimeout` for
consistency with the typed `request()` signature.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-10 09:43:55 +01:00
Tom Quist 4d8d44721d Fix MCP API proxy support for streaming and headers (#6461)
* Fix MCP API proxy support for streaming and headers

This commit fixes two issues with using the core API core/api/mcp through
the API proxy:

1. **Streaming support**: The proxy now detects text/event-stream responses
   and properly streams them instead of buffering all data. This is required
   for MCP's Server-Sent Events (SSE) transport.

2. **Header forwarding**: Added MCP-required headers to the forwarded headers:
   - Accept: Required for content negotiation
   - Last-Event-ID: Required for resuming broken SSE connections
   - Mcp-Session-Id: Required for session management across requests

The proxy now also preserves MCP-related response headers (Mcp-Session-Id)
and sets X-Accel-Buffering to "no" for streaming responses to prevent
buffering by intermediate proxies.

Tests added to verify:
- MCP headers are properly forwarded to Home Assistant
- Streaming responses (text/event-stream) are handled correctly
- Response headers are preserved

* Refactor: reuse stream logic for SSE responses (#3)

* Fix ruff format + cover streaming payload error

* Fix merge error

* Address review comments (headers / streaming proxy) (#4)

* Address review: header handling for streaming/non-streaming

* Forward MCP-Protocol-Version and Origin headers

* Do not forward Origin header through API proxy (#5)

---------

Co-authored-by: Stefan Agner <stefan@agner.ch>
2026-02-04 17:28:11 +01:00
Stefan Agner a849050369 Improve CpuArch type safety with explicit conversions (#6524)
The CpuArch enum was being used inconsistently throughout the codebase,
with some code expecting enum values and other code expecting strings.
This caused type checking issues and potential runtime errors.

Changes:
- Fix match_base() to return CpuArch enum instead of str
- Add explicit string conversions using !s formatting where arch values
  are used in f-strings (build.py, model.py)
- Convert CpuArch to str explicitly in contexts requiring strings
  (docker/addon.py, misc/filter.py)
- Update all tests to use CpuArch enum values instead of strings
- Update test mocks to return CpuArch enum values

This ensures type consistency and improves MyPy type checking accuracy
across the architecture detection and management code.

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-04 11:34:23 +01:00
Stefan Agner 7ad9a911e8 Add DELETE method support to /core/api proxy (#6521)
The Supervisor's /core/api proxy previously only supported GET and POST
methods, returning 405 Method Not Allowed for DELETE requests. This
prevented addons from calling Home Assistant Core REST API endpoints
that require DELETE methods, such as deleting automations, scripts,
or scenes.

The underlying proxy implementation already supported passing through
any HTTP method via request.method.lower(), so only the route
registration was needed.

Fixes #6509

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-03 11:51:59 +01:00
Stefan Agner 6957341c3e Refactor Docker pull progress with registry manifest fetcher (#6379)
* Use count-based progress for Docker image pulls

Refactor Docker image pull progress to use a simpler count-based approach
where each layer contributes equally (100% / total_layers) regardless of
size. This replaces the previous size-weighted calculation that was
susceptible to progress regression.

The core issue was that Docker rate-limits concurrent downloads (~3 at a
time) and reports layer sizes only when downloading starts. With size-
weighted progress, large layers appearing late would cause progress to
drop dramatically (e.g., 59% -> 29%) as the total size increased.

The new approach:
- Each layer contributes equally to overall progress
- Per-layer progress: 70% download weight, 30% extraction weight
- Progress only starts after first "Downloading" event (when layer
  count is known)
- Always caps at 99% - job completion handles final 100%

This simplifies the code by moving progress tracking to a dedicated
module (pull_progress.py) and removing complex size-based scaling logic
that tried to account for unknown layer sizes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Exclude already-existing layers from pull progress calculation

Layers that already exist locally should not count towards download
progress since there's nothing to download for them. Only layers that
need pulling are included in the progress calculation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Add registry manifest fetcher for size-based pull progress

Fetch image manifests directly from container registries before pulling
to get accurate layer sizes upfront. This enables size-weighted progress
tracking where each layer contributes proportionally to its byte size,
rather than equal weight per layer.

Key changes:
- Add RegistryManifestFetcher that handles auth discovery via
  WWW-Authenticate headers, token fetching with optional credentials,
  and multi-arch manifest list resolution
- Update ImagePullProgress to accept manifest layer sizes via
  set_manifest() and calculate size-weighted progress
- Fall back to count-based progress when manifest fetch fails
- Pre-populate layer sizes from manifest when creating layer trackers

The manifest fetcher supports ghcr.io, Docker Hub, and private
registries by using credentials from Docker config when available.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Clamp progress to 100 to prevent floating point precision issues

Floating point arithmetic in weighted progress calculations can produce
values slightly above 100 (e.g., 100.00000000000001). This causes
validation errors when the progress value is checked.

Add min(100, ...) clamping to both size-weighted and count-based
progress calculations to ensure the result never exceeds 100.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Use sys_websession for manifest fetcher instead of creating new session

Reuse the existing CoreSys websession for registry manifest requests
instead of creating a new aiohttp session. This improves performance
and follows the established pattern used throughout the codebase.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Make platform parameter required and warn on missing platform

- Make platform a required parameter in get_manifest() and _fetch_manifest()
  since it's always provided by the calling code
- Return None and log warning when requested platform is not found in
  multi-arch manifest list, instead of falling back to first manifest
  which could be the wrong architecture

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Log manifest fetch failures at warning level

Users will notice degraded progress tracking when manifest fetch fails,
so log at warning level to help diagnose issues.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Add pylint disable comments for protected access in manifest tests

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Separate download_current and total_size updates in pull progress

Update download_current and total_size independently in the DOWNLOADING
handler. This ensures download_current is updated even when total is
not yet available.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Reject invalid platform format in manifest selection

---------

Co-authored-by: Claude <noreply@anthropic.com>
2026-02-02 15:56:24 +01:00
dependabot[bot] 2a4890e2b0 Bump aiodocker from 0.24.0 to 0.25.0 (#6448)
* Bump aiodocker from 0.24.0 to 0.25.0

Bumps [aiodocker](https://github.com/aio-libs/aiodocker) from 0.24.0 to 0.25.0.
- [Release notes](https://github.com/aio-libs/aiodocker/releases)
- [Changelog](https://github.com/aio-libs/aiodocker/blob/main/CHANGES.rst)
- [Commits](https://github.com/aio-libs/aiodocker/compare/v0.24.0...v0.25.0)

---
updated-dependencies:
- dependency-name: aiodocker
  dependency-version: 0.25.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* Update to new timeout configuration

* Fix pytest failure

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Mike Degatano <michael.degatano@gmail.com>
Co-authored-by: Stefan Agner <stefan@agner.ch>
2026-01-30 09:39:06 +01:00
Stefan Agner a2db716a5f Check frontend availability after Home Assistant Core updates (#6311)
* Check frontend availability after Home Assistant Core updates

Add verification that the frontend is actually accessible at "/" after core
updates to ensure the web interface is serving properly, not just that the
API endpoints respond.

Previously, the update verification only checked API endpoints and whether
the frontend component was loaded. This could miss cases where the API is
responsive but the frontend fails to serve the UI.

Changes:
- Add check_frontend_available() method to HomeAssistantAPI that fetches
  the root path and verifies it returns HTML content
- Integrate frontend check into core update verification flow after
  confirming the frontend component is loaded
- Trigger automatic rollback if frontend is inaccessible after update
- Fix blocking I/O calls in rollback log file handling to use async
  executor

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Avoid checking frontend if config data is None

* Improve pytest tests

* Make sure Core returns a valid config

* Remove Core version check in frontend availability test

The call site already makes sure that an actual Home Assistant Core
instance is running before calling the frontend availability test.
So this is rather redundant. Simplify the code by removing the version
check and update tests accordingly.

* Add test coverage for get_config

---------

Co-authored-by: Claude <noreply@anthropic.com>
2026-01-29 09:06:45 +01:00
David Rapan 641b205ee7 Add configurable interface route metric (#6447)
* Add route_metric attribute to IpProperties class

Signed-off-by: David Rapan <david@rapan.cz>

* Refactor dbus setting IP constants

Signed-off-by: David Rapan <david@rapan.cz>

* Add route metric

Signed-off-by: David Rapan <david@rapan.cz>

* Merge test_api_network_interface_info

Signed-off-by: David Rapan <david@rapan.cz>

* Add test case for route metric update

Signed-off-by: David Rapan <david@rapan.cz>

---------

Signed-off-by: David Rapan <david@rapan.cz>
2026-01-28 13:08:36 +01:00
AlCalzone df8201ca33 Update get_docker_args() to return mounts not volumes (#6499)
* Update `get_docker_args()` to return `mounts` not `volumes`

* fix more mocks to return PurePaths
2026-01-27 15:00:33 -05:00
Mike Degatano 909a2dda2f Migrate (almost) all docker container interactions to aiodocker (#6489)
* Migrate all docker container interactions to aiodocker

* Remove containers_legacy since its no longer used

* Add back remove color logic

* Revert accidental invert of conditional in setup_network

* Fix typos found by copilot

* Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Revert "Apply suggestions from code review"

This reverts commit 0a475433ea.

---------

Co-authored-by: Stefan Agner <stefan@agner.ch>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-01-27 12:42:17 +01:00
Mike Degatano 1d1a8cdad3 Add API to force repository repair (#6439)
* Add API to force repository repair

* Fix inheritance for error

* Fix absolute import
2026-01-06 16:01:48 +01:00
Mike Degatano d23bc291d5 Migrate create container to aiodocker (#6415)
* Migrate create container to aiodocker

* Fix extra hosts transformation

* Env not Environment

* Fix tests

* Fixes from feedback

---------

Co-authored-by: Jan Čermák <sairon@users.noreply.github.com>
2025-12-15 09:57:30 +01:00
Jan Čermák cd4e7f2530 Remove the option to revert to overlay2 driver (#6399)
OS Agent will no longer support migrating to the overlay2 driver due to reasons
explained in home-assistant/os-agent#245. Remove it from the Docker API as
well.
2025-12-05 14:45:56 +01:00
Stefan Agner 5d02b09a0d Fix addon options reset to defaults (#6397)
Co-authored-by: Claude <noreply@anthropic.com>
2025-12-05 13:53:51 +01:00
Mike Degatano 81b7e54b18 Remove unknown errors from addons and auth (#6303)
* Remove unknown errors from addons

* Remove customized unknown error types

* Fix docker ratelimit exception and tests

* Fix stats test and add more for known errors

* Add defined error for when build fails

* Fixes from feedback

* Fix mypy issues

* Fix test failure due to rename

* Change auth reset error message
2025-12-03 18:11:51 +01:00
Stefan Agner fa490210cd Improve CpuArch type safety across codebase (#6372)
Co-authored-by: Claude <noreply@anthropic.com>
2025-12-01 19:56:05 +01:00
Mike Degatano 6302c7d394 Fix progress when using containerd snapshotter (#6357)
* Fix progress when using containerd snapshotter

* Add test for tiny image download under containerd-snapshotter

* Fix API tests after progress allocation change

* Fix test for auth changes

* Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Stefan Agner <stefan@agner.ch>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-11-27 16:26:22 +01:00
Jan Čermák f55fd891e9 Add API endpoint for migrating Docker storage driver (#6361)
Implement Supervisor API for home-assistant/os-agent#238, adding possibility to
schedule migration either to Containerd overlayfs driver, or migration to the
graph overlay2 driver, once the device is rebooted the next time. While it's
technically in the DBus OS interface, in Supervisor's abstraction it makes more
sense to put it under `/docker` endpoints.
2025-11-27 16:02:39 +01:00
Jan Čermák 5ed0c85168 Add optional no_colors query parameter to advanced logs endpoints (#6326)
Add support for `no_colors` query parameter on all advanced logs API endpoints,
allowing users to optionally strip ANSI color sequences from log output. This
complements the existing color stripping on /latest endpoints added in #6319.
2025-11-21 09:29:15 +01:00
Jan Čermák 0837e05cb2 Strip ANSI escape color sequences from /latest log responses (#6319)
* Strip ANSI escape color sequences from /latest log responses

Strip ANSI sequences of CSI commands [1] used for log coloring from
/latest log endpoints. These endpoint were primarily designed for log
downloads and colors are mostly not wanted in those. Add optional
argument for stripping the colors from the logs and enable it for the
/latest endpoints.

[1] https://en.wikipedia.org/wiki/ANSI_escape_code#CSIsection

* Refactor advanced logs' tests to use fixture factory

Introduce `advanced_logs_tester` fixture to simplify testing of advanced logs
in the API tests, declaring all the needed fixture in a single place. # Please
enter the commit message for your changes. Lines starting
2025-11-19 09:39:24 +01:00
Mike Degatano 30cc172199 Migrate images from dockerpy to aiodocker (#6252)
* Migrate images from dockerpy to aiodocker

* Add missing coverage and fix bug in repair

* Bind libraries to different files and refactor images.pull

* Use the same socket again

Try using the same socket again.

* Fix pytest

---------

Co-authored-by: Stefan Agner <stefan@agner.ch>
2025-11-12 20:54:06 +01:00
Stefan Agner 91a9cb98c3 Avoid adding Content-Type to non-body responses (#6266)
* Avoid adding Content-Type to non-body responses

The current code sets the content-type header for all responses
to the result's content_type property if upstream does not set a
content_type. The default value for content_type is
"application/octet-stream".

For responses that do not have a body (like 204 No Content or
304 Not Modified), setting a content-type header is unnecessary and
potentially misleading. Follow HTTP standards by only adding the
content-type header to responses that actually contain a body.

* Add pytest for ingress proxy

* Preserve Content-Type header for HEAD requests in ingress API
2025-11-10 17:39:10 +01:00
Stefan Agner 1448a33dbf Remove Codenotary integrity check (#6236)
* Formally deprecate CodeNotary build config

* Remove CodeNotary specific integrity checking

The current code is specific to how CodeNotary was doing integrity
checking. A future integrity checking mechanism likely will work
differently (e.g. through EROFS based containers). Remove the current
code to make way for a future implementation.

* Drop CodeNotary integrity fixups

* Drop unused tests

* Fix pytest

* Fix pytest

* Remove CodeNotary related exceptions and handling

Remove CodeNotary related exceptions and handling from the Docker
interface.

* Drop unnecessary comment

* Remove Codenotary specific IssueType/SuggestionType

* Drop Codenotary specific environment and secret reference

* Remove unused constants

* Introduce APIGone exception for removed APIs

Introduce a new exception class APIGone to indicate that certain API
features have been removed and are no longer available. Update the
security integrity check endpoint to raise this new exception instead
of a generic APIError, providing clearer communication to clients that
the feature has been intentionally removed.

* Drop content trust

A cosign based signature verification will likely be named differently
to avoid confusion with existing implementations. For now, remove the
content trust option entirely.

* Drop code sign test

* Remove source_mods/content_trust evaluations

* Remove content_trust reference in bootstrap.py

* Fix security tests

* Drop unused tests

* Drop codenotary from schema

Since we have "remove extra" in voluptuous, we can remove the
codenotary field from the addon schema.

* Remove content_trust from tests

* Remove content_trust unsupported reason

* Remove unnecessary comment

* Remove unrelated pytest

* Remove unrelated fixtures
2025-11-03 20:13:15 +01:00
Mike Degatano 190b734332 Add progress reporting to addon, HA and Supervisor updates (#6195)
* Add progress reporting to addon, HA and Supervisor updates

* Fix assert in test

* Add progress to addon, core, supervisor updates/installs

* Fix double install bug in addons install

* Remove initial_install and re-arrange order of load
2025-10-07 16:54:11 +02:00
Mike Degatano 64f94a159c Add progress syncing from child jobs (#6207)
* Add progress syncing from child jobs

* Fix pylint issue

* Set initial progress from parent and end at 100
2025-09-30 14:52:16 -04:00
Mike Degatano 42f93d0176 Remove message_template field from errors (#6205) 2025-09-23 17:07:38 +02:00
Stefan Agner ed7155604c Fix range header to correctly fetch latest logs (#6202)
* Fix range header to correctly fetch latest logs

Add a colon before line numbers to indicate that no cursor is used.
This makes the range header work when fetching latest logs from
systemd-journal-gatewayd.

* Fix pytest
2025-09-23 16:43:20 +02:00
Jan Čermák 2e22e1e884 Add endpoint for complete logs of the latest container startup (#6163)
* Add endpoint for complete logs of the latest container startup

Add endpoint that returns complete logs of the latest startup of
container, which can be used for downloading Core logs in the frontend.

Realtime filtering header is used for the Journal API and StartedAt
parameter from the Docker API is used as the reference point. This means
that any other Range header is ignored for this parameter, yet the
"lines" query argument can be used to limit the number of lines. By
default "infinite" number of lines is returned.

Closes #6147

* Implement fallback for latest logs for OS older than 16.0

Implement fallback which uses the internal CONTAINER_LOG_EPOCH metadata
added to logs created by the Docker logger. Still prefer the time-based
method, as it has lower overhead and using public APIs.

* Address review comments

* Only use CONTAINER_LOG_EPOCH for latest logs

As pointed out in the review comments, we might not be able to get the
StartedAt for add-ons that are not running. Thus we need to use the only
reliable mechanism available now, which is the container log epoch.

* Remove dead code for 'Range: realtime' header handling
2025-09-16 11:29:28 +02:00
Stefan Agner c277f3cad6 Store and persist OS upgrade map to fix update path evaluation (#6152)
* Store and persist OS upgrade map to fix update path evaluation

The existing logic calculated OS upgrade paths inline during fetch_data,
which will not get reevaluted when the current OS is unsupported
(JobCondition.OS_SUPPORTED). E.g. after updating from 11.4 to 11.5, the
system wouldn't offer the next available update (15.2) because the
upgrade path calculation relied on fresh data from the blocked fetch
operation.

Changes:
- Add ATTR_HASSOS_UPGRADE constant and schema validation
- Store hassos-upgrade map from version JSON in updater data
- Refactor version_hassos property to use stored upgrade map instead of
  inline calculation during fetch_data
- Maintain upgrade path logic: upgrade within major version first, then
  jump to next major version when at the latest in current major
- Add type safety checks for version.major access

This ensures upgrade paths work correctly even when update data refresh
is blocked due to unsupported OS versions, fixing the scenario where
HAOS 11.5 wouldn't show 15.2 as the next available update.

* Update supervisor/updater.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Address mypy issue

* Fix pytest

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-09-04 13:19:31 +02:00
Igor Yamolov 236c39cbb0 Add network interface settings for mDNS/LLMNR (#5520) 2025-09-04 13:18:11 +02:00
Mike Degatano 7ed83a15fe Add availability API for addons (#6140)
* Add availability API for addons

* Add cast back and test for latest version of installed addon

* Make error responses more translation/client library friendly

* Add test cases for install/update APIs
2025-09-04 11:14:42 +02:00
Mike Degatano 9392d10625 Add background option to update/install APIs (#6134)
* Add background option to update/install APIs

* Refactor to use common background_task utility in backups too

* Use a validation_complete event rather then looking for bus events
2025-09-03 08:33:00 +02:00