supervisor

mirror of https://github.com/home-assistant/supervisor.git synced 2026-05-18 21:58:52 +01:00

Author	SHA1	Message	Date
dependabot[bot]	39f8a3d116	Bump urllib3 from 2.6.3 to 2.7.0 (#6822 ) Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.6.3 to 2.7.0. - [Release notes](https://github.com/urllib3/urllib3/releases) - [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst) - [Commits](https://github.com/urllib3/urllib3/compare/2.6.3...2.7.0) --- updated-dependencies: - dependency-name: urllib3 dependency-version: 2.7.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-05-08 10:49:07 +02:00
dependabot[bot]	5141178e7c	Bump types-pyyaml from 6.0.12.20260408 to 6.0.12.20260508 (#6823 ) Bumps [types-pyyaml](https://github.com/python/typeshed) from 6.0.12.20260408 to 6.0.12.20260508. - [Commits](https://github.com/python/typeshed/commits) --- updated-dependencies: - dependency-name: types-pyyaml dependency-version: 6.0.12.20260508 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-05-08 09:57:42 +02:00
Stefan Agner	67258dea4a	Skip post-update health check when Core was not running on entry (#6821 ) PR #6726 removed the early return after a HomeAssistantError from the post-update get_config() call so that a Core that stopped responding after an update would correctly trigger a rollback. That early return was, however, also load-bearing for the backup restore flow: Backup.restore_homeassistant() stops and removes Core before invoking core.update(target_version) and starts Core later in its own await_home_assistant_restart stage. With Core not running, _update() correctly skips the start step, but the unconditional post-update get_config() now always raises, sets error_state, and triggers a spurious rollback that re-pulls the previous image and leaves the system on the wrong version after the restore completes. Return early from update() when Core was not running on entry. The caller is responsible for starting Core and there is no live API to health-check at this point. Genuine update failures (Core was running, update broke it) are unaffected and still roll back. Also rename the local variable rollback to rollback_version for clarity. (tag: 2026.05.0)	2026-05-07 11:27:28 +02:00
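The following is a minimal sketch of the control flow described above, assuming a toy FakeCore class. It is not Supervisor's real HomeAssistantCore, and the pull/start/rollback steps are reduced to event strings. The key point is that the "was Core running?" state is captured before the update and is then used to skip the health check.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class FakeCore:
    """Hypothetical stand-in for the Core manager; not Supervisor's real class."""

    running: bool
    healthy_after_update: bool = True
    events: list[str] = field(default_factory=list)

    async def _update(self, version: str) -> None:
        self.events.append(f"pull {version}")
        # Only (re)start Core if it was running before the update.
        if self.running:
            self.events.append("start")

    async def get_config(self) -> dict:
        # Simulates the post-update API call used as a health check.
        if not self.running or not self.healthy_after_update:
            raise RuntimeError("Core API not reachable")
        return {"components": ["http"]}

    async def update(self, version: str, rollback_version: str) -> None:
        was_running = self.running  # captured on entry, before anything changes
        await self._update(version)
        if not was_running:
            # The caller (e.g. backup restore) starts Core later; there is no API to probe.
            self.events.append("skip health check")
            return
        try:
            await self.get_config()
        except RuntimeError:
            # Core was running and the update broke it: roll back.
            self.events.append(f"rollback {rollback_version}")
```

With running=False the update returns before get_config(), so no spurious rollback is recorded. A Core that was running and is unhealthy afterwards still rolls back.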
Stefan Agner	44e0e5ee28	Enable Unix socket Core API by default on Core 2026.5.1+ (#6815 ) The UNIX_SOCKET_CORE_API feature flag has been the only way to opt into Unix socket communication between Supervisor and Home Assistant Core. Now that the implementation has settled, enable it by default for Core versions at or above 2026.5.1. Versions in the supported range below that (down to CORE_UNIX_SOCKET_MIN_VERSION) continue to require the feature flag, preserving the existing opt-in behavior for early dev builds. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 09:18:51 +02:00
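The version gate described above can be sketched as a three-way decision. This is illustrative only. The 2026.5.1 threshold comes from the commit message. The value given to CORE_UNIX_SOCKET_MIN_VERSION is a placeholder, and CORE_UNIX_SOCKET_DEFAULT_VERSION, parse_version and use_unix_socket are hypothetical names. Real Core versions can carry dev/beta suffixes, which this simple parser does not handle.

```python
# Assumed threshold values for illustration: 2026.5.1 comes from the commit
# message, but this CORE_UNIX_SOCKET_MIN_VERSION value is a placeholder.
CORE_UNIX_SOCKET_MIN_VERSION = (2026, 4, 0)
# Hypothetical name for the "enabled by default from here on" threshold.
CORE_UNIX_SOCKET_DEFAULT_VERSION = (2026, 5, 1)


def parse_version(version: str) -> tuple[int, ...]:
    """Parse a plain 'YYYY.M.P' version string; dev/beta suffixes are not handled."""
    return tuple(int(part) for part in version.split("."))


def use_unix_socket(core_version: str, feature_flag_enabled: bool) -> bool:
    """Decide whether to talk to Core over the Unix socket."""
    version = parse_version(core_version)
    if version < CORE_UNIX_SOCKET_MIN_VERSION:
        return False  # Core too old to offer the socket API at all
    if version >= CORE_UNIX_SOCKET_DEFAULT_VERSION:
        return True  # enabled by default, flag not needed
    return feature_flag_enabled  # supported range below the default: opt-in only
```

Tuple comparison keeps the ordering logic trivial. For example, use_unix_socket("2026.5.0", False) stays opt-in, while "2026.5.1" is on by default.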
dependabot[bot]	8666c4be77	Bump orjson from 3.11.8 to 3.11.9 (#6818 ) Bumps [orjson](https://github.com/ijl/orjson) from 3.11.8 to 3.11.9. - [Release notes](https://github.com/ijl/orjson/releases) - [Changelog](https://github.com/ijl/orjson/blob/master/CHANGELOG.md) - [Commits](https://github.com/ijl/orjson/compare/3.11.8...3.11.9) --- updated-dependencies: - dependency-name: orjson dependency-version: 3.11.9 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-05-07 09:14:49 +02:00
dependabot[bot]	8124f09f33	Bump sigstore/cosign-installer from 4.1.1 to 4.1.2 (#6817 ) Bumps [sigstore/cosign-installer](https://github.com/sigstore/cosign-installer) from 4.1.1 to 4.1.2. - [Release notes](https://github.com/sigstore/cosign-installer/releases) - [Commits](https://github.com/sigstore/cosign-installer/compare/cad07c2e89fa2edd6e2d7bab4c1aa38e53f76003...6f9f17788090df1f26f669e9d70d6ae9567deba6) --- updated-dependencies: - dependency-name: sigstore/cosign-installer dependency-version: 4.1.2 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-05-07 09:14:06 +02:00
Jan Čermák	d815c0922f	Flatten published Docker image (#6812 ) This flattens the Docker image from 9 layers to 5 by using a multi-stage build that squashes layers into logical blocks. The first layer on top of the base image adds system-wide packages and uv (which is not updated very often; if it were, it might be wise to move it into the next layer or a separate one, as it weighs roughly 50 MB), which should be preserved between releases, while the next layer adds all Supervisor Python code and dependencies. This means that unless the base image or the packages installed in the first stage change (in other words, when only Supervisor code changes), only a single layer is pulled from the repository. Previously, an update generally resulted in pulling all 4 of the following layers, as a change in the requirements alone invalidated every layer after it. The fetched payload size remains roughly the same.	2026-05-06 12:04:32 +02:00
Stefan Agner	c772a9bbb0	Replace fixed-duration sleeps after bus events with gather (#6803 ) * Replace fixed-duration sleeps after bus events with gather Several tests use ``await asyncio.sleep(...)`` to "wait for the listener to run" after firing a bus event. The fixed duration is real wall-clock time and the wait can be nondeterministic — if the handler chain happens to need slightly more time on a busy CI runner, the assertion races the handler. ``Bus.fire_event`` returns the listener tasks since #6252; capture and ``await asyncio.gather(*tasks)`` instead of sleeping. Touches test_bus.py (the bus tests were poking scheduling instead of verifying their assertions), test_home_assistant_watchdog.py, test_plugin_base.py, addons/test_manager.py, docker/test_addon.py, and test_store_execute_reload.py. Other cleanups in the same spirit: - ``_fire_test_event`` in addons/test_addon.py becomes ``async def`` and gathers the listener tasks itself, so its 17 call sites collapse to a single ``await _fire_test_event(...)``. - The two test_store_execute_reload.py sites that used the private ``_update_connectivity()`` helper are reworked to set the cached connectivity flag directly and fire the event themselves so they can gather the listener tasks the same way. - The two ``sleep(1)`` post-pull drains in docker/test_interface.py collapse to ``sleep(0)`` (handler tasks are already gathered inside pull_image), saving ~2s. - The ``sleep(0.01)`` waits inside ``container_events()`` task bodies (api/test_addons.py, api/test_store.py, backups/test_manager.py) are just one-yield-to-the-parent and become ``sleep(0)``. Switching to ``gather`` exposes a few latent test mocks that were silently swallowing TypeErrors as background-task failures before: - ``CGroup.add_devices_allowed`` is ``async def`` but was patched as a plain MagicMock in docker/test_addon.py — now patched via ``new_callable=AsyncMock``. 
- The watchdog does ``await (await self.start())`` / ``await (await self.restart())`` because ``App.start`` / ``App.restart`` return ``asyncio.Task``. The mocks in addons/test_addon.py (test_app_watchdog, test_watchdog_on_stop, test_watchdog_during_attach) needed ``AsyncMock(return_value=<settled future>)`` to mirror that shape rather than a plain MagicMock. Factor bus.fire_event + gather pattern into a helper Per review feedback, the ``await asyncio.gather(*coresys.bus.fire_event(...))`` incantation was scattered across many call sites. Add ``tests.common.fire_bus_event`` that takes the coresys, event and data, fires the event and awaits the spawned listener tasks. Convert all matching sites to use it, including the ``_fire_test_event`` wrapper in addons/test_addon.py which now just builds the ``DockerContainerStateEvent`` and delegates.	2026-05-06 12:02:28 +02:00
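The fire-and-gather pattern behind tests.common.fire_bus_event can be sketched with a toy bus. ToyBus and register_event are hypothetical stand-ins, and the real helper takes coresys rather than a bus object. The only assumption carried over from the commit is that fire_event returns the spawned listener tasks, so a test can await them instead of sleeping.

```python
import asyncio
from collections import defaultdict
from collections.abc import Awaitable, Callable
from typing import Any

Listener = Callable[[Any], Awaitable[None]]


class ToyBus:
    """Minimal event bus whose fire_event returns the spawned listener tasks."""

    def __init__(self) -> None:
        self._listeners: dict[str, list[Listener]] = defaultdict(list)

    def register_event(self, event: str, listener: Listener) -> None:
        self._listeners[event].append(listener)

    def fire_event(self, event: str, data: Any) -> list[asyncio.Task]:
        # One task per listener; the caller decides whether to wait for them.
        return [asyncio.create_task(listener(data)) for listener in self._listeners[event]]


async def fire_bus_event(bus: ToyBus, event: str, data: Any) -> None:
    """Fire an event and wait for every listener, instead of a fixed sleep."""
    await asyncio.gather(*bus.fire_event(event, data))
```

Because the helper waits on the actual listener tasks, a listener that is slow on a busy CI runner just takes longer. It never races the assertion, and any exception raised by a listener surfaces in the test instead of being lost as a background-task failure.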
Stefan Agner	ad1a9115d8	Improve and extend frontend probe after update with WebSocket check (#6811 ) * Improve and extend frontend probe after update with WebSocket check The post-update health check introduced in #6311 added HomeAssistantAPI.check_frontend_available, which fetched the frontend through the existing Supervisor-internal API connection to Core. Since #6742 that connection optionally runs over a Unix socket with no authentication, so the request no longer exercises the same transport, auth and routing path that an external HTTP client uses. Move the frontend probe out of HomeAssistantAPI into a small frontend_check module that talks to Core's TCP endpoints via the plain websession with no authentication, mirroring what an external client would see. While doing this, extend the post-update verification to also probe the WebSocket endpoint: open /api/websocket and confirm the first frame is the auth_required text message. This catches the kind of WebSocket breakage seen in #6802, where api/config still listed websocket_api as loaded and GET / still returned HTML, but the WebSocket handshake completed with an immediate close frame and the frontend was unusable. The component check now also requires "http" to be loaded, in addition to "frontend" and "websocket_api", and iterates so every missing component is logged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Address review feedback on WebSocket probe - Wrap ws_connect in asyncio.wait_for so the handshake has an explicit bounded timeout (the global websession's default timeout would otherwise apply). - Validate that the auth_required payload is a JSON object before calling .get("type"); a list/string would otherwise raise AttributeError at runtime. - Add a regression test covering a non-dict JSON payload. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 10:54:05 +02:00
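The validation of the first WebSocket frame can be sketched as a pure function. The function name and the simplified frame_type string are assumptions: a real probe would obtain the frame from aiohttp's ws.receive() after a ws_connect wrapped in asyncio.wait_for. The logic mirrors the checks listed above. The frame must be text, valid JSON, a JSON object, and have type "auth_required".

```python
import json


def is_auth_required_frame(frame_type: str, payload: str) -> bool:
    """Return True if the first WebSocket frame looks like Core's auth_required greeting.

    frame_type is a simplified stand-in for the aiohttp message type ("text",
    "close", ...); a real probe would read it from ws.receive().
    """
    if frame_type != "text":
        return False  # e.g. an immediate close frame, as seen in #6802
    try:
        message = json.loads(payload)
    except ValueError:  # json.JSONDecodeError is a ValueError subclass
        return False
    # json.loads may return a list or a string; only a dict has .get().
    if not isinstance(message, dict):
        return False
    return message.get("type") == "auth_required"
```

The isinstance guard is what the regression test for a non-dict payload exercises. Without it, a payload such as ["auth_required"] would raise AttributeError instead of being reported as a failed probe.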
dependabot[bot]	179c7f0c48	Bump gitpython from 3.1.49 to 3.1.50 (#6813 ) Bumps [gitpython](https://github.com/gitpython-developers/GitPython) from 3.1.49 to 3.1.50. - [Release notes](https://github.com/gitpython-developers/GitPython/releases) - [Changelog](https://github.com/gitpython-developers/GitPython/blob/main/CHANGES) - [Commits](https://github.com/gitpython-developers/GitPython/compare/3.1.49...3.1.50) --- updated-dependencies: - dependency-name: gitpython dependency-version: 3.1.50 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-05-06 10:13:57 +02:00
Stefan Agner	b871e1ca61	Lower severity of WebSocket delivery failure messages to debug (#6805 ) The fire-and-forget _async_send_command path was raised from DEBUG to WARNING in #6725 for better visibility. In practice it's noisy during normal Core lifecycle events (restart, update): Supervisor fires supervisor_job_start/supervisor_job_end events towards Core while the container is intentionally not running, and each event logs a warning. The DEBUG line from the API layer just above ("Core container is not running") already explains the cause, so the WARNING just restates it. Synchronous async_send_command callers still see raised exceptions, so genuine failures that callers care about are not hidden. Restores the original DEBUG level introduced together with the raise-on-failure behavior in #6553. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 17:08:14 +02:00
Jan Čermák	2920194f16	Update Python to 3.14.4/base image to 3.14-alpine3.22-2026.04.0 (#6810 ) Update to the latest base image: * https://github.com/home-assistant/docker-base/releases/tag/2026.04.0 This also brings Python to 3.14.4, so update it in CI.	2026-05-05 17:05:58 +02:00
Mike Degatano	eb3c388618	Migrate persisted 'addon' field to 'app' in config files (#6786 ) * Migrate persisted 'addon' field to 'app' in discovery and services config Rename the 'addon' key to 'app' in persisted configuration files for discovery messages (discovery.json), service modules (services.json), and supervisor config (supervisor.json), as part of the broader addon->app terminology migration. Changes: - Add ATTR_ADDON = "addon" to const.py for V1 API compat/migration - Add ATTR_ADDONS_CUSTOM_LIST = "addons_custom_list" to const.py for migration - Change ATTR_APPS_CUSTOM_LIST value from "addons_custom_list" to "apps_custom_list" - Add _migrate_supervisor_config() schema pre-processor in validate.py to transparently load old supervisor.json files using the old key - Add ATTR_ADDON to services/const.py; change ATTR_APP value to "app" - Add _migrate_addon_to_app() pre-processors to MQTT, MySQL, and discovery schemas to load old config files that used the "addon" key - Rename Message.addon -> Message.app in Discovery and update all references - Keep hassio_push/discovery payload using "addon" key for HA compatibility - GET /services/{service} and GET /discovery: V1 returns "addon" key, V2 returns "app" key, via dedicated _v1 handler methods following the backups/store pattern, registered with AppVersion guards in _register_services() and _register_discovery() - Broaden FileConfiguration schema type annotation to accept vol.All validators in addition to vol.Schema Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add schema migration tests for addon->app config key rename Test that backwards-compatible migration of old 'addon'/'addons_custom_list' keys to 'app'/'apps_custom_list' works correctly in all affected schemas, and that the new keys are accepted without modification. 
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add an __init__ to discovery tests * Add app_api_client_with_prefix fixture and update V1/V2 tests Move the app-level V1/V2 fixture to tests/api/conftest.py as app_api_client_with_prefix for use across any endpoint that requires app-level credentials (services_role, app.discovery, etc.). - Add app_api_client_with_prefix fixture to conftest.py - Update test_set_service_already_provided and test_del_service_not_provided to use app_api_client_with_prefix (covers both v1 and v2) - Add test_get_service_v1_v2_keys asserting addon/app key per version - Update test_api_discovery_forbidden, test_api_send_del_discovery, test_api_invalid_discovery to use app_api_client_with_prefix - Split test_discovery_not_found into test_discovery_not_found_get (uses api_client_with_prefix, GET requires homeassistant) and test_discovery_not_found_delete (uses app_api_client_with_prefix) - Add test_get_discovery_v1_v2_keys asserting addon/app key per version Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-05 11:18:47 +02:00
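A schema pre-processor like _migrate_addon_to_app() can be sketched as a plain function that rewrites the legacy key before validation. In voluptuous, such a function would typically be chained in front of the schema with vol.All(migrate_addon_to_app, SCHEMA). The function name here is hypothetical. The rule "if both keys exist, keep the new one" is an assumption, not something the commit states.

```python
from typing import Any

ATTR_ADDON = "addon"
ATTR_APP = "app"


def migrate_addon_to_app(config: Any) -> Any:
    """Rename a legacy 'addon' key to 'app' before schema validation."""
    if not isinstance(config, dict) or ATTR_ADDON not in config:
        return config  # nothing to migrate; let the schema report bad input
    migrated = dict(config)  # copy so the caller's dict is left untouched
    value = migrated.pop(ATTR_ADDON)
    # Assumption: if both keys are present, the new 'app' key wins.
    migrated.setdefault(ATTR_APP, value)
    return migrated
```

Old files such as {"addon": "core_mosquitto", ...} load as {"app": "core_mosquitto", ...}, while new files pass through unchanged. This is the round trip the added schema migration tests check.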
dependabot[bot]	da74f1be71	Bump sentry-sdk from 2.58.0 to 2.59.0 (#6800 ) Bumps [sentry-sdk](https://github.com/getsentry/sentry-python) from 2.58.0 to 2.59.0. - [Release notes](https://github.com/getsentry/sentry-python/releases) - [Changelog](https://github.com/getsentry/sentry-python/blob/master/CHANGELOG.md) - [Commits](https://github.com/getsentry/sentry-python/compare/2.58.0...2.59.0) --- updated-dependencies: - dependency-name: sentry-sdk dependency-version: 2.59.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-05-05 11:18:37 +02:00
dependabot[bot]	9e7e8acfa7	Bump cryptography from 47.0.0 to 48.0.0 (#6799 ) Bumps [cryptography](https://github.com/pyca/cryptography) from 47.0.0 to 48.0.0. - [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst) - [Commits](https://github.com/pyca/cryptography/compare/47.0.0...48.0.0) --- updated-dependencies: - dependency-name: cryptography dependency-version: 48.0.0 dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-05-05 11:17:32 +02:00
Stefan Agner	0de6d25fed	Drop legacy test classes in favor of module-level functions (#6796 ) Per CLAUDE.md, plain test_* functions are the project style; class-based test grouping is considered legacy. Convert the 24 test methods in test_pull_progress.py (TestLayerProgress, TestImagePullProgress) to module-level functions — none of them used self, so the rewrite is mechanical. Also rename three helper classes whose names accidentally matched pytest's Test* collection pattern, even though they are fakes/fixtures rather than test cases: - TestAddon -> FakeApp (data holder used as a fake App in pwned tests) - TestDockerInterface -> FakeDockerInterface (fixture/inner helper in docker tests) The two DBusServiceMock subclasses named TestInterface already had __test__ = False and are left alone.	2026-05-04 21:38:22 +02:00
Stefan Agner	75c39ed0d4	Default pytest --timeout=10 in pyproject.toml (#6797 ) CI and tox both passed ``--timeout=10`` explicitly, but a plain local ``pytest`` had no timeout — a hung asyncio task or stuck D-Bus signal handler could stall a developer's run indefinitely while passing CI. Move the timeout into ``[tool.pytest.ini_options]`` so it applies everywhere (pytest auto-discovers ``pyproject.toml`` in the repo root) and drop the now-redundant ``--timeout=10`` flags from ``ci.yaml`` and ``tox.ini``. The full suite already fits comfortably under 10s per test, and ``@pytest.mark.timeout(N)`` remains available for per-test overrides if a specific test ever needs more headroom.	2026-05-04 14:48:36 +02:00
Stefan Agner	f8dbafe0bb	Drop redundant @pytest.mark.asyncio decorators (#6795 ) The pytest config sets ``asyncio_mode = "auto"``, which already auto-marks every ``async def test_*`` as a coroutine test. The 38 ``@pytest.mark.asyncio`` decorators sprinkled across the suite were no-ops kept around from before that flag was set. Remove them along with the now-unused ``import pytest`` lines they were the only consumer of. Pure mechanical cleanup; no test behavior changes.	2026-05-04 14:48:18 +02:00
Stefan Agner	c3e7601ad0	Log when cidfile path cleanup fails (#6788 ) The cleanup of a leftover cidfile path before creating a new container silently suppressed any OSError from rmdir/unlink. When that cleanup fails (e.g. the path is a non-empty directory or still busy from a pending bind unmount), the subsequent touch() raises IsADirectoryError with no breadcrumb explaining why the path was in an unexpected state. Replace the bare suppress(OSError) with an explicit error log so the underlying failure is visible in the Supervisor log when the follow-up touch() blows up. Behavior is otherwise unchanged: a failed cleanup still falls through to touch() as before. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 01:04:56 +02:00
dependabot[bot]	0ad1016bdd	Bump release-drafter/release-drafter from 7.2.0 to 7.2.1 (#6787 ) Bumps [release-drafter/release-drafter](https://github.com/release-drafter/release-drafter) from 7.2.0 to 7.2.1. - [Release notes](https://github.com/release-drafter/release-drafter/releases) - [Commits](https://github.com/release-drafter/release-drafter/compare/5de93583980a40bd78603b6dfdcda5b4df377b32...563bf132657a13ded0b01fcb723c5a58cdd824e2) --- updated-dependencies: - dependency-name: release-drafter/release-drafter dependency-version: 7.2.1 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-04-30 10:28:24 +02:00
Stefan Agner	61faa73be5	Return proper API errors when backup mount is down (#6785 ) Follow-up on #6739: with HassioError now logged and captured by Sentry in api_process, BackupMountDownError surfaced as an "unexpected" 400 with a noisy log entry and a Sentry event (SUPERVISOR-1JXW), even though the user had simply asked to back up to a mount that was not currently available. Map this through properly so the API returns a clean, structured 400: - Make BackupMountDownError inherit from APIError, with error_key "backup_mount_down", message_template "Backup mount '{mount}' is down", and the mount name in extra_fields. Clients now get a normalized, translatable message and a stable key instead of the raw "<name> is down, cannot back-up to it" / "...cannot copy to it" strings. - Simplify both raise sites in BackupManager (_check_location and _copy_to_location) to just pass mount=. @api_process turns the result into a 400 without logging or Sentry capture, since this is now a modeled client-state error rather than an unexpected one. The mount being down is a runtime state issue users hit when their NAS/CIFS share is briefly unreachable, not a Supervisor bug worth paging on. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> (tag: 2026.04.2)	2026-04-29 22:13:11 +02:00
Stefan Agner	14d1b919f3	Queue main builder runs instead of cancelling (#6782 ) The builder workflow used a blanket `cancel-in-progress: true`, which is fine for PR runs but harmful on `main`: when several PRs merge in quick succession and one of them touches `requirements.txt`, the wheels publish step from the in-flight run gets killed mid-upload. Subsequent CI runs (and downstream consumers) then fail to install the wheels for the latest requirements. Scope `cancel-in-progress` to `pull_request` events so pushes to `main` queue behind each other through the existing concurrency group, while PRs still collapse to the latest commit as before.	2026-04-29 16:09:55 +02:00
Stefan Agner	2fcd29b39e	Fix test_core fixture after connectivity rework (#6783 ) #6765 renamed Supervisor.check_connectivity to check_and_update_connectivity, but the mocked_setup_loads fixture in tests/test_core.py still patched the old name. The patch.object call raised AttributeError at fixture setup, erroring out the test_setup_app_file_read_error_not_captured test before it could run. Update the patch target to the new method name so Core.setup() sees an AsyncMock for the connectivity probe again. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 14:53:48 +02:00
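The failure mode above can be reproduced in isolation. Without create=True, unittest.mock's patch.object refuses to patch an attribute that does not exist, so a fixture that still targets a renamed method fails at setup. The Supervisor class below is a hypothetical stand-in that exposes only the new method name.

```python
import asyncio
from unittest.mock import AsyncMock, patch


class Supervisor:
    """Hypothetical stand-in exposing only the renamed method."""

    async def check_and_update_connectivity(self, force: bool = False) -> bool:
        raise RuntimeError("would do a real HTTP probe")


def patch_old_name_fails() -> bool:
    # patch.object raises AttributeError for a missing attribute (unless create=True).
    try:
        with patch.object(Supervisor, "check_connectivity"):
            pass
    except AttributeError:
        return True
    return False


async def probe_with_mock() -> bool:
    # Patching the new name with an AsyncMock keeps awaiting callers working.
    with patch.object(Supervisor, "check_and_update_connectivity", new_callable=AsyncMock) as mocked:
        mocked.return_value = True
        result = await Supervisor().check_and_update_connectivity()
        mocked.assert_awaited_once()
        return result
```

patch.object's strictness is what surfaced the stale name: the fixture errored out at setup time, before any test logic ran.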
Stefan Agner	33ab5b55f8	Treat JobException as a client-side API error (#6777 ) * Treat JobException as a client-side API error Job condition guards (system not running, no free space, etc.) and concurrency rejections (another job in flight) raised by the @Job decorator are explicit precondition failures with descriptive messages, not unexpected errors. JobException inheriting HassioError directly meant api_process caught them in its HassioError branch — which since #6739 logs them as unexpected and captures them to Sentry. Inherit APIError instead so api_process surfaces these through its APIError branch with the original message and skips the unexpected-error path. Status stays at APIError's default 400, so the API contract is unchanged. Extended test_backup_immediate_errors to assert async_capture_exception is not called for the freeze and free-space condition guards. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Silence too-many-ancestors on plugin job error mixins The plugin-specific job error subclasses (CliJobError, ObserverJobError, MulticastJobError, CoreDNSJobError, AudioJobError) cross pylint's too-many-ancestors threshold once JobException inherits APIError. Add the same `# pylint: disable=too-many-ancestors` already used on the ResolutionNotFound subclasses with similar diamond inheritance. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Disable too-many-ancestors globally instead of per class The pylint config already disables every other too-many-* rule "for the sake of readability", but kept too-many-ancestors and forced inline disables on diamond-inherited exception classes (the ResolutionNotFound subclasses, and now five plugin job error mixins after the JobException APIError change). Add too-many-ancestors to the global disable list and drop all eight inline annotations. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 10:21:13 +02:00
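The effect of moving JobException under APIError comes down to the order of the except branches in the API wrapper. The sketch below is a sync, heavily simplified stand-in for api_process. The class bodies, the "Unexpected error" text and the capture_exception stub are assumptions. It shows that a job guard rejection now takes the APIError branch, so it is neither logged nor captured, while any other HassioError still is.

```python
import logging

_LOGGER = logging.getLogger(__name__)


class HassioError(Exception):
    """Base Supervisor error (simplified)."""


class APIError(HassioError):
    status = 400


class JobException(APIError):
    """Job condition or concurrency rejection; now an APIError, not a bare HassioError."""


captured: list[BaseException] = []


def capture_exception(err: BaseException) -> None:
    captured.append(err)  # stand-in for the Sentry capture


def api_process(handler):
    """Simplified sketch of the branch order; not Supervisor's real decorator."""
    try:
        return {"result": "ok", "data": handler()}
    except APIError as err:
        # Modeled client-side error: pass the message through, no log, no Sentry.
        return {"result": "error", "message": str(err), "status": err.status}
    except HassioError as err:
        # Unexpected error: log it and report it.
        _LOGGER.error("Unexpected error: %s", err)
        capture_exception(err)
        return {"result": "error", "message": "Unexpected error", "status": 400}
```

Since APIError's default status is 400 in both branches, clients see the same status code as before. Only the logging and Sentry side effects change.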
Stefan Agner	9923b8580b	Return proper API errors for invalid hostnames (#6776 ) Follow-up on #6739: with HassioError now logged and captured by Sentry in api_process, hostname rejections from systemd-hostnamed surfaced as "unexpected" 400s with noisy log entries and a Sentry event, even though the user had simply submitted an invalid hostname. Map this through properly so the API returns a clean, structured 400: - Split ErrorType.INVALID_ARGS out of DBusInterfaceMethodError into its own DBusInvalidArgsError. The two cases collapsed there before are semantically different: UNKNOWN_METHOD / INVALID_SIGNATURE mean the call is broken (method missing or types wrong); INVALID_ARGS means the call is valid but the service rejected an argument's value. - Add HostInvalidHostnameError(HostError, APIError) with error_key and extra_fields so clients get a normalized message and a stable key rather than systemd's raw "Invalid static hostname '...'" text. - Translate DBusInvalidArgsError to HostInvalidHostnameError in SystemControl.set_hostname. @api_process turns the result into a 400 without logging or Sentry capture, since this is now a modeled client-input error rather than an unexpected one. Validation continues to live in hostnamed (hostname_is_valid() in systemd's src/basic/hostname-util.c); Supervisor only translates the rejection. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 10:19:27 +02:00
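The translation step can be sketched with `raise ... from`, which keeps the original D-Bus error as __cause__ for debugging. The class hierarchy mirrors the commit, but the bodies are simplified. The error_key value and fake_set_static_hostname are hypothetical, and the fake's validation rule is a toy one: the real rules live in systemd's hostname_is_valid().

```python
import asyncio


class DBusError(Exception):
    """Base D-Bus error (simplified)."""


class DBusInvalidArgsError(DBusError):
    """The call was valid but the service rejected an argument's value."""


class HostError(Exception):
    """Base host-management error (simplified)."""


class APIError(Exception):
    """Error that the API layer returns as a structured 400 (simplified)."""

    status = 400


class HostInvalidHostnameError(HostError, APIError):
    error_key = "host_invalid_hostname"  # hypothetical key value

    def __init__(self, hostname: str) -> None:
        self.extra_fields = {"hostname": hostname}
        super().__init__(f"Invalid hostname '{hostname}'")


async def fake_set_static_hostname(hostname: str) -> None:
    """Stand-in for the hostnamed D-Bus call, with a toy validation rule."""
    if not hostname or " " in hostname:
        raise DBusInvalidArgsError(f"Invalid static hostname '{hostname}'")


async def set_hostname(hostname: str) -> None:
    try:
        await fake_set_static_hostname(hostname)
    except DBusInvalidArgsError as err:
        # Translate the D-Bus rejection into a modeled client-input error.
        raise HostInvalidHostnameError(hostname) from err
```

Only DBusInvalidArgsError is translated. A broken call, such as an unknown method or a wrong signature, would still raise a different D-Bus error and take the unexpected-error path.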
Stefan Agner	0ac8b42062	Rework Supervisor connectivity check with coalescing and force flag (#6765 ) * Rework Supervisor connectivity check with coalescing and force flag Previously, a failed connectivity probe could strand Supervisor in a "no connectivity" state indefinitely. After an Ethernet reconnect, a probe kicked by NetworkManager's connectivity transition could race with CoreDNS being restarted (due to DNS locals changing), time out on DNS, and leave supervisor.connectivity = False. The retry that _on_dns_container_running was meant to fire landed inside the 5 s JobThrottle window from the just-failed probe and was silently dropped, since JobThrottle.THROTTLE drops rather than waits. The rework replaces the @Job(throttle=THROTTLE) decorator and the public connectivity setter with a single authoritative state-updating method: - check_and_update_connectivity(force=False) is the only path that runs the HTTP probe and updates the cached state. Concurrent callers coalesce onto a single in-flight probe. A min-interval throttle lives inside the method and reuses the cached result within the window instead of dropping calls. - request_connectivity_check(force=False) is a fire-and-forget wrapper for signal handlers (D-Bus, plugin callbacks) that must return quickly without blocking signal dispatch on the HTTP round-trip. - force=True bypasses the min-interval and, when a probe is in flight, sets a trailing-rerun flag so the owning task runs one more probe after the current one completes. Used for signals that carry fresh state-change information (NM connectivity transition to FULL, DNS container RUNNING, startup, post-NTP sync). - _update_connectivity is the sole writer of the cached flag and emits SUPERVISOR_CONNECTIVITY_CHANGE only on actual transitions. Call sites migrate accordingly. 
The opportunistic supervisor.connectivity = False writes in update_apparmor, updater.fetch_data, os.manager, and addon_pwned error paths are replaced with request_connectivity_check() calls so the probe remains authoritative - an endpoint-specific failure no longer lies about the overall connectivity state. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Propagate connectivity-probe cancellation and skip last-check on cancel Awaiting an asyncio.Task does not propagate cancellation INTO the task, so the previous owner-doesn't-shield comment was misleading: a cancelled owner left the spawned probe running orphaned, and the next caller could start a second probe alongside it. The owner now explicitly cancels and awaits the probe on CancelledError before re-raising. The last-check timestamp is also moved out of the finally block so a cancelled probe does not leave a "fresh result just ran" cache behind that would short-circuit the next non-forced caller. A regression test exercises both: that owner cancellation clears the in-flight reference and leaves the timestamp untouched, and that a subsequent non-forced check therefore still actually probes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Clarify why post-NTP-sync forces a connectivity probe The previous comment claimed the last-check timestamp may be unreliable after a time jump, but _connectivity_last_check uses loop.time() which is monotonic and unaffected by wall-clock corrections. The real reason to force a fresh probe is TLS validation: certificates that appeared expired or not-yet-valid before the system clock was corrected may now verify, so a probe that just failed with an SSL error can succeed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Add debug logs to Supervisor connectivity probe paths The original stuck-offline bug was hard to spot in logs because the silent throttle-drop and the cached state had no audit trail. 
With debug-level logging at each decision point, a future investigation can reconstruct from a single log file: - who requested a check (force flag distinguishes signal-driven probes from precondition / opportunistic-error-path requests) - why a probe did not actually run (in-flight coalesce, cached within min-interval, owner cancellation) - when a forced rerun was queued and when it ran (the precise failure mode that stranded the supervisor in the original incident) - when the cached state actually flipped (with the previous value in the message so transitions are visible) All new lines are debug-level. The existing _do_connectivity_check "failed" / "succeeded" lines are kept unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Skip system-checks fan-out in test_events_on_issue_changes The test asserts that apply_suggestion fires an ISSUE_REMOVED event. ISSUE_REMOVED is fired by dismiss_issue inside FixupBase.__call__, before apply_suggestion calls healthcheck. The healthcheck call afterwards is incidental to this test's intent, but it fans out into check_system() which runs CheckDNSServer (A and AAAA) - real aiodns query_dns() probes against the NetworkManager mock's stub nameserver 192.168.30.1 that each hit the default ~10 s aiodns timeout. The file took ~21 s to run. The slowness has been latent since #3818 (Aug 2022), which added the apply_suggestion step at the end of test_events_on_issue_changes two days after the DNS check landed in its current form (#3811). The default 24 h JobThrottle on CheckDNSServer.run_check tends to mask the cost in full-suite runs once any earlier test has tripped the throttle, which is likely why this slipped through. Mock coresys.resolution.healthcheck for just this one apply_suggestion call rather than introducing a file-wide DNS mock. The patch is local to the slow call site and the test's assertion is unaffected. The file drops from ~21 s to ~2.5 s. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 10:14:13 +02:00
Stefan Agner	309237a09e	Refactor Supervisor network reattach path (#6760 ) * Refactor Supervisor network reattach path On fresh startup the Supervisor Docker network is created and known plugin containers are re-attached. Plugin containers (observer, cli, dns, audio) legitimately don't exist yet at that point, which produced noisy ERROR lines before the exception was suppressed by the caller. - attach_container_by_name() now raises DockerNotFound silently on 404 and DockerError without implicit logging on other Docker API errors. - _create_supervisor_network() iterates all managed containers in a single loop using explicit try/except, replacing three separate suppress(DockerError) blocks. Missing containers are logged at DEBUG, unexpected Docker errors at ERROR. - Drop the alias argument on the reattach path. Docker adds the container name as an implicit network alias, and inter-container lookups go through ExtraHosts (/etc/hosts), not Docker DNS, so the explicit alias list was cosmetic and inconsistent with the first-create path anyway. - Consolidate AUDIO_DOCKER_NAME, CLI_DOCKER_NAME, DNS_DOCKER_NAME in supervisor/const.py alongside the existing OBSERVER_DOCKER_NAME and SUPERVISOR_DOCKER_NAME constants. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Escalate network attach failures and handle Supervisor specially Pull the Supervisor container out of the reattach loop since it must exist — Supervisor is running the code. Any failure attaching it is a real problem, so log at CRITICAL with exc_info so Sentry captures the full traceback. For plugin containers, escalate non-404 errors from ERROR to CRITICAL (also with exc_info). A DockerError there typically means Docker itself is unhealthy, which affects the whole system and warrants a Sentry report. Missing plugin containers (DockerNotFound) continue to be a DEBUG log since they're expected on fresh install. Addresses review feedback on #6760. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 10:10:51 +02:00
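The escalation policy in the reattach commit could look roughly like this. `DockerError` and `DockerNotFound` are local stand-ins for Supervisor's exceptions, and `attach` is a hypothetical callable standing in for `attach_container_by_name()`; the log messages are illustrative.

```python
import logging
from collections.abc import Callable, Iterable

_LOGGER = logging.getLogger(__name__)


class DockerError(Exception):
    """Stand-in for Supervisor's DockerError (hypothetical)."""


class DockerNotFound(DockerError):
    """Stand-in raised when the Docker API answers 404 (hypothetical)."""


def reattach_containers(
    attach: Callable[[str], None],
    supervisor_name: str,
    plugin_names: Iterable[str],
) -> list[str]:
    """Re-attach managed containers; return the plugins that were attached."""
    try:
        attach(supervisor_name)
    except DockerError:
        # Supervisor is running this code, so its container must exist:
        # any failure here is a real problem worth a full traceback.
        _LOGGER.critical("Can't attach %s to network", supervisor_name, exc_info=True)

    attached: list[str] = []
    for name in plugin_names:
        try:
            attach(name)
        except DockerNotFound:
            # Expected on a fresh install: the plugin was not created yet.
            _LOGGER.debug("Container %s not found, skipping", name)
        except DockerError:
            # Any other Docker error suggests Docker itself is unhealthy.
            _LOGGER.critical("Can't attach %s to network", name, exc_info=True)
        else:
            attached.append(name)
    return attached
```

`DockerNotFound` must be caught before `DockerError`, since it is a subclass and would otherwise be escalated.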
Stefan Agner	641a4181d9	Handle add-on filesystem errors gracefully and reduce Sentry noise (#6707 ) * Handle add-on filesystem errors gracefully and reduce Sentry noise Add AddonFileReadError for add-on metadata read failures (long_description, refresh_path_cache) caused by filesystem errors like EBADMSG (errno 74). The new exception calls check_oserror() to mark the system unhealthy via the resolution system, then raises a translatable API error so callers get a proper error response instead of an unhandled OSError. Fixes SUPERVISOR-BC6 (548K events from the API path) and SUPERVISOR-BZJ (from the startup/load path). In core.py setup(), skip reporting exceptions to Sentry when the error has already been handled by the resolution system. This is detected by checking if a new unhealthy reason was added during the task execution (e.g. via check_oserror). In that case the user is already notified, so we log at error level (no stack trace) instead of critical (which would also send to Sentry via the LoggingIntegration) and skip the explicit capture_exception call. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Skip Sentry capture for AppFileReadError in setup() Replace the unhealthy-state comparison logic with an explicit `except AppFileReadError` clause. The error is already reported to the user via the resolution system (check_oserror adds an unhealthy reason), so capturing it to Sentry just adds noise. Log at error level without stack trace instead of critical to avoid the LoggingIntegration picking it up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Add tests for AppFileReadError and setup() Sentry handling Test that long_description and refresh_path_cache raise AppFileReadError and mark the system unhealthy for EBADMSG errors, and raise without marking unhealthy for other OSError types. Also test Core.setup() to verify AppFileReadError is handled without Sentry capture while other exceptions are captured as before. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-29 10:10:36 +02:00
dependabot[bot]	04a675b48d	Bump gitpython from 3.1.48 to 3.1.49 (#6781 ) Bumps [gitpython](https://github.com/gitpython-developers/GitPython) from 3.1.48 to 3.1.49. - [Release notes](https://github.com/gitpython-developers/GitPython/releases) - [Changelog](https://github.com/gitpython-developers/GitPython/blob/main/CHANGES) - [Commits](https://github.com/gitpython-developers/GitPython/compare/3.1.48...3.1.49) --- updated-dependencies: - dependency-name: gitpython dependency-version: 3.1.49 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-04-29 10:10:14 +02:00
dependabot[bot]	ee1c6c48f2	Bump gitpython from 3.1.47 to 3.1.48 (#6779 ) Bumps [gitpython](https://github.com/gitpython-developers/GitPython) from 3.1.47 to 3.1.48. - [Release notes](https://github.com/gitpython-developers/GitPython/releases) - [Changelog](https://github.com/gitpython-developers/GitPython/blob/main/CHANGES) - [Commits](https://github.com/gitpython-developers/GitPython/compare/3.1.47...3.1.48) --- updated-dependencies: - dependency-name: gitpython dependency-version: 3.1.48 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-04-28 09:06:46 +02:00
Stefan Agner	b9922aebc8	Remove legacy core_proxy middleware (#6778 ) The core_proxy middleware was Supervisor's side of the original Core-to-Supervisor proxy auth scheme: when pre-2023.3.4 Home Assistant Core forwarded a user-issued request to Supervisor under its privileged supervisor token, it tagged the request with X-Hass-User-ID and X-Hass-Is-Admin headers identifying the upstream user. Supervisor inspected those headers, plus a header adjacency heuristic fingerprinting aiohttp's proxy header layout, to distinguish forwarded requests from native Core calls and reject proxied requests that lacked user identity. Core 2023.3.4 (PR home-assistant/core#89379) replaced that scheme: Core now does the path-level gating itself before proxying and no longer sends the X-Hass-* headers, so the middleware short-circuits for any Core newer than that. With the 2-year Core support policy introduced in #6148, every supported installation is well past 2023.3.4, making the middleware unreachable in practice. Drop the middleware along with its now-unused supports: the _CORE_VERSION constant, the supervisor_frontend pattern field (a duplicate of the frontend asset list already exempted via no_security_check), and the AwesomeVersion / LANDINGPAGE / version_is_new_enough imports it relied on. The frontend asset bypass itself is unchanged — it still lives in no_security_check.	2026-04-27 19:54:48 -04:00
Mike Degatano	bc24fb5449	Refactor API registration to support v1/v2 via shared methods (#6769 ) * Refactor API registration to support v1/v2 via shared methods - Add AppVersion StrEnum (V1, V2) to supervisor/api/const.py - Replace self.v2_app with self._v2_app and expose a versions property (dict[AppVersion, web.Application]) computed dynamically so that test fixtures reassigning self.webapp are automatically reflected in V1 - All _register_* methods now accept a required app: web.Application parameter; version-specific routes are gated with "if app is self.versions[AppVersion.V1/V2]:" - load() loops over enabled_versions (V1 always, V2 when feature-flagged) and calls each registration method once per version, no duplication - Static resources are registered before webapp.add_subapp() to avoid registering into a frozen router - add_subapp uses self.webapp directly for readability - Fold _register_v2_apps/_register_v2_backups/_register_v2_store into their respective unified methods; remove the now-defunct _register_v2_* helpers and the _api_apps/_api_backups/_api_store instance vars - _register_proxy and _register_ingress updated to accept app; legacy /homeassistant/* proxy routes gated behind V1 conditional Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add dual v1/v2 parametrization to API tests All 163 tests across 17 API modules that register identically on both v1 and v2 now run against both versions via api_client_with_prefix. 
- tests/api/conftest.py: advanced_logs_tester switched to api_client_with_prefix so log-endpoint tests are auto-parametrized; accepts optional v2_path_prefix kwarg for paths that differ by version - tests/api/test_{auth,discovery,dns,docker,hardware,host,ingress, jobs,mounts,network,os,resolution,security,services,supervisor}.py: api_client -> api_client_with_prefix with path prefix unpacking - supervisor/api/__init__.py: _register_panel() moved outside the version loop -- frontend static assets are V1-only - tests/api/test_panel.py: kept on plain api_client (V1-only) Tests intentionally kept V1-only: - auth/discovery: use indirect api_client parametrize for addon context - homeassistant: all tests call legacy /homeassistant/* paths (V1-only) - jobs (4 tests): inner @Job-decorated classes register names into a module-level set; re-running the same test raises RuntimeError Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Extend dual v1/v2 parametrization to homeassistant and jobs tests tests/api/conftest.py: - Add core_api_client_with_root fixture parametrized over three paths: v1-core: /core/... (canonical v1 path) v1-legacy: /homeassistant/... (legacy v1 alias, same handlers) v2-core: /v2/core/... 
(canonical v2 path) tests/api/test_homeassistant.py: - Switch all 17 api_client tests to core_api_client_with_root so each test runs against all three access paths (v1 canonical, v1 legacy alias, v2 canonical), exercising every registered route tests/api/test_jobs.py: - Promote four inner TestClass definitions to module-level helpers (_JobsTreeTestHelper, _JobManualCleanupTestHelper, _JobsSortedTestHelper, _JobWithErrorTestHelper) so that @Job name registration into the global _JOB_NAMES set only happens once at import time rather than on each parametrized test run - Replace closure references to outer-scope coresys with self.coresys - Use api_client_with_prefix for dual-version coverage Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix typo Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-04-27 23:39:47 +02:00
dependabot[bot]	287aee22e6	Bump cryptography from 46.0.7 to 47.0.0 (#6774 ) Bumps [cryptography](https://github.com/pyca/cryptography) from 46.0.7 to 47.0.0. - [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst) - [Commits](https://github.com/pyca/cryptography/compare/46.0.7...47.0.0) --- updated-dependencies: - dependency-name: cryptography dependency-version: 47.0.0 dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-04-27 14:25:24 +02:00
dependabot[bot]	71c2200c59	Bump ruff from 0.15.11 to 0.15.12 (#6773 ) Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-04-27 12:47:46 +02:00
Stefan Agner	61ca2524b2	Return proper API errors for mqtt/mysql service conflicts (#6767 ) * Return proper API errors for mqtt/mysql service conflicts After #6739 added unexpected-error logging and Sentry capture to the api_process wrappers, SUPERVISOR-1JTQ and SUPERVISOR-1JWM surfaced as user-triggered service conflicts that were being treated as unexpected errors: - POST /services/{mqtt,mysql} when another app already provides the service. - DELETE /services/{mqtt,mysql} when no app currently provides it. Both paths raised a generic ServicesError, which the API layer turned into an opaque HTTP 400 without a translation key, and which #6739 now also logs and captures via Sentry. Introduce ServiceAlreadyProvidedError (409 Conflict) and ServiceNotProvidedError (404 Not Found) as new-style API exceptions with translation keys and extra_fields, plus a shared APIConflict base class for future 409 responses. The mqtt and mysql service modules now raise these instead, so the API returns structured, translatable responses and these expected user conflicts stop being captured as bugs. Fixes SUPERVISOR-1JTQ Fixes SUPERVISOR-1JWM Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Don't log handled errors verbosely Missing/already present service information are well handled errors with clear API responses. The client is supposed to handle these errors. No need to log verbosely. --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 21:56:12 +02:00
Stefan Agner	4938fb215d	Improve Docker port-in-use detection and handling (#6766 ) Triaging SUPERVISOR-1JWK turned up a missed port conflict: RE_PORT_CONFLICT_ERROR only matched one of the Docker daemon's port-in-use message shapes. The two variants produced by current moby — "Bind for <ip>:<port> failed: port is already allocated" from portallocator and "failed to bind host port <ip>:<port>/<proto>: address already in use" from osallocator — fell through to DockerAPIError, got re-raised as AppUnknownError, and the watchdog shipped them to Sentry as unknown errors. Widen the regex to match all known shapes (including the older form embedding the container endpoint, still observed from older daemons and wrappers), anchored on the "failed to set up container networking" prefix and one of the "address already in use" or "port is already allocated" suffixes. Log the raw Docker message at debug level before converting, so curious users can still see the exact upstream text (host IP, container endpoint, protocol) when investigating which process is holding the port. The watchdog's _restart_after_problem now catches AppPortConflict explicitly ahead of the generic AppsError handler: log a warning, break the retry loop, do not call async_capture_exception. A port conflict is an environment condition — another process grabbed the port while the add-on was down — so retrying cannot make it succeed and reporting to Sentry is noise. With port conflicts now raised as typed APIError subclasses at the detection site, the DockerAPIError → format_message() rewrite fallback in api_return_error has no work left. Drop the fallback and delete supervisor/utils/log_format.py along with its tests; the module only ever handled port-conflict prose. Fixes SUPERVISOR-1JWK Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 21:55:18 +02:00
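A rough sketch of the widened detection: anchor on the "failed to set up container networking" prefix and accept either known suffix. `RE_PORT_CONFLICT` and `is_port_conflict()` are hypothetical and not Supervisor's actual `RE_PORT_CONFLICT_ERROR`; the sample messages in the tests are illustrative strings shaped like the variants quoted in the commit.

```python
import re

# Hypothetical pattern: prefix anchor, then either port-in-use suffix.
RE_PORT_CONFLICT = re.compile(
    r"failed to set up container networking\b.*"
    r"(?:address already in use|port is already allocated)",
    re.DOTALL,
)


def is_port_conflict(message: str) -> bool:
    """Return True if a Docker error message describes a host port conflict."""
    return RE_PORT_CONFLICT.search(message) is not None
```

Requiring the networking prefix keeps unrelated "address already in use" errors from other subsystems from being classified as add-on port conflicts.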
Stefan Agner	2011633946	Reuse check_exception_chain in Sentry filter, tighten its types (#6757 ) Switch the Sentry noise filter in filter_data to call the existing check_exception_chain helper instead of an inline loop. One shared utility for "does the chain contain this type" matches what the reviewer suggested and removes a bit of duplication. While touching check_exception_chain: - Walk __cause__ instead of __context__. __cause__ is what Python sets when code uses `raise B() from a`, which is the explicit "caused by" signal we actually want to match. __context__ can also include unrelated in-flight exceptions from surrounding except blocks. Every existing call site in Supervisor uses `raise X from err`, which sets both attributes, so switching is behaviour-preserving for all current callers. - Replace the `Any` type of object_type with `type[BaseException] | tuple[type[BaseException], ...]`, which is what isinstance/issubclass actually accept and lets mypy catch misuse at the call site. - Replace `issubclass(type(err), object_type)` with `isinstance`, which is the idiomatic form and honours virtual subclasses. Review feedback from #6732. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-24 21:42:41 +02:00
Stefan Agner	ff90e4b817	Fix UnboundLocalError when Core API fails post-update (#6761 ) When get_config() raised HomeAssistantError after a Core update, the except block set error_state and fell through to the frontend check, which referenced an unbound `data` variable and raised UnboundLocalError. That aborted the update with a JobException and skipped the rollback path entirely. Move the frontend checks into an else branch of the try/except so they only run when get_config() succeeds. When it fails, error_state is set and control falls through to the rollback logic below, which is what PR #6726 intended. Fixes SUPERVISOR-1JVX Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> 2026.04.1	2026-04-23 15:05:40 +02:00
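The fix relies on Python's `try`/`except`/`else`: code in `else` runs only when the `try` body raised nothing, so `data` is always bound there. A minimal sketch, assuming a stand-in `HomeAssistantError` and a simplified frontend check (the `components` key is a hypothetical placeholder for Supervisor's real checks):

```python
from collections.abc import Callable


class HomeAssistantError(Exception):
    """Stand-in for Supervisor's HomeAssistantError (hypothetical)."""


def post_update_healthy(get_config: Callable[[], dict]) -> bool:
    """Return False when the caller should roll back the update."""
    error_state = False
    try:
        data = get_config()
    except HomeAssistantError:
        # API not responding: flag the error and fall through to rollback.
        error_state = True
    else:
        # Only reached when get_config() succeeded, so `data` is bound.
        if "frontend" not in data.get("components", []):
            error_state = True
    return not error_state
```

Without the `else`, the frontend check would run after the `except` branch too and read an unbound `data`, which is exactly the `UnboundLocalError` described above.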
Copilot	91625db2b1	Stop Supervisor from overriding NetworkManager Wi-Fi powersave policy (#6753 ) * Initial plan * Fix wireless profile generation to not force powersave ignore Agent-Logs-Url: https://github.com/home-assistant/supervisor/sessions/6e2e9288-6d9b-403d-9d71-8d6ea44eb91b Co-authored-by: agners <34061+agners@users.noreply.github.com> * Set wireless powersave to default to reset existing profiles Agent-Logs-Url: https://github.com/home-assistant/supervisor/sessions/4a4a2c09-0cdd-4417-9776-688837b51dcc Co-authored-by: agners <34061+agners@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: agners <34061+agners@users.noreply.github.com>	2026-04-23 10:06:49 +02:00
Stefan Agner	814bcc447d	Run Update version job only if version is published (#6758 )	2026-04-22 10:38:34 +02:00
dependabot[bot]	9203c09f53	Bump mypy from 1.20.1 to 1.20.2 (#6756 ) Bumps [mypy](https://github.com/python/mypy) from 1.20.1 to 1.20.2. - [Changelog](https://github.com/python/mypy/blob/master/CHANGELOG.md) - [Commits](https://github.com/python/mypy/compare/v1.20.1...v1.20.2) --- updated-dependencies: - dependency-name: mypy dependency-version: 1.20.2 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-04-22 09:11:11 +02:00
dependabot[bot]	b791e97d0a	Bump pre-commit from 4.5.1 to 4.6.0 (#6755 ) Bumps [pre-commit](https://github.com/pre-commit/pre-commit) from 4.5.1 to 4.6.0. - [Release notes](https://github.com/pre-commit/pre-commit/releases) - [Changelog](https://github.com/pre-commit/pre-commit/blob/main/CHANGELOG.md) - [Commits](https://github.com/pre-commit/pre-commit/compare/v4.5.1...v4.6.0) --- updated-dependencies: - dependency-name: pre-commit dependency-version: 4.6.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-04-22 08:56:39 +02:00
dependabot[bot]	a6792f78d4	Bump gitpython from 3.1.46 to 3.1.47 (#6754 ) Bumps [gitpython](https://github.com/gitpython-developers/GitPython) from 3.1.46 to 3.1.47. - [Release notes](https://github.com/gitpython-developers/GitPython/releases) - [Changelog](https://github.com/gitpython-developers/GitPython/blob/main/CHANGES) - [Commits](https://github.com/gitpython-developers/GitPython/compare/3.1.46...3.1.47) --- updated-dependencies: - dependency-name: gitpython dependency-version: 3.1.47 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-04-22 08:44:54 +02:00
Stefan Agner	97bc19d4b3	Detect container registry rate limits uniformly (#6732 ) * Detect container registry rate limits uniformly Container registry rate limits reach Supervisor in three distinct shapes: 1. HTTP 429 from the daemon - recognised today, but the exception and resolution issue are hardcoded to Docker Hub. Since Core/Supervisor/ plugin images all live on ghcr.io now, virtually every 429 we see in the field is actually a GHCR throttle that we mislabel. The biggest Sentry issue (SUPERVISOR-16BK) has >115k events / >93k users, all pulling a ghcr.io image, yet each user is told to "log into Docker Hub". 2. HTTP 500 with 'toomanyrequests' in the body - not recognised. Docker daemons before 28.3.0 wrap upstream 429s as 500 (fixed upstream by moby/moby 23fa0ae74a, "Cleanup http status error checks"). The large fleet on older daemons still produces this shape. 3. JSON error event during a streaming pull - not recognised. Once the daemon starts writing the 200 OK response body the status is locked in, so rate limits that land during layer download arrive as plain text in the pull stream. Happens on all recent daemon versions - SUPERVISOR-13FQ (>16k events) and SUPERVISOR-13E0 (>8k events) are two large examples. Cases 2 and 3 propagate as plain DockerError, bypass the 429 detection in install() entirely, never produce a DOCKER_RATELIMIT resolution issue, and generate large amounts of Sentry noise. Case 1 is detected but routes every GHCR 429 through Docker-Hub-specific messaging and suggestions. Changes: - Add DockerRegistryRateLimitExceeded as the common base class and GithubContainerRegistryRateLimitExceeded alongside the existing DockerHubRateLimitExceeded. All extend APITooManyRequests so callers and retry logic can key off a single type. - Add GITHUB_RATELIMIT IssueType so GHCR failures don't show the "log in to Docker Hub" suggestion that DOCKER_RATELIMIT carries. 
- PullLogEntry.exception now maps stream errors containing 'toomanyrequests' to DockerRegistryRateLimitExceeded (case 3). - docker/interface.py:install() routes all three cases through a single _registry_rate_limit_exception() helper that picks the right issue type, suggestion and exception subclass based on the image's registry. - utils/sentry.py filters APITooManyRequests (and anything wrapping it via __cause__) in capture_exception / async_capture_exception. One point of policy, every caller benefits. Callers (supervisor.update(), plugin manager, homeassistant core) are unchanged - UPDATE_FAILED issues still get created alongside the registry-specific rate limit issue, giving users the full picture. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Consolidate Sentry noise filtering in one before_send hook Move the APITooManyRequests filter from capture_exception / async_capture_exception wrappers into the existing filter_data before_send hook in supervisor/misc/filter.py, alongside the AddonConfigurationError filter. One isinstance tuple check instead of multiple layers, and every path that reaches Sentry (including logging-integration and excepthook captures, not just our explicit wrappers) now gets the same treatment. The filter walks the __cause__ chain so wrapped rate-limit errors (e.g. DockerHubRateLimitExceeded inside SupervisorUpdateError) still get filtered. A debug log is emitted on each dropped event for observability. Review feedback from mdegat01 on #6732. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Drop GITHUB_RATELIMIT resolution issue There is no actionable remediation for a GHCR rate limit - logging in doesn't lift the quota the way it does for Docker Hub, and the cap is on the authenticated account anyway. A resolution issue that just tells the user "you were rate limited" adds UI noise without helping them. 
Keep the GithubContainerRegistryRateLimitExceeded exception - retry logic and the Sentry filter still key off it - but don't create a resolution issue. A log entry from the exception constructor is sufficient. Docker Hub still gets DOCKER_RATELIMIT + registry-login suggestion since that is actionable. Review feedback on #6732. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-22 07:49:01 +02:00
Stefan Agner	53f84ec15b	Bump devcontainer to 6 (#6747 ) With devcontainer 6 dbus-daemon is installed in the container, which is required for tests. The latest version also has support to disable AppArmor using the `SUPERVISOR_UNCONFINED` environment variable.	2026-04-21 16:52:28 +02:00
Stefan Agner	d431526b14	Fix unhandled WebSocket handshake errors and unnecessary token refresh (#6725 ) Raise HomeAssistantWSConnectionError instead of HomeAssistantAPIError for WebSocket handshake failures. The broader HomeAssistantAPIError was not caught by the fire-and-forget send path which only catches HomeAssistantWSError, resulting in "Task exception was never retrieved" errors when Core's WebSocket endpoint isn't ready. Additionally, narrow the retry catch in connect_websocket from HomeAssistantAPIError to HomeAssistantAuthError. The broad catch caused connection errors (not auth failures) to trigger unnecessary token refreshes and retries, spamming "Updated Home Assistant API token" logs. Also raise the log level for failed fire-and-forget WebSocket commands from debug to warning for better visibility. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-21 16:33:36 +02:00
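A sketch of the narrowed retry and the fire-and-forget path. The exception hierarchy below is an assumption made for illustration (in particular that `HomeAssistantWSConnectionError` derives from `HomeAssistantWSError`, which is what lets the background send path catch it); `connect` and `refresh_token` are hypothetical callables.

```python
import logging

_LOGGER = logging.getLogger(__name__)


# Assumed hierarchy: local stand-ins, not imported from Supervisor.
class HomeAssistantAPIError(Exception): ...


class HomeAssistantAuthError(HomeAssistantAPIError): ...


class HomeAssistantWSError(HomeAssistantAPIError): ...


class HomeAssistantWSConnectionError(HomeAssistantWSError): ...


async def connect_websocket(connect, refresh_token):
    """Connect; refresh the token and retry once only on auth failures.

    Connection errors propagate untouched, since a new token cannot fix them.
    """
    try:
        return await connect()
    except HomeAssistantAuthError:
        await refresh_token()
        return await connect()


async def send_fire_and_forget(send) -> None:
    """Background send path: log WebSocket failures instead of leaking them."""
    try:
        await send()
    except HomeAssistantWSError as err:
        _LOGGER.warning("WebSocket command failed: %s", err)
```

With the handshake failure raised as a `HomeAssistantWSError` subclass, the background task no longer ends with "Task exception was never retrieved".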
Stefan Agner	ff2cdbfc36	Log unexpected errors in api_process wrappers (#6739 ) * Log unexpected errors in api_process wrappers The `api_process` and `api_process_raw` decorators silently swallowed any `HassioError` that bubbled up from endpoint handlers, returning `"Unknown error, see Supervisor logs"` to the caller while logging nothing. This made the response message actively misleading: e.g. when an endpoint touching D-Bus hit `DBusNotConnectedError` (raised without a message by `@dbus_connected`), Core would surface `SupervisorBadRequestError: Unknown error, see Supervisor logs` and the Supervisor logs would contain no trace of it. Log the caught `HassioError` with traceback before delegating to `api_return_error` so the "see Supervisor logs" hint is actually actionable. The `APIError` branch is left alone — those carry explicit status codes and messages set by Supervisor code and are already visible in the response. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Capture unexpected API errors to Sentry Non-APIError HassioError exceptions reaching api_process indicate missing error handling in the endpoint handler. In addition to the logging added in the previous commit, also send these to Sentry so they surface as actionable issues rather than silently returning "Unknown error, see Supervisor logs" to the caller. * Drop capture exception from set_boot_slot --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-21 16:04:39 +02:00
Stefan Agner	7fb621234e	Add Unix socket support for Core communication with feature flag (#6742 ) * Use Unix socket for Supervisor to Core communication Reintroduce Unix socket support for Supervisor-to-Core communication (reverted in #6735) with the addition of a feature flag gate. The feature is now controlled by the `core_unix_socket` feature flag and disabled by default. When enabled and Core version supports it, Supervisor communicates with Core via a Unix socket at /run/os/core.sock instead of TCP. This eliminates the need for access token authentication on the socket path, as Core authenticates the peer by the socket connection itself. Key changes: - Add FeatureFlag.CORE_UNIX_SOCKET to gate the feature - HomeAssistantAPI: transport-aware session/url/websocket management - WSClient: separate connect() (Unix, no auth) and connect_with_auth() (TCP) class methods with proper error handling - APIProxy delegates websocket setup to api.connect_websocket() - Container state tracking for Unix session lifecycle - CI builder mounts /run/supervisor for integration tests Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Sort feature flags alphabetically * Drop per-call max_msg_size from WSClient Hardcode the WebSocket message size cap to 64 MB in WSClient and remove the parameter from WSClient.connect, connect_with_auth, _ws_connect, and HomeAssistantAPI.connect_websocket. This was only ever overridden by APIProxy, so threading it through four layers was unnecessary. max_msg_size is a cap, not a pre-allocation; aiohttp only grows buffers to the size of actual incoming messages. Supervisor's own control channel never approaches 64 MB, so unifying the limit has no runtime cost. Addresses review feedback on #6742. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-21 15:03:05 +02:00
Mike Degatano	56abe94d74	Add versioned v2 API with apps terminology (#6741 ) * Add versioned v2 API with apps terminology Introduce a v2 API sub-app mounted at /v2 that uses 'apps' terminology throughout, while keeping v1 fully backward-compatible. Key changes: - Add ATTR_ADDONS = 'addons' constant alongside ATTR_APPS = 'apps' so backup file data (which must remain 'addons' for backward compat) and v2 API responses can use distinct constants - Add FeatureFlag.SUPERVISOR_V2_API to gate v2 route registration - Mount aiohttp sub-app at /v2 in RestAPI.load() when flag is enabled - Add _AppSecurityPatterns frozen dataclass and _V1_PATTERNS/_V2_PATTERNS with strict per-version regex sets (no cross-version matching) - Add _register_v2_apps, _register_v2_backups, _register_v2_store route registration methods - Add v1 thin wrapper methods (_v1) for all affected endpoints so business logic lives in the canonical v2 methods - Extract _info_data() helper in APIApps so v1 closure can bypass @api_process and still catch APIAppNotInstalled for store routing - Add _rename_apps_to_addons_in_backups(), _process_location_in_body(), _all_store_apps_info() shared helpers to eliminate duplication - Add api_client_v2, api_client_with_prefix, app_api_client_with_root, store_app_api_client_with_root parameterized test fixtures - Add test_v2_api_disabled_without_feature_flag - Parameterize backup, addons, and store tests to cover both v1 and v2 paths Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Fix pylint false positive for re.Pattern C extension methods re.Pattern methods (match, search, etc.) are C extension methods. Pylint cannot detect them via static analysis when re.Pattern is used as a type annotation in a dataclass field, producing false E1101 no-member errors. Add generated-members to inform pylint these members exist. 
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * pylint and feedback fixes * Copilot suggested fixes * Minor feedback fixes --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-20 21:19:27 +02:00
Stefan Agner	38ddb3df54	Fix Core update rollback: delay image cleanup and fix missing rollback path (#6726 ) * Delay old image cleanup until after health checks on Core update Move the old Docker image cleanup from inside _update() to after the post-update health checks (frontend loaded and accessible). This keeps the previous version's image available locally when a rollback is needed, avoiding a potentially slow re-download. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Add test assertions for old image cleanup timing on Core update Verify that the old Docker image is cleaned up only after health checks pass, and not when a rollback is triggered. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Fix missing rollback when get_config fails after Core update The early return after setting error_state skipped the rollback block, leaving the system on a broken new version when the API stopped responding after update. The other health check failure paths correctly fall through to the rollback logic; this was the only one that didn't. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 10:57:13 +02:00
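The ordering this commit establishes (install, health-check, and only then remove the old image, otherwise roll back using the still-cached image) is sketched below. `update_core()` and the `docker` object's `current_version`, `install()` and `cleanup()` members are hypothetical, not Supervisor's interface.

```python
from collections.abc import Callable


def update_core(docker, health_check: Callable[[], bool], target: str) -> str:
    """Update to target; return the version Core ends up running."""
    old_version = docker.current_version
    docker.install(target)
    if health_check():
        # Health checks passed: only now is it safe to drop the old image.
        docker.cleanup(keep=target)
        return target
    # Rollback: the previous image is still cached, so no re-download.
    docker.install(old_version)
    return old_version
```

Any health-check failure, including the API not answering at all, goes through the same rollback branch instead of returning early.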

5642 Commits