mirror of https://github.com/home-assistant/supervisor.git synced 2026-05-18 21:58:52 +01:00
Commit Graph

5642 Commits

Author SHA1 Message Date
dependabot[bot] 39f8a3d116 Bump urllib3 from 2.6.3 to 2.7.0 (#6822)
Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.6.3 to 2.7.0.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/2.6.3...2.7.0)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-version: 2.7.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-05-08 10:49:07 +02:00
dependabot[bot] 5141178e7c Bump types-pyyaml from 6.0.12.20260408 to 6.0.12.20260508 (#6823)
Bumps [types-pyyaml](https://github.com/python/typeshed) from 6.0.12.20260408 to 6.0.12.20260508.
- [Commits](https://github.com/python/typeshed/commits)

---
updated-dependencies:
- dependency-name: types-pyyaml
  dependency-version: 6.0.12.20260508
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-05-08 09:57:42 +02:00
Stefan Agner 67258dea4a Skip post-update health check when Core was not running on entry (#6821)
PR #6726 removed the early return after a HomeAssistantError from the
post-update get_config() call so that a Core that stopped responding
after an update would correctly trigger a rollback. That early return
was, however, also load-bearing for the backup restore flow:
Backup.restore_homeassistant() stops and removes Core before invoking
core.update(target_version) and starts Core later in its own
await_home_assistant_restart stage. With Core not running, _update()
correctly skips the start step, but the unconditional post-update
get_config() now always raises, sets error_state, and triggers a
spurious rollback that re-pulls the previous image and leaves the
system on the wrong version after the restore completes.

Return early from update() when Core was not running on entry. The
caller is responsible for starting Core and there is no live API to
health-check at this point. Genuine update failures (Core was running,
update broke it) are unaffected and still roll back.

Also rename the local rollback to rollback_version for clarity.
2026.05.0
2026-05-07 11:27:28 +02:00
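The early return described in #6821 can be illustrated with a minimal sketch. `FakeCore` and its attributes are hypothetical stand-ins invented for this example and are not Supervisor's actual classes; `health_check()` stands in for the post-update `get_config()` call.

```python
import asyncio


class FakeCore:
    """Hypothetical stand-in for the Core manager (not Supervisor's real class)."""

    def __init__(self, running: bool, healthy: bool = True) -> None:
        self.running = running
        self.healthy = healthy
        self.version = "old"
        self.rolled_back = False

    async def health_check(self) -> bool:
        # Stands in for the post-update get_config() call.
        if not self.running:
            raise RuntimeError("Core API is not reachable")
        return self.healthy

    async def update(self, target_version: str) -> None:
        was_running = self.running
        rollback_version = self.version
        self.version = target_version  # stands in for pulling the new image

        if not was_running:
            # The caller (for example a backup restore) starts Core later,
            # so there is no live API to health-check at this point.
            return

        try:
            healthy = await self.health_check()
        except RuntimeError:
            healthy = False
        if not healthy:
            # Genuine failure: Core was running and the update broke it.
            self.version = rollback_version
            self.rolled_back = True


stopped = FakeCore(running=False)
asyncio.run(stopped.update("new"))
```

With Core stopped on entry, the sketch keeps the new version and never attempts a rollback; with Core running but unhealthy afterwards, it still rolls back.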
Stefan Agner 44e0e5ee28 Enable Unix socket Core API by default on Core 2026.5.1+ (#6815)
The UNIX_SOCKET_CORE_API feature flag has been the only way to opt into
Unix socket communication between Supervisor and Home Assistant Core.
Now that the implementation has settled, enable it by default for Core
versions at or above 2026.5.1. Versions in the supported range below
that (down to CORE_UNIX_SOCKET_MIN_VERSION) continue to require the
feature flag, preserving the existing opt-in behavior for early dev
builds.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 09:18:51 +02:00
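A rough sketch of the version gating described in #6815 follows. The value used for `CORE_UNIX_SOCKET_MIN_VERSION` is a placeholder (the commit message does not state it), and `CORE_UNIX_SOCKET_DEFAULT_VERSION`, `parse_version` and `use_unix_socket` are hypothetical names chosen for illustration.

```python
# Placeholder value: the real CORE_UNIX_SOCKET_MIN_VERSION is not given in the commit message.
CORE_UNIX_SOCKET_MIN_VERSION = (2026, 4, 0)
# Hypothetical constant name for the "enabled by default" threshold.
CORE_UNIX_SOCKET_DEFAULT_VERSION = (2026, 5, 1)


def parse_version(version: str) -> tuple[int, ...]:
    """Parse 'YYYY.M.P' into a comparable tuple (assumes plain numeric components)."""
    return tuple(int(part) for part in version.split(".")[:3])


def use_unix_socket(core_version: str, feature_flag: bool) -> bool:
    """Decide whether to talk to Core over the Unix socket."""
    version = parse_version(core_version)
    if version < CORE_UNIX_SOCKET_MIN_VERSION:
        return False  # unsupported, regardless of the flag
    if version >= CORE_UNIX_SOCKET_DEFAULT_VERSION:
        return True  # enabled by default
    return feature_flag  # supported range below the default: opt-in only
```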
dependabot[bot] 8666c4be77 Bump orjson from 3.11.8 to 3.11.9 (#6818)
Bumps [orjson](https://github.com/ijl/orjson) from 3.11.8 to 3.11.9.
- [Release notes](https://github.com/ijl/orjson/releases)
- [Changelog](https://github.com/ijl/orjson/blob/master/CHANGELOG.md)
- [Commits](https://github.com/ijl/orjson/compare/3.11.8...3.11.9)

---
updated-dependencies:
- dependency-name: orjson
  dependency-version: 3.11.9
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-05-07 09:14:49 +02:00
dependabot[bot] 8124f09f33 Bump sigstore/cosign-installer from 4.1.1 to 4.1.2 (#6817)
Bumps [sigstore/cosign-installer](https://github.com/sigstore/cosign-installer) from 4.1.1 to 4.1.2.
- [Release notes](https://github.com/sigstore/cosign-installer/releases)
- [Commits](https://github.com/sigstore/cosign-installer/compare/cad07c2e89fa2edd6e2d7bab4c1aa38e53f76003...6f9f17788090df1f26f669e9d70d6ae9567deba6)

---
updated-dependencies:
- dependency-name: sigstore/cosign-installer
  dependency-version: 4.1.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-05-07 09:14:06 +02:00
Jan Čermák d815c0922f Flatten published Docker image (#6812)
This flattens the Docker image from 9 layers to 5 by using a multi-stage
build that squashes layers into logical blocks. The first layer on top
of the base image adds system-wide packages and uv (which is not updated
very often - if it were, it may be wise to move it into the next layer or
a separate one; it weighs roughly 50 MB) which should be preserved
between releases, while the next layer adds all Supervisor Python code
and dependencies.

This means that unless the base image or packages installed in the first
stage are changed (or in other words, only Supervisor code is changed),
only a single layer is pulled from the repository. Previously, it
generally resulted in a pull of all the following 4 layers, as just a
change in the requirements invalidated the following layers. The fetched
payload size remains roughly the same.
2026-05-06 12:04:32 +02:00
Stefan Agner c772a9bbb0 Replace fixed-duration sleeps after bus events with gather (#6803)
* Replace fixed-duration sleeps after bus events with gather

Several tests use ``await asyncio.sleep(...)`` to "wait for the
listener to run" after firing a bus event. The fixed duration is
real wall-clock time and the wait can be nondeterministic: if the
handler chain happens to need slightly more time on a busy CI
runner, the assertion races the handler.

``Bus.fire_event`` returns the listener tasks since #6252; capture
and ``await asyncio.gather(*tasks)`` instead of sleeping. Touches
test_bus.py (the bus tests were poking scheduling instead of
verifying their assertions), test_home_assistant_watchdog.py,
test_plugin_base.py, addons/test_manager.py, docker/test_addon.py,
and test_store_execute_reload.py.

Other cleanups in the same spirit:

- ``_fire_test_event`` in addons/test_addon.py becomes ``async def``
  and gathers the listener tasks itself, so its 17 call sites
  collapse to a single ``await _fire_test_event(...)``.
- The two test_store_execute_reload.py sites that used the private
  ``_update_connectivity()`` helper are reworked to set the cached
  connectivity flag directly and fire the event themselves so they
  can gather the listener tasks the same way.
- The two ``sleep(1)`` post-pull drains in docker/test_interface.py
  collapse to ``sleep(0)`` (handler tasks are already gathered
  inside pull_image), saving ~2s.
- The ``sleep(0.01)`` waits inside ``container_events()`` task
  bodies (api/test_addons.py, api/test_store.py,
  backups/test_manager.py) are just one-yield-to-the-parent and
  become ``sleep(0)``.

Switching to ``gather`` exposes a few latent test mocks that were
silently swallowing TypeErrors as background-task failures before:

- ``CGroup.add_devices_allowed`` is ``async def`` but was patched
  as a plain MagicMock in docker/test_addon.py — now patched via
  ``new_callable=AsyncMock``.
- The watchdog does ``await (await self.start())`` /
  ``await (await self.restart())`` because ``App.start`` /
  ``App.restart`` return ``asyncio.Task``. The mocks in
  addons/test_addon.py (test_app_watchdog, test_watchdog_on_stop,
  test_watchdog_during_attach) needed
  ``AsyncMock(return_value=<settled future>)`` to mirror that
  shape rather than a plain MagicMock.

* Factor bus.fire_event + gather pattern into a helper

Per review feedback, the ``await asyncio.gather(*coresys.bus.fire_event(...))``
incantation was scattered across many call sites. Add
``tests.common.fire_bus_event`` that takes the coresys, event and data,
fires the event and awaits the spawned listener tasks. Convert all
matching sites to use it, including the ``_fire_test_event`` wrapper
in addons/test_addon.py which now just builds the
``DockerContainerStateEvent`` and delegates.
2026-05-06 12:02:28 +02:00
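The fire-and-gather pattern from #6803 can be sketched with a toy bus. The `Bus` class here is a hypothetical stand-in, and this `fire_bus_event` takes the bus directly rather than a coresys object, so it only approximates the helper named in the commit.

```python
import asyncio
from collections.abc import Callable, Coroutine
from typing import Any

Listener = Callable[[Any], Coroutine[Any, Any, None]]


class Bus:
    """Toy event bus whose fire_event returns the spawned listener tasks."""

    def __init__(self) -> None:
        self._listeners: dict[str, list[Listener]] = {}

    def register(self, event: str, listener: Listener) -> None:
        self._listeners.setdefault(event, []).append(listener)

    def fire_event(self, event: str, data: Any) -> list[asyncio.Task]:
        return [
            asyncio.create_task(listener(data))
            for listener in self._listeners.get(event, [])
        ]


async def fire_bus_event(bus: Bus, event: str, data: Any) -> None:
    """Fire the event and wait until every listener task has finished."""
    await asyncio.gather(*bus.fire_event(event, data))


async def demo() -> list[str]:
    bus = Bus()
    seen: list[str] = []

    async def listener(data: str) -> None:
        await asyncio.sleep(0)  # yield once, like a real handler might
        seen.append(data)

    bus.register("state", listener)
    await fire_bus_event(bus, "state", "running")
    return seen  # populated without any fixed-duration sleep
```

Because `gather` re-raises listener exceptions, a test written this way also surfaces handler failures that a plain sleep would have left in a background task.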
Stefan Agner ad1a9115d8 Improve and extend frontend probe after update with WebSocket check (#6811)
* Improve and extend frontend probe after update with WebSocket check

The post-update health check introduced in #6311 added
HomeAssistantAPI.check_frontend_available, which fetched the frontend
through the existing Supervisor-internal API connection to Core.
Since #6742 that connection optionally runs over a Unix socket with
no authentication, so the request no longer exercises the same
transport, auth and routing path that an external HTTP client uses.

Move the frontend probe out of HomeAssistantAPI into a small
frontend_check module that talks to Core's TCP endpoints via the
plain websession with no authentication, mirroring what an external
client would see.

While doing this, extend the post-update verification to also probe
the WebSocket endpoint: open /api/websocket and confirm the first
frame is the auth_required text message. This catches the kind of
WebSocket breakage seen in #6802, where api/config still listed
websocket_api as loaded and GET / still returned HTML, but the
WebSocket handshake completed with an immediate close frame and the
frontend was unusable.

The component check now also requires "http" to be loaded, in
addition to "frontend" and "websocket_api", and iterates so every
missing component is logged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Address review feedback on WebSocket probe

- Wrap ws_connect in asyncio.wait_for so the handshake has an explicit
  bounded timeout (the global websession's default timeout would
  otherwise apply).
- Validate that the auth_required payload is a JSON object before
  calling .get("type"); a list/string would otherwise raise
  AttributeError at runtime.
- Add a regression test covering a non-dict JSON payload.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 10:54:05 +02:00
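The payload validation mentioned in the review feedback for #6811 might look roughly like the following. The function name `is_auth_required_frame` is hypothetical, and the sketch covers only the parsing step, not the WebSocket connection or its timeout.

```python
import json


def is_auth_required_frame(raw: str) -> bool:
    """Return True if raw is a JSON object whose "type" is "auth_required"."""
    try:
        payload = json.loads(raw)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    if not isinstance(payload, dict):
        # A list or string would raise AttributeError on .get("type").
        return False
    return payload.get("type") == "auth_required"
```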
dependabot[bot] 179c7f0c48 Bump gitpython from 3.1.49 to 3.1.50 (#6813)
Bumps [gitpython](https://github.com/gitpython-developers/GitPython) from 3.1.49 to 3.1.50.
- [Release notes](https://github.com/gitpython-developers/GitPython/releases)
- [Changelog](https://github.com/gitpython-developers/GitPython/blob/main/CHANGES)
- [Commits](https://github.com/gitpython-developers/GitPython/compare/3.1.49...3.1.50)

---
updated-dependencies:
- dependency-name: gitpython
  dependency-version: 3.1.50
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-05-06 10:13:57 +02:00
Stefan Agner b871e1ca61 Lower severity of WebSocket delivery failure messages to debug (#6805)
The fire-and-forget _async_send_command path was raised from DEBUG to
WARNING in #6725 for better visibility. In practice it's noisy during
normal Core lifecycle events (restart, update): Supervisor fires
supervisor_job_start/supervisor_job_end events towards Core while the
container is intentionally not running, and each event logs a warning.
The DEBUG line from the API layer just above ("Core container is not
running") already explains the cause, so the WARNING just restates it.

Synchronous async_send_command callers still see raised exceptions, so
genuine failures that callers care about are not hidden. Restores the
original DEBUG level introduced together with the raise-on-failure
behavior in #6553.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 17:08:14 +02:00
Jan Čermák 2920194f16 Update Python to 3.14.4/base image to 3.14-alpine3.22-2026.04.0 (#6810)
Update to the latest base image:
* https://github.com/home-assistant/docker-base/releases/tag/2026.04.0

This also brings Python to 3.14.4, so update it in CI.
2026-05-05 17:05:58 +02:00
Mike Degatano eb3c388618 Migrate persisted 'addon' field to 'app' in config files (#6786)
* Migrate persisted 'addon' field to 'app' in discovery and services config

Rename the 'addon' key to 'app' in persisted configuration files for
discovery messages (discovery.json), service modules (services.json),
and supervisor config (supervisor.json), as part of the broader
addon->app terminology migration.

Changes:
- Add ATTR_ADDON = "addon" to const.py for V1 API compat/migration
- Add ATTR_ADDONS_CUSTOM_LIST = "addons_custom_list" to const.py for migration
- Change ATTR_APPS_CUSTOM_LIST value from "addons_custom_list" to "apps_custom_list"
- Add _migrate_supervisor_config() schema pre-processor in validate.py to
  transparently load old supervisor.json files using the old key
- Add ATTR_ADDON to services/const.py; change ATTR_APP value to "app"
- Add _migrate_addon_to_app() pre-processors to MQTT, MySQL, and discovery
  schemas to load old config files that used the "addon" key
- Rename Message.addon -> Message.app in Discovery and update all references
- Keep hassio_push/discovery payload using "addon" key for HA compatibility
- GET /services/{service} and GET /discovery: V1 returns "addon" key,
  V2 returns "app" key, via dedicated _v1 handler methods following the
  backups/store pattern, registered with AppVersion guards in
  _register_services() and _register_discovery()
- Broaden FileConfiguration schema type annotation to accept vol.All
  validators in addition to vol.Schema

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add schema migration tests for addon->app config key rename

Test that backwards-compatible migration of old 'addon'/'addons_custom_list'
keys to 'app'/'apps_custom_list' works correctly in all affected schemas,
and that the new keys are accepted without modification.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add an __init__ to discovery tests

* Add app_api_client_with_prefix fixture and update V1/V2 tests

Move the app-level V1/V2 fixture to tests/api/conftest.py as
app_api_client_with_prefix for use across any endpoint that requires
app-level credentials (services_role, app.discovery, etc.).

- Add app_api_client_with_prefix fixture to conftest.py
- Update test_set_service_already_provided and test_del_service_not_provided
  to use app_api_client_with_prefix (covers both v1 and v2)
- Add test_get_service_v1_v2_keys asserting addon/app key per version
- Update test_api_discovery_forbidden, test_api_send_del_discovery,
  test_api_invalid_discovery to use app_api_client_with_prefix
- Split test_discovery_not_found into test_discovery_not_found_get
  (uses api_client_with_prefix, GET requires homeassistant) and
  test_discovery_not_found_delete (uses app_api_client_with_prefix)
- Add test_get_discovery_v1_v2_keys asserting addon/app key per version

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-05 11:18:47 +02:00
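A schema pre-processor of the kind described in #6786 could be sketched as below. This is an illustrative reimplementation under assumptions, not the project's `_migrate_addon_to_app()`; in particular, letting an existing `app` key win when both keys are present is a choice made for this example.

```python
from typing import Any


def migrate_addon_to_app(config: Any) -> Any:
    """Rename a legacy "addon" key to "app" before schema validation runs."""
    if not isinstance(config, dict) or "addon" not in config:
        return config  # nothing to migrate; let the schema validate as-is
    migrated = dict(config)  # do not mutate the caller's data
    legacy_value = migrated.pop("addon")
    # Assumption: if both keys exist, the new "app" key takes precedence.
    migrated.setdefault("app", legacy_value)
    return migrated
```

A function like this can run ahead of the actual schema so that old files load transparently and are written back with the new key.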
dependabot[bot] da74f1be71 Bump sentry-sdk from 2.58.0 to 2.59.0 (#6800)
Bumps [sentry-sdk](https://github.com/getsentry/sentry-python) from 2.58.0 to 2.59.0.
- [Release notes](https://github.com/getsentry/sentry-python/releases)
- [Changelog](https://github.com/getsentry/sentry-python/blob/master/CHANGELOG.md)
- [Commits](https://github.com/getsentry/sentry-python/compare/2.58.0...2.59.0)

---
updated-dependencies:
- dependency-name: sentry-sdk
  dependency-version: 2.59.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-05-05 11:18:37 +02:00
dependabot[bot] 9e7e8acfa7 Bump cryptography from 47.0.0 to 48.0.0 (#6799)
Bumps [cryptography](https://github.com/pyca/cryptography) from 47.0.0 to 48.0.0.
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pyca/cryptography/compare/47.0.0...48.0.0)

---
updated-dependencies:
- dependency-name: cryptography
  dependency-version: 48.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-05-05 11:17:32 +02:00
Stefan Agner 0de6d25fed Drop legacy test classes in favor of module-level functions (#6796)
Per CLAUDE.md, plain test_* functions are the project style; class-
based test grouping is considered legacy. Convert the 24 test methods
in test_pull_progress.py (TestLayerProgress, TestImagePullProgress)
to module-level functions — none of them used self, so the rewrite is
mechanical.

Also rename three helper classes whose names accidentally matched
pytest's Test* collection pattern, even though they are fakes/fixtures
rather than test cases:
- TestAddon  -> FakeApp (data holder used as a fake App in pwned tests)
- TestDockerInterface -> FakeDockerInterface (fixture/inner helper in
  docker tests)

The two DBusServiceMock subclasses named TestInterface already had
__test__ = False and are left alone.
2026-05-04 21:38:22 +02:00
Stefan Agner 75c39ed0d4 Default pytest --timeout=10 in pyproject.toml (#6797)
CI and tox both passed ``--timeout=10`` explicitly, but a plain local
``pytest`` had no timeout — a hung asyncio task or stuck D-Bus signal
handler could stall a developer's run indefinitely while passing CI.

Move the timeout into ``[tool.pytest.ini_options]`` so it applies
everywhere (pytest auto-discovers ``pyproject.toml`` in the repo
root) and drop the now-redundant ``--timeout=10`` flags from
``ci.yaml`` and ``tox.ini``. The full suite already fits comfortably
under 10s per test, and ``@pytest.mark.timeout(N)`` remains
available for per-test overrides if a specific test ever needs more
headroom.
2026-05-04 14:48:36 +02:00
Stefan Agner f8dbafe0bb Drop redundant @pytest.mark.asyncio decorators (#6795)
The pytest config sets ``asyncio_mode = "auto"``, which already
auto-marks every ``async def test_*`` as a coroutine test. The 38
``@pytest.mark.asyncio`` decorators sprinkled across the suite were
no-ops kept around from before that flag was set. Remove them along
with the now-unused ``import pytest`` lines they were the only
consumer of.

Pure mechanical cleanup; no test behavior changes.
2026-05-04 14:48:18 +02:00
Stefan Agner c3e7601ad0 Log when cidfile path cleanup fails (#6788)
The cleanup of a leftover cidfile path before creating a new container
silently suppressed any OSError from rmdir/unlink. When that cleanup
fails (e.g. the path is a non-empty directory or still busy from a
pending bind unmount), the subsequent touch() raises IsADirectoryError
with no breadcrumb explaining why the path was in an unexpected state.

Replace the bare suppress(OSError) with an explicit error log so the
underlying failure is visible in the Supervisor log when the follow-up
touch() blows up. Behavior is otherwise unchanged: a failed cleanup
still falls through to touch() as before.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 01:04:56 +02:00
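The change in #6788 amounts to replacing a silent `suppress(OSError)` with an explicit log. A minimal sketch, assuming a hypothetical helper name `cleanup_cidfile` and log wording invented for this example:

```python
import logging
from pathlib import Path

_LOGGER = logging.getLogger(__name__)


def cleanup_cidfile(path: Path) -> bool:
    """Remove a leftover cidfile path; log instead of silently ignoring errors."""
    if not path.exists():
        return True
    try:
        if path.is_dir():
            path.rmdir()  # fails with OSError if the directory is not empty
        else:
            path.unlink()
    except OSError as err:
        # Leave a breadcrumb so a later touch() failure can be explained.
        _LOGGER.error("Failed to clean up cidfile path %s: %s", path, err)
        return False
    return True
```

The caller can still fall through to creating the file regardless of the return value, which matches the "behavior is otherwise unchanged" note in the commit.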
dependabot[bot] 0ad1016bdd Bump release-drafter/release-drafter from 7.2.0 to 7.2.1 (#6787)
Bumps [release-drafter/release-drafter](https://github.com/release-drafter/release-drafter) from 7.2.0 to 7.2.1.
- [Release notes](https://github.com/release-drafter/release-drafter/releases)
- [Commits](https://github.com/release-drafter/release-drafter/compare/5de93583980a40bd78603b6dfdcda5b4df377b32...563bf132657a13ded0b01fcb723c5a58cdd824e2)

---
updated-dependencies:
- dependency-name: release-drafter/release-drafter
  dependency-version: 7.2.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-30 10:28:24 +02:00
Stefan Agner 61faa73be5 Return proper API errors when backup mount is down (#6785)
Follow-up on #6739: with HassioError now logged and captured by Sentry
in api_process, BackupMountDownError surfaced as an "unexpected" 400
with a noisy log entry and a Sentry event (SUPERVISOR-1JXW), even
though the user had simply asked to back up to a mount that was not
currently available.

Map this through properly so the API returns a clean, structured 400:

- Make BackupMountDownError inherit from APIError, with error_key
  "backup_mount_down", message_template "Backup mount '{mount}' is
  down", and the mount name in extra_fields. Clients now get a
  normalized, translatable message and a stable key instead of the raw
  "<name> is down, cannot back-up to it" / "...cannot copy to it"
  strings.
- Simplify both raise sites in BackupManager (_check_location and
  _copy_to_location) to just pass mount=. @api_process turns the
  result into a 400 without logging or Sentry capture, since this is
  now a modeled client-state error rather than an unexpected one.

The mount being down is a runtime state issue users hit when their
NAS/CIFS share is briefly unreachable, not a Supervisor bug worth
paging on.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026.04.2
2026-04-29 22:13:11 +02:00
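The error modeling described in #6785 can be sketched as follows. The `APIError` base here is a simplified, hypothetical version written for this example, and the `as_response()` shape is an assumption; only the `error_key`, `message_template` and `mount` field come from the commit message.

```python
from __future__ import annotations


class APIError(Exception):
    """Simplified, hypothetical base class for modeled API errors."""

    status = 400
    error_key: str | None = None
    message_template: str | None = None

    def __init__(self, **extra_fields: str) -> None:
        self.extra_fields = extra_fields
        message = (
            self.message_template.format(**extra_fields)
            if self.message_template
            else ""
        )
        super().__init__(message)

    def as_response(self) -> dict:
        # Assumed response shape, for illustration only.
        return {
            "result": "error",
            "message": str(self),
            "error_key": self.error_key,
            "extra_fields": self.extra_fields,
        }


class BackupMountDownError(APIError):
    error_key = "backup_mount_down"
    message_template = "Backup mount '{mount}' is down"

    def __init__(self, mount: str) -> None:
        super().__init__(mount=mount)


err = BackupMountDownError(mount="nas")
```

Raise sites then only need to pass `mount=`, and clients can key translations off `error_key` instead of parsing the message.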
Stefan Agner 14d1b919f3 Queue main builder runs instead of cancelling (#6782)
The builder workflow used a blanket `cancel-in-progress: true`, which is
fine for PR runs but harmful on `main`: when several PRs merge in quick
succession and one of them touches `requirements.txt`, the wheels
publish step from the in-flight run gets killed mid-upload. Subsequent
CI runs (and downstream consumers) then fail to install the wheels for
the latest requirements.

Scope `cancel-in-progress` to `pull_request` events so pushes to `main`
queue behind each other through the existing concurrency group, while
PRs still collapse to the latest commit as before.
2026-04-29 16:09:55 +02:00
Stefan Agner 2fcd29b39e Fix test_core fixture after connectivity rework (#6783)
#6765 renamed Supervisor.check_connectivity to
check_and_update_connectivity, but the mocked_setup_loads fixture in
tests/test_core.py still patched the old name. The patch.object call
raised AttributeError at fixture setup, erroring out the
test_setup_app_file_read_error_not_captured test before it could run.

Update the patch target to the new method name so Core.setup() sees an
AsyncMock for the connectivity probe again.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 14:53:48 +02:00
Stefan Agner 33ab5b55f8 Treat JobException as a client-side API error (#6777)
* Treat JobException as a client-side API error

Job condition guards (system not running, no free space, etc.) and
concurrency rejections (another job in flight) raised by the @Job
decorator are explicit precondition failures with descriptive messages,
not unexpected errors. JobException inheriting HassioError directly
meant api_process caught them in its HassioError branch — which since
#6739 logs them as unexpected and captures them to Sentry.

Inherit APIError instead so api_process surfaces these through its
APIError branch with the original message and skips the
unexpected-error path. Status stays at APIError's default 400, so the
API contract is unchanged.

Extended test_backup_immediate_errors to assert async_capture_exception
is not called for the freeze and free-space condition guards.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Silence too-many-ancestors on plugin job error mixins

The plugin-specific job error subclasses (CliJobError, ObserverJobError,
MulticastJobError, CoreDNSJobError, AudioJobError) cross pylint's
too-many-ancestors threshold once JobException inherits APIError. Add
the same `# pylint: disable=too-many-ancestors` already used on the
ResolutionNotFound subclasses with similar diamond inheritance.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Disable too-many-ancestors globally instead of per class

The pylint config already disables every other too-many-* rule "for the
sake of readability", but kept too-many-ancestors and forced inline
disables on diamond-inherited exception classes (the ResolutionNotFound
subclasses, and now five plugin job error mixins after the JobException
APIError change).

Add too-many-ancestors to the global disable list and drop all eight
inline annotations.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 10:21:13 +02:00
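The effect of re-parenting `JobException` in #6777 can be shown with a toy dispatcher. Everything below is a simplified sketch: this `api_process` is a plain function rather than the real decorator, the `captured` list stands in for Sentry capture, and the fallback message is a placeholder.

```python
class HassioError(Exception):
    """Simplified root error for this sketch."""


class APIError(HassioError):
    status = 400


class JobException(APIError):
    """Precondition failure raised by a job guard."""


captured: list[Exception] = []  # stands in for Sentry capture


def api_process(handler):
    """Call handler and map exceptions to a (status, message) pair."""
    try:
        return 200, handler()
    except APIError as err:
        # Modeled client-side error: pass the original message through.
        return err.status, str(err)
    except HassioError as err:
        # Unexpected error: record it for investigation.
        captured.append(err)
        return 400, "Unexpected error"  # placeholder message


def guarded():
    raise JobException("System is frozen")


def broken():
    raise HassioError("boom")
```

Because `JobException` now matches the `APIError` branch first, the status stays 400 while the capture path is skipped.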
Stefan Agner 9923b8580b Return proper API errors for invalid hostnames (#6776)
Follow-up on #6739: with HassioError now logged and captured by Sentry
in api_process, hostname rejections from systemd-hostnamed surfaced as
"unexpected" 400s with noisy log entries and a Sentry event, even
though the user had simply submitted an invalid hostname.

Map this through properly so the API returns a clean, structured 400:

- Split ErrorType.INVALID_ARGS out of DBusInterfaceMethodError into its
  own DBusInvalidArgsError. The two cases collapsed there before are
  semantically different: UNKNOWN_METHOD / INVALID_SIGNATURE mean the
  call is broken (method missing or types wrong); INVALID_ARGS means
  the call is valid but the service rejected an argument's value.
- Add HostInvalidHostnameError(HostError, APIError) with error_key and
  extra_fields so clients get a normalized message and a stable key
  rather than systemd's raw "Invalid static hostname '...'" text.
- Translate DBusInvalidArgsError to HostInvalidHostnameError in
  SystemControl.set_hostname. @api_process turns the result into a 400
  without logging or Sentry capture, since this is now a modeled
  client-input error rather than an unexpected one.

Validation continues to live in hostnamed (hostname_is_valid() in
systemd's src/basic/hostname-util.c); Supervisor only translates the
rejection.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 10:19:27 +02:00
Stefan Agner 0ac8b42062 Rework Supervisor connectivity check with coalescing and force flag (#6765)
* Rework Supervisor connectivity check with coalescing and force flag

Previously, a failed connectivity probe could strand Supervisor in a
"no connectivity" state indefinitely. After an Ethernet reconnect, a
probe kicked by NetworkManager's connectivity transition could race
with CoreDNS being restarted (due to DNS locals changing), time out on
DNS, and leave supervisor.connectivity = False. The retry that
_on_dns_container_running was meant to fire landed inside the 5 s
JobThrottle window from the just-failed probe and was silently dropped,
since JobThrottle.THROTTLE drops rather than waits.

The rework replaces the @Job(throttle=THROTTLE) decorator and the
public connectivity setter with a single authoritative state-updating
method:

- check_and_update_connectivity(force=False) is the only path that
  runs the HTTP probe and updates the cached state. Concurrent callers
  coalesce onto a single in-flight probe. A min-interval throttle
  lives inside the method and reuses the cached result within the
  window instead of dropping calls.
- request_connectivity_check(force=False) is a fire-and-forget wrapper
  for signal handlers (D-Bus, plugin callbacks) that must return
  quickly without blocking signal dispatch on the HTTP round-trip.
- force=True bypasses the min-interval and, when a probe is in flight,
  sets a trailing-rerun flag so the owning task runs one more probe
  after the current one completes. Used for signals that carry fresh
  state-change information (NM connectivity transition to FULL, DNS
  container RUNNING, startup, post-NTP sync).
- _update_connectivity is the sole writer of the cached flag and
  emits SUPERVISOR_CONNECTIVITY_CHANGE only on actual transitions.

Call sites migrate accordingly. The opportunistic
supervisor.connectivity = False writes in update_apparmor,
updater.fetch_data, os.manager, and addon_pwned error paths are
replaced with request_connectivity_check() calls so the probe remains
authoritative - an endpoint-specific failure no longer lies about the
overall connectivity state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Propagate connectivity-probe cancellation and skip last-check on cancel

Awaiting an asyncio.Task does not propagate cancellation INTO the task,
so the previous owner-doesn't-shield comment was misleading: a cancelled
owner left the spawned probe running orphaned, and the next caller could
start a second probe alongside it. The owner now explicitly cancels and
awaits the probe on CancelledError before re-raising.

The last-check timestamp is also moved out of the finally block so a
cancelled probe does not leave a "fresh result just ran" cache behind
that would short-circuit the next non-forced caller.

A regression test exercises both: that owner cancellation clears the
in-flight reference and leaves the timestamp untouched, and that a
subsequent non-forced check therefore still actually probes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Clarify why post-NTP-sync forces a connectivity probe

The previous comment claimed the last-check timestamp may be unreliable
after a time jump, but _connectivity_last_check uses loop.time() which
is monotonic and unaffected by wall-clock corrections. The real reason
to force a fresh probe is TLS validation: certificates that appeared
expired or not-yet-valid before the system clock was corrected may now
verify, so a probe that just failed with an SSL error can succeed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Add debug logs to Supervisor connectivity probe paths

The original stuck-offline bug was hard to spot in logs because the
silent throttle-drop and the cached state had no audit trail. With
debug-level logging at each decision point, a future investigation can
reconstruct from a single log file:

- who requested a check (force flag distinguishes signal-driven probes
  from precondition / opportunistic-error-path requests)
- why a probe did not actually run (in-flight coalesce, cached within
  min-interval, owner cancellation)
- when a forced rerun was queued and when it ran (the precise failure
  mode that stranded the supervisor in the original incident)
- when the cached state actually flipped (with the previous value in
  the message so transitions are visible)

All new lines are debug-level. The existing _do_connectivity_check
"failed" / "succeeded" lines are kept unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Skip system-checks fan-out in test_events_on_issue_changes

The test asserts that apply_suggestion fires an ISSUE_REMOVED event.
ISSUE_REMOVED is fired by dismiss_issue inside FixupBase.__call__, before
apply_suggestion calls healthcheck. The healthcheck call afterwards is
incidental to this test's intent, but it fans out into check_system()
which runs CheckDNSServer (A and AAAA) - real aiodns query_dns() probes
against the NetworkManager mock's stub nameserver 192.168.30.1 that each
hit the default ~10 s aiodns timeout. The file took ~21 s to run.

The slowness has been latent since #3818 (Aug 2022), which added the
apply_suggestion step at the end of test_events_on_issue_changes two
days after the DNS check landed in its current form (#3811). The default
24 h JobThrottle on CheckDNSServer.run_check tends to mask the cost in
full-suite runs once any earlier test has tripped the throttle, which is
likely why this slipped through.

Mock coresys.resolution.healthcheck for just this one apply_suggestion
call rather than introducing a file-wide DNS mock. The patch is local to
the slow call site and the test's assertion is unaffected. The file
drops from ~21 s to ~2.5 s.
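
The local-patch idea can be sketched as follows. This is a simplified illustration, not the actual test: `FakeResolution` and its methods are hypothetical stand-ins for `coresys.resolution`, and the 10 s sleep only imitates the slow DNS probes.

```python
import asyncio
from unittest.mock import AsyncMock, patch


class FakeResolution:
    """Hypothetical stand-in for coresys.resolution (not the real class)."""

    def __init__(self) -> None:
        self.events: list[str] = []

    async def healthcheck(self) -> None:
        # Imitates the slow fan-out into DNS probes.
        await asyncio.sleep(10)

    async def apply_suggestion(self) -> None:
        # The event under test fires before the health check runs.
        self.events.append("ISSUE_REMOVED")
        await self.healthcheck()


async def run_test() -> list[str]:
    resolution = FakeResolution()
    # Patch only around this one call; nothing else in the file is affected.
    with patch.object(resolution, "healthcheck", new_callable=AsyncMock) as mocked:
        await resolution.apply_suggestion()
        mocked.assert_awaited_once()
    return resolution.events


events = asyncio.run(run_test())
```

Because the patch is scoped to the `with` block, the assertion on the event is unchanged and no file-wide mock is needed.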

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 10:14:13 +02:00
Stefan Agner 309237a09e Refactor Supervisor network reattach path (#6760)
* Refactor Supervisor network reattach path

On fresh startup the Supervisor Docker network is created and known
plugin containers are re-attached. Plugin containers (observer, cli,
dns, audio) legitimately don't exist yet at that point, which produced
noisy ERROR lines before the exception was suppressed by the caller.

- attach_container_by_name() now raises DockerNotFound silently on 404
  and DockerError without implicit logging on other Docker API errors.
- _create_supervisor_network() iterates all managed containers in a
  single loop using explicit try/except, replacing three separate
  suppress(DockerError) blocks. Missing containers are logged at DEBUG,
  unexpected Docker errors at ERROR.
- Drop the alias argument on the reattach path. Docker adds the
  container name as an implicit network alias, and inter-container
  lookups go through ExtraHosts (/etc/hosts), not Docker DNS, so the
  explicit alias list was cosmetic and inconsistent with the
  first-create path anyway.
- Consolidate AUDIO_DOCKER_NAME, CLI_DOCKER_NAME, DNS_DOCKER_NAME in
  supervisor/const.py alongside the existing OBSERVER_DOCKER_NAME and
  SUPERVISOR_DOCKER_NAME constants.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Escalate network attach failures and handle Supervisor specially

Pull the Supervisor container out of the reattach loop since it must
exist — Supervisor is running the code. Any failure attaching it is a
real problem, so log at CRITICAL with exc_info so Sentry captures the
full traceback.

For plugin containers, escalate non-404 errors from ERROR to CRITICAL
(also with exc_info). A DockerError there typically means Docker
itself is unhealthy, which affects the whole system and warrants a
Sentry report. Missing plugin containers (DockerNotFound) continue to
be a DEBUG log since they're expected on fresh install.
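
A minimal sketch of the resulting log-level policy for plugin containers. The exception classes, the `attach` callable and the container names are stand-ins chosen for illustration; the real code lives in the Supervisor Docker module and differs in detail.

```python
import logging

_LOGGER = logging.getLogger(__name__)


class DockerError(Exception):
    """Stand-in for Supervisor's DockerError (assumption for this sketch)."""


class DockerNotFound(DockerError):
    """Stand-in for the 404 case."""


def reattach_plugins(names, attach):
    """Try to attach each plugin container; return the names that attached."""
    attached = []
    for name in names:
        try:
            attach(name)
        except DockerNotFound:
            # Expected on fresh install: the container does not exist yet.
            _LOGGER.debug("Container %s not found, skipping attach", name)
        except DockerError:
            # Anything else likely means Docker itself is unhealthy.
            _LOGGER.critical("Can't attach %s to network", name, exc_info=True)
        else:
            attached.append(name)
    return attached


def fake_attach(name):
    """Hypothetical attach function used only to exercise the loop."""
    if name == "dns":
        raise DockerNotFound(name)
    if name == "audio":
        raise DockerError("daemon error")


attached = reattach_plugins(["cli", "dns", "audio"], fake_attach)
```

The more specific `DockerNotFound` clause has to come first, since it is a subclass of `DockerError`.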

Addresses review feedback on #6760.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 10:10:51 +02:00
Stefan Agner 641a4181d9 Handle add-on filesystem errors gracefully and reduce Sentry noise (#6707)
* Handle add-on filesystem errors gracefully and reduce Sentry noise

Add AddonFileReadError for add-on metadata read failures (long_description,
refresh_path_cache) caused by filesystem errors like EBADMSG (errno 74).
The new exception calls check_oserror() to mark the system unhealthy via
the resolution system, then raises a translatable API error so callers
get a proper error response instead of an unhandled OSError.

Fixes SUPERVISOR-BC6 (548K events from the API path) and
SUPERVISOR-BZJ (from the startup/load path).

In core.py setup(), skip reporting exceptions to Sentry when the error
has already been handled by the resolution system. This is detected by
checking if a new unhealthy reason was added during the task execution
(e.g. via check_oserror). In that case the user is already notified, so
we log at error level (no stack trace) instead of critical (which would
also send to Sentry via the LoggingIntegration) and skip the explicit
capture_exception call.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Skip Sentry capture for AppFileReadError in setup()

Replace the unhealthy-state comparison logic with an explicit
`except AppFileReadError` clause. The error is already reported to
the user via the resolution system (check_oserror adds an unhealthy
reason), so capturing it to Sentry just adds noise.

Log at error level without stack trace instead of critical to avoid
the LoggingIntegration picking it up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Add tests for AppFileReadError and setup() Sentry handling

Test that long_description and refresh_path_cache raise AppFileReadError
and mark the system unhealthy for EBADMSG errors, and raise without
marking unhealthy for other OSError types.

Also test Core.setup() to verify AppFileReadError is handled without
Sentry capture while other exceptions are captured as before.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-29 10:10:36 +02:00
dependabot[bot] 04a675b48d Bump gitpython from 3.1.48 to 3.1.49 (#6781)
Bumps [gitpython](https://github.com/gitpython-developers/GitPython) from 3.1.48 to 3.1.49.
- [Release notes](https://github.com/gitpython-developers/GitPython/releases)
- [Changelog](https://github.com/gitpython-developers/GitPython/blob/main/CHANGES)
- [Commits](https://github.com/gitpython-developers/GitPython/compare/3.1.48...3.1.49)

---
updated-dependencies:
- dependency-name: gitpython
  dependency-version: 3.1.49
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-29 10:10:14 +02:00
dependabot[bot] ee1c6c48f2 Bump gitpython from 3.1.47 to 3.1.48 (#6779)
Bumps [gitpython](https://github.com/gitpython-developers/GitPython) from 3.1.47 to 3.1.48.
- [Release notes](https://github.com/gitpython-developers/GitPython/releases)
- [Changelog](https://github.com/gitpython-developers/GitPython/blob/main/CHANGES)
- [Commits](https://github.com/gitpython-developers/GitPython/compare/3.1.47...3.1.48)

---
updated-dependencies:
- dependency-name: gitpython
  dependency-version: 3.1.48
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-28 09:06:46 +02:00
Stefan Agner b9922aebc8 Remove legacy core_proxy middleware (#6778)
The core_proxy middleware was Supervisor's side of the original
Core-to-Supervisor proxy auth scheme: when pre-2023.3.4 Home Assistant
Core forwarded a user-issued request to Supervisor under its
privileged supervisor token, it tagged the request with X-Hass-User-ID
and X-Hass-Is-Admin headers identifying the upstream user. Supervisor
inspected those headers, plus a header adjacency heuristic
fingerprinting aiohttp's proxy header layout, to distinguish forwarded
requests from native Core calls and reject proxied requests that
lacked user identity.

Core 2023.3.4 (PR home-assistant/core#89379) replaced that scheme:
Core now does the path-level gating itself before proxying and no
longer sends the X-Hass-* headers, so the middleware short-circuits
for any Core newer than that. With the 2-year Core support policy
introduced in #6148, every supported installation is well past
2023.3.4, making the middleware unreachable in practice.

Drop the middleware along with its now-unused supports: the
_CORE_VERSION constant, the supervisor_frontend pattern field (a
duplicate of the frontend asset list already exempted via
no_security_check), and the AwesomeVersion / LANDINGPAGE /
version_is_new_enough imports it relied on. The frontend asset bypass
itself is unchanged — it still lives in no_security_check.
2026-04-27 19:54:48 -04:00
Mike Degatano bc24fb5449 Refactor API registration to support v1/v2 via shared methods (#6769)
* Refactor API registration to support v1/v2 via shared methods

- Add AppVersion StrEnum (V1, V2) to supervisor/api/const.py
- Replace self.v2_app with self._v2_app and expose a versions property
  (dict[AppVersion, web.Application]) computed dynamically so that test
  fixtures reassigning self.webapp are automatically reflected in V1
- All _register_* methods now accept a required app: web.Application
  parameter; version-specific routes are gated with
  "if app is self.versions[AppVersion.V1/V2]:"
- load() loops over enabled_versions (V1 always, V2 when feature-flagged)
  and calls each registration method once per version, no duplication
- Static resources are registered before webapp.add_subapp() to avoid
  registering into a frozen router
- add_subapp uses self.webapp directly for readability
- Fold _register_v2_apps/_register_v2_backups/_register_v2_store into
  their respective unified methods; remove the now-defunct _register_v2_*
  helpers and the _api_apps/_api_backups/_api_store instance vars
- _register_proxy and _register_ingress updated to accept app; legacy
  /homeassistant/* proxy routes gated behind V1 conditional

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add dual v1/v2 parametrization to API tests

All 163 tests across 17 API modules that register identically on both
v1 and v2 now run against both versions via api_client_with_prefix.

- tests/api/conftest.py: advanced_logs_tester switched to
  api_client_with_prefix so log-endpoint tests are auto-parametrized;
  accepts optional v2_path_prefix kwarg for paths that differ by version
- tests/api/test_{auth,discovery,dns,docker,hardware,host,ingress,
  jobs,mounts,network,os,resolution,security,services,supervisor}.py:
  api_client -> api_client_with_prefix with path prefix unpacking
- supervisor/api/__init__.py: _register_panel() moved outside the
  version loop -- frontend static assets are V1-only
- tests/api/test_panel.py: kept on plain api_client (V1-only)

Tests intentionally kept V1-only:
- auth/discovery: use indirect api_client parametrize for addon context
- homeassistant: all tests call legacy /homeassistant/* paths (V1-only)
- jobs (4 tests): inner @Job-decorated classes register names into a
  module-level set; re-running the same test raises RuntimeError

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Extend dual v1/v2 parametrization to homeassistant and jobs tests

tests/api/conftest.py:
- Add core_api_client_with_root fixture parametrized over three paths:
    v1-core:   /core/...          (canonical v1 path)
    v1-legacy: /homeassistant/... (legacy v1 alias, same handlers)
    v2-core:   /v2/core/...       (canonical v2 path)

tests/api/test_homeassistant.py:
- Switch all 17 api_client tests to core_api_client_with_root so each
  test runs against all three access paths (v1 canonical, v1 legacy
  alias, v2 canonical), exercising every registered route

tests/api/test_jobs.py:
- Promote four inner TestClass definitions to module-level helpers
  (_JobsTreeTestHelper, _JobManualCleanupTestHelper,
  _JobsSortedTestHelper, _JobWithErrorTestHelper) so that @Job name
  registration into the global _JOB_NAMES set only happens once at
  import time rather than on each parametrized test run
- Replace closure references to outer-scope coresys with self.coresys
- Use api_client_with_prefix for dual-version coverage
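
Why promoting the helper classes to module level matters can be shown with a toy registry. The `job` decorator and `_JOB_NAMES` set below are simplified assumptions modelled on the description above, not Supervisor's actual `@Job` implementation.

```python
_JOB_NAMES: set[str] = set()  # hypothetical module-level registry


def job(name: str):
    """Toy decorator that rejects duplicate job names."""

    def decorator(func):
        if name in _JOB_NAMES:
            raise RuntimeError(f"Job name {name} already registered")
        _JOB_NAMES.add(name)
        return func

    return decorator


def define_inner_class():
    """Defining the class inside a function re-runs the decorator on each call."""

    class Helper:
        @job("test_jobs_tree")
        def run(self):
            return "ok"

    return Helper


define_inner_class()  # first parametrized run: fine
try:
    define_inner_class()  # second run: the same name is registered again
    second_run_error = None
except RuntimeError as err:
    second_run_error = str(err)


class ModuleLevelHelper:
    """Defined once at import time, so the name is registered exactly once."""

    @job("test_jobs_sorted")
    def run(self):
        return "ok"


# Reusing the module-level class across parametrized runs is safe.
results = [ModuleLevelHelper().run() for _ in range(2)]
```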

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix typo

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-04-27 23:39:47 +02:00
dependabot[bot] 287aee22e6 Bump cryptography from 46.0.7 to 47.0.0 (#6774)
Bumps [cryptography](https://github.com/pyca/cryptography) from 46.0.7 to 47.0.0.
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pyca/cryptography/compare/46.0.7...47.0.0)

---
updated-dependencies:
- dependency-name: cryptography
  dependency-version: 47.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-27 14:25:24 +02:00
dependabot[bot] 71c2200c59 Bump ruff from 0.15.11 to 0.15.12 (#6773)
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-27 12:47:46 +02:00
Stefan Agner 61ca2524b2 Return proper API errors for mqtt/mysql service conflicts (#6767)
* Return proper API errors for mqtt/mysql service conflicts

After #6739 added unexpected-error logging and Sentry capture to the
api_process wrappers, SUPERVISOR-1JTQ and SUPERVISOR-1JWM surfaced as
user-triggered service conflicts that were being treated as unexpected
errors:

- POST /services/{mqtt,mysql} when another app already provides the
  service.
- DELETE /services/{mqtt,mysql} when no app currently provides it.

Both paths raised a generic ServicesError, which the API layer turned
into an opaque HTTP 400 without a translation key, and which #6739 now
also logs and captures via Sentry.

Introduce ServiceAlreadyProvidedError (409 Conflict) and
ServiceNotProvidedError (404 Not Found) as new-style API exceptions with
translation keys and extra_fields, plus a shared APIConflict base class
for future 409 responses. The mqtt and mysql service modules now raise
these instead, so the API returns structured, translatable responses and
these expected user conflicts stop being captured as bugs.

Fixes SUPERVISOR-1JTQ
Fixes SUPERVISOR-1JWM

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Don't log handled errors verbosely

Missing or already-present service information are well-handled errors
with clear API responses. The client is supposed to handle these errors,
so there is no need to log them verbosely.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 21:56:12 +02:00
Stefan Agner 4938fb215d Improve Docker port-in-use detection and handling (#6766)
Triaging SUPERVISOR-1JWK turned up a missed port conflict:
RE_PORT_CONFLICT_ERROR only matched one of the Docker daemon's
port-in-use message shapes. The two variants produced by current moby
— "Bind for <ip>:<port> failed: port is already allocated" from
portallocator and "failed to bind host port <ip>:<port>/<proto>:
address already in use" from osallocator — fell through to
DockerAPIError, got re-raised as AppUnknownError, and the watchdog
shipped them to Sentry as unknown errors.

Widen the regex to match all known shapes (including the older form
embedding the container endpoint, still observed from older daemons
and wrappers), anchored on the "failed to set up container networking"
prefix and one of the "address already in use" or "port is already
allocated" suffixes. Log the raw Docker message at debug level before
converting, so curious users can still see the exact upstream text
(host IP, container endpoint, protocol) when investigating which
process is holding the port.
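
A hedged sketch of such a widened pattern. The regex below is written for illustration from the message shapes quoted above and is not the actual RE_PORT_CONFLICT_ERROR; the sample messages are constructed, not captured daemon output.

```python
import re

# Hypothetical pattern: anchored on the networking prefix and on either
# of the two port-in-use suffixes described in the commit message.
PORT_CONFLICT = re.compile(
    r"^failed to set up container networking.*"
    r"(?:address already in use|port is already allocated)$"
)


def is_port_conflict(message: str) -> bool:
    """Return True if a Docker error message looks like a port conflict."""
    return PORT_CONFLICT.search(message) is not None


# Constructed examples following the shapes named above.
portallocator_msg = (
    "failed to set up container networking: "
    "Bind for 0.0.0.0:8123 failed: port is already allocated"
)
osallocator_msg = (
    "failed to set up container networking: "
    "failed to bind host port 0.0.0.0:1883/tcp: address already in use"
)
unrelated_msg = "failed to set up container networking: network not found"
```

Anchoring on both ends keeps unrelated networking failures from being classified as port conflicts.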

The watchdog's _restart_after_problem now catches AppPortConflict
explicitly ahead of the generic AppsError handler: log a warning,
break the retry loop, do not call async_capture_exception. A port
conflict is an environment condition — another process grabbed the
port while the add-on was down — so retrying cannot make it succeed
and reporting to Sentry is noise.

With port conflicts now raised as typed APIError subclasses at the
detection site, the DockerAPIError → format_message() rewrite fallback
in api_return_error has no work left. Drop the fallback and delete
supervisor/utils/log_format.py along with its tests; the module only
ever handled port-conflict prose.

Fixes SUPERVISOR-1JWK

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 21:55:18 +02:00
Stefan Agner 2011633946 Reuse check_exception_chain in Sentry filter, tighten its types (#6757)
Switch the Sentry noise filter in filter_data to call the existing
check_exception_chain helper instead of an inline loop. One shared
utility for "does the chain contain this type" matches what the
reviewer suggested and removes a bit of duplication.

While touching check_exception_chain:

- Walk __cause__ instead of __context__. __cause__ is what Python sets
  when code uses `raise B() from a`, which is the explicit "caused by"
  signal we actually want to match. __context__ can also include
  unrelated in-flight exceptions from surrounding except blocks.
  Every existing call site in Supervisor uses `raise X from err`, which
  sets both attributes, so switching is behaviour-preserving for all
  current callers.
- Replace the `Any` type of object_type with
  `type[BaseException] | tuple[type[BaseException], ...]`, which is
  what isinstance/issubclass actually accept and lets mypy catch
  misuse at the call site.
- Replace `issubclass(type(err), object_type)` with `isinstance`, which
  is the idiomatic form and honours virtual subclasses.

Review feedback from #6732.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 21:42:41 +02:00
Stefan Agner ff90e4b817 Fix UnboundLocalError when Core API fails post-update (#6761)
When get_config() raised HomeAssistantError after a Core update, the
except block set error_state and fell through to the frontend check,
which referenced an unbound `data` variable and raised UnboundLocalError.
That aborted the update with a JobException and skipped the rollback
path entirely.

Move the frontend checks into an else branch of the try/except so they
only run when get_config() succeeds. When it fails, error_state is set
and control falls through to the rollback logic below, which is what
PR #6726 intended.
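
The shape of the fix, reduced to a self-contained sketch. `HomeAssistantError` is a stand-in class, and the frontend check on a `components` key is an assumption made for illustration; only the try/except/else structure is the point.

```python
class HomeAssistantError(Exception):
    """Stand-in for Supervisor's HomeAssistantError."""


def needs_rollback(get_config) -> bool:
    """Return True when the post-update checks fail (simplified flow)."""
    error_state = False
    try:
        data = get_config()
    except HomeAssistantError:
        # `data` is never bound on this path, so it must not be read later.
        error_state = True
    else:
        # Runs only when get_config() succeeded, so `data` is bound here.
        if "frontend" not in data.get("components", []):
            error_state = True
    return error_state


def failing_config():
    raise HomeAssistantError("Core API not responding")


api_down = needs_rollback(failing_config)
healthy = needs_rollback(lambda: {"components": ["frontend", "api"]})
no_frontend = needs_rollback(lambda: {"components": ["api"]})
```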

Fixes SUPERVISOR-1JVX

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026.04.1
2026-04-23 15:05:40 +02:00
Copilot 91625db2b1 Stop Supervisor from overriding NetworkManager Wi-Fi powersave policy (#6753)
* Initial plan

* Fix wireless profile generation to not force powersave ignore

Agent-Logs-Url: https://github.com/home-assistant/supervisor/sessions/6e2e9288-6d9b-403d-9d71-8d6ea44eb91b

Co-authored-by: agners <34061+agners@users.noreply.github.com>

* Set wireless powersave to default to reset existing profiles

Agent-Logs-Url: https://github.com/home-assistant/supervisor/sessions/4a4a2c09-0cdd-4417-9776-688837b51dcc

Co-authored-by: agners <34061+agners@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: agners <34061+agners@users.noreply.github.com>
2026-04-23 10:06:49 +02:00
Stefan Agner 814bcc447d Run Update version job only if version is published (#6758)
2026-04-22 10:38:34 +02:00
dependabot[bot] 9203c09f53 Bump mypy from 1.20.1 to 1.20.2 (#6756)
Bumps [mypy](https://github.com/python/mypy) from 1.20.1 to 1.20.2.
- [Changelog](https://github.com/python/mypy/blob/master/CHANGELOG.md)
- [Commits](https://github.com/python/mypy/compare/v1.20.1...v1.20.2)

---
updated-dependencies:
- dependency-name: mypy
  dependency-version: 1.20.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-22 09:11:11 +02:00
dependabot[bot] b791e97d0a Bump pre-commit from 4.5.1 to 4.6.0 (#6755)
Bumps [pre-commit](https://github.com/pre-commit/pre-commit) from 4.5.1 to 4.6.0.
- [Release notes](https://github.com/pre-commit/pre-commit/releases)
- [Changelog](https://github.com/pre-commit/pre-commit/blob/main/CHANGELOG.md)
- [Commits](https://github.com/pre-commit/pre-commit/compare/v4.5.1...v4.6.0)

---
updated-dependencies:
- dependency-name: pre-commit
  dependency-version: 4.6.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-22 08:56:39 +02:00
dependabot[bot] a6792f78d4 Bump gitpython from 3.1.46 to 3.1.47 (#6754)
Bumps [gitpython](https://github.com/gitpython-developers/GitPython) from 3.1.46 to 3.1.47.
- [Release notes](https://github.com/gitpython-developers/GitPython/releases)
- [Changelog](https://github.com/gitpython-developers/GitPython/blob/main/CHANGES)
- [Commits](https://github.com/gitpython-developers/GitPython/compare/3.1.46...3.1.47)

---
updated-dependencies:
- dependency-name: gitpython
  dependency-version: 3.1.47
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-22 08:44:54 +02:00
Stefan Agner 97bc19d4b3 Detect container registry rate limits uniformly (#6732)
* Detect container registry rate limits uniformly

Container registry rate limits reach Supervisor in three distinct shapes:

  1. HTTP 429 from the daemon - recognised today, but the exception and
     resolution issue are hardcoded to Docker Hub. Since Core/Supervisor/
     plugin images all live on ghcr.io now, virtually every 429 we see in
     the field is actually a GHCR throttle that we mislabel. The biggest
     Sentry issue (SUPERVISOR-16BK) has >115k events / >93k users, all
     pulling a ghcr.io image, yet each user is told to "log into
     Docker Hub".
  2. HTTP 500 with 'toomanyrequests' in the body - not recognised. Docker
     daemons before 28.3.0 wrap upstream 429s as 500 (fixed upstream by
     moby/moby 23fa0ae74a, "Cleanup http status error checks"). The large
     fleet on older daemons still produces this shape.
  3. JSON error event during a streaming pull - not recognised. Once the
     daemon starts writing the 200 OK response body the status is locked
     in, so rate limits that land during layer download arrive as plain
     text in the pull stream. Happens on all recent daemon versions -
     SUPERVISOR-13FQ (>16k events) and SUPERVISOR-13E0 (>8k events) are
     two large examples.

Cases 2 and 3 propagate as plain DockerError, bypass the 429 detection in
install() entirely, never produce a DOCKER_RATELIMIT resolution issue, and
generate large amounts of Sentry noise. Case 1 is detected but routes
every GHCR 429 through Docker-Hub-specific messaging and suggestions.

Changes:

- Add DockerRegistryRateLimitExceeded as the common base class and
  GithubContainerRegistryRateLimitExceeded alongside the existing
  DockerHubRateLimitExceeded. All extend APITooManyRequests so callers
  and retry logic can key off a single type.
- Add GITHUB_RATELIMIT IssueType so GHCR failures don't show the
  "log in to Docker Hub" suggestion that DOCKER_RATELIMIT carries.
- PullLogEntry.exception now maps stream errors containing
  'toomanyrequests' to DockerRegistryRateLimitExceeded (case 3).
- docker/interface.py:install() routes all three cases through a single
  _registry_rate_limit_exception() helper that picks the right issue
  type, suggestion and exception subclass based on the image's registry.
- utils/sentry.py filters APITooManyRequests (and anything wrapping it
  via __cause__) in capture_exception / async_capture_exception. One
  point of policy, every caller benefits.

Callers (supervisor.update(), plugin manager, homeassistant core) are
unchanged - UPDATE_FAILED issues still get created alongside the
registry-specific rate limit issue, giving users the full picture.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Consolidate Sentry noise filtering in one before_send hook

Move the APITooManyRequests filter from capture_exception /
async_capture_exception wrappers into the existing filter_data
before_send hook in supervisor/misc/filter.py, alongside the
AddonConfigurationError filter.

One isinstance tuple check instead of multiple layers, and every path
that reaches Sentry (including logging-integration and excepthook
captures, not just our explicit wrappers) now gets the same treatment.
The filter walks the __cause__ chain so wrapped rate-limit errors
(e.g. DockerHubRateLimitExceeded inside SupervisorUpdateError) still
get filtered. A debug log is emitted on each dropped event for
observability.

Review feedback from mdegat01 on #6732.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Drop GITHUB_RATELIMIT resolution issue

There is no actionable remediation for a GHCR rate limit - logging in
doesn't lift the quota the way it does for Docker Hub, and the cap is
on the authenticated account anyway. A resolution issue that just tells
the user "you were rate limited" adds UI noise without helping them.

Keep the GithubContainerRegistryRateLimitExceeded exception - retry
logic and the Sentry filter still key off it - but don't create a
resolution issue. A log entry from the exception constructor is
sufficient. Docker Hub still gets DOCKER_RATELIMIT + registry-login
suggestion since that is actionable.

Review feedback on #6732.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-22 07:49:01 +02:00
Stefan Agner 53f84ec15b Bump devcontainer to 6 (#6747)
With devcontainer 6 dbus-daemon is installed in the container, which
is required for tests. The latest version also has support to disable
AppArmor using the `SUPERVISOR_UNCONFINED` environment variable.
2026-04-21 16:52:28 +02:00
Stefan Agner d431526b14 Fix unhandled WebSocket handshake errors and unnecessary token refresh (#6725)
Raise HomeAssistantWSConnectionError instead of HomeAssistantAPIError
for WebSocket handshake failures. The broader HomeAssistantAPIError was
not caught by the fire-and-forget send path which only catches
HomeAssistantWSError, resulting in "Task exception was never retrieved"
errors when Core's WebSocket endpoint isn't ready.

Additionally, narrow the retry catch in connect_websocket from
HomeAssistantAPIError to HomeAssistantAuthError. The broad catch caused
connection errors (not auth failures) to trigger unnecessary token
refreshes and retries, spamming "Updated Home Assistant API token" logs.

Also raise the log level for failed fire-and-forget WebSocket commands
from debug to warning for better visibility.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-21 16:33:36 +02:00
Stefan Agner ff2cdbfc36 Log unexpected errors in api_process wrappers (#6739)
* Log unexpected errors in api_process wrappers

The `api_process` and `api_process_raw` decorators silently swallowed
any `HassioError` that bubbled up from endpoint handlers, returning
`"Unknown error, see Supervisor logs"` to the caller while logging
nothing. This made the response message actively misleading: e.g. when
an endpoint touching D-Bus hit `DBusNotConnectedError` (raised without
a message by `@dbus_connected`), Core would surface
`SupervisorBadRequestError: Unknown error, see Supervisor logs` and
the Supervisor logs would contain no trace of it.

Log the caught `HassioError` with traceback before delegating to
`api_return_error` so the "see Supervisor logs" hint is actually
actionable. The `APIError` branch is left alone — those carry explicit
status codes and messages set by Supervisor code and are already
visible in the response.
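
The change can be pictured with a toy decorator. This is not the real `api_process` (which builds aiohttp responses); the exception classes are stand-ins and the dict return values are an assumption made to keep the sketch self-contained.

```python
import asyncio
import functools
import logging

_LOGGER = logging.getLogger(__name__)


class HassioError(Exception):
    """Stand-in for Supervisor's base exception."""


class APIError(HassioError):
    """Stand-in for errors that carry an explicit message for the caller."""


def api_process(handler):
    """Toy wrapper: log unexpected HassioError before returning the generic text."""

    @functools.wraps(handler)
    async def wrapper(*args, **kwargs):
        try:
            return {"result": "ok", "data": await handler(*args, **kwargs)}
        except APIError as err:
            # Already carries a message set by the endpoint; no extra logging.
            return {"result": "error", "message": str(err)}
        except HassioError as err:
            # Log with traceback so "see Supervisor logs" is actionable.
            _LOGGER.exception("Unexpected error during API call: %s", err)
            return {"result": "error", "message": "Unknown error, see Supervisor logs"}

    return wrapper


@api_process
async def broken_endpoint():
    raise HassioError()  # raised without a message, as in the D-Bus example


@api_process
async def rejected_endpoint():
    raise APIError("Invalid option")


unexpected = asyncio.run(broken_endpoint())
expected = asyncio.run(rejected_endpoint())
```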

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Capture unexpected API errors to Sentry

Non-APIError HassioError exceptions reaching api_process indicate
missing error handling in the endpoint handler. In addition to the
logging added in the previous commit, also send these to Sentry so
they surface as actionable issues rather than silently returning
"Unknown error, see Supervisor logs" to the caller.

* Drop capture exception from set_boot_slot

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-21 16:04:39 +02:00
Stefan Agner 7fb621234e Add Unix socket support for Core communication with feature flag (#6742)
* Use Unix socket for Supervisor to Core communication

Reintroduce Unix socket support for Supervisor-to-Core communication
(reverted in #6735) with the addition of a feature flag gate. The
feature is now controlled by the `core_unix_socket` feature flag and
disabled by default.

When enabled and Core version supports it, Supervisor communicates with
Core via a Unix socket at /run/os/core.sock instead of TCP. This
eliminates the need for access token authentication on the socket path,
as Core authenticates the peer by the socket connection itself.

Key changes:
- Add FeatureFlag.CORE_UNIX_SOCKET to gate the feature
- HomeAssistantAPI: transport-aware session/url/websocket management
- WSClient: separate connect() (Unix, no auth) and connect_with_auth()
  (TCP) class methods with proper error handling
- APIProxy delegates websocket setup to api.connect_websocket()
- Container state tracking for Unix session lifecycle
- CI builder mounts /run/supervisor for integration tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Sort feature flags alphabetically

* Drop per-call max_msg_size from WSClient

Hardcode the WebSocket message size cap to 64 MB in WSClient and remove
the parameter from WSClient.connect, connect_with_auth, _ws_connect,
and HomeAssistantAPI.connect_websocket. This was only ever overridden
by APIProxy, so threading it through four layers was unnecessary.

max_msg_size is a cap, not a pre-allocation; aiohttp only grows buffers
to the size of actual incoming messages. Supervisor's own control
channel never approaches 64 MB, so unifying the limit has no runtime
cost.

Addresses review feedback on #6742.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-21 15:03:05 +02:00
Mike Degatano 56abe94d74 Add versioned v2 API with apps terminology (#6741)
* Add versioned v2 API with apps terminology

Introduce a v2 API sub-app mounted at /v2 that uses 'apps' terminology
throughout, while keeping v1 fully backward-compatible.

Key changes:
- Add ATTR_ADDONS = 'addons' constant alongside ATTR_APPS = 'apps' so
  backup file data (which must remain 'addons' for backward compat) and
  v2 API responses can use distinct constants
- Add FeatureFlag.SUPERVISOR_V2_API to gate v2 route registration
- Mount aiohttp sub-app at /v2 in RestAPI.load() when flag is enabled
- Add _AppSecurityPatterns frozen dataclass and _V1_PATTERNS/_V2_PATTERNS
  with strict per-version regex sets (no cross-version matching)
- Add _register_v2_apps, _register_v2_backups, _register_v2_store route
  registration methods
- Add v1 thin wrapper methods (*_v1) for all affected endpoints so
  business logic lives in the canonical v2 methods
- Extract _info_data() helper in APIApps so v1 closure can bypass
  @api_process and still catch APIAppNotInstalled for store routing
- Add _rename_apps_to_addons_in_backups(), _process_location_in_body(),
  _all_store_apps_info() shared helpers to eliminate duplication
- Add api_client_v2, api_client_with_prefix, app_api_client_with_root,
  store_app_api_client_with_root parameterized test fixtures
- Add test_v2_api_disabled_without_feature_flag
- Parameterize backup, addons, and store tests to cover both v1 and v2
  paths

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix pylint false positive for re.Pattern C extension methods

re.Pattern methods (match, search, etc.) are C extension methods.
Pylint cannot detect them via static analysis when re.Pattern is used
as a type annotation in a dataclass field, producing false E1101
no-member errors. Add generated-members to inform pylint these members
exist.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* pylint and feedback fixes

* Copilot suggested fixes

* Minor feedback fixes

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-20 21:19:27 +02:00
Stefan Agner 38ddb3df54 Fix Core update rollback: delay image cleanup and fix missing rollback path (#6726)
* Delay old image cleanup until after health checks on Core update

Move the old Docker image cleanup from inside _update() to after the
post-update health checks (frontend loaded and accessible). This keeps
the previous version's image available locally when a rollback is
needed, avoiding a potentially slow re-download.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add test assertions for old image cleanup timing on Core update

Verify that the old Docker image is cleaned up only after health checks
pass, and not when a rollback is triggered.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix missing rollback when get_config fails after Core update

The early return after setting error_state skipped the rollback block,
leaving the system on a broken new version when the API stopped
responding after update. The other health check failure paths correctly
fall through to the rollback logic; this was the only one that didn't.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 10:57:13 +02:00