* Remove CLI command hint from unknown error messages
Since #6303 introduced specific error messages for many cases,
the generic "check with 'ha supervisor logs'" hint in unknown
error messages is no longer as useful. Remove the CLI command
part while keeping the "Check supervisor logs for details" rider.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Use "Supervisor logs" capitalization consistently
Co-authored-by: Jan Čermák <sairon@users.noreply.github.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Jan Čermák <sairon@users.noreply.github.com>
* Add specific error message for registry authentication failures
When a Docker image pull fails with 401 Unauthorized and registry
credentials are configured, raise DockerRegistryAuthError instead of
a generic DockerError. This surfaces a clear message to the user
("Docker registry authentication failed for <registry>. Check your
registry credentials") instead of "An unknown error occurred with
addon <name>".
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
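A minimal sketch of the pattern this commit describes, assuming aiodocker's
DockerError carries the HTTP status; the exception names match the commit,
but the pull plumbing is illustrative:

```python
from aiodocker.exceptions import DockerError as AioDockerError


class DockerError(Exception):
    """Generic Docker error (Supervisor-side)."""


class DockerRegistryAuthError(DockerError):
    """Registry authentication failed."""


async def pull_image(docker, image: str, registry: str, has_credentials: bool):
    """Pull an image, surfacing 401s as a specific auth error."""
    try:
        await docker.images.pull(image)
    except AioDockerError as err:
        if err.status == 401 and has_credentials:
            raise DockerRegistryAuthError(
                f"Docker registry authentication failed for {registry}. "
                "Check your registry credentials"
            ) from err
        raise DockerError(str(err)) from err
```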
* Add tests for registry authentication error handling
Test that a 401 during image pull raises DockerRegistryAuthError when
credentials are configured, and falls back to generic DockerError
when no credentials are present.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Add tests for addon install/update/rebuild auth failure handling
Test that DockerRegistryAuthError propagates correctly through
addon install, update, and rebuild paths without being wrapped
in a generic AddonUnknownError.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Include Docker registry configurations in backups
Docker registry credentials were removed from backup metadata in a prior
change to avoid exposing secrets in unencrypted data. Now that the encrypted
supervisor.tar inner archive exists, add docker.json alongside mounts.json
to securely backup and restore registry configurations.
On restore, registries from the backup are merged with any existing ones.
Old backups without docker.json are handled gracefully.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Increase test coverage by testing more error paths
* Address review feedback for Docker registry backup
Remove unnecessary dict() copy when serializing registries for backup
since the property already returns a dict.
Change DockerConfig.registries to use direct key access instead of
.get() with a default. The schema guarantees the key exists, and
.get() with a default would return a detached temporary dict that
silently discards updates.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
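To illustrate the .get() pitfall mentioned above, a reduced sketch (a generic
config holder, not Supervisor's actual DockerConfig class):

```python
class DockerConfig:
    def __init__(self, data: dict):
        self._data = data  # the schema guarantees "registries" exists

    @property
    def registries(self) -> dict:
        # With .get("registries", {}) a missing key would return a fresh,
        # detached dict, so writes to it would be silently discarded.
        # Direct access returns the live dict backed by the stored config.
        return self._data["registries"]


config = DockerConfig({"registries": {}})
config.registries["ghcr.io"] = {"username": "me", "password": "secret"}
assert config._data["registries"]["ghcr.io"]["username"] == "me"
```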
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Reuse IMAGE_REGISTRY_REGEX for docker_image validation
Replace the monolithic regex in docker_image validator with a
function-based approach that reuses get_registry_from_image() from
docker.utils for robust registry detection. This properly handles
domains, IPv4/IPv6 addresses, ports, and localhost while still
rejecting tags (managed separately by the add-on system).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Address review feedback: reorder checks for efficiency
Check falsy value before isinstance, and empty path before tag check,
as suggested in PR review.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
aiodocker derives ServerAddress for X-Registry-Auth by doing
image.partition("/"). For Docker Hub images like
"homeassistant/amd64-supervisor", this extracts "homeassistant"
(the namespace) instead of "docker.io" (the registry).
With the classic graphdriver image store, ServerAddress was never
checked and credentials were sent regardless. With the containerd
image store (default since Docker v29 / HAOS 15), the resolver
compares ServerAddress against the actual registry host and silently
drops credentials on mismatch, falling back to anonymous access.
Fix by prefixing Docker Hub images with "docker.io/" when registry
credentials are configured, so aiodocker sets ServerAddress correctly.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
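A sketch of the normalization this commit describes; the helper name and the
host heuristic are illustrative, while the partition behavior matches
aiodocker's:

```python
def normalize_hub_image(image: str, has_credentials: bool) -> str:
    """Prefix Docker Hub images so aiodocker derives the right ServerAddress."""
    first, _, _ = image.partition("/")
    # A first component containing a dot or colon (or "localhost") is a
    # registry host; anything else is a Docker Hub namespace.
    is_hub_image = "." not in first and ":" not in first and first != "localhost"
    if has_credentials and is_hub_image:
        return f"docker.io/{image}"
    return image


# aiodocker's image.partition("/") would otherwise treat "homeassistant"
# (the namespace) as the registry address instead of "docker.io".
assert (
    normalize_hub_image("homeassistant/amd64-supervisor", True)
    == "docker.io/homeassistant/amd64-supervisor"
)
```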
* Added support for negative numbers in options
* Do not allow -. as float
* Added tests for integers and floats in options.
* Fixed ruff errors
* Added tests for outside of int/float limits
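A hedged sketch of the kind of pattern involved (the actual option-schema
validation in Supervisor may differ): negatives are accepted, but a bare
"-." is not a float:

```python
import re

_INT_PATTERN = re.compile(r"^-?\d+$")
# Require at least one digit so "-." alone never matches.
_FLOAT_PATTERN = re.compile(r"^-?(\d+\.?\d*|\.\d+)$")

assert _INT_PATTERN.match("-42")
assert _FLOAT_PATTERN.match("-3.14")
assert _FLOAT_PATTERN.match("-.5")
assert not _FLOAT_PATTERN.match("-.")
```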
* Include network storage mount configurations in backups
When creating backups, Supervisor now stores mount configurations (CIFS/NFS shares)
including server, share, credentials, and other settings. When restoring
from a backup, mount configurations are automatically restored.
This fixes the issue where network storage definitions for backups
were lost after restoring from a backup, requiring users to manually
reconfigure their network storage mounts.
Changes:
- Add ATTR_MOUNTS schema to backup validation
- Add store_mounts() method to save mount configs during backup
- Add restore_mounts() method to restore mount configs during restore
- Add MOUNTS stage to backup/restore job stages
- Update BackupManager to call mount backup/restore methods
- Add tests for mount backup/restore functionality
Fixes home-assistant/core#148663
* Address reviewer feedback for mount backup/restore
Changes based on PR review:
- Store mount configs in encrypted mounts.tar instead of unencrypted
backup metadata (security fix for passwords)
- Separate mount restore into config save + async activation tasks
(mounts activate in background, failures don't block restore)
- Add replace_default_backup_mount parameter to control whether to
overwrite existing default mount setting
- Remove unnecessary broad exception handler for default mount setter
- Simplify schema: ATTR_MOUNTS is now just a boolean flag since
actual data is in the encrypted tar file
- Update tests to reflect new async API and return types
* Fix code review issues in mount backup/restore
- Add bind mount handling for MEDIA and SHARE usage types in
_activate_restored_mount() to mirror MountManager.create_mount()
- Fix double save_data() call by using needs_save flag
- Import MountUsage const for usage type checks
* Add pylint disable comments for protected member access
* Tighten broad exception handlers in mount backup restore
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Address second round of reviewer feedback
- Catch OSError separately and check errno.EBADMSG for drive health
- Validate mounts JSON against SCHEMA_MOUNTS_CONFIG before importing
- Use mount_data[ATTR_NAME] instead of .get("name", "unknown")
- Overwrite existing mounts on restore instead of skipping
- Move restore_mount/activate logic to MountManager (no more
protected-access in Backup)
- Drop unused replace_default_backup_mount parameter
- Fix test_backup_progress: add mounts stage to expected events
- Fix test_store_mounts: avoid create_mount which requires dbus
* Rename mounts.tar to supervisor.tar for generic supervisor config
Rename the inner tar from mounts.tar to supervisor.tar so it can hold
multiple config files (mounts.json now, docker credentials later).
Rename store_mounts/restore_mounts to store_supervisor_config/
restore_supervisor_config and update stage names accordingly.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
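The shape of the nested archive, sketched with plain tarfile for brevity;
Supervisor uses securetar here so supervisor.tar is encrypted along with the
rest of the backup:

```python
import io
import json
import tarfile


def add_supervisor_config(backup_tar: tarfile.TarFile, mounts: list[dict]) -> None:
    """Write mounts.json into a nested supervisor.tar member."""
    inner = io.BytesIO()
    with tarfile.open(fileobj=inner, mode="w") as supervisor_tar:
        payload = json.dumps({"mounts": mounts}).encode()
        info = tarfile.TarInfo("mounts.json")
        info.size = len(payload)
        supervisor_tar.addfile(info, io.BytesIO(payload))
    inner.seek(0)
    member = tarfile.TarInfo("supervisor.tar")
    member.size = inner.getbuffer().nbytes
    backup_tar.addfile(member, inner)
```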
* Fix pylint protected-access and test timeouts in backup tests
- Add pylint disable comment for _mounts protected access in test_backup.py
- Mock restore_supervisor_config in test_full_backup_to_mount and
test_partial_backup_to_mount to avoid D-Bus mount activation during
restore that causes timeouts in the test environment
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Address agners review feedback
- Change create_inner_tar() to create_tar() per #6575
- Remove "r" argument from SecureTarFile (now read-only by default)
- Change warning to info for missing mounts tar (old backups won't have it)
- Narrow exception handler to (MountError, vol.Invalid, KeyError, OSError)
* Update supervisor/backups/backup.py
* Address agners feedback: remove metadata flag, add mount feature check
- Remove ATTR_SUPERVISOR boolean flag from backup metadata; instead
check for physical presence of supervisor.tar (like folder backups)
- Remove has_supervisor_config property
- Always attempt supervisor config restore (tar existence check handles it)
- Add HostFeature.MOUNT check in _activate_restored_mount before
attempting to activate mounts on systems without mount support
---------
Co-authored-by: Stefan Agner <stefan@agner.ch>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Mike Degatano <michael.degatano@gmail.com>
systemd only emits bus signals (including PropertiesChanged) when at
least one client has called Subscribe() on the Manager interface. On
regular HAOS systems, systemd-logind calls Subscribe which enables
signals for all bus clients. However, in environments without
systemd-logind (such as the Supervisor devcontainer with systemd), no
signals are emitted, causing the firewall unit wait to time out.
Explicitly calling Subscribe() has no downsides and makes it clear
that the Supervisor relies on these signals. There is no need to call
Unsubscribe() as systemd automatically tracks clients and stops
emitting signals when all subscribers have disconnected from the bus.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
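A sketch of the call, assuming dbus-fast (which Supervisor uses for D-Bus);
error handling omitted:

```python
from dbus_fast import BusType
from dbus_fast.aio import MessageBus


async def subscribe_to_systemd_signals() -> None:
    bus = await MessageBus(bus_type=BusType.SYSTEM).connect()
    introspection = await bus.introspect(
        "org.freedesktop.systemd1", "/org/freedesktop/systemd1"
    )
    proxy = bus.get_proxy_object(
        "org.freedesktop.systemd1", "/org/freedesktop/systemd1", introspection
    )
    manager = proxy.get_interface("org.freedesktop.systemd1.Manager")
    # Without this, systemd emits no PropertiesChanged signals unless some
    # other client (normally systemd-logind) has already subscribed.
    await manager.call_subscribe()
```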
Add iptables rules via a systemd transient unit to drop traffic
addressed to the bridge gateway IP from non-bridge interfaces.
The firewall manager waits for the transient unit to complete and
verifies success via D-Bus property change signals. On failure, the
system is marked unhealthy and host-network add-ons are prevented
from booting.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
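The rule being installed, sketched as an argument list; the bridge name and
gateway IP are the usual hassio defaults (assumptions here) and the
transient-unit plumbing is elided:

```python
BRIDGE_INTERFACE = "hassio"     # Supervisor's Docker bridge (assumed)
BRIDGE_GATEWAY = "172.30.32.1"  # default bridge gateway IP (assumed)


def firewall_rule_args() -> list[str]:
    """iptables arguments: drop traffic to the bridge gateway IP that
    arrives on any interface other than the bridge itself."""
    return [
        "iptables",
        "-I", "INPUT",
        "!", "-i", BRIDGE_INTERFACE,
        "-d", BRIDGE_GATEWAY,
        "-j", "DROP",
    ]
```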
* Wait for addon startup task before unload to prevent data access race
Replace the cancel-based approach in unload() with an await of the outer
_wait_for_startup_task. The container removal and state change resolve the
startup event naturally, so we just need to ensure the task completes
before addon data is removed. This prevents a KeyError on self.name access
when _wait_for_startup times out after data has been removed.
Also simplify _wait_for_startup by removing the unnecessary inner task
wrapper — asyncio.wait_for can await the event directly.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Drop asyncio.sleep() in test_manager.py
* Only clear startup task reference if still the current task
Prevent a race where an older _wait_for_startup task's finally block
could wipe the reference to a newer task, causing unload() to skip
the await.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
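The identity check, sketched with illustrative attribute names:

```python
import asyncio


class Addon:
    def __init__(self) -> None:
        self._startup_task: asyncio.Task | None = None
        self._startup_event = asyncio.Event()

    async def _wait_for_startup(self) -> None:
        try:
            await asyncio.wait_for(self._startup_event.wait(), timeout=120)
        finally:
            # Only drop the reference if it still points at *this* task;
            # otherwise a stale task's cleanup could wipe a newer task's
            # reference and unload() would skip awaiting it.
            if self._startup_task is asyncio.current_task():
                self._startup_task = None
```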
* Reuse existing pending startup wait task when addon is already running
If start() is called while the addon is already running and a startup
wait task is still pending, return the existing task instead of creating
a new one.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* Fix fallback time sync, create repair issue if time is out of sync
The "poor man's NTP" using the whois service didn't work because it attempted
to sync the time when the NTP service was enabled, which is rejected by the
timedated service. To fix this, Supervisor now first disables the
systemd-timesyncd service and creates a repair issue before adjusting the time.
The timesyncd service stays disabled until the fixup is submitted. Theoretically,
if the time moves backwards from an invalid time in the future,
systemd-timesyncd could otherwise restore the wrong time from its saved
timestamp if it were only disabled after the time was set.
Also, the sync is now performed if the time is more than 1 hour off, in
both directions (previously it only intervened if the time was more than
3 days in the past).
Fixes #6015, refs #6549
* Update test_adjust_system_datetime_if_time_behind
The core_security check (HA < 2021.1.5 with custom components) and the
ResolutionNotify class that created persistent notifications for it are
no longer needed. The minimum supported HA version is well past 2021.1.5,
so this check can never trigger. The notify module was the only consumer
of persistent notifications and had no other users.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* Treat empty string password as None in backup restore
Work around a securetar 2026.2.0 bug where an empty string password
sets encrypted=True but fails to derive a key, leading to an
AttributeError on restore. This also restores consistency with backup
creation which uses a truthiness check to skip encryption for empty
passwords.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
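The normalization in a nutshell (set_password matches Supervisor's method
name; the body is reduced to the relevant check):

```python
class Backup:
    """Sketch: only the password handling from set_password."""

    def set_password(self, password: str | None) -> None:
        # securetar 2026.2.0 marks "" as encrypted but cannot derive a
        # key from it, so treat an empty string as no password, matching
        # the truthiness check already used on backup creation.
        self._password = password or None
```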
* Explicitly mention that "" is treated as no password
* Add tests for empty string password handling in backups
Verify that empty string password is treated as no password on both
backup creation (not marked as protected) and restore (normalized to
None in set_password).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Improve comment
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* Use Python 3.14(.3) in CI and base image
Update base image to the latest tag using Python 3.14.3 and update Python
version in CI workflows to 3.14.
With Python 3.14, backports.zstd is no longer necessary as it's now available
in the standard library.
* Update wheels ABI in the wheels builder to cp314
* Use explicit Python fix version in GH actions
Specify Python 3.14.3 explicitly, as the setup-python action otherwise
defaults to 3.14.2 even when 3.14.3 is available, leading to different
versions in CI and in production.
* Update Python version references in pyproject.toml
* Fix all ruff quoted-annotation (UP037) errors
* Revert unquoting of DBus types in tests and ignore UP037 where needed
* Respect auto-update setting for plug-in auto-updates
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Also skip auto-updating plug-ins in decorator
* Raise if auto-update flag is not set and plug-in is not up to date
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
aiohttp's BasicAuth.decode() raises ValueError for any non-Basic auth
method (e.g. Bearer tokens). This propagated as an unhandled exception,
causing a 500 response instead of the expected 401 Unauthorized.
Catch the ValueError in _process_basic() and raise HTTPUnauthorized with
the WWW-Authenticate realm header so clients get a proper 401 response.
Fixes SUPERVISOR-BFG
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
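A sketch of the handler change, assuming aiohttp's server API; the realm
value is illustrative:

```python
from aiohttp import BasicAuth, hdrs, web


def _process_basic(request: web.Request) -> BasicAuth:
    try:
        return BasicAuth.decode(request.headers[hdrs.AUTHORIZATION])
    except ValueError:
        # Non-Basic schemes (e.g. "Bearer <token>") raise ValueError;
        # answer with a proper 401 instead of bubbling up as a 500.
        raise web.HTTPUnauthorized(
            headers={hdrs.WWW_AUTHENTICATE: 'Basic realm="Home Assistant"'}
        ) from None
```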
The _migrate function in addons/validate.py is the first validator in the
SCHEMA_ADDON_CONFIG All() chain and was called directly with raw config data.
If a malformed add-on config file contained a non-dict value (e.g. a string),
config.get() would raise an AttributeError instead of a proper voluptuous
Invalid error, causing an unhandled exception.
Add an isinstance check at the top of _migrate to raise vol.Invalid for
non-dict inputs, letting validation fail gracefully.
Fixes SUPERVISOR-HMP
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
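The guard, sketched; _migrate sits first in the vol.All() chain, so it sees
raw, unvalidated input:

```python
import voluptuous as vol


def _migrate(config):
    if not isinstance(config, dict):
        raise vol.Invalid("Add-on config must be a dictionary")
    # ... actual migration of legacy keys continues here ...
    return config
```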
* Drop unsupported architectures and machines from Supervisor
Since #5620 Supervisor no longer updates the version information on
unsupported architectures and machines. This means users can no longer
update to a newer version of Supervisor since that PR was released.
Furthermore since #6347 we also no longer build for these
architectures. With this, any code related to these architectures
becomes dead code and should be removed.
This commit removes all references to the deprecated architectures and
machines from Supervisor.
This affects the following architectures:
- armhf
- armv7
- i386
And the following machines:
- odroid-xu
- qemuarm
- qemux86
- raspberrypi
- raspberrypi2
- raspberrypi3
- raspberrypi4
- tinker
* Create issue if an add-on using a deprecated architecture is installed
This adds a check to the resolution system to detect if an add-on is
installed that uses a deprecated architecture. If so, it will show a
warning to the user and recommend uninstalling the add-on.
* Formally deprecate machine add-on configs as well
Not only deprecate add-on configs for unsupported architectures, but
also for unsupported machines.
* For installed add-ons architecture must always exist
Fail hard in case of missing architecture, as this is a required field
for installed add-ons. This will prevent the Supervisor from running
with an unsupported configuration and causing further issues down the
line.
* Fix add-on build using wrong architecture for non-native arch add-ons
When building a locally-built add-on (no image tag), the architecture
was always set to sys_arch.default (e.g. amd64 on x86_64) instead of
matching against the add-on's declared architectures. This caused an
i386-only add-on to incorrectly build as amd64.
Use sys_arch.match() against the add-on's declared arch list in all
code paths: the arch property, image name generation, BUILD_ARCH build
arg, and default base image selection.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Use CpuArch enums to fix tests
* Explicitly set _supported_arch as new list to fix tests
* Fix pytests
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* Use verbose log output for plug-ins
All three plug-ins which support logging (dns, multicast and audio)
should use the verbose log format by default to make sure the log lines
are annotated with timestamps. Introduce a new default_verbose flag for
advanced logs.
* Use default_verbose for host logs as well
Use the new default_verbose flag for advanced logs, to make it more
explicit that we want timestamps for host logs as well.
The /os/info API endpoint was using the D-Bus property TimeUSec, which is
cached between requests, so the time returned was not always the current
time on the host system at the time of the request. Since there is no
reason to use the D-Bus API for the time (Supervisor runs on the same
machine and time is global), simply format the current datetime object
with Python and return it in the response.
Fixes #6581
* Handle missing Accept header in host logs
Avoid indexing request headers directly in the host advanced logs handler when Accept is absent, preventing KeyError crashes on valid requests without that header. Fixes SUPERVISOR-1939.
* Add pytest
* Ensure uuid of dismissed suggestion/issue matches an existing one
* Fix lint, test and feedback issues
* Adjust existing tests and remove new ones for not found errors
* Fix device access issue usage
* Bump securetar from 2025.12.0 to 2026.2.0
Adapt to the new securetar API:
- Use SecureTarArchive for outer backup tar (replaces SecureTarFile
with gzip=False for the outer container)
- create_inner_tar() renamed to create_tar(), password now inherited
from the archive rather than passed per inner tar
- SecureTarFile no longer accepts a mode parameter (read-only by
default, InnerSecureTarFile for writing)
- Pass create_version=2 to keep protected backups at version 2
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Reformat imports
* Rename _create_cleanup to _create_finalize and update docstring
* Use constant for SecureTar create version
* Add test for SecureTarReadError in validate_backup
securetar >= 2026.2.0 raises SecureTarReadError instead of
tarfile.ReadError for invalid passwords. Catching this exception
and raising BackupInvalidError is required so Core shows the
encryption key dialog to the user.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Handle InvalidPasswordError for v3 backups
* Address typos
* Add securetar v3 encrypted password test fixture
Add a test fixture for a securetar v3 encrypted backup with password.
This will be used in the test suite to verify that the backup
extraction process correctly handles encrypted backups.
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* Harden backup tar extraction with Python data filter
Replace filter="fully_trusted" with a custom backup_data_filter that
wraps tarfile.data_filter. This adds protection against symlink attacks
(absolute targets, destination escapes), device node injection, and
path traversal, while resetting uid/gid and sanitizing permissions.
Unlike using data_filter directly, the custom filter skips problematic
entries with a warning instead of aborting the entire extraction. This
ensures existing backups containing absolute symlinks (e.g. in shared
folders) still restore successfully with the dangerous entries omitted.
Also removes the now-redundant secure_path member filtering, as
data_filter is a strict superset of its protections. Fixes a standalone
bug in _folder_restore which had no member filtering at all.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
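A sketch of the wrapper this commit introduces (later replaced by the
built-in tar filter, see below), assuming Python's tarfile.data_filter:

```python
import logging
import tarfile

_LOGGER = logging.getLogger(__name__)


def backup_data_filter(
    member: tarfile.TarInfo, dest_path: str
) -> tarfile.TarInfo | None:
    """Apply tarfile.data_filter, but skip bad members instead of aborting."""
    try:
        return tarfile.data_filter(member, dest_path)
    except tarfile.FilterError as err:
        # Returning None tells extraction to omit just this member, so
        # old backups with e.g. absolute symlinks still restore.
        _LOGGER.warning("Skipping unsafe backup member %s: %s", member.name, err)
        return None
```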
* Simplify security tests to test backup_data_filter directly
Test the public backup_data_filter function with plain tarfile
extraction instead of going through Backup internals. Removes
protected-access pylint warnings and unnecessary coresys setup.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Switch to tar filter instead of custom data filter wrapper
Replace backup_data_filter (which wrapped data_filter and skipped
problematic entries) with the built-in tar filter. The tar filter
rejects path traversal and absolute names while preserving uid/gid
and file permissions, which is important for add-ons running as
non-root users.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Apply suggestions from code review
Co-authored-by: Erik Montnemery <erik@montnemery.com>
* Use BackupInvalidError instead of BackupError for tarfile.TarError
Make sure FilterErrors lead to BackupInvalidError instead of BackupError,
as they are not related to the backup process itself but rather to the
integrity of the backup data.
* Improve test coverage and use pytest.raises
* Only make FilterError a BackupInvalidError
* Add test case for FilterError during Home Assistant Core restore
* Add test cases for Add-ons
* Fix pylint warnings
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Erik Montnemery <erik@montnemery.com>
* Unify Core user listing with HomeAssistantUser model
Replace the ingress-specific IngressSessionDataUser with a general
HomeAssistantUser dataclass that models the Core config/auth/list WS
response. This deduplicates the WS call (previously in both auth.py
and module.py) into a single HomeAssistant.list_users() method.
- Add HomeAssistantUser dataclass with fields matching Core's user API
- Remove get_users() and its unnecessary 5-minute Job throttle
- Auth and ingress consumers both use HomeAssistant.list_users()
- Auth API endpoint uses typed attribute access instead of dict keys
- Migrate session serialization from legacy "displayname" to "name"
- Accept both keys in schema/deserialization for backwards compat
- Add test for loading persisted sessions with legacy displayname key
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Tighten list_users() to trust Core's auth/list contract
Core's config/auth/list WS command always returns a list, never None.
Replace the silent `if not raw: return []` (which also swallowed empty
lists) with an assert, remove the dead AuthListUsersNoneResponseError
exception class, and document the HomeAssistantWSError contract in the
docstring.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Remove | None from async_send_command return type
The WebSocket result is always set from data["result"] in _receive_json,
never explicitly to None. Remove the misleading | None from the return
type of both WSClient and HomeAssistantWebSocket async_send_command, and
drop the now-unnecessary assert in list_users.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Use HomeAssistantWSConnectionError in _ensure_connected
_ensure_connected and connect_with_auth raise on connection-level
failures, so use the more specific HomeAssistantWSConnectionError
instead of the broad HomeAssistantWSError. This allows callers to
distinguish connection errors from Core API errors (e.g. unsuccessful
WebSocket command responses). Also document that _ensure_connected can
propagate HomeAssistantAuthError from ensure_access_token.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Remove user list cache from _find_user_by_id
Drop the _list_of_users cache to avoid stale auth data in ingress
session creation. The method now fetches users fresh each time and
returns None on any API error instead of serving potentially outdated
cached results.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* Add periodic progress logging during initial Core installation
Log installation progress every 15 seconds while downloading the
Home Assistant Core image during initial setup (landing page to core
transition). Uses asyncio.Event with wait_for timeout to produce
time-based logs independent of Docker pull events, ensuring visibility
even when the network stalls.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
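The time-based loop, sketched; get_progress is a hypothetical callable
standing in for the pull-progress state:

```python
import asyncio
import logging

_LOGGER = logging.getLogger(__name__)


async def log_install_progress(finished: asyncio.Event, get_progress) -> None:
    """Log every 15s until the pull completes, even if Docker events stall."""
    while not finished.is_set():
        try:
            await asyncio.wait_for(finished.wait(), timeout=15)
        except TimeoutError:  # asyncio.TimeoutError on Python < 3.11
            _LOGGER.info(
                "Home Assistant Core install in progress: %.0f%%", get_progress()
            )
```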
* Add test coverage
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Jan Čermák <sairon@users.noreply.github.com>
* Fix getting Supervisor IP address in testing
Newer Docker versions (probably newer than 29.x) no longer have a global
IPAddress attribute under .NetworkSettings. Instead there is a
network-specific map under Networks; in our case the hassio network has
the relevant IP address. This network-specific map already existed
before, hence the new inspect parsing works for old as well as new
Docker versions.
While at it, also adjust the test fixture.
* Actively wait for hassio IPAddress to become valid
* Remove blocking I/O added to import_image
* Add scanned modules to extra blockbuster functions
* Use same cast avoidance approach in export_image
* Remove unnecessary local image_writer variable
* Remove unnecessary local image_tar_stream variable
---------
Co-authored-by: Stefan Agner <stefan@agner.ch>
* Raise HomeAssistantWSError when Core WebSocket is unreachable
Previously, async_send_command silently returned None when Home Assistant
Core was not reachable, leading to misleading error messages downstream
(e.g. "returned invalid response of None instead of a list of users").
Refactor _can_send to _ensure_connected which now raises
HomeAssistantWSError on connection failures while still returning False
for silent-skip cases (shutdown, unsupported version). async_send_message
catches the exception to preserve fire-and-forget behavior.
Update callers that don't handle HomeAssistantWSError: _hardware_events
and addon auto-update in tasks.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Simplify HomeAssistantWebSocket command/message distinction
The WebSocket layer had a confusing split between "messages" (fire-and-forget)
and "commands" (request/response) that didn't reflect Home Assistant Core's
architecture where everything is just a WS command.
- Remove dead WSClient.async_send_message (never called)
- Rename async_send_message → _async_send_command (private, fire-and-forget)
- Rename send_message → send_command (sync wrapper)
- Simplify _ensure_connected: drop message param, always raise on failure
- Simplify async_send_command: always raise on connection errors
- Remove MIN_VERSION gating (minimum supported Core is now 2024.2+)
- Remove begin_backup/end_backup version guards for Core < 2022.1.0
- Add debug logging for silently ignored connection errors
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Wait for Core to come up before backup
This is crucial since the WebSocket command to Core now fails with the
new error handling if Core is not running yet.
* Wait for Core install job instead
* Use CLI to fetch jobs instead of Supervisor API
The Supervisor API needs an authentication token, which is not available
at this point in the workflow. Instead of fetching the token, we can use
the CLI, which is available in the container.
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
When an addon updates from having no ingress to having ingress, the
ingress token map was never rebuilt. Both update() and rebuild() called
_check_ingress_port() to assign a dynamic port but skipped the
sys_ingress.reload() call that registers the token. This caused
Ingress.get() to return None, resulting in a 503 error.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* Add D-Bus tolerant enum base classes to prevent crashes on unknown values
D-Bus services (systemd, NetworkManager, RAUC, UDisks2) can introduce
new enum values at any time via OS updates. Standard Python enum
construction raises ValueError for unknown values, which would crash
the Supervisor.
Introduce DBusStrEnum and DBusIntEnum base classes that use Python's
_missing_ hook to create pseudo-members for unknown values. These
pseudo-members pass isinstance checks (satisfying typeguard), preserve
the original value, don't pollute __members__, and report unknown
values to Sentry (deduplicated per class+value) for observability.
Migrate 17 D-Bus enums in dbus/const.py and udisks2/const.py to the
new base classes. Enums only sent TO D-Bus (StopUnitMode, StartUnitMode,
etc.) are left unchanged. Remove the manual try/except workaround in
NetworkInterface.type now that DBusIntEnum handles it automatically.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Add explicit enum conversions for systemd-resolved D-Bus properties
The resolved properties (dns_over_tls, dns_stub_listener, dnssec, llmnr,
multicast_dns, resolv_conf_mode) were returning raw string values from
D-Bus without converting to their declared enum types. This would fail
runtime type checking with typeguard.
Now safe to add explicit conversions since these enums use DBusStrEnum,
which tolerates unknown values from D-Bus without crashing.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Avoid blocking I/O in D-Bus enum Sentry reporting
Move sentry_sdk.capture_message out of the event loop by adding a
fire_and_forget_capture_message helper that offloads the call to the
executor when a running loop is detected.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Handle exceptions when reporting message to Sentry
* Narrow typing of reported values
Use str/int explicitly since that is what the two existing Enum classes
can actually report.
* Adjust test style
* Apply suggestions from code review
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Add a type check for device options in AddonOptions._single_validate
to ensure the value is a string before passing it to Path(). When a
non-string value (e.g. a dict) is provided for a device option, this
now raises a proper vol.Invalid error instead of an unhandled TypeError.
Fixes SUPERVISOR-175H
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
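The guard, sketched outside the AddonOptions class; the error text is
illustrative:

```python
from pathlib import Path

import voluptuous as vol


def validate_device_option(value) -> Path:
    if not isinstance(value, str):
        raise vol.Invalid(
            f"Device option must be a string, got {type(value).__name__}"
        )
    return Path(value)
```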
Replace the dynamic `getattr(self.sys_websession, method)(...)` pattern
with the explicit `self.sys_websession.request(method, ...)` call. This
is type-safe and avoids runtime failures from typos in method names.
Also wrap the timeout parameter in `aiohttp.ClientTimeout` for
consistency with the typed `request()` signature.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
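Before and after, sketched against a generic aiohttp session:

```python
import aiohttp


async def call(session: aiohttp.ClientSession, method: str, url: str, timeout: int):
    # Before: getattr(session, method)(url, ...) - a typo in `method` only
    # surfaces at runtime as an AttributeError.
    # After: the typed request() call takes the method as plain data.
    async with session.request(
        method, url, timeout=aiohttp.ClientTimeout(total=timeout)
    ) as response:
        return await response.json()
```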
* Fix environment variable type errors by converting IP addresses to strings
Environment variables must be strings, but IPv4Address and IPv4Network
objects were being passed directly to container environment dictionaries,
causing typeguard validation errors.
Changes:
- Convert IPv4Address objects to strings in homeassistant.py for
SUPERVISOR and HASSIO environment variables
- Convert IPv4Network object to string in observer.py for
NETWORK_MASK environment variable
- Update tests to expect string values instead of IP objects in
environment dictionaries
- Remove unused ip_network import from test_observer.py
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* Use explicit string conversion for extra_hosts IP addresses
Use the !s format specifier in the f-string to explicitly convert
IPv4Address objects to strings when building the ExtraHosts list.
While f-strings implicitly convert objects to strings, using !s makes
the conversion explicit and consistent with the environment variable
fixes in the previous commit.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Add the Docker storage driver (e.g., overlay2, vfs) to the context
information sent with Sentry error reports. This helps correlate
issues with specific storage backends and improves debugging of
Docker-related problems.
The storage driver is now included in both SETUP and RUNNING state
error reports under contexts.docker.storage_driver.
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* Fix MCP API proxy support for streaming and headers
This commit fixes two issues with using Core's MCP API (core/api/mcp)
through the API proxy:
1. **Streaming support**: The proxy now detects text/event-stream responses
and properly streams them instead of buffering all data. This is required
for MCP's Server-Sent Events (SSE) transport.
2. **Header forwarding**: Added MCP-required headers to the forwarded headers:
- Accept: Required for content negotiation
- Last-Event-ID: Required for resuming broken SSE connections
- Mcp-Session-Id: Required for session management across requests
The proxy now also preserves MCP-related response headers (Mcp-Session-Id)
and sets X-Accel-Buffering to "no" for streaming responses to prevent
buffering by intermediate proxies.
Tests added to verify:
- MCP headers are properly forwarded to Home Assistant
- Streaming responses (text/event-stream) are handled correctly
- Response headers are preserved
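The SSE pass-through in outline, assuming aiohttp on both sides (upstream is
the ClientResponse from Core); headers and buffer size are illustrative:

```python
from aiohttp import web


async def proxy_response(request: web.Request, upstream) -> web.StreamResponse:
    if upstream.content_type == "text/event-stream":
        response = web.StreamResponse(status=upstream.status)
        response.content_type = "text/event-stream"
        # Prevent intermediate proxies from buffering the stream.
        response.headers["X-Accel-Buffering"] = "no"
        await response.prepare(request)
        async for chunk in upstream.content.iter_chunked(4096):
            await response.write(chunk)
        return response
    return web.Response(
        status=upstream.status,
        body=await upstream.read(),
        content_type=upstream.content_type,
    )
```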
* Refactor: reuse stream logic for SSE responses (#3)
* Fix ruff format + cover streaming payload error
* Fix merge error
* Address review comments (headers / streaming proxy) (#4)
* Address review: header handling for streaming/non-streaming
* Forward MCP-Protocol-Version and Origin headers
* Do not forward Origin header through API proxy (#5)
---------
Co-authored-by: Stefan Agner <stefan@agner.ch>
The CpuArch enum was being used inconsistently throughout the codebase,
with some code expecting enum values and other code expecting strings.
This caused type checking issues and potential runtime errors.
Changes:
- Fix match_base() to return CpuArch enum instead of str
- Add explicit string conversions using !s formatting where arch values
are used in f-strings (build.py, model.py)
- Convert CpuArch to str explicitly in contexts requiring strings
(docker/addon.py, misc/filter.py)
- Update all tests to use CpuArch enum values instead of strings
- Update test mocks to return CpuArch enum values
This ensures type consistency and improves MyPy type checking accuracy
across the architecture detection and management code.
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
The manifest fetcher was using docker.io as the registry API endpoint,
but Docker Hub's actual registry API is at registry-1.docker.io. When
trying to access https://docker.io/v2/..., requests were being redirected
to https://www.docker.com/ (the marketing site), which returned HTML
instead of JSON, causing manifest fetching to fail.
This matches exactly what Docker itself does internally - see
daemon/pkg/registry/config.go:49 where Docker hardcodes
DefaultRegistryHost = "registry-1.docker.io" for registry operations.
Changes:
- Add DOCKER_HUB_API constant for the actual API endpoint
- Add _get_api_endpoint() helper to translate docker.io to
registry-1.docker.io for HTTP API calls
- Update _get_auth_token() and _fetch_manifest() to use the API endpoint
- Keep docker.io as the registry identifier for naming and credentials
- Add tests to verify the API endpoint translation
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
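The translation helper in outline; constant names follow the commit:

```python
DOCKER_HUB = "docker.io"
DOCKER_HUB_API = "registry-1.docker.io"


def _get_api_endpoint(registry: str) -> str:
    """Return the host serving the v2 registry API.

    docker.io stays the identifier for naming and credentials, but the
    actual Docker Hub registry API lives at registry-1.docker.io.
    """
    return DOCKER_HUB_API if registry == DOCKER_HUB else registry


assert _get_api_endpoint("docker.io") == "registry-1.docker.io"
assert _get_api_endpoint("ghcr.io") == "ghcr.io"
```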
* Migrate info and events to aiodocker
* Migrate container logs to aiodocker
* Fix dns plugin loop test
* Fix mocking for docker info
* Fixes from feedback
* Harden monitor error handling
* Deleted failing tests because they were not useful
* Fix Docker exec exit code handling by using detach=False
When executing commands inside containers using `container_run_inside()`,
the exec metadata did not contain a valid exit code because `detach=True`
starts the exec in the background and returns immediately before completion.
Root cause: With `detach=True`, Docker's exec start() returns an awaitable
that yields output bytes. However, the await only waits for the HTTP/REST
call to complete, NOT for the actual exec command to finish. The command
continues running in the background after the HTTP response is received.
Calling `inspect()` immediately after returns `ExitCode: None` because
the exec hasn't completed yet.
Solution: Use `detach=False` which returns a Stream object that:
- Automatically waits for exec completion by reading from the stream
- Provides actual command output (not just empty bytes)
- Makes exit code immediately available after stream closes
- No polling needed
Changes:
- Switch from `detach=True` to `detach=False` in container_run_inside()
- Read output from stream using async context manager
- Add defensive validation to ensure ExitCode is never None
- Update tests to mock the Stream interface using AsyncMock
- Add debug log showing exit code after command execution
Fixes #6518
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
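A sketch against aiodocker's exec API as described above; error handling
trimmed:

```python
async def container_run_inside(container, command: str) -> tuple[int, bytes]:
    execute = await container.exec(command, stdout=True, stderr=True)
    output = b""
    # detach=False yields a Stream; reading it to EOF waits for the
    # command to actually finish (with detach=True the await returned as
    # soon as the HTTP call completed, so inspect() saw ExitCode: None).
    async with execute.start(detach=False) as stream:
        while message := await stream.read_out():
            output += message.data
    exit_code = (await execute.inspect()).get("ExitCode")
    if exit_code is None:
        raise RuntimeError("Exec finished without an exit code")
    return exit_code, output
```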
* Address review feedback
---------
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
The Supervisor's /core/api proxy previously only supported GET and POST
methods, returning 405 Method Not Allowed for DELETE requests. This
prevented addons from calling Home Assistant Core REST API endpoints
that require DELETE methods, such as deleting automations, scripts,
or scenes.
The underlying proxy implementation already supported passing through
any HTTP method via request.method.lower(), so only the route
registration was needed.
Fixes#6509
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
The aiodocker 0.25.0 upgrade (PR #6448) changed how DockerError handles
the message parameter. The library now extracts the message string from
Docker API JSON responses before passing it to DockerError, rather than
passing the entire dict.
The port conflict detection tests were written before this change and
incorrectly passed dicts to DockerError. This caused TypeErrors when
the port conflict detection code tried to match err.message with a
regex, expecting a string but receiving a dict.
Update both test_addon_start_port_conflict_error and
test_observer_start_port_conflict to pass message strings directly,
matching the real aiodocker 0.25.0 behavior.
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* Map port conflict on start error into a known error
* Apply suggestions from code review
* Run ruff format
---------
Co-authored-by: Stefan Agner <stefan@agner.ch>
* Use count-based progress for Docker image pulls
Refactor Docker image pull progress to use a simpler count-based approach
where each layer contributes equally (100% / total_layers) regardless of
size. This replaces the previous size-weighted calculation that was
susceptible to progress regression.
The core issue was that Docker rate-limits concurrent downloads (~3 at a
time) and reports layer sizes only when downloading starts. With size-
weighted progress, large layers appearing late would cause progress to
drop dramatically (e.g., 59% -> 29%) as the total size increased.
The new approach:
- Each layer contributes equally to overall progress
- Per-layer progress: 70% download weight, 30% extraction weight
- Progress only starts after first "Downloading" event (when layer
count is known)
- Always caps at 99% - job completion handles final 100%
This simplifies the code by moving progress tracking to a dedicated
module (pull_progress.py) and removing complex size-based scaling logic
that tried to account for unknown layer sizes.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
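The count-based calculation in outline (70% download / 30% extract per layer,
capped at 99%); the real module feeds this from Docker pull events:

```python
def overall_progress(layers: list[dict]) -> float:
    """Each layer contributes equally to overall progress, regardless of size."""
    if not layers:
        return 0.0
    share = 100.0 / len(layers)
    total = sum(
        share * (0.7 * layer.get("download", 0.0) + 0.3 * layer.get("extract", 0.0))
        for layer in layers
    )
    return min(99.0, total)  # job completion reports the final 100%


# Two layers: one fully pulled, one halfway through its download.
assert overall_progress(
    [{"download": 1.0, "extract": 1.0}, {"download": 0.5, "extract": 0.0}]
) == 67.5
```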
* Exclude already-existing layers from pull progress calculation
Layers that already exist locally should not count towards download
progress since there's nothing to download for them. Only layers that
need pulling are included in the progress calculation.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Add registry manifest fetcher for size-based pull progress
Fetch image manifests directly from container registries before pulling
to get accurate layer sizes upfront. This enables size-weighted progress
tracking where each layer contributes proportionally to its byte size,
rather than equal weight per layer.
Key changes:
- Add RegistryManifestFetcher that handles auth discovery via
WWW-Authenticate headers, token fetching with optional credentials,
and multi-arch manifest list resolution
- Update ImagePullProgress to accept manifest layer sizes via
set_manifest() and calculate size-weighted progress
- Fall back to count-based progress when manifest fetch fails
- Pre-populate layer sizes from manifest when creating layer trackers
The manifest fetcher supports ghcr.io, Docker Hub, and private
registries by using credentials from Docker config when available.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Clamp progress to 100 to prevent floating point precision issues
Floating point arithmetic in weighted progress calculations can produce
values slightly above 100 (e.g., 100.00000000000001). This causes
validation errors when the progress value is checked.
Add min(100, ...) clamping to both size-weighted and count-based
progress calculations to ensure the result never exceeds 100.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Use sys_websession for manifest fetcher instead of creating new session
Reuse the existing CoreSys websession for registry manifest requests
instead of creating a new aiohttp session. This improves performance
and follows the established pattern used throughout the codebase.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Make platform parameter required and warn on missing platform
- Make platform a required parameter in get_manifest() and _fetch_manifest()
since it's always provided by the calling code
- Return None and log warning when requested platform is not found in
multi-arch manifest list, instead of falling back to first manifest
which could be the wrong architecture
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Log manifest fetch failures at warning level
Users will notice degraded progress tracking when manifest fetch fails,
so log at warning level to help diagnose issues.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Add pylint disable comments for protected access in manifest tests
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Separate download_current and total_size updates in pull progress
Update download_current and total_size independently in the DOWNLOADING
handler. This ensures download_current is updated even when total is
not yet available.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Reject invalid platform format in manifest selection
---------
Co-authored-by: Claude <noreply@anthropic.com>
During system shutdown (reboot/poweroff), the watchdog was incorrectly
detecting the Home Assistant Core container as failed and attempting to
restart it. This occurred because Docker was stopping all containers in
parallel with Supervisor's own shutdown sequence, causing the watchdog
to trigger while add-ons were still being stopped.
This led to an abrupt termination of Core before it could cleanly shut
down its SQLite database, resulting in a warning on the next startup:
"The system could not validate that the sqlite3 database was shutdown
cleanly".
The fix registers a supervisor state change listener that unregisters
the watchdog when entering any shutdown state (SHUTDOWN, STOPPING, or
CLOSE). This prevents restart attempts during both user-initiated
reboots (via API) and external shutdown signals (Docker SIGTERM,
console reboot commands).
Since SHUTDOWN, STOPPING, and CLOSE are terminal states with no reverse
transition back to RUNNING, no re-registration logic is needed.
Fixes#6511
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
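The listener in outline; CoreState values mirror Supervisor's, the watchdog
handle is illustrative:

```python
from enum import Enum


class CoreState(str, Enum):
    RUNNING = "running"
    SHUTDOWN = "shutdown"
    STOPPING = "stopping"
    CLOSE = "close"


SHUTDOWN_STATES = {CoreState.SHUTDOWN, CoreState.STOPPING, CoreState.CLOSE}


def on_supervisor_state_change(state: CoreState, watchdog) -> None:
    # These are terminal states with no transition back to RUNNING, so a
    # one-way unregister is enough; no re-registration logic is needed.
    if state in SHUTDOWN_STATES:
        watchdog.unregister()
```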