During system shutdown (reboot/poweroff), the watchdog was incorrectly
detecting the Home Assistant Core container as failed and attempting to
restart it. This occurred because Docker was stopping all containers in
parallel with Supervisor's own shutdown sequence, causing the watchdog
to trigger while add-ons were still being stopped.
This led to an abrupt termination of Core before it could cleanly shut
down its SQLite database, resulting in a warning on the next startup:
"The system could not validate that the sqlite3 database was shutdown
cleanly".
The fix registers a supervisor state change listener that unregisters
the watchdog when entering any shutdown state (SHUTDOWN, STOPPING, or
CLOSE). This prevents restart attempts during both user-initiated
reboots (via API) and external shutdown signals (Docker SIGTERM,
console reboot commands).
Since SHUTDOWN, STOPPING, and CLOSE are terminal states with no reverse
transition back to RUNNING, no re-registration logic is needed.
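A minimal self-contained sketch of the listener logic (the bus and listener objects stand in for Supervisor internals and are assumptions here):

```python
from enum import Enum

class CoreState(Enum):
    RUNNING = "running"
    STOPPING = "stopping"
    SHUTDOWN = "shutdown"
    CLOSE = "close"

# Terminal states: none of these ever transitions back to RUNNING.
SHUTDOWN_STATES = {CoreState.SHUTDOWN, CoreState.STOPPING, CoreState.CLOSE}

def on_supervisor_state_change(bus, watchdog_listener, state: CoreState) -> None:
    # Container stops are expected during shutdown, so drop the watchdog
    # before it can mistake them for failures and restart Core.
    if state in SHUTDOWN_STATES:
        bus.remove_listener(watchdog_listener)
```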
Fixes #6511
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* Check frontend availability after Home Assistant Core updates
Add verification that the frontend is actually accessible at "/" after core
updates to ensure the web interface is serving properly, not just that the
API endpoints respond.
Previously, the update verification only checked API endpoints and whether
the frontend component was loaded. This could miss cases where the API is
responsive but the frontend fails to serve the UI.
Changes:
- Add check_frontend_available() method to HomeAssistantAPI that fetches
the root path and verifies it returns HTML content (see the sketch after
this list)
- Integrate frontend check into core update verification flow after
confirming the frontend component is loaded
- Trigger automatic rollback if frontend is inaccessible after update
- Fix blocking I/O calls in rollback log file handling to use async
executor
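A sketch of what the new check could look like; the method name matches the commit, while the session and base_url parameters are assumptions made to keep the example standalone:

```python
import asyncio

import aiohttp

async def check_frontend_available(session: aiohttp.ClientSession, base_url: str) -> bool:
    """Fetch "/" and verify the response looks like the served UI."""
    try:
        async with session.get(f"{base_url}/") as resp:
            # The UI entry point serves an HTML document, while bare API
            # endpoints return JSON, so the content type tells them apart.
            return resp.status == 200 and resp.content_type == "text/html"
    except (aiohttp.ClientError, asyncio.TimeoutError):
        return False
```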
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Avoid checking frontend if config data is None
* Improve pytest tests
* Make sure Core returns a valid config
* Remove Core version check in frontend availability test
The call site already makes sure that an actual Home Assistant Core
instance is running before calling the frontend availability test.
So this check is redundant. Simplify the code by removing the version
check and update the tests accordingly.
* Add test coverage for get_config
---------
Co-authored-by: Claude <noreply@anthropic.com>
* Migrate all docker container interactions to aiodocker
* Remove containers_legacy since it's no longer used
* Add back remove color logic
* Revert accidental invert of conditional in setup_network
* Fix typos found by copilot
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Revert "Apply suggestions from code review"
This reverts commit 0a475433ea.
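To illustrate the migration above, a minimal sketch of one container interaction done through aiodocker (the aiodocker calls are real; the surrounding flow is illustrative):

```python
import aiodocker

async def restart_container(name: str) -> None:
    docker = aiodocker.Docker()
    try:
        # containers.get() resolves the name to a DockerContainer handle.
        container = await docker.containers.get(name)
        await container.restart(timeout=10)
    finally:
        await docker.close()
```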
---------
Co-authored-by: Stefan Agner <stefan@agner.ch>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Fix too short timeouts for Synology NAS
With Home Assistant Core 2025.12.x updates available, the STARTUP_API_RESPONSE_TIMEOUT that the Supervisor is willing to wait (before assuming a startup failure and rolling back the entire Core update) turned out to be too low on not-so-beefy hosts. The problem has been seen on Synology NAS machines running Home Assistant on the side (as in my case). I have doubled the timeout from 3 to 6 minutes, and the upgrade to Core 2025.12.1 now works on my Synology DS723+. My update took 4 min 56 s, so the timeout increase was proven necessary.
* Fix tests for increased API Timeout
* Increase the timeout to 10 minutes
* Increase the timeout in tests
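A sketch of the resulting constant; the name comes from the commit message, but its actual type and module in the Supervisor codebase are assumptions:

```python
from datetime import timedelta

# Raised from 3 to 6 and finally to 10 minutes in this PR.
STARTUP_API_RESPONSE_TIMEOUT = timedelta(minutes=10)
```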
---------
Co-authored-by: Jan Čermák <sairon@users.noreply.github.com>
* Fix private registry authentication for aiodocker image pulls
After PR #6252 migrated image pulling from dockerpy to aiodocker,
private registry authentication stopped working. The old _docker_login()
method stored credentials in ~/.docker/config.json via dockerpy, but
aiodocker doesn't read that file; it requires credentials to be passed
explicitly via the auth parameter.
Changes:
- Remove unused _docker_login() method (dockerpy login was ineffective)
- Pass credentials directly to pull_image() via the new auth parameter (see the sketch after this list)
- Add auth parameter to DockerAPI.pull_image() method
- Add unit tests for Docker Hub and custom registry authentication
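A sketch of the new code path, assuming a standalone helper (aiodocker's images.pull() and its auth parameter are real; the wrapper is illustrative):

```python
import aiodocker

async def pull_with_auth(image: str, tag: str, username: str, password: str) -> None:
    docker = aiodocker.Docker()
    try:
        # aiodocker sends these credentials with the pull request itself;
        # unlike dockerpy, it never consults ~/.docker/config.json.
        auth = {"username": username, "password": password}
        await docker.images.pull(from_image=image, tag=tag, auth=auth)
    finally:
        await docker.close()
```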
Fixes #6345
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Ignore protected access in test
* Fix plug-in pull test
* Fix HA core tests
---------
Co-authored-by: Claude <noreply@anthropic.com>
* Migrate images from dockerpy to aiodocker
* Add missing coverage and fix bug in repair
* Bind libraries to different files and refactor images.pull
* Use the same socket again
Try using the same socket again.
* Fix pytest
---------
Co-authored-by: Stefan Agner <stefan@agner.ch>
* Send progress updates during image pull for install/update
* Add extra to tests about job APIs
* Send out-of-date progress to Sentry and combine the done event
* Pulling container image layer
* Unify Supervisor event message functions
Unify functions which send WebSocket messages of type
"supervisor/event". This deduplicates code and hopefully avoids further
diversification in the future.
While at it, remove the unused HomeAssistantWSNotSupported exception. It
seems the only place where this exception was used was removed in #3317.
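A minimal sketch of such a unified sender; the payload layout follows the message type named above, while the helper and its send_message parameter are assumptions:

```python
from typing import Any

async def async_send_supervisor_event(
    send_message, event: str, data: dict[str, Any] | None = None
) -> None:
    # Every "supervisor/event" message goes through this one function,
    # so the envelope cannot diverge between call sites.
    await send_message(
        {
            "type": "supervisor/event",
            "data": {"event": event, "data": data or {}},
        }
    )
```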
* Test message delivery during shutdown states
* Move read_text to executor
* switch to async_capture_exception
* Finish moving read_text to executor
* Cover read_bytes and some write_text calls as well
* Fix await issues
* Fix format_message
* Move read_text to executor
* Fix issues found by coderabbit
* formated to formatted
* switch to async_capture_exception
* Find and replace got one too many
* Update patch mock to async_capture_exception
* Drop Sentry capture from format_message
The error handling was introduced in #2052; however, #2100 essentially
makes sure there will never be a bytes object passed to this function.
And even if one were passed, the Sentry aiohttp plug-in would properly
catch such an exception.
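The read_text/read_bytes commits above all follow the same pattern; a minimal sketch using the standard asyncio executor offload (Supervisor wraps this in its own helper):

```python
import asyncio
from pathlib import Path

async def read_text_async(path: Path, encoding: str = "utf-8") -> str:
    # Blocking file I/O must never run on the event loop; push it onto
    # the default thread-pool executor instead.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, lambda: path.read_text(encoding))
```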
---------
Co-authored-by: Stefan Agner <stefan@agner.ch>
* Initialize Supervisor Core state in constructor
Make sure the Supervisor Core state is set to a value early on. This
guarantees that the state is always of type CoreState, so any use of
the state can rely on it being an actual value from the CoreState enum.
This fixes the Sentry filter during early startup, where the state
previously was None. Because of that, the Sentry filter tried to
collect more context, which led to an exception and errors not being
reported.
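A sketch of the idea, reduced to its core (names match the commit description, the class body is illustrative):

```python
from enum import Enum

class CoreState(Enum):
    INITIALIZE = "initialize"
    RUNNING = "running"

class Core:
    def __init__(self) -> None:
        # Assign a real CoreState immediately so early consumers such as
        # the Sentry filter never observe None.
        self._state: CoreState = CoreState.INITIALIZE

    @property
    def state(self) -> CoreState:
        return self._state
```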
* Fix pytest
It seems that with the state initialized early, pytest actually
runs a system evaluation with:
Starting system evaluation with state initialize
Before it did that with:
Starting system evaluation with state None
It detects that the container runs as privileged, and declares the
system as unhealthy.
It is unclear to me why coresys.core.healthy was checked in this
context; it doesn't seem useful. Just remove the check, and validate
the state through the getter instead.
* Update supervisor/core.py
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* Make sure Supervisor container is privileged in pytest
With the Supervisor Core state being valid now, some evaluations
actually run when loading the resolution center. This leads to
Supervisor getting declared unhealthy because it does not run in a
privileged container under pytest.
Fake the host container to be privileged so that evaluations do not
cause the system to be declared unhealthy under pytest.
* Avoid writing actual Supervisor run state file
With the Supervisor Core state being valid from the very start, we end
up writing a state file every time.
Instead of actually writing a state file, simply validate that the
necessary calls are being made. This is more in line with typical unit
tests and avoids writing a file for every test.
* Extend WebSocket client fixture and use it consistently
Extend the ha_ws_client WebSocket client fixture to set Supervisor Core
into run state and clear all pending messages.
Currently only some tests use the ha_ws_client WebSocket client fixture.
Use it consistently for all tests.
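A hypothetical sketch of the extended fixture; the attribute paths and the way state is set are assumptions about the test harness:

```python
import pytest

from supervisor.const import CoreState  # assumed import path

@pytest.fixture
async def ha_ws_client(coresys):
    client = coresys.homeassistant.websocket  # assumed client location
    # Put Supervisor Core into run state so messages are actually delivered.
    coresys.core.state = CoreState.RUNNING
    # Drain anything queued during setup so assertions start from a clean slate.
    client.async_send_command.reset_mock()
    return client
```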
---------
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* Allow adoption of existing data disk
* Fix existing tests
* Add test cases and fix image issues
* Fix addon build test
* Run checks during setup not startup
* Addon load mimics plugin and HA load for docker part
* Default image accessible in except
* Migrate to Ruff for lint and format
* Fix pylint issues
* DBus property sets into normal awaitable methods
* Fix tests relying on separate tasks in connect
* Fixes from feedback
* Bad message error marks system as unhealthy
* Finish adding test cases for changes
* Rename test file for uniqueness
* bad_message to oserror_bad_message
* Omit some checks and check for network mounts
As shown in home-assistant/operating-system#3007, error messages printed
to logs when container installation fails can cause some confusion,
because they are sometimes printed to the log on the landing page.
Adjust all occurrences of "retry in" to "retrying in" to make it obvious
this happens automatically.
* Don't check if Core is running to trigger rollback
Currently we check for Core API access and that the state is running. If
this is not fulfilled within 5 minutes, we roll back to the previous
version.
It can take quite a while until Home Assistant Core reaches the running
state. In fact, after going through bootstrap, it can theoretically take
indefinitely long (there is no timeout on the Core side).
So to trigger rollback, rather than checking that the state is running,
just check if the API is accessible in this case. This prevents spurious
rollbacks.
* Check Core state and time out after a longer time
Instead of only checking that the Core API responds, also check the
reported state. Use a timeout which is long enough to cover all stages
and other timeouts during Core startup.
* Introduce get_api_state and better status messages
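An illustrative sketch of the longer-running check; get_api_state() is the helper named above, but its exact signature and return value are assumptions:

```python
import asyncio

async def wait_until_core_running(api, timeout: float = 600.0) -> bool:
    try:
        async with asyncio.timeout(timeout):  # Python 3.11+
            # Poll the state instead of only checking that the API answers.
            while await api.get_api_state() != "RUNNING":
                await asyncio.sleep(5)
        return True
    except TimeoutError:
        return False  # the caller triggers the rollback
```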
* Update supervisor/homeassistant/api.py
Co-authored-by: J. Nick Koston <nick@koston.org>
* Add successful start test
---------
Co-authored-by: J. Nick Koston <nick@koston.org>
* Add job group execution limit option
* Fix pylint issues
* Assign variable before usage
* Cleanup jobs when done
* Remove isinstance check for performance
* Explicitly raise from None
* Add some more documentation info
* Reduce executor code for docker
* Fix pylint errors and move import/export image
* Fix test and a couple other risky executor calls
* Fix dataclass and return
* Fix test case and add one for corrupt docker
* Add some coverage
* Undo changes to docker manager startup
* Add update freeze option
* Freeze to auto update and plugin condition
* Add tests
* Add supervisor_version evaluation
* OS updates require supervisor up to date
* Run version check during startup
* Docker events based watchdog
* Separate monitor from DockerAPI since it needs coresys
* Move monitor into DockerAPI
* Fix properties on coresys
* Add watchdog tests
* Added tests
* pylint issue
* Current state failures test
* Thread-safe event processing
* Use labels property
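A sketch of the threaded event monitor feeding the asyncio side; docker-py's events() call and call_soon_threadsafe() are real, the handler is a stand-in:

```python
import asyncio

import docker

def monitor_events(client: docker.DockerClient, loop: asyncio.AbstractEventLoop) -> None:
    # Runs in a dedicated thread: docker-py's events() blocks while streaming.
    for event in client.events(decode=True):
        if event.get("Type") != "container":
            continue
        # Hand each container event to the event loop thread-safely, where
        # the watchdog decides whether a restart is warranted.
        loop.call_soon_threadsafe(handle_container_event, event)

def handle_container_event(event: dict) -> None:
    ...  # stand-in for the watchdog logic
```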
* Initial WS support
* test
* Update frontend to fc7c4af2
* Fix issue with closing states
* log error
* make data optional
* limit stopping states
* Move wrappers to HomeAssistantWebSocket
* use info
* Use call_soon
* Use a lookup table for WS commands (see the sketch below)
* Fix tests
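For the lookup-table commit above, a sketch of gating WS commands on the Core version that first supports them; the concrete command names and versions here are illustrative assumptions:

```python
from awesomeversion import AwesomeVersion

# Command type -> minimum Core version that understands it.
MIN_VERSION: dict[str, str] = {
    "supervisor/event": "2021.2.4",
}

def command_supported(command: str, core_version: AwesomeVersion) -> bool:
    min_version = MIN_VERSION.get(command)
    return min_version is None or core_version >= AwesomeVersion(min_version)
```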