During system shutdown (reboot/poweroff), the watchdog was incorrectly
detecting the Home Assistant Core container as failed and attempting to
restart it. This occurred because Docker was stopping all containers in
parallel with Supervisor's own shutdown sequence, causing the watchdog
to trigger while add-ons were still being stopped.
This led to an abrupt termination of Core before it could cleanly shut
down its SQLite database, resulting in a warning on the next startup:
"The system could not validate that the sqlite3 database was shutdown
cleanly".
The fix registers a supervisor state change listener that unregisters
the watchdog when entering any shutdown state (SHUTDOWN, STOPPING, or
CLOSE). This prevents restart attempts during both user-initiated
reboots (via API) and external shutdown signals (Docker SIGTERM,
console reboot commands).
Since SHUTDOWN, STOPPING, and CLOSE are terminal states with no reverse
transition back to RUNNING, no re-registration logic is needed.
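A minimal sketch of the approach, assuming an illustrative event-bus API (the bus,
event names, and class below are not the actual Supervisor code):

```python
from enum import Enum


class CoreState(Enum):
    RUNNING = "running"
    SHUTDOWN = "shutdown"
    STOPPING = "stopping"
    CLOSE = "close"


SHUTDOWN_STATES = {CoreState.SHUTDOWN, CoreState.STOPPING, CoreState.CLOSE}


class CoreWatchdog:
    """Restart Core on container failure, except while the Supervisor shuts down."""

    def __init__(self, bus) -> None:
        self._bus = bus  # illustrative event bus, not the real Supervisor bus
        self._listener = None  # handle for the container-state listener

    def register(self) -> None:
        # Watch Core's container state, and follow Supervisor state changes so
        # the watchdog can be torn down as soon as shutdown begins.
        self._listener = self._bus.register("container_state_change", self._restart_core)
        self._bus.register("supervisor_state_change", self._on_supervisor_state)

    async def _on_supervisor_state(self, state: CoreState) -> None:
        if state in SHUTDOWN_STATES and self._listener is not None:
            # Shutdown states are terminal (no transition back to RUNNING),
            # so the listener never has to be re-registered.
            self._bus.remove_listener(self._listener)
            self._listener = None

    async def _restart_core(self, _event) -> None:
        ...  # restart handling elided
```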
Fixes #6511
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Catch ValueError exceptions with "Cannot update a job that is done"
during image pull progress updates. This error occurs intermittently
when progress events arrive after a job has completed. It is not clear
why this happens; perhaps the job is prematurely marked as done, or the
pull events arrive in a different order than expected.
Rather than failing the entire pull operation, we now:
- Log a warning with context (layer ID, status, progress)
- Send the error to Sentry for tracking and investigation
- Continue with the pull operation
This prevents pull failures while gathering information to help
identify and fix the root cause of the race condition.
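A rough sketch of the handling, with a hypothetical job/progress helper standing
in for the real pull code (sentry_sdk is assumed for the Sentry report):

```python
import logging

from sentry_sdk import capture_exception  # assumed Sentry reporting helper

_LOGGER = logging.getLogger(__name__)


def apply_progress_update(job, layer_id: str, status: str, progress: float) -> None:
    """Apply a pull progress update, tolerating updates that land after the job is done."""
    try:
        job.update(progress=progress)  # hypothetical job API, for illustration only
    except ValueError as err:
        if "Cannot update a job that is done" not in str(err):
            raise
        # Log with enough context to investigate the race, report it to Sentry,
        # and let the pull continue instead of failing the whole operation.
        _LOGGER.warning(
            "Progress update after job completion: layer=%s status=%s progress=%s",
            layer_id,
            status,
            progress,
        )
        capture_exception(err)
```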
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
The test was failing intermittently in CI because concurrent async
operations in asyncio.gather() were getting slightly different
timestamps (microseconds apart) despite being inside a time_machine
context.
When test2.execute() calls were timestamped at start+2ms due to async
scheduling delays, they weren't cleaned up in the final test block
(cutoff = start+1ms), causing a false rate limit error.
Fix by using tick=False to completely freeze time during the gather,
ensuring all 4 calls get the exact same timestamp.
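A small self-contained illustration (not the actual Supervisor test) of the
difference tick=False makes inside asyncio.gather():

```python
import asyncio
from datetime import datetime, timezone

import time_machine


async def main() -> None:
    async def record() -> datetime:
        return datetime.now(timezone.utc)

    with time_machine.travel(datetime(2024, 1, 1, tzinfo=timezone.utc), tick=False):
        # With tick=False the clock is fully frozen, so all four concurrent
        # calls observe exactly the same timestamp despite scheduling delays.
        stamps = await asyncio.gather(record(), record(), record(), record())

    assert len(set(stamps)) == 1


asyncio.run(main())
```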
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* Check frontend availability after Home Assistant Core updates
Add verification that the frontend is actually accessible at "/" after Core
updates to ensure the web interface is serving properly, not just that the
API endpoints respond.
Previously, the update verification only checked API endpoints and whether
the frontend component was loaded. This could miss cases where the API is
responsive but the frontend fails to serve the UI.
Changes:
- Add check_frontend_available() method to HomeAssistantAPI that fetches
the root path and verifies it returns HTML content
- Integrate frontend check into core update verification flow after
confirming the frontend component is loaded
- Trigger automatic rollback if frontend is inaccessible after update
- Fix blocking I/O calls in rollback log file handling to use async
executor
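A minimal sketch of the kind of check described above, using aiohttp directly;
the actual Supervisor method and its session handling may differ:

```python
import asyncio

import aiohttp


async def check_frontend_available(base_url: str) -> bool:
    """Return True if "/" responds with HTML, i.e. the frontend is actually serving."""
    try:
        async with aiohttp.ClientSession() as session:
            async with session.get(
                f"{base_url}/", timeout=aiohttp.ClientTimeout(total=30)
            ) as resp:
                if resp.status != 200:
                    return False
                return "text/html" in resp.headers.get("Content-Type", "")
    except (aiohttp.ClientError, asyncio.TimeoutError):
        return False
```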
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Avoid checking frontend if config data is None
* Improve pytest tests
* Make sure Core returns a valid config
* Remove Core version check in frontend availability test
The call site already makes sure that an actual Home Assistant Core
instance is running before calling the frontend availability check, so
the version check is redundant. Simplify the code by removing it and
update the tests accordingly.
* Add test coverage for get_config
---------
Co-authored-by: Claude <noreply@anthropic.com>
* Add route_metric attribute to IpProperties class
Signed-off-by: David Rapan <david@rapan.cz>
* Refactor dbus setting IP constants
Signed-off-by: David Rapan <david@rapan.cz>
* Add route metric
Signed-off-by: David Rapan <david@rapan.cz>
* Merge test_api_network_interface_info
Signed-off-by: David Rapan <david@rapan.cz>
* Add test case for route metric update
Signed-off-by: David Rapan <david@rapan.cz>
---------
Signed-off-by: David Rapan <david@rapan.cz>
* Migrate all docker container interactions to aiodocker
* Remove containers_legacy since it's no longer used
* Add back remove color logic
* Revert accidental inversion of conditional in setup_network
* Fix typos found by copilot
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Revert "Apply suggestions from code review"
This reverts commit 0a475433ea.
---------
Co-authored-by: Stefan Agner <stefan@agner.ch>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Improve Supervisor startup wait logic in CI workflow
The 'Wait for Supervisor to come up' step was failing intermittently when
the Supervisor API wasn't immediately available. The original script relied
on bash's lenient error handling in command substitution, which could fail
unpredictably.
Changes:
- Use curl -f flag to properly handle HTTP errors
- Use jq -e for robust JSON validation and exit code handling
- Add explicit 5-minute timeout with elapsed time tracking
- Reduce log noise by only reporting progress every 15 seconds
- Add comprehensive error diagnostics on timeout:
  * Show last API response received
  * Dump last 50 lines of Supervisor logs
- Show startup time on success for performance monitoring
This makes the CI workflow more reliable and easier to debug when the
Supervisor fails to start.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* Use YAML anchor to deduplicate wait step in CI workflow
The 'Wait for Supervisor to come up' step appears twice in the
run_supervisor job - once after starting and once after restarting.
Use a YAML anchor to define the step once and reference it on the
second occurrence.
This reduces duplication by 28 lines and makes future maintenance
easier by ensuring both wait steps remain identical.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Add more syslog identifiers (most importantly containerd), extracted from real
systems, that were missing from the list. This should make the host logs contain
the same events as journalctl logs, minus audit logs and Docker container logs.