1
0
mirror of https://github.com/home-assistant/supervisor.git synced 2026-07-02 11:25:37 +01:00

Derive App state from container state (#6890)

* Derive App state from container state

The App.state setter mixed two responsibilities: it both mutated a
private `_state` field and dispatched side effects (WebSocket events,
issue dismissal, startup_event signaling). On top of that, an installed
but never-started app stayed in AppState.UNKNOWN forever, because the
attach() image-only fallback never fires a container state-change event
and the AppState therefore kept its constructor default. Conceptually,
ContainerState.UNKNOWN ("container does not exist") and AppState.UNKNOWN
("nothing observed yet") happened to share a name but meant different
things, which made the distinction easy to lose.

Make App.state a pure derived property. The source of truth is the last
observed ContainerState (cached on the App), plus a sticky operation-
error flag for start/stop failures that the docker event stream cannot
reflect. When no container has been observed yet, the derivation falls
back to install signals: an attached instance (image present) is
STOPPED, otherwise UNKNOWN. As a side effect, an installed-but-never-
started app now correctly reports STOPPED instead of UNKNOWN.

container_state_changed updates the cached container state and routes
all side effects through a single _emit_state_change helper that diffs
old vs new derived state. The two start/stop failure paths route
through _set_operation_error. Uninstall resets the cached signals so
the derivation naturally returns UNKNOWN.

Tests use a new tests/common.force_app_state helper that pokes the
underlying signals directly; the production class no longer carries
test-only setters.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix App state drive to AppState.UNKNOWN

* Unify state mutation through _update_state

Previously, state-driving signal changes were spread across two helpers
(_set_operation_error, _emit_state_change(old_state)) and required each
caller to capture self.state before mutating a private field — leaking
implementation details to call sites and raising the "why am I emitting
the old state?" question pointed out in code review.

Replace both helpers with a single _update_state(*, container_state=,
operation_error=) entry point. Callers describe what changed via
keyword arguments (None leaves a signal untouched); the helper captures
the previous state, applies the updates, recomputes the derived state
and emits side effects if anything changed.

Diff against a tracked _last_state instead of a freshly derived
"current" state, so that an out-of-band mutation between updates does
not silently shift the comparison baseline. The concrete case is
App.uninstall: instance.remove() clears the docker meta mid-flow, which
would otherwise reshape the derivation (RUNNING with no healthcheck
becomes STARTED instead of STARTUP) and suppress the STARTUP transition
that resolves the start-wait task. As a side effect, the initial
UNKNOWN -> STOPPED transition on attach is also now reliably emitted.

Switch the uninstall path to ContainerState.UNKNOWN ("we know there is
no container") rather than the constructor sentinel None.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Cache app state instead of deriving on every read

Building on the previous commit, make App.state a plain read of a
cached _state field rather than re-deriving on every property access.
The derivation moves to _derive_state(), and _update_state() is the
sole place that recomputes and assigns _state, so the value consumers
read always matches what was last emitted to listeners.

This removes the _last_state bookkeeping introduced previously: with a
single cached value there is no longer a separate "derived now" vs
"last emitted" distinction to reconcile, and out-of-band mutations
(e.g. instance.remove() clearing _meta during uninstall) can no longer
silently shift what state returns between updates.

Call _update_state() at the end of load() so the cached state settles
once attach() has run. Image-only attaches do not fire a docker event,
so without this an installed app would stay in the constructor-default
UNKNOWN until first start; this also makes the initial transition on
attach observable to listeners.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Pass operation error to _derive_state instead of storing it

The two state-driving signals were not symmetric. _container_state is
genuinely persisted state ("the last thing docker told us") that
re-derivation legitimately reads across calls. _operation_error, on the
other hand, is a momentary "force ERROR for this transition" signal; the
persistence of an error condition already lives in the cached _state.

Storing it as an instance attribute implied a sticky cross-call behavior
that no call path actually exercised: every caller either set it
explicitly right before deriving (start/stop failures, container events)
or ran argless only at load time, where no failure has occurred.

Drop the _operation_error field and pass operation_error as a parameter
to _derive_state(), defaulting to False in _update_state(). A container
observation now supersedes a prior error implicitly via the default,
which lets the container-event and uninstall call sites drop their
explicit operation_error=False.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Settle load state synchronously from current_state

The argless _update_state() settle at the end of load() raced attach()'s
container-state event. attach() fires DOCKER_CONTAINER_STATE_CHANGE via
the bus, which schedules the container_state_changed listener as a task
rather than running it inline. In the deprecated-arch early-return path
there is no await between attach() and the settle, so the listener had
not run yet: _container_state was still None and the settle derived
STOPPED (instance attached) — emitting a transient UNKNOWN->STOPPED even
for a running container before the listener corrected it. The main path
only avoided this incidentally, by having awaits (check_image,
save_persist) in between for the listener to run.

Derive the load-time state synchronously from instance.current_state()
instead of relying on the asynchronously delivered event. current_state()
returns the real container state, or UNKNOWN when only an image is
present (which derives to STOPPED), so both paths settle correctly
without racing the event.

Add a regression test that loading a running container settles to
STARTED, and mock current_state() in the state-listener test which
relies on a clean UNKNOWN baseline.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Stefan Agner
2026-06-01 19:50:06 +02:00
committed by GitHub
parent b74b65a235
commit a973d22e35
9 changed files with 149 additions and 44 deletions
+68 -20
View File
@@ -148,6 +148,10 @@ class App(AppModel):
"""Initialize data holder."""
super().__init__(coresys, slug)
self.instance: DockerApp = DockerApp(coresys, self)
# Last observed container state; None until first event arrives.
self._container_state: ContainerState | None = None
# Cached app state. Updated only via ``_update_state`` so the value
# consumers see matches what was last emitted.
self._state: AppState = AppState.UNKNOWN
self._manual_stop: bool = False
self._listeners: list[EventListener] = []
@@ -176,13 +180,55 @@ class App(AppModel):
@property
def state(self) -> AppState:
"""Return state of the app."""
"""Return current state of the app."""
return self._state
@state.setter
def state(self, new_state: AppState) -> None:
"""Set the app into new state."""
if self._state == new_state:
def _derive_state(self, operation_error: bool) -> AppState:
"""Derive the app state from the cached docker signals.
Falls back to install signals when no container has been observed
yet: an attached instance (image present) is STOPPED, otherwise
UNKNOWN. ``operation_error`` forces ERROR for start/stop failures
that the docker event stream cannot reflect (e.g. the container
never came up).
"""
if operation_error:
return AppState.ERROR
match self._container_state:
case ContainerState.RUNNING:
return (
AppState.STARTUP if self.instance.healthcheck else AppState.STARTED
)
case ContainerState.HEALTHY | ContainerState.UNHEALTHY:
return AppState.STARTED
case ContainerState.STOPPED:
return AppState.STOPPED
case ContainerState.FAILED:
return AppState.ERROR
case _:
# No container observed (None) or container missing (UNKNOWN):
# fall back to install status.
return AppState.STOPPED if self.instance.attached else AppState.UNKNOWN
def _update_state(
self,
*,
container_state: ContainerState | None = None,
operation_error: bool = False,
) -> None:
"""Update the cached container state and emit a transition if it changed.
Re-derives the app state and emits side effects if it differs from
the cached state. ``container_state=None`` leaves the last observed
container state untouched. ``operation_error`` is a momentary signal
forcing ERROR for the current transition; a later container
observation supersedes it via the default.
"""
if container_state is not None:
self._container_state = container_state
new_state = self._derive_state(operation_error)
if new_state == self._state:
return
old_state = self._state
self._state = new_state
@@ -251,6 +297,7 @@ class App(AppModel):
)
with suppress(DockerError):
await self.instance.attach(version=self.version)
self._update_state(container_state=await self.instance.current_state())
return
default_image = self._image(self.data)
@@ -286,6 +333,13 @@ class App(AppModel):
self.persist[ATTR_IMAGE] = default_image
await self.save_persist()
# Settle the cached state from the attached container (UNKNOWN when
# only an image is present, which derives to STOPPED). attach() emits
# its runtime event asynchronously, so query synchronously here rather
# than racing that event during load.
with suppress(DockerError):
self._update_state(container_state=await self.instance.current_state())
def _create_missing_image_issue(self) -> None:
"""Surface a repair suggestion for a missing app image."""
self.sys_resolution.create_issue(
@@ -884,7 +938,9 @@ class App(AppModel):
_LOGGER.error("Could not remove image for app %s: %s", self.slug, err)
raise AppUnknownError(app=self.slug) from err
self.state = AppState.UNKNOWN
# The container (and possibly the image) is gone; the state derives
# back to UNKNOWN via the image-removed instance.
self._update_state(container_state=ContainerState.UNKNOWN)
await self.unload()
@@ -1208,7 +1264,7 @@ class App(AppModel):
) from err
except DockerError as err:
_LOGGER.error("Could not start container for app %s: %s", self.slug, err)
self.state = AppState.ERROR
self._update_state(operation_error=True)
raise AppUnknownError(app=self.slug) from err
self._wait_for_startup_task = self.sys_create_task(self._wait_for_startup())
@@ -1226,7 +1282,7 @@ class App(AppModel):
await self.instance.stop()
except DockerError as err:
_LOGGER.error("Could not stop container for app %s: %s", self.slug, err)
self.state = AppState.ERROR
self._update_state(operation_error=True)
raise AppUnknownError(app=self.slug) from err
@Job(
@@ -1690,22 +1746,12 @@ class App(AppModel):
await asyncio.sleep(delay)
async def container_state_changed(self, event: DockerContainerStateEvent) -> None:
"""Set app state from container state."""
"""Update cached container state and emit transitions."""
if event.name != self.instance.name:
return
if event.state == ContainerState.RUNNING:
self._manual_stop = False
self.state = (
AppState.STARTUP if self.instance.healthcheck else AppState.STARTED
)
elif event.state in [
ContainerState.HEALTHY,
ContainerState.UNHEALTHY,
]:
self.state = AppState.STARTED
elif event.state == ContainerState.STOPPED:
self.state = AppState.STOPPED
elif event.state == ContainerState.FAILED:
if event.exit_code == EXIT_CODE_SIGTERM_DEFAULT:
_LOGGER.warning(
@@ -1722,7 +1768,9 @@ class App(AppModel):
self.name,
event.exit_code,
)
self.state = AppState.ERROR
# An observed container state supersedes any prior operation error.
self._update_state(container_state=event.state)
async def watchdog_container(self, event: DockerContainerStateEvent) -> None:
"""Process state changes in app container and restart if necessary."""
+2 -1
View File
@@ -23,6 +23,7 @@ from supervisor.docker.monitor import DockerContainerStateEvent
from supervisor.exceptions import HassioError
from supervisor.store.repository import Repository
from ..common import force_app_state
from ..const import TEST_ADDON_SLUG
@@ -41,7 +42,7 @@ async def test_apps_info(
):
"""Test getting app info."""
client, root = app_api_client_with_root
install_app_ssh.state = AppState.STOPPED
force_app_state(install_app_ssh, AppState.STOPPED)
install_app_ssh.ingress_panel = True
install_app_ssh.protected = True
install_app_ssh.watchdog = False
+3 -3
View File
@@ -11,7 +11,7 @@ from supervisor.const import AppState
from supervisor.coresys import CoreSys
from supervisor.discovery import Message
from tests.common import load_json_fixture
from tests.common import force_app_state, load_json_fixture
async def test_api_discovery_forbidden(
@@ -63,7 +63,7 @@ async def test_api_list_discovery(
Message(app="local_ssh", service="adguard", config=ANY, uuid=ANY),
]
install_app_ssh.state = AppState.STARTED
force_app_state(install_app_ssh, AppState.STARTED)
resp = await api_client.get(f"{prefix}/discovery")
assert resp.status == 200
result = await resp.json()
@@ -77,7 +77,7 @@ async def test_api_list_discovery(
}
]
install_app_ssh.state = skip_state
force_app_state(install_app_ssh, skip_state)
resp = await api_client.get(f"{prefix}/discovery")
assert resp.status == 200
result = await resp.json()
+21 -5
View File
@@ -47,7 +47,7 @@ from supervisor.utils.dt import utcnow
from .test_manager import BOOT_FAIL_ISSUE, BOOT_FAIL_SUGGESTIONS
from tests.common import fire_bus_event, get_fixture_path, is_in_list
from tests.common import fire_bus_event, force_app_state, get_fixture_path, is_in_list
from tests.const import TEST_ADDON_SLUG
@@ -126,7 +126,10 @@ def test_options_merge(coresys: CoreSys, install_app_ssh: App) -> None:
async def test_app_state_listener(coresys: CoreSys, install_app_ssh: App) -> None:
"""Test app is setting state from docker events."""
with patch.object(DockerApp, "attach"):
with (
patch.object(DockerApp, "attach"),
patch.object(DockerApp, "current_state", return_value=ContainerState.UNKNOWN),
):
await install_app_ssh.load()
assert install_app_ssh.state == AppState.UNKNOWN
@@ -462,6 +465,19 @@ async def test_listeners_removed_on_uninstall(
)
@pytest.mark.usefixtures("tmp_supervisor_data", "path_extern")
async def test_load_settles_state_from_running_container(
install_app_ssh: App, container: DockerContainer
) -> None:
"""Test load derives the state from a running container, not the default."""
container.show.return_value["State"]["Status"] = "running"
container.show.return_value["State"]["Running"] = True
await install_app_ssh.load()
# State is settled synchronously from current_state(), so it reflects the
# running container immediately rather than the image-only STOPPED default.
assert install_app_ssh.state == AppState.STARTED
@pytest.mark.usefixtures("tmp_supervisor_data", "path_extern")
async def test_start(coresys: CoreSys, install_app_ssh: App) -> None:
"""Test starting an app without healthcheck."""
@@ -1257,10 +1273,10 @@ async def test_app_state_dismisses_issue(
suggestions: list[SuggestionType],
):
"""Test an app state change dismisses the issues."""
install_app_ssh.state = initial_state
force_app_state(install_app_ssh, initial_state)
coresys.resolution.add_issue(issue, suggestions)
install_app_ssh.state = target_state
force_app_state(install_app_ssh, target_state)
assert coresys.resolution.issues == []
assert coresys.resolution.suggestions == []
@@ -1270,7 +1286,7 @@ async def test_app_disable_boot_dismisses_boot_fail(
):
"""Test a disabling boot dismisses the boot fail issue."""
install_app_ssh.boot = AppBoot.AUTO
install_app_ssh.state = AppState.ERROR
force_app_state(install_app_ssh, AppState.ERROR)
coresys.resolution.add_issue(
BOOT_FAIL_ISSUE, [suggestion.type for suggestion in BOOT_FAIL_SUGGESTIONS]
)
+2 -2
View File
@@ -39,7 +39,7 @@ from supervisor.store.repository import RepositoryLocal
from supervisor.utils import check_exception_chain
from supervisor.utils.common import write_json_file
from tests.common import fire_bus_event, load_json_fixture
from tests.common import fire_bus_event, force_app_state, load_json_fixture
from tests.const import TEST_ADDON_SLUG
BOOT_FAIL_ISSUE = Issue(
@@ -219,7 +219,7 @@ async def test_app_shutdown_error(
coresys: CoreSys, install_app_ssh: App, capture_exception: Mock
):
"""Test errors captured during app shutdown."""
install_app_ssh.state = AppState.STARTED
force_app_state(install_app_ssh, AppState.STARTED)
with patch.object(DockerApp, "stop", side_effect=DockerNotFound()):
await coresys.apps.shutdown(AppStartup.APPLICATION)
+3 -3
View File
@@ -43,7 +43,7 @@ from supervisor.mounts.mount import Mount
from supervisor.resolution.const import UnhealthyReason
from supervisor.utils.json import read_json_file, write_json_file
from tests.common import get_fixture_path
from tests.common import force_app_state, get_fixture_path
from tests.const import TEST_ADDON_SLUG
from tests.dbus_service_mocks.base import DBusServiceMock
from tests.dbus_service_mocks.systemd import Systemd as SystemdService
@@ -1255,7 +1255,7 @@ async def test_restore_progress(
container.show.return_value["State"]["Status"] = "running"
container.show.return_value["State"]["Running"] = True
install_app_ssh.path_data.mkdir()
install_app_ssh.state = AppState.STARTED
force_app_state(install_app_ssh, AppState.STARTED)
await coresys.core.set_state(CoreState.RUNNING)
coresys.hardware.disk.get_disk_free_space = lambda x: 5000
@@ -1387,7 +1387,7 @@ async def test_restore_progress(
container.show.return_value["State"]["Status"] = "stopped"
container.show.return_value["State"]["Running"] = False
install_app_ssh.state = AppState.STOPPED
force_app_state(install_app_ssh, AppState.STOPPED)
app_backup: Backup = await coresys.backups.do_backup_partial(apps=["local_ssh"])
ha_ws_client.async_send_command.reset_mock()
+39 -1
View File
@@ -12,8 +12,10 @@ from typing import Any, Self
from dbus_fast.aio.message_bus import MessageBus
from supervisor.const import BusEvent
from supervisor.apps.app import App
from supervisor.const import AppState, BusEvent
from supervisor.coresys import CoreSys
from supervisor.docker.const import ContainerState
from supervisor.jobs.decorator import Job
from supervisor.resolution.validate import get_valid_modules
from supervisor.utils.yaml import read_yaml_file
@@ -21,6 +23,42 @@ from supervisor.utils.yaml import read_yaml_file
from .dbus_service_mocks.base import DBusServiceMock
def force_app_state(app: App, state: AppState) -> None:
"""Drive an app's derived state to ``state`` by setting underlying signals.
The ``App.state`` value is derived from the last observed container
state and a momentary operation-error signal. Tests sometimes need a
specific AppState as setup without spinning up real Docker events;
this helper maps each AppState back to plausible signals and routes
them through the normal state update path.
"""
# pylint: disable=protected-access
container_state: ContainerState | None = None
operation_error = False
match state:
case AppState.UNKNOWN:
# The derivation falls back to STOPPED when ``instance.attached``
# is true; clear the docker metadata so the helper is
# deterministic regardless of prior fixture setup.
app.instance._meta = None
container_state = ContainerState.UNKNOWN
case AppState.STOPPED:
container_state = ContainerState.STOPPED
case AppState.STARTED:
container_state = ContainerState.HEALTHY
case AppState.STARTUP:
container_state = ContainerState.RUNNING
# STARTUP only resolves from RUNNING when the container has a
# healthcheck configured; ensure one is present in the mocked
# container metadata.
meta = app.instance._meta or {}
meta.setdefault("Config", {})["Healthcheck"] = {"Test": ["CMD", "true"]}
app.instance._meta = meta
case AppState.ERROR:
operation_error = True
app._update_state(container_state=container_state, operation_error=operation_error)
async def fire_bus_event(coresys: CoreSys, event: BusEvent, data: Any) -> None:
"""Fire a bus event and await its listener tasks.
@@ -14,6 +14,7 @@ from supervisor.resolution.const import ContextType, IssueType, SuggestionType
from supervisor.resolution.data import Issue, Suggestion
from supervisor.resolution.fixups.app_execute_restart import FixupAppExecuteRestart
from tests.common import force_app_state
from tests.const import TEST_ADDON_SLUG
DEVICE_ACCESS_MISSING_ISSUE = Issue(
@@ -29,12 +30,12 @@ EXECUTE_RESTART_SUGGESTION = Suggestion(
@pytest.mark.usefixtures("path_extern")
async def test_fixup(coresys: CoreSys, install_app_ssh: App):
"""Test fixup restarts app."""
install_app_ssh.state = AppState.STARTED
force_app_state(install_app_ssh, AppState.STARTED)
app_execute_restart = FixupAppExecuteRestart(coresys)
assert app_execute_restart.auto is False
async def mock_stop(*args, **kwargs):
install_app_ssh.state = AppState.STOPPED
force_app_state(install_app_ssh, AppState.STOPPED)
coresys.resolution.add_issue(
DEVICE_ACCESS_MISSING_ISSUE,
@@ -59,7 +60,7 @@ async def test_fixup_stop_error(
coresys: CoreSys, install_app_ssh: App, caplog: pytest.LogCaptureFixture
):
"""Test fixup fails on stop app failure."""
install_app_ssh.state = AppState.STARTED
force_app_state(install_app_ssh, AppState.STARTED)
app_execute_start = FixupAppExecuteRestart(coresys)
coresys.resolution.add_issue(
@@ -83,7 +84,7 @@ async def test_fixup_start_error(
coresys: CoreSys, install_app_ssh: App, caplog: pytest.LogCaptureFixture
):
"""Test fixup logs a start app failure."""
install_app_ssh.state = AppState.STARTED
force_app_state(install_app_ssh, AppState.STARTED)
app_execute_start = FixupAppExecuteRestart(coresys)
coresys.resolution.add_issue(
@@ -14,6 +14,7 @@ from supervisor.resolution.data import Suggestion
from supervisor.resolution.fixups.app_execute_start import FixupAppExecuteStart
from tests.apps.test_manager import BOOT_FAIL_ISSUE
from tests.common import force_app_state
EXECUTE_START_SUGGESTION = Suggestion(
SuggestionType.EXECUTE_START, ContextType.ADDON, reference="local_ssh"
@@ -26,12 +27,12 @@ EXECUTE_START_SUGGESTION = Suggestion(
@pytest.mark.usefixtures("path_extern")
async def test_fixup(coresys: CoreSys, install_app_ssh: App, state: AppState):
"""Test fixup starts app."""
install_app_ssh.state = AppState.UNKNOWN
force_app_state(install_app_ssh, AppState.UNKNOWN)
app_execute_start = FixupAppExecuteStart(coresys)
assert app_execute_start.auto is False
async def mock_start(*args, **kwargs):
install_app_ssh.state = state
force_app_state(install_app_ssh, state)
coresys.resolution.add_issue(
BOOT_FAIL_ISSUE,
@@ -52,7 +53,7 @@ async def test_fixup(coresys: CoreSys, install_app_ssh: App, state: AppState):
@pytest.mark.usefixtures("path_extern")
async def test_fixup_start_error(coresys: CoreSys, install_app_ssh: App):
"""Test fixup fails on start app failure."""
install_app_ssh.state = AppState.UNKNOWN
force_app_state(install_app_ssh, AppState.UNKNOWN)
app_execute_start = FixupAppExecuteStart(coresys)
coresys.resolution.add_issue(
@@ -76,11 +77,11 @@ async def test_fixup_wait_start_failure(
coresys: CoreSys, install_app_ssh: App, state: AppState
):
"""Test fixup fails if app does not complete startup."""
install_app_ssh.state = AppState.UNKNOWN
force_app_state(install_app_ssh, AppState.UNKNOWN)
app_execute_start = FixupAppExecuteStart(coresys)
async def mock_start(*args, **kwargs):
install_app_ssh.state = state
force_app_state(install_app_ssh, state)
coresys.resolution.add_issue(
BOOT_FAIL_ISSUE,