supervisor

mirror of https://github.com/home-assistant/supervisor.git synced 2026-07-08 06:13:47 +01:00
Files
T
Stefan Agner 267fc6cd71 mounts: make is_mounted honest about server reachability (#6838 )
* mounts: use softerr for NFS instead of soft

Switch the NFS mount option from `soft` to `softerr`. For HAOS-style
supervisor mounts (media, share, backup — not the root filesystem) the
error semantics matter:

* `softerr` returns `ETIMEDOUT` on timeout instead of `EIO` (`soft`).
  `EIO` is indistinguishable from "the disk is dying"; tools like
  SQLite, restic, rsync, ffmpeg tend to treat it as a hard storage
  failure (mark database corrupt, abort backup with a hard error,
  etc.). `ETIMEDOUT` is unambiguously "the network/server is gone,
  transient" and is more commonly handled as retry-later. Supervisor
  can also surface a clear "server unreachable" notification rather
  than a generic I/O error.

* `softerr` was added in kernel 5.10 precisely to give the fail-fast
  behavior of `soft` with a distinct errno so well-behaved apps can
  do the right thing.

* For writes-must-not-be-lost use cases (databases, paid storage,
  evidence-grade logging) one would want `hard,intr` and a different
  recovery story. HAOS NFS mounts are not that — they're add-on
  storage where "the share went offline, try again later" is the
  correct user-visible behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* mounts: make is_mounted honest about server reachability

systemd's "active/mounted" state is derived from /proc/self/mountinfo
and doesn't reflect whether the backing server actually answers. For
CIFS in particular, smb3_reconfigure never contacts the server, so a
reload of a dead share returns active/mounted with no recovery
attempted. PR #4882 added a Path.is_mount() cross-check to catch this,
but Path.is_mount() relies on os.stat() of the mountpoint root — and
both NFS (softreval) and CIFS (cached root inode attrs) can serve that
stat from local state without going to the wire, so it lies in exactly
the dead-share scenario it was meant to detect.

The visible failure: an API reload of a backup mount whose server had
gone away "succeeded", supervisor then kicked off sys_backups.reload()
against the dead share, and the executor fanned out hundreds of
"Task exception was never retrieved" OSError(112) tracebacks as each
backup tarball's stat() parked in the kernel and failed.

Replace the local-state checks with a statvfs() probe in
NetworkMount.is_mounted(). os.statvfs() returns per-filesystem data
(free blocks, total blocks) that has no client-side cache in either
kernel — neither cifs_statfs() nor nfs_statfs() has an early-return on
cache freshness; both build and send a real FSSTAT / QUERY_FS_INFO
request. So the kernel either reaches the server or gives up with
ETIMEDOUT / EHOSTDOWN / ECONNABORTED. is_mounted() now reflects actual
reachability, finally fulfilling PR #4882's stated intent.

With is_mounted() honest, the reload/restart machinery falls out
naturally: Mount.reload()'s fast path is just `if await self.is_mounted()`,
the post-reload check is the same call, and the "reload succeeded
systemd-wise but probe failed" branch collapses into the existing
"not mounted after reload, try restart" branch. mount(), _restart()
and update() already call is_mounted(); they now get the probe for
free.

To make the probe meaningful, the network mount option strings get
explicit kernel-side timeouts:

* NFS switches from `soft` to `softerr` so timeouts surface as
  ETIMEDOUT rather than EIO. EIO is indistinguishable from
  "disk dying" and gets misinterpreted by SQLite/restic/rsync as a
  hard storage failure; ETIMEDOUT is unambiguously transient.
* CIFS gains `soft,echo_interval=10,retrans=0`, giving a ~30s
  per-operation detection budget (3 x echo_interval since last server
  response) that matches the NFS budget from `timeo=100,retrans=2`.
  Both protocols now fail bounded operations in roughly the same time.

The probe is intentionally not wrapped in an asyncio timeout: the
kernel-side bound is authoritative, and an asyncio timeout would only
orphan the executor thread without unblocking the syscall. The probe
emits debug logs with timing so the ~30s syscall wait on a dead share
is visible in LOGLEVEL=debug traces instead of appearing as a hang.

Tests:

* mock_is_mount fixture extended to also patch os.statvfs so existing
  tests that rely on a healthy mount don't need to know about the
  probe.
* New manager test split into _healthy_skips_systemd (probe succeeds,
  fast path, no systemd call) and _probe_failure_triggers_systemd_reload
  (probe fails, escalation runs). API reload test covers both paths.
* Existing tests that simulated "mount is down" via
  mock_is_mount.return_value=False updated to simulate probe failure
  via OSError(EHOSTDOWN), since is_mount is no longer the signal.

* mounts: drive systemd job waits via JobRemoved instead of state polling

`_update_state_await` had a race that has been present since network
mounts were introduced (#4269) and survived every subsequent rewrite
(#4733, c75f36305): the wait function reads the unit's ActiveState
*after* the systemd dispatch returns, then matches against a target
set that almost always includes the pre-call state (ACTIVE). If
systemd has not yet started dispatching the queued job — and for
fast operations like CIFS reload (smb3_reconfigure completes in
milliseconds with no server contact) it routinely has not — the wait
matches immediately on the pre-call state and returns "done" before
the operation has actually begun. The race was invisible until the
probe commit started doing honest network work after the wait
returned, at which point we spent a full ~30s NFS probe wait on a
mount that systemd was still in the middle of restarting.

Switch the job-dispatching paths (mount/unmount/reload/restart) to
systemd's `JobRemoved` Manager signal. The pattern:

    async with sys_dbus.systemd.job_removed() as jobs:
        job_path = await sys_dbus.systemd.restart_unit(...)
        await jobs.wait_for_job(job_path)

is structurally race-free: subscription is set up before dispatch,
the job_path is allocated synchronously inside the dispatch call,
and any JobRemoved for that path after subscription is queued. Even
for operations that complete faster than we can process the dispatch
return value (the CIFS reload case), the signal cannot be missed.

`_update_state_await` is kept for `load()` — that path observes
existing state without having dispatched a job, so the
PropertiesChanged-driven wait is the right primitive there.

* mounts: verify the path is a mount point before trusting statvfs

The userspace probe trusted statvfs() alone to declare a network mount
healthy. statvfs is uncacheable client-side for both NFS and CIFS, so
on a real mount it forces a wire RPC — but if the mount is gone from
the kernel mount table (e.g., after a restart cycle whose umount
succeeded but whose mount step failed, leaving the unit ACTIVE in
systemd's view), statvfs() operates on a plain directory on the
underlying root filesystem and happily returns the root fs's stats.
The mount is reported healthy when in fact it doesn't exist.

Add a pre-check using `Path.is_mount` — a parent-vs-path `st_dev`
comparison via stat() — to detect "this path is not actually a mount
point" before issuing statvfs. For the ghost-mount case (path on the
root fs) both stats are local and return immediately. For a real
mount the path-stat may cross into the filesystem driver, but the
result is correct in either case and the statvfs that follows
catches server-unreachable mounts that is_mount can't.

The full is_mounted contract is now: ACTIVE per systemd, present as
a mount point in our namespace, and server-reachable per statvfs.

* mounts: skip wasted probes when systemd already reported job failed

JobRemoved tells us whether the systemd job completed successfully —
we just weren't using it. The previous reload/restart paths ran their
post-job probe unconditionally, including the case where systemd had
already told us the operation failed. On a dead NFS share the kernel
is in transport-reconnect churn for tens of seconds after a killed
mount helper exits, so that probe takes 90+ seconds — and confirms
exactly what systemd already said. Plain wasted time.

* `Mount.reload()` now uses the systemd job result. On "done" we
  still probe (CIFS reload is local-only — smb3_reconfigure returns
  "done" even against a dead server, so the probe is the only check
  that catches the lying-CIFS case). On anything else (failed,
  timeout, canceled, dependency, skipped, or our own timeout
  returning None) we escalate directly to `_restart()` without
  probing.
* `Mount._restart()` skips the probe entirely on non-"done" results.
  The mount is definitively not active in that case and the probe
  would just spend another 30-90s on a dead share.

Also bump the JobRemoved wait timeout from UPDATE_STATE_TIMEOUT (40s,
sized for a single-helper invocation) to a new SYSTEMD_JOB_TIMEOUT
(90s). RestartUnit runs as stop + start, each bounded by the unit's
TimeoutSec (35s from #6834), so the worst case is ~70s plus systemd
queue dispatch. The previous 40s budget caused supervisor to time
out exactly one second before JobRemoved fired in the observed
dead-NFS restart cycle. UPDATE_STATE_TIMEOUT is kept at 40s for the
PropertiesChanged-driven wait in `load()`, where the layered timeout
invariant from #6827 still applies.

* dbus: filter signals at the wrapper level, drop JobRemovedSignal class

Mike's review on #6838: instead of carrying a bespoke JobRemovedSignal
context-manager class, push the filtering concept into the generic
DBusSignalWrapper. Any caller that subscribes to a broadcast signal
but only cares about specific payloads can pass a predicate.

* DBus.signal() gains an optional `message_filter: Callable[..., bool]`.
* DBusSignalWrapper.wait_for_signal() loops past messages where the
  filter returns False, returning the next match.
* JobRemovedSignal goes away. supervisor/dbus/systemd.py exposes a
  small factory `job_removed_filter(get_job_path)` that returns the
  matching predicate; the getter indirection lets callers subscribe
  before the dispatch returns the job path (which is required to be
  race-free — see the previous commit).

Mount._run_systemd_job() switches to:

    job_path: str | None = None
    async with systemd.connected_dbus.signal(
        DBUS_SIGNAL_SYSTEMD_JOB_REMOVED,
        job_removed_filter(lambda: job_path),
    ) as signal:
        job_path = await dispatch
        _id, _path, _unit, result = await signal.wait_for_signal()

The behavior is identical to the previous JobRemovedSignal-based
implementation. The signal queue still buffers messages received
after AddMatch is installed, so subscribing before dispatch keeps
the race-free guarantee.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>