The deprecated-key-path option is no longer handled, but it doesn't cause
problems because the key is explicitly ignored. It was completely removed in
Docker 19.03.0 [1].
As such, the option and the pre-start script to fix the corrupted key.json can
be removed now, as it has no effect, only printing confusing message when
Docker service fails to start.
[1] 98fc09128b
Prefer the containerd snapshotter by using it by default for new installs and
when no Docker data is present (e.g. after datadisk wipe). The snapshotter is
enabled by a dockerd flag which is set when a flag file is present in the data
partition. This flag file can be used also to opt-in for this snapshotter on
legacy installs (high level API through OS Agent and Supervisor TBD), to
migrate to the containerd snapshotter this file can be simply created manually.
Testing shown no major problems when migrating, the old overlay2 folder can be
(and should be - to avoid situations where the data disk might run out of
space) deleted before the docker.service is started in the docker-prepare
script.
Note that there's no offline migration path, OS needs to be connected to the
internet to re-download the images when migrating. This could be theoretically
possible through docker image save/load functions but guarding for enough of
space and other edge cases would be probably too complex to justify it.
Refs #4252
Refs #4253 - easier opt-in method is still needed
Closes#4254 - migration is handled seamlessly by Docker
* Improve UX of HA CLI wrapper and emergency console
For many users, the emergency console gives feeling that the system is
completely broken. However, there are various cases when the system just takes
just a bit longer to start up and the emergency message is shown, while it
finishes a proper startup shortly after. This change tries to improve the UX in
several ways:
* The limit before a forced emergency console startup is changed to 3 minutes
* Waiting can be interrupted with Ctrl+C (reset counter is cleared then)
* Some hints what to check have been added before starting the shell
* Also, because if the HA CLI failed for 5 times in a row in quick succession,
the CLI startup was then not retried anymore and user may have been left with
a black screen, the restart limits timeouts have been adjusted only to back
off and never mark the unit as failed
Closes#4273
* Use /bin/sh and printf to silence linter errors
Use the --cidfile Docker CLI argument when starting the container and
bind-mount the generated file containing full ID of the container to the
container itself.
Using --mount instead of --volume is needed, as --volume is racy and creates
empty directory volume at the destination path instead.
This is prerequisite for home-assistant/supervisor#6006 but can come handy for
other cases too.
This reverts commit 22fe9b19ee.
There are major issues when OS has no internet connectivity - in such case the
script doesn't go the expected happy path after the rework and eventually
removes the Docker image, essentially bricking offline installations.
Since there is no immediate benefit for HAOS and such change turns out to be
high risk considering the planned release, leave it to be implemented later.
This knob controls whether Linux throws away its congestion
window (cwnd) after a connection has been idle for at least one
retransmission timeout (RTO). With a value of 0, Linux keeps the
cwnd it had before the idle period and can send that amount
immediately when the application resumes writing (still bounded
by the receiver's advertised window and by pacing).
With slow start after idle enabled (the default), Linux allows
only about 10 MSS (~14 KiB) in the first burst after idle. Even
when a connection stays open to web clients, a short idle forces
multiple round trips to ramp back up.
On Wi-Fi, local connections often have very low RTTs, which drives
the RTO down. Between page navigations the connection is considered
idle by Linux. If the next request happens during a transient
latency spike on the Wi-Fi link, the sender starts with a tiny
cwnd and must grow it over many RTTs, so the spike causes outsized
and visible loading delays.
For devices behind typical Internet uplinks, the higher RTT makes
the initial ramp-up feel even slower until the window regains size.
However, here the connection does take longer to drop to idle, for
Linux standards. So the connection is less likely to be considered
idle between navigations.
This change does not affect flows with very small receive windows
(e.g. many microcontrollers), which are limited by the peer's
advertised window rather than the sender's cwnd.
Example RTOs on low jitter, low loss connections:
Defaults:
TCP_RTO_MIN = 200 ms
TCP_RTO_MAX = 120 s
low-jitter path so rttvar_us = 200 ms
HZ = 1000 or 250 or 100 (depending on the kernel settings)
*31 ms average RTT*
- SRTT ≈ 31 ms; RTTVAR ≈ 200 ms → Sum = 231 ms
- 'usecs_to_jiffies(231000)' = 231 jiffies (HZ 1000) -> RTO ≈ 231 ms
- If 'HZ = 250' (4 ms tick), ceil(231/4)=58 jiffies -> 232 ms RTO
- If 'HZ = 100' (10 ms tick), ceil(231/10)=23 jiffies -> 240 ms RTO
*178 ms average RTT*
- HZ=1000 (1 ms tick): 378 ms RTO
- HZ=250 (4 ms tick): ceil(378/4)=95 -> 380 ms RTO
- HZ=100 (10 ms tick): ceil(378/10)=38 -> 380 ms RTO
*292 ms average RTT*
- HZ=1000 (1 ms tick): 492 ms RTO
- HZ=250 (4 ms tick): ceil(492/4)=123 -> 492 ms RTO
- HZ=100 (10 ms tick): ceil(492/10)=50 -> 500 ms RTO
Any loss or jitter will increase those RTO values.
Set net.ipv4.tcp_thin_linear_timeouts=1 to switch retransmission
timeout (RTO) backoff from exponential to linear for 'thin' TCP flows.
This reduces tail latency for API-style connections that typically have
very few packets in flight, improving recovery from sporadic loss without
changing anything for larger TCP transfers.
Kernel definition: A flow is considered thin when 'tp->packets_out < 4'
and while not in the initial slow start.
See tcp_stream_is_thin(tp) in include/net/tcp.h.
Increase the BlueZ temporary device timeout from the default 30s to 195s.
This prevents devices from being removed from D-Bus during connection
retries, especially when multiple connection attempts are queued.
The 195s timeout aligns with Home Assistant's Bluetooth stack behavior
for ESPHome proxies and prevents the 'device removal spiral' that occurs
when devices timeout during sequential connection attempts.
To make system timezone configurable, we need to have /etc/localtime
writable, and it must be possible to atomically create a symlink from
this place, which means the whole parent folder must be writable. We
don't have /etc writable and can't use the usual bind mount for this.
Latest Systemd v258 has patch that allows setting an environment
variable that sets where the localtime should be written. This can be
persisted in the overlay partition, with a symlink from /etc/localtime
leading there, finally pointing to the actual zoneinfo file. If the
symlink doesn't exist, create it by hassos-overlay script (it's not
really needed as UTC is the default, but Systemd does the same if you
change from non-UTC timezone back to UTC).
Also disable BR2_TARGET_LOCALTIME, so /etc/localtime and /etc/timezone
(the latter is only informative and non-standard) are not written by the
tzdata package build.
The ip6tables configuration is now enabled by default since Docker 27
(see https://github.com/moby/moby/pull/47747). The experimental config
got introduced with the ip6tables flag in #2051. There is no other
experimental feature used from what I am aware of, so lets remove the
experimental flag as well.
Bind-mount Systemd Journal socket to the Supervisor container. This way
Supervisor can use the socket directly for writing log entries using the
Systemd native Journal protocol [1] instead of logging to stderr of the
container.
[1] https://systemd.io/JOURNAL_NATIVE_PROTOCOL/
As the reported MemTotal can fluctuate a bit on some systems, e.g. because the
reserved memory changes between kernel version or other factors affect it like
VRAM, the swap file can be recreated unnecessarily between boots. Allow for
some fluctuation (up to +-32MB) before the swapfile is recreated.
This was a problem already before the recent haos-swapfile changes, however,
before it checked if the existing swapfile isn't smaller than the desired
value. If the MemTotal fluctuated there, the swapfile size eventually settled
on the highest value seen and it wasn't recreated anymore. With this change,
things should be stable even more.
In some cases, the wipe service may be called due to a race condition for the
second time during the boot, very likely failing because the filesystems are
already mounted. This can not be reproduced on OVA but can be fairly easy
triggered e.g. on RPi. As we want the service to be executed exactly only once,
we can do what's suggested in [1] and set the RemainAfterExit=yes. That should
ensure the unit is not ever started for the second time.
[1] https://www.github.com/systemd/systemd/issues/29367
Use simple shell script to perform device wipe instead of calling OS Agent to
do that through the UDisks2 API. While it might have been a good idea to use
high level interface for that back then, it turns out it causes more issues
than the benefits it could bring.
Main problem currently is that the OS Agent needs to read sysctl variables, but
those are only set after mounting the overlay partition. But at the same time,
the overlay partition can't be mounted if we want to wipe it - this creates a
dependency cycle through the haos-agent.service.
To get rid of the cycle and simplify things, use a shell script doing basically
the same what the OS Agent does. Since the wipe functionality only makes sense
to be implemented on HAOS targets (not on Supervised), there's little point of
having it in higher layer of abstraction that OS Agent provides.
It should be also checked if changes from #1291 are needed anymore, as the
driving factor for those have been probably the wipe feature in OS Agent too,
but at this point they seem to be harmless.
Allow configuration of the swap size via /etc/default/haos-swapfile file. By
setting the SWAPSIZE variable in this file, swapfile get recreated on the next
reboot to the defined size. Size can be either in bytes or with optional units
(B/K/M/G, accepting some variations but always interpreted as power of 10). The
size is then rounded to 4k block size. If no override is defined or the value
can't be parsed, it falls back to previously used 33% of system RAM.
Fixes#968
* Move rauc.db to boot partition
The RAUC metadata file contains information that is tightly related to the
system and kernel partitions. With the possibility to migrate data disk, the
rauc.db can contain bogus information when moved to a different system. Removal
of the file on "device wipe" is also not desirable, because the information
about slot status is lost.
Relocate the rauc.db to the boot partition after a system upgrade (as this
can't be handled by RAUC hooks, because it needs to be executed after all slots
and metadata is written) and adjust the script for recreating it. The downside
is that its content in /mnt/data would be recreated if the boot slot is changed
or system downgraded but this should be handled quite gracefully.
Also remove the raucdb-first-boot service which is no longer necessary
with the file not present in the data partition.
* Fix shellcheck and mount path
The /etc/usb_modeswitch.d is present and empty but it can't be written to allow
user modification. Bind-mount it like other /etc folders to make it possible to
adjust usb_modeswitch config.
Fixes#3785
If data disk is adopted on Yellow using the mechanism added in #3686, it
contains RAUC version information that is very likely invalid. In such case,
remove the file on first boot and have it recreated by the raucdb-update
service.
If HAOS on Yellow is booted for the first time with NVMe data disk present, it
should be preferred over the empty eMMC data partition. This will ease
reinstall of the system and migration from CM4 to CM5. All other data disks
(e.g. if a USB drive is used for them) are still treated as before, requiring
manual adoption using the Supervisor repair.
The timeout of 90s was introduced before it was ensured that the timesync
systemd unit starts after network is online. Now with that, it makes less sense
to wait that long - if network is unreachable at the point the time
synchronization starts, and the server fails to reply on the first sync, the
polling interval is exponentially increased and the benefit of waiting for more
attempts is doubtful.
Since another synchronization attempt is done after network changes its state,
we should rely on that instead of having the 90 seconds interval as a waiting
period for plugging the network cable. Worst case, there are other mechanisms
that should set the time to a reasonably accurate value, making the NTP sync
less importart for most of the cases.
* Relocate HAOS Systemd drop-ins to /usr/lib/systemd
With some exceptions, Systemd drop-ins overriding default unit configuration
have been placed to `/etc/systemd/system`. This is meant for user overrides of
those, or per `man 5 systemd.unit` for "system unites created by the
administrator". Relocate all of these to `/usr/lib/systemd` which should be
used as path for units "installed by the distribution package manager" which is
closer to what we're trying to achieve.
This will make it easier to detect changes to unit files once we enable the
possibility to edit the content of /etc.
* Patch systemd-timesyncd.service instead of replacing it fully
Reduce verbosity from deactivated Docker mounts, triggered by the Docker
healthcheck. These messages do not carry any value for us and logs supplied by
users are often spammed mostly with these. Moreover, they sometimes cause
confusion that something is wrong, see for example #3021.
Unfortunately, it's not possible to use LogFilterPatterns= here, because it's
not applied to these messages, as explicitly said in the docs:
Filtering is based on the unit for which LogFilterPatterns= is defined
meaning log messages coming from systemd(1) about the unit are not taken
into account.
runc 1.2.0 supposedly should fix this, but it's unclear when it would be
available, so let's stick to this solution (reducing verbosity from debug to
notice for all units `run-docker-*.mount`) for the time being.
RAUC currently doesn't know the version of the booted slot when booted for the
first time or after wiping the data partition. As a result `ha os info` is
missing this information too.
As there's no built-in mechanism for generating these data by RAUC itself, add
a oneshot service that checks if the boot slot information is contained in the
rauc.db and if not, then generate it.
RAUC seems to cope quite well even with bogus data contained in rauc.db but in
any case, a test has been added to check that everything works as expected.
* Only run HA CLI interactively if stdout is a terminal
Flags for running HA CLI commands in an interactive shell added in #3238
cause the command to fail if the process is not running in a terminal.
This is needed for example for the fsfreeze hook, otherwise the command
fails, as seen in this trace when the hook is executed:
-----------
+ '[' thaw '=' freeze ]
+ '[' thaw '=' thaw ]
+ echo 'File system thaw requested, thawing Home Assistant'
File system thaw requested, thawing Home Assistant
+ ha backups thaw
the input device is not a TTY
------------
However, for example on Proxmox this message is not logged anywhere and
the hook just fails silently (i.e. it doesn't cause the backup to fail).
Fixes#3251
* Use -i also when not running in a terminal
CP15 barrier instruction emulation only exists on arm64 architecture.
Avoid sysctl writing an error to the journal when the setting doesn't
exist by prepending a dash.
Use -i (--interactive) and -t (--tty) to start the HA CLI interactively.
This is required by some commands like the new device wipe command added
with https://github.com/home-assistant/cli/pull/464.
* Add initial Raspberry Pi 5 buildroot config
* Add machine-id support via cmdline.txt
* Add new entry if entry is missing
* Don't overwrite cmdline.txt when adding machine-id
Use sed to append the new cmdline parameter to the first line.
* Skeleton script for RAUC custom bootloader interface
* Deploy kernel/device-tree into a RAUC slot specific directory
This allows us to use the os_prefix feature to switch between slot A and
B. Compared to the boot_partition option, this option allows to use a
shared config.txt and cmdline.txt, which makes it more like how HAOS
currently works on other Raspberry Pis.
* Deploy new kernel/device-tree to correct slot on installation
* Increase boot size to 128MB
This makes sure we can store up to three kernels (slot A, B and an
temporary one while installing the OTA update).
* Initial tryboot implementation using os_prefix
* Make sure to delete the old slot completely
* Add Busybox xargs for tryboot bootloader script
* Compare tryboot bootloader file silently
* Revert "Increase boot size to 128MB"
This reverts commit 7f2c69b58f02f500d6aeee4f0a419046899b5e38.
* Use compressed kernel
* Address shellcheck
* Address shellcheck issue in rauc-hook
* Fix shellcheck for rpi-tryboot.sh
* Do not follow source - it gets checked separately
* Correctly set the slot to boot
* Apply suggestions from code review
Co-authored-by: Jan Čermák <sairon@users.noreply.github.com>
* Drop serial console from default cmdline.txt
* Resync rpi5_64_defconfig with rpi4_64_defconfig
* Improve machine-id match
Only match actual hexadecimal characters.
* Deploy firmware overlays to OS prefix directory
* Add Raspberry Pi 5 to documentation
* Bump buildroot
* buildroot fd1dc86f40...f13ad03408 (1):
> linux: add in-tree device tree overlay support
* Install device tree overlays from Kernel sources
* Drop RPi RF modules for now
No Raspberry Pi 5 specific device tree overlays are available, drop RPi
RF mod for now.
* Use Raspberry 5 specific identifiers for Supervisor/OS Agent
* Bump buildroot
* buildroot f13ad03408...07e08e01b2 (1):
> linux: fix add in-tree device tree overlay support
* Revert "Drop RPi RF modules for now"
This reverts commit 46fc1701e4.
---------
Co-authored-by: Jan Čermák <sairon@users.noreply.github.com>
* Fix Supervisor image corruption detection
When multiple images match the reference, multiple IDs are passed as a
single argument to docker image rm, leading to an error:
Error response from daemon: page not found
Make sure to pass the ids as separate argument to make the delete work
in any case.
* Cleanup reusing Supervisor from an old/unused reference
As noted in #2113, we don't need this logic anymore after a major OS
releases. So simply drop the logic to also make the image corruption
detection work again.
* Make sure image IDs are sorted to make them unique
With the move to Docker 23 containerd stores its metadata no longer
undernath the Docker data directory but at its default location at
/var/lib/containerd. Previously Docker passed a containerd configuration
toml file which explicitly set the metadata root underneath Docker's
data directory.
On Home Assistant OS, the new location /var/lib/containerd is on a tmpfs
file system. For unknown reasons, it seems that if containerd's root
directory is on a tmpfs this leads to significantly more syscalls and
hence CPU load.
Change the metadata location to be on the data partition again. Since
containerd is treated separately from Docker these days, use a new
root directory under /mnt/data for containerd as well. With this, the
CPU load of containerd is back to normal.
* Bump buildroot
* buildroot a1bdf74b19...f125c3e292 (1):
> package/containerd: add control for additional build tags
* Drop unnecessary containerd changes
Now that the snappshotter and the CRI plug-ins are disabled we don't
need to configure or disable them via configuration anymore. Drop the
unnecessary configs.
* Add fsfreeze support for QEMU/KVM/Proxmox installations
Add fsfreeze scripts which calls the new Supervisor API to freeze Home
Assistant Core and add-ons which support the backup freeze scripts
(`backup_pre` and `backup_post`).
This allows to create safe snapshots with databases running.
* Fix lint issues
* Download latest stable Supervisor after device wipe
Currently we download the latest tag after a device wipe, which gives us
the latest Supervisor (which quite likely can be a development version).
Use the stable version file instead to get the tag to be used to
download the Supervisor.
* Delete potentially corrupted updater info
Pull in the swapfile creation service haos-swapfile.service when
swap.target is reached. This makes sure the service is started even when
other targets are used (e.g. rescue.target).
* Delete Bluetooth device cache regularly
Delete stale Bluetooth devices from the BlueZ device cache every week.
This makes sure that the overlay partition doesn't run out of inodes
which has happened in real world scenarios where many new Bluetooth
devices are discovered.
BlueZ maintains these files on a best effort base. So removing them
while BlueZ is running should be safe.
An alternative considered was to lower BlueZ GATT caching (e.g. by
using Cache=yes instead of always, to cache only paired devices).
However, this would hurt performance and battery lifetime of Bluetooth
devices due to additional unnecessary GATT attributes reads. This is in
particular true for Bluetooth 5.1 devices which support the Database
Hash charactristic. Caching has also helped reliability with
intermittent connections (see
https://github.com/bluez/bluez/issues/191).
More importantly, besides the GATT attribute cache the same files are
also used to cache the device names as well. This is independent of the
above mentioned GATT cache configuration (see device_store_cached_name
in BlueZ). So disabling the GATT caching alone wouldn't solve the
particular problem we are facing.
See also: https://github.com/home-assistant/supervisor/issues/4490
* Use access timestamp instead of modification timestamp
The modification timestamp gets updated regularly (on each connect) it
seems. However, using access timestamp might be more accurate, as it
seems to preserves slightly more cache files. This additional devices
might be devices we don't regularly connect but are still around (and
therefor we shouldn't reread the GATT attributes regularly).
So deleting cache entries with access time older than 7 days. Which
essentially deletes all the entries of devices which haven't been seen
the last 7 days.
In case a system takes a bit longer to boot (e.g. due to SWAP
initialization on first boot, especially on a system with lots of memory
and not very fast strage, e.g. an ODROID-M1 using an SD card) we might
time-out waiting for time synchronization before the time
synchronization service even got started. By ordering the
systemd-time-wait-sync.service after the network is online, the timeout
of this service should be started much later. With that the
systemd-time-wait-sync.service shouldn't timeout any longer.
To read the current LED configuration correctly /mnt/boot is required.
This change makes sure that the boot partition is mounted when the OS
Agent starts.
* Enable Multi-Gen LRU
Multi-Gen LRU should improve performance under memory pressure. This is
especially useful for embedded platforms where memory is scarce.
* Add service to configure Multi-Gen LRU
Use min_ttl_ms of 1 which is the least aggressive in terms of lag. Since
we are a server application, we can tune trashing prevention with a
higher acceptable lag.