Remove net.ipv6.conf.all.forwarding=1 from 60-otbr-ip-forward.conf
and rely on Docker to enable IPv6 forwarding instead, just as we
already rely on it for IPv4 forwarding (needed for NAT64 in OTBR).
When this sysctl was added (d9ec60316), Docker did not enable IPv6 by
default. Since Docker 27 (April 2024), IPv6 support — including
ip6tables — is enabled by default, and Docker enables IPv6 forwarding
at startup just like it does for IPv4.
Importantly, when Docker enables forwarding itself (rather than finding
it already on), it also sets the FORWARD chain policy to DROP as a
safety measure, Pre-enabling the sysctl prevents this, leaving the IPv6
FORWARD chain at ACCEPT. By removing our sysctl, we get the same
protective DROP policy for IPv6 that we already benefit from for IPv4.
For users having non-English, and especially non-qwerty layouts, using the host
shell can be very awkward. There was no option to change the keymaps as they
haven't been installed in the OS, and the persistence couldn't have been
achieved because of read-only /etc.
With upstream patch merged in #4224, we have an option to put
/etc/vconsole.conf to a writable location and use the same approach as in the
timezone PR. This is needed because even if we only bind-mounted the file from
the overlay directory, the Systemd services which start early will still refer
to the inode on the read-only FS. Also, gzip is required as current version of
kbd in Buildroot (v2.6.4) always compresses the keymaps using gzip. We can get
rid of this after we bump to kbd v2.9.0 [1] or newer. The overall bloat in
local build of the OS is slightly over 1 MiB, so it is acceptable.
With these changes, the `localectl set-keymap` command can be used to use any
available keymap from the installed `kbd` package (refer to `localectl
list-keymaps` for complete lists) and persist it between reboots.
[1] https://github.com/legionus/kbd/releases/tag/v2.9.0Fixes#1775
The deprecated-key-path option is no longer handled, but it doesn't cause
problems because the key is explicitly ignored. It was completely removed in
Docker 19.03.0 [1].
As such, the option and the pre-start script to fix the corrupted key.json can
be removed now, as it has no effect, only printing confusing message when
Docker service fails to start.
[1] 98fc09128b
Prefer the containerd snapshotter by using it by default for new installs and
when no Docker data is present (e.g. after datadisk wipe). The snapshotter is
enabled by a dockerd flag which is set when a flag file is present in the data
partition. This flag file can be used also to opt-in for this snapshotter on
legacy installs (high level API through OS Agent and Supervisor TBD), to
migrate to the containerd snapshotter this file can be simply created manually.
Testing shown no major problems when migrating, the old overlay2 folder can be
(and should be - to avoid situations where the data disk might run out of
space) deleted before the docker.service is started in the docker-prepare
script.
Note that there's no offline migration path, OS needs to be connected to the
internet to re-download the images when migrating. This could be theoretically
possible through docker image save/load functions but guarding for enough of
space and other edge cases would be probably too complex to justify it.
Refs #4252
Refs #4253 - easier opt-in method is still needed
Closes#4254 - migration is handled seamlessly by Docker
* Improve UX of HA CLI wrapper and emergency console
For many users, the emergency console gives feeling that the system is
completely broken. However, there are various cases when the system just takes
just a bit longer to start up and the emergency message is shown, while it
finishes a proper startup shortly after. This change tries to improve the UX in
several ways:
* The limit before a forced emergency console startup is changed to 3 minutes
* Waiting can be interrupted with Ctrl+C (reset counter is cleared then)
* Some hints what to check have been added before starting the shell
* Also, because if the HA CLI failed for 5 times in a row in quick succession,
the CLI startup was then not retried anymore and user may have been left with
a black screen, the restart limits timeouts have been adjusted only to back
off and never mark the unit as failed
Closes#4273
* Use /bin/sh and printf to silence linter errors
This knob controls whether Linux throws away its congestion
window (cwnd) after a connection has been idle for at least one
retransmission timeout (RTO). With a value of 0, Linux keeps the
cwnd it had before the idle period and can send that amount
immediately when the application resumes writing (still bounded
by the receiver's advertised window and by pacing).
With slow start after idle enabled (the default), Linux allows
only about 10 MSS (~14 KiB) in the first burst after idle. Even
when a connection stays open to web clients, a short idle forces
multiple round trips to ramp back up.
On Wi-Fi, local connections often have very low RTTs, which drives
the RTO down. Between page navigations the connection is considered
idle by Linux. If the next request happens during a transient
latency spike on the Wi-Fi link, the sender starts with a tiny
cwnd and must grow it over many RTTs, so the spike causes outsized
and visible loading delays.
For devices behind typical Internet uplinks, the higher RTT makes
the initial ramp-up feel even slower until the window regains size.
However, here the connection does take longer to drop to idle, for
Linux standards. So the connection is less likely to be considered
idle between navigations.
This change does not affect flows with very small receive windows
(e.g. many microcontrollers), which are limited by the peer's
advertised window rather than the sender's cwnd.
Example RTOs on low jitter, low loss connections:
Defaults:
TCP_RTO_MIN = 200 ms
TCP_RTO_MAX = 120 s
low-jitter path so rttvar_us = 200 ms
HZ = 1000 or 250 or 100 (depending on the kernel settings)
*31 ms average RTT*
- SRTT ≈ 31 ms; RTTVAR ≈ 200 ms → Sum = 231 ms
- 'usecs_to_jiffies(231000)' = 231 jiffies (HZ 1000) -> RTO ≈ 231 ms
- If 'HZ = 250' (4 ms tick), ceil(231/4)=58 jiffies -> 232 ms RTO
- If 'HZ = 100' (10 ms tick), ceil(231/10)=23 jiffies -> 240 ms RTO
*178 ms average RTT*
- HZ=1000 (1 ms tick): 378 ms RTO
- HZ=250 (4 ms tick): ceil(378/4)=95 -> 380 ms RTO
- HZ=100 (10 ms tick): ceil(378/10)=38 -> 380 ms RTO
*292 ms average RTT*
- HZ=1000 (1 ms tick): 492 ms RTO
- HZ=250 (4 ms tick): ceil(492/4)=123 -> 492 ms RTO
- HZ=100 (10 ms tick): ceil(492/10)=50 -> 500 ms RTO
Any loss or jitter will increase those RTO values.
Set net.ipv4.tcp_thin_linear_timeouts=1 to switch retransmission
timeout (RTO) backoff from exponential to linear for 'thin' TCP flows.
This reduces tail latency for API-style connections that typically have
very few packets in flight, improving recovery from sporadic loss without
changing anything for larger TCP transfers.
Kernel definition: A flow is considered thin when 'tp->packets_out < 4'
and while not in the initial slow start.
See tcp_stream_is_thin(tp) in include/net/tcp.h.
To make system timezone configurable, we need to have /etc/localtime
writable, and it must be possible to atomically create a symlink from
this place, which means the whole parent folder must be writable. We
don't have /etc writable and can't use the usual bind mount for this.
Latest Systemd v258 has patch that allows setting an environment
variable that sets where the localtime should be written. This can be
persisted in the overlay partition, with a symlink from /etc/localtime
leading there, finally pointing to the actual zoneinfo file. If the
symlink doesn't exist, create it by hassos-overlay script (it's not
really needed as UTC is the default, but Systemd does the same if you
change from non-UTC timezone back to UTC).
Also disable BR2_TARGET_LOCALTIME, so /etc/localtime and /etc/timezone
(the latter is only informative and non-standard) are not written by the
tzdata package build.
In some cases, the wipe service may be called due to a race condition for the
second time during the boot, very likely failing because the filesystems are
already mounted. This can not be reproduced on OVA but can be fairly easy
triggered e.g. on RPi. As we want the service to be executed exactly only once,
we can do what's suggested in [1] and set the RemainAfterExit=yes. That should
ensure the unit is not ever started for the second time.
[1] https://www.github.com/systemd/systemd/issues/29367
Use simple shell script to perform device wipe instead of calling OS Agent to
do that through the UDisks2 API. While it might have been a good idea to use
high level interface for that back then, it turns out it causes more issues
than the benefits it could bring.
Main problem currently is that the OS Agent needs to read sysctl variables, but
those are only set after mounting the overlay partition. But at the same time,
the overlay partition can't be mounted if we want to wipe it - this creates a
dependency cycle through the haos-agent.service.
To get rid of the cycle and simplify things, use a shell script doing basically
the same what the OS Agent does. Since the wipe functionality only makes sense
to be implemented on HAOS targets (not on Supervised), there's little point of
having it in higher layer of abstraction that OS Agent provides.
It should be also checked if changes from #1291 are needed anymore, as the
driving factor for those have been probably the wipe feature in OS Agent too,
but at this point they seem to be harmless.
Allow configuration of the swap size via /etc/default/haos-swapfile file. By
setting the SWAPSIZE variable in this file, swapfile get recreated on the next
reboot to the defined size. Size can be either in bytes or with optional units
(B/K/M/G, accepting some variations but always interpreted as power of 10). The
size is then rounded to 4k block size. If no override is defined or the value
can't be parsed, it falls back to previously used 33% of system RAM.
Fixes#968
* Move rauc.db to boot partition
The RAUC metadata file contains information that is tightly related to the
system and kernel partitions. With the possibility to migrate data disk, the
rauc.db can contain bogus information when moved to a different system. Removal
of the file on "device wipe" is also not desirable, because the information
about slot status is lost.
Relocate the rauc.db to the boot partition after a system upgrade (as this
can't be handled by RAUC hooks, because it needs to be executed after all slots
and metadata is written) and adjust the script for recreating it. The downside
is that its content in /mnt/data would be recreated if the boot slot is changed
or system downgraded but this should be handled quite gracefully.
Also remove the raucdb-first-boot service which is no longer necessary
with the file not present in the data partition.
* Fix shellcheck and mount path
The /etc/usb_modeswitch.d is present and empty but it can't be written to allow
user modification. Bind-mount it like other /etc folders to make it possible to
adjust usb_modeswitch config.
Fixes#3785
If data disk is adopted on Yellow using the mechanism added in #3686, it
contains RAUC version information that is very likely invalid. In such case,
remove the file on first boot and have it recreated by the raucdb-update
service.
The timeout of 90s was introduced before it was ensured that the timesync
systemd unit starts after network is online. Now with that, it makes less sense
to wait that long - if network is unreachable at the point the time
synchronization starts, and the server fails to reply on the first sync, the
polling interval is exponentially increased and the benefit of waiting for more
attempts is doubtful.
Since another synchronization attempt is done after network changes its state,
we should rely on that instead of having the 90 seconds interval as a waiting
period for plugging the network cable. Worst case, there are other mechanisms
that should set the time to a reasonably accurate value, making the NTP sync
less importart for most of the cases.
* Relocate HAOS Systemd drop-ins to /usr/lib/systemd
With some exceptions, Systemd drop-ins overriding default unit configuration
have been placed to `/etc/systemd/system`. This is meant for user overrides of
those, or per `man 5 systemd.unit` for "system unites created by the
administrator". Relocate all of these to `/usr/lib/systemd` which should be
used as path for units "installed by the distribution package manager" which is
closer to what we're trying to achieve.
This will make it easier to detect changes to unit files once we enable the
possibility to edit the content of /etc.
* Patch systemd-timesyncd.service instead of replacing it fully
RAUC currently doesn't know the version of the booted slot when booted for the
first time or after wiping the data partition. As a result `ha os info` is
missing this information too.
As there's no built-in mechanism for generating these data by RAUC itself, add
a oneshot service that checks if the boot slot information is contained in the
rauc.db and if not, then generate it.
RAUC seems to cope quite well even with bogus data contained in rauc.db but in
any case, a test has been added to check that everything works as expected.
* Add fsfreeze support for QEMU/KVM/Proxmox installations
Add fsfreeze scripts which calls the new Supervisor API to freeze Home
Assistant Core and add-ons which support the backup freeze scripts
(`backup_pre` and `backup_post`).
This allows to create safe snapshots with databases running.
* Fix lint issues
Pull in the swapfile creation service haos-swapfile.service when
swap.target is reached. This makes sure the service is started even when
other targets are used (e.g. rescue.target).
* Delete Bluetooth device cache regularly
Delete stale Bluetooth devices from the BlueZ device cache every week.
This makes sure that the overlay partition doesn't run out of inodes
which has happened in real world scenarios where many new Bluetooth
devices are discovered.
BlueZ maintains these files on a best effort base. So removing them
while BlueZ is running should be safe.
An alternative considered was to lower BlueZ GATT caching (e.g. by
using Cache=yes instead of always, to cache only paired devices).
However, this would hurt performance and battery lifetime of Bluetooth
devices due to additional unnecessary GATT attributes reads. This is in
particular true for Bluetooth 5.1 devices which support the Database
Hash charactristic. Caching has also helped reliability with
intermittent connections (see
https://github.com/bluez/bluez/issues/191).
More importantly, besides the GATT attribute cache the same files are
also used to cache the device names as well. This is independent of the
above mentioned GATT cache configuration (see device_store_cached_name
in BlueZ). So disabling the GATT caching alone wouldn't solve the
particular problem we are facing.
See also: https://github.com/home-assistant/supervisor/issues/4490
* Use access timestamp instead of modification timestamp
The modification timestamp gets updated regularly (on each connect) it
seems. However, using access timestamp might be more accurate, as it
seems to preserves slightly more cache files. This additional devices
might be devices we don't regularly connect but are still around (and
therefor we shouldn't reread the GATT attributes regularly).
So deleting cache entries with access time older than 7 days. Which
essentially deletes all the entries of devices which haven't been seen
the last 7 days.
To read the current LED configuration correctly /mnt/boot is required.
This change makes sure that the boot partition is mounted when the OS
Agent starts.
* Enable Multi-Gen LRU
Multi-Gen LRU should improve performance under memory pressure. This is
especially useful for embedded platforms where memory is scarce.
* Add service to configure Multi-Gen LRU
Use min_ttl_ms of 1 which is the least aggressive in terms of lag. Since
we are a server application, we can tune trashing prevention with a
higher acceptable lag.
* Use zswap instead of swap in zram
This requires a swap file which will get generated automatically on
startup.
* Fix file size and free disk space comparison
* Set zswap factor to 33%
* Set vm.swappiness to 1
Decrease swapping to a minimum. This is also recommended for database
work loads by the MariaDB documentation. In practice it causes the least
amount of writes to disk when under memory pressure, while still making
swap available when needed.
* Deactivate any external data disk device on first boot (#2390)
* Use lsblk to determine the underlying device file
Comparing major number is not reliable, e.g. virtio disks have the same
major number despite being different devices. Use lsblk to find the
underlying device, and compare the device name instead.
This is more readable than passing arguments to the daemon directly. It
also shortens the ExecStart command significantly, which is stored in
every log entry in systemd-journald.
A higher file system commit interval can help to decrease the amount of
writes. In tests, a commit interval of higher than 30s seems not to help
much in practice. Settle with 30s for now.
* Bump buildroot
* buildroot 99b62b8bd3...97287bbebf (3):
> package/dbus-broker: bump to release 32
> package/dbus-broker: new package
> Merge pull request #3 from home-assistant/2022.02.x-haos-cgroup-v2
* Use dbus-broker as default D-Bus broker
The dbus-broker (Linux D-Bus Message Broker) aims to be a high
performance and reliable D-Bus broker which can be used as a drop in
replacement to the reference implementation D-Bus broker. In tests it
showed significantly better performance especially when routing BLE
messages.
* Allow dbus-broker to start early
For HAOS device wipe feature we need haos-agent.service and
udisk2.service early. Both require a working D-Bus broker.
The options PrivateTmp and PrivateDevices add additional After=
orderings which doesn't allow dbus-broker to be started early.
* Fix D-Bus dependency
D-Bus services should just depend on dbus.socket.
A faster restart policy is unlikely to help. Increasing the limit makes
it less likely to run into cloud service rate limits (e.g. container
registry).
* updated generic_raw_uart to latest version which comes with dualcopro
support for the HmIP-RFUSB usb rf-sticks by eQ3/ELV.
* remove 99-hmip-rfusb.rules to keep a HmIP-RFUSB device free from being
occupied by the cp210x driver but use the new generic_raw_uart support
instead allowing for advanced dualcopro support for HomeMatic/BidCos-RF
and homematicIP support in parallel.
* Add systemd-journal-remote to the image
This allows to access journald's log from within Supervisor and expose
more system logs to users.
* Allow to access systemd-journal-gatewayd from Supervisor
Create a systemd-journal-gatewayd.socket service using a Unix socket and
bind mount it into the Supervisor container. This allows to query
systemd-journald from Supervisor directly.
Currently the hassos-apparmor.service wants the
hassos-supervisor.service and vice-versa. This is unnecessary and leads
to activation of hassos-supervisor.service when reload/restart
hassos-apparmor.service (Supervisor is doing that on startup).
Make hassos-apparmor.service independent and add dependency as well as
ordering from hassos-supervisor.service side.
Since we start the HomeAssistant shell directly on tty the service
responsible for starting did not restart the shell on exit. Remove the
RemainAfterExit flag to make sure that the shell restarts on exit.
* Start ha-cli on tty1 instead of a getty
Instead of starting a getty start the ha-cli directly. This will show
the banner right on startup with the important information such as IP
address of the instance or the URL to reach it.
* Use default shell as root shell instead of HA CLI
Instead of using the ha-cli.sh script as login shell use the regular
shell. Amongst other things, this allows to run VS Code devcontainers
remotely via SSH or using scp. The HA CLI is still available using the
`ha` command.
* Enable systemd-time-wait-sync.service by default
Enable the systemd-time-wait-sync.service by default. This allows to use
the time-sync.target which allows to make sure services only get started
once the time is synchronized.
* Make sure time is synchronized when starting hassos-supervisor.service
Use the time-sync.target to make sure that the Supervisor gets stsarted
after the time has been synchronized.
* Set timeout for systemd-time-wait-sync.service
Don't delay startup forever in case time synchronization doesn't work.
This allows to boot the system even without Internet connection.
The latest version of OS Agent sets haos.wipe=1 as kernel argument to
trigger a device wipe. Let systemd pickup this kernel command line
argument and start haos-wipe.service.
This rather complex architecture allows to add other triggers in the
future, e.g. a button read in the boot loader.
There are incident reports on the internet where poeple report that
fsck.(v)fat actually leads to problems rather file system fixes. Around
the time when Home Assistant OS added fsck.fat for the boot partition,
reports of empty boot partitions or file with weired filenames started
to appear. This could be caused by fsck.fat.
Disable fsck on the boot partition.