mirror of https://github.com/home-assistant/operating-system.git synced 2026-05-03 23:18:02 +01:00

135 Commits

Author SHA1 Message Date
Jan Čermák
1b511990e3 Allow overriding sysctl parameters via /etc/sysctl.d files (#3883)
Relocate current content of /etc/sysctl.d to /usr/lib and make the /etc folder
writable via a bind mount.
2025-02-19 15:33:16 +01:00
Jan Čermák
d42e34f646 Make swap size configurable (#3882)
Allow configuration of the swap size via the /etc/default/haos-swapfile file. By
setting the SWAPSIZE variable in this file, the swapfile gets recreated on the next
reboot with the defined size. Size can be given either in bytes or with optional units
(B/K/M/G, accepting some variations but always interpreted as powers of 10). The
size is then rounded to the 4k block size. If no override is defined or the value
can't be parsed, it falls back to the previously used 33% of system RAM.

Fixes #968
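
The size handling described in this commit message can be sketched as follows. This is only an illustration in Python, not the script shipped in HAOS: the function name `swap_size_bytes`, the accepted suffix variants (`K`, `KB`, `KiB`, case-insensitive), and the choice to round up rather than down are all assumptions.

```python
import re

BLOCK_SIZE = 4096  # "4k block size" from the commit message
UNITS = {"": 1, "K": 10**3, "M": 10**6, "G": 10**9}  # always powers of 10
# Assumed variations: optional B / iB suffix, any letter case.
_PATTERN = re.compile(r"(\d+)\s*([KMG]?)(?:I?B)?")


def round_up_to_block(size):
    """Round a byte count up to a multiple of BLOCK_SIZE (rounding direction is an assumption)."""
    return -(-size // BLOCK_SIZE) * BLOCK_SIZE


def swap_size_bytes(value, ram_bytes):
    """Return the swapfile size in bytes for a SWAPSIZE value.

    Falls back to 33% of RAM when the value is missing or cannot be parsed.
    """
    match = _PATTERN.fullmatch((value or "").strip().upper())
    if match is None:
        return round_up_to_block(ram_bytes * 33 // 100)
    number, unit = match.groups()
    return round_up_to_block(int(number) * UNITS[unit])
```

With this sketch, `swap_size_bytes("bogus", ram)` and `swap_size_bytes(None, ram)` both take the 33% fallback path, while `"2G"` is treated as 2 * 10^9 bytes before rounding.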
2025-02-19 15:33:04 +01:00
Jan Čermák
48bf9b5056 Move rauc.db to boot partition (#3810)
* Move rauc.db to boot partition

The RAUC metadata file contains information that is tightly related to the
system and kernel partitions. With the possibility to migrate data disk, the
rauc.db can contain bogus information when moved to a different system. Removal
of the file on "device wipe" is also not desirable, because the information
about slot status is lost.

Relocate the rauc.db to the boot partition after a system upgrade (as this
can't be handled by RAUC hooks, because it needs to be executed after all slots
and metadata are written) and adjust the script for recreating it. The downside
is that its content in /mnt/data would be recreated if the boot slot is changed
or the system is downgraded, but this should be handled quite gracefully.

Also remove the raucdb-first-boot service which is no longer necessary
with the file not present in the data partition.

* Fix shellcheck and mount path
2025-01-21 18:40:07 +01:00
Jan Čermák
6ef7a68a1d Make usb_modeswitch include directory writable (#3800)
The /etc/usb_modeswitch.d is present and empty but it can't be written to allow
user modification. Bind-mount it like other /etc folders to make it possible to
adjust usb_modeswitch config.

Fixes #3785
2025-01-16 18:11:35 +01:00
Jan Čermák
c7a9a0b906 Remove existing rauc.db from a data disk on the first boot (#3737)
If a data disk is adopted on Yellow using the mechanism added in #3686, it
contains RAUC version information that is very likely invalid. In such a case,
remove the file on first boot and have it recreated by the raucdb-update
service.
2024-12-12 20:44:15 +01:00
Jan Čermák
23039ceea7 Reduce timeout for network time synchronization to 15 seconds (#3669)
The timeout of 90s was introduced before it was ensured that the timesync
systemd unit starts after network is online. Now with that, it makes less sense
to wait that long - if network is unreachable at the point the time
synchronization starts, and the server fails to reply on the first sync, the
polling interval is exponentially increased and the benefit of waiting for more
attempts is doubtful.

Since another synchronization attempt is done after network changes its state,
we should rely on that instead of having the 90 seconds interval as a waiting
period for plugging in the network cable. Worst case, there are other mechanisms
that should set the time to a reasonably accurate value, making the NTP sync
less important in most cases.
2024-11-13 17:14:54 +01:00
Jan Čermák
2916a1c247 Relocate HAOS Systemd drop-ins to /usr/lib/systemd (#3582)
* Relocate HAOS Systemd drop-ins to /usr/lib/systemd

With some exceptions, Systemd drop-ins overriding default unit configuration
have been placed in `/etc/systemd/system`. This is meant for user overrides of
those, or per `man 5 systemd.unit` for "system units created by the
administrator". Relocate all of these to `/usr/lib/systemd`, which should be
used as the path for units "installed by the distribution package manager", which is
closer to what we're trying to achieve.

This will make it easier to detect changes to unit files once we enable the
possibility to edit the content of /etc.

* Patch systemd-timesyncd.service instead of replacing it fully
2024-09-12 12:47:22 +02:00
Jan Čermák
6c7b6fdebe Generate version information for RAUC when rauc.db is empty (#3436)
RAUC currently doesn't know the version of the booted slot when booted for the
first time or after wiping the data partition. As a result `ha os info` is
missing this information too.

As there's no built-in mechanism for generating this data in RAUC itself, add
a oneshot service that checks whether the boot slot information is contained in
rauc.db and, if not, generates it.

RAUC seems to cope quite well even with bogus data contained in rauc.db but in
any case, a test has been added to check that everything works as expected.
2024-06-20 16:50:14 +02:00
Stefan Agner
1a6b7418f0 Improve Bluetooth cache cleanup command (#2906)
Use find's delete flag to delete the files instead of spawning a
shell for each file.
2023-11-06 11:52:57 +01:00
Stefan Agner
2cbaaf9f3b Fix fsfreeze freeze support (#2787)
Pass the script argument properly to make sure the script actually gets
called from the QEMU guest agent.
2023-10-03 16:21:57 +02:00
Stefan Agner
893a49a3f3 Add fsfreeze support for QEMU/KVM/Proxmox installations (#2781)
* Add fsfreeze support for QEMU/KVM/Proxmox installations

Add fsfreeze scripts which call the new Supervisor API to freeze Home
Assistant Core and add-ons which support the backup freeze scripts
(`backup_pre` and `backup_post`).

This allows creating safe snapshots with databases running.

* Fix lint issues
2023-10-02 08:30:20 +02:00
Stefan Agner
86b172b9c2 Create swapfile even when not using the multi-user.target (#2762)
Pull in the swapfile creation service haos-swapfile.service when
swap.target is reached. This makes sure the service is started even when
other targets are used (e.g. rescue.target).
2023-09-21 15:30:24 +02:00
Stefan Agner
f8f2e61967 Delete Bluetooth device cache regularly (#2751)
* Delete Bluetooth device cache regularly

Delete stale Bluetooth devices from the BlueZ device cache every week.
This makes sure that the overlay partition doesn't run out of inodes
which has happened in real world scenarios where many new Bluetooth
devices are discovered.

BlueZ maintains these files on a best-effort basis, so removing them
while BlueZ is running should be safe.

An alternative considered was to lower BlueZ GATT caching (e.g. by
using Cache=yes instead of always, to cache only paired devices).
However, this would hurt performance and battery lifetime of Bluetooth
devices due to additional unnecessary GATT attribute reads. This is in
particular true for Bluetooth 5.1 devices which support the Database
Hash characteristic. Caching has also helped reliability with
intermittent connections (see
https://github.com/bluez/bluez/issues/191).

More importantly, besides the GATT attribute cache the same files are
also used to cache the device names as well. This is independent of the
above mentioned GATT cache configuration (see device_store_cached_name
in BlueZ). So disabling the GATT caching alone wouldn't solve the
particular problem we are facing.

See also: https://github.com/home-assistant/supervisor/issues/4490

* Use access timestamp instead of modification timestamp

The modification timestamp seems to get updated regularly (on each
connect). Using the access timestamp might be more accurate, as it
seems to preserve slightly more cache files. These additional devices
might be devices we don't connect to regularly but that are still around
(and therefore we shouldn't reread the GATT attributes regularly).

So delete cache entries with an access time older than 7 days, which
essentially deletes all entries of devices that haven't been seen in
the last 7 days.
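
The access-time rule described above can be illustrated with a small Python sketch. It is not the command used in HAOS (which, per a later commit, relies on find's delete flag); the function name `delete_stale_files` and its parameters are hypothetical, and the cache directory is deliberately left as an argument.

```python
import os
import time

SECONDS_PER_DAY = 86400


def delete_stale_files(root, max_age_days=7, now=None):
    """Delete regular files under `root` whose access time is older than `max_age_days`.

    Returns the list of removed paths. `now` can be injected for testing.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # st_atime is the last access time; older than the cutoff means stale.
            if os.stat(path).st_atime < cutoff:
                os.remove(path)
                removed.append(path)
    return removed
```

Keying on access time rather than modification time keeps entries for devices that were read recently even if their cache file was not rewritten.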
2023-09-14 23:13:40 +02:00
Steven Barth
6776b23c32 Add overlay for systemd config to enable watchdog configuration (#2628)
2023-07-04 20:34:55 +02:00
Stefan Agner
24217838e2 Start OS Agent only when boot partition is mounted (#2583)
To read the current LED configuration correctly /mnt/boot is required.
This change makes sure that the boot partition is mounted when the OS
Agent starts.
2023-06-10 00:43:51 +02:00
Stefan Agner
c7588e9350 Enable Multi-Gen LRU (#2392)
* Enable Multi-Gen LRU

Multi-Gen LRU should improve performance under memory pressure. This is
especially useful for embedded platforms where memory is scarce.

* Add service to configure Multi-Gen LRU

Use min_ttl_ms of 1 which is the least aggressive in terms of lag. Since
we are a server application, we can tune thrashing prevention with a
higher acceptable lag.
2023-03-31 23:28:43 +02:00
Stefan Agner
75dcb932f8 Use zswap instead of swap in zram (#2420)
* Use zswap instead of swap in zram

This requires a swap file which will get generated automatically on
startup.

* Fix file size and free disk space comparison

* Set zswap factor to 33%

* Set vm.swappiness to 1

Decrease swapping to a minimum. This is also recommended for database
workloads by the MariaDB documentation. In practice it causes the least
amount of writes to disk when under memory pressure, while still making
swap available when needed.
2023-03-22 11:08:05 +01:00
Stefan Agner
5200096c4e Deactivate any external data disk device on first boot (#2390) (#2410)
* Deactivate any external data disk device on first boot (#2390)

* Use lsblk to determine the underlying device file

Comparing major number is not reliable, e.g. virtio disks have the same
major number despite being different devices. Use lsblk to find the
underlying device, and compare the device name instead.
2023-03-15 14:16:11 +01:00
Stefan Agner
66c15adbbf Move Docker configuration to daemon.json (#2116)
This is more readable than passing arguments to the daemon directly. It
also shortens the ExecStart command significantly, which is stored in
every log entry in systemd-journald.
2022-09-07 19:13:47 +02:00
Stefan Agner
b1df44421b Bump commit interval to 30s (#2103)
A higher file system commit interval can help to decrease the amount of
writes. In tests, a commit interval of higher than 30s seems not to help
much in practice. Settle with 30s for now.
2022-09-02 15:23:38 +02:00
Stefan Agner
2d8ec0c8ee Use dbus-broker as default D-Bus broker (#2053)
* Bump buildroot

* buildroot 99b62b8bd3...97287bbebf (3):
  > package/dbus-broker: bump to release 32
  > package/dbus-broker: new package
  > Merge pull request #3 from home-assistant/2022.02.x-haos-cgroup-v2

* Use dbus-broker as default D-Bus broker

The dbus-broker (Linux D-Bus Message Broker) aims to be a high
performance and reliable D-Bus broker which can be used as a drop in
replacement to the reference implementation D-Bus broker. In tests it
showed significantly better performance especially when routing BLE
messages.

* Allow dbus-broker to start early

For the HAOS device wipe feature we need haos-agent.service and
udisk2.service early. Both require a working D-Bus broker.
The options PrivateTmp and PrivateDevices add additional After=
orderings which don't allow dbus-broker to be started early.

* Fix D-Bus dependency

D-Bus services should just depend on dbus.socket.
2022-08-10 17:01:02 +02:00
Stefan Agner
5932f1212e Increase Supervisor start rate limit (#2010)
A faster restart policy is unlikely to help. Increasing the limit makes
it less likely to run into cloud service rate limits (e.g. container
registry).
2022-07-08 22:35:52 +02:00
Stefan Agner
4310cfe916 Enable file system check for FAT boot partition (#1857)
2022-04-20 14:06:56 +02:00
Stefan Agner
f509e9ce5d Shutdown HA CLI properly (#1768)
Drop IgnoreOnIsolate to make sure the service is stopped during
shutdown.
2022-02-25 19:17:57 +01:00
Stefan Agner
5fd943c936 Expose systemd-journal-gatewayd to Supervisor (#1627)
* Add systemd-journal-remote to the image

This allows accessing journald's log from within Supervisor and exposing
more system logs to users.

* Allow access to systemd-journal-gatewayd from Supervisor

Create a systemd-journal-gatewayd.socket service using a Unix socket and
bind mount it into the Supervisor container. This allows querying
systemd-journald from Supervisor directly.
2021-11-04 15:38:35 +01:00
Stefan Agner
74fe7d4cb8 Make AppArmor independent of Supervisor service (#1592)
Currently the hassos-apparmor.service wants the
hassos-supervisor.service and vice versa. This is unnecessary and leads
to activation of hassos-supervisor.service when reloading/restarting
hassos-apparmor.service (Supervisor does that on startup).

Make hassos-apparmor.service independent and add the dependency as well as
the ordering from the hassos-supervisor.service side.
2021-10-15 01:36:02 +02:00
Stefan Agner
66d5957310 Wait until Internet is available before starting AppArmor (#1547)
This makes sure that internet connectivity is available to replace the
AppArmor configuration in case the device has been wiped.
2021-09-20 13:44:09 +02:00
Stefan Agner
622cbb806d Restart console on tty1 on exit (#1387) (#1391)
Since we start the Home Assistant shell directly on the tty, the service
responsible for starting it did not restart the shell on exit. Remove the
RemainAfterExit flag to make sure that the shell restarts on exit.
2021-06-05 15:17:14 +02:00
Stefan Agner
40b4d5ca2e Start Home Assistant CLI on tty1 without login (#1366)
* Start ha-cli on tty1 instead of a getty

Instead of starting a getty start the ha-cli directly. This will show
the banner right on startup with the important information such as IP
address of the instance or the URL to reach it.

* Use default shell as root shell instead of HA CLI

Instead of using the ha-cli.sh script as login shell use the regular
shell. Amongst other things, this allows running VS Code devcontainers
remotely via SSH or using scp. The HA CLI is still available using the
`ha` command.
2021-05-19 13:18:02 +02:00
Stefan Agner
2d3119ef22 Delay Supervisor start until time has been synchronized (#1360)
* Enable systemd-time-wait-sync.service by default

Enable the systemd-time-wait-sync.service by default. This makes it possible
to use the time-sync.target, which makes sure services only get started
once the time is synchronized.

* Make sure time is synchronized when starting hassos-supervisor.service

Use the time-sync.target to make sure that the Supervisor gets started
after the time has been synchronized.

* Set timeout for systemd-time-wait-sync.service

Don't delay startup forever in case time synchronization doesn't work.
This allows booting the system even without an Internet connection.
2021-05-12 17:47:42 +02:00
Stefan Agner
ae0aeb84f5 Update to OS Agent 1.0.0 (#1317)
* Update to OS Agent 1.0.0

* Use new D-Bus path/interface/object in haos-wipe.service
2021-04-08 20:22:19 +02:00
Stefan Agner
dde7f1d073 Bump to latest OS Agent version to support Device Wipe (#1292)
The latest version of OS Agent sets haos.wipe=1 as kernel argument to
trigger a device wipe. Let systemd pickup this kernel command line
argument and start haos-wipe.service.

This rather complex architecture allows adding other triggers in the
future, e.g. a button read in the boot loader.
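
As an illustration of the trigger described above, the following Python sketch checks a kernel command line for `haos.wipe=1`. In HAOS the argument is picked up by systemd itself; the helper names `kernel_arg` and `wipe_requested` are hypothetical and only show what the check amounts to.

```python
def kernel_arg(cmdline, name):
    """Return the value of `name` on a kernel command line.

    Returns "" for a bare flag without "=", and None if the argument is absent.
    """
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == name:
            return value if sep else ""
    return None


def wipe_requested(cmdline):
    """True if the command line asks for a device wipe (haos.wipe=1)."""
    return kernel_arg(cmdline, "haos.wipe") == "1"
```

On a running Linux system the command line can be read from /proc/cmdline and passed to `wipe_requested`.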
2021-03-31 23:43:26 +02:00
Stefan Agner
b77d633382 Remove the no longer required busybox-acpid service (#1261)
The BusyBox option has been disabled in #1210.
2021-03-04 00:49:04 +01:00
Stefan Agner
907857985a Disable fsck.fat for boot partition (might help #1125) (#1190)
There are incident reports on the internet where people report that
fsck.(v)fat actually leads to problems rather than file system fixes. Around
the time when Home Assistant OS added fsck.fat for the boot partition,
reports of empty boot partitions or files with weird filenames started
to appear. This could be caused by fsck.fat.

Disable fsck on the boot partition.
2021-01-29 15:02:08 +01:00
Stefan Agner
be2a64f4d2 Add hassos-apparmor dependency to supervisor (#1140)
The supervisor container requires the "hassio-supervisor" AppArmor
profile. Make sure our AppArmor service hassos-apparmor is a dependency
of the hassos-supervisor.service.
2020-12-29 13:46:40 +01:00
Stefan Agner
7959113c97 Use systemd-growfs (#1133)
* Use systemd-growfs instead of resize2fs (#1106)

Since version 236, systemd has a built-in file system growing mechanism.
The mechanism relies on the kernel's online file system resize
capabilities instead of the external resize2fs utility. Online resizing
is supposedly much faster since the kernel takes care of things.

This also makes sure that external file systems get resized which
previously have not been taken care of.

* Drop HA OS specific file system resizing

Since we have systemd-growfs in place now we can drop our file system
resizing code.

* Make sure /dev/disk/by-label/hassos-data is present after resizing

Note: systemd will retry mnt-data.mount later, so at least in theory
this shouldn't really matter. However, the journal has a lot of churn
due to that reordering.
2020-12-28 23:46:55 +01:00
Stefan Agner
323f415fa8 Mount boot partition sync (#1092) (#1101)
When we write the update to the boot partition, there is nothing which
makes sure that data is written to disk. This leaves a rather large
window (probably around 30s) where a machine reset/poweroff can lead
to a corrupted boot partition. Use the sync mount option to minimize the
corruption window.

Note that sync is not ideal for flash drives normally. But since we
write very little and typically only on OS update to the boot partition,
this shouldn't be a problem.
2020-12-17 14:09:43 +01:00
Stefan Agner
1a8f9ca2e3 Avoid waiting for external drive unnecessarily (#1066)
* Avoid waiting for external drive unnecessarily

Even though the condition to start hassos-data.service is not met (the
file /mnt/overlay/data-move is not there by default), it seems that
systemd waits for the dependencies for hassos-data.service. Don't
Require or Wants any dependencies which might not be present by
default.

* Use systemd to wait for partition using partlabel device

* Use sfdisk which allows to wipe filesystem signatures

Even though we zap the partition table using sgdisk, the file system
superblock (which contains the file system label) does survive. This
can cause problems when trying to reuse a disk previously already
labeled using hassos-data: It might take precedence on next boot
over the existing data partition on the eMMC.

Make sure to clean all file system signatures using sfdisk.
2020-12-08 01:11:00 +01:00
Stefan Agner
6672046b6f Make the datactl command more robust (#1059)
* Make the datactl command more robust

Validate target disk (partition) size to avoid a copy attempt which will
fail. If the e2image operation fails, make sure the leftover copy is not
recognized as a data partition.

* Fix hassos-data service device unit dependencies
2020-12-04 20:55:35 +01:00
Stefan Agner
46bb12844f Rewrite datactl command (#1046)
* Rewrite datactl command

Prepare the target partition as part of the datactl command. Rely on
partlabel for the target disk since we are always using GPT on the
target disk. Use systemd and partlabel mechanism to wait and find
the target data disk. Keep using the file system label to identify
the source disk.

Also use e2image instead of raw dd to move data. This should
speed up the process significantly.

* Fix corner case when reusing same disk again
2020-12-03 20:05:02 +01:00
Stefan Agner
a0871be6c0 Bump buildroot to 2020.11-rc1 (#985)
* Update buildroot-patches for 2020.11-rc1 buildroot

* Update buildroot to 2020.11-rc1

Signed-off-by: Stefan Agner <stefan@agner.ch>

* Don't rely on sfdisk --list-free output

The --list-free (-F) argument does not offer a machine readable mode. And
it seems that the output format changes over time (different spacing,
using size postfixes instead of raw blocks).

Use sfdisk JSON output and calculate free partition space ourselves. This
works for 2.35 and 2.36 and is more robust since we rely on output which
is meant for scripts to parse.

* Migrate defconfigs for Buildroot 2020.11-rc1

In particular, rename BR2_TARGET_UBOOT_BOOT_SCRIPT(_SOURCE) to
BR2_PACKAGE_HOST_UBOOT_TOOLS_BOOT_SCRIPT(_SOURCE).

* Rebase/remove systemd patches for systemd 246

* Drop apparmor/libapparmor from buildroot-external

* hassos-persists: use /run as directory for lockfiles

The U-Boot tools use /var/lock by default which is not created any more
by systemd by default (it is under tmpfiles legacy.conf, which we no
longer install).

* Disable systemd-update-done.service

The service is not suited for pure read-only systems. In particular the
service needs to be able to write a file in /etc and /var. Remove the
service. Note: This is a static service and cannot be removed using
systemd-preset.

* Disable apparmor.service for now

The service loads all default profiles. Some might actually cause
problems. E.g. the profile for ping seems not to match our setup for
/etc/resolv.conf:
[85503.634653] audit: type=1400 audit(1605286002.684:236): apparmor="DENIED" operation="open" profile="ping" name="/run/resolv.conf" pid=27585 comm="ping" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
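
The "Don't rely on sfdisk --list-free output" item above replaces text parsing with a calculation over sfdisk's JSON output. A minimal Python sketch of that idea follows. It assumes a GPT disk (so `firstlba` and `lastlba` are present in the JSON) and assumes that the free space of interest is the region after the last partition; the function name is hypothetical and the actual HAOS script may compute this differently.

```python
import json


def free_sectors_after_last_partition(sfdisk_json):
    """Return the number of unallocated sectors behind the last partition.

    `sfdisk_json` is the text printed by `sfdisk --json <device>` for a GPT disk.
    """
    table = json.loads(sfdisk_json)["partitiontable"]
    partitions = table.get("partitions", [])
    # First sector not covered by any partition; firstlba if the table is empty.
    first_free = max(
        (p["start"] + p["size"] for p in partitions),
        default=table["firstlba"],
    )
    # lastlba is the last usable sector, hence the +1.
    return max(table["lastlba"] - first_free + 1, 0)
```

Because the values are read from structured fields, spacing or unit suffix changes between util-linux releases do not affect the result.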
2020-11-13 18:25:44 +01:00
Stefan Agner
25a0dd3082 Use systemd-resolved to announce hostname via mDNS and LLMNR (#986)
Drop AVAHI and use systemd-resolved to announce hostname via mDNS
and LLMNR. Also continue to offer the _workstation._tcp.local service
since it is used by the CoreDNS mDNS plug-in.
2020-11-13 17:43:46 +01:00
Stefan Agner
4f28a284be Make self healing capabilities more robust (#960)
In case a container image is corrupted `docker inspect` might fail:
  # docker inspect --format='{{.Id}}' "${SUPERVISOR_IMAGE}"

  Error response from daemon: readlink /mnt/data/docker/overlay2: invalid argument

In that same state the `docker images` command still shows the images.
Since `docker inspect` returns an error SUPERVISOR_IMAGE_ID will be empty
and a simple `docker pull` will be attempted. That does not suffice to
recover from a corrupted container image.

Use `docker images` to get the image ids and make sure to delete all
image ids found by that command.

Also don't use RuntimeDirectory since it deletes the runtime directory
between the service start attempts which defeats the purpose.
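
The recovery step described above (collect image IDs via `docker images` and delete every one of them) could look roughly like this in Python. This is a sketch, not the hassos-supervisor script itself: the function names and the injectable `run` parameter are assumptions added so the logic can be exercised without a Docker daemon.

```python
import subprocess


def list_image_ids(repository, run=subprocess.run):
    """Return the unique image IDs that `docker images` reports for a repository."""
    result = run(
        ["docker", "images", "--no-trunc", "--quiet", repository],
        capture_output=True,
        text=True,
        check=True,
    )
    # One image can appear once per tag; keep order but drop duplicates.
    return list(dict.fromkeys(result.stdout.split()))


def remove_images(image_ids, run=subprocess.run):
    """Force-remove every given image ID so a subsequent pull starts clean."""
    for image_id in image_ids:
        run(["docker", "image", "rm", "--force", image_id], check=False)
```

Listing via `docker images` matters here because, as the commit message notes, `docker inspect` may fail on a corrupted image while `docker images` still shows it.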
2020-11-09 13:05:54 +01:00
Stefan Agner
503117d8bf Move RuntimeDirectory to the Service section (#957)
RuntimeDirectory needs to be in the [Service] section to take effect.
2020-11-04 16:55:19 +01:00
Stefan Agner
2d257bd671 Simplify self healing capabilities of Supervisor service (#952)
* Simplify self healing capabilities of Supervisor service

Instead of relying on time based information on how long the container
has been running use a startup marker file to infer if the last startup
has been successful.

* Update buildroot-external/rootfs-overlay/usr/sbin/hassos-supervisor

Co-authored-by: Pascal Vizeli <pascal.vizeli@syshack.ch>
2020-11-04 10:05:38 +01:00
Aman Gupta Karmani
a8bad54efc automatically fsck to repair issues after an unclean shutdown (#938)
* automatically fsck to repair partitions

* add fsck.fat so rpi boot partition can be repaired

* Use Wants= instead of Requires=

Co-authored-by: Pascal Vizeli <pascal.vizeli@syshack.ch>

* add dosfstools to all images

* run hassos-data and hassos-expand after fsck

Co-authored-by: Pascal Vizeli <pascal.vizeli@syshack.ch>
2020-10-30 21:52:24 +01:00
Aman Gupta Karmani
3337cd0f79 Fix var-lib-NetworkManager.mount dependencies (#895)
2020-10-12 21:41:12 +02:00
Stefan Agner
1708ed11b4 Fix Docker socket path (#885)
The Docker socket path is /run/docker.sock. Also only one path can be
used per property. This fixes the supervisor service, which currently
refuses to start due to missing Docker socket.
2020-10-06 12:17:39 +02:00
Pascal Vizeli
f219f239d8 Improve handling with services on supervisor (#867)
* Improve handling with services on supervisor

* add condition

* move dbus to required, since we can't start the supervisor
2020-09-24 13:40:39 +02:00
Pascal Vizeli
50176a0e3b Add support for snapshots/restore on OS level (#801)
2020-08-03 16:28:08 +02:00