operating-system

mirror of https://github.com/home-assistant/operating-system.git synced 2026-05-03 23:18:02 +01:00

Author	SHA1	Message	Date
Jan Čermák	31f347ee0f	Remove handling of Docker key.json (#4361 ) The deprecated-key-path option is no longer handled, but it doesn't cause problems because the key is explicitly ignored. It was completely removed in Docker 19.03.0 [1]. As such, the option and the pre-start script to fix the corrupted key.json can be removed now, as it has no effect, only printing confusing message when Docker service fails to start. [1] `98fc09128b`	2025-10-28 18:36:57 +01:00
Jan Čermák	af9131cd10	Use Docker containerd snapshotter for new and wiped installs (#4360 ) Prefer the containerd snapshotter by using it by default for new installs and when no Docker data is present (e.g. after datadisk wipe). The snapshotter is enabled by a dockerd flag which is set when a flag file is present in the data partition. This flag file can be used also to opt-in for this snapshotter on legacy installs (high level API through OS Agent and Supervisor TBD), to migrate to the containerd snapshotter this file can be simply created manually. Testing shown no major problems when migrating, the old overlay2 folder can be (and should be - to avoid situations where the data disk might run out of space) deleted before the docker.service is started in the docker-prepare script. Note that there's no offline migration path, OS needs to be connected to the internet to re-download the images when migrating. This could be theoretically possible through docker image save/load functions but guarding for enough of space and other edge cases would be probably too complex to justify it. Refs #4252 Refs #4253 - easier opt-in method is still needed Closes #4254 - migration is handled seamlessly by Docker	2025-10-28 18:36:48 +01:00
Jan Čermák	bde19002df	Improve UX of HA CLI wrapper and emergency console (#4326 ) * Improve UX of HA CLI wrapper and emergency console For many users, the emergency console gives feeling that the system is completely broken. However, there are various cases when the system just takes just a bit longer to start up and the emergency message is shown, while it finishes a proper startup shortly after. This change tries to improve the UX in several ways: * The limit before a forced emergency console startup is changed to 3 minutes * Waiting can be interrupted with Ctrl+C (reset counter is cleared then) * Some hints what to check have been added before starting the shell * Also, because if the HA CLI failed for 5 times in a row in quick succession, the CLI startup was then not retried anymore and user may have been left with a black screen, the restart limits timeouts have been adjusted only to back off and never mark the unit as failed Closes #4273 * Use /bin/sh and printf to silence linter errors	2025-10-01 18:23:28 +02:00
Jan Čermák	7243db762e	Make system timezone setting persistently configurable (#4224 ) To make system timezone configurable, we need to have /etc/localtime writable, and it must be possible to atomically create a symlink from this place, which means the whole parent folder must be writable. We don't have /etc writable and can't use the usual bind mount for this. Latest Systemd v258 has a patch that allows setting an environment variable that sets where the localtime should be written. This can be persisted in the overlay partition, with a symlink from /etc/localtime leading there, finally pointing to the actual zoneinfo file. If the symlink doesn't exist, create it by hassos-overlay script (it's not really needed as UTC is the default, but Systemd does the same if you change from non-UTC timezone back to UTC). Also disable BR2_TARGET_LOCALTIME, so /etc/localtime and /etc/timezone (the latter is only informative and non-standard) are not written by the tzdata package build.	2025-08-13 18:15:57 +02:00
Jan Čermák	3e3372b7dc	Remove old migrations from RAUC hook (#4083 ) As we're moving to another major release and 15.2 will be mandated update before 16.0, we can (or even must) remove some old migrations.	2025-05-28 17:06:52 +02:00
Jan Čermák	24640c11ae	Ensure haos-wipe service can be called only once per boot (#3924 ) In some cases, the wipe service may be called due to a race condition for the second time during the boot, very likely failing because the filesystems are already mounted. This can not be reproduced on OVA but can be fairly easy triggered e.g. on RPi. As we want the service to be executed exactly only once, we can do what's suggested in [1] and set the RemainAfterExit=yes. That should ensure the unit is not ever started for the second time. [1] https://www.github.com/systemd/systemd/issues/29367	2025-03-12 20:07:26 +01:00
Jan Čermák	6c4f32a8c0	Use shell script instead of OS Agent for device wipe (#3916 ) Use simple shell script to perform device wipe instead of calling OS Agent to do that through the UDisks2 API. While it might have been a good idea to use high level interface for that back then, it turns out it causes more issues than the benefits it could bring. Main problem currently is that the OS Agent needs to read sysctl variables, but those are only set after mounting the overlay partition. But at the same time, the overlay partition can't be mounted if we want to wipe it - this creates a dependency cycle through the haos-agent.service. To get rid of the cycle and simplify things, use a shell script doing basically the same what the OS Agent does. Since the wipe functionality only makes sense to be implemented on HAOS targets (not on Supervised), there's little point of having it in higher layer of abstraction that OS Agent provides. It should be also checked if changes from #1291 are needed anymore, as the driving factor for those have been probably the wipe feature in OS Agent too, but at this point they seem to be harmless.	2025-03-06 16:39:40 +01:00
Jan Čermák	1b511990e3	Allow overriding sysctl parameters via /etc/sysctl.d files (#3883 ) Relocate current content of /etc/sysctl.d to /usr/lib and make the /etc folder writable via a bind mount.	2025-02-19 15:33:16 +01:00
Jan Čermák	d42e34f646	Make swap size configurable (#3882 ) Allow configuration of the swap size via /etc/default/haos-swapfile file. By setting the SWAPSIZE variable in this file, swapfile get recreated on the next reboot to the defined size. Size can be either in bytes or with optional units (B/K/M/G, accepting some variations but always interpreted as power of 10). The size is then rounded to 4k block size. If no override is defined or the value can't be parsed, it falls back to previously used 33% of system RAM. Fixes #968	2025-02-19 15:33:04 +01:00
Jan Čermák	48bf9b5056	Move rauc.db to boot partition (#3810 ) * Move rauc.db to boot partition The RAUC metadata file contains information that is tightly related to the system and kernel partitions. With the possibility to migrate data disk, the rauc.db can contain bogus information when moved to a different system. Removal of the file on "device wipe" is also not desirable, because the information about slot status is lost. Relocate the rauc.db to the boot partition after a system upgrade (as this can't be handled by RAUC hooks, because it needs to be executed after all slots and metadata is written) and adjust the script for recreating it. The downside is that its content in /mnt/data would be recreated if the boot slot is changed or system downgraded but this should be handled quite gracefully. Also remove the raucdb-first-boot service which is no longer necessary with the file not present in the data partition. * Fix shellcheck and mount path	2025-01-21 18:40:07 +01:00
Jan Čermák	6ef7a68a1d	Make usb_modeswitch include directory writable (#3800 ) The /etc/usb_modeswitch.d is present and empty but it can't be written to allow user modification. Bind-mount it like other /etc folders to make it possible to adjust usb_modeswitch config. Fixes #3785	2025-01-16 18:11:35 +01:00
Jan Čermák	c7a9a0b906	Remove existing rauc.db from a data disk on the first boot (#3737 ) If data disk is adopted on Yellow using the mechanism added in #3686, it contains RAUC version information that is very likely invalid. In such case, remove the file on first boot and have it recreated by the raucdb-update service.	2024-12-12 20:44:15 +01:00
Jan Čermák	23039ceea7	Reduce timeout for network time synchronization to 15 seconds (#3669 ) The timeout of 90s was introduced before it was ensured that the timesync systemd unit starts after network is online. Now with that, it makes less sense to wait that long - if network is unreachable at the point the time synchronization starts, and the server fails to reply on the first sync, the polling interval is exponentially increased and the benefit of waiting for more attempts is doubtful. Since another synchronization attempt is done after network changes its state, we should rely on that instead of having the 90 seconds interval as a waiting period for plugging the network cable. Worst case, there are other mechanisms that should set the time to a reasonably accurate value, making the NTP sync less important for most of the cases.	2024-11-13 17:14:54 +01:00
Jan Čermák	2916a1c247	Relocate HAOS Systemd drop-ins to /usr/lib/systemd (#3582 ) * Relocate HAOS Systemd drop-ins to /usr/lib/systemd With some exceptions, Systemd drop-ins overriding default unit configuration have been placed to `/etc/systemd/system`. This is meant for user overrides of those, or per `man 5 systemd.unit` for "system units created by the administrator". Relocate all of these to `/usr/lib/systemd` which should be used as path for units "installed by the distribution package manager" which is closer to what we're trying to achieve. This will make it easier to detect changes to unit files once we enable the possibility to edit the content of /etc. * Patch systemd-timesyncd.service instead of replacing it fully	2024-09-12 12:47:22 +02:00
Jan Čermák	6c7b6fdebe	Generate version information for RAUC when rauc.db is empty (#3436 ) RAUC currently doesn't know the version of the booted slot when booted for the first time or after wiping the data partition. As a result `ha os info` is missing this information too. As there's no built-in mechanism for generating these data by RAUC itself, add a oneshot service that checks if the boot slot information is contained in the rauc.db and if not, then generate it. RAUC seems to cope quite well even with bogus data contained in rauc.db but in any case, a test has been added to check that everything works as expected.	2024-06-20 16:50:14 +02:00
Stefan Agner	1a6b7418f0	Improve Bluetooth cache cleanup command (#2906 ) Use the find's delete flag to delete the files instead of spawning a shell for each file.	2023-11-06 11:52:57 +01:00
Stefan Agner	2cbaaf9f3b	Fix fsfreeze freeze support (#2787 ) Pass the script argument properly to make sure the script gets actually called from the QEMU guest agent.	2023-10-03 16:21:57 +02:00
Stefan Agner	893a49a3f3	Add fsfreeze support for QEMU/KVM/Proxmox installations (#2781 ) * Add fsfreeze support for QEMU/KVM/Proxmox installations Add fsfreeze scripts which calls the new Supervisor API to freeze Home Assistant Core and add-ons which support the backup freeze scripts (`backup_pre` and `backup_post`). This allows to create safe snapshots with databases running. * Fix lint issues	2023-10-02 08:30:20 +02:00
Stefan Agner	86b172b9c2	Create swapfile even when not using the multi-user.target (#2762 ) Pull in the swapfile creation service haos-swapfile.service when swap.target is reached. This makes sure the service is started even when other targets are used (e.g. rescue.target).	2023-09-21 15:30:24 +02:00
Stefan Agner	f8f2e61967	Delete Bluetooth device cache regularly (#2751 ) * Delete Bluetooth device cache regularly Delete stale Bluetooth devices from the BlueZ device cache every week. This makes sure that the overlay partition doesn't run out of inodes which has happened in real world scenarios where many new Bluetooth devices are discovered. BlueZ maintains these files on a best-effort basis. So removing them while BlueZ is running should be safe. An alternative considered was to lower BlueZ GATT caching (e.g. by using Cache=yes instead of always, to cache only paired devices). However, this would hurt performance and battery lifetime of Bluetooth devices due to additional unnecessary GATT attributes reads. This is in particular true for Bluetooth 5.1 devices which support the Database Hash characteristic. Caching has also helped reliability with intermittent connections (see https://github.com/bluez/bluez/issues/191). More importantly, besides the GATT attribute cache the same files are also used to cache the device names as well. This is independent of the above mentioned GATT cache configuration (see device_store_cached_name in BlueZ). So disabling the GATT caching alone wouldn't solve the particular problem we are facing. See also: https://github.com/home-assistant/supervisor/issues/4490 * Use access timestamp instead of modification timestamp The modification timestamp gets updated regularly (on each connect) it seems. However, using access timestamp might be more accurate, as it seems to preserve slightly more cache files. These additional devices might be devices we don't regularly connect but are still around (and therefore we shouldn't reread the GATT attributes regularly). So deleting cache entries with access time older than 7 days. Which essentially deletes all the entries of devices which haven't been seen the last 7 days.	2023-09-14 23:13:40 +02:00
Stefan Agner	24217838e2	Start OS Agent only when boot partition is mounted (#2583 ) To read the current LED configuration correctly /mnt/boot is required. This change makes sure that the boot partition is mounted when the OS Agent starts.	2023-06-10 00:43:51 +02:00
Stefan Agner	c7588e9350	Enable Multi-Gen LRU (#2392 ) * Enable Multi-Gen LRU Multi-Gen LRU should improve performance under memory pressure. This is especially useful for embedded platforms where memory is scarce. * Add service to configure Multi-Gen LRU Use min_ttl_ms of 1 which is the least aggressive in terms of lag. Since we are a server application, we can tune thrashing prevention with a higher acceptable lag.	2023-03-31 23:28:43 +02:00
Stefan Agner	75dcb932f8	Use zswap instead of swap in zram (#2420 ) * Use zswap instead of swap in zram This requires a swap file which will get generated automatically on startup. * Fix file size and free disk space comparison * Set zswap factor to 33% * Set vm.swappiness to 1 Decrease swapping to a minimum. This is also recommended for database work loads by the MariaDB documentation. In practice it causes the least amount of writes to disk when under memory pressure, while still making swap available when needed.	2023-03-22 11:08:05 +01:00
Stefan Agner	5200096c4e	Deactivate any external data disk device on first boot (#2390 ) (#2410 ) * Deactivate any external data disk device on first boot (#2390) * Use lsblk to determine the underlying device file Comparing major number is not reliable, e.g. virtio disks have the same major number despite being different devices. Use lsblk to find the underlying device, and compare the device name instead.	2023-03-15 14:16:11 +01:00
Stefan Agner	66c15adbbf	Move Docker configuration to daemon.json (#2116 ) This is more readable than passing arguments to the daemon directly. It also shortens the ExecStart command significantly, which is stored in every log entry in systemd-journald.	2022-09-07 19:13:47 +02:00
Stefan Agner	b1df44421b	Bump commit interval to 30s (#2103 ) A higher file system commit interval can help to decrease the amount of writes. In tests, a commit interval of higher than 30s seems not to help much in practice. Settle with 30s for now.	2022-09-02 15:23:38 +02:00
Stefan Agner	2d8ec0c8ee	Use dbus-broker as default D-Bus broker (#2053 ) * Bump buildroot * buildroot 99b62b8bd3...97287bbebf (3): > package/dbus-broker: bump to release 32 > package/dbus-broker: new package > Merge pull request #3 from home-assistant/2022.02.x-haos-cgroup-v2 * Use dbus-broker as default D-Bus broker The dbus-broker (Linux D-Bus Message Broker) aims to be a high performance and reliable D-Bus broker which can be used as a drop in replacement to the reference implementation D-Bus broker. In tests it showed significantly better performance especially when routing BLE messages. * Allow dbus-broker to start early For HAOS device wipe feature we need haos-agent.service and udisk2.service early. Both require a working D-Bus broker. The options PrivateTmp and PrivateDevices add additional After= orderings which doesn't allow dbus-broker to be started early. * Fix D-Bus dependency D-Bus services should just depend on dbus.socket.	2022-08-10 17:01:02 +02:00
Stefan Agner	5932f1212e	Increase Supervisor start rate limit (#2010 ) A faster restart policy is unlikely to help. Increasing the limit makes it less likely to run into cloud service rate limits (e.g. container registry).	2022-07-08 22:35:52 +02:00
Stefan Agner	4310cfe916	Enable file system check for FAT boot partition (#1857 )	2022-04-20 14:06:56 +02:00
Stefan Agner	f509e9ce5d	Shutdown HA CLI properly (#1768 ) Drop IgnoreOnIsolate to make sure the service is shutdown during shutdown.	2022-02-25 19:17:57 +01:00
Stefan Agner	5fd943c936	Expose systemd-journal-gatewayd to Supervisor (#1627 ) * Add systemd-journal-remote to the image This allows to access journald's log from within Supervisor and expose more system logs to users. * Allow to access systemd-journal-gatewayd from Supervisor Create a systemd-journal-gatewayd.socket service using a Unix socket and bind mount it into the Supervisor container. This allows to query systemd-journald from Supervisor directly.	2021-11-04 15:38:35 +01:00
Stefan Agner	74fe7d4cb8	Make AppArmor independent of Supervisor service (#1592 ) Currently the hassos-apparmor.service wants the hassos-supervisor.service and vice-versa. This is unnecessary and leads to activation of hassos-supervisor.service when reload/restart hassos-apparmor.service (Supervisor is doing that on startup). Make hassos-apparmor.service independent and add dependency as well as ordering from hassos-supervisor.service side.	2021-10-15 01:36:02 +02:00
Stefan Agner	66d5957310	Wait until Internet is available before starting AppArmor (#1547 ) This makes sure that internet connectivity is available to replace the AppArmor configuration in case the device has been wiped.	2021-09-20 13:44:09 +02:00
Stefan Agner	622cbb806d	Restart console on tty1 on exit (#1387 ) (#1391 ) Since we start the HomeAssistant shell directly on tty the service responsible for starting did not restart the shell on exit. Remove the RemainAfterExit flag to make sure that the shell restarts on exit.	2021-06-05 15:17:14 +02:00
Stefan Agner	40b4d5ca2e	Start Home Assistant CLI on tty1 without login (#1366 ) * Start ha-cli on tty1 instead of a getty Instead of starting a getty start the ha-cli directly. This will show the banner right on startup with the important information such as IP address of the instance or the URL to reach it. * Use default shell as root shell instead of HA CLI Instead of using the ha-cli.sh script as login shell use the regular shell. Amongst other things, this allows to run VS Code devcontainers remotely via SSH or using scp. The HA CLI is still available using the `ha` command.	2021-05-19 13:18:02 +02:00
Stefan Agner	2d3119ef22	Delay Supervisor start until time has been synchronized (#1360 ) * Enable systemd-time-wait-sync.service by default Enable the systemd-time-wait-sync.service by default. This allows to use the time-sync.target which allows to make sure services only get started once the time is synchronized. * Make sure time is synchronized when starting hassos-supervisor.service Use the time-sync.target to make sure that the Supervisor gets started after the time has been synchronized. * Set timeout for systemd-time-wait-sync.service Don't delay startup forever in case time synchronization doesn't work. This allows to boot the system even without Internet connection.	2021-05-12 17:47:42 +02:00
Stefan Agner	ae0aeb84f5	Update to OS Agent 1.0.0 (#1317 ) * Update to OS Agent 1.0.0 * Use new D-Bus path/interface/object in haos-wipe.service	2021-04-08 20:22:19 +02:00
Stefan Agner	dde7f1d073	Bump to latest OS Agent version to support Device Wipe (#1292 ) The latest version of OS Agent sets haos.wipe=1 as kernel argument to trigger a device wipe. Let systemd pickup this kernel command line argument and start haos-wipe.service. This rather complex architecture allows to add other triggers in the future, e.g. a button read in the boot loader.	2021-03-31 23:43:26 +02:00
Stefan Agner	b77d633382	Remove the no longer required busybox-acpid service (#1261 ) The BusyBox option has been disabled in #1210.	2021-03-04 00:49:04 +01:00
Stefan Agner	907857985a	Disable fsck.fat for boot partition (might help #1125 ) (#1190 ) There are incident reports on the internet where people report that fsck.(v)fat actually leads to problems rather than file system fixes. Around the time when Home Assistant OS added fsck.fat for the boot partition, reports of empty boot partitions or files with weird filenames started to appear. This could be caused by fsck.fat. Disable fsck on the boot partition.	2021-01-29 15:02:08 +01:00
Stefan Agner	be2a64f4d2	Add hassos-apparmor dependency to supervisor (#1140 ) The supervisor container requires the "hassio-supervisor" AppArmor profile. Make sure our AppArmor service hassos-apparmor is a dependency of the hassos-supervisor.service.	2020-12-29 13:46:40 +01:00
Stefan Agner	7959113c97	Use systemd-growfs (#1133 ) * Use systemd-growfs instead of resize2fs (#1106) Since systemd 236 systemd has a built-in file system growing mechanism. The mechanism relies on the kernels online file system resize capabilities instead of the external resize2fs utility. Online resizing is supposedly much faster since the kernel takes care of things. This also makes sure that external file systems get resized which previously have not been taken care of. * Drop HA OS specific file system resizing Since we have systemd-growfs in place now we can drop our file system resizing code. * Make sure /dev/disk/by-label/hassos-data is present after resizing Note: systemd will retry mnt-data.mount later, so at least in theory this shouldn't really matter. However, the journal has a lot of churn due to that reordering.	2020-12-28 23:46:55 +01:00
Stefan Agner	323f415fa8	Mount boot partition sync (#1092 ) (#1101 ) When we write the update to the boot partition, there is nothing which makes sure that data is written to disk. This leaves a rather large window (probably around 30s) where a machine reset/poweroff can lead to a corrupted boot partition. Use the sync mount option to minimize the corruption window. Note that sync is not ideal for flash drives normally. But since we write very little and typically only on OS update to the boot partition, this shouldn't be a problem.	2020-12-17 14:09:43 +01:00
Stefan Agner	1a8f9ca2e3	Avoid waiting for external drive unnecessarily (#1066 ) * Avoid waiting for external drive unnecessarily Even though the condition to start hassos-data.service is not met (the file /mnt/overlay/data-move is not there by default), it seems that systemd waits for the dependencies for hassos-data.service. Don't Require or Wants any dependencies which might not be present by default. * Use systemd to wait for partition using partlabel device * Use sfdisk which allows to wipe filesystem signatures Even though we zap the partition table using sgdisk, the file system superblock (which contains the file system label) does survive. This can cause problems when trying to reuse a disk previously already labeled using hassos-data: It might take precedence on next boot over the existing data partition on the eMMC. Make sure to clean all file system signatures using sfdisk.	2020-12-08 01:11:00 +01:00
Stefan Agner	6672046b6f	Make the datactl command more robust (#1059 ) * Make the datactl command more robust Validate target disk (partition) size to avoid a copy attempt which will fail. If e2image operation fails, make sure the leftover copy is not recognized as data partition. * Fix hassos-data service device unit dependencies	2020-12-04 20:55:35 +01:00
Stefan Agner	46bb12844f	Rewrite datactl command (#1046 ) * Rewrite datactl command Prepare the target partition as part of the datactl command. Rely on partlabel for the target disk since we are always using GPT on the target disk. Use systemd and partlabel mechanism to wait and find the target data disk. Keep using the file system label to identify the source disk. Also use e2image instead of raw dd to move data. This should speed up the processes significantly. * Fix corner case when reusing same disk again	2020-12-03 20:05:02 +01:00
Stefan Agner	4f28a284be	Make self healing capabilities more robust (#960 ) In case a container image is corrupted `docker inspect` might fail: # docker inspect --format='{{.Id}}' "${SUPERVISOR_IMAGE}" Error response from daemon: readlink /mnt/data/docker/overlay2: invalid argument In that same state the `docker images` command still shows the images. Since `docker inspect` returns an error SUPERVISOR_IMAGE_ID will be empty and a simple `docker pull` will be attempted. That does not suffice to recover from a corrupted container image. Use `docker images` to get the image ids and make sure to delete all image ids found by that command. Also don't use RuntimeDirectory since it deletes the runtime directory between the service start attempts which defeats the purpose.	2020-11-09 13:05:54 +01:00
Stefan Agner	503117d8bf	Move RuntimeDirectory to the Service section (#957 ) RuntimeDirectory needs to be in the [Service] section to take effect.	2020-11-04 16:55:19 +01:00
Stefan Agner	2d257bd671	Simplify self healing capabilities of Supervisor service (#952 ) * Simplify self healing capabilities of Supervisor service Instead of relying on time based information on how long the container has been running use a startup marker file to infer if the last startup has been successful. * Update buildroot-external/rootfs-overlay/usr/sbin/hassos-supervisor Co-authored-by: Pascal Vizeli <pascal.vizeli@syshack.ch> Co-authored-by: Pascal Vizeli <pascal.vizeli@syshack.ch>	2020-11-04 10:05:38 +01:00
Aman Gupta Karmani	a8bad54efc	automatically fsck to repair issues after an unclean shutdown (#938 ) * automatically fsck to repair partitions * add fsck.fat so rpi boot partition can be repaired * Use Wants= instead of Requires= Co-authored-by: Pascal Vizeli <pascal.vizeli@syshack.ch> * add dosfstools to all images * run hassos-data and hassos-expand after fsck Co-authored-by: Pascal Vizeli <pascal.vizeli@syshack.ch>	2020-10-30 21:52:24 +01:00
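
The swap size rules described in d42e34f646 (#3882) can be sketched roughly as below. This is only an illustration in Python, not the actual HAOS shell script: the function name `swap_size_bytes`, the accepted unit spellings (`K`, `KB`, `KiB` and so on, case-insensitive), rounding down rather than up, and rounding the fallback value as well are all assumptions. Only the power-of-10 units, the 4k rounding, and the 33% of RAM fallback come from the commit message.

```python
import re

# Assumed meaning of the "4k block size" mentioned in the commit message.
BLOCK_SIZE = 4096

# Units are always powers of 10, as stated in the commit message.
_UNITS = {"": 1, "K": 10**3, "M": 10**6, "G": 10**9}

# Hypothetical grammar: an integer, an optional K/M/G, an optional "B" or "iB".
_PATTERN = re.compile(r"^\s*(\d+)\s*([KMG]?)(?:I?B)?\s*$", re.IGNORECASE)


def swap_size_bytes(raw, ram_bytes):
    """Return the swapfile size in bytes for a SWAPSIZE value.

    Falls back to 33% of RAM when the value is missing or unparsable,
    then rounds down to a multiple of BLOCK_SIZE (rounding direction
    is an assumption).
    """
    size = ram_bytes * 33 // 100  # fallback: 33% of system RAM
    if raw:
        match = _PATTERN.match(raw)
        if match:
            size = int(match.group(1)) * _UNITS[match.group(2).upper()]
    return size - size % BLOCK_SIZE


if __name__ == "__main__":
    ram = 4 * 10**9  # example RAM size, purely illustrative
    for value in ("1G", "512k", "2 MiB", "bogus", None):
        print(repr(value), swap_size_bytes(value, ram))
```

Integer arithmetic is used throughout so that large sizes are not affected by floating-point rounding.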

139 Commits