Not all disks expose every SMART attribute; Sentry, for example, showed
devices with a missing "wctemp". In practice, any SMART attribute could
be missing, so make sure we handle this gracefully (sketch below).
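A minimal sketch of the graceful handling, assuming the attributes
arrive as a plain dict (field names are illustrative):
```python
def parse_smart_attributes(attributes: dict) -> dict:
    # dict.get() returns None for attributes the disk does not
    # report, e.g. the "wctemp" missing on devices seen in Sentry.
    return {
        "warning_temp_celsius": attributes.get("wctemp"),
        "critical_temp_celsius": attributes.get("cctemp"),
    }
```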
* Use context manager for Job concurrency control
* Allow releasing the lock outside of the Job running context
* Improve JobGroup locking with external ownership tracking
Track lock ownership by job UUID instead of execution context. This
allows external lock release via a job parameter (see the sketch after
this list).
* Fix acquire lock in nested Jobs
* Simplify nested lock tracking
* Simplify Job group lock acquisition logic
* Simplify by using helper methods
* Allow throttling with group concurrency
* Use Lock instead of Semaphore for job concurrency control
Use the same synchronization primitive (Lock) for job concurrency
control as used in job groups.
* Go back to lock ownership tracking with references
* Drop unused property `active_job_id`
* Drop unused property `can_acquire`
* Replace assert with cast
* Add unsupported reason os_version and evaluation
* Order enum and add tests
* Apply suggestions from code review
* Apply suggestions from code review
---------
Co-authored-by: Stefan Agner <stefan@agner.ch>
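A rough sketch of the ownership tracking described above (simplified;
class and method names are illustrative):
```python
import asyncio


class OwnedJobLock:
    """Lock whose owner is tracked by job UUID, so it can also be
    released outside of the acquiring Job's running context."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()
        self._owner: str | None = None

    async def acquire(self, job_uuid: str) -> None:
        # Nested acquire by the owning job is a no-op.
        if self._owner == job_uuid:
            return
        await self._lock.acquire()
        self._owner = job_uuid

    def release(self, job_uuid: str) -> None:
        # Any caller holding the job UUID may release, not only the
        # execution context that acquired the lock.
        if self._owner != job_uuid:
            raise RuntimeError("Lock is not owned by this job")
        self._owner = None
        self._lock.release()
```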
This avoids displaying 10% lifetime used for a brand-new eMMC storage.
The values are estimates anyway, and there is a separate value which
represents lifetime completely used (100%).
* Block OS updates when the system is unhealthy
In #6024 we mark a system as unhealthy when multiple OS installations
were found. The idea was to block OS updates in this case. However, it
turns out that the OS update job was not checking the system health
and thus allowed updates even when the system was marked as unhealthy.
This commit adds the `JobCondition.HEALTHY` condition to the OS update
job, ensuring that OS updates are only performed when the system is
healthy.
Users can still force an OS update by using
`ha jobs options --ignore-conditions healthy` (see the sketch below).
* Add test for update of unhealthy system
---------
Co-authored-by: Jan Čermák <sairon@sairon.cz>
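A paraphrased sketch of the added condition (import paths and the
exact condition list may differ from the actual code):
```python
from supervisor.jobs.const import JobCondition
from supervisor.jobs.decorator import Job


class OSManager:
    @Job(name="os_manager_update", conditions=[JobCondition.HEALTHY])
    async def update(self, version: str | None = None) -> None:
        # With JobCondition.HEALTHY the job is rejected up front on an
        # unhealthy system, unless the condition is explicitly ignored.
        ...
```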
* Split execution limit in concurrency and throttle parameters
Currently the execution limit combines two orthogonal features:
limiting concurrency and throttling execution. This change separates
the two features, allowing for more flexible configuration of job
execution (see the sketch at the end of this list).
Ultimately I want to get rid of the old limit parameter. But for ease
of review and migration, I'd like to do this in two steps: first
introduce the new parameters and map the old limit parameter to the
new ones. Then, in a second step, remove the old limit parameter and
migrate all users to the new concurrency and throttle parameters as
needed.
* Introduce common lock release method
* Fix THROTTLE_WAIT behavior
The QUEUE concurrency setting does not actually queue executions that
are blocked by throttle limits.
* Add documentation for new concurrency/throttle Job options
* Handle group options for concurrency and throttle separately
* Fix GROUP_THROTTLE_WAIT concurrency setting
We need to use the QUEUE concurrency setting instead of GROUP_QUEUE
for the GROUP_THROTTLE_WAIT execution limit. Otherwise the
test_jobs_decorator.py::test_execution_limit_group_throttle_wait
test deadlocks.
The deadlock happens because GROUP_QUEUE concurrency doesn't really
work here: we can only release a group lock while the job is actually
running.
Put differently, throttling isn't supported with the GROUP_*
concurrency options.
* Prevent using any throttling with group concurrency
The group concurrency modes (reject and queue) are not compatible with
any throttling, since we currently can't unlock the group lock when
a job doesn't get started (which is the case when throttling is
applied).
* Fix commit in group rate limit
* Explain the deadlock issue with group locks in code
* Handle locking correctly on throttle limit exceptions
* Introduce pytest for new job decorator combinations
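Illustrative before/after of the decorator, assuming the parameter and
enum names introduced by this change:
```python
from datetime import timedelta

from supervisor.jobs.const import JobConcurrency, JobThrottle
from supervisor.jobs.decorator import Job


class Updater:
    # Before: @Job(limit=JobExecutionLimit.THROTTLE_WAIT,
    #              throttle_period=timedelta(seconds=30))
    @Job(
        name="updater_reload",
        concurrency=JobConcurrency.QUEUE,      # wait for running call
        throttle=JobThrottle.THROTTLE,         # drop calls in period
        throttle_period=timedelta(seconds=30),
    )
    async def reload(self) -> None:
        ...
```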
* Enable IPv6 by default for new installations
Enable IPv6 by default for new Supervisor installations. Let's also
make the `enable_ipv6` attribute nullable, so we can distinguish
between "not set" and "set to false" (see the sketch below).
* Add pytest
* Add log message that system restart is required for IPv6 changes
* Fix API pytest
* Create resolution center issue when reboot is required
* Order log after actual setter call
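A sketch of the nullable attribute (class name hypothetical, assuming
a dict-backed settings store):
```python
class NetworkSettings:
    def __init__(self, data: dict) -> None:
        self._data = data

    @property
    def enable_ipv6(self) -> bool | None:
        # None = "not set", distinct from an explicit False.
        return self._data.get("enable_ipv6")

    @enable_ipv6.setter
    def enable_ipv6(self, value: bool | None) -> None:
        self._data["enable_ipv6"] = value


# New installations treat "not set" as enabled:
settings = NetworkSettings({})
effective_ipv6 = settings.enable_ipv6 is not False  # True for None
```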
* Add resolution check for duplicate OS installations
* Only create single issue/use separate unhealthy type
* Check MBR partition UUIDs as well
* Use partlabel
* Use generator to avoid code duplication
* Add list of devices, avoid unnecessary exception handling
* Run check only on HAOS
* Fix message formatting
* Fix and simplify pytests
* Fix UnhealthyReason sort order
* Check for duplicate data disks only when the OS is available
Supervised installations do not have a specific data disk, so only
check for duplicate data disks on Home Assistant OS (sketch below).
* Enable OS for multiple data disks check test
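Illustrative core of the check (device model simplified; the MBR
partition UUID constant is a placeholder):
```python
from dataclasses import dataclass

HASSOS_DATA_MBR_UUID = "xxxxxxxx-08"  # placeholder, not the real value


@dataclass
class Partition:
    device: str
    partlabel: str | None  # GPT
    partuuid: str | None   # GPT partuuid or MBR partition UUID


def data_disk_devices(partitions: list[Partition]):
    # Generator avoids duplicating the extraction for GPT (partlabel)
    # and MBR (partition UUID) layouts.
    for part in partitions:
        if part.partlabel == "hassos-data":
            yield part.device
        elif part.partuuid == HASSOS_DATA_MBR_UUID:
            yield part.device


def has_duplicate_data_disks(partitions: list[Partition]) -> bool:
    return len(list(data_disk_devices(partitions))) > 1
```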
* Drop ensure_builtin_repositories
With the new Repository classes we have the `is_builtin` property, so
we can easily make sure that built-ins are not removed (see the sketch
below). This allows us to further clean up the code by removing the
`ensure_builtin_repositories` function and the
`ALL_BUILTIN_REPOSITORIES` constant.
* Make sure we add built-ins on load
* Reuse default set and avoid unnecessary copy
Reuse default set and avoid unnecessary copying during validation if
the default is not being used.
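Sketch of the resulting guard (exception type may differ):
```python
from supervisor.exceptions import StoreError


class StoreManager:
    async def remove_repository(self, repository) -> None:
        # Refuse removal up front instead of re-adding built-ins
        # afterwards via ensure_builtin_repositories().
        if repository.is_builtin:
            raise StoreError("Can't remove built-in repositories!")
        ...  # actual removal
```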
* Add Supervisor connectivity check after DNS restart
When the DNS plug-in gets restarted, check Supervisor connectivity in
case the DNS plug-in configuration change influenced Supervisor
connectivity. This is helpful when a DHCP server gets started after
Home Assistant is up: in that case the network-provided DNS server
(local DNS server) becomes available only after the DNS plug-in
restart.
Without this change, Supervisor connectivity would remain false until
a Job triggers a connectivity check, for example the periodic update
check by Core (which causes an updater and store reload). See the
sketch below.
* Fix pytest and add coverage for new functionality
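Sketch of the idea (surrounding method names may differ):
```python
async def restart_dns_plugin(self) -> None:
    await self.sys_plugins.dns.restart()
    # The new DNS configuration may affect Supervisor connectivity,
    # e.g. a DHCP-provided local DNS server that appeared late, so
    # re-check instead of waiting for the next periodic Job.
    await self.sys_supervisor.check_connectivity()
```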
* Rename repository fixture to test_repository
Also don't remove the built-in repositories. The list was incomplete,
and tests don't seem to require that anymore.
* Get rid of StoreType
The type doesn't have much value; we have constant strings anyway.
* Introduce types.py
* Use slug to determine which repository urls to return
* Simplify BuiltinRepository enum
* Mock GitRepo load
* Improve URL handling and repository creation logic
* Refactor update_repositories
* Get rid of get_from_url
It is no longer used in production code.
* More refactoring
* Address pylint
* Introduce is_git_based property to Repository class
Return all git based URLs, including the Core repository.
* Revert "Introduce is_git_based property to Repository class"
This reverts commit dfd5ad79bf.
* Fold type.py into const.py
Align more with how Supervisor code is typically structured.
* Update supervisor/store/__init__.py
Co-authored-by: Mike Degatano <michael.degatano@gmail.com>
* Apply repository remove suggestion
* Fix tests
---------
Co-authored-by: Mike Degatano <michael.degatano@gmail.com>
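Illustrative shape of the simplified enum (slugs and URL mapping are
hypothetical; only the idea of slug-keyed built-ins is from this
change):
```python
from enum import StrEnum


class BuiltinRepository(StrEnum):
    # The slug is the enum value; git-based built-ins map slug -> URL.
    CORE = "core"    # add-ons shipped with the OS image, not git-based
    LOCAL = "local"  # local add-on development folder, not git-based
    COMMUNITY = "community"  # hypothetical slug of a git repository

    @property
    def git_url(self) -> str | None:
        return {
            BuiltinRepository.COMMUNITY:
                "https://github.com/hassio-addons/repository",
        }.get(self)
```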
When authentication using JSON payload or URL encoded payload fails,
use the generic HTTP response code 401 Unauthorized instead of 400
Bad Request.
This is a more appropriate response code for authentication errors
and is consistent with the behavior of other authentication methods.
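The change boils down to raising the matching aiohttp error (helper
name hypothetical):
```python
from aiohttp import web


async def auth(request: web.Request) -> web.Response:
    if not await check_credentials(request):  # hypothetical helper
        raise web.HTTPUnauthorized()  # was: web.HTTPBadRequest()
    return web.Response(status=200)
```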
* Improve DNS plug-in restart
Instead of simply going by the PrimaryConnection change, use the
DnsManager Configuration property. This property is ultimately used to
write the DNS plug-in configuration, so it is really the relevant
information we pass on to the plug-in.
* Check for changes and restart DNS plugin
* Check for changes in plug-in DNS
Cache last local (NetworkManager) provided DNS servers. Check against
this DNS server list when deciding when to restart the DNS plug-in.
* Check connectivity unthrottled in certain situations
* Fix pytest
* Fix pytest
* Improve test coverage for DNS plugins restart functionality
* Apply suggestions from code review
Co-authored-by: Mike Degatano <michael.degatano@gmail.com>
* Debounce local DNS changes and event based connectivity checks
* Remove connection check logic
* Remove unthrottled connectivity check
* Fix delayed call
* Store restart task and cancel in case a restart is running
* Improve DNS configuration change tests
* Remove stale code
* Improve DNS plug-in tests, less mocking
* Cover multiple private functions at once
Improve tests around notify_locals_changed() to cover multiple
private functions at once (see the sketch below).
---------
Co-authored-by: Mike Degatano <michael.degatano@gmail.com>
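Sketch of the debounce around notify_locals_changed() (delay value
illustrative):
```python
import asyncio


class PluginDns:
    LOCAL_DNS_DEBOUNCE = 1.0  # seconds, illustrative

    def __init__(self) -> None:
        self._restart_task: asyncio.Task | None = None

    def notify_locals_changed(self) -> None:
        # Cancel a pending restart and re-arm the timer, so a burst of
        # NetworkManager DNS updates causes only one plug-in restart.
        if self._restart_task and not self._restart_task.done():
            self._restart_task.cancel()
        self._restart_task = asyncio.create_task(self._delayed_restart())

    async def _delayed_restart(self) -> None:
        await asyncio.sleep(self.LOCAL_DNS_DEBOUNCE)
        await self.restart()

    async def restart(self) -> None:
        ...  # stands in for the real plug-in restart
```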
* Use Docker BuildKit to build addons
* Improve error message as suggested by CodeRabbit
* Fix container.remove() tests missing v=True
* Ignore squash rather than falling back to legacy builder
* Use version rather than tag to avoid confusion in run_command()
* Fix tests differently
* Use PropertyMock like other tests
* Restore position of fix_label fn
* Exempt addon builder image from unsupported checks
* Refactor tests
* Fix tests expecting wrong builder image
* Remove hardcoded paths
* Fix tests
* Remove get_addon_host_path() function
* Use docker buildx build rather than docker build
Co-authored-by: Stefan Agner <stefan@agner.ch>
---------
Co-authored-by: Stefan Agner <stefan@agner.ch>
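Illustrative command now assembled for add-on builds (flag set
abbreviated, not the exact one used):
```python
def build_command(build_dir: str, image: str, version: str) -> list[str]:
    # BuildKit via `docker buildx build` replaces the legacy builder.
    return [
        "docker", "buildx", "build", build_dir,
        "--tag", f"{image}:{version}",
        "--progress", "plain",
        "--load",  # make the result available to the local engine
    ]
```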
* Rename detect-blocking-io API value to match other APIs
For the new detect-blocking-io option, use dashes instead of
underscores in `on-at-startup` for consistency with other API
endpoints.
This is a breaking change, but since the API is really new and not
really used yet, it is fairly safe to do so.
* Fix pytest
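Tiny before/after of the API value (sketch):
```python
# Before (underscores): {"detect_blocking_io": "on_at_startup"}
# After (dashes):       {"detect_blocking_io": "on-at-startup"}
options = {"detect_blocking_io": "on-at-startup"}
```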
* Fix mypy issues in store module
* Fix mypy issues in utils module
* Fix mypy issues in all remaining source files
* Fix ingress user typeddict
* Fixes from feedback
* Fix mypy issues after installing docker-types
* Avoid aiodns resolver memory leak
In certain cases, the aiodns resolver can leak memory. This also
leads to a fatal Python error in `ffi.from_handle()`. Address the
issue by ensuring that the resolver is properly closed when it is no
longer needed (see the sketch below).
* Address coderabbitai feedback
* Fix pytest
* Fix pytest
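Sketch of the teardown; the exact call depends on the aiodns version,
`cancel()` is used here for illustration:
```python
import aiodns


async def query_once(hostname: str) -> None:
    resolver = aiodns.DNSResolver()
    try:
        await resolver.query(hostname, "A")
    finally:
        # Tear down outstanding pycares handles; leaving the resolver
        # to the garbage collector is what leaked memory and could end
        # in the fatal Python error in ffi.from_handle().
        resolver.cancel()
```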
Configurable and w/ migrations between IPv4-Only and Dual-Stack
Signed-off-by: David Rapan <david@rapan.cz>
Co-authored-by: Stefan Agner <stefan@agner.ch>
This reverts commit 63fde3b410.
This change introduced another, more severe regression, causing all
add-ons that haven't been started since Supervisor startup to produce
errors during their backup. A more sophisticated check would have to
be implemented to address edge cases during backups for non-existing
add-ons (or rather their config).
Fixes #5924
* Avoid early DNS plug-in start
A connectivity check can potentially be triggered before the DNS
plug-in is loaded. Avoid calling restart on the DNS plug-in before it
got initially loaded; this prevents starting before attaching (see the
sketch below).
The attaching makes sure that the DNS plug-in container is recreated
before the DNS plug-in is initially started, which is needed e.g. by a
potential hassio network configuration change (such as the migration
required to enable/disable IPv6 on the hassio network, see #5879).
* Mock DNS plug-in running
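Sketch of the guard (flag name hypothetical):
```python
class PluginDns:
    def __init__(self) -> None:
        self._loaded = False  # hypothetical flag, set in load()

    async def load(self) -> None:
        ...  # attach, recreating the container if its config changed
        self._loaded = True

    async def restart(self) -> None:
        # Ignore restart requests (e.g. from an early connectivity
        # check) that arrive before the initial load/attach happened.
        if not self._loaded:
            return
        ...  # the actual restart
```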
* Use journal-gatewayd's new /boots endpoint to list boots
The current method we use for getting boots has several known
downsides; for example, it can miss some incomplete boots, and its
performance might be worse than what we could get by using systemd
directly. Systemd was missing a way to list boots through
journal-gatewayd, but that should be addressed by the new /boots
endpoint added in [1], which returns an application/json-seq response
containing all boots as reported by `journalctl --list-boots`.
Implement Supervisor methods to parse this format and use the endpoint
first, falling back to the old method if it fails (see the sketch
below).
[1] https://github.com/systemd/systemd/pull/37574
* Log info instead of warning when /boots is not present
Co-authored-by: Stefan Agner <stefan@agner.ch>
* Split records only by RS instead of LF in journal_boots_reader
* Strip only RS, json.loads is fine with whitespace
---------
Co-authored-by: Stefan Agner <stefan@agner.ch>
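Simplified sketch of the json-seq reader matching the two fixes above
(split on RS only; json.loads is fine with leftover whitespace):
```python
import json

RS = b"\x1e"  # ASCII Record Separator framing application/json-seq


async def journal_boots_reader(resp):
    """Yield boot entries from a streaming /boots response."""
    buffer = b""
    async for chunk in resp.content.iter_any():
        buffer += chunk
        # Records may span chunks, so keep the unterminated tail.
        *records, buffer = buffer.split(RS)
        for record in records:
            if record.strip():
                yield json.loads(record)
    if buffer.strip():
        yield json.loads(buffer)
```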
Process NetworkManager interface updates in case PrimaryConnection
changes. This makes sure that the /network/interface/default/info
endpoint can be used to get the IP address of the primary interface.
* Use add-on config timestamp to determine add-on update age
Instead of using the current timestamp when loading the add-on config,
simply use the add-on config's modification timestamp. This way, we
get a timestamp even when Supervisor got restarted (see the sketch
below). It also simplifies the code a bit.
* Fix pytest
* Patch stat() instead of modifying fixture files
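The gist of the timestamp change (sketch):
```python
from pathlib import Path


def addon_config_timestamp(config_path: Path) -> float:
    # The config file's mtime survives Supervisor restarts, unlike an
    # in-memory timestamp taken when the config was loaded.
    return config_path.stat().st_mtime
```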
* feat: Add IPv6 address generation mode & privacy extensions
Signed-off-by: David Rapan <david@rapan.cz>
* Use NetworkManager fixture for settings init tests
This fixes the test, since the extended implementation now reads the
NetworkManager version.
* Add pytest for addr_gen_mode
---------
Signed-off-by: David Rapan <david@rapan.cz>
Co-authored-by: Stefan Agner <stefan@agner.ch>
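Illustrative NetworkManager ipv6 settings involved (numeric values per
NetworkManager's documented enums):
```python
# NMSettingIP6ConfigAddrGenMode: 0 = eui64, 1 = stable-privacy
# NMSettingIP6ConfigPrivacy: 0 = disabled, 1 = prefer public address,
#                            2 = prefer temporary (privacy) addresses
ipv6_settings = {
    "method": "auto",
    "addr-gen-mode": 1,  # stable-privacy
    "ip6-privacy": 2,    # privacy extensions enabled
}
```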
* Trigger auto-update through Core WebSocket call
Instead of auto-updating add-ons on the Supervisor side, trigger an
update through Core via a WebSocket command (see the sketch below).
This makes sure that the backup is categorized correctly and all
backup features like retention are applied.
* Add pytest
* Fix pytest
* Fix pytest
* Fix pytest
* Fix pytest
* Fix pytest cleaner
* Set timestamp of add-on far into the past
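Method sketch of the trigger (WebSocket command payload illustrative):
```python
async def trigger_addon_update(self, addon) -> None:
    # Let Core run the update so the pre-update backup is categorized
    # correctly and retention settings apply.
    await self.sys_homeassistant.websocket.async_send_command(
        {
            "type": "hassio/update/addon",  # command name illustrative
            "addon": addon.slug,
            "backup": True,
        }
    )
```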
Instead of copying the backup in the main job, let's copy them in a
separate job per location. This allows using the same backup error
handling mechanism as for add-ons and folders (see the sketch at the
end of this entry).
This makes the stage introduced in #5784 somewhat redundant, but
before removing it, let's see if this approach works out.
* Harmonize folder and add-on backup error handling
Align add-on and folder backup error handling so that in both cases
errors are recorded on the respective backup Jobs but not raised to
the caller. This allows the backup to complete successfully even if
some add-ons or folders fail to back up.
Along with this, also record errors in the per-add-on and per-folder
backup jobs, as well as the add-on and folder root job.
And finally, align the exception handling to only catch expected
exceptions for add-ons too.
* Fix pytest
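Sketch of the per-location copy jobs (job and method names
hypothetical):
```python
from supervisor.jobs.decorator import Job


class BackupManager:
    @Job(name="backup_copy_to_location")
    async def _copy_to_location(self, backup, location) -> None:
        # Errors raised here are recorded on this child job, like for
        # add-ons and folders, and do not abort the overall backup.
        ...

    async def _copy_additional_locations(self, backup, locations) -> None:
        for location in locations:
            await self._copy_to_location(backup, location)
```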