Address Copilot review feedback on the cache PR:
* Split collect_tests into _collect_tests_uncached (the original
directory-based pre-cache flow) and _collect_tests_cached (walks
the tree to build per-file hashes). Without --cache we now skip
the walk + per-file hash entirely.
* A single-file root has no ancestor conftests to walk, so the
conftest_hash would always be empty and stale counts could survive
a real conftest change; bypass the cache for the file-root case.
* Save the cache file with explicit utf-8 encoding and
ensure_ascii=False.
Persist the result of pytest --collect-only between CI runs as a JSON
file keyed by content hash, so unchanged test files are served from
cache and only edited or new files are re-collected. The cache is
self-healing:
* Missing, corrupt, or wrong-version files fall back to a full collect.
* Any conftest.py change anywhere under the test root invalidates the
whole cache, so fixture parametrization shifts cannot silently skew
counts.
* Files pytest returns nothing for (helper modules named test_*.py with
no test functions) are cached as zero so they don't get re-collected
forever.
Walking is done once with os.walk (~2x faster than Path.rglob) and
collects test files plus conftests in a single pass. When the cache
is fully cold we feed pytest top-level directories rather than
thousands of file paths so cold runs stay as fast as before the cache
landed.
Wire the new --cache flag through the prepare-pytest-full job and back
the cache file with actions/cache so PRs can pick up the latest dev
snapshot via restore-keys. Local timings: cold 11s, warm with no diff
0.4s, warm with one file edited 2.3s.
Only pass directories and test_*.py files to pytest --collect-only so
helpers like tests/components/conftest.py and tests/components/common.py
are not treated as explicit collection targets, and bail out with a
clear error if no eligible paths are found instead of running pytest
with no arguments.
cProfile showed 99.6% of split_tests.py wall time was spent in the
single pytest --collect-only subprocess. Fan out the collection across
``os.cpu_count()`` workers; round-robin chunking keeps each batch
roughly equal, and tests/components is expanded one level deeper so
the ~1000 integration subdirectories distribute evenly. Local wall
time dropped from ~132s to ~11s on an 18-core box. Bucket output is
unchanged because we still parse the same pytest -qq output, just
aggregated from multiple invocations.