Determine an ideal max worker count based on true capacity. The true value
is rarely attainable in a thread-worker model, so the result is clamped by
both the rlimit-derived budget and the static WORKER_LIMIT range.
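A minimal sketch of the clamping step follows. It assumes the true capacity and the rlimit-derived budget were already computed elsewhere. The helper name `max_worker_count`, the example bounds given to `WORKER_LIMIT`, and applying the budget after the range are all illustrative assumptions, not the original code.

```rust
/// Hypothetical (min, max) bounds standing in for the static WORKER_LIMIT range.
const WORKER_LIMIT: (usize, usize) = (1, 2048);

/// Hypothetical helper: clamp the ideal worker count into the static
/// WORKER_LIMIT range, then cap it by the rlimit-derived budget so the
/// planned thread count never exceeds what the process may spawn.
fn max_worker_count(true_capacity: usize, rlimit_budget: usize) -> usize {
    true_capacity
        .clamp(WORKER_LIMIT.0, WORKER_LIMIT.1)
        .min(rlimit_budget)
}
```

Applying the budget last lets the hard rlimit win even when it falls below the range's lower bound. The reverse order would let the static minimum exceed the budget.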
Map each core to its associated hardware queue. Cores that share a queue
repeat the same queue id; cores unavailable to this process (gaps in the map)
default to queue 0; when no hardware is detected, each core maps to itself as a
best guess that maintains core locality.
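The mapping rules above can be sketched as follows. The `HwQueue` struct, its fields, and the `core_to_queue` helper are hypothetical, and availability is assumed to be supplied as one boolean per core.

```rust
/// Hypothetical description of one detected hardware queue.
struct HwQueue {
    /// Queue identifier reported by the device.
    id: usize,
    /// Cores serviced by this queue; several cores may share one queue.
    cpu_list: Vec<usize>,
}

/// Build a core -> hardware-queue map with one entry per core.
/// `available[core]` is false for cores this process cannot run on.
fn core_to_queue(queues: &[HwQueue], available: &[bool]) -> Vec<usize> {
    let cores = available.len();

    // No hardware detected: map each core to itself to preserve locality.
    if queues.is_empty() {
        return (0..cores).collect();
    }

    // Unavailable or unlisted cores keep the default of queue 0.
    let mut map = vec![0; cores];
    for queue in queues {
        for &core in &queue.cpu_list {
            if core < cores && available[core] {
                map[core] = queue.id;
            }
        }
    }
    map
}
```

For example, with queue 0 on cores 0 and 1, queue 1 on cores 2 and 3, and core 2 unavailable, the map is `[0, 0, 0, 1]`.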
Determine the worker groupings. Each index represents a hardware queue and
contains the number of workers which will service it. The vector is
truncated to the number of cores on systems with multiple hardware queues
per core, and the per-pool count is capped well below NVMe capacity.
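A minimal sketch of the grouping step follows. It assumes each queue may report a tag count. The helper `worker_groups`, the `per_pool_cap` and `default_workers` parameters, and falling back to the cap when a queue reports no tags are all hypothetical choices for illustration.

```rust
/// Hypothetical per-queue record: the tag count, if the device reports one.
struct HwQueue {
    nr_tags: Option<usize>,
}

/// One entry per hardware queue: how many workers will service it.
/// `per_pool_cap` is a hypothetical cap set well below NVMe tag counts;
/// `default_workers` is used when no hardware queues were detected.
fn worker_groups(
    queues: &[HwQueue],
    cores: usize,
    per_pool_cap: usize,
    default_workers: usize,
) -> Vec<usize> {
    if queues.is_empty() {
        return vec![default_workers];
    }

    // Cap each queue's worker count; unknown tag counts assume the cap.
    let mut groups: Vec<usize> = queues
        .iter()
        .map(|q| q.nr_tags.unwrap_or(per_pool_cap).min(per_pool_cap))
        .collect();

    // With more hardware queues than cores, keep at most one group per core.
    groups.truncate(cores);
    groups
}
```

Because `Vec::truncate` is a no-op when the vector is already short enough, the truncation only takes effect on systems with more queues than cores.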
Sum the total number of possible tags. Without hardware detection this
reduces to the default worker count. The thread-worker model never
approaches actual NVMe capacity, but the value still informs request
capacity downstream.
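The tag sum can be sketched as below. The `total_tags` helper and the `HwQueue` record are hypothetical. The sketch assumes the default applies whenever no queue reported a tag count, which is one reading of "without hardware detection".

```rust
/// Hypothetical per-queue record: the tag count, if the device reports one.
struct HwQueue {
    nr_tags: Option<usize>,
}

/// Total tags across all detected hardware queues, falling back to
/// `default_workers` when no queue reported a tag count.
fn total_tags(queues: &[HwQueue], default_workers: usize) -> usize {
    let detected: Vec<usize> = queues.iter().filter_map(|q| q.nr_tags).collect();
    if detected.is_empty() {
        return default_workers;
    }
    // Saturating addition keeps the sum defined for implausibly large counts.
    detected.into_iter().fold(0, usize::saturating_add)
}
```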