After eight major versions and ten years of accumulating features, compatibility shims, and Redis keys, I needed to stop and ask: what does a lock actually need?
The answer turned out to be two Redis keys.
v9 is a ground-up rewrite. I deleted ~5,700 lines of code, reduced Redis keys per lock from 13 to 2, and shipped built-in metrics so you can finally see your locks working. Here's how I got there and what it means for your applications.
The problem with v8
Every lock in v8 created up to 13 Redis keys: QUEUED lists, PRIMED lists, INFO hashes, LOCKED hashes, RUN suffixes, changelog entries, expiring digest sets, and more. Each key had its own lifecycle, its own cleanup logic, and its own failure mode.
This complexity had real costs:
- Performance: Lock acquisition required multiple Lua scripts touching many keys. Under contention, throughput suffered.
- Memory: 13 keys per lock meant significant Redis memory overhead for applications with millions of unique digests.
- Reliability: More moving parts meant more ways for locks to become orphaned. The reaper system grew to 615 lines across 6 files just to clean up after itself.
- Debugging: When a lock got stuck, figuring out why meant inspecting a dozen Redis keys and understanding which Lua script was responsible for each one.
I maintained backward compatibility through all of this. Every edge case got its own workaround. The codebase reflected a decade of "just add one more key."
v9 asked: what if I didn't?
Two keys. That's it.
```
uniquejobs::LOCKED   # Hash — who holds the lock (JID → metadata)
uniquejobs:digests   # ZSet — global index of all active digests
```
The LOCKED hash tells you everything: which job IDs hold the lock, when they acquired it, what worker class and queue they belong to, and what lock type is active. The digests sorted set is a global index that the reaper uses to find orphaned locks.
That's the entire data model.
How locking works
The lock Lua script is 38 lines:
```lua
-- Already locked by this job? Idempotent return.
if redis.call("HEXISTS", locked, job_id) == 1 then
  return job_id
end

-- At capacity? Deny.
if redis.call("HLEN", locked) >= limit then
  return nil
end

-- Acquire: store metadata, register in global index.
redis.call("HSET", locked, job_id, metadata)
redis.call("ZADD", digests, score, digest)

-- Apply TTL if configured.
if pttl and pttl > 0 then
  redis.call("PEXPIRE", locked, pttl)
end

return job_id
```
Two Redis commands for a lock. One Lua script. No intermediate states, no QUEUED-to-PRIMED transitions, no race windows between steps. The lock either exists in the hash or it doesn't.
Unlocking is similarly minimal: HDEL the job from the hash, and if the hash is empty, ZREM the digest and UNLINK the key. Zero keys remain after the last holder releases.
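The lock/unlock flow above can be sketched as an in-memory Ruby model. Plain hashes stand in for the Redis structures, and the class and method names are illustrative, not the gem's actual API:

```ruby
# In-memory sketch of the v9 lock/unlock flow. @locked mirrors the LOCKED
# hash, @digests mirrors the digests sorted set. Illustrative only.
class LockSketch
  def initialize(limit: 1)
    @limit   = limit
    @locked  = {} # LOCKED hash: job_id => metadata
    @digests = {} # digests zset: digest => score
  end

  # Mirrors the Lua script: idempotent re-entry, capacity check, acquire.
  def lock(digest, job_id, metadata = {})
    return job_id if @locked.key?(job_id) # HEXISTS: already held by this job
    return nil if @locked.size >= @limit  # HLEN >= limit: deny
    @locked[job_id] = metadata            # HSET: store holder metadata
    @digests[digest] = Time.now.to_f      # ZADD: register in the global index
    job_id
  end

  # HDEL the holder; when the hash empties, ZREM the digest too.
  def unlock(digest, job_id)
    @locked.delete(job_id)
    @digests.delete(digest) if @locked.empty?
  end

  def digest_count
    @digests.size
  end
end
```

The sketch preserves the two properties the post calls out: acquiring is idempotent for the same job ID, and releasing the last holder leaves nothing behind.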
Why this matters
In v8, a crashed process could leave behind orphaned QUEUED, PRIMED, INFO, and RUN keys that the reaper might miss. In v9, a crashed process leaves behind exactly one thing: a LOCKED hash entry. The reaper scans the digests ZSET, checks if the LOCKED hash still exists, and removes stale entries. It's 27 lines of Lua.
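That reaper pass can be sketched in plain Ruby. Hashes stand in for the Redis structures, and the function name is mine:

```ruby
# Sketch of the v9 reaper: walk the digests index and drop every entry
# whose LOCKED hash no longer exists. The real thing is one Lua script.
def reap(digests, locked_hashes)
  orphans = digests.keys.reject { |digest| locked_hashes.key?(digest) }
  orphans.each { |digest| digests.delete(digest) } # ZREM stale entries
  orphans.size
end
```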
Performance
I benchmarked v8.0.12, v8.1.0, and v9.0.0 on the same hardware:
| Benchmark | v8.0.12 | v8.1.0 | v9.0.0 | Improvement |
|---|---|---|---|---|
| Lock + Unlock | 6,520 i/s | 6,855 i/s | 8,684 i/s | +33% |
| Lock + Execute | 8,445 i/s | -- | 14,406 i/s | +71% |
| No contention | 1,805 i/s | ~6,855 i/s | 8,019 i/s | +344% |
| Under contention | 2,920 i/s | ~5,400 i/s | 5,461 i/s | +87% |
| 500 jobs / 20 threads | 3.18 i/s | 9.54 i/s | 9.52 i/s | +199% |
Memory usage dropped significantly:
| Scenario | v8.0.12 | v9.0.0 | Reduction |
|---|---|---|---|
| 100 lock/unlock cycles | ~2.4 MB | 1.78 MB | -25% |
| 100 execute cycles | ~2.8 MB | 1.33 MB | -52% |
The no-contention case improved by 344% because v8 was doing unnecessary work checking QUEUED and PRIMED lists on every lock attempt. v9 does a single script round trip: HEXISTS + HLEN + HSET, plus a ZADD for the digest index.
Lock metrics: seeing your locks in action
v8 had no built-in way to answer "how many locks were acquired this hour?" or "which lock type has the most denials?" You could configure the reflection system and build your own tracking, but most people didn't.
v9 includes built-in metrics with zero configuration. Every lock acquisition, denial, release, and failure is recorded directly to Redis:
```
uniquejobs:metrics|260328|16:42   # Hash — counters per lock type per minute
```
The Sidekiq Web UI "Locks" tab now shows a metrics table:
| Type | Acquired | Denied | Released | Failures |
|---|---|---|---|---|
| until_executed | 1,204 | 87 | 1,191 | 0 |
| while_executing | 523 | 0 | 523 | 2 |
| until_and_while_executing | 312 | 14 | 640 | 0 |
| total | 2,039 | 101 | 2,354 | 2 |
These numbers tell you things that were previously invisible:
- Acquired vs Denied reveals your conflict rate. A high denial rate on `until_executed` might mean your jobs are too slow or your lock TTL is too long.
- Released > Acquired for `until_and_while_executing` is normal: this lock type acquires once on the client but releases twice (the "until" lock on the server, then the "while executing" runtime lock after the job completes).
- Failures > 0 means something went wrong during execution. Check your error tracker.
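As a worked example of reading these counters, here is a tiny helper (illustrative, not part of the gem) that computes denials as a share of all acquisition attempts:

```ruby
# Conflict rate: what fraction of lock attempts were denied, as a percent.
# An attempt is either acquired or denied, so attempts = acquired + denied.
def conflict_rate(acquired, denied)
  attempts = acquired + denied
  return 0.0 if attempts.zero?
  (denied.to_f / attempts * 100).round(1)
end
```

For the `until_executed` row above (1,204 acquired, 87 denied) this works out to roughly a 6.7% conflict rate.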
Cross-process recording
Getting metrics right was trickier than I expected. My first implementation used the reflection system: lock events fired reflections, and a server-side listener accumulated counters in memory, flushing to Redis every 60 seconds.
The problem: reflections only have listeners in the Sidekiq server process. When a job is enqueued from a Rails web process, the lock is acquired there — where nobody is listening. The until_executing lock type showed 0 Acquired despite working correctly because all its locks are acquired client-side.
The fix was direct Redis recording via HINCRBY from the locksmith itself, bypassing the reflection pipeline entirely for metrics. Now every lock acquisition and release writes to Redis immediately, regardless of which process it happens in.
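A minimal sketch of that direct-recording approach, with a Ruby hash standing in for Redis and HINCRBY modeled as a counter increment. The class name and field layout are mine, not the gem's:

```ruby
# Per-minute metrics recording, modeled on HINCRBY against a
# "uniquejobs:metrics|<yymmdd>|<HH:MM>" hash key. Works the same no matter
# which process (web or Sidekiq server) calls it, because it writes to the
# shared store immediately instead of buffering in listener memory.
class MetricsSketch
  def initialize(store = {})
    @store = store # stands in for Redis
  end

  def record(lock_type, event, at: Time.now)
    key   = format("uniquejobs:metrics|%s|%s",
                   at.strftime("%y%m%d"), at.strftime("%H:%M"))
    field = "#{lock_type}:#{event}"
    hash  = (@store[key] ||= Hash.new(0))
    hash[field] += 1 # HINCRBY key field 1
  end
end
```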
ReliableFetch
v9 includes an optional fetch strategy that provides crash recovery:
```ruby
Sidekiq.configure_server do |config|
  config[:fetch_class] = SidekiqUniqueJobs::Fetch::Reliable
end
```
Standard Sidekiq uses BRPOP to fetch jobs from queues. If the process crashes between fetching and completing a job, that job is lost. ReliableFetch uses LMOVE to atomically move jobs from the queue to a per-process working list:
```lua
local job = redis.call("LMOVE", queue, working, "RIGHT", "LEFT")
```

On startup, the fetch strategy scans for orphaned working lists from dead processes and requeues their jobs. During graceful shutdown, it requeues in-progress jobs while preserving their locks (so duplicates don't slip in during the restart window).
The fetch script also validates locks at pop time — if a job's lock has already been released (maybe by the reaper), the job is still processed but the server knows not to try unlocking.
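The queue and working-list mechanics can be sketched in plain Ruby. Arrays stand in for Redis lists, and the class and method names are illustrative:

```ruby
# Sketch of LMOVE-based reliable fetch. In Redis the move is one atomic
# command; here it's a pop-then-unshift on in-memory arrays.
class ReliableFetchSketch
  def initialize(queue, working = [])
    @queue   = queue   # producers LPUSH (unshift); oldest job at the right
    @working = working # per-process working list
  end

  # LMOVE queue working RIGHT LEFT: take the oldest job off the queue and
  # park it at the head of the working list until it completes.
  def fetch
    job = @queue.pop
    @working.unshift(job) if job
    job
  end

  # Job finished: remove it from the working list (LREM).
  def ack(job)
    @working.delete(job)
  end

  # Startup recovery: push a dead process's in-flight jobs back on the queue.
  def requeue_orphans
    @queue.concat(@working)
    @working.clear
  end
end
```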
What I deleted
The most satisfying part of this rewrite was deletion:
| Component | v8 | v9 |
|---|---|---|
| Locksmith | 449 lines | 163 lines |
| Reaper | 615 lines (6 files) | 27 lines (1 Lua script) |
| Lua scripts | 15 scripts | 8 scripts |
| README | 1,082 lines | 180 lines |
| Net lines | -- | -5,700 |
The entire orphan cleanup system — Manager, Observer, RubyReaper, LuaReaper, Resurrector — collapsed into a single 27-line Lua script that scans the digests ZSET and removes entries whose LOCKED hash no longer exists.
The QUEUED list, PRIMED list, INFO hash, and RUN suffix key — all gone. These existed to support a state machine (queued → primed → locked → executing) that added complexity without adding safety. The v9 model is binary: you're in the LOCKED hash or you're not.
The Changelog feature (a Redis ZSET recording every lock operation) was removed entirely. It wrote to Redis on every lock and unlock, consuming memory and I/O for data that few users ever looked at. The reflection system provides the same observability without the storage cost, and the new metrics give you the aggregate view that's actually useful.
Supply chain security
While I was at it, I modernized the release process:
- OIDC Trusted Publishing: No long-lived API keys. RubyGems.org verifies the GitHub Actions workflow identity via OpenID Connect.
- Sigstore attestations: Every gem is signed with a keyless signature logged in a public transparency log.
- SHA-256 + SHA-512 checksums: Generated in CI and attached to every GitHub release.
- Gem content verification: CI unpacks the built gem and fails if test files, rake tasks, or other dev artifacts leaked in.
Releasing is one command:
```
rake release[9.0.0]
```
Upgrading from v8
v9 automatically migrates your lock data on first startup. The UpgradeLocks module scans for v8 key patterns (QUEUED, PRIMED, INFO, RUN variants), removes the obsolete keys, and merges the old expiring_digests sorted set into the unified digests ZSET.
No manual steps. No downtime. Start the new version and it handles the rest.

The minimum supported versions have changed:

- Ruby: 3.2+ (was 2.7+)
- Sidekiq: 8.0+ (was 7.0+)
- Redis: 6.2+ (for LMOVE support)
What's next
v9.0.0.alpha1 is available on RubyGems. I'm running it in production and stabilizing for a final release. If you're on Sidekiq 8+ and Ruby 3.2+, give it a try:
```ruby
gem "sidekiq-unique-jobs", "~> 9.0.0.alpha1"
```
