Chasing a Ghost Worker: Debugging Stale Sidekiq Jobs
Today was one of those days where the problem is simpler than it looks — but getting to the answer required digging through Redis internals.
The Mystery: NameError for a Class That Doesn’t Exist
The morning started with a production alert: Sidekiq throwing NameError: uninitialized constant LegacyWorker on repeat. Confusing, because I had just removed that worker. The refactor inlined the Redis call directly in UserEvent#increment_user_event_count — INCR is atomic and takes microseconds, so there’s no reason to involve Sidekiq at all.
def increment_user_event_count
  # Direct Redis call - no need for a Sidekiq job: INCR is atomic and takes microseconds
  $redis.incr("user_event_count_#{event_id}")
rescue Redis::BaseError => e
  Rails.logger.warn("Failed to increment user_event_count for event #{event_id}: #{e.message}")
rescue => e
  Rails.logger.warn("Unexpected error incrementing user_event_count: #{e.message}")
end
So why was Sidekiq still complaining about a class I’d deleted?
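The answer lies in how Sidekiq persists work: each enqueued job is a JSON payload in Redis that names its worker class as a *string*, and the class is only resolved back to a constant at perform time. A minimal sketch of that lookup failing (plain Ruby, no Redis; the payload shape is simplified — real payloads also carry jid, retry, created_at, and more):

```ruby
require "json"

# Roughly what a Sidekiq job looks like sitting in the Redis queue
payload = { "class" => "LegacyWorker", "args" => [42] }.to_json

job = JSON.parse(payload)
begin
  # Sidekiq resolves the class name to a constant before calling perform;
  # once the class is deleted from the codebase, this lookup blows up
  Object.const_get(job["class"]).new.perform(*job["args"])
rescue NameError => e
  puts e.message # => uninitialized constant LegacyWorker
end
```

Deleting the class changes nothing in Redis — the string payloads sit there until something consumes or deletes them.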
First Hypothesis: Old Jobs in the Retry Set
My first instinct was right — jobs already enqueued in Redis were cycling through Sidekiq’s retry mechanism. Delete the worker, jobs still live in Redis, Sidekiq tries to process them, can’t find the class, throws NameError, reschedules for retry. Classic.
Sidekiq::RetrySet.new.select { |j| j.klass == "LegacyWorker" }.each(&:delete)
Seemed to work… until the errors came back almost immediately. Something was actively re-enqueuing them.
The Real Problem: 11K Jobs in the Default Queue
I checked the actual queue — not just the retry set:
Sidekiq::Queue.new("default").count { |j| j.klass == "LegacyWorker" }
# => 11645
11,645 jobs sitting in the default queue. The cycle: job fails → moves to retry set → I delete it → next job from the queue fails → repeat. I was bailing water with a thimble.
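The dynamic is easy to see with plain arrays standing in for the queue and retry set (a toy model, not Sidekiq's actual internals):

```ruby
queue     = Array.new(5) { { "class" => "LegacyWorker" } } # pending jobs
retry_set = []

# Each "tick": Sidekiq pops a job, fails on the missing constant,
# and parks it in the retry set. Deleting only from the retry set
# never shrinks the queue feeding it.
3.times do
  retry_set << queue.shift                                  # job fails, moves to retry
  retry_set.reject! { |j| j["class"] == "LegacyWorker" }    # my "fix"
end

puts "queue: #{queue.size}, retry_set: #{retry_set.size}"
# => queue: 2, retry_set: 0
```

The retry set reads clean after every purge, yet the queue keeps refilling it — exactly the thimble-bailing I was doing.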
The select approach also had a subtle bug at scale — with 40,000+ retry entries, pagination caused it to miss jobs between iterations. Switched to each inside a loop:
loop do
  count = 0
  Sidekiq::Queue.new("default").each do |j|
    next unless j.klass == "LegacyWorker"
    j.delete
    count += 1
  end
  Sidekiq::RetrySet.new.each { |j| j.delete if j.klass == "LegacyWorker" }
  puts "Deleted #{count}"
  break if count.zero?
end
Lessons Learned
Drain existing jobs when you delete a worker class. “Nothing’s calling it anymore” doesn’t stop the backlog from erroring out until it’s fully cleared.
select vs each matters at scale. With tens of thousands of entries, select paginates and misses jobs mid-iteration. each traverses the full set. For large backlogs, loop until the count hits zero.
Check the queue, not just the retry set. The real reservoir was the main default queue. Always check all four:
Sidekiq::Queue.new("default")
Sidekiq::RetrySet.new
Sidekiq::ScheduledSet.new
Sidekiq::DeadSet.new
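One helper worth keeping in that cheat sheet (a sketch — the name `count_stale` is my own, not Sidekiq API): it counts jobs for a class across any of the sets above, since all of them are enumerable and yield jobs responding to `klass`:

```ruby
# Count jobs for a given worker class across several job collections.
# Works on Sidekiq::Queue, Sidekiq::RetrySet, and friends, or any
# Enumerable of objects that respond to #klass.
def count_stale(klass_name, *collections)
  collections.sum { |c| c.count { |j| j.klass == klass_name } }
end

# In a Sidekiq console, something like:
# count_stale("LegacyWorker",
#             Sidekiq::Queue.new("default"),
#             Sidekiq::RetrySet.new,
#             Sidekiq::ScheduledSet.new,
#             Sidekiq::DeadSet.new)
```

A single number across all four sets would have pointed me at the default queue on the first check instead of the third.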
I’ve wrapped all of this into a sidekiq_tricks.md reference doc. Sidekiq’s console API is powerful but not exactly memorable — a cheat sheet is worth keeping around.