Monitoring and Alerting
The current monitoring and alerting implementations are simplistic. They simply write log messages that can be interpreted by Librato:
- rake monitor:failed - Check the number of failed jobs and stat the results to the log in a way Librato can understand
- rake monitor:queue_sizes - Stat the sizes of all queues to the log in a way Librato can understand
- rake monitor:stale_workers - Check the number of stale workers and stat the results to the log in a way Librato can understand
This is because that's what Stitch Fix uses for monitoring and alerting, so it gives me the chance to try it out in a real production environment.
You may not want to do it this way, but the code is designed to be reusable for your own purposes.
Here's how the failed job check is set up:
    task :failed => :environment do
      monitor = Monitoring::Monitor.new(
        checker: Monitoring::FailedJobCheck.new,
        notifier: Monitoring::LibratoNotifier.new(prefix: "resque.failed_jobs"))
      monitor.monitor!
    end
Monitoring::Monitor is agnostic about what is checked and how the results are reported: it essentially just calls the checker and passes the results to the notifier.
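To make that division of labor concrete, here is a minimal sketch of the wiring. It is a stand-in written for illustration, not the project's actual Monitoring::Monitor source; the MonitoringSketch module, FixedCheck, and RecordingNotifier are hypothetical names, and the only assumptions taken from this page are the checker:/notifier: keywords and the check!, notify!, and monitor! method names.

```ruby
# Illustrative stand-in for Monitoring::Monitor (not the real implementation).
module MonitoringSketch
  class Monitor
    def initialize(checker:, notifier:)
      @checker  = checker
      @notifier = notifier
    end

    # Run the check and hand whatever it returns straight to the notifier.
    def monitor!
      @notifier.notify!(@checker.check!)
    end
  end

  # Hypothetical checker that always returns the same results.
  class FixedCheck
    def check!
      [:result_a, :result_b]
    end
  end

  # Hypothetical notifier that just remembers what it was given.
  class RecordingNotifier
    attr_reader :received

    def notify!(check_results)
      @received = check_results
    end
  end
end

notifier = MonitoringSketch::RecordingNotifier.new
MonitoringSketch::Monitor.new(
  checker: MonitoringSketch::FixedCheck.new,
  notifier: notifier
).monitor!
```

Because the monitor only relies on those two methods, either side can be swapped without touching the other.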
The checker simply implements check! and returns an array of CheckResult objects. There are three checkers included:
- Monitoring::FailedJobCheck - checks for failed jobs
- Monitoring::QueueSizeCheck - checks the size of each queue in each resque
- Monitoring::StaleWorkerCheck - checks for workers running "too long" (based on the configuration of each ResqueInstance)
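If you ever need a check beyond these three, the contract described above (a check! method returning an array of result objects) can be sketched as follows. Everything here is hypothetical: ScheduledJobCheck is an invented checker, the counts are injected as a plain Hash rather than read from Redis, and the CheckResult Struct is a stand-in that exposes only resque_name and check_count, the two readers used in the MailerNotifier example on this page. The real CheckResult class may be constructed differently.

```ruby
module CheckerSketch
  # Stand-in for CheckResult; the real class may take different arguments.
  CheckResult = Struct.new(:resque_name, :check_count)

  # Hypothetical checker: reports a count per Resque from injected data.
  class ScheduledJobCheck
    # counts_by_resque is a Hash of resque name => count, passed in so the
    # sketch runs without a Redis connection.
    def initialize(counts_by_resque)
      @counts_by_resque = counts_by_resque
    end

    # Returns one result object per Resque, as the checker contract expects.
    def check!
      @counts_by_resque.map do |name, count|
        CheckResult.new(name, count)
      end
    end
  end
end

results = CheckerSketch::ScheduledJobCheck.new("www" => 3, "admin" => 0).check!
results.each { |result| puts "#{result.resque_name}: #{result.check_count}" }
```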
Generally, these checkers cover the things you should care about in your Resques. The notifications are what you will want to customize.
There are only two notifiers now, Monitoring::LibratoNotifier and Monitoring::PerQueueLibratoNotifier. These simply log counts in a way that Librato can interpret.
You can simply implement your own to do whatever you need. For example, you could have a MailerNotifier that might look like this:
    module Monitoring
      class MailerNotifier < Notifier
        def initialize(mailer_class: nil)
          @mailer_class = mailer_class
        end

        def notify!(check_results)
          check_results.each do |check_result|
            # Only send mail when the check actually found something
            if check_result.check_count > 0
              @mailer_class.notify_mail(check_result.resque_name, check_result.check_count)
            end
          end
        end
      end
    end
This is just a sketch, but you get the idea. To use this, just add or change the rake task:
    task :failed => :environment do
      monitor = Monitoring::Monitor.new(
        checker: Monitoring::FailedJobCheck.new,
        notifier: Monitoring::MailerNotifier.new(mailer_class: FailedJobMailer))
      monitor.monitor!
    end
Again, just an example.
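As one more illustration of the same idea, here is a self-contained sketch of a notifier that writes a warning to a standard Ruby Logger whenever a count exceeds a threshold. LoggerNotifier, its threshold option, and the NotifierSketch module are hypothetical and not part of the project; the CheckResult Struct is again a stand-in with only the resque_name and check_count readers, and in the real code the class would presumably subclass Notifier as MailerNotifier does above.

```ruby
require "logger"
require "stringio"

module NotifierSketch
  # Stand-in for CheckResult; the real class may differ.
  CheckResult = Struct.new(:resque_name, :check_count)

  # Hypothetical notifier that warns when a count exceeds a threshold.
  class LoggerNotifier
    def initialize(logger:, threshold: 0)
      @logger    = logger
      @threshold = threshold
    end

    def notify!(check_results)
      check_results.each do |check_result|
        # Stay quiet for results at or below the threshold.
        next unless check_result.check_count > @threshold
        @logger.warn(
          "#{check_result.resque_name}: #{check_result.check_count} " \
          "exceeds threshold #{@threshold}"
        )
      end
    end
  end
end

# Log to an in-memory buffer so the sketch can be run anywhere.
log_output = StringIO.new
notifier = NotifierSketch::LoggerNotifier.new(logger: Logger.new(log_output), threshold: 5)
notifier.notify!([
  NotifierSketch::CheckResult.new("www", 12),
  NotifierSketch::CheckResult.new("admin", 2)
])
puts log_output.string
```

Only the "www" result is logged here, since "admin" is below the threshold.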