This repository has been archived by the owner on Mar 14, 2019. It is now read-only.

Monitoring and Alerting

David Copeland edited this page Jan 31, 2016 · 3 revisions

The current monitoring and alerting implementations are simple. Each is a rake task that writes log messages in a format Librato can interpret:

  • rake monitor:failed - Check the number of failed jobs and stat the results to the log in a way Librato can understand
  • rake monitor:queue_sizes - Stat the sizes of all queues to the log in a way Librato can understand
  • rake monitor:stale_workers - Check the number of stale workers and stat the results to the log in a way Librato can understand

The tasks target Librato because that is what Stitch Fix uses for monitoring and alerting, which gives me the chance to try them out in a real production environment.

You may not want to do it this way, but the code is designed to be reusable for your own purposes.

Adding Your Own

Here's how the failed job check is set up:

task :failed => :environment do
  monitor = Monitoring::Monitor.new(
     checker: Monitoring::FailedJobCheck.new,
    notifier: Monitoring::LibratoNotifier.new(prefix: "resque.failed_jobs"))
  monitor.monitor!
end

Monitoring::Monitor knows nothing about what is being checked or how the results are reported. It just calls the checker and passes the results to the notifier.
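To make that flow concrete, here is a minimal sketch of a monitor with the same shape. It is not the project's actual source: MonitoringSketch, FixedChecker, and RecordingNotifier are hypothetical names used only for illustration, and the real Monitoring::Monitor may do more than this.

```ruby
# Hypothetical stand-in for Monitoring::Monitor; the real class may differ.
module MonitoringSketch
  class Monitor
    def initialize(checker:, notifier:)
      @checker  = checker
      @notifier = notifier
    end

    # Run the check, then hand whatever it returned to the notifier.
    def monitor!
      @notifier.notify!(@checker.check!)
    end
  end
end

# Minimal duck-typed collaborators (both hypothetical), just to show the flow.
class FixedChecker
  def check!
    [:result_a, :result_b]
  end
end

class RecordingNotifier
  attr_reader :received

  def notify!(check_results)
    @received = check_results
  end
end

notifier = RecordingNotifier.new
monitor  = MonitoringSketch::Monitor.new(checker: FixedChecker.new, notifier: notifier)
monitor.monitor!
# notifier.received now holds the array returned by FixedChecker#check!
```

Because the monitor depends only on check! and notify!, any checker can be paired with any notifier.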

Checkers

A checker implements check!, which returns an array of CheckResult objects. There are three checkers included:

  • Monitoring::FailedJobCheck - checks for failed jobs
  • Monitoring::QueueSizeCheck - checks the size of each queue in each resque
  • Monitoring::StaleWorkerCheck - checks for workers running "too long" (based on the configuration of each ResqueInstance)

Generally, these checkers cover the things you should care about in your Resques. The notifiers are what you will want to customize.
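If you do need a checker of your own, the contract above is small enough to sketch. The example below is an assumption-laden illustration, not code from the project: SketchCheckResult stands in for the real CheckResult (only the resque_name and check_count readers are assumed), LargeQueueCheck is a hypothetical checker, and the queue sizes are made-up inputs passed in directly rather than read from Resque.

```ruby
# Stand-in for the project's CheckResult; only the two readers used
# elsewhere on this page (resque_name, check_count) are assumed here.
SketchCheckResult = Struct.new(:resque_name, :check_count)

# Hypothetical checker: counts queues whose size exceeds a threshold.
class LargeQueueCheck
  # queue_sizes_by_resque: { "resque name" => { "queue name" => size } }
  def initialize(queue_sizes_by_resque:, threshold: 100)
    @queue_sizes_by_resque = queue_sizes_by_resque
    @threshold = threshold
  end

  # Returns an array with one result per resque.
  def check!
    @queue_sizes_by_resque.map do |resque_name, queue_sizes|
      large = queue_sizes.count { |_queue, size| size > @threshold }
      SketchCheckResult.new(resque_name, large)
    end
  end
end

# Made-up input data, for illustration only.
results = LargeQueueCheck.new(
  queue_sizes_by_resque: {
    "www"   => { "mail" => 250, "default" => 3 },
    "admin" => { "default" => 0 }
  },
  threshold: 100
).check!
```

Injecting the queue sizes keeps the sketch runnable on its own. A real checker would gather them from each configured Resque instead.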

Notifiers

There are currently only two notifiers: Monitoring::LibratoNotifier and Monitoring::PerQueueLibratoNotifier. Both simply log counts in a way that Librato can interpret.

You can implement your own to do whatever you need. For example, you could have a MailerNotifier that might look like this:
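As a rough idea of what a log-based notifier can look like, here is a sketch that writes one line per result. It is not the project's LibratoNotifier: LogLineNotifier and SketchResult are hypothetical, and the `sample#name=value` line format is an assumption based on the log-based metric syntax used by Librato's Heroku integration, so the exact lines the real notifiers emit may differ.

```ruby
require "logger"
require "stringio"

# Stand-in for the project's CheckResult (assumed readers only).
SketchResult = Struct.new(:resque_name, :check_count)

# Hypothetical notifier; not the project's LibratoNotifier.
class LogLineNotifier
  def initialize(prefix:, logger: Logger.new($stdout))
    @prefix = prefix
    @logger = logger
  end

  # Logs one metric line per check result, e.g. prefix.resque_name=count.
  # The "sample#" marker is an assumed format, not confirmed by the project.
  def notify!(check_results)
    check_results.each do |check_result|
      @logger.info(format('sample#%s.%s=%d',
                          @prefix,
                          check_result.resque_name,
                          check_result.check_count))
    end
  end
end

# Capture the log in memory so the output can be inspected.
output   = StringIO.new
notifier = LogLineNotifier.new(prefix: "resque.failed_jobs", logger: Logger.new(output))
notifier.notify!([SketchResult.new("www", 3)])
```

Passing the logger in makes it easy to point the notifier at the Rails logger in production and at a StringIO in tests.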

module Monitoring
  class MailerNotifier < Notifier
    def initialize(mailer_class: nil)
      @mailer_class = mailer_class
    end

    def notify!(check_results)
      check_results.each do |check_result|
        if check_result.check_count > 0
          @mailer_class.notify_mail(check_result.resque_name,check_result.check_count)
        end
      end
    end
  end
end

This is just a sketch, but you get the idea. To use this, just add or change the rake task:

task :failed => :environment do
  monitor = Monitoring::Monitor.new(
     checker: Monitoring::FailedJobCheck.new,
    notifier: Monitoring::MailerNotifier.new(mailer_class: FailedJobMailer))
  monitor.monitor!
end

Again, just an example.