Skip to content

CenterDevice/ceres-lambda

Repository files navigation

CEnteRdEvice Sre (ceres) — Lambda Functions

Ceres the goddess of agriculture, grain crops, fertility and motherly relationships sometimes used to use AWS Lambda Functions to get ops stuff done.

CenterDevice ceres lambda license MIT blue

Functions

  • aws-watchtower — Reacts on AWS Cloudwatch Events like AWS EC2 Autoscaling Group life cycle events and in case of instance terminations sets corresponding silences in Bosun to avoid unknown alerts.

    • If ASG life cycle events indicates an EC2 instance is getting scaled-down, we want a corresponding silence for the durAtion of asg.scaledown_silence_duration. In this way, we prevent unknown alarms from hitting the Slack channels. Unfortunately, these events may arrive too late and thus, the silence is set too late in which case the unknown bursts hit us anyway. The reason behind this is that — according to AWS SAs — ASGs trigger these events only after failing health checks. For this reason, we have a second mechanism based on the shutting-down state of EC2 instances. If an EC2StateChangeEvent with state shutting-down is received and the corresponding EC2 instance is part of an autoscaling group, we silence alarms, especially unknown bursts, for the duration of ec2.scaledown_silence_duration. This duration is much smaller then asg.scaledown_silence_duration. The idea is that in case an ASG scales down an EC2 instance, we receive this event much earlier than the ASG life cycle event. But this event might have other reasons. So we set the silence for a short period of time until either the ASG life cycle event sets a long silence or this silence expires and triggers alarms.

  • aws-scaletower — Checks EC2 instances for running out of IO burst balance. If a burst balance below a threshold or with predicted time until the balance is exhausted is identified, the EC2 instance will be terminated and automatically replaced by the corresponding autoscaling group.

  • security-watchtower — Checks credentials at DUO and AWS for last time used. If credentials with a longer period of inactivity are identified, those credentials will be destroyed.