Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(cloudwatch): support the creation of CloudWatch anomaly detection alarm #31232

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

scorbiere
Copy link
Contributor

Issue #10540

Closes #10540.

Reason for this change

This change add the possibility to create an anomaly detection alarm based on a Metric or MathExpression object.

Description of changes

Creation of a helper function in Metric or MathExpression classes which will wrap the actual metric into an anomaly detection metric expression before creating the CloudWatch alarm.

This PR also include a refactor of the CreateAlarmOptions class in order to reflect the incompatibilities between standard alarm and anomaly detection ones.

Description of how you validated changes

Executed the new integration test (packages/@aws-cdk-testing/framework-integ/test/aws-cloudwatch/test/integ.anomaly-detection-alarm.ts) on a test account.

Checklist


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@aws-cdk-automation aws-cdk-automation requested a review from a team August 27, 2024 23:49
@github-actions github-actions bot added bug This issue is a bug. effort/medium Medium work item – several days of effort p1 labels Aug 27, 2024
@mergify mergify bot added the contribution/core This is a PR that came from AWS. label Aug 27, 2024
Copy link
Collaborator

@aws-cdk-automation aws-cdk-automation left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pull request linter has failed. See the aws-cdk-automation comment below for failure reasons. If you believe this pull request should receive an exemption, please comment and provide a justification.

A comment requesting an exemption should contain the text Exemption Request. Additionally, if clarification is needed add Clarification Request to a comment.

@scorbiere scorbiere changed the title fix(aws-cloudwatch): support the creation of CloudWatch anomaly detection alarm fix(cloudwatch): support the creation of CloudWatch anomaly detection alarm Aug 27, 2024
@aws-cdk-automation aws-cdk-automation dismissed their stale review August 27, 2024 23:53

✅ Updated pull request passes all PRLinter validations. Dismissing previous PRLinter review.

@aws-cdk-automation aws-cdk-automation added the pr/needs-maintainer-review This PR needs a review from a Core Team Member label Aug 28, 2024
packages/aws-cdk-lib/aws-cloudwatch/README.md Show resolved Hide resolved
packages/aws-cdk-lib/aws-cloudwatch/lib/alarm.ts Outdated Show resolved Hide resolved
packages/aws-cdk-lib/aws-cloudwatch/lib/metric.ts Outdated Show resolved Hide resolved
Comment on lines +230 to +237
if (isAnomalyDetection) {
// Ensure returnData is set to true for both metrics in anomaly detection
const cfnMetrics = alarm.metrics as any[];
if (cfnMetrics && cfnMetrics.length >= 2) {
cfnMetrics[0].returnData = true;
cfnMetrics[1].returnData = true;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is cfnMetrics used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only in this block. To make sure the returnData property is set to true. TBH, the documentation is not super clear about which one should have the property set to true.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you maybe provide the documentation in the comment? This code seems weird and providing the doc in the comment for future reference would help.

@aws-cdk-automation
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildv2Project1C6BFA3F-wQm2hXv2jqQv
  • Commit ID: 8179b2d
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

packages/aws-cdk-lib/aws-cloudwatch/README.md Show resolved Hide resolved
To create an alarm based on anomaly detection, you can use the `createAnomalyDetectionAlarm` method on a `Metric` or `MathExpression` object:

```ts
declare const queue: sqs.Queue;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: customers may not get why using a queue here. Can we use the example in the integration test?

    const metric = new Metric({
      namespace: 'AWS/EC2',
      metricName: 'CPUUtilization',
      statistic: 'Average',
      period: Duration.minutes(5),
    });

    metric.createAnomalyDetectionAlarm(this, 'AnomalyAlarm', {
      bounds: 2,
      evaluationPeriods: 3,
      datapointsToAlarm: 2,
      comparisonOperator: ComparisonOperator.LESS_THAN_LOWER_OR_GREATER_THAN_UPPER_THRESHOLD,
    });

throw new Error(`Invalid comparison operator for anomaly detection alarm: ${props.comparisonOperator}`);
}
const anomalyDetectionExpression = new MathExpression({
expression: `ANOMALY_DETECTION_BAND(m0, ${props.bounds ?? 1})`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since bounds is a required property, do we need null check here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, this is useless now that the property is required.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or should we have a default value for it?

@@ -190,7 +213,8 @@ export class Alarm extends AlarmBase {

// Evaluation
comparisonOperator,
threshold: props.threshold,
threshold: isAnomalyDetection ? undefined : props.threshold,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will happen if it's anomaly detection and threshold value is passed in? Does it ignore the value or throw an error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the comparison operator is for an anomaly detection alarm, there will be an error when CloudFormation will try to execute the template. Because providing a threshold for anomaly detection is not valid.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do this validation on CDK side to preemptively throw?

@xazhao
Copy link
Contributor

xazhao commented Aug 29, 2024

General question: There's a CfnAnomalyDetector L1 construct which is not used here. Could you help me understand the diff between CfnAnomalyDetector and what we are achieving in this PR?

@scorbiere scorbiere changed the title fix(cloudwatch): support the creation of CloudWatch anomaly detection alarm feat(cloudwatch): support the creation of CloudWatch anomaly detection alarm Aug 30, 2024
@@ -190,7 +213,8 @@ export class Alarm extends AlarmBase {

// Evaluation
comparisonOperator,
threshold: props.threshold,
threshold: isAnomalyDetection ? undefined : props.threshold,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do this validation on CDK side to preemptively throw?

@@ -206,12 +230,22 @@ export class Alarm extends AlarmBase {
...metricProps,
});

if (isAnomalyDetection) {
// Ensure returnData is set to true for both metrics in anomaly detection
const cfnMetrics = alarm.metrics as any[];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we make it stronger type instead or any?

Comment on lines +230 to +237
if (isAnomalyDetection) {
// Ensure returnData is set to true for both metrics in anomaly detection
const cfnMetrics = alarm.metrics as any[];
if (cfnMetrics && cfnMetrics.length >= 2) {
cfnMetrics[0].returnData = true;
cfnMetrics[1].returnData = true;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you maybe provide the documentation in the comment? This code seems weird and providing the doc in the comment for future reference would help.

@aws-cdk-automation aws-cdk-automation removed the pr/needs-maintainer-review This PR needs a review from a Core Team Member label Aug 30, 2024
Copy link
Contributor

@rix0rrr rix0rrr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

API-wise, it seems that to create an Anomaly detection alarm, 3 things need to happen:

  • You must set thresholdMetricId
  • You must wrap metric in a MetricMath that uses ANOMALY_DETECTION_BAND
  • You must not set a threshold
  • For an operator you must pick one of the special 3 comparison operators.

That feels like a lot of work, no?

Comment on lines +19 to +24
/**
* In an alarm based on an anomaly detection model, this is the ID of the ANOMALY_DETECTION_BAND function used as the threshold for the alarm.
*
* @default - No metric id.
*/
readonly thresholdMetricId?: string;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't really expose or rely on people specifying metric IDs anywhere else.

Why don't we expect them to pass in a MetricMath object here, which must of the kind ANOMALY_DETECTION_BAND, and we render it into the metrics[] array and render its id correctly to the underlying L1 property?

(Or in fact if not given we could create the ANOMALY_DETECTION_BAND() automatically with a sensible stdDev like in createAnomalyDetectionAlarm)

@@ -190,7 +213,8 @@ export class Alarm extends AlarmBase {

// Evaluation
comparisonOperator,
threshold: props.threshold,
threshold: isAnomalyDetection ? undefined : props.threshold,
thresholdMetricId: isAnomalyDetection ? (props.thresholdMetricId || 'expr_1') : undefined,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How likely is expr_1 to be correct?

Comment on lines +237 to +238
cfnMetrics[0].returnData = true;
cfnMetrics[1].returnData = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the logic here, in that we have at least 2 metrics (but could be 5 or 10), and we switch the first 2 to returnData ? How do we know these are the right ones? Which ones are you trying to target?

return new Alarm(scope, id, {
...props,
metric: anomalyDetectionExpression,
threshold: 0, // This value will be ignored for anomaly detection alarms
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we make threshold optional then, and do a runtime validation for its presence?

* - The minimum value should be greater than 0. A value of 0 or negative values would not make sense in the context of calculating standard deviations.
* - There is no strict maximum value defined, as standard deviations can theoretically extend infinitely. However, in practice, values beyond 5 or 6 standard deviations are rarely used, as they would result in an extremely wide anomaly detection band, potentially missing significant anomalies.
*/
readonly bounds: number;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we call this stdDevs then?

/**
* Comparison to use to check if metric is breaching
*/
readonly comparisonOperator: ComparisonOperator;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the "within bounds" operator would make a nice default here.

* @param metric The metric to create the alarm for
* @returns The newly created Alarm
*/
public static createAnomalyDetectionAlarmFromMetric(scope: Construct, id: string, props: CreateAnomalyDetectionAlarmProps, metric: IMetric): Alarm {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep in mind that this function is intended as a convenience function -- the main interface for users is still new Alarm(), and so that API should still be ergonomic.

Don't make design decisions that make it easy to implement this function, but harder to use the new Alarm() API.

Copy link
Contributor

@rix0rrr rix0rrr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See previous review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This issue is a bug. contribution/core This is a PR that came from AWS. effort/medium Medium work item – several days of effort p1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[aws-cloudwatch] Missing ThresholdMetricId
5 participants