Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(cloudwatch): set threshold metric id for anomaly detection band alarms #10545

Conversation

rrhodes
Copy link
Contributor

@rrhodes rrhodes commented Sep 25, 2020

Adding support for the ThresholdMetricId CloudWatch alarm property. This is required for alarms based on metrics applying the CloudWatch ANOMALY_DETECTION_BAND function.

Fixes #10540

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

Copy link
Contributor

@NetaNir NetaNir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rrhodes thanks for submitting this!
See my comments in the code. This seems more like a feature than a fix? If that's the case can you add a README section? It will be easier to discuss the API

@@ -162,6 +168,9 @@ export class Alarm extends AlarmBase {
extendedStatistic: renderIfExtendedStatistic(props.statistic),
});
}
if (ANOMALY_DETECTION_COMPARISON_OPERATORS.includes(comparisonOperator) && !props.thresholdMetricId) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this break existing customers using ant of the comparison operators in ANOMALY_DETECTION_COMPARISON_OPERATORS?

Copy link
Contributor Author

@rrhodes rrhodes Sep 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following issue #10540, my understanding is that customers could not apply any of these anomaly detection comparison operators without a threshold metric ID. This leads me to believe it would not break any existing customer functionality? If I've misunderstood the problem I'll be happy to revisit this.

*
* @default - Not configured.
*/
readonly thresholdMetricId?: string;
Copy link
Contributor

@NetaNir NetaNir Sep 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is somewhat under modeled, can this be Metric? Which gets me to wonder if this is the right API. How will this look with metric.CreateAlarm() ?

Copy link
Contributor Author

@rrhodes rrhodes Sep 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could explore a ‘thresholdMetric’ property instead of type Metric which under the hood would extract the ID and assign it to ‘thresholdMetricId’ in the CloudFormation construct, if that would be preferred?

I thought it would be best to keep this similar to the CloudFormation implementation, which just expects the ID.

I've amended metric.createAlarm() to pull the metric threshold ID from CreateAlarmOptions, so it would look like

createAlarm(this, 'Alarm', {
  ...,
  thresholdMetricId: 'ad1',
})

@rrhodes
Copy link
Contributor Author

rrhodes commented Sep 29, 2020

@rrhodes thanks for submitting this!
See my comments in the code. This seems more like a feature than a fix? If that's the case can you add a README section? It will be easier to discuss the API

@NetaNir, I see the argument both ways for feature vs fix. We're adding a new property, which resolves a bug. I chose to reference this as a fix since it was motivated by issue #10540. If you'd still prefer me to change this to a feature branch then let me know and I'll be happy to change that.

@NetaNir
Copy link
Contributor

NetaNir commented Sep 30, 2020

@rrhodes There are a lot of places we change the CloudFormation implementation in favor of a more ergonomic API, this is especially true for the CloudWatch L2. This is one is a bit tricky, which is why I wondered how the README section will look. From the CloudFormation resource perspective, the thresholdMetricId must be an id of an expression defined in Metrics. If Metrics is not provided CloudWatch will throw:

ThresholdMetricId should not be set unless list of Metrics is also set

If Metrics is provided and the thresholdMetricId does not match any of the ids on Metrics it will throw:

Metrics list must contain exactly one metric matching the ThresholdMetricId parameter

Since the CDK does not emits Metrics when creating an alarm, either by using new Alarm(...) or new Metric(...).createAlarm(...), configuring thresholdMetricId will always throw the first exception.

Metrics is created when using metricMath. This means that it only makes sense to use LESS_THAN_LOWER_OR_GREATER_THAN_UPPER_THRESHOLD on an alarm created using MathExpression(...).createAlarm(...). In that case we can extract the thresholdMetricId from the expression if LESS_THAN_LOWER_OR_GREATER_THAN_UPPER_THRESHOLD is set. Alternatively, we might create a higher abstraction for anomaly detection alarm to make it more discoverable, something like createAnomalyDetectionAlarm(), but Im not convinced this is require.

Let me know what you think. Happy to flush out the design together.

@rrhodes
Copy link
Contributor Author

rrhodes commented Oct 1, 2020

Thanks for the feedback, @NetaNir. I'll revisit this tomorrow and come back to you with a proposal - happy to work on this together to reach the best solution.

@rrhodes rrhodes changed the title fix(cloudwatch): support threshold metric id for anomaly detection alarms feat(cloudwatch): anomaly detection alarms with threshold metric id Oct 2, 2020
@mergify mergify bot dismissed NetaNir’s stale review October 3, 2020 10:59

Pull request has been modified.

@rrhodes rrhodes marked this pull request as draft October 3, 2020 11:00
@rrhodes rrhodes changed the title feat(cloudwatch): anomaly detection alarms with threshold metric id feat(cloudwatch): set threshold metric id for anomaly detection band alarms Oct 3, 2020
@rrhodes
Copy link
Contributor Author

rrhodes commented Oct 3, 2020

@NetaNir, thanks again for your feedback about MathExpression, this has been hugely valuable to better understand how the package works. I agree a higher abstraction for anomaly detection should not be required for this feature.

My current understanding is that each metric for MathExpression is assigned a unique entry ID when the metric is rendered with renderMetric. If any of the metrics apply CloudWatch's ANOMALY_DETECTION_BAND function in their expression, I want to extract this entry ID and assign it as the threshold metric ID for the metric's alarm. I've altered the renderMetric logic to achieve this.

I'd like to hear your thoughts on this approach before I proceed any further. Would you like me to define a new integration test to cover any of this logic or are you happy with the additional unit tests on their own?

NetaNir
NetaNir previously requested changes Oct 9, 2020
Copy link
Contributor

@NetaNir NetaNir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Were you able to deploy this change? Please add an integ test so we can make sure the implementation works.

If you can add a code sample to the README section we can discuss the API there

@@ -158,6 +158,11 @@ The most important properties to set while creating an Alarms are:
- `evaluationPeriods`: how many consecutive periods the metric has to be
breaching the the threshold for the alarm to trigger.

If you create an alarm for a `MathExpression` metric which applies the CloudWatch
[anomaly detection band](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Anomaly_Detection.html)
function, a threshold metric ID will be automatically assigned to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is implementation details and should no be in the README

@@ -163,6 +169,10 @@ export class Alarm extends AlarmBase {
});
}

if (metricProps.thresholdMetricId && !ANOMALY_DETECTION_COMPARISON_OPERATORS.includes(comparisonOperator)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this still here? We should not expose thresholdMetricId as it can be used only on very specific conditions

Period: 300,
Stat: 'Average',
},
ReturnData: false,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will fail on deployment. Anomaly detection based alarm requires the all metric in Metrics to have ReturnData set to true.

test.done();
},

'alarm for math expression with anomaly detection function cannot use default comparison operator'(test: Test) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to not allow it in the API.

label: conf.renderingProperties?.label,
returnData: entry.tag ? undefined : false, // Tag stores "primary" attribute, default is "true"
};
},
withExpression(expr, conf) {
return {
expression: expr.expression,
id: entry.id || uniqueMetricId(),
id: entry.id,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why was this removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not removed, but relocated above so I could apply this in anomalyDetectionMetricIds

@rrhodes
Copy link
Contributor Author

rrhodes commented Oct 9, 2020

Were you able to deploy this change? Please add an integ test so we can make sure the implementation works.

If you can add a code sample to the README section we can discuss the API there

Thanks for the feedback, @NetaNir. I will revisit this later today with a Readme sample to discuss further.

@gitpod-io
Copy link

gitpod-io bot commented Oct 12, 2020

@mergify mergify bot dismissed NetaNir’s stale review October 12, 2020 16:59

Pull request has been modified.

[anomaly detection](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Anomaly_Detection.html):

```ts
const invocationAnomalies = new MathExpression({
Copy link
Contributor Author

@rrhodes rrhodes Oct 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@NetaNir This is how I envisioned the new feature would work with MathExpression. This new feature appears to be considerably more work than I anticipated when I first picked it up. If it's a priority for you and / or your colleagues, I'm happy to hand this over, otherwise with your advice I can proceed. Let me know your thoughts before I make any further developments.

Copy link
Contributor

@NetaNir NetaNir Oct 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latest version of the API looks ok. Since this feature is non trivial, please add an integration test so we can verify the deployment succeed.

@rrhodes rrhodes force-pushed the bugfix/issue-10540-threshold-metric-id-for-anomaly-detection-alarms branch 6 times, most recently from 0318c98 to 43bed7a Compare October 18, 2020 22:13
@rrhodes rrhodes force-pushed the bugfix/issue-10540-threshold-metric-id-for-anomaly-detection-alarms branch from 43bed7a to 2339520 Compare October 18, 2020 22:16
@rrhodes rrhodes marked this pull request as ready for review October 18, 2020 22:18
@alextriaca
Copy link

Sorry to pester. Is there any ETA on when this PR will be merged? Would be awesome to see this in master!

@rrhodes
Copy link
Contributor Author

rrhodes commented Oct 30, 2020

Hi @NetaNir, any further feedback regarding this PR?

@NetaNir
Copy link
Contributor

NetaNir commented Oct 31, 2020

You didn't te request a review, so it didn't show up in my todo. I'll have a look next week

@rrhodes
Copy link
Contributor Author

rrhodes commented Oct 31, 2020

You didn't te request a review, so it didn't show up in my today. I'll have a look next week

Apologies, I thought re-opening this from draft to "ready for review" would deliver a notification. Thanks in advance for next week's review, much appreciated.

@alextriaca
Copy link

Bump 😉

@rrhodes
Copy link
Contributor Author

rrhodes commented Nov 10, 2020

Hi @NetaNir, any additional feedback on this PR?

Will investigate build failure tomorrow.

@aws-cdk-automation
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildProject6AEA49D1-qxepHUsryhcu
  • Commit ID: 69b4b89
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@@ -140,7 +140,7 @@ export class Alarm extends AlarmBase {
/**
* This metric as an annotation
*/
private readonly annotation: HorizontalAnnotation;
private readonly annotation?: HorizontalAnnotation;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this made optional?

*/
readonly threshold: number;
readonly threshold?: number;
Copy link
Contributor

@NetaNir NetaNir Nov 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize that anomaly detection based alarms does not allow to set threshold and requires setting thresholdMetricId instead.

However, making this property optional sacrifice the common use case ergonomics, in favor of the anomaly detection use case. Making this property optional means that we will need to add a synth time check that verifies: If this is an anomaly detection based alarm and threshold is set then throw an error . While it can be done, it makes the developer experience not so great.

Since there are several such subtleties in the anomaly detection alarm (e.g the allowed operators, annotation, returnData), it probably makes more sense to implement a createAnomalyDetectionAlarm.

I agree that this feature seemed trivial initially but it has many pitfalls, as such it will require more design work. Before we do another revision lets discuss the API in details, feel free to comment on this PR or create a new issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[aws-cloudwatch] Missing ThresholdMetricId
5 participants