Various fixes on ECS run task operator #31838

Merged
merged 9 commits into apache:main from vandonr/tests on Jun 16, 2023

Conversation

vandonr-amz
Contributor

fixing a bunch of problems around the ECS run task operator:

  • it was checking the task status even when we were not waiting for completion, which would result in a failure if the check happened fast enough, because the task would still be pending
  • added a warning about trying to pull logs while not waiting for completion, which is just a weird thing to do, because logs might be truncated or might not even be there yet. I'm assuming it was always undefined behavior, so I'm going ahead and not even starting the thread if wait for completion is false (see the sketch after this description).
  • added a log message when logs are not present (yet), so that if the issue persists, users have a message guiding them towards resolution rather than complete silence

The system tests were not properly configured for logs either. I added the necessary configuration and a cleanup step, and moved the sensor to ecs_fargate so that we can have the normal log-querying behavior in the vanilla ecs test.
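
For context, here is a minimal sketch of the control flow described in the bullets above. The helper names (_start_task, _aws_logs_enabled, _get_task_log_fetcher, _wait_for_task_ended, _check_success_task) are placeholders standing in for the operator's internals; this is not the actual provider code:

def execute(self, context):
    self._start_task(context)

    if not self.wait_for_completion:
        # Return early: the task may still be PENDING, so checking its status
        # here would fail spuriously, and fetching logs would race against a
        # container that has not produced any output yet.
        return

    if self._aws_logs_enabled():
        self.log.info("Starting ECS Task Log Fetcher")
        self.task_log_fetcher = self._get_task_log_fetcher()
        self.task_log_fetcher.start()
        try:
            self._wait_for_task_ended()
        finally:
            self.task_log_fetcher.stop()
            self.task_log_fetcher.join()
    else:
        self._wait_for_task_ended()

    self._check_success_task()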

Comment on lines 488 to 490
@staticmethod
def _get_ecs_task_id(task_arn: str) -> str:
    return task_arn.split("/")[-1]
Contributor Author

doing this because I thought that remembering to update the ecs_task_id every time the arn gets changed was a bit brittle. This method makes the dependency arn -> task_id explicit.
Yes, it means we're going to recompute it each time we need it, but I think the cost is negligible.
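
For illustration, a hypothetical example of what the helper yields (the ARN below is made up, following the standard ECS task ARN format):

task_arn = "arn:aws:ecs:us-east-1:123456789012:task/my-cluster/2b88376d-aba3-4950-9ddf-bcb0f388a40c"
task_id = task_arn.split("/")[-1]
assert task_id == "2b88376d-aba3-4950-9ddf-bcb0f388a40c"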

# A bit brutal to delete the whole group, I know,
# but we don't have access to the arn of the task, which is used in the stream name
# and also those logs just contain "hello world", which is not very interesting.
client.delete_log_group(logGroupName=group_name)
Contributor Author

this is going to fail if the group does not exist, so in a way it makes sure the log configuration stays correct.
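
As a rough sketch, the kind of cleanup step being described could look like this, assuming boto3 and Airflow's @task decorator; the task name and argument below are placeholders, not the actual system-test code:

import boto3
from airflow.decorators import task


@task(trigger_rule="all_done")
def clean_logs(group_name: str):
    client = boto3.client("logs")
    # delete_log_group raises ResourceNotFoundException if the group is missing,
    # so the cleanup also acts as a check that the log configuration stayed correct.
    client.delete_log_group(logGroupName=group_name)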

Comment on lines +521 to +523
if not self.wait_for_completion:
    return

Contributor

Whatever logs users were getting during that short window without wait_for_completion, they will no longer get. So we're calling it a bug fix with no deprecation?

Contributor Author

Yeah, idk, we may want to keep the existing behavior, but what I don't like about it is that it made the operator slower just for the sake of maybe getting a couple of log lines...
Since we were starting the thread, which slept for 30 seconds (or the configured value) before checking whether it had been stopped, this operator would take 30 seconds to return no matter what, even when the job was done in a second and a half.
It's like "don't wait for completion, but still wait a bit".
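
A simplified sketch of that delay, assuming a fetcher thread that sleeps for a full interval before re-checking its stop flag (the class below is a stand-in, not the provider's actual log fetcher):

import threading
import time


class SketchLogFetcher(threading.Thread):
    def __init__(self, fetch_interval: float = 30.0):
        super().__init__()
        self.fetch_interval = fetch_interval
        self._stopped = threading.Event()

    def run(self):
        while not self._stopped.is_set():
            # Sleeps the whole interval before noticing it was asked to stop.
            time.sleep(self.fetch_interval)
            # ... fetch and print the next batch of CloudWatch log events here ...

    def stop(self):
        self._stopped.set()

Even if stop() is called right after the ECS task finishes, join() still blocks for up to fetch_interval seconds, which is where the "don't wait for completion but still wait a bit" effect comes from.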

Contributor

I'd agree with Raph, I don't think this is a desired behavior but more a forgotten edge case. I would call it a bug fix.

Contributor

Ack, I'll call that quorum then, let's call it a bug fix 👍

tests/system/providers/amazon/aws/example_ecs.py (outdated comment, resolved)
@o-nikolas o-nikolas merged commit e0f21f4 into apache:main Jun 16, 2023
@vandonr-amz vandonr-amz deleted the vandonr/tests branch June 16, 2023 19:28
vandonr-amz added a commit to aws-mwaa/upstream-to-airflow that referenced this pull request Jun 23, 2023
This method is just causing trouble by handling several things; it's hiding the logic.
A bug fixed in apache#31838 was reintroduced in apache#31881 because the check that was skipped on `wait_for_completion` was not skipped anymore.

The bug is that checking the status will always fail if not waiting for completion, because obviously the task is not ready just after creation.
o-nikolas pushed a commit that referenced this pull request Jun 23, 2023
ferruzzi pushed a commit to aws-mwaa/upstream-to-airflow that referenced this pull request Jun 27, 2023