Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove TaskContextLogger #43183

Merged
merged 2 commits into from
Oct 19, 2024
Merged

Conversation

dstandish
Copy link
Contributor

@dstandish dstandish commented Oct 18, 2024

It was a fine idea, but we ultimately realized that it's better to just add records to the Log table, which can be viewed easily in the UI

It was a fine idea, but we ultimately realized that it's better to just add Log records.
@kaxil
Copy link
Member

kaxil commented Oct 18, 2024

It was a fine idea, but we ultimately realized that it's better to just add Log records.

What are the "Log records" you are mentioning, could you expand a little

@dstandish
Copy link
Contributor Author

It was a fine idea, but we ultimately realized that it's better to just add Log records.

What are the "Log records" you are mentioning, could you expand a little

The Log table

Also knows as eventLogs in the rest API

@dstandish dstandish added the airflow3.0:breaking Candidates for Airflow 3.0 that contain breaking changes label Oct 18, 2024
@dstandish dstandish merged commit 72f2b2e into apache:main Oct 19, 2024
80 checks passed
@dstandish dstandish deleted the remove-task-context-logger branch October 19, 2024 00:50
@@ -238,10 +237,6 @@ def set_context(self, ti: TaskInstance, *, identifier: str | None = None) -> Non
self.handler.setLevel(self.level)
return SetContextPropagate.MAINTAIN_PROPAGATE if self.maintain_propagate else None

@cached_property
def supports_task_context_logging(self) -> bool:
return "identifier" in inspect.signature(self.set_context).parameters
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we intend to keep accepting the identifier param in the set_context method above that we had introduced in https:/apache/airflow/pull/32646/files. The docstring for that param in the method is currently tightly related to the task context logger.

This param was also extended to be used in the set_context method of the provider log handlers with the below PRs

  1. AWS S3: Extend task context logging support for remote logging using AWS S3 #32950
  2. GCP GCS: Extend task context logging support for remote logging using GCS #32970
  3. Azure WASB: Extend task context logging support for remote logging using WASB (Azure Blob Storage) #32972
  4. ElasticSearch: Extend task context logging support for remote logging using Elasticsearch #32977

It seems the conditionals around AF versions in those provider PRs were removed & cleaned up when those providers started having Airflow min version as 2.8 -- in PR #42764

If we decide to drop the param, perhaps would also be a good idea to add a TODO in those providers saying that this param would no longer be needed/used when min AF version is 3.0 & can be cleaned up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a good point. I thought about it and figured that it's ok to leave it even if it is not currently used. Unless there's some backcompat issue with leaving it but i don't think there is.

Do you see any other backcompat concerns with this PR otherwise?

Copy link
Member

@potiuk potiuk Oct 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I think we should drop it - and actually - if we are going to add documentation for Airlfow 3, I think we should promote and look more closely into open-telemetry trace integration, which is essentially doing exactly what have been done here (and more).

Open Telemetry span is equivalent to the identifier, but there is already tooling (all tools that support open-telemetry spans) that understand and are able to visualise spans across distributed systems. So I'd say we should make it clearer and explain to people on how they can connect those tools to see relations between things like errors raised in other parts of the system that are related to particular tasks.

This was a very important point raised in the debugging survey - that it is difficult to debug issues in a distributed environment, and IMHO open-telemetry and better integration with it is the answer to that need and maybe we shoudl somehow think about making it more obvious (and learn ourselves how to do it).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you see any other backcompat concerns with this PR otherwise?

I can't think of any backcompat issues otherwise :), just felt that we would have there an unused param and thought to highlight it once.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
airflow3.0:breaking Candidates for Airflow 3.0 that contain breaking changes area:logging area:providers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants